January, 1994
EDITORIAL


BRIE Whiz, or Taking Stock of Your Options


Clinton and his cronies aren't kidding. They're dead serious about
technology--and nowhere was this more apparent than at the 1993 Technology
Summit hosted by the University of California's Berkeley Roundtable on the
International Economy (BRIE) and the U.S. Department of Commerce.
The unprecedented two-day symposium brought together industry, research, and
government leaders to begin defining the nation's technological future in an
effort to ensure U.S. worldwide technological superiority as we move into a
technology-based global economy. On the chance that things would get dull,
Commerce Secretary Ron Brown used the occasion (and the National Institute of
Standards and Technology's Advanced Technology Program) to dole out more than
$60 million in grants to high-tech companies. Brown could have saved his
breath and some of our money. At this conference, words really did speak
louder than spare change. Government leaders were there to listen, and they
got an earful.
Unlike previous administrations, this government sees technology as more than
just big toys for big boys. From clean cars to the information highway, high
technology is the keystone of Clinton's economic policy--it's the engine of
economic growth for creating new jobs, building new industries, and improving
our standard of living. BRIE codirector Michael Borrus echoed this in his
opening words, "there is no long-term low-tech prosperity for the American
economy."
Although there weren't enough soapboxes to go around, the consensus was that
continued technological leadership hinges on the U.S.'s commitment to improve
education, further research, strengthen intellectual property rights, provide
ready access to international trade, and foster a climate favorable to
entrepreneurship.
If any of these topics is uniquely American, it's entrepreneurship. Virtually
every technologically advanced country in the world supports education,
intellectual property rights, and access to trade. But unlike the U.S., other
countries haven't fostered a fierce spirit of entrepreneurship. That's how
Bill Gates was able to build Microsoft, how the two Steves created Apple, and
how Michael Dell was able to launch Dell Computer (or whatever it's called
this week) from his college dorm room. More telling, it's also why people
bitten by the entrepreneurial bug--Borland's Philippe Kahn and Logitech's
Pierluigi Zappacosta spring to mind--came to the U.S. to make their fortune.
As numerous attendees at the Technology Summit vigorously pointed out,
however, this entrepreneurial spirit is threatened by a bill currently before
Congress.
Particularly in the computer industry, stock options have been the tool that's
enabled start-up companies with limited resources to attract top-flight
talent. The promise of riches down the road and the chance to exercise
creativity have proven to be an irresistible magnet for those frustrated by the
stifling stability of large corporations. Granting stock options is one of the
few effective ways we've discovered of starting up a company while empowering
and involving employees. Thanks to stock options, every employee at Adobe has
an equity position, and one out of every twenty Microsoft employees is a
millionaire.
Ironically, changes to how accountants and revenuers treat stock options are
at the heart of the challenge to entrepreneurship. In a proposal by the
independent Financial Accounting Standards Board (FASB)--and championed in
Congress by Senator Carl Levin (D-Mich.)--stock options would be considered
compensation instead of incentive. Thus stock options, which give you the
right to buy shares at a specified price, would be charged against a company's
earnings when they're granted, not when they're exercised. If set into law,
the rule would likely eliminate or limit the granting of stock options for
employees at smaller companies. In one report, 90 percent of several hundred
start-up firms surveyed said that deducting the value of stock options from
profits would either cause them to eliminate options altogether or grant them
to only selected executives.
Standing up for start-ups are Senators Joseph Lieberman (D-Conn.), Dianne
Feinstein and Barbara Boxer (D-Calif.), and Connie Mack (R-Fla.), who've
introduced bill S.1175 to block the FASB's proposal and Levin's legislation.
The Lieberman bill, also known as the 1993 Equity Expansion Act, is still in
committee, but expected to come to a vote this session. S.1175 would create a
new employee stock option allowing tax-free exercise of options, cut in half
the capital-gains tax on sale of the stock held for two years or more, and
allow companies offering options to forgo their own options-related tax
deductions, enabling them to continue offering deductible stock options.
Entrepreneurship as expressed in law and spirit is one edge the U.S. enjoys
over its increasingly competitive international trading partners. If Clinton
and Gore are indeed serious about the U.S. maintaining technological
superiority, denouncing the FASB/Levin proposal and backing S.1175 are
opportunities to put words into action.
Jonathan Erickson, editor-in-chief




January, 1994
LETTERS


Microsoft Answer to AARD




Dear DDJ,


The lawyers have finally given me the green light to describe why the MS-DOS
detection code discussed in the article "Examining the Windows AARD Detection
Code" by Andrew Schulman (DDJ, September 1993) was in the Christmas beta. I
hope you will keep an open mind, listen to the truth, and accept it. It may
not make such good press, but sometimes the truth is like that.
It has never been a practice of this company to deliberately create
incompatibilities between Microsoft system software and the system software of
other OS publishers. I am not aware of any instance where Microsoft
intentionally created an incompatibility between Windows and DR DOS. Windows
is tightly coupled to the underlying MS-DOS operating system. It relies on a
number of very precise behavioral characteristics of MS-DOS which have nothing
to do with the Int 21h API. Because of this tight coupling, an MS-DOS
imitation must have exactly the proper behavior, or all sorts of subtle and
not-so-subtle problems will occur, including data loss.
Microsoft does not test Windows on anything other than Microsoft's MS-DOS. We
don't have the development or testing resources, nor do we consider it our job
to test Windows on other systems. If you're the developer of an MS-DOS
imitation, you shouldn't expect your main competitor to do your work for you.
If Windows works on your imitation, it works; if it doesn't, it's your problem
to fix. That may not give you, Andrew, the warm and fuzzies, but this is
business, not a giveaway.
During the development of Win 3.1, a great deal of thought was given to ways to
reduce the high support burden associated with Windows. During the betas, we
got a few bug reports about Windows not working correctly on some of the
MS-DOS imitations. So it seemed like a very small portion of the market might
have problems running Win 3.1 on something other than genuine MS-DOS. In order
to be fair and up-front with them, we considered that it might be a good idea
to let them know--before they encountered problems or even data loss--that
they were running Win 3.1 on a system we hadn't tested. The intended purpose
of this disclosure message was to protect the customer and reduce the
product-support burden arising from the use of Windows on untested systems.
The plan was to include an "off switch" in the commercial release that the end
user could use to prevent the message from being redisplayed every time
Windows was run.
In order to preserve the option of putting a disclosure message in the
commercial release of Win 3.1, some MS-DOS detection code was implemented and
inserted into the relevant modules of the "Christmas" beta. This code only
detected the presence of MS-DOS; it did not detect any competing OS.
The wording of the message that was displayed if something other than MS-DOS
was detected in the Christmas beta has been the subject of accusatory
speculation. Our intention for the final release was to warn the user that
Windows (and that includes all Windows components) is being run on a system we
have not tested. The message in the beta, however, was carefully crafted to
produce a desired effect. Since this code was inserted very late in the
development schedule, we were very concerned about making sure it worked
properly, and especially that it did not have "false positives," i.e., that it
did not "misfire" when there really was genuine MS-DOS underneath. As a
result, we wanted to make sure that anytime it triggered, the beta tester
would call us so we could follow up and confirm that the code was reliably
detecting MS-DOS, or if instead it was returning false positives. In fact, the
message says to contact the Win 3.1 beta support.
The language of the message was not alarming; it did not mention the nature of
the "nonfatal error" nor the name of any competitor. Moreover, the message
either disappeared in a matter of seconds or with a single keystroke. Nor did
the message stop Windows from running.
Of course the code was concealed. This should not be surprising at all. If it
can be easily circumvented by an imitation (which I remind you we haven't
tested against), then its purpose has been defeated.
Neither the detection and concealment code nor the nonfatal-error message
created any incompatibility with DR DOS.
Prior to the March 9, 1992 RTM date for Win 3.1, we decided not to include the
disclosure message in the commercial release of the product because we didn't
want to run the risk that it would be misinterpreted and thus divert attention
from the new features of Windows 3.1. We were in a tough competitive battle
with OS/2 and wanted the attention focused on the great new features of Win
3.1, rather than artificial "controversy" whipped up by the press or our
competitors.
In fact, the planned disclosure message was never coded into the product.
Because this decision was made so late in the development cycle, and we didn't
want to risk introducing instability into the product, we left the detection
and concealment code and the nonfatal-error message in the product, but
disabled it from printing onscreen. As a technical person, Andrew, you know
that a NO-OP is a NO-OP. Even though the code remains in Win 3.1 in a
"quiescent" state, the fact remains that no messages are printed. You
insinuate that we could somehow, sometime "turn it on." How? ESP? Remote
control? If we could get people to execute a patch that would turn the code
on, we could certainly figure out a way to patch the whole thing in.
Finally, the detection and concealment code and the nonfatal-error message
code have been stripped out of the versions of Windows currently under
development. That's the story. Surely not as interesting or controversial as
you or others would have people believe, but it's what really happened.
Brad Silverberg, Vice President
Microsoft Corp.
Redmond, Washington
Andrew responds: Thanks for your thoughtful explanation of the AARD code,
Brad. As you've also told me in person since the article appeared, very late
in the beta, Microsoft decided against displaying a message when running
Windows on a non-Microsoft DOS. At this late stage, Aaron Reynolds (author of
the AARD code) needed to produce the smallest-possible binary "diff," so he
cleared a control byte rather than removing the code.
It's also noteworthy that you point out that the wording of the beta message
differed from that intended for the Windows 3.1 retail release. The beta message
intended to test Aaron's tricky code and to weed out what you, Brad, call
"false positives." (In retrospect, this code couldn't have worked 100 percent
of the time, as an April 1993 Microsoft KnowledgeBase article, "Replace Case
Mapping Function with Proprietary Version," notes that a DOS program can hook
INT 21h AH=38h and replace the built-in case mapping function.)
But even if I was off-base about why the code was left in the retail version
and the wording of the error message in the beta, this doesn't change anything
substantial, and in some ways your explanations are more damning than my
original article. You say that Microsoft intended to put into Windows 3.1 a
warning to the user that running Windows on a non-Microsoft DOS was "untested"
and might cause data loss. The intended error message would have looked
something like the following:
WARNING: This Microsoft product has been tested and certified for use only
with the MS-DOS and PC-DOS operating systems. Your use of this product with
another operating system may void valuable warranty protection provided by
Microsoft on this product.
This is not a hypothetical error message, but one that several versions of
Microsoft C produce when running on non-Microsoft versions of DOS. (This error
message and the code that produces it are discussed in Undocumented DOS,
second edition, Chapter 4.) From your letter, it appears that Microsoft
intended to hook the AARD code up to a similar message in Windows 3.1.
But while such a message seems benign to you, the word "untested" could have a
chilling effect on the typical user. As long as Windows and DOS are sold as
separate products, it would be a classic tying arrangement for Windows to
scare users of other DOSs with blanket statements that something might be
wrong. Novell rightly characterizes such manufactured error messages as
"product disparagement." If Windows has some special expectations of the
underlying DOS, then the Windows group should publish a specification saying
what those expectations are. If DR DOS isn't up to spec, then so be it. It all
comes back to the issue of documentation: If Windows has special needs from
DOS, Microsoft should specify what those needs are. When I recently asked you
this, you replied that Microsoft can't afford to spend time on such
documentation--there are only 300 people working on DOS and Windows. I'm not
sympathetic to this argument.
Microsoft says that Windows and MS-DOS are integrally tied, that they're
designed to work as a "seamless" team, and so on. Remember, though, that these
products are sold separately. For this discussion, Windows is just another DOS
utility, like DESQview. Windows' reliance on undocumented DOS functionality
should be viewed in the same light as the use of undocumented functions in
Excel and WinWord that Microsoft denied so strenuously and for so long.
Yet here, you're not denying that Windows exploits features of DOS that
Microsoft refuses to document for third parties. In fact, it is the entire
premise for your argument: Windows "relies on a number of very precise
behavioral characteristics of MS-DOS which have nothing to do with the Int 21h
API." In other words, Windows uses undocumented DOS!
Let me see if I can summarize Microsoft's argument: Windows relies on
undocumented DOS. Microsoft can't guarantee that a non-Microsoft DOS will
support these undocumented DOS features, so it came up with a test that it
knew non-Microsoft DOSs would fail, and that, by encryption, it hoped the
vendors of these DOSs would never figure out. Thus, it could detect
non-Microsoft versions of DOS and put up a message telling the user that maybe
Windows might possibly not work on these untested environments. We're supposed
to feel better about this?
The good news is that the forthcoming "Chicago" operating system solves the
problems posed by the AARD code. As you say, the AARD code has been removed.
More important, Windows 4 will not be sold separately from the underlying
operating system. The Windows component of Chicago requires DOS 7.0 or higher,
which is the other component of Chicago. This will make life difficult for
Microsoft's competitors, but this is an intrinsic difficulty, not a contrived
one. Contrast the simplicity of Chicago's up-front check for a required DOS
version number with the AARD code's encrypted and obfuscated test for
arbitrary aspects of undocumented DOS, and you will see that there is a right
way and a wrong way for a product to have system requirements.


AARD: Credit Where Credit Is Due




Dear DDJ,


In his article "Examining the Windows AARD Detection Code" (DDJ, September
1993), Andrew Schulman graciously credits me with having "unraveled" part of
the AARD code. Although I'm certain that Andrew analyzed the rest of the code
independently, I should like to claim prior discovery of the workings of the
whole code. In fact, I posted a rough but comprehensive description,
identifying all tests and (briefly) raising many of the points taken up in
Andrew's article to the Windows/development conference of the British-based
bulletin board CIX on June 7--8, 1992.
Of course, Andrew deserves applause for bringing the AARD code to a wide
audience, but since he has relied on some of my work, propriety demands at
least a note that my analysis predated his, perhaps with the explanation that
my findings had not been as widely disseminated.
I contend personally that Microsoft deserves strong condemnation for the mere
existence of the encrypted detection code and a disguised, misleading error
message. This is one reason why I told Andrew and others of it soon after its
discovery--before knowing that DR DOS ran foul of the tests. In this light,
Andrew's omission gives the impression that the AARD code is important only
because Novell was inconvenienced.
Geoff Chappell
London, England
Andrew responds: I'm sorry that Geoff feels my article did not sufficiently
stress his priority. The article twice stated that I never would have figured
out the crucial redirector/case map/FCB test if Geoff hadn't explained it to
me. The article gave Geoff's e-mail address. Unfortunately, an additional
reference to Geoff as "a master of disassembly" and a plug for his forthcoming
book were removed during editing. The second edition of Undocumented DOS, from
which portions of the article were extracted, refers constantly to Geoff.
While Geoff told me about the code in April 1992, it seemed like he was
telling me about some obscure aspect of WIN.COM and HIMEM.SYS that did not
sound very important. The reason I eventually looked at the AARD code was that
I had heard vague reports of incompatibilities between Windows and DR DOS, and
these didn't make any sense to me. A reporter, Wendy Goldman Rohm, informed me
that the FTC was looking into something having to do with Windows 3.1 betas
running on DR DOS. I dug out all my beta copies, found the message, and worked
backwards from there. I wanted to independently confirm or deny what I knew
the FTC was already looking at.
I doubt Novell ever figured out the crucial test involving the redirector,
default-case map routine, and location of the first System FCB. The most
recent beta of Novell DOS 7.0 I examined still did not contain the necessary
minor adjustments to pass the AARD test. Thus, I suspect Novell first learned
of the crucial test from the DDJ article. And the article made quite clear
that this information came from Geoff.
What I said in the article is accurate: Geoff uncovered the crucial part of
the code. Surely it was Novell who, for what it's worth, can claim priority to
uncovering the noncrucial parts.





January, 1994
Shared Memory and PC Supercomputing


Achieving gigaflop performance on a PC




Stephen Fried


Steve is the president of Microway Inc. and can be reached at P.O. Box 79,
Kingston, MA 02364 or at steve@microway.com.


Shared-memory parallel processing is a supercomputer technique used by
companies like Cray to build very high-speed computational systems.
Shared-memory systems typically employ four to sixteen processors, with the
shared memory acting as both a connectivity element and a repository for
information. All CPUs in such a system can read or write the shared memory but
often employ a hierarchy of other devices to store data--vector registers,
data caches, local memory, and crossbar switches. Ideally, these devices
reduce average data access time. The more often an item is accessed, the
closer it will get placed to the internal units that do the actual
computations. Traditionally, shared memory has been limited to supercomputers
and super-minis, but it's now making its way into the PC world. This is
because of the connectivity advantages it offers over serially linked,
distributed-memory parallel architectures which are often too weakly connected
to execute fine-grained parallel problems efficiently, especially when the
CPUs have both scalar and vector facilities.
In addition to improved connectivity, shared-memory systems are easier to
program because the data stored in shared memory can be mapped directly into
the global storage structures of high-level languages (Fortran COMMON, for
example). Many programmers get reasonable efficiency porting their code to a
shared-memory parallel environment by simply using shared memory to hold
COMMON and local memory to hold the stack and data structures that are
repeatedly accessed. A drawback to shared memory, however, is that the number
of CPUs that can be connected together is limited. This isn't a problem for
distributed-memory systems, where interconnection bandwidth often scales with
the number of CPUs added to the system. This is one of the reasons companies
like IBM, with its networks of RS/6000s, and Intel, with its hypercube, have
heavily promoted distributed topologies. From a practical perspective,
however, parallel versions of many large applications were written for
shared-memory supercomputers, and the techniques used to code shared memory
often don't map well to distributed-memory architectures. Furthermore, most
heavy-duty parallel code is written in Fortran, which, unlike C, doesn't lend
itself to distributed processing. In short, there's a schism between
distributed- and shared-memory parallel processing that hinges, in part, on
the higher-level language of choice.


Shared vs. Distributed Memory Coding Concerns


From the programmer's perspective, there are differences between the
shared-memory and distributed-memory approaches to implementing algorithms.
The common matrix multiply in Figure 1, for example, depicts three arrays, A,
B, and C, stored in the memory of both types of systems just before each of
the nodes on the system starts its portion of a matrix multiply. (Figure 1
also shows the number of data-bus or signal lines used to connect the parts of
each system together and the bandwidth of each line in Mbytes/sec.) In both
systems, A has been divided into quarters. In shared-memory systems where
crossbar switches are used to connect multiple banks of memory to multiple
processors (not shown), this split isn't needed. However, crossbar switches
are prohibitively expensive for the PC market. The key to an efficient
parallel matrix multiply is keeping the quarters of A in local RAM. In the
distributed-memory system, this is done using message-passing techniques over
serial lines. In a shared-memory system (such as the QuadPuter-860, an
i860-based EISA add-in board my company produces), it's done using block moves
from an address in shared memory to local memory. The distributed-memory
system also has to scatter the rows of B and gather back the resulting rows of
C. The secondary scattering and gathering of data in the shared-memory system
is invisible to the program. The rows of B end up in local caches, and if C is
stored in noncached shared memory to begin with, it will end up there at the
end of the computation. With either technique, A must find its way into the
highest-speed memory in the system and the rows of B must end up cached
between uses.
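The data placement described above can be sketched in C. This is an illustrative serial stand-in, not Microway's actual API: each "node" block-moves its quarter of A out of shared memory (memcpy standing in for the QuadPuter's block transfer) into local storage, then multiplies those rows by all of B to produce its rows of C.

```c
#include <string.h>

#define N 8          /* matrix dimension (tiny, for illustration) */
#define NODES 4      /* worker nodes sharing the multiply         */

/* One node's share of C = A*B: copy its quarter of A into "local
   RAM" with a single block move, then use it for every column of B. */
static void node_multiply(int node, const double *shared_A,
                          const double *B, double *C)
{
    double local_A[(N / NODES) * N];          /* local copy of A's quarter */
    int rows = N / NODES, first = node * rows;

    memcpy(local_A, shared_A + first * N, sizeof local_A);

    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += local_A[i * N + k] * B[k * N + j];
            C[(first + i) * N + j] = sum;
        }
}

void parallel_matmul(const double *A, const double *B, double *C)
{
    /* Serial stand-in: on the QuadPuter each node runs concurrently. */
    for (int node = 0; node < NODES; node++)
        node_multiply(node, A, B, C);
}
```

On real hardware the four `node_multiply` calls would run simultaneously, one per i860, with the memcpy replaced by a shared-to-local block move.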
To determine how efficiently an algorithm runs on a parallel machine, you need
to know how long it takes to run on a single CPU, divide by the number of
processors, then add in the time to sow the input and harvest back the output.
Frequently, the time to start a process can also be important. An i860 can
complete a 1024-point complex FFT in less than a millisecond: if it takes 15
milliseconds to start a process, the system isn't suitable for doing parallel
FFTs. Two of the measures of merit of a parallel system are shown in Table 1
(the top four rows were compiled by Adam Kolawa of Parasoft). The latency is
the time required to send a null message and receive back a reply without
doing a calculation. The bandwidth of the shared memory (exemplified in the
QuadPuter-860) beats most of the distributed-memory systems when it comes to
interprocess transfers. Shared-memory supercomputers from companies such as
Cray, and i860 VME boards from companies such as Sky use crossbar switches
that run at bandwidths over 500 Mbytes per second. While the QuadPuter's
five-way arbitrated 64-bit memory probably runs an order of magnitude slower
than these heavily connected crossbar systems, the QuadPuter still fares well
if you compare its throughput-to-interconnect bandwidth with that of the more
expensive systems. The QuadPuter-860 even compares well with i860
supercomputers such as the Intel Delta.
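The rule of thumb above (serial time divided by processor count, plus the cost of sowing inputs, harvesting outputs, and starting processes) is easy to put into a formula; function and parameter names here are mine, not a published model.

```c
/* Parallel run time: serial time / CPUs, plus fixed per-run overheads
   for scattering input, gathering output, and process start-up. */
double parallel_time(double serial_ms, int cpus,
                     double scatter_ms, double gather_ms,
                     double startup_ms)
{
    return serial_ms / cpus + scatter_ms + gather_ms + startup_ms;
}

/* Fraction of ideal speedup actually achieved (1.0 = perfect scaling). */
double efficiency(double serial_ms, int cpus, double parallel_ms)
{
    return serial_ms / (cpus * parallel_ms);
}
```

Plugging in the article's FFT example: a 1 ms FFT on four CPUs with a 15 ms process start-up gives 1.0/4 + 15 = 15.25 ms per FFT, an efficiency below 2 percent, which is exactly why a long start-up time rules a system out for parallel FFTs.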


Shared COMMON Memory


Central to utilizing shared memory with Fortran is the i860's virtual memory
and Fortran COMMON. Most i860 systems virtualize addressing using the i860's
page tables, similar to those of the 80486. Here, physical memory ceases to be
a resource of the user, instead becoming an operating-system resource.
The trick to using COMMON as an interface between processors running in
parallel is to dedicate a region of the shared physical memory to a
virtual-address region I call "shared COMMON." This region of memory in the
parallel version of OS-860 (Microway's single-threaded kernel) is mapped with
a single page table used by all four modules. All accesses to a virtual
address in the shared COMMON region are made to the same physical-memory
location. Once shared COMMON is implemented, other resources that need to be
made available to complete a parallel-Fortran environment include the ability
to send/receive signals between CPU modules and to control whether the pages
in shared COMMON are cacheable or not.


Root Nodes and Farming


Since much of the PC parallel market uses MS-DOS, we've developed a
methodology for writing shared-memory parallel programs using NDP Fortran
(again from Microway) and its i860 run-time support. The technique uses OS-860
to manage the i860's memory, exceptions, and the I/O interface, with DOS
running on the host. We borrowed a page out of Inmos T8 history by employing
one of the i860 modules as the "root" node. In a Transputer network, this
special node manages all communications between the T800s and the host; on the
QuadPuter-860, the root i860 plays the same role. The implication here is that
only procedures running on the
root node can open or close files or use I/O statements like WRITE or
printf(). All other modules have to send messages to the host through the root
if they want to communicate with the user. This isn't a limitation if the main
applications are numerically intensive.
To illustrate how parallel processing algorithms are organized, I again borrow
from Inmos history. The most common metaphor in the transputer business was
that of a "farm"; see Figure 2. Since most problems could be mapped onto a
"transputer farm," this became the methodology of choice for users who didn't
want to learn Occam or get a computer-science degree just for parallel
processing. In this paradigm, the root node manages the show, farming out
tasks to the four worker nodes, and then harvesting the results. The typical
Inmos analogy described farms as a cottage industry, such as sweater knitting.
In this model the root node dispatches yarn that gets knitted into bodies,
sleeves, and necks by worker nodes. The parts are then passed to a final
worker and sewn into sweaters. In many parallel applications, the
post-processing stage is nonexistent, and often the root will join the workers
after the initial distribution is complete.
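The farm paradigm can be sketched with threads standing in for worker nodes: the root sows by handing each worker a slice of the task list, the workers run concurrently, and the root harvests by joining. This is a minimal illustration of the metaphor, not Inmos's or Microway's scheduler; the squaring "task" is a placeholder.

```c
#include <pthread.h>

#define WORKERS 4

/* A slice of the farm's work: input tasks, output results, and the
   range [first, first + count) this worker is responsible for. */
struct slice { const int *in; long *out; int first, count; };

static void *worker(void *arg)
{
    struct slice *s = arg;
    for (int i = s->first; i < s->first + s->count; i++)
        s->out[i] = (long)s->in[i] * s->in[i];   /* the farmed-out task */
    return NULL;
}

/* Root node: sow the tasks across the workers, then harvest. */
void farm(const int *tasks, long *results, int n)
{
    pthread_t tid[WORKERS];
    struct slice s[WORKERS];
    int per = n / WORKERS;

    for (int w = 0; w < WORKERS; w++) {
        s[w] = (struct slice){ tasks, results, w * per,
                               w == WORKERS - 1 ? n - w * per : per };
        pthread_create(&tid[w], NULL, worker, &s[w]);
    }
    for (int w = 0; w < WORKERS; w++)
        pthread_join(tid[w], NULL);
}
```

As in the sweater-knitting analogy, a post-processing stage (a final worker sewing the pieces together) would simply be one more routine the root runs after the joins.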


Matrix Multiplies on the Farm


The key to doing fast parallel matrix multiplies is having the columns of A
stored in high-speed memory. If you examine the problem for vector lengths of
100, for every access of a row of B, you'll make 100 accesses to a column of
A. For vector lengths of 40 or less, all of A and at least a row of B will fit
into the data cache, and you can run your program out of shared memory without
having to worry about data flow. Each processor does one fourth of the dot
products, independent of the data-flow strategy. When the vector lengths grow
larger than 40, you'll get significant shared-memory and cache thrashing
unless you add some sophistication. You can either break A into quarters and
move it into shared memory, or "strip mine"--a supercomputer technique in
which the data is processed in strips. In this case, each of the CPUs
processes strips of A, each containing 2000 elements, or 20 columns. During
the first part of a strip-mined matrix multiply, each of the processors
multiplies the 20 columns in its cache by all the rows of B. During the second
half, the five columns that didn't get processed by the first iteration also
get multiplied by all the rows of B. The disadvantage of strip mining as
opposed to distributing A into local memory is that all the rows have to be
read in for every iteration. As vector lengths increase, a point will come
when a jump must be made to storing quarters of A in local memory. As the
problem continues to grow even larger, B won't fit in the data cache and the
code will have to strip mine B, reading A out of local memory.
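Strip mining itself is just a blocked loop: instead of sweeping the full k dimension at once, you process A in cache-sized strips of columns, streaming every row of B past each strip. A minimal single-CPU sketch, using the article's vector length of 100 and strips of 20 columns (the i860 cache-residency figure is taken from the text):

```c
#define N     100   /* vector length from the article's example   */
#define STRIP 20    /* columns of A assumed to fit in data cache  */

/* Strip-mined C = C + A*B: the k loop is blocked so each strip of
   A's columns stays cache-resident while all of B streams past it.
   C accumulates across strips, so the caller must zero it first. */
void strip_mined_matmul(double A[N][N], double B[N][N], double C[N][N])
{
    for (int k0 = 0; k0 < N; k0 += STRIP)        /* one strip of A */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = k0; k < k0 + STRIP && k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
}
```

The arithmetic is identical to the unblocked multiply; only the traversal order changes, trading repeated reads of B for cache residency of A, which is exactly the trade-off the paragraph above weighs against distributing quarters of A into local memory.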


A Simple Farming Example


To demonstrate how parallel applications are written with Fortran without
resorting to message-passing, I'll examine the parallel computation of
fourth-order polynomials using 1000 values of x and 1000 sets of coefficients;
see Example 1(a). Our ultimate goal is to make a call from an ordinary main
program which carries out this task on four processors, using a worker routine
which looks like the original Fortran code.
The worker routine runs on the worker modules and the root itself, which calls
a copy of the worker procedure after it has started up the worker modules.
POLY_WORKER in Example 1(b) computes 1000 polynomials for a single set of
coefficients. It's called as part of a more elaborate process which divides up
the tasks among processors and makes calls to the four CPUs with different
sets of coefficients.
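Since the Fortran listings live in Example 1, here is a C rendering of the POLY_WORKER idea for readers following along: evaluate one fourth-order polynomial, by Horner's rule, over a slice of the X array. The root would make four such calls, one slice or coefficient set per processor; the names and the Horner formulation are mine, not the listing's.

```c
/* Evaluate c[0] + c[1]*x + c[2]*x^2 + c[3]*x^3 + c[4]*x^4 for each
   x[i] in the half-open slice [first, last), writing into result[].
   Each worker node gets a different slice (or coefficient set). */
void poly_worker(const double c[5], const double *x, double *result,
                 int first, int last)
{
    for (int i = first; i < last; i++) {
        double v = c[4];              /* Horner's rule, highest term first */
        v = v * x[i] + c[3];
        v = v * x[i] + c[2];
        v = v * x[i] + c[1];
        result[i] = v * x[i] + c[0];
    }
}
```

Because each slice touches disjoint elements of `result`, the four calls can proceed with no locking at all, which is what makes this problem such a clean first example of farming.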
Distributing four copies of POLY_WORKER and arranging things so that two
nearly identical pieces of code can be used on both a single processor and
parallel processors is easier said than done. Between the root node (which
controls the show) and the worker nodes (which get things done), there's a
trail of procedures which set the stage in Example 1(b). The Fortran routines
presented here are examples of early QuadPuter code. Since then, we've
developed a remote procedure call (RPC) model which uses assembly-coded
finite-state machines to hide the Fortran details I'm about to discuss.
However, this Fortran code demonstrates how to use signals and shared memory
for parallel processing and runs much faster on our hardware than
more-sophisticated message-passing techniques which bog down on the i860.
The POLY_WORKER code on each of the worker modules gets called by set-up
routines which manage communications with the root module and interface shared
memory for POLY_WORKER. Here, there's no movement of arrays from shared to
local memory, resulting in a simple WORKER_SET_UP routine. A worker routine's
code begins by waiting at a "start-up barrier" for a signal from the root.
When this arrives, it reads shared memory for the coefficients needed by
POLY_WORKER along with the start and end indexes and pointers to the input X
array and the output RESULT array.
The counterpart of WORKER_SET_UP on the root, which sets up the shared-memory
control block, is ROOT_WORKER_START. The array of shared-memory data
structures used to control the workers is the module control block (MCB),
located in shared memory, and defined using a Fortran STRUCTURE; see Example
1(c). It contains both the system-wide parameters needed by the worker module
and the application-specific information needed by the worker routines.
Integers hold pointers to reals (passing pointers in reals is a bad idea in
any language). On the root side, I obtain the addresses of arrays like X and
RESULT using a VMS Fortran extension, %LOC. On the worker side, I fake out the
next module by passing these addresses by value to POLY_WORKER (the Fortran
default is to pass by address). This is done using the %VAL VMS extension.
Every language used for developing real-world programs ultimately acquires
these little tricks that make it possible to get things done without switching
tongues.
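A rough C analogue of this pointer trick, built around a hypothetical struct modeled on Example 1(c): addresses go into integer-sized fields on the root side (the job %LOC does in VAX Fortran) and come back out as typed pointers on the worker side (the effect of passing them with %VAL):

```c
/* Hedged C analogue of stashing array addresses in the integer
 * fields of a shared control block. Field names mirror Example 1(c),
 * but the struct and helper functions are illustrative assumptions. */
#include <assert.h>
#include <stdint.h>

struct module_control_block {
    int32_t   istart, iend;
    uintptr_t x_ptr, res_ptr;   /* addresses carried as integers */
    float     a, b, c, d, e;
};

static float x[1000], result[1000];

/* Root side: record the addresses, as %LOC does in VAX Fortran. */
static void root_fill(struct module_control_block *mcb)
{
    mcb->x_ptr   = (uintptr_t)x;
    mcb->res_ptr = (uintptr_t)result;
}

/* Worker side: recover a typed pointer, the effect %VAL achieves
 * by passing the integer's value straight through as an address. */
static float *worker_x(const struct module_control_block *mcb)
{
    return (float *)mcb->x_ptr;
}
```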


A Parallel Fortran Run Time



The goal of implementing shared memory on the QuadPuter was to make it
possible to run fine-grained vector problems (those in which small vector
loops can be parallelized) on one to four i860s. To write code that executes
on one to
four worker modules, it's necessary for the system to know how many workers
there are and which module is currently the root. This information makes it
possible to write signaling primitives that deal with processors on a logical
basis, even though the ultimate code executes using lower-level physical calls
("root wait for worker three" translates into "module 4 wait for module 1,"
in a system where the root is module 4 and the third worker is module 1). Logical
references make it easier to write code. For example, when you're located on
the worker (which is how you feel when you have to write or debug worker code)
and you want to send a signal to the root, you shouldn't have to determine who
you are and who the root is, but just call a routine which gets the job done.
Given these facilities, it's possible to construct worker routines which run
on either the root or a worker and which carry out the appropriate tasks.
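The logical-to-physical translation might look like this in C; the mapping table is a made-up example that matches the module numbering described above (root on module 4, third worker on module 1):

```c
/* Hedged sketch of the logical-to-physical signal translation:
 * "root wait for worker three" becomes "module 4 wait for module 1"
 * when the root is module 4 and worker 3 lives on module 1.
 * The mapping table itself is an illustrative assumption. */
#include <assert.h>

#define ROOT (-1)   /* logical id for the root */

/* worker_phys[k-1] is the physical module number of logical worker k. */
static const int worker_phys[3] = { 2, 3, 1 };
static const int root_phys = 4;

static int logical_to_physical(int logical)
{
    return (logical == ROOT) ? root_phys : worker_phys[logical - 1];
}
```

With this in place, worker code can say "signal the root" without ever knowing which physical module it is running on.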
Logical signaling routines require that when a program gets started, it have
some knowledge of the system. The problem is that until all modules in a
system have started, the loader won't know what the system contains. There are
a couple of solutions to this: You could write a worm-like facility (worms are
virus-like programs which can move from processor to processor in a network)
or parse the command line at run time. The technique I used to load programs
and build a picture of the system was to have each module leave some
information behind in shared memory during its start-up phase. When you
combine this "left behind" information with the fact that each module is
passed its module number on its command line, and that the last module loaded
is by definition the root, you have all the ingredients needed by the root to
fill in the module-control blocks with the information which describes the
physical system.
The formal technique used to implement the start-up procedure starts with the
passing of a filename to the shared-memory parallel version of RUN860
(Microway's "alien file" server which runs on the DOS host and provides the
i860s with basic input and output services). RUN860 is passed a file
containing a command line for each i860 in the system. Each line specifies the
application being loaded on the module, module number, physical location of
the module's shared-memory resources that will get allocated for local storage
(other than local memory), and whether or not to initialize all of shared
memory prior to load. The last line in the command file also tells the system
how much shared memory to allocate to "shared COMMON"--the region of memory
that user programs will be able to address using identical virtual addresses.
This line always corresponds to the root, as the root is always the last
module to get loaded and started.
By running several applications in succession and only initializing common
memory when the first application is run, you can build parallel applications
which act as numeric filters. In this mode, results are left behind in the
shared COMMON region from prior runs, which get processed in succession. This
eliminates spooling intermediate results to a hard disk between the stages of
a pipelined computation. For many problems, the time it takes to load an Mbyte
or less of code is much less than saving temporary data to disk, then reading
it back in for the next phase.
Throughout this article, I've discussed a hypothetical declaration of a
special type of COMMON which gets handled by the Fortran compiler and linker
in a special manner. While this has been implemented, a better technique is to
build up this region by a program which executes before the Fortran MAIN
program starts. You can think of this routine as an overlay-style array
manager. If you're running large applications, COMMON's static allocation is
troublesome. A C-like malloc() approach wouldn't work either, as heaps are
designed to hold hundreds of small items rather than a few huge arrays. A
better approach is to
build a database describing the arrays being processed, then use this database
to carve up shared memory before the Fortran MAIN starts execution. This gives
you shared COMMON divided into a static region and one or more reusable
regions, each of which can be used by various stages of an application.
When the MAIN program starts in this scenario, the addresses of arrays are
passed to it as if MAIN were a Fortran procedure. The arrays in regions that
may get reused will have overlapping addresses. The presumption is that the
user will finish one of these phases before starting the next and that he will
store data which has to survive across phases in an area allocated for static
storage.
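A minimal C sketch of such a pre-MAIN carving pass, under the assumption that every reusable array overlays a single region placed after the statics (the descriptor layout and names are hypothetical):

```c
/* Hedged sketch of carving a shared-memory arena into a static
 * region plus one overlaid reusable region before MAIN starts,
 * driven by a small descriptor table. Illustrative, not vendor code. */
#include <assert.h>
#include <stddef.h>

struct array_desc {
    const char *name;
    size_t      bytes;
    int         is_static;   /* must survive across phases? */
};

/* Statics are stacked from the arena base; every reusable array
 * overlays the region that follows the statics, so their addresses
 * deliberately overlap. Fills offs[] with each array's offset and
 * returns the total arena size needed. */
static size_t carve(const struct array_desc *d, int n, size_t *offs)
{
    size_t static_top = 0, overlay_max = 0;
    for (int i = 0; i < n; i++)
        if (d[i].is_static) {
            offs[i] = static_top;
            static_top += d[i].bytes;
        }
    for (int i = 0; i < n; i++)
        if (!d[i].is_static) {
            offs[i] = static_top;            /* overlapping addresses */
            if (d[i].bytes > overlay_max)
                overlay_max = d[i].bytes;
        }
    return static_top + overlay_max;
}
```

The offsets become the array addresses handed to MAIN, exactly as if MAIN were an ordinary Fortran procedure receiving its arguments.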
The Fortran MAIN program isn't the first to execute because what we've done is
write a Fortran "parallel run time" in Fortran instead of in C, which would
expose it to the user. As it turns out, Fortran MAIN programs and C main()
procedures are never the first routines in an application to execute. They're
always called from a UNIX-like CRT0 procedure. Because all the parallel
Fortran modules start with a knowledge of the system, it's possible to build a
single executable which runs on both the root and worker modules. While this
approach wastes memory, it simplifies debugging and code development.
Figure 3 shows the structure of the program that emerges and a map of shared
COMMON. As you can see, there's a strong symmetry between the two sides. The
symmetry breaks down for those tasks that can only be carried out by the
host--the initialization of the MCBs and post-processing of data left behind
in COMMON after the completion of a parallel task.


Performance and Conclusions


Figure 4 plots the efficiency of a matrix multiply vs. vector length. Below
the vector length, I've listed the time for each i860 to do its share of the
computation and the total cycle time. The time to do a dot product grows as
the square of the vector length n, while the I/O grows linearly with n. As a
consequence, large matrix multiplies can be efficiently parallelized, and the
efficiency grows to over 90 percent by the time n hits 200. Actually, we've
found that for QuadPuter users running under DOS, typical scientific problems
frequently depend more on vector adds and multiplies than on dot products.
Since these primitives are less memory bound, they tend to
parallelize more easily. Typical of our users is a crystallographer whose
public-domain code improved by a factor of two after vectorization and a
factor of three after parallelization. This code runs in 19 seconds on the
QuadPuter and takes 30 seconds on his RS/6000-550.
We designed the QuadPuter to maximize throughput per EISA slot, the goal being
to build a system which used five cards and had a throughput of one gigaflop.
This was accomplished through the use of cool-running 25-MHz i860s. The time
to execute a parallel program on the QuadPuter has computation and
shared-memory communication components. Problems which vectorize easily
sometimes present a challenge to the communications component, and require the
use of memory-partitioning schemes like that in Figure 1. Algorithms that
don't vectorize well or are scalar bound are less communications bound and
parallelize efficiently using Fortran COMMON mapped to shared memory.


Vector vs. Scalar and Superscalar Devices


Moore's law states that the number of gates available on a square centimeter
of silicon doubles roughly once per year. This miracle of technology has
brought the cost of a megabyte down from $1 million to just $50 over the last
12 years. It has also made it possible to go from relatively simple CPUs to
devices that run at the speed of a Cray.
The Crays of the early '80s consumed 150 KW, yet had less numeric throughput
than a 40-MHz i860. What's made devices like the i860 so powerful is that it
became possible to pack pipelined numeric devices, data caches, code caches,
and 32-bit RISC processors onto a single die. The multipliers on devices like
the i860 are, in fact, called "Cray multipliers" (named for Seymour Cray, who
invented them while at Control Data). Cray multipliers are square silicon devices that
take their input operands in on two sides and emit the result on the other
two. The results literally flow from the input sides to the output sides. The
larger the multiplier, the faster the device that employs it. Floating-point
units use Cray multipliers to do the mantissa arithmetic and are sized so this
arithmetic flows in a single cycle for single and possibly double precision.
These and other flow-through devices have grown as big as they're going to
grow. The only way to speed up numeric devices today is to employ more of them
running in parallel. The i860 does this by stacking up the adder and
multiplier in various pipelined combinations. The other approach to this
problem is to use multiple, scalar numeric units. The only other easy way to
speed up devices is to increase the size of the data cache.
The main problem with superscalar architectures is writing code for them. Of
course, if Intel is on the ball, you probably won't have to. My guess is that
Intel will follow the lead of Inmos and initiate multiple scalar operations in
a single cycle using a long instruction-decode pipeline. In a superscalar
device, it won't be necessary to write pipelined code, although it might be
necessary to properly schedule code if you want to take full advantage of the
units.
Figure 5 compares the operation of the i860 instruction used in a dot product,
with the execution behavior of a hypothetical superscalar device. The i860
instruction is one of the 32 pipelined numeric instructions which specify how
the i860's pipelined multiplier and adder feed each other. In this case,
operand pairs are pumped into the multiplier once per cycle. They then pass
through the multiplier in three cycles and out into the adder, which also
takes three cycles to accumulate a result. In a dot product, the results
circulate in the adder stages until the entire product is complete, at which
point they get added up and stored in memory. The latency through the six
stages is six cycles, but the rate at which operands are accepted is one pair
per cycle. This translates into two numeric operations per cycle, or 50
megaflops at 25 MHz.
The four units of the superscalar device can be put to work independently. My guess is that an
on-chip scheduler would discover when the addresses of operands for
instructions in the decode queue weren't aliased and issue instructions as
fast as possible to the available numeric units. In this case, the processor
would start the first multiplication in unit #1 and start a second in unit #2,
one cycle later. Since these are scalar numeric units instead of pipelined
units, they should permit a faster two-cycle operation. This means that on the
third cycle, you could repeat the process. The actual instruction used for a
dot product might be a multiply/accumulate similar to that developed for the
Weitek 1167/4167 numeric units. This instruction automatically takes the
output of a multiplier and accumulates it in a specific register until the
algorithm is complete. In this case, we'd accumulate partial dot products in
two registers. Alternatively, some logic in the scheduler might deduce this
fact and use the adders as accumulators. Instead of writing a complicated,
pipelined, primitive operation that involved 16 to 32 floating-point
registers, the superscalar code would use just two lines in its inner loop:
reg1=reg1+x(i)*y(i)
reg2=reg2+x(i+1)*y(i+1)
It might not even be necessary to use registers to hold the dummy arguments
used to represent the contents of the adders. One of Intel's goals appears to
be replacing registers with memory. This is also Inmos's goal and should be
the goal of any CISC company that has to compete with RISC. By making memory
operations run at the same speed as register operations, it's possible to
build CISC chips competitive in speed with RISC. However, this speed probably
comes at a cost in die area which will translate into dollars. On the plus
side, the scalar devices mostly run in two cycles, and don't have to be
pipelined to yield decent performance--they can get excellent performance
executing small regions of scalar code which may not be vectorizable but do
have significant numeric content.
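In C, the same two-accumulator idea looks like this (a hedged sketch of the technique described above, not vendor code):

```c
/* Two-accumulator dot product: two independent multiply/accumulate
 * chains, matching the two-line superscalar inner loop in the text,
 * combined only at the end. */
#include <assert.h>

static double dot2(const double *x, const double *y, int n)
{
    double r1 = 0.0, r2 = 0.0;
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        r1 += x[i]     * y[i];       /* reg1 = reg1 + x(i)*y(i)     */
        r2 += x[i + 1] * y[i + 1];   /* reg2 = reg2 + x(i+1)*y(i+1) */
    }
    if (i < n)                       /* odd-length tail */
        r1 += x[i] * y[i];
    return r1 + r2;
}
```

Because the two partial sums have no data dependency on each other, a scheduler is free to issue them to separate numeric units on successive cycles.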
In the end, the four numeric units of this hypothetical device are just as
efficient at dot products as an i860, which uses two units. However, they're
faster at scalar-bound calculations, could be easier to write code for, and
could often execute small sections of scalar code at vector speeds. There are
two ways they might lose: if the less-complicated nature of RISC devices makes
it easier to port RISC technology to smaller geometries that allow you to
increase the clock speed more often; or if the smaller size of a RISC device
makes it possible to place several RISC devices on the same die. Another
problem could be CISC's extra logic, which burns more power. The four numeric
units on the right-hand side of Figure 5 take almost twice as many gates as
the two units on the left. This causes heating problems that prevent the
devices from keeping up with clock-speed increases. Only time will tell if the
Intel superscalar CISC gambit will pay off.
--S.F.
Example 1: (a) Parallel computation of fourth-order polynomials using 1000
values of x and 1000 sets of coefficients; (b) POLY_WORKER carries out the
task of computing 1000 polynomials for a single set of coefficients; (c) the
module control block structure.
(a) N = 1000
    DO 100 I = 1,N
    DO 100 J = 1,N
100 Result(j,i) = a(i)*x(j)**4 + b(i)*x(j)**3 +
  &               c(i)*x(j)**2 + d(i)*x(j) + e(i)

(b) Subroutine POLY_WORKER(x,result,N,a,b,c,d,e,iend,istart)
    real*4 x(N),result(N),a,b,c,d,e
    integer*4 N,iend,istart
    DO 100 J = istart,iend
100 Result(j) = a*x(j)**4 + b*x(j)**3 +
  &             c*x(j)**2 + d*x(j) + e
    END

(c) STRUCTURE/Module_control_block/
 integer*4 istart
 integer*4 iend
 integer*4 x_ptr
 integer*4 res_ptr
 real*4 a
 real*4 b
 real*4 c
 real*4 d
 real*4 e
 END STRUCTURE
 RECORD /Module_control_block/MCB(4)


Table 1: Interprocessor-communication bandwidths and latencies.

System              Latency        Interprocessor Rate
RS/6000 Ethernet    3.5 msec       50--100 Kbytes/sec
DELTA               130 msec       5--7 Mbytes/sec
RS/6000 Bit-3       240 msec       10 Mbytes/sec
IBM V-7             140 msec       3--5 Mbytes/sec
T800                <3 msec        1.5--6 Mbytes/sec
QuadPuter-860       10--20 msec    67 Mbytes/sec
Figure 1: Matrix-multiply data storage using distributed- and shared-memory
architectures.
Figure 2: A six-process/processor farm.
Figure 3: Flow of parallel Fortran shared memory with inset of shared-memory
organization.
Figure 4: Efficiency vs. vector length for single-precision matrix
multiplies.
Figure 5: Comparing dot products: (a) i860 vector unit executing m12apm; (b)
a pair of superscalar units executing multiply/accumulate at the same rate as
m12apm.




January, 1994
CPU Performance: Where Are We Headed?


Doubling down on your microprocessor bets




Hal W. Hardenbergh


Hal is a hardware engineer who sometimes programs. He is the former editor of
DTACK and can be contacted through the DDJ offices.


I am not here to tell you how wonderful the massively parallel personal
computer system you'll buy next year will be. Or how wonderful your next
personal symmetrical multiprocessor will be. The thing is, your operating
system and application programs--compilers, editors, spreadsheets, CAD,
schematic entry, communications--are all scalar programs, and scalar programs
don't work well on parallel processors. So let's get real. Let's discuss what
improvements we can expect in our personal uniprocessor computer systems and
what factors control those improvements.


Nothing Doubles Forever


We've benefitted from long-term hardware trends with well-established doubling
rates. The predominant trend has been CPU performance doubling every two years
since electronic computers were invented nearly a half century ago. The second
most important trend (at least for you OOP programmers who have an eye on the
new operating systems with voracious memory appetites) is memory capacity,
which doubles every 1.5 years.
Unfortunately, nothing doubles forever; both of those trends have ended. See
Figure 1: Those of you who are waiting for 64 Mbytes of DRAM (the minimum
requirement of NT 2.0) to become cheap will have to wait a while longer.
Intel's Gordon Moore thinks memory price/bit will start to go up as DRAM chip
density increases.
Mainframe CPU performance, however, dropped off that trend years ago. In this
article, I'll examine where CPU performance increases have been coming from
and why (and when) microprocessors will also drop off that trend.
I once had a small company, Digital Acoustics, that made 6502-based products.
When the 68000 came along, we started making attached processors, mostly for
the Apple II. Based on considerable experience with these two processors, I
(back then) estimated that an Apple II (or Commodore PET) had about 1/20th the
performance of a 12.5-MHz 68000. The 12.5-MHz 68000 had about the same integer
performance as the VAX 11/780, introduced in 1978. The 11/780 was nearly
contemporaneous with the Apple II and PET, which appeared late in 1977.
The Apple II and PET rated 0.05 VUP (Vax Unit of Performance). Sixteen years
later, a 66-MHz Pentium rates 64.5 VUPs, an increase of 1290 over typical PCs
in January 1978. (Contemporary 8080 machines had about the same performance as
the Apple II.) The 6502 was introduced in August 1975 and the Pentium in March
1993--17.58 years later. Using just these two CPUs as data points, we can
calculate the doubling time as: dbltime=17.58 years*LOG(2)/LOG(1290)=1.70
years.
Hmm. That's faster than the commonly accepted CPU doubling time of two years.
Suppose we compare 1978's 1-VUP VAX 11/780 (not a microprocessor) with 1993's
110.9-VUP DEC 200-MHz Alpha box: dbltime=15 years * LOG(2)/LOG(110.9)=2.21
years.
The fastest minicomputer in 1978 was 20 times faster than a PC; today's
fastest minicomputer (a server, not a workstation) isn't even twice as fast as
a Pentium-based PC. Despite incessant, contrary propaganda, PC performance has
been rapidly gaining on RISC-based workstations; see Figure 7.
Industry legend would have you believe that a particular fast microprocessor
owes its speed to the incredibly elegant and sophisticated arrangement of
registers, ALUs (arithmetic logic units), and lately, caches that its
rocket-scientist designers have built in. Actually, all of these techniques,
including multiple-issue per clock, were pioneered on mainframes.
The term "superscalar" applies to a processor that can issue more than one
instruction per clock. Purists insist that the term should be restricted to
those CPUs, such as Pentium and SuperSparc, that have duplicate execution
units (usually integer), and so can issue two integer instructions in the same
clock. The PowerPC, HP PA7100, and i860 can issue more than one instruction of
differing types in the same clock. Since most personal-computer application
programs have a mix of 85 percent integer, 15 percent branch, and 0 percent
floating-point instructions, the ability to issue multiple integer
instructions in a single clock is highly desirable.
The anonymous workers who've advanced the state of the semiconductor
lithographic art over the years are more important than the
microprocessor-designer rocket scientists.


First Principles


The job of a CPU's ALU is to examine a lot of data, manipulate some of it, and
change a little of it. To do this, the ALU must have access to the data, which
is located in DRAM. The ALU's ability to do its job is limited by the data
path connecting the integer core to the DRAM, which is the infamous von
Neumann bottleneck.
In the Apple II, the data path was 1 byte wide at a clock rate of 1 MHz, for a
data-bus bandwidth of 1 Mbyte/second. Since a 6502 system seemingly comprised
a simple CPU/ALU directly connected to the DRAM memory system, it was easy to
spot and measure the bottleneck. In fact, the X and Y registers helped
decouple the ALU from the DRAM.


Data-bus Bandwidth


The 66-MHz Pentium is 1290 times faster than the 6502-based Apple II. A faster
data bus makes most of this performance increase possible. A 64-bit (8-byte)
data path, plus a peak bus burst rate of 66 MHz, provides a 528-times-faster
data bus at the Pentium data I/O pins. That accounts for most, but not all, of
that 1290-times increase.
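The 528x figure follows directly from bus width times burst rate; a one-line C restatement, with the units from the text made explicit:

```c
/* Peak data-bus bandwidth in Mbytes/sec: width in bytes times burst
 * rate in MHz. Apple II: 1 byte at 1 MHz = 1 Mbyte/sec; Pentium:
 * 8 bytes at 66 MHz = 528 Mbytes/sec -- the 528x figure in the text. */
#include <assert.h>

static double bus_mbytes_per_sec(double width_bytes, double burst_mhz)
{
    return width_bytes * burst_mhz;
}
```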
To make higher performance possible, instruction traffic is offloaded from the
DRAM data bus by caches. The on-chip primary cache reduces the Pentium
data-bus traffic significantly, and the external secondary cache reduces the
DRAM data-bus traffic even further (the exact reduction is program- and
cache-size dependent). So, DRAM data-bus bandwidth doesn't limit the performance of the
66-MHz Pentium. In fact, some Pentium systems have been introduced which use
only a 32-bit data path from the secondary cache to the DRAM.
(Intel has announced upcoming 100-MHz DX2 and DX3 versions of the Pentium;
running the secondary cache two or three times slower than the on-chip clock
may impact the performance of those Pentium versions. The PC marketplace is
marvelously efficient. We'll quickly learn what memory configuration works
best at lowest cost. Witness the 486DX2/66's triumph over the 486DX/50.)
Figure 2 was presented by IBM's George Marr at CompCon '77. (I chose to delete
the 1960--70 portion of the graph because the pre-Intel 4004 era isn't
important here.) I've extended the curve, which represents the best commercial
practice rather than the state of the art. It's proven remarkably accurate for
mass-production microprocessors. (The "boutique" micro market, with production
of dozens per month, doesn't represent commercial practice.)
This trend to smaller minimum features is due to constant improvements by
anonymous workers in the semiconductor production-equipment industry. The
benefits are available to anyone with deep pockets. This trend, by itself, is
entirely responsible for increasing clock speeds. See Figure 3 (originally
published in 1989 by Gelsinger et al). If Marr's (1977) feature size drops by
half every 7.16 years, then the size of a given CMOS transistor will be four
times smaller, and a given charge/discharge current will change the output
from a 1 to a 0 four times faster, or twice as fast in 7.16/2=3.58 years.
Remarkably, Gelsinger's (1989) microprocessor clock doubles every 3.58 years!
(I added the grid and the computed doubling rates to Figure 3.)
Figure 4 shows the trend to larger die sizes. This is due mostly to the
increasingly pure silicon provided to semiconductor makers. Again, anonymous
workers provide this benefit to every semiconductor producer.
Smaller feature sizes and larger dies are together responsible for the
increase in the number of transistors per microprocessor die. See Figure 5,
originally published in 1986 by Intel's Myers et al. I've added recent
microprocessors, slightly adjusted the trend lines to accommodate the new
data, and provided computed doubling rates (both trend lines originally
doubled every two years).


Data-bus Width


Figure 6 shows the trend of data-bus width. The 90.5-bit datum shown for the
MIPS R4000 needs some explanation. That chip has a 64-bit data-bus path and a
separate 128-bit data bus to the external secondary cache. I decided the
effective bus width was the geometric mean of those two external buses. You
disagree? Okay, you tell me the width of the R4000's data bus.
Wide buses are so expensive that nobody uses them unless they're absolutely
necessary. The fact that buses have been doubling in width every five years is
a clear indication that ever-wider buses have been constantly needed.
More recent CPUs, such as the superscalar Pentium and SuperSparc, have very
complex internal bus structures. Even the 486 gets its instructions from a
128-bit circular queue, not the internal cache. It's hard to identify the
exact point which limits data transfer to and from the DRAM, thereby defining
the effective data-bus width.



Burning Bright


The easiest and cheapest way to run a microprocessor faster is to use more
current to charge and discharge its parasitic capacitances (which come
packaged with parasitic resistance and inductance). To get more current,
lower-resistivity silicon is
used, which simply means doping the silicon more heavily with impurities
during the fabrication process. PC microprocessors once ran cool, but then the
486DX2/66 consumed 6 watts, and the 66-MHz Pentium 13 watts, 16 watts peak.
The Pentium is Intel's last cool desktop engine. Look for the "Hexium" (aka
P6) to consume 25 watts or more. This is a result of Intel's continuing drive
to narrow the performance margin between PCs and high-performance
server/workstations.
This is yet another (rather obvious) performance trick pioneered on
mainframes. As I write this in late September, the most recent issue of IEEE
Micro and the two latest issues of Electronic Engineering Times have articles
on microprocessor thermal management, including liquid-cooling techniques for
Pentiums.
Microprocessor electrical-power consumption is still far less than that
consumed by your desktop CRT display. What implications does this have for
future high-performance portable computers? Good question.


Rocket-scientist Watch


The availability of millions of transistors has made it possible for
microprocessor designers to incorporate mainframe techniques developed a dozen
years earlier on a single microprocessor chip. For a given transistor count
(such as Pentium's 3.1 million) the job of chip designers is to choose an
optimum mix of features that will extract the maximum performance from the new
chip. Designers are constrained by the underlying instruction-set
architecture; even the designers of DEC's Alpha and the new PowerPC are on
their third or fourth iteration. (Ah, for the luxury of a blank piece of
paper!)
Again, the job of modern microprocessor designers is less one of innovation
and more one of optimization. Less risk-taking and more mistake-avoidance.
They're all utterly dependent on the anonymous folk who drive the improvements
in process technology, improvements that are equally available to all vendors.
This hasn't always been clearly understood.


No Joy


Which may explain why Sun Microsystems' Bill Joy, the noted UNIX software
expert, announced in 1985 that Sun's SPARC offerings were going to double in
performance every year for ten years; see Figure 7. The UNIX workstation
industry, implying that Joy's prediction applied to all RISC CPUs, joyfully
(pun intended) embraced that prediction. It never happened, of course. What's
really hilarious is that until recently the RISC fanatics actually pretended
that RISC performance was doubling every year, despite the complete absence of
supporting data!
It's ironic that current industry perception of Sun's SPARC offerings is that
the SPARC has fallen off the performance curve set by DEC's Alpha, HP's Snake,
IBM's RS/6000, and MIPS's R4400. The two CPUs most demeaned by the RISC camp,
Intel's x86 engines and Sun's SPARC series, are the two that lead their
respective arenas in unit sales. Is there a lesson here?
Figure 7 also shows the microprocessor performance trend published in 1986 by
Intel's Myers et al. Myers specifically noted the unavailability of a good
performance metric at that time, which inhibited an accurate projection. What
he needed was SPECint92.


Heavy Iron


Minicomputer CPUs were once built on large printed-circuit boards, using many
integrated circuits. With the passage of time, the number of ICs has steadily
dropped. With DEC's Alpha and HP's PA7100, the number is one--the CPU is a
single chip. These are minicomputers, judging by how they're designed,
manufactured, and marketed. They're also microprocessors by any reasonable definition. This
would be worrisome if CPU performance of minicomputers and
microprocessor-based PCs wasn't converging.
Soon, all minicomputers (or server/workstations) will use microprocessor CPUs.
Trends of minicomputer and PC CPU performance will be the same for the same
reasons, except that production quantities of minicomputers will continue to
be much lower.


The Limits to Performance


In order of importance, CPU performance is governed by feature size, data-bus
width, and silicon purity (die size). We're about to hit a fundamental limit
to data-bus width, one that was reached years ago in the mainframe world.
After this event, CPU performance will be governed by feature size and silicon
purity alone. I think we'll reach this limit around 1996, after which
performance will double every three years, as shown in Figure 7.
All scalar computer programs, the ones we use on our personal computers, have
branches every six instructions on average. Some instructions also have data
dependencies, meaning an instruction cannot be executed until the results of
previous instruction(s) are available. For this reason, there's a limit to the
data-bus width that can be used, even if an infinitely wide data bus were
available.
The next generation of Intel's and Sun Microsystems' desktop engines will have
four integer execution units (Pentium and SuperSparc both have two). This is
the upper limit of what is useful. Already, two-pipe CPUs issue only 1.5
instructions per clock even using an optimized compiler tuned to the innards
of the CPU. Why put more than four integer pipes on a chip when the excess
over four can only be used once in a blue moon? The first of the four-pipe
CPUs will probably ship by 1996. It'll be fascinating to learn--and a real
concern for all of us--how many instructions per clock those CPUs will be able
to issue, on average.
I've bounced this idea off a couple of rocket scientists, er, microprocessor
designers. (David Ditzel, Sun Microsystems' SPARC architecture maven, has a
business card listing his job description as "Rocket Scientist.") They
immediately began to speculate about how to handle the branch problem. I had
to remind both of them that data dependencies were a serious problem when you
want to issue a bunch of instructions all at the same time.
How'd you like to become famous? A celebrity, with a statue erected in your
honor in the virtual-reality park of your choice? All you have to do is figure
out how the scalar programs we use in our PCs can be efficiently used in a
superscalar CPU which issues eight or more instructions at a time.


The Next Millennium


We're due to hit the wall on feature size in about ten years. It's not that we
can't make chips with feature sizes under 0.25 micron, it's just that we
can't, yet, make 30 million such chips a year. The known techniques for making
such devices are horribly expensive for mass production. (Will handcraftsmen
ultimately triumph over the automated production lines?) Besides, a feature
that's 0.25 microns wide is only 700 silicon atoms wide. Who knows, efficient
operating systems and efficient application programs may make a comeback.
Soon, programmers won't be able to depend on the next-generation
microprocessor to run their absurdities. But then, I'm prejudiced.
The best modern CPU benchmark is SPECint92, measured on computer systems (not
CPUs) running compiled, public-domain application programs under UNIX. This
benchmark is normalized to unity for the VAX-11/780. An associated benchmark is
SPECfp92, which is normalized to unity for the expensive FPA (Floating Point
Accelerator) stunt box which could optionally be purchased with the 11/780.
This article concentrates on integer performance relative to the 11/780,
called the VUP (VAX Unit of Performance); where UNIX can be run, this equals
SPECint92.


Memory capacity doubles every 1.5 years.
CPU performance doubles every 2 years.
Feature size halves every 7 years.
Data-bus width doubles every 5 years.
DRAM chip speed doubles every 7 years.




Microprocessors Hit the Performance Wall (Again)





Nick Tredennick




Nick is chief scientist at Altera and can be contacted at 2610 Orchard
Parkway, San Jose, CA 95134 or as nickt@altera.com.


Whap! Microprocessors just hit the performance wall. Their designers have
tried everything. The latest n-way superscalar microprocessors have large
on-chip, nonblocking, critical-word-first, write-back caches, register
renaming, out-of-order execution, speculative prefetch, branch prediction,
branch folding, on-the-fly decoding, operand forwarding, buffered writes,
multiple execution units, and duck feathers. They have everything! We're
finally there, we've hit the wall. This time the fundamental limit is the
inherent parallelism in the instruction stream itself. You can't improve
performance by issuing 12 instructions per clock if the inherent parallelism
in the instruction stream is only four.
Whenever I look at a new microprocessor, I'm invariably impressed: The new
design always has better performance than I thought possible. I look at the
new design and think: "This time, microprocessors have hit the performance
wall. There's no way to improve this design significantly, because there's no
way to get past the whatever-it-is-this-time performance barrier." This has
happened every couple of years since the original MC68000 design--and this
year is no exception. Past barriers have included pins, lead inductance, bus
protocols, the critical path in the controller, and the critical path in the
execution unit. This time it is the ultimate barrier: the inherent
parallelism in the instruction stream. Or is it?
If parallelism in the instruction stream is the problem, let's quit using
instructions. This may not be as silly as it sounds. The proof: several
accelerator cards for the Macintosh already improve graphics performance by
intercepting QuickDraw commands and executing them in hardware.
Recent microprocessors include multiple-integer units, special-branch units,
and separate floating-point units on the same chip. In the coming generations,
we can add more execution units. It wouldn't do to build a special execution
unit for each anticipated application--there are just too many. Suppose we add
a large reconfigurable-logic unit (RLU), an array of logic functions with
programmable interconnections. Each connection to or from a logic function is
controlled by a memory bit which can be written by the CPU. Rather than
running an MPEG subroutine, the CPU simply configures the RLU as an MPEG
encoder (or decoder) by writing to the connection memory. Then the CPU routes
the data through the hardware MPEG encoder. Need JPEG? Reconfigure the RLU.
Need a special data filter? Reconfigure the RLU. Doing logic simulation? Build
the logic in the RLU and run the test vectors through it. When the CPU
intercepts a call to a subroutine for which there's a hardware algorithm, it
pages the configuration from a ROM or disk to the RLU's configuration memory
and passes the data or pointers to the data to the RLU.
There's a lot of work to do to make this happen, but the payoff could be
enormous. Logic simulation, for example, might be sped up by a thousand times
or so. We haven't hit the last wall. Something like the RLU is on the other
side of it. And when we get there, we'll look back and think it was obvious.
The high-end workstation/server market is serviced by highly skilled craftsmen
who produce computer systems by the dozens which are, in fact, faster than
Pentium systems. In this market, price is no object--don't expect much change
from $100,000 for your lovingly hand-tooled 200-MHz Alpha server.
The personal-computer market is serviced by numerous modern, high-speed
automated production lines that produce millions of computer systems annually
(nearly 30 million 486 systems in 1993). Price is very important. As I write
this, I can drive down the street and buy a complete 486DX2/66-based no-name
clone for $1288--two floppies, a 200-Mbyte hard drive, 4-Mbyte DRAM, VLB
motherboard, 128K secondary cache, 14-inch color Super-VGA monitor.
Workstation folk incessantly claim they can beat such a system on
price/performance. They're wrong.

A parallel-processing computer can be regarded as one big, crude, superscalar
CPU, with N integer pipes that execute instructions in the same clock. But
these pipes are in separate ICs, so hardware logic can't test for data
dependencies and branches. You really don't want to run scalar code on a
parallel processor.


 Figure 1: Historically, the predominant trends have been the doubling of CPU
performance every 2 years and of memory capacity every 1.5 years.
 Figure 2: The minimum feature size of microprocessors halves every 7.16
years. (Source: IBM's George Marr, CompCon '77.)

 Figure 3: Microprocessor clock rate doubles every 3.58 years. (Source:
Intel's Gelsinger et al.)
 Figure 4: Microprocessor and DRAM die size. Die size doubles every 5 years.
 Figure 5: Transistors per die double every 2 years. (Source: Intel's Myers et
al.)
 Figure 6: Data-bus width doubles every 5 years.
 Figure 7: CPU performance trends.




January, 1994
Optimizing Pentium Code


Writing fast code for a fast microprocessor




Mike Schmit


Mike is the president of an assembly- language tools publisher, Quantasm
Corp., and the author of ASMFLOW and Pentium optimization tools. He can be
contacted at 408-244-6826 or on CompuServe at 76347,3661.


When naming its next generation 80x86 microprocessor, Intel broke the
name-recognition mold by opting for "Pentium" instead of the predictable
"80586." What the processor vendor didn't break, however, was binary
compatibility with previous-generation 80x86 CPUs, making it possible for you
to continue running your old applications (except for those that perform
strange timing-related loops and the like).
Still, there are numerous differences between the Pentium and 80486, including
new instructions such as CPUID, RDMSR, RDTSC, and others; see Table 1. Unless
you're writing systems-level code, however, you probably won't need to use
many of these instructions.
Among the basic hardware differences are a 64-bit bus, 8K code and data
caches, fewer clock cycles for some instructions (especially floating point),
branch-prediction logic, dual-integer pipelines, higher clock speeds, and a
"superscalar pipelined architecture" that can execute two instructions per
cycle. (To be more precise, the Pentium can generate two results in a single
clock cycle.) "Pipelined architecture" refers to a CPU that executes each
portion of an instruction in different stages. When a stage is completed,
another instruction begins executing in the first stage while the previous
instruction moves to the second stage. Both the 80486 and Pentium have
five-stage pipelines; see Table 2. At some point in the pipeline, some
instructions may prevent others from advancing because of address or register
conflicts or the number of cycles actually required by the microcode to
execute an instruction.
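The staging just described can be sketched as a small simulation. This is an illustration of an ideal, stall-free five-stage pipeline, not a model of Intel's hardware; it reproduces the fill pattern of Table 2(a).

```python
# Sketch: ideal five-stage pipeline with no stalls. Each cycle, every
# in-flight instruction advances one stage; a new one enters at PF.
STAGES = ["PF", "D1", "D2", "EX", "WB"]

def pipeline_trace(n_instructions, n_cycles):
    """Return, per cycle, a dict of stage -> instruction number (or None)."""
    trace = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth   # instruction that entered PF `depth` cycles ago
            row[stage] = instr if 0 <= instr < n_instructions else None
        trace.append(row)
    return trace

trace = pipeline_trace(n_instructions=4, n_cycles=8)
# Instruction 0 completes write-back in the fifth cycle; once the pipe is
# full, one instruction completes every cycle.
print([row["WB"] for row in trace])   # [None, None, None, None, 0, 1, 2, 3]
```

A superscalar pipeline (Table 2(b)) is the same picture with two instructions, one per pipe, occupying each stage.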
Before discussing this dual-pipeline architecture in detail, I'll first
examine some of the main differences between the Pentium and the 80486.


Pentium vs. its Predecessors


The Pentium's 64-bit bus doesn't change how you program, only how you
optimize. Data structures and code should be aligned on 32-bit boundaries for
peak performance. The CPU still has an internal architecture of 32 bits; the
64-bit bus just gives the caches greater memory bandwidth.
The Pentium cache is 8K for code and 8K for data, versus a single 8K cache
shared by code and data on the 486. You'll rarely need code that differs
substantially from 80486 optimizations tuned for that single shared cache. The
486 cache is four-way set associative and write-through, with 128 sets of four
16-byte lines. The Pentium's two 8K caches are two-way set associative, and
the data cache supports write-back operation.
The Pentium offers improved floating-point performance; it also implements the
integer multiply and divide instructions in dedicated hardware rather than
microcode. You'll discover some code sequences that are faster on one 80x86
chip and (relatively) slower on another; see Table 3.
Branch prediction is also new to the 80x86 architecture. When a jump or call
instruction is encountered, the address of the instruction is used to access
the BTB (branch-target buffer) to predict the outcome of the instruction.
There isn't much more that you can do to take advantage of the
branch-prediction logic since it's automatic. Almost every jump or call in a
loop will execute in one cycle if the prediction logic is correct. If you have
an odd-ball procedure and are concerned about the effect of the
branch-prediction logic, don't worry because the Pentium keeps track of the
last 256 branches in the BTB and tries to predict the destination for each
call/jump. It does this by keeping a history of whether or not a jump was
taken. For example, if the prediction is correct, then a conditional jump
takes only one cycle.
The Pentium supports two 32-byte-long prefetch queues. The branch prediction
takes place in the D1 pipeline stage (second stage) and predicts whether a
branch is or isn't taken, as well as its destination. When it predicts a
branch, the other prefetch queue begins fetching instructions. If the
prediction turns out to be incorrect, then both queues are flushed, and
prefetching is restarted. For the 486 and previous CPUs, the best optimization
in regards to conditional branching was to not do the branch; the code is
fastest when a conditional jump isn't taken. The best optimization for the
Pentium is to just be consistent; that is, either always take the conditional
jump or always don't. Once a loop is determined to usually take the jump, then
it runs at the fastest rate, and failure to jump will cause a delay.
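The behavior described--predict a repeat of the branch's last observed direction, pay a flush penalty on a mispredict--can be modeled roughly. This is a deliberate simplification I'm adding: the real BTB's indexing and history mechanism aren't documented at this level, and the cycle costs here are illustrative.

```python
# Rough model of BTB-style prediction: remember each branch's last
# outcome, predict a repeat, and charge a flush penalty when wrong.
def predict_cost(outcomes, hit_cycles=1, miss_cycles=3):
    last_seen = {}                  # branch address -> last observed outcome
    total = 0
    for addr, taken in outcomes:
        predicted = last_seen.get(addr, False)   # cold entries predict not-taken
        total += hit_cycles if predicted == taken else miss_cycles
        last_seen[addr] = taken
    return total

# A loop branch taken 9 times, then falling through on exit: one cold
# miss, eight correct predictions, one final mispredict.
history = [(0x1000, True)] * 9 + [(0x1000, False)]
print(predict_cost(history))   # 3 + 8*1 + 3 = 14
```

The model shows why consistency pays: a branch that almost always goes the same way costs one cycle per iteration, with penalties only at the ends.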
One place you might exceed the 256-branch limit is during a hardware interrupt
while in a tight loop. Another is during a task switch in a multitasking
environment where each task switch will impose a restart penalty on each task
while it refills the branch-target buffer. There isn't anything you can do in
an application program to prevent these delays.
Example 1 shows an instance where you might unknowingly cause the
branch-target buffer to overflow. This code scans a string (of known length)
for spaces.
When a space is found, a function is called; otherwise, a counter is
incremented. When a space character is found, if the space function is small,
the next iteration of the loop1 code will run in four cycles; otherwise, it
runs in ten cycles. This is a dramatic change from earlier CPUs. However, the
space function would need to be at least several hundred instructions in
length to completely modify the BTB and the additional six cycles for the next
iteration of loop1 would be insignificant.
This leads to an interesting timing artifact. Suppose you're using a hardware
timing device (or some other method) to time the loop1 code, but not the space
function and you modify the space function, thereby removing several jumps.
Your timing will now show the loop1 code as being faster.


Dual-integer Pipelines


There are two integer pipelines, the U and V pipes. The U pipe is fully
capable of executing any (integer) instruction. The V pipe can only execute
simple instructions. When two simple instructions are next in the prefetch
queue and the conditions of several "rules" are met, then the CPU "pairs" the
instructions and begins execution of both at the same time. The key to
optimizing for the Pentium is knowing and following the instruction-pairing
rules as best as possible.
Simple instructions include MOVs, ALU operations (such as ADD, SUB, CMP, AND,
and OR), INC, DEC, PUSH, POP, LEA, NOP, shifts, CALL, JMP, and conditional
jumps. Table 4 lists the simple instructions. Instructions that you might
believe to be simple but aren't--flags register operations such as STC, CLC,
CMC, and so forth; the XCHG instructions; and type conversions such as CBW
(NOT and NEG)--aren't included in this list. The Pentium Programming Manual
and Pentium Data Book only provide four rules for pairing simple instructions.
After running a series of tests to verify each rule, I've developed an
extended set of the Pentium instruction-pairing rules; see Figure 1.
The prefix-byte rule (rule #6) is important when you use segment overrides
(remember that MASM and TASM automatically insert segment overrides based on
the ASSUME directive parameters) and when you're writing mixed 16- and 32-bit
code. The REP, REPE, and REPNE prefixes can't be used on any simple
instructions. The LOCK prefix can be used with some ALU (arithmetic logic
unit) instructions.
Because of the one-byte rule (rule #7), the only instructions that will pair
on the first execution are INC/DEC reg, PUSH/POP reg, and NOP. This should
never be a consideration in real applications because attempting to optimize
code that executes only once (per cache fill) is of little value. The only
time this might be useful is when you're performing timing tests of code
repeated inline. But it also means that instructions may pair differently on
the first execution as compared to subsequent executions; see Example 2(a).
The logic that determines read/write dependencies (rules #8, #9, and #10) is
based on each register as a single 32-bit entity. Therefore, a read/write to
one part of a register is the same as using the entire register. So writing to
AL, AH, or AX is the same as writing to EAX. Although Intel is vague in its
description of pairing instructions that change the flags, I determined that
all simple ALU/INC/DEC instructions can be paired with conditional jumps. This
leads to the interesting optimization that you should always use CMP or TEST
to set the flags (when possible) since they only write to the flags register.
Example 2(b) tests to see if AX is zero. The CMP instruction is three bytes
long, the others two. The OR instruction writes to AX, reducing pairing
opportunities; thus, the best choice is to use TEST. Example 2(c) shows
examples of read/write dependencies.
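The register-dependency checks (rules #8, #9, and #10) can be captured in a few lines. This sketch covers only those rules, with the partial-register aliasing folded in as described; the full rule set in Figure 1 has more conditions, and the register names and function are mine, not Intel's.

```python
# Sketch: can two instructions pair under the read/write rules, treating
# any part of a register (AL, AH, AX) as the whole register (EAX)?
ALIASES = {"al": "eax", "ah": "eax", "ax": "eax",
           "bl": "ebx", "bh": "ebx", "bx": "ebx"}

def full_reg(r):
    return ALIASES.get(r, r)

def may_pair(writes1, reads2, writes2):
    """Rules #8-#10 only: reject read-after-write and write-after-write."""
    w1 = {full_reg(r) for r in writes1}
    if w1 & {full_reg(r) for r in reads2}:   # read-after-write
        return False
    if w1 & {full_reg(r) for r in writes2}:  # write-after-write
        return False
    return True

# mov al,1 / add bh,ah: ADD reads AH, which aliases EAX -- no pairing.
print(may_pair(writes1=["al"], reads2=["bh", "ah"], writes2=["bh"]))  # False
# mov ax,bx / inc bx: write-after-read, which pairs fine.
print(may_pair(writes1=["ax"], reads2=["bx"], writes2=["bx"]))        # True
```

This mirrors the Example 2(c) cases: read-after-write and write-after-write block pairing; write-after-read and read-after-read do not.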
Four other delays can occur in the Pentium that don't affect instruction
pairing, but do add extra cycles and should be considered when reordering
instructions. First, if two paired instructions access the same data-cache
memory bank, there's a one-cycle delay in the second instruction. A data-cache
memory bank conflict occurs when bits 2--4 are the same in the two physical
addresses. This is difficult to program around, especially in low-level
subroutines that only receive pointers to data items. The best strategy is to
not pair instructions that might access the same data-cache memory bank.
Second, an AGI (address-generation interlock) will occur when any instruction
in cycle n writes to a register used in an effective address calculation in
any instruction in cycle n+1. This can occur either when an instruction in one
cycle changes a register that's the base or index portion of an effective
address calculation for the next cycle, or when an instruction in one cycle
changes SP (or ESP) and the next instruction relies on SP (or ESP). Example
3(a) shows AGI examples.
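A minimal check for the AGI condition looks like this. It's an illustration of the rule only--each "cycle" is reduced to (registers written, registers used in an address calculation)--and doesn't capture the paired-push/pop SP special case.

```python
# Sketch: flag an address-generation interlock between consecutive cycles.
# Each cycle is (registers_written, registers_used_in_address_calculation).
def agi_delays(cycles):
    delays = 0
    for prev, cur in zip(cycles, cycles[1:]):
        prev_writes, _ = prev
        _, cur_addr_regs = cur
        if set(prev_writes) & set(cur_addr_regs):
            delays += 1
    return delays

# inc bx; mov dx,[bx]: BX written one cycle, used for addressing the next.
print(agi_delays([(["bx"], []), (["dx"], ["bx"])]))   # 1
# Addressing through an untouched register incurs no interlock.
print(agi_delays([(["bx"], []), (["dx"], ["si"])]))   # 0
```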
The third delay is the prefix-byte delay. On the 486, prefix bytes don't add
cycles. On the Pentium, a prefix (such as a segment override) takes one extra
cycle to process. Also, beware of rule #6, because a prefixed instruction
can't be paired in the V pipe.
The fourth is a sequencing delay. Most simple instructions execute in one
cycle because they're hardwired (no microcode). Some forms of ALU instructions
execute in two or three cycles; for example, ADD mem, reg and ADD reg, mem.
Sequencing hardware allows them to function as simple instructions. The
three-cycle form (read-modify-write: ALU mem, reg) is pairable. However, when
two read-modify-write instructions are paired together, there's a two-cycle
sequencing delay; see Example 3(b). The instructions in Example 3(b) take
three cycles each. If they're paired, they take a total of five cycles because
of the two-cycle sequencing delay. If you have a spare register, you could
rewrite it like Example 4(a).
But this still takes five cycles (the third and fourth instructions should
pair). This leads you to believe that this is exactly how the CPU sequences
the operation using an internal scratch register. So you need to take into
account an extra register when you write your code. You could rewrite the
above code as in Example 4(b).
Although smaller, this still takes five cycles. The problem is that in writing
the code this way you are blocking yourself from finding other pairing
opportunities. Another way to write the code with two spare registers is shown
in Example 4(c). Again, this still takes five cycles, but it can be reordered
to Example 4(d), which only takes three cycles. If you need to save and
restore the two registers that you used, this will take back the two cycles
you saved. But if the pushes and pops are outside of a loop, then this portion
of your loop has gone from five to three cycles--a 40 percent improvement.
Finally, the 486 has an extra cycle delay when an effective address
calculation uses a base register and an index register. The Pentium does not
have this delay.


String-instruction Optimizations


Many of the optimizations for the Pentium are also optimizations for the 486
when compared to the 386 and previous CPUs. As I said earlier, the key factor
in Pentium optimization is to use as many of the simple instructions as
possible to create more opportunities for pairing to occur. (The downside, of
course, is that your code is larger.)
Consider the 8088 code in Example 5, which copies an ASCIIZ string (a string
of ASCII characters terminated with a null byte). The alternative to writing
the code with the string instructions is to use the combination of the
corresponding MOV and INC for the LODSB and STOSB; see Figure 2(b). The Figure
2(b) code doesn't exactly duplicate the Figure 2(a) function since STOSB uses
the ES segment by default. Adding a segment override to the second MOV in
Figure 2(b) would do this and change the cycle counts by 1 or 2 on some CPUs.
As Intel came out with each new CPU, I periodically reviewed code such as that
in Example 5 to see if it would benefit from being changed. Even on the 386,
the string instructions tended to be better or equal in performance. But with
the more RISC-like 486, the simple load and store instructions tended to
perform better. Although string operations on the 486 don't continue to
measure up in speed, they're still more compact (in this case 6 vs. 11 bytes).
With the Pentium, the speed increase is dramatic--from six cycles to three, a
(theoretical) 50 percent speed-up compared to only a 43 percent increase on
the 486 (fourteen to eight cycles). You'll notice in Figures 2 through 4 that
I've indicated the number of cycles for each Pentium instruction assuming that
no pairing occurs. In the column titled "w/pair," the cycles are given
assuming that pairing occurs according to the Intel pairing rules. An
instruction that executes in the V pipe will show the number of cycles beyond
those required for the U pipe instruction (usually 0).
So executing 80x86 string instructions is slower than just executing the
individual move and increment, since these instructions can be paired and
executed in a single cycle. In addition, the CMP/Jcc (or TEST/Jcc) combination
can be paired so this is also executed in a single cycle. But what about the
repeat-string instructions? You should use the repeat-string instruction (MOVS
and STOS) when the string is any significant length. For very short strings
(less than four or five bytes, words, or dwords), the REP overhead will
usually overwhelm any savings. Another consideration when using REPE or REPNE
with CMPS or SCAS is that executing the individual operations requires the use
of one or two additional registers that may need to be saved and restored; see
Figure 3.
Figure 4 shows an ASCIIZ string copy with a limitation on the maximum length
of the string. This illustrates that the LOOPNE (also LOOPE) is much slower
than the equivalent Jcc/DEC/Jcc. Again, the LODSB and STOSB are replaced with
MOV/INC pairings. This reduces the Pentium cycle count from fourteen to four
cycles, a 71 percent speed-up.



Making Assumptions


Remember that all the pairing guesses and cycle counting are just assumptions.
You must check your assumptions by timing actual code. The Pentium has 8K of
cache for code, so virtually every loop that will run out of the cache can be
highly optimized. This is because when one instruction pairs with another, it
is generally executed in 0 extra cycles. (Actually, when two instructions
pair, they execute in the number of cycles required by the instruction with
the largest cycle count.) If you're just reordering instructions in a tight
loop, there's no prefetch time, and all other factors should be the same.
So now you can pull out Michael Abrash's Zen Timer (see Zen of Assembly
Language: Volume I, Knowledge by Michael Abrash, Scott, Foresman, 1990) or use
an in-circuit emulator to time your code before and after to see if you saved
all the cycles you thought you would. Before you do, however, note that Intel
gives away a free hardware timer to everyone who buys a Pentium. Actually,
it's just a new instruction RDTSC (Read Time Stamp Counter). That's the good
news. The bad news is that it is not fully documented. The Intel Pentium
Processor User's Manual, Volume 3: Architecture and Programming Manual (Intel
#241430-001) doesn't list it anywhere except in "Appendix A" (page A--7) in
the opcode map. There's no other description of it in the entire 291-page
chapter that describes the instruction set.
On every machine cycle, there's an internal 64-bit counter that is
incremented. This means that with every Pentium there's a timer accurate to
one machine cycle with a range of up to 8800 years (at 66 MHz). The
instruction opcode for RDTSC is 0F 31, and it returns the 64-bit timer count
in EDX:EAX.
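As a quick sanity check on that range figure (my arithmetic, not Intel's): a 64-bit counter incremented at 66 MHz lasts on the order of nine millennia.

```python
# Sketch: how long a 64-bit cycle counter lasts at a given clock rate.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def counter_range_years(clock_hz, bits=64):
    return (2 ** bits) / clock_hz / SECONDS_PER_YEAR

# At 66 MHz this comes to roughly 8,850 years -- consistent with the
# article's "up to 8800 years" figure.
print(counter_range_years(66e6))
```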


Pentium Timing Results


To check the cycle counts I've discussed here, I used a slight modification of
the Ztimer so that I could run the same code on a 486, the Pentium (an Intel
60-MHz Pentium system, thanks to the Center for Software Development testing
lab in San Jose), and a 33-MHz ZEOS 486. All the code and data was aligned on
dword boundaries, and the strings were small enough to be entirely in the
cache. All code and data were preloaded into the cache. All the timings came
within one cycle of the predicted counts. The surprise was that some timings
were faster by a cycle. I suspect that happened because either the published
cycle counts are incorrect, or more instructions pair on the Pentium than are
published.
When I timed the code in Figure 2(a), it only took six cycles per loop instead
of my original prediction of seven. This led me to believe that the OR/Jcc
instruction combination is also pairable. The Intel documentation notes two
exceptions to the rule about writing to a register before immediately using it
in the same cycle. "The first is the commonly occurring sequence of compare
and branch which may be paired." The second is for pairing pushes and pops
since they depend on SP or ESP. It would appear that all ALU instructions
update the flags register and can be paired with conditional jump
instructions.
When I changed the code in Figure 2(b) to include an ES-segment override
prefix, the code took four cycles instead of three, because the
segment-override prefix takes an extra cycle to process. It's my view that
this works as follows: The prefix codes are independent, non-pairable,
single-cycle instructions. When a prefix is on an instruction that you assume
will be paired in the U pipe, the net effect is that you're penalized one
extra cycle. When a prefix is on an instruction you assume will be paired in
the V pipe, no pairing occurs (rule #6). The easy way to understand this is
that there are three instructions, the middle one being a nonpairable prefix
instruction.
The code in Figure 4(a) took one cycle less than predicted on the Pentium. The
only two reasonable explanations for this are that the published cycle counts
are wrong or that pairing rules allow some other pairings. The code in Figure
4(b) also took one cycle less than I originally predicted. My guess was that
the DEC/Jcc combination was pairing. To prove this, I added a CMP between the
DEC and Jcc, which then required an extra cycle.


Conclusion


Optimizing for the Pentium also tends to optimize for the 80486 because the
simple instructions pairable on the Pentium make up the RISC-like core set of
instructions in both the 486 and the Pentium.
In general, issuing a sequence of simple instructions is better than issuing a
complex instruction that takes the same number of cycles. These simple
instruction sequences expose more chances for pairing. This load/store style
of code generation does, however, require more registers and increases code
size. This impacts performance and may degrade performance on older chips.
However, if you are writing code for the Pentium, you can increase performance
of common operations by 50 percent or more by using the pairing rules, the
right instructions, and the built-in Pentium timer.


Acknowledgments


I'd like to thank Intel and the Center for Software Development for the use of
the Pentium machine.



Pentium Optimization: Something Old, Something New




Michael Abrash




Michael, DDJ's former "Graphics Programming" columnist, is the author of The
Zen of Assembly Language. He can be reached on MCI mail at 313-3923.


To this longtime assembly-language fan, the Pentium is an unexpected treat.
The trend toward diminishing returns on hand-tuning that reached its peak with
the 386 has completely reversed itself--the Pentium is stunningly rich in
optimization opportunities, as Mike Schmit's article details.
You've likely read that the Pentium, like RISC processors, is too complex to
program effectively in assembly language, that the key to Pentium performance
is using a Pentium-aware compiler. Yes and no. First, compilers are no more
able to match skilled humans on the Pentium than they were on the 8088. It is
painstaking work to hand-tune Pentium code; you must find the places where the
next instruction is guaranteed to go through the U pipe, then work with the
long list of pairing rules and special cases to optimize the code forward from
those points. On the other hand, the rules and special cases allow for
enormous benefits from redesigning code. The payback for Pentium optimization
is even greater than for 486 optimization--and I've seen the speed of an
entire word-counting utility doubled on the 486 simply by rearranging three
instructions. Surely, hand-tuning for the Pentium is complex enough that it
should be reserved for the most critical paths, but those are the only places
assembly makes sense, anyway.
Then, too, there are hazards to compiling for Pentium optimization. First,
Pentium-optimized code runs well on the 486, but can slow down considerably on
the 386, because Pentium optimization consists largely of breaking complex
instructions into series of simple, RISC-like instructions, and then
rescheduling them. On the 486 and Pentium, the cycle counts for complex
instructions are the same as the cumulative cycle counts for equivalent simple
instructions, but on the 386 complex instructions take fewer cycles.
Second, RISC code is notorious for being large, and RISC-like instructions
share that trait. Consider incrementing two memory locations on the Pentium in
32-bit code. Two INC [mem] instructions take five cycles and 12 bytes; two MOV
reg,[mem] / INC reg / MOV [mem],reg sequences interleaved together take only
three cycles but explode to 26 bytes. For key loops, this is fine; the
Pentium's 8K code cache is ample for almost any loop. However, Pentium
performance relies heavily on instruction fetches hitting the internal cache,
because cache misses are expensive and because code that hasn't already been
executed out of the internal cache can't be dual-piped. Unfortunately,
compilers apply Pentium optimization to all code, not just key loops, so
programs become larger and suffer more cache and possibly page misses, when
they are Pentium-optimized. My preference is to Pentium-optimize only
time-critical code--by hand or via compiler switches and #pragmas--letting the
compiler handle the rest of the code with 386/486 optimization. Ideally, the
tuned code would also have a 386-optimized form, with the correct version
selected at startup. This approach gives a compact code footprint and very
good performance on all the processors, significantly better on both counts
than merely compiling with Pentium optimization.
Table 1: New instructions for the Pentium.
CMPXCHG8B Compare and Exchange 8 Bytes
CPUID CPU Identification (EAX is input)
 for EAX = 0: returns vendor string in EBX, EDX, ECX
 for EAX = 1: returns EAX[0:3] stepping ID
 EAX[4:7] model number
 EAX[8:11] family number
 EAX[12:31] reserved
 EBX reserved (0)

 ECX reserved (0)
 EDX feature flags
RDMSR Read from Model Specific Register (ECX is register number)
WRMSR Write to Model Specific Register (ECX is register number)
RSM Resume from System Management Mode
RDTSC Read Time Stamp Counter
MOV Move to/from Control Registers (new registers)
Table 2: (a) 80486 five-stage pipeline operation; (b) Pentium dual five-stage
pipeline operation. PF=prefetch, D1=instruction decode, D2=address generation,
EX=execute and cache access, WB=write back, i1=instruction #1, i2=instruction
#2, and so on.
(a)
                Cycle
 Stage    1    2    3    4    5    6    7    8
 PF       i1   i2   i3   i4
 D1            i1   i2   i3   i4
 D2                 i1   i2   i3   i4
 EX                      i1   i2   i3   i4
 WB                           i1   i2   i3   i4
(b)
                Cycle
 Stage  Pipe    1    2    3    4    5    6    7    8
 PF     U       i1   i3   i5   i7
        V       i2   i4   i6   i8
 D1     U            i1   i3   i5   i7
        V            i2   i4   i6   i8
 D2     U                 i1   i3   i5   i7
        V                 i2   i4   i6   i8
 EX     U                      i1   i3   i5   i7
        V                      i2   i4   i6   i8
 WB     U                           i1   i3   i5   i7
        V                           i2   i4   i6   i8
Table 3: Pentium instructions showing reduced clock cycles over the 486.
 486 Pentium
mul 13--42 10--11
ret 5 2
popa 9 5
pusha 11 5
lods 5 2
rep movs 3 1
rep stos 4 1
repe/ne cmps 7 4
repe/ne scas 5 4
fadd 8--32 1--3
fmul 11--16 1--3
fcos fsin 257--354 16--126
fdiv 73--89 39
Table 4: Simple instructions that can execute in the V pipe.
MOV reg, reg
MOV reg, mem
MOV reg, imm
MOV mem, reg
MOV mem, imm
alu reg, reg
alu reg, mem
alu reg, imm
alu mem, reg
alu mem, imm
INC reg
INC mem
DEC reg
DEC mem

PUSH reg
POP reg
LEA reg, mem
JMP near
CALL near
Jcc* near
NOP
shift reg, 1
shift mem, 1
shift reg, imm
shift mem, imm
*Jump on condition code.
alu=Add, adc, and, or, xor, sub, sbb, cmp, test; shift=sal, sar, shl, shr,
rcl, rcr, rol, ror. (rcl and rcr not pairable with immediate.)
Example 1: Unknowingly overflowing the branch-target buffer. This code scans a
string (of known length) for spaces.
 ;                 with / without branch prediction
 ; (cycles are for the case of a non-space character)
 loop1:
 mov al, [si] ; 1 1
 inc si ; 0 0 (0 due to pairing)
 cmp al, ' ' ; 1 1
 jne foo ; 0 3
 call space ;
 jmp bar ;
 foo:
 inc dx ; 1 1
 bar:
 dec cx ; 0 0
 jnz loop1 ; 1 3
 ; total 4 10

Example 2: (a) Different pairing occurs on first and subsequent executions
from the cache; (b) testing to see if ax is zero; (c) read/write dependencies.
(a)

 first subsequent
mov ax, 1 1 1
inc bx 2 1
mov cx, 1 2 2
call xyz 3 2


(b)

cmp ax, 0
or ax, ax
test ax, ax


(c)

read-after-write (do not pair)
 mov al, 1
 add bh, ah

 mov ax, 1
 add bx, ax

write-after-write (do not pair)
 mov eax, 1
 add eax, ebx


 mov ax, 1
 mov ax, 2

write-after-read (pairs)
 mov ax, bx
 inc bx

read-after-read (pairs)
 mov eax, ebx
 add ecx, ebx

Example 3: (a) AGI examples; (b) two-cycle sequencing delay because the
instructions take three cycles each.
(a)

inc bx ; two INCs pair
inc ax
mov cx, [si] ; two MOVs pair
mov dx, [bx] ; AGI delay because BX changed in previous cycle

pop bx ; two POPs pair
pop ax
ret ; no AGI delay

pop bx ; two POPs pair
pop ax
add sp,0
ret ; AGI delay


(b)

add [bx], 2
add [si], 2


Example 4: (a) Rewriting Example 3(b) like this still takes five cycles; (b)
rewriting 4(a) by taking into account an extra register--the code is smaller,
but still takes five cycles; (c) rewriting the code using two spare registers;
(d) this reordered sequence only takes three cycles.
(a) mov ax, [bx]
add ax, 2
mov [bx], ax
mov ax, [si]
add ax, 2
mov [si], ax


(b) mov ax, [bx]
add ax, 2
mov [bx], ax
add [si], 2


(c) mov ax, [bx]
add ax, 2
mov [bx], ax
mov cx, [si]
add cx, 2
mov [si], cx



(d) mov ax, [bx]
mov cx, [si]
add ax, 2
add cx, 2
mov [bx], ax
mov [si], cx

Example 5: 8088 code that copies an ASCIIZ string.
loop1: lodsb
 stosb
 or al, al
 jne loop1

Figure 1: Enhanced list of Pentium instruction-pairing rules.

 1. Both instructions must be simple. (See Table 4.)
 2. Shift/rotate can only be in U pipe.
 3. ADC and SBB can only be in U pipe.
 4. JMP/CALL/Jcc can only be in V pipe (Jcc=jump on condition code).
 5. Neither instruction can contain both a displacement and an immediate
operand.
 6. Prefixed instructions can only be in the U pipe (except for 0F in Jcc).
 7. The U pipe instruction must be only one byte in length or it will not pair
until the second time it executes from the cache.
 8. There can be no read-after-write or write-after-write register
dependencies between the instructions except for special cases for the flags
register and the stack pointer (rules #9 and #10).
 9. The flags register exception allows a CMP or TEST instruction to be paired
with a Jcc even though CMP/TEST writes the flags and Jcc reads the flags.
10. The stack pointer exception allows two PUSHes or two POPs to be paired
even though they both read and write to the SP (or ESP) register.

Figure 2: ASCIIZ string copy with maximum string length. w/pair=cycles with
pairing, bytes=instruction length in bytes. You can further optimize the
Figure 2(b) code from three cycles per byte to two cycles per byte. (Cycles
are per repeated byte, word, or dword; assumes cache hits.)

(a)

 ; 8088 286 386 486 Pent. w/pair bytes
loop1:
lodsb ; 16 5 5 5 2 2 1
stosb ; 15 3 4 5 3 3 1
or al, al ; 3 2 2 1 1 1 2
jne loop1 ; 16 8 8 3 1 0 2
 ; ---------------------------- ---
 50 18 19 14 7 6 6


(b)

 ; 8088 286 386 486 Pent. w/pair bytes
loop2:
mov al, [si] ; 17 5 4 1 1 1 2
inc si ; 3 2 2 1 1 0 1
mov [di], al ; 18 4 2 1 1 1 2
inc di ; 3 2 2 1 1 0 1
cmp al, 0 ; 4 3 2 1 1 1 2
jne loop2 ; 16 9 8 3 1 0 2
 ; ----------------------------- ---
 61 25 20 8 6 3 10




Figure 3: CPU cycles for common repeat string instructions.
 486 Pentium
REP MOVS repeat move string 3 1

REP STOS repeat store string 4 1
REPE/NE CMPS repeat while equal (not equal) compare 7 4
REPE/NE SCAS repeat while equal (not equal) scan 5 4
Figure 4: ASCIIZ string copy with maximum string length.
(a)
 ; 8088 286 386 486 Pent. w/pair
loop3:
 lodsb ; 16 5 5 5 2 2
 stosb ; 15 3 4 5 3 3
 or al, al ; 3 2 2 1 1 1
 loopne loop3 ; 19 10 13 9 8 8
 ;-----------------------------
 53 20 24 20 14 14

(b)
 ; 8088 286 386 486 Pent. w/pair
loop4:
 mov al, [si] ; 17 12 4 1 1 1
 inc si ; 3 2 2 1 1 0
 mov [di], al ; 18 9 2 1 1 1
 inc di ; 3 2 2 1 1 0
 cmp al, 0 ; 4 3 2 1 1 1
 je exit4 ; 4 3 3 1 1 0
 dec cx ; 3 2 2 1 1 1
 jnz loop4 ; 16 9 8 3 1 0
exit4:
 ;------------------------------
 68 42 25 10 8 4




The Center for Software Development


The Center for Software Development is a nonprofit organization whose mission
is to help software companies bring higher-quality products to market more
rapidly and to promote the industry's growth. The Center was created
through a partnership between The Software Entrepreneurs' Forum, the City of
San Jose, and Novell. Many companies are supporting the Center, including
Novell, Adobe, AT&T, GO, IBM, Intel, Microsoft, Oracle, and others.
The Center has independent multivendor labs with PCs, Macs, UNIX workstations,
and other resources useful for software testing. Developers can utilize the
Center's industry- and technology-specific labs, such as the Network Software
Test Lab, PostScript Lab, Mobile Computing Lab, and the International Lab. The
Center's labs are set up in two types of space: a Walk-in Lab, where
developers rent preconfigured machines by the hour, and a set of secure Custom
Labs.
The Software Industry Resource Center includes a library of commonly needed
information, including market-research data, brochures, and reference
information from local service firms, as well as an extensive collection of
technical materials. The Center also provides software developers with access
to service providers such as legal firms, accounting firms, venture-capital
funds, marketing consultants, and the like. Platform vendors and service
providers use the Center's meeting facilities to put on seminars and clinics.
Software developers use the meeting facilities for presentations to potential
publishers, venture capitalists, and major end users.
The Center is an exciting, useful, one-of-a-kind resource. For more
information, you can contact the Center at 408-289-8378.
--M.S.



January, 1994
Skip Lists


They're easy to implement and they work




Bruce Schneier


Bruce is the author of Applied Cryptography (John Wiley & Sons) and can be
reached at 730 Fair Oaks Ave., Oak Park, IL 60302.


Binary trees look great on paper, but they can have problems in real-life
implementations. If nodes are inserted and deleted randomly, binary-tree
performance is unmatched. If, however, nodes are inserted in order, the
binary-tree structure degenerates into a linked list, and performance
plummets. There are tree-balancing algorithms in every post-Knuth book on
algorithms ever written, but they are difficult to implement and consume a lot
of execution time. Dean Clark's "Splay Trees" (DDJ, December 1992) and my
article, "Red-Black Trees" (DDJ, April 1992) describe binary trees with
almost-balanced properties that are relatively efficient. Skip lists are even
better.
Skip lists were invented by William Pugh at the University of Maryland as an
easy alternative to balanced trees. To understand skip lists, we first have to
look at their derivation from simple linked lists.
A linked list is a dynamic data structure in which every node points to the
node after it. It's easy to implement, but to find a particular node, you have
to look at every node before it in the list; see Figure 1(a). If, however,
every other node had an additional pointer that skipped to the node two ahead
of it, you would only have to look at every other node in the list before you
found the node you were looking for, and then maybe one more; see Figure 1(b).
If, in addition, every fourth node had yet another pointer to the node four
ahead of it, you would only have to look at about every fourth node plus
another two before you found the one you wanted; see Figure 1(c). And so on.
This would be great for searching--as efficient as a balanced binary tree--but
insertion and deletion would be a nightmare. With every insertion and
deletion, every node would have to have its pointers rearranged to conform to
the skip structure.


Skip Lists


Pugh's skip lists get around this problem by taking a probabilistic approach
to the skip pointers; see Figure 1(d). Instead of demanding that every second
node have two pointers, every fourth node have three pointers, and so on, the
number of pointers in a particular node is determined probabilistically. On
the average, half the nodes only have pointers to the node directly in front
of them. The other half also have pointers to the node two ahead of them. Half
of those nodes also have pointers to the node four ahead of them, half of
those nodes also have pointers to the node eight ahead of them, and so on.
This means that half the nodes have one pointer, a quarter have two pointers,
an eighth have three pointers, and so forth. The addition of randomness means
that insertions and deletions won't require modifications to the whole list.
Sure, the random dice might degenerate into a normal linked list with lousy
performance characteristics, but the odds of that happening are minuscule for
a list of reasonable size.
Skip-list algorithms are substantially simpler than balanced tree algorithms.
C code for searching, inserting, and deleting nodes in skip lists is given in
Listing One (page 88). Pascal code is available electronically; see
"Availability," page 3.
The function search() implements the searching algorithm. Searches start at
the top level of pointers, those that cover the most ground. Simply search
until the node's value is greater than the value you are looking for, and then
drop down to the next level of pointers. Eventually you will either find the
node or overshoot the value on the bottom pointer level. Figure 2(d) shows an
example search path through a skip list.
Skip-list insertion is implemented in the function insert(), which is nothing
more than searching and splicing. Depending on the number of pointers the new
node has, determined randomly by the function randomlevel(), different
pointers will have to be updated. In the example source code, the array
update[] contains the list of pointers that must be updated.
Deleting is also easy: Search for the correct node, pull it out of the list,
and reconnect a few pointers. The function delete() implements skip-list
deletion.


More with Skip Lists


There are still more tricks to improve skip-list performance. You can vary the
constant p in the probabilistic algorithm that determines how many pointers a
given node has. This affects both performance and storage requirements. If
p=1/2 (as described in the example), the average node has two pointers.
This is no better than a linked list. If p is reduced to 1/4, the algorithm
runs no slower, but the average node has only 1.33 pointers. A p of 1/8 runs
one-third slower than a p of 1/4, but only 1.14 pointers are required for each
node. If p=1/16, search time is double that of a p of 1/4, but only 1.07
pointers are required for each node, and so on. Since the average number of
pointers per node converges on 1 fairly rapidly, little space is saved by
making p any smaller than 1/4.


Conclusion


William Pugh said it best: "From a theoretical point of view, there is no need
for skip lists. Balanced trees can do everything skip lists can and have good
worst-case time bounds (unlike skip lists). However, the difficulty of
implementing balanced trees greatly restricts their use. Skip lists are
simple data structures that can be used in place of balanced trees for most
applications." They're easy to implement. They're as versatile as any balanced
tree algorithm. They're faster than most tree-balancing "trick" algorithms.
They require less memory storage for pointers than tree structures. And they
work.


Bibliography


Pugh, William. "Skip Lists: A Probabilistic Alternative to Balanced Trees."
Communications of the ACM (June 1990).
Pugh, William. "A Skip List Cookbook." Tech. Report CS-TR-2286.1. Department
of Computer Science, University of Maryland, July 1989.
Pugh, William. "Concurrent Maintenance of Skip Lists." Tech. Report,
CS-TR-2222.1. Department of Computer Science, University of Maryland, April
1989.
 Figure 1: (a) Finding a particular node; (b) looking at every other node in
the list; (c) looking at about every fourth node plus another two; (d) taking
a probabilistic approach to skip pointers.
 Figure 2: Search path through a skip list.
[LISTING ONE] (Text begins on page 50.)
/* This file contains source code to implement a dictionary using
skip lists and a test driver to test the routines. A couple of comments
about this implementation: The routine randomLevel has been hard-coded to
generate random levels using p=0.25. It can be easily changed. The insertion
routine has been implemented so as to use the dirty hack described in the CACM
paper: if a random level is generated that is more than the current maximum

level, the current maximum level plus one is used instead. Levels start at
zero and go up to MaxLevel (which is equal to MaxNumberOfLevels-1).
The compile flag allowDuplicates determines whether or not duplicates
are allowed. If defined, duplicates are allowed and act in a FIFO manner.
If not defined, an insertion of a value already in the file updates the
previously existing binding. BitsInRandom is defined to be the number of bits
returned by a call to random(). For most all machines with 32-bit integers,
this is 31 bits as currently set. The routines defined in this file are:
 init: defines NIL and initializes the random bit source
 newList: returns a new, empty list
 freeList(l): deallocates the list l (along with any elements in l)
 randomLevel: Returns a random level
 insert(l,key,value): inserts the binding (key,value) into l. If
 allowDuplicates is undefined, returns true if key was newly
 inserted into the list, false if key already existed
 delete(l,key): deletes any binding of key from the l. Returns
 false if key was not defined.
 search(l,key,&value): Searches for key in l and returns true if found.
 If found, the value associated with key is stored in the
 location pointed to by &value
*/

#define false 0
#define true 1
typedef char boolean;
#define BitsInRandom 31
#define allowDuplicates
#define MaxNumberOfLevels 16
#define MaxLevel (MaxNumberOfLevels-1)
#define newNodeOfLevel(l) (node)malloc(sizeof(struct nodeStructure)+(l)*sizeof(node *))

typedef int keyType;
typedef int valueType;

#ifdef allowDuplicates
boolean delete(), search();
void insert();
#else
boolean insert(), delete(), search();
#endif

typedef struct nodeStructure *node;
typedef struct nodeStructure{
 keyType key;
 valueType value;
 node forward[1]; /* variable sized array of forward pointers */
 };
typedef struct listStructure{
 int level; /* Maximum level of the list
 (1 more than the number of levels in the list) */
 struct nodeStructure * header; /* pointer to header */
 } * list;
node NIL;
int randomsLeft;
int randomBits;
init() {
 NIL = newNodeOfLevel(0);
 NIL->key = 0x7fffffff;

 randomBits = random();
 randomsLeft = BitsInRandom/2;
};
list newList() {
 list l;
 int i;
 l = (list)malloc(sizeof(struct listStructure));
 l->level = 0;
 l->header = newNodeOfLevel(MaxNumberOfLevels);
 for(i=0;i<MaxNumberOfLevels;i++) l->header->forward[i] = NIL;
 return(l);
 };
freeList(l)
 list l;
 {
 register node p,q;
 p = l->header;
 do {
 q = p->forward[0];
 free(p);
 p = q; }
 while (p!=NIL);
 free(l);
 };
int randomLevel()
 {register int level = 0;
 register int b;
 do {
 b = randomBits&3;
 if (!b) level++;
 randomBits>>=2;
 if (--randomsLeft == 0) {
 randomBits = random();
 randomsLeft = BitsInRandom/2;
 };
 } while (!b);
 return(level>MaxLevel ? MaxLevel : level);
 };
#ifdef allowDuplicates
void insert(l,key,value)
#else
boolean insert(l,key,value)
#endif

register list l;
register keyType key;
register valueType value;
 {
 register int k;
 node update[MaxNumberOfLevels];
 register node p,q;
 p = l->header;
 k = l->level;
 do {
 while (q = p->forward[k], q->key < key) p = q;
 update[k] = p;
 } while(--k>=0);
#ifndef allowDuplicates
 if (q->key == key) {

 q->value = value;
 return(false);
 };
#endif
 k = randomLevel();
 if (k>l->level) {
 k = ++l->level;
 update[k] = l->header;
 };
 q = newNodeOfLevel(k);
 q->key = key;
 q->value = value;
 do {
 p = update[k];
 q->forward[k] = p->forward[k];
 p->forward[k] = q;
 } while(--k>=0);
#ifndef allowDuplicates
 return(true);
#endif
 }
boolean delete(l,key)
register list l;
register keyType key;
 {
 register int k,m;
 node update[MaxNumberOfLevels];
 register node p,q;
 p = l->header;
 k = m = l->level;
 do {
 while (q = p->forward[k], q->key < key) p = q;
 update[k] = p;
 } while(--k>=0);
 if (q->key == key) {
 for(k=0; k<=m && (p=update[k])->forward[k] == q; k++)
 p->forward[k] = q->forward[k];
 free(q);
 while( l->header->forward[m] == NIL && m > 0 )
 m--;
 l->level = m;
 return(true);
 }
 else return(false);
 }
boolean search(l,key,valuePointer)
register list l;
register keyType key;
valueType * valuePointer;
 {
 register int k;
 register node p,q;
 p = l->header;
 k = l->level;
 do {
 while (q = p->forward[k], q->key < key) p = q;
 } while (--k>=0);
 if (q->key != key) return(false);
 *valuePointer = q->value;
 return(true);

 };
#define sampleSize 65536
main() {
 list l;
 register int i,k;
 keyType keys[sampleSize];
 valueType v;
 init();
 l= newList();
 for(k=0;k<sampleSize;k++) {
 keys[k]=random();
 insert(l,keys[k],keys[k]);
 };
 for(i=0;i<4;i++) {
 for(k=0;k<sampleSize;k++) {
 if (!search(l,keys[k],&v)) printf("error in search #%d,#%d\n",i,k);
 if (v != keys[k]) printf("search returned wrong value\n");
 };
 for(k=0;k<sampleSize;k++) {
 if (! delete(l,keys[k])) printf("error in delete\n");
 keys[k] = random();
 insert(l,keys[k],keys[k]);
 };
 };
 freeList(l);
 };
End Listing



January, 1994
Maximizing Performance of Real-time RISC Applications


Guidelines for writing RISC-based, real-time application software




Mitchell Bunnell


Mitch is a cofounder of Lynx Real-Time Systems and can be contacted at 16780
Lark Ave., Los Gatos, CA 95030.


Although RISC processors were designed for fast computational performance
rather than fast real-time performance, embedded real-time system designers
can still benefit from RISC technology. One advantage of RISC processors
compared to older-generation CPUs is very fast throughput--from 20 to 60 or
more MIPS; floating-point computation is faster, too. A typical CISC
processor, for example, runs at less than 1 megaflop; RISC processors execute
at between 3 and 14 megaflops. Even though the instruction set is reduced,
it's more than made up for by having instructions execute in a single-clock
cycle (and at incredibly fast clock rates). Also, some RISC processors now use
the extra on-chip real estate to provide special functions specifically for
embedded designers, such as graphics functions for X Window terminal
manufacturers. This can dramatically reduce hardware chip count.
To achieve high throughput, RISC processors support features such as
pipelining, memory caching (including delayed write), and large numbers of
registers or register windows. Pipelining allows the processor to perform
several operations simultaneously. This is how the one cycle per instruction
is achieved. Because of a RISC processor's high execution speed, a fast memory
cache is used, allowing most memory operations to be executed with 0 wait
states. Main memory access (with 5 or more wait states) is only done when
cache misses occur. Delayed writes further lessen the chance the processor
will have to wait for memory access.
Having a large number of registers allows local data to be kept in the
processor without resorting to memory access. The SPARC processor, for
instance, has 128 registers, partitioned into eight register windows. Register
windows allow subroutine and function calls to be performed without taking the
time to save register contents to memory. Register windows and the large
number of registers are what make RISC processors so fast when executing a
single program: A 66-MHz PowerPC gets 50 Specmarks and 80 floating-point
Specmarks; a 50-MHz SuperSparc gets 65 Specmarks and 83 floating-point
Specmarks; and a 99-MHz PA-RISC processor gets 80 Specmarks and 150
floating-point Specmarks. By comparison, a 50-MHz 80486 gets 30 Specmarks and
14 floating-point Specmarks. Consequently, CISC designers are now adopting
those RISC features, such as pipelines and write-back caching, that make RISC
so fast.


RISC and Real Time


Real-time applications involve multiple simultaneous tasks running
asynchronously. The CPU must switch execution from one task to another so that
tasks begin execution after the beginning of their period (or in response to
an external event) and finish before their deadlines. External events and
periodic timing normally interrupt the processor, so interrupt handling must
also be performed.
Analysis of a real-time system must account for the worst-case execution time
of each task. The overhead of context switching from one task to another, or
of having a task or interrupt preempt the current task, must also be accounted
for. Context-switch time and preemption time must be added to the task's
execution time for real-time analysis.
Many of the things that make RISC processors fast can hinder real-time
performance. Saving and restoring a large number of processor registers or
register windows can take more time than saving and restoring a few. This is
part of the context switch and interrupt-preemption time. The overhead of
software trap handling and pipeline saving/restoring also makes for a longer
preemption time; Figure 1(b) compares the number of instructions to deal with
interrupts between a CISC (68030) and a RISC (SPARC) processor. Code
implementations for CISC vs. RISC interrupt dispatch are available
electronically; see "Availability," page 3. Because there are typically more
context switches and interrupts in a real-time system, there are more
translation look-aside buffer (TLB) reload misses. Some RISC processors reload
the TLB in software. The additional overhead of loading the TLB in software
vs. hardware can take its toll. In the worst case, a short yet critical
section of code could take over ten times as long to execute as the best case
because of TLB reloads. RISC processors are at their best when executing a
single task to completion. Being constantly interrupted to run a different
task can slow them down.
Predictability is another concern. It may not matter that a task usually meets
its deadline easily. What matters is whether or not it always meets its
deadline. Whether or not there's a memory cache miss or hit can make a big
difference in RISC-instruction execution time; see Figure 1(a). The contents
of the instruction pipeline can also make a difference. Program-execution time
can vary in the turbulent environment of a real-time system where I/O
constantly interrupts the CPU. To have more-predictable execution times, some
designers turn off caching when running their application. Unfortunately, this
makes most RISC processors run five to ten times slower.
Despite these difficulties, RISC can be effectively used for real-time work,
but both the system-software designer and application designer must be
careful. Figure 2 lists the guidelines I follow when writing real-time
software for RISC processors. To illustrate how you can apply these
application-design guidelines, I'll develop a packet-switch example with eight
serial inputs and four serial outputs. Packets of up to 80 data bytes come in.
Each packet contains a byte identifying the requested output port; the packet
is sent out that port with a checksum appended; see Listing One (page 90).
The real-time specifications for the application example are:
The baud rate of the incoming lines is 20,000 baud (2000 bytes per second).
The baud rate of the outgoing lines is 40,000 baud.
Total incoming packet rate to any outgoing channel is known never to exceed
4000 bytes per second. An incoming packet should generate an outgoing packet
in 60 milliseconds or less for output channel 0. An incoming packet should
generate an outgoing packet in 250 milliseconds or less for an output channel
other than channel 0.
No packets should be lost.
There is hardware flow control on the output data lines other than channel 0.
Flow on these channels could be stopped for, at most, 15 milliseconds every
100 milliseconds.
Real-time requirements should always be kept in mind because almost every
aspect of application-software design affects real-time performance. For many
real-time applications, a data rate, a response, or both are specified.
Sometimes the response is implied. In this example, the data rate is 4000
bytes per second for each output channel; data can come in on any input channel
at up to 2000 bytes per second. The response is 60 milliseconds for channel 0.
Once a packet is received, the system has no more than 60 milliseconds to
completely transmit a corresponding output packet. For the other three output
channels, the corresponding output packet should appear in 250 milliseconds or
less from the time an input packet is received.
This is a "hard" real-time system because it isn't allowed to lose packets or
transmit them late. I've designed the example program to run on the RS/6000 RISC
processor to meet the real-time requirements. The application program was
designed to make it easy to show that it meets the real-time requirements.
This was done by making the application multitasking, running it on an
operating system with priority preemptive scheduling, and using rate-monotonic
(RM) analysis.


Designing the Tasks


Real-time applications should be designed with the proper number of tasks--no
fewer, no more. Too many tasks can waste time in extra scheduling and
context-switch overhead. In RISC-based systems, it's particularly important to
avoid needless context switches.
In the example, there are eight serial-input lines. While I could have used
eight separate tasks to read from these channels, I'll instead rely on
interrupt routines to buffer the input data. I reduced the number of context
switches by having a single input task read from all input channels. This task
could have been designed to wait for data on any input channel. Instead, it
uses an interval timer to poll each channel every ten milliseconds. This
further cuts down on context switches. The input task does as much work as
possible before giving up the CPU because its period is just short enough to
meet the response specification.
The input task puts each received packet in a single packet-list FIFO queue.
Next, it performs a checksum on each packet and puts each packet into an
output double buffer; see Figure 3. Each output buffer is emptied by a
channel-output task. When the buffer being sent is emptied, the output task
waits for the double buffer to be swapped. The input task swaps the double
buffer at the configured rate for that output channel--20 milliseconds for
output channel 0, 100 milliseconds for output channels 1 through 3.
Each output task writes the data from a double buffer to its associated output
device. Writing the data to the device means it requests the device driver to
send the data to the device. The device driver write-service routine buffers
the data, and an interrupt routine writes each byte to the output serial
device. I could have relied on this buffering by the driver and not had any
output tasks, but there's always the chance that the buffer may be too small.
By using output tasks, buffering is completely under application-software
control. The output tasks are threads and share the same address space as the
input task.
The eight serial input ports and the four serial output ports generate CPU
interrupts. The serial ports have a FIFO--they interrupt each time four
characters are received or sent. Since there's a hardware timeout, if less
than four but at least one character is received in a four-character time
period, an interrupt is generated. There's an interrupt from the clock every
ten milliseconds. The system also has an Ethernet card for networking. We can
log into the system remotely to start executing the sample program and
retrieve any error messages. The network is only used by non-real-time tasks
in this example.


Real-time Analysis


The packet-switch program was designed to make real-time analysis easy. First,
assume that all tasks (including interrupt handlers) will complete execution
for each of their cycles before the beginning of the next cycle. An input
packet will be processed and put in its output double buffer within 20
milliseconds of being received. Because the rate of the input task is ten
milliseconds, even a packet that just missed a poll will be handled during the
next period; see Figure 4. For channel 0, the corresponding output packet will
be written to the device within 20 milliseconds of being written to the double
buffer. The packet will be transmitted in another 20 milliseconds, making the
packet-response time 60 milliseconds in the worst case for channel 0. For the
other output channels, the output task runs every 100 milliseconds. The packet
response for these channels is 220 milliseconds--within the original
specification of 60-millisecond response for channel 0 and 250-millisecond
response for the other output channels.
The throughput requirements will be met as long as input and output buffers
are large enough and all tasks meet their timing deadlines. The example
program sets the buffer sizes in terms of throughput requirements and task
periods. To determine that the program meets the throughput and response
requirements, only an analysis showing whether all tasks meet their timing
deadlines needs to be performed.


Rate-monotonic Analysis



Rate-monotonic analysis is a technique for determining if a set of tasks will
always meet their timing deadlines. RM analysis is easy to apply on a set of
periodic tasks where the deadline for each task is the end of its period. Each
task should be scheduled according to priority preemptive scheduling; the
tasks with a shorter period have higher priority. Tasks with the same period
are still given different priorities. Priority preemptive scheduling means the
CPU runs the highest-priority task that isn't waiting for something. A
higher-priority task that finishes waiting will preempt a lower-priority task
to execute, then resume the lower-priority task when it waits again.
To apply the basic RM formula, the set of tasks being analyzed must have known
periods and known, bounded, worst-case execution times. Under RM theory, a set
of n periodic tasks will always meet their deadlines if the conditions in
Figure 5 are true. In Figure 5, Ci is the task-execution time and Ti is the
task period for task i. Bi represents the time during which task i can be
blocked by lower-priority tasks. This can be because they hold semaphores that
either task i, or some task with higher priority than task i, needs. For a
realizable system, Ci must include a context-switch time and a preemption
time. Bi must include worst-case preemption disable time and
interrupt-execution overhead (if the interrupt is generated by a device used
only by lower-priority tasks). These execution times must be worst-case times.
To apply RM analysis to the packet-switch example, we have a set of periodic
tasks with known periods, but we need worst-case execution times. To determine
the worst-case execution times, I timed each code segment starting with a
flushed cache, TLB, and pipeline using the RS/6000's high-resolution timer.
This gave an accurate picture of how long it takes to execute the code,
although it isn't the worst case because even a single cycle of execution of a
task can benefit from the memory cache, instruction, data pipeline, and
MMU-translation caching (the TLB). Since the application tasks are preemptive,
the cache, pipeline, and TLB will be disturbed while a task is executing its
cycle. This can cause longer execution times than those measured, so some
application designers actually turn off the cache when running their real-time
applications. We don't have to be that drastic.
The worst-case execution time (or at least a safe upper bound on it) can be
determined without turning off the cache. If each task restored the state of
the cache and pipeline after it executed, then the measured execution time
(one-shot timing starting with an empty cache and pipeline) would be the
correct worst-case execution time. No task actually does this; instead, the
preempted lower-priority task effectively restores the important part of the
state when it resumes execution. The time needed to restore the cache and
pipeline cannot be longer than the execution time of the higher-priority
task, so you can double the higher-priority task's measured execution time in
the analysis and pretend it had restored the CPU state itself. Therefore, to
determine the worst-case execution times, the one-shot measured times are
doubled. This may seem like an extreme solution, but it's usually better than
disabling the cache and risking an increase in execution times by a factor of
five or more.
Table 1 lists the worst-case execution and blocking times for the application
tasks (running on a RS/6000 under LynxOS). The measured worst-case times have
been doubled. Table 2 lists the calculations necessary to apply the RM formula
for some of the tasks. According to this analysis, all tasks will complete
executing all their cycles before the beginning of their next cycle; all
deadlines will be met. Therefore, since the data throughput and packet
response times would meet specification if all task deadlines were met, the
requirements for the real-time application example are met.


Conclusion


With the proper operating system and careful coding and analyzing of
application software, RISC processors can be used effectively in real-time
applications. To achieve this, tasks should be designed to execute as much
code as possible before relinquishing the CPU, and as few tasks as possible
should be created. Task-execution times should be measured, and a real-time
analysis should be performed.
For more information on rate-monotonic analysis, I recommend A Practitioner's
Handbook for Real-Time Analysis: A Guide to Rate Monotonic Analysis for
Real-Time Systems (Boston, MA: Kluwer Academic, 1993).
Figure 1: (a) Code-execution time for cache miss and cache hit. Processor:
RS/6000 at 50 MHz; cache: 0 wait states; main memory: 4 wait states. The
cache-miss case was measured by a one-shot run of code using the
nanosecond-resolution timer built into the RS/6000; (b) typical number of
instructions for an interrupt dispatcher--how many instructions it takes to
dispatch an interrupt to a C handler; (c) time to save CPU context for one
register window and for all windows. Processor: SPARC at 40 MHz; machine:
SparcStation 2. Time to store a single register window vs. saving all eight
register windows, measured in one-shot runs using the 1-microsecond-resolution
timer built onto the SparcStation motherboard.
(a) Cache miss 4-Kbyte checksum 3275 microseconds
 Cache hit 4-Kbyte checksum 195 microseconds
 Cache miss getpid() system call 52 microseconds
 Cache hit getpid() system call 34 microseconds
(b) CISC (68040) 9 instructions
 RISC (SPARC) 114 instructions
(c) Single-register window 4 microseconds
 Eight-register window 30 microseconds
Figure 2: (a) RISC real-time system software guidelines; (b) RISC real-time
application software guidelines.
(a)
o Lock MMU TLB entries for interrupt handlers and high-priority tasks.
o Optimize code for software trap handling.
o Save as few registers as possible on system calls, interrupts, and context
switches.
o Publish performance numbers that can be used in a real-time analysis.
(b)
o Understand your real-time requirements.
o Design your application to use the fewest number of tasks possible.
o Program each task to do as much work as possible before relinquishing the
CPU.
o Use threads instead of full processes for tasks when you can.
o Perform a real-time analysis on your application using worst-case execution
times.
o Choose an operating system that follows RISC real-time system-software
guidelines.
 Figure 3: Packet-switch example design.
 Figure 4: Packet-response timeline.
 Figure 5: Under rate-monotonic theory, a set of n periodic tasks will meet
their deadlines if, for each task i (1 <= i <= n):
C1/T1 + C2/T2 + ... + Ci/Ti + Bi/Ti <= i(2^(1/i) - 1).
Table 1: Worst-case execution and blocking times for application tasks.
Preemption disable time is for LynxOS on an RS/6000. The treatment of the
Ethernet interrupt as a blocking time (instead of a task) is allowed because
of the LynxOS interrupt system.
Serial-output interrupt handler
number of tasks: 4
event: UART FIFO half empty (4 character times)
period: 1 millisecond
priority: hardware (highest)
execution time: 0.040 milliseconds
blocking time: 0.065 milliseconds (OS worst-case interrupt disable)
Serial-input interrupt handler
number of tasks: 8
event: UART FIFO full (4 character times)
period: 2 milliseconds
priority: hardware (2nd highest)
execution time: 0.045 milliseconds
blocking time: 0.065 milliseconds (OS worst-case interrupt disable)
Periodic-timer (clock) interrupt handler
number of tasks: 1
event: hardware clock
period: 10 milliseconds
priority: hardware (3rd highest)
execution time: 0.035 milliseconds
blocking time: 0.065 milliseconds (OS worst-case interrupt disable)

Input/process task
number of tasks: 1
event: clock timer
period: 10 milliseconds
priority: very high
execution time: 1.6 milliseconds
blocking time: 0.15 milliseconds (OS worst-case preemption disable)
 + 0.035 milliseconds (Ethernet interrupt blocking)
Output task 0
number of tasks: 1
event: output buffer switch
period: 20 milliseconds
priority: high
execution time: 0.11 milliseconds
blocking time: 0.15 milliseconds (OS worst-case preemption disable)
 + 0.035 milliseconds (Ethernet interrupt blocking)
Output tasks 1--3
number of tasks: 3
event: output buffer switch
period: 100 milliseconds
priority: medium
execution time: 0.55 milliseconds
blocking time: 15 milliseconds (H.W. flow control)
 + 0.15 milliseconds (OS worst-case preemption disable)
 + 0.035 milliseconds (Ethernet interrupt blocking)
Table 2: Calculations for applying the rate-monotonic formula.
Output-interrupt 0
0.04 ms/1 ms + 0.065 ms/1 ms <= 1(2^(1/1) - 1)
4% + 6.5% <= 100%
10.5% <= 100%
Output-interrupt 0 will meet all its deadlines.
Output-interrupt 3
3*0.04 ms/1 ms + 0.04 ms/1 ms + 0.065 ms/1 ms <= 4(2^(1/4) - 1)
12% + 4% + 6.5% <= 75.7%
22.5% <= 75.7%
Output-interrupt 3 will meet all its deadlines.
Input-interrupt 7
4*0.04 ms/1 ms + 8*0.045 ms/2 ms + 0.065 ms/2 ms <= 12(2^(1/12) - 1)
16% + 18% + 3.25% <= 71.35%
37.25% <= 71.35%
Input-interrupt 7 will meet all its deadlines.
Input/process task
34% (I/O int.) + 0.035 ms/10 ms + 1.6 ms/10 ms + 0.185 ms/10 ms <= 13(2^(1/13) - 1)
34% + 18.2% <= 71.2%
52.2% <= 71.2%
Input/process task will meet all its deadlines.
Note: Because the rate of the clock interrupt and the input/process task is
the same, the clock interrupt handler and input/process task are treated as
one task on the right-hand side of the formula.
Output-task 0
34% (I/O int.) + 16.4% (input) + 0.11 ms/20 ms + 0.185 ms/20 ms <= 14(2^(1/14) - 1)
34% + 16.4% + 0.55% + 0.925% <= 71%
51.9% <= 71%
Output-task 0 will meet all its deadlines.
Output-task 3
50.4% + 0.55% + 3*0.55 ms/100 ms + 15.185 ms/100 ms <= 17(2^(1/17) - 1)
50.4% + 0.55% + 1.65% + 15.19% <= 70.7%
67.79% <= 70.7%
Output-task 3 will meet all its deadlines.
[LISTING ONE] (Text begins on page 54.)


/* conf.h -- tuning parameters for packet switch example */
#define OCHAN_MAX 8 /* number of serial output channels */
#define ICHAN_MAX 8 /* number of serial input channels */
#define IRATE_MIL_SEC 10 /* input/process task rate */
#define ORATE0_MIL_SEC 20 /* output task rate for output channel 0 */
#define ORATE_MIL_SEC 100 /* output rate for other output channels */
#define IBYTES_PER_SEC 2000 /* bytes per second on input channel */
#define MIN_PACKET_SIZE 4 /* minimum input packet size (bytes) */
#define PACK_BYTES_MAX 80 /* maximum number of data bytes/packet */
/* calculated buffer sizes based on throughput and task rates */
#define IBUFF_MAX ((IBYTES_PER_SEC*IRATE_MIL_SEC)/1000)
#define OBUFF_MAX ((IBUFF_MAX*ORATE_MIL_SEC)/IRATE_MIL_SEC)
#define PACKETS_MAX ((IBUFF_MAX*ICHAN_MAX)/MIN_PACKET_SIZE)
extern int num_ichans, num_ochans;
extern int packet_process_prio;
/* packet.h -- data structure definitions for packet switch example */
#define PACKET_START_BYTE 0xFE
/* internal structure for incoming/outgoing packets */
struct packet {
 struct packet *next;
 char checksum;
 char curr_bytes;
 char ochan;
 char nbytes;
 char data[PACK_BYTES_MAX];
};
/* structures for double buffers used for output */
struct buff {
 int nbytes;
 char data[OBUFF_MAX];
};
struct outbuff {
 int chan;
 int fd;
 int thread_id;
 int thread_priority;
 struct buff *ready_buff;
 struct buff *sending_buff;
 int sending_buff_sent;
 int new_send_sem;
 struct buff buffs[2];
};
extern struct outbuff ochans[];
extern struct packet *get_new_packet();
extern struct packet *deq_packet();
/* input.c -- code for input task of packet switch example */
#include <stdio.h>
#include <fcntl.h>
#include <time.h>
#include <signal.h>
#include "conf.h"
#include "packet.h"
/* input channel structures */

struct packet *partial_packet[ICHAN_MAX];
int input_state[ICHAN_MAX];
int infds[ICHAN_MAX];
unsigned char ibuff[IBUFF_MAX];
/* input states */

#define STATE1_NEED_STARTBYTE 0
#define STATE2_NEED_OCHAN 1
#define STATE3_NEED_BYTECOUNT 2
#define STATE4_NEED_DATA 3
/* input_packets -- read some data from incoming channel and store
 in packet structure. Enqueue these packets on the FIFO packet list. */
input_packets(chan)
int chan;
{
 int out;
 register int i;
 register struct packet *p;
 int state;
 out = read(infds[chan], ibuff, IBUFF_MAX);
 if (out <= 0)
 return out;
 p = partial_packet[chan];
 state = input_state[chan];
 for (i=0; i < out; i++) {
 switch (state) {
 case STATE1_NEED_STARTBYTE:
 if (ibuff[i] == PACKET_START_BYTE)
 state = STATE2_NEED_OCHAN;
 break;
 case STATE2_NEED_OCHAN:
 p = get_new_packet();
 p->curr_bytes = 0;
 p->ochan = ibuff[i];
 state = STATE3_NEED_BYTECOUNT;
 break;
 case STATE3_NEED_BYTECOUNT:
 p->nbytes = ibuff[i];
 if (p->nbytes > PACK_BYTES_MAX)
 p->nbytes = PACK_BYTES_MAX;
 state = STATE4_NEED_DATA;
 break;
 case STATE4_NEED_DATA:
 p->data[p->curr_bytes++] = ibuff[i];
 if (p->curr_bytes >= p->nbytes) {
 enq_packet(p);
 state = STATE1_NEED_STARTBYTE;

 }
 }
 }
 partial_packet[chan] = p;
 input_state[chan] = state;
 return 0;
}
void handler()
{
}
/* input_task -- input thread main loop. Read incoming data from all channels
and separate into packets. Perform checksum on these packets and put packets
in output double buffer for the output task to take care of. */
input_task()
{
 int chan;
 static struct itimerval v;

 int output_rc, output0_rc;
 /* set periodic timer for input rate */
 signal(SIGALRM, handler);
 sigblock(sigmask(SIGALRM));
 v.it_value.tv_sec = v.it_interval.tv_sec = 0;
 v.it_value.tv_usec = v.it_interval.tv_usec = IRATE_MIL_SEC*1000;
 setitimer(ITIMER_REAL, &v, (struct itimerval *)0);
 output_rc = output0_rc = 0;
 for (;;) {
 sigpause(0);
 for (chan=0; chan < num_ichans; chan++) {
 input_packets(chan);
 }
 process_packets();
 if (output0_rc++ >= (ORATE0_MIL_SEC/IRATE_MIL_SEC)) {
 output0_rc = 0;
 switch_obuffer(0);
 }
 if (output_rc++ >= (ORATE_MIL_SEC/IRATE_MIL_SEC)) {
 output_rc = 0;
 for (chan=1; chan < num_ochans; chan++) {
 switch_obuffer(chan);
 }
 }
 }
}
/* open_input_channels -- open serial input channels */

open_input_channels()
{
 int i;
 char name[80];
 for (i=0; i < num_ichans; i++) {
 sprintf(name, "/dev/ichan%-d", i);
 infds[i] = open(name, O_RDONLY|O_NONBLOCK);
 if (infds[i] < 0) {
 fprintf(stderr, "couldn't open %s\n", name);
 exit(1);
 }
 }
}
/* main.c -- start of packet switch example */
#include <stdio.h>
#include <time.h>
#include <resource.h>
#include "conf.h"
int num_ichans, num_ochans;
int packet_process_prio;
main(argc, argv)
int argc;
char **argv;
{
 if (argc != 3) {
 fprintf(stderr, "usage %s <num in channels> <num out channels>\n", argv[0]);
 exit(1);
 }
 num_ichans = atoi(argv[1]);
 num_ochans = atoi(argv[2]);
 if (num_ichans <= 0 || num_ichans > ICHAN_MAX) {
 fprintf(stderr, "bad number of input channels (%d). ", num_ichans);

 fprintf(stderr, "minimum is 1 maximum is %d\n", ICHAN_MAX);
 exit(1);
 }
 if (num_ochans <= 0 || num_ochans > OCHAN_MAX) {
 fprintf(stderr, "bad number of output channels (%d). ", num_ochans);
 fprintf(stderr, "minimum is 1 maximum is %d\n", OCHAN_MAX);
 exit(1);
 }
 packet_process_prio = getpriority(PRIO_PROCESS, 0);
 open_input_channels();
 open_output_channels();
 init_output_tasks();
 init_packets();
 input_task();
}
/* output.c -- code for output double buffers and output threads */
#include <stdio.h>
#include <fcntl.h>
#include <pthread.h>
#include "conf.h"
#include "packet.h"
struct outbuff ochans[OCHAN_MAX];
/* switch_obuffer -- switch output double buffer */
switch_obuffer(chan)
int chan;
{
 struct outbuff *buffp;
 struct buff *tmp;

 buffp = &ochans[chan];
 if (!buffp->sending_buff_sent) {
 fprintf(stderr, "out of real-time. channel %d\n", chan);
 return;
 }
 /* switch buffers. double buffering */
 buffp->sending_buff_sent = 0;
 tmp = buffp->ready_buff;
 buffp->ready_buff = buffp->sending_buff;
 buffp->sending_buff = tmp;
 sem_signal(buffp->new_send_sem);
 buffp->ready_buff->nbytes = 0;
}
/* put_packet_in_buff: copy a packet (output format) to the output buffer */
put_packet_in_buff(chan, pkt)
int chan;
struct packet *pkt;
{
 struct outbuff *buffp;
 struct buff *rbuff;
 int i;

 if (chan == -1) { /* broadcast */
 for (i=0; i < num_ochans; i++) {
 put_packet_in_buff(i, pkt);
 }
 return;
 }
 if (chan >= num_ochans) {
 return; /* bad channel number */

 }
 buffp = &ochans[chan];
 rbuff = buffp->ready_buff;

 if (3+pkt->nbytes + rbuff->nbytes > OBUFF_MAX) {
 fprintf(stderr, "output buffer full %d\n", chan);
 return; /* output buffer full */
 }
 rbuff->data[rbuff->nbytes++] = PACKET_START_BYTE;
 rbuff->data[rbuff->nbytes++] = pkt->nbytes;
 rbuff->data[rbuff->nbytes++] = pkt->checksum;
 bcopy(pkt->data, &rbuff->data[rbuff->nbytes], pkt->nbytes);
 rbuff->nbytes += pkt->nbytes;
 return;
}
/* output_task -- main routine for each output thread. */
output_task(buffp)
struct outbuff *buffp;
{
 int out;
 for (;;) {
 sem_wait(buffp->new_send_sem);
 out = write(buffp->fd,buffp->sending_buff->data,
 buffp->sending_buff->nbytes);
 buffp->sending_buff_sent = 1;
 }
}
/* init_dbuff -- initialize a double buffer (one for each output channel). */
init_dbuff(chan, p)
int chan;
struct outbuff *p;
{
 char name[80];
 p->chan = chan;
 p->ready_buff = &p->buffs[0];
 p->sending_buff = &p->buffs[1];
 p->sending_buff_sent = 1;
 sprintf(name, "ochan%-d", chan);
 sem_delete(name);
 p->new_send_sem = sem_get(name, 0);
 if (p->new_send_sem < 0) {
 fprintf(stderr, "could not get sem %s\n", name);
 exit(1);
 }
}
/* init_output_tasks: initialize output double buffers, create output
threads*/
init_output_tasks()
{
 int i;
 int res;
 pthread_attr_t thread_attr;
 pthread_t tid;

 for (i=0; i < num_ochans; i++) {
 init_dbuff(i, &ochans[i]);
 pthread_attr_create(&thread_attr);
 thread_attr.pthread_attr_prio = packet_process_prio - 1 - i;
 res = pthread_create(&tid, thread_attr, output_task, &ochans[i]);
 if (res < 0) {

 fprintf(stderr, "could not create output thread %d\n", i);
 exit(1);
 }
 }
}
/* open_output_channels -- open serial output channels */
open_output_channels()
{
 int i;
 int fd;
 static char name[80];
 for (i=0; i < num_ochans; i++) {
 sprintf(name, "/dev/ochan%-d", i);
 ochans[i].fd = fd = open(name, O_WRONLY|O_TRUNC);
 if (fd < 0) {
 fprintf(stderr, "couldn't open %s\n", name);
 exit(1);
 }
 }
}
/* packet.c -- routines and data structures to deal with packets */
#include "conf.h"
#include "packet.h"
/* packet lists */
struct packet *packet_head, *packet_tail;
struct packet *packet_free;
struct packet packet_table[PACKETS_MAX];
/* free_packet -- put packet on free list */
free_packet(p)
struct packet *p;

{
 p->next = packet_free;
 packet_free = p;
}
/* get_new_packet -- get a free packet */
struct packet *get_new_packet()
{
 struct packet *p;
 p = packet_free;
 packet_free = p->next;
 return p;
}
/* enq_packet -- put packet on FIFO packet list */
enq_packet(p)
register struct packet *p;
{
 p->next = 0;
 if (packet_head)
 packet_tail->next = p;
 else
 packet_head = p;
 packet_tail = p;
}
/* deq_packet -- get packet from FIFO packet list */
struct packet *deq_packet()
{
 register struct packet *p;
 if (p=packet_head)

 packet_head = p->next;
 return p;
}
/* init_packets -- initialize free packet list */
init_packets()
{
 register int i;
 for (i=0; i < PACKETS_MAX; i++)

 free_packet(&packet_table[i]);
}
/* process.c -- perform processing needed on packets */
#include "conf.h"
#include "packet.h"
/* do_checksum -- perform checksum on a packet */
do_checksum(pkt)
struct packet *pkt;
{
 register int sum;
 register char *p;
 register count;
 count = sum = pkt->nbytes;
 p = pkt->data;
 while (count--) {
 sum += *p++;
 }
 pkt->checksum = sum;
}
/* process_packets -- put each packet of FIFO packet queue in output buffer */
process_packets()
{
 register struct packet *p;
 do {
 p = deq_packet();
 if (p) {
 do_checksum(p);
 put_packet_in_buff(p->ochan, p);
 free_packet(p);
 }
 } while (p);
}
End Listing




















January, 1994
Polymorphic Protocols


Is there a solution to the Internet dilemma?




William F. Jolitz


Bill is the creator of 386BSD and is doing independent work on the Simplified
Internet Gigabit Networking Architecture (SIGNA), a software-only experimental
platform. He can be reached at wjolitz@cardio.ucsf.edu or on CompuServe at
76703,4266.
For a computer system to interoperate on a network, its protocol
implementation must function identically to all other protocol
implementations. Specifications of this behavior are documented so that
implementors can understand the fine points of operation, while avoiding
intimate study of hundreds of different implementations.
Although rigid protocol specifications ensure network interoperability, they
also limit the extent to which a protocol can be altered to suit new
circumstances. This constraint, in turn, determines the life cycle of
surrounding networks in which the protocol is used. If freedom of
implementation isn't constrained, implementations become too costly for
widespread use. If implementations are too narrow, they may become successful,
but eventually languish for lack of flexibility. Thus, rigidly static
standards are a two-edged sword, as evidenced by the current crisis of
exhausted network address space on the Internet.


Internet Addresses


Network addresses identify computers (or hosts). While for convenience they
may be given human-usable names, these names are for our eyes only. All
network addresses are converted internally to a unique machine-interchangeable
identifier, which is then used in negotiating a packet through a series of
computer networks (collectively known as the internetwork or catenet) from the
source host sending the packet to (cross your fingers) a destination host
computer that ultimately receives the packet.
On the Internet, such addresses are broken into two fields--the network
portion to which the host is connected, and the host portion used to uniquely
locate the host on its local network; see Figure 1. It's the responsibility of
the Internet's routing mechanism to actually steer packets from the source
network to a destination network, whereupon the remaining host portion of the
destination address is then used to direct the message to its destination. The
two fields together provide a unique identifier for the system worldwide
(analogous to that of a name, residence address, and zip code).
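A minimal sketch of that network/host split under the classful scheme then in use (8-, 16-, or 24-bit network portions for class A, B, and C addresses; the function and type names are mine):

```c
typedef unsigned long u32;  /* 32-bit IP address in host byte order */

/* Split a classful IP address into its network and host portions.
   Class A (leading bit 0):   8-bit network, 24-bit host.
   Class B (leading bits 10): 16-bit network, 16-bit host.
   Class C (leading bits 110) and above: 24-bit network, 8-bit host. */
void ip_split(u32 addr, u32 *net, u32 *host)
{
    if ((addr & 0x80000000UL) == 0) {                   /* class A */
        *net = addr >> 24;  *host = addr & 0x00FFFFFFUL;
    } else if ((addr & 0xC0000000UL) == 0x80000000UL) { /* class B */
        *net = addr >> 16;  *host = addr & 0x0000FFFFUL;
    } else {                                            /* class C+ */
        *net = addr >> 8;   *host = addr & 0x000000FFUL;
    }
}
```

For example, address 10.1.2.3 (class A) splits into network 10 and host 1.2.3, while 192.0.2.1 (class C) splits into network 192.0.2 and host 1.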
The TCP/IP protocol suite, originally designed to replace the venerable NCP
protocol used in the ARPAnet, was thought to have more than adequate room for
expansion. Since the original ARPAnet NCP used 16-bit network addresses, it
was thought that a 32-bit network address space (which provided 65,536 times
as many addresses as NCP) would suffice. Instead of a few hundred distinct
computer networks totalling millions of computers, however, hundreds of
thousands of distinct computer networks now exist, each with anywhere from
one to 50,000 computers.
Networks are now considered useful as management entities. This
"dynamic-network" concept directly impacted the fixed address-space allocation
mechanism thought satisfactory for TCP/IP. Where once you might have handed
out a single host address, you now could hand out a network-number assignment
corresponding to hundreds or thousands of host addresses. This increase in
networks is rapidly consuming the four billion values contained in the TCP/IP
address space. (According to the Internet Protocol Transition Workbook: The
Catenet Model for Internetworking, the designers of TCP/IP weren't
unacquainted with this possibility, since at the time, other designers working
at the network-link layer--that is, Ethernet--thought a 48-bit address space
preferable. To limit the overhead of address fields, it was decided to
restrict the maximum length of the host portion of the Internet address to 24
bits; this does not include the network field, which was 8 bits for 256
networks. The possibility of true, variable-length addressing was considered.
At one point, it appeared that addresses might be as long as 120 bits each for
source and destination. The overhead in the higher-level protocols for
maintaining tables capable of dealing with the maximum possible address sizes
was considered excessive.)
In the early days of the ARPAnet, a typical computer (such as a Honeywell
mini) had limited total memory, usually 32 Kbytes. Additionally, the
per-packet protocol overhead of large addresses was a significant
consideration, since the network backbone consisted of a 56-Kbit/second coax
link handling many terminal sessions. The actual data payload of a single byte
corresponding to a keystroke made the 40-byte TCP/IP header overhead appear
excessive. As such, the historical reasons for deciding on fixed addressing
were not unreasonable: simplicity (for use in situations such as
bootstrapping) and limited bandwidth (due to use of terminal sessions).


Extensible Interoperability


Today, the backbones of the Internet--1.5 Mbit/sec. (T1) and 45 Mbit/sec. (T3)
links--handle an average data payload of 8 Kbytes or more (up to 64 Kbytes),
with file and application-information transfers dominating. As such, the
addition of even 30 bytes more per packet is a trivial price to pay for
broader scope. Even the most economical of workstations and PCs sold today
would barely notice the additional cost in increased packet overhead; yet the
fixed design of Internet address space inherently limits the number of systems
which can use this communications architecture.
What is necessary is an interoperable, extensible protocol which could be
redefined as needed, without reimplementing all of the networking software yet
again. This protocol would have to be redeployed, replacing the existing
Internet "installed base." (This is unlikely due to sheer numbers.)
At first glance, "extensible interoperability" seems a heretical concept.
After all, interoperability achieved success by cleaving to a rigid design.
However, looking at analogous work often yields insight into an apparently
contradictory goal. During the '70s, one trend in language design was
"extensible" languages, which let you build more elaborate operators, data
types, and syntax, as a means of tailoring the expression of the program to
fit its specific needs. Here lie some of the roots of object-oriented
programming--specifically, polymorphism.
Designing a protocol can be considered akin to designing a programming
language. Instead of denoting the syntax, we denote state diagrams and the
details of information exchanged. Like the famous Backus-Naur Formulation
(BNF) of language grammars, protocols can also be diagramed. They can be
designed with degrees of extensibility incorporated into the design and
specification. If the usage of a protocol varies greatly, its actual
expression will shift to accommodate the new use dynamically. These
polymorphic protocols are just one mechanism by which the
extensible-interoperability paradigm can be made concise.
The power of a good paradigm stems from its essential applicability in any
area of a discipline. Extensible interoperability provides the framework in
which a wide range of familiar computer concepts in other areas, such as
polymorphism, are examined, analyzed, and transformed to resolve an apparently
intractable problem in a completely different area, in this case, networking.


Whatever Happened to OSI?


The Open Systems Interconnect (OSI) was chartered by ISO as the sole,
officially sanctioned internetworking protocol standard for adoption
worldwide. By now, OSI was expected to displace ad hoc standards like TCP/IP.
But the transition from TCP/IP to OSI seems as distant as ever, and TCP/IP is
therefore compelled to stretch well beyond its design limits.
OSI is an example of a standard spanning too large a gap in time with too
broad a mandate. (Like the DoD's Euclid and Ada languages, OSI isn't
"compact.") In addition, the lack of reverse compatibility made the transition
uninteresting and painful--no matter how good a standard looks on paper,
testing occurs in the field, and no amount of paperwork can compel people in
the field to transfer to other protocols if the transition process is too
painful.
In the case of address space, OSI has no limitation like that of TCP/IP.
Because of its origins in the telecommunications sphere, where international
phone numbers have variable lengths, OSI's two network protocols, CONS and
CLNP, allow for large network addresses. To the designers' credit, a
considerable amount of work has been undertaken throughout the OSI protocol
stack to avoid future limitations inherent to its design.
Despite this care, OSI, like TCP/IP, is still, in essence, a product of the
"fixed-estimate" philosophy of design, since some portions are always of fixed
size (for example, the header field is limited to 255 bytes). Thus, the
"flag-day" phenomenon is not obviated in future use. It may be unlikely, since
the limits have been made much larger, but they still exist. Remember, the
prior limits were also thought sufficient--until they were found inadequate.
Due to economic and logistical dictates, this flag-day approach to protocol
conversion is no longer as feasible as it was when the much-smaller ARPAnet
was operational. For example, in the case of the Internet (growing at a
current rate of 30 percent annually), if we launched a new network protocol
today, the millions of extant hosts would grow to tens-of-millions by the time
the new protocol was in place, resulting in a "Red Queen's Race." Another
problem is the ever-increasing speed of technological change, which makes it
impossible to anticipate even short-term needs.
This isn't new to the computer industry, where the cost of obsolescence is a
common thing. Unlike the past hardware-driven economics, future economics will
be driven more by the scope of certain standards. At this point, the
transition to extensible dynamic standards becomes mandatory, with the
Internet address-space limitation a good current case example.


Polymorphic Protocols: A Simple Example


The Internet address-space limitation dilemma illustrates the power of
polymorphic protocols. By relying on the paradigms, not on the creation of new
protocols, we can keep the solution light and tractable. By working at the
network layer of the OSI 7-layer model, we can design a replacement for the IP
datagram in Figure 1, using as a basis for the extensible portion the extant
IP Options portion. IP options are a variable-length extension of the IP
header, which contains a sequence of possible optional data fields with
byte-sized tags, padded out to 32-bit aligned boundaries. These add
information for special treatment of the IP datagram, of which they're a part.
By turning all fields of a datagram into Options fields (which may be of
variable length), we're no longer bound by representational limits or
data-field bounds. This incurs a substantial cost, since header size and
decoding and encoding time will increase. We can, however, mitigate these
costs by providing a suggested content, ordering, and placement for certain
classes of datagram denoted by a "class" option. Optionally, an implemented
class subrange can be returned in a control message to suggest a preference.
If the datagram class is unknown, a successive field interpreter of the
datagram options list can be used to ensure interoperability. Unknown options
can thus be immediately passed back via a control message using exactly the
same mechanism as in today's IP (ICMP).
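Such a successive field interpreter can be sketched as a walk over <type,length,value> triplets, handing unknown types to a callback for an ICMP-style reply instead of aborting the parse. The layout and the known/unknown split below are hypothetical, not a real wire format:

```c
/* One hypothetical option: a one-byte type, a one-byte length (of the
   value that follows), then 'length' value bytes. */
struct tlv { unsigned char type, length; const unsigned char *value; };

static int known_count, unknown_count;
static void count_known(const struct tlv *t)   { (void)t; known_count++; }
static void count_unknown(const struct tlv *t) { (void)t; unknown_count++; }

/* Walk an options buffer, dispatching known types and reporting unknown
   ones (here, hypothetically, types >= 0x80) for a control-message reply.
   Returns the number of unknown options, or -1 on a malformed buffer. */
int walk_options(const unsigned char *buf, int len,
                 void (*known)(const struct tlv *),
                 void (*unknown)(const struct tlv *))
{
    int i = 0, nunknown = 0;
    while (i + 2 <= len) {
        struct tlv t;
        t.type = buf[i];
        t.length = buf[i + 1];
        t.value = &buf[i + 2];
        if (i + 2 + t.length > len)
            return -1;               /* truncated option */
        if (t.type < 0x80)
            known(&t);
        else {
            unknown(&t);             /* queue for ICMP-style reply */
            nunknown++;
        }
        i += 2 + t.length;
    }
    return (i == len) ? nunknown : -1;
}
```

The essential property is that an implementation which doesn't recognize a type can still skip over it by its length and keep parsing, which is what keeps older implementations interoperable with newer datagram classes.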
One key point here is the attempt to uncouple the protocol-interchange
abstractions (packet formats) from the actual implementations and their
current use, and to leave open a means by which a future variation won't
"break" older implementations. While this example might appear similar to
mechanisms used in higher levels of the OSI (presentation) and XNS
(application) protocols, they aren't the same. In this example, we're still
dealing with network-layer information below the transport layer.
This technique could also be used in a limited "reverse-compatibility"
mechanism, by implementing a dual-stack arrangement at the network layer to
attempt communications via the extensible format first, otherwise falling back
to the traditional IP. Migration to a new network protocol could then occur as
a three-phase project, over the course of a decade. A dual-stack
implementation would be replaced first by a hybrid (transmission and
high-level migration with compatibility kept intact), then by a
second-generation consolidated implementation (no compromises for reverse
compatibility).



The Value of Polymorphic Protocols


At this stage, the advantages of dynamic, extensible polymorphic protocols are
difficult to quantify. However, that value might become clearer if we glance
at other limits in TCP/IP that have begun to loom on the horizon; for example,
the packet size itself (64 Kbytes is pretty small), the TCP sequence- and
window-field sizes (impacting gigabit networking over long distances), and
source and destination ports (each currently 16 bits, impacting the number of
applications accessible on a machine). How many flag days should we allocate
to these, or others not yet of immediate concern? Perhaps fixed-format
protocols are merely the computer equivalent of the buggy whip. In any case,
it's probably time to explore new ideas and approaches in this area--before
it's too late.


References


Barker, Paul and Colin J. Robbins. "You Cannot Promote OSI Applications over
OSI Networks." ConneXions (May, 1993).
Crocker, David. "The ROAD to a new IP." ConneXions (November, 1992).
Crocker, David. "Letters to the Editor: Responding to 'Internet 2000
(10/92).'" ConneXions (February, 1993).
"A Generic Ultimate Protocol (GUP)." ConneXions (November, 1992).
Stallings, William. Networking Standards: A Guide to OSI, ISDN, LAN, and MAN
Standards. Reading, MA: Addison-Wesley, 1993.
U.S. Department of Defense. Internet Protocol Transition Workbook: The Catenet
Model for Internetworking. Washington, D.C.: July 1978.
 Figure 1: IP datagram format.


Proposals on the Internet Address Dilemma


TCP/IP is an example of accidental success. Intended solely for the purposes
of tying together the second-generation, DoD-sponsored computer-research
community, its "strength through simplicity" approach has been remarkably
effective. Tiny TCP/IP implementations have been done in as little as six
pages of C code ("tiny tcp"), yet supercomputers routinely use more elaborate
implementations of the same protocol to shoot gigabits of application data to
each other, achieving massive computational feats in coordination. However,
the address-space limit inherent in the design of TCP/IP is a concern to both
standards working groups and the Internet community.
In an attempt to extend the standard, many proposals focus on amending the
TCP/IP address-space limit. These include TUBA (use the OSI protocols to
attach to the existing Internet applications layer), SIP (minimize the
existing IP header and use the space recovered from jettisoning features to
stick on more address bits), PIP (a completely novel internetworking protocol
that attempts functionality enhancements beyond current IP), EIP (add more
address bits via IP options), IPAE (encapsulate more address bits in the
existing IP datagram), and Nimrod (separate the routing mechanism from the
existing IP host number, which is left unchanged).
One proposal, GUP, is akin to our concept of polymorphic protocols. GUP
discusses the possibility of using triplets of the form <type,length,value>,
but takes degrees of freedom to the extreme (an enforced variable-length
approach) and is permanently hindered by the inability to increase performance
as needed: you have to iterate through the variable-length fields instead of
using ready-made, arranged groupings.
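The performance point can be made concrete with a sketch of what lookup in an enforced variable-length triplet stream costs. The one-byte type and length encoding below is an assumption for illustration, not the actual GUP wire format; note that every lookup must walk every preceding triplet:

```c
#include <stddef.h>

/* A <type,length,value> triplet view; layout is illustrative only. */
struct tlv_view {
    unsigned char type;
    size_t length;
    const unsigned char *value;
};

/* Linear scan for a given type: there is no fixed offset to jump to,
   so each lookup iterates over all earlier triplets. Returns 1 and
   fills *out on success, 0 on absence or a truncated stream. */
int tlv_find(const unsigned char *buf, size_t len,
             unsigned char want, struct tlv_view *out)
{
    size_t i = 0;
    while (i + 2 <= len) {
        unsigned char type = buf[i];
        size_t vlen = buf[i + 1];          /* one-byte length field */
        if (i + 2 + vlen > len) return 0;  /* truncated triplet */
        if (type == want) {
            out->type = type;
            out->length = vlen;
            out->value = buf + i + 2;
            return 1;
        }
        i += 2 + vlen;                     /* skip to the next triplet */
    }
    return 0;
}
```
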
There's little consensus as to how to solve this dilemma. Since the list of
answers runs the gamut from redistribution methods (SIP) to entirely new
protocols (PIP), don't expect any agreement soon--just ad hoc solutions.
--W.F.J.




































January, 1994
Examining OS/2 2.1 Threads


Understanding the scheduler is the key




John M. Kanalakis, Jr.


John is a programmer with CTB, Macmillan/McGraw Hill and can be contacted at
408-649-7478.


One reason for using modern 32-bit operating systems is that you can
simultaneously run multiple applications that require real-time processing.
In OS/2 2.1 this is made possible by a preemptive multitasking kernel capable
of scheduling tasks back-to-back to reduce or eliminate idle CPU time.
The OS/2 multitasking model is based upon execution of sections of code, as
opposed to entire programs. That is, OS/2 identifies the smallest unit of
execution to be a thread, rather than a process or program. Since processes
consist of one or more threads, many sections of a single process can execute
at once. A process also owns resources; every thread created by a process
shares the creating process's resources, including globally declared
variables, static variables, and physical devices. Each thread's
individuality centers on its own local stack. Since a thread is composed of
only registers and memory pointers, creation and context-switching are very
fast.


Thread States


Threads exist in one of three specific states: running, ready, or blocked.
OS/2 2.1 allows only one thread to be in a running state. Future
multiprocessor versions will allow a running-state thread to exist for each
active CPU. Threads waiting to be scheduled are placed in the ready
state, and the scheduler decides when they will move to the running state and
be allowed CPU run time. The scheduler ignores threads in the blocked state
until they move to the ready state. Threads are usually put into the blocked
state to keep them from interfering with another thread's access to a
resource.
Depending on each thread's priority class and level, the scheduler determines
which thread will context switch from the ready state to running state. OS/2
maintains four priority classes--time critical, fixed high, regular, and idle
time (in order of descending priority). Time-critical threads are dispatched
immediately and run until they're blocked or destroyed; they're often used for
real-time processing. Conversely, idle-time threads demand the least CPU
attention and
run only when all higher-priority threads have been served. Each thread within
a specific thread class has an associated priority level between 0 (lowest)
and 31 (highest). The thread in the highest class and with the greatest level
will always execute first. Threads within the same class and at the same
priority level are scheduled in a round-robin fashion.
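The class-dominates-level ordering can be captured in a few lines. The numeric encoding below is an illustration of the rule, not OS/2's internal representation:

```c
/* OS/2-style thread priority: four classes (descending: time critical,
   fixed high, regular, idle time) and 32 levels (0..31) within each. */
enum prio_class {
    PRIO_IDLE = 0,
    PRIO_REGULAR = 1,
    PRIO_FIXEDHIGH = 2,
    PRIO_TIMECRITICAL = 3
};

/* Fold class and level into one comparable key: class always
   dominates, and level breaks ties within a class. */
int effective_priority(enum prio_class cls, int level)
{
    return ((int)cls << 5) | (level & 31);
}
```

Under this rule, a regular-class thread at level 31 still ranks below a fixed-high thread at level 0.
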


Round-robin Scheduling


In round-robin scheduling, ready threads posses the same priority as the
running thread, and the scheduler shares CPU access equally among them in an
order much like that of a circular linked list; see Figure 1. Though most
scheduling mechanisms implement some form of round-robin scheduling, there's
much debate over time-slice length vs. actual task-switching overhead time.
For instance, if it takes the scheduler 5 milliseconds to switch from one
thread to another and each time-slice is set for 20 milliseconds, then 20
percent of the total CPU time is inefficiently spent on switching between
threads. However, if the time-slice is set to 1000 milliseconds, then only 0.5
percent of the total CPU time is spent on switching between threads. Setting
longer time-slices may sound efficient, but with a greater time-slice the
system responsiveness becomes slower and the illusion of concurrency
dissipates.
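The arithmetic behind this trade-off is just the ratio of switch time to total time per slice:

```c
#include <math.h>

/* Fraction of CPU time lost to context switching, per the article's
   figures: a 5-ms switch with a 20-ms slice wastes 5/(5+20) = 20%;
   with a 1000-ms slice, 5/(5+1000) is roughly 0.5%. */
double switch_overhead_pct(double switch_ms, double slice_ms)
{
    return 100.0 * switch_ms / (switch_ms + slice_ms);
}
```
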
OS/2 addresses this trade-off in two ways. First, you can set the time-slice
length in CONFIG.SYS, letting you decide which value best meets your needs.
Second, OS/2 improves round-robin efficiency by implementing a bias, which
allows the smart scheduler to dynamically reset the priorities of different
threads.


Implementing a Bias


Each thread's priority may be dynamically set. You're free to write code which
initially sets a thread's priority class and level, then dynamically reset
this value through API calls. More interestingly, OS/2's smart scheduler can
dynamically set these values under any of three conditions. The first
condition, a foreground bias, distinguishes the regular priority class from
the fixed-high priority class. When several threads are in regular execution,
the foreground bias boosts the priority of the foreground thread above other
threads also within the regular priority class. This provides smoother user
interaction, improved system responsiveness, better keyboard and mouse input,
and faster processing of posted messages.
The second condition in which the scheduler changes a thread priority is an
I/O bias. A thread receives an I/O bias after an I/O operation is performed.
An I/O bias increases the thread's level to its maximum value but does not
shift the thread into a higher class. This helps a thread to quickly release
ownership of a resource and to perform final data processing for another
thread.
Finally, there's the time-out bias. A thread of very low priority may rarely
obtain any CPU time. To compensate, the scheduler offers the time-out bias,
whereby threads within the regular class run after waiting a certain period of
time. This wait length is set in OS/2's CONFIG.SYS file and defaults to three
seconds. To be fair to legitimately higher-priority threads, threads offered
the time-out bias don't run for a standard time-slice; they are given shorter
ones. CONFIG.SYS determines the lengths of both the short and standard
time-slices, letting the OS/2 smart scheduler decide how to schedule threads
efficiently.


Context Switching


The OS/2 scheduler handles context switching in a straightforward manner. An
application running on top of MS-DOS is, at any instant, composed of register
values that hold: machine instructions and work variables in the AX, BX, CX,
and DX registers; pointer values in the SI, DI, BP, and SP registers; segment
pointers in the CS, DS, ES, and SS registers; and CPU flags. The application
runs by incrementing the instruction pointer, IP, which points to machine code
stored in the code segment. To perform a context switch, the scheduler simply
saves the register values to a structure in memory; see Figure 2. This
structure, the thread control block (TCB), contains an entry for each thread
that exists, regardless of its state. When booting, the 8253 programmable
interval timer is set to trigger the clock interrupt (INT 8) roughly a
thousand times
per second. After a specific clock interrupt, the current thread state is
stored in memory and the scheduler determines which thread to run next. The
context switch is then performed by duplicating the register information from
the next TCB entry into the CPU registers. Finally, the scheduler resumes with
a new thread's code execution.
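A toy version of the mechanism makes this concrete. The structure below holds the register set the article lists; the field names follow the 16-bit x86 registers, but the layout is a sketch, not OS/2's actual thread control block:

```c
/* A miniature thread control block: one saved register set. */
struct tcb {
    unsigned short ax, bx, cx, dx;   /* work registers */
    unsigned short si, di, bp, sp;   /* pointer registers */
    unsigned short cs, ds, es, ss;   /* segment registers */
    unsigned short ip, flags;        /* instruction pointer, CPU flags */
};

/* A context switch in miniature: save the outgoing thread's registers
   into its TCB entry, then load the incoming thread's TCB entry into
   the "CPU" register set. */
void context_switch(struct tcb *cpu, struct tcb *save_to,
                    const struct tcb *load_from)
{
    *save_to = *cpu;        /* store the current thread's state in memory */
    *cpu = *load_from;      /* restore the next thread's registers */
}
```
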
With multiple threads accessing shared resources, one thread's actions may
affect another's processing. If two threads are free to simultaneously access
one global variable, it can become confusing to manage that variable's data.
You might want to use a Boolean flag variable set to True if any thread is
accessing a specific data structure. Normally, a thread could test the value
of the Boolean variable, find it False (available for access), set it to True
to publicly indicate the thread's access, and begin using the guarded data
structure. The only problem is that the scheduler might perform a context
switch just after the Boolean variable is tested but before it is set; see
Figure 3. In this case, two threads own access to the guarded data structure.
A semaphore is a special variable managed by the kernel that's tested and
reset within a single CPU action. As a result, the scheduler can't perform a
context switch until the tested variable is actually reset. Since this
variable is established and maintained by the operating system, it's
guaranteed to protect guarded data from multiple threads.
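Portable C11 can approximate the kernel's indivisible test-and-set with `atomic_flag`; this is a sketch of the principle, not OS/2's semaphore API:

```c
#include <stdatomic.h>

/* The guard: an atomic flag tested and set in one indivisible step,
   so no context switch can slip between the test and the set. */
static atomic_flag guard = ATOMIC_FLAG_INIT;

/* Returns 1 if this caller won access to the guarded data,
   0 if the flag was already held. */
int try_acquire(void)
{
    return !atomic_flag_test_and_set(&guard);
}

void release(void)
{
    atomic_flag_clear(&guard);
}
```
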


Example Multithreaded Programs


Listing One (page 96) shows the fundamental framework of a multithreaded
program; it does little more than inform the user when each thread executes.
The
preprocessor definition, INCL_DOSPROCESS, instructs the compiler to include
information relating to processes and threads. Each thread function marks the
existence of a new thread. The new thread exists as long as its function is in
scope or until it is destroyed with DosKillThread(). The only new variable
required is a thread handle of type TID. The main program creates the threads
by calling DosCreateThread() and passing a thread handle, thread function,
optional parameter and flag, and local stack size. Following the
DosCreateThread() calls, the two threads are created and run instantly. From
this point, three threads exist: main, ThreadFunction1, and ThreadFunction2,
all running simultaneously. main sleeps for 3700 milliseconds to allow the
other two threads to run. If the main thread doesn't sleep, it may end
before the other two threads even execute. (Remember, ending the main
thread destroys all of the process's threads.) Both threads run until their
functions return and they are terminated. The final call in main to DosExit()
closes the main thread, but this call isn't required. Any code that's
permissible in the main function is permissible in a thread function, and the
process's data is shared among all of its threads.
One problem with multithreaded programming is that two threads may change
important shared variables. OS/2 provides "critical sections" and "semaphores"
for handling these situations. A critical section is a region in memory that's
declared to be under special protection, which prevents multiple threads from
accessing it at the same time. When a thread reaches a critical section,
DosEnterCritSec() may be called to prevent other threads from accessing it.
Upon leaving the critical section, the thread must call DosExitCritSec() to
restore access for the other threads. The critical-section approach guarantees
that no other thread will access the same data; however, it's inefficient:
when DosEnterCritSec() is called, the scheduler suspends all other threads
within the process whether they use the same data or not. A semaphore is a
more
reasonable approach to data protection. As mentioned earlier, it is a special
flag used between threads.
The three types of semaphores are: events, mutual exclusion, and multiple
wait. Event semaphores, created with DosCreateEventSem(), notify a thread that
a certain event has occurred and are usually used for protecting shared data.
For a thread to wait for the semaphore before executing, it may call
DosWaitEventSem(), which blocks the waiting thread until the event-semaphore
signal is posted. When another thread is ready to signal the waiting threads,
it calls DosPostEventSem(). The waiting threads are then moved from the
blocked state to the ready state, where they remain until the scheduler allows
them to continue running. The thread can specify how long it is willing to
wait before
timing out and returning an error as an argument passed to DosWaitEventSem().
Otherwise, by default, the thread waits indefinitely.
Listing Two (page 96) implements an event semaphore. The main thread creates
two peer threads, then blocks on a semaphore until thread2 posts its flag;
thread2, in turn, blocks until thread1 posts. Thus, thread1 works first, then
thread2, and finally main continues. The code is similar to Listing One. The
added INCL_DOSSEMAPHORES definition includes semaphore information. Two event
handles are created to serve as the actual signals. A semaphore may also be
named, as in the example; a shared name must begin with the seven-character
prefix \SEM32\, as in \SEM32\SemaphoreName. After both threads are created,
main calls
DosWaitEventSem(), passing the handle of the event to wait for a completion
signal from thread2. At the same time, thread2 is making the same function
call to DosWaitEventSem(), waiting for the completion of thread1. Thread1
executes its code and, upon completion, signals the second thread to stop
waiting by posting a signal with DosPostEventSem(). Eventually, the main
function is returned to running status and finishes the program.
The mutual-exclusion (mutex) semaphore is commonly used to serialize access to
a shared resource. This means the mutex is a flag which informs other threads
that the desired resource is unavailable. The mutex semaphore is primarily
used for shared allocated resources, rather than shared data members. It's
created with DosCreateMutexSem() with the same arguments as those for event
semaphores. When a thread needs exclusive control over a resource, it calls
DosRequestMutexSem(). From this point, the thread has complete, unshared
access to the specified resource. Other threads attempting to use that
resource will be blocked in a sequential order. When the thread owning a
resource is finished, it calls DosReleaseMutexSem() to release possession.
Ownership is then transferred to the next waiting thread with the highest
priority.



Conclusion


Threads are increasingly recognized for their natural ability to improve
program efficiency, and it makes sense to limit the amount of CPU time wasted
waiting on user input or the movement of disk heads. With a good design,
multithreaded programs can have substantial speed advantages over linear
programs.
 Figure 1: Round-robin scheduling.
 Figure 2: The scheduler performs a context switch to save the register values
to a structure in memory.
 Figure 3: Two threads owning access to a guarded data structure.
[LISTING ONE] (Text begins on page 74.)

#define INCL_DOSPROCESS

#include <os2.h>
#include <stdio.h>

VOID EXPENTRY ThreadFunction1(ULONG);
VOID EXPENTRY ThreadFunction2(ULONG);

int main(void)
{
 TID FirstThreadID, SecondThreadID;
 printf("Executing main thread.\n");
 DosCreateThread(&FirstThreadID, ThreadFunction1, 0, 0, 4096);
 DosCreateThread(&SecondThreadID, ThreadFunction2, 0, 0, 4096);
 DosSleep(3700);
 DosExit(EXIT_PROCESS, 0);
 return 0;
}
VOID EXPENTRY ThreadFunction1(ULONG ulArg)
{
 printf("Thread Function 1 is currently executing.\n");
}
VOID EXPENTRY ThreadFunction2(ULONG ulArg)
{
 printf("Thread Function 2 is currently executing.\n");
}

[LISTING TWO]

#define INCL_DOSPROCESS
#define INCL_DOSSEMAPHORES

#include <os2.h>
#include <stdio.h>
#include <string.h>

VOID EXPENTRY ThreadFunction1(ULONG);
VOID EXPENTRY ThreadFunction2(ULONG);

HEV EventHandle1, EventHandle2;
CHAR SemaphoreName1[27], SemaphoreName2[27];
TID FirstThreadID, SecondThreadID;

int main(void)
{
 printf("Executing main thread.\n");
 strcpy(SemaphoreName1, "\\SEM32\\FirstSemaphore");
 strcpy(SemaphoreName2, "\\SEM32\\SecondSemaphore");
 DosCreateEventSem(SemaphoreName1, &EventHandle1, 0, 0);
 DosCreateEventSem(SemaphoreName2, &EventHandle2, 0, 0);

 DosCreateThread(&FirstThreadID, ThreadFunction1, 0, 0, 4096);
 DosCreateThread(&SecondThreadID, ThreadFunction2, 0, 0, 4096);
 DosWaitEventSem(EventHandle2, SEM_INDEFINITE_WAIT);
 DosExit(EXIT_PROCESS, 0);
 return 0;
}
VOID EXPENTRY ThreadFunction1(ULONG ulArg)
{
 printf("Thread Function 1 is currently executing.\n");
 DosPostEventSem(EventHandle1);
}
VOID EXPENTRY ThreadFunction2(ULONG ulArg)
{
 DosWaitEventSem(EventHandle1, SEM_INDEFINITE_WAIT);
 printf("Thread Function 2 is currently executing.\n");
 DosPostEventSem(EventHandle2);
}
End Listings













































January, 1994
Symmetric Multiprocessing for PCs


Tips, tricks, and tools for Fortran NT apps




John Norwood and Shankar Vaidyanathan


John and Shankar work at Microsoft Corp. and can be reached at One Microsoft
Way, Redmond, WA 98052.


The traditional definition of supercomputing is generally restricted to
Fortran and mainframe computers. But with the advent of the 80486, Pentium,
MIPS, and DEC Alpha processors and 32-bit operating systems, such as Windows
NT, PC computational power has come to rival that of heavy-metal systems. In
this article, we'll describe techniques that provide this computational power,
focusing on multithreaded application development for single-processor and
symmetric-multiprocessor machines. We'll cover DLLs, shared common blocks,
multiprocess/multithreaded programming, and the Win32 API. We'll also provide
Fortran interface statements for the Win32 console API and a black-box
solution for calling 32-bit DLLs from 16-bit applications (such as Visual
Basic) under NT.
To illustrate, we'll implement a simple matrix-multiplication algorithm
(Listing One, page 98), then gradually add functionality.
Matrix multiplication is an easy algorithm to multithread since there's no
need to handle memory contention when writing to the final output matrix. The
driver module (Listing Two, page 98) is a minimal program that calls the
computation routine and provides the matrices to be multiplied, their
dimensions (assuming the matrices conform), and the number of threads to
perform the
task. In the compute module (compute.for), we have a common block that stores
the matrices A, B, and C and keeps track of the number of spawned threads
through MaxThreadCount. The subroutine Initialize initializes the common-block
variables from the parameters provided by the driver module--an inefficient
but simple approach. The subroutine Compute then spawns the specified number
of threads. Each thread has as one of its parameters a number corresponding to
the iteration that spawned it.
The essence of the implementation is the thread function MatMult, in which
each thread begins on the row corresponding to its thread number. When the
thread is finished multiplying that row by all columns of the second matrix,
it jumps by the total number of threads to a new row. Thus, the threads
"leap-frog" to the end of the matrix. This implementation can, at the end,
leave a few final threads operating, but it's simple to implement and avoids
memory-contention issues when writing to the final matrix.
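The leap-frog assignment is easy to state independently of the Fortran source. The sketch below uses C with 0-based row numbers purely to illustrate the partitioning rule; the article's actual implementation is the Fortran MatMult routine:

```c
/* Leap-frog partitioning: thread t (0-based) of nthreads handles rows
   t, t + nthreads, t + 2*nthreads, ... of an nrows-row matrix. Fills
   'out' with the row indices and returns how many rows the thread got. */
int rows_for_thread(int t, int nthreads, int nrows, int *out)
{
    int count = 0, r;
    for (r = t; r < nrows; r += nthreads)
        out[count++] = r;
    return count;
}
```

With 10 rows and 4 threads, threads 0 and 1 each get three rows while threads 2 and 3 get two, which is why a few final threads can still be running at the end.
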


SMP Issues and Results


Still, the approach we've just described doesn't avoid memory contention on
read access to the input arrays. This isn't a problem on single-processor
machines since only one thread is accessing memory at any given instant. On
symmetric-multiprocessor (SMP) machines, however, it becomes a potentially
serious issue. SMP machines are coarse-grained, parallel machines that excel
in performing separate, discrete tasks, but don't have fine-grained
memory-arbitration and messaging facilities like those in vector or other
super-scalar implementations. This can result in processors "stalling" on
simultaneous access to a memory location that another processor is using.
Listing Three (page 98) is a simple first pass at minimizing this problem by
staggering the rows and columns each thread accesses. Each thread not only
leapfrogs on the rows of the first matrix, but also "chases" the subsequent
threads on the columns of the second matrix--thus minimizing the chances of
simultaneous memory access in the columns.
We experimented briefly with changing the process and thread priorities, but
achieved little measurable benefit. Pushing all threads and processes to the
maximum available priority may steal a few more time-slices from other
user-level threads, but the NT kernel still maintains its core activities. The
net result is that the machine becomes totally unresponsive to user input,
while net performance hardly changes. An additional side effect of real-time
priority is that on a four-processor machine, using more than four threads is
inefficient--the first four threads totally lock out any subsequent threads
until they terminate. Thus, the best mix of performance and utility is
achieved by eliminating all unnecessary applications and services and allowing
the sophisticated NT thread and processor scheduler to deliver its designed
functionality.
We ran the example under default priority, allowing a flexible number of
threads. We minimized the external interferences by stopping all the
extraneous applications and services we could, but we don't claim any
particular rigor in the results; they simply illustrate the benefits of using
SMP machines with minimal code redesign.
We ran Listing Two with the Listing Three modifications on an NCR 3400 machine
with four processors and 64 Mbytes of physical RAM. We experimented with a
different number of threads on a 300x300 matrix for A, B, and C and averaged
our results over 100 runs. The compiler options used were /Ox and /Ob2; Table
1 shows the results.
On a four-processor machine, a simple task such as matrix multiplication with
four threads was about 3.8 times faster than a task with a single thread and
3.7 times faster than a nonthreaded task. As more and more threads are
generated, thread overhead and context switching take their toll.


Build Option Considerations


To convert this computational module into a DLL that can be called from
different applications (and dynamically loaded), you simply add the dllexport
attribute to the Compute subroutine and use the /LD compiler option. This
automatically creates an import library for linking calling applications.
While there are a number of options for linking NT Fortran executables, /LD is
the only supported option for linking DLLs. /LD causes the DLL to be linked
with a run-time import library. The actual run-time code resides in
MSFRT10.DLL, which must be in the path. This has the following implications
for applications that intend to call the DLL: If the calling executable is a
console application, you should compile it with the /MD option. The executable
can then be linked to the run-time import library, and both the application
and the DLL will share the same instance of the run time from the run-time
DLL. Unit numbers opened in the executable will be accessible in the DLL and
vice versa, and screen I/O using WRITE or PRINT statements will coordinate
correctly.
If the calling application is compiled and linked using the /ML
(single-threaded static run-time) or /MT (multithreaded static run-time)
options, the executable and DLL will use separate instances of the run-time
code. Then unit numbers will be local and separate for the executable, and the
DLL and screen I/O from the executable might not appear as expected if the DLL
also does screen I/O.
A Win32 application can be either a console subsystem executable or a Windows
subsystem application. Console applications are always in text mode and can't
attain higher-resolution screen output, but screen output is rapid. Windows
applications have full access to the user-interface functionality provided by
the Win32 API. If a Microsoft Fortran application is compiled and linked to be
a console application, the user interface consists of character-mode input and
output in a console window using READ, WRITE, or PRINT statements. There's no
default for text positioning or mouse input. Microsoft Fortran allows you to
generate Windows subsystem applications using the QuickWin implementation and
/MW compiler option. Because QuickWin has its own API--similar to that for
graphics output under DOS and 16-bit Windows in earlier versions of the
compiler--it's very easy to generate a Windows subsystem application.
QuickWin applications must be statically linked (implied by /MW). Furthermore,
Windows subsystem applications don't have console windows available by
default. Thus, any screen I/O from a Fortran DLL (which is, by definition,
console I/O) won't be visible.


Common Blocks Across Processes


All threads of a given process share the same address space and can access the
process's global variables; this can vastly simplify communication between
threads. Suppose, however, we want to accomplish the same task of matrix
multiplication by spawning processes instead of threads, which is a possible
scenario on an SMP. We need to allow the same common block to be accessed
across all the processes. Spawned processes inherit handles to files, console
buffers, named pipes, serial-communication devices, and mailslots; but they
don't inherit global variables.
Listing Four (page 98), a modified version of Listing Three, spawns processes
instead of threads. The common block is named bridge with the attribute
dllexport and is contained in the source file bridge.for along with a DATA
statement. Since this module contains only data, the linker can be used with
the /EDIT option to rename the .data section and set the new section
attributes as read, write, and shared. The common block for the shared data
must have at least one data item initialized in a DATA statement or it will
not be stored in a section that can be modified. If there's any run-time
functionality in this file, renaming the .data section will cause the code to
fail. The parent process could wait for all the child processes to signal
events, but for simplicity, here the parent process sleeps for ten seconds
before it prints out the results. The initialization and driver routines
remain the same, and the child process (process.for) identifies its number
from the command-line argument passed to it by its parent (compute.for).


Console Input and Output


Many Fortran applications rely on character-mode input and output for speed
and simplicity. While the default functionality of READ and WRITE statements
is largely sufficient, sometimes more control is required (for text
positioning and trapping mouse input, for example). Windows NT provides these
text-mode services via the console API. The Win32 API documentation gives
detailed descriptions of these functions and an overview of the console API.
We've included complete interface files that can be included in Fortran
console applications and DLLs for increased text-mode services.
Interfacing console output from Windows subsystem applications is largely
unexplored territory. For example, if a Windows application calls a DLL that
does console output, that output is normally just sent to the bit bucket. If
the DLL calls AllocConsole to create a console window, the output can be
displayed.
But there's another level of complexity in this situation. By default in
console applications, run-time screen I/O functions like READ, PRINT, and
WRITE are associated with the handles stdin and stdout. These handles are, in
turn, associated with the console input screen buffer and the output screen
buffer. A Windows subsystem application calling AllocConsole creates a console
window but it won't automatically associate the run-time screen I/O file
handles with that console window. This isn't a concern when using the console
APIs, since they directly use the console input and output handles, but READ
or WRITE statements to the screen will fail. The solution is to force an
association between the run-time standard handles and the console standard
handles. This is illustrated in the InitConsole routine (available
electronically; see "Availability," page 3). The logical sequence is as
follows. First, the console input and output handles are obtained by calling
CreateFile with the special filenames of CONIN$ and CONOUT$. These console
file handles are converted to C run-time file handles using the
_open_osfhandle routine. The _dup2 C run-time function then forces stdin,
stdout, and stderr to be identical to the C file handles pointing to the
console. The C routines are always available to the Fortran code in this
article, since the interface statements are included in the console.fi and
console.fd files (available electronically).
The InitConsole routine enables a self-contained DLL capable of doing console
I/O using both the console API and run-time screen I/O. The DLL needs to know
the type of application (console, Windows subsystem, or other) calling it so
it can adjust its functionality accordingly. This information is provided in
the final parameter from the calling application. The code displays the
progress of the matrix multiplication using different colors for different
threads. This requires use of a global critical section to synchronize access
to the console-output buffer. Console output is limited in resolution, but it
is very rapid and easy to implement.


Mixed-language Considerations



Microsoft Fortran PowerStation 32 for Windows NT supports mixed-language
programming with Microsoft Visual C/C++ for Windows NT. While this process is
easier than with prior versions, certain tricks make it easier still. For
example, the
default naming and passing convention for a Fortran subroutine is a modified
version of stdcall that has the following effect on routine names: They are
prepended with an underscore, appear in all capital letters, and are appended
with the @ symbol, followed by the number of bytes passed on the stack for the
argument list. The passing convention is that arguments are pushed on the
stack from right to left. Unlike the cdecl convention, however, the callee,
not the caller, cleans up the stack. By default, Fortran passes all arguments
by reference, but this can be overridden using the value attribute. Passing by
reference means that every argument will usually require four bytes of stack
space for its address. The only exception is character variables, which
require eight bytes: four for the address, followed by four for an integer
passed by value that contains the string length. The hidden string-length
parameter is required by the character*(*) indeterminate size type allowed in
Fortran. The hidden string parameter following character variables can be
suppressed using the stdcall or c attribute on the name of the subroutine or
in an interface statement to the Fortran subroutine. The amount of stack space
for an argument passed by value is always rounded up to the next multiple of
four bytes that can contain the data item.
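As a back-of-the-envelope check of these decoration rules, the external name can be computed from the routine name and the per-argument stack sizes. This helper is purely illustrative (it is not any compiler API): a by-reference argument contributes 4 bytes for its address, and a CHARACTER argument contributes 8 for the address plus the hidden by-value length.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Sketch of the default Fortran decoration: underscore prefix, upper-cased
// name, then @ and the total argument-stack byte count, with each argument's
// size rounded up to the next multiple of 4 bytes.
std::string decorated_name(const std::string &name,
                           const std::vector<int> &arg_bytes) {
    int total = 0;
    for (int b : arg_bytes)
        total += (b + 3) / 4 * 4;  // round up to a 4-byte multiple
    std::string out = "_";
    for (char c : name)
        out += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return out + "@" + std::to_string(total);
}
```

So a subroutine MatMult taking one INTEGER*4 by reference decorates to _MATMULT@4, and a routine taking a single CHARACTER*(*) argument gets @8 for the address/length pair.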
The default structure packing for both Microsoft C/C++ and Fortran is 8-byte
packing. The only allowed metacommands or compiler options for Fortran are for
1-, 2-, and 4-byte packing, so 8-byte packing is only possible as the default.
Common blocks are accessible from C as global static structs owned by Fortran.
The dllexport attribute allows common blocks to be exported from DLLs, as
demonstrated in the previous example illustrating shared memory in a common
block.
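Viewed from C++, such a common block is simply a global struct whose members must match the Fortran declarations in order and size. A sketch of a mirror for the /bridge/ block of Listing Four (dimensions taken from Bridge.for; since all members here are 4-byte scalars and arrays of them, the default 8-byte packing introduces no padding):

```cpp
#include <cstddef>

// A C++ mirror of the /bridge/ common block from Listing Four. Fortran
// lays common-block members out in declaration order. Note that the 2-D
// arrays are column-major on the Fortran side, so Fortran's C(i, j)
// corresponds to C[j-1][i-1] here.
struct Bridge {
    int   MaxProcCount;
    int   A_Rows, A_Columns, B_Columns;
    float A[100][100];
    float B[100][100];
    float C[100][100];
};
```

The scalars occupy the first 16 bytes and the arrays follow immediately, which is what the Fortran side expects.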
Microsoft C/C++ and Fortran development environments let you manage projects
in their respective languages. The Fortran Visual WorkBench allows debugging
mixed-language applications since it has access to both C and Fortran
expression evaluators. It is often convenient to use the Fortran Visual
WorkBench debugger from the C/C++ Visual WorkBench. Using the Tools menu makes
this easy. In the C/C++ Visual WorkBench, go to the Options menu and select
Tools. Click the Add button and browse for the Fortran f32vwb.exe file. Enter
$(TARGET) in the Arguments text field. When you select the Fortran Visual
WorkBench under the Tools menu, it will automatically start debugging the
application you're working on in the C/C++ Visual WorkBench.
Linking mixed-language C/C++ applications with Fortran object modules requires
that the Fortran library LIBF.LIB (static single thread), LIBFMT.LIB (static
multithread), or MSFRT.LIB (DLL run-time import library) precede the
equivalent LIBC.LIB, LIBCMT.LIB, or MSVCRT.LIB. The versions of both libraries
should come from the Fortran LIB directory. Since the order is important, the
libraries should be added from the Options Projects Linker Input text field.
You should not add them via the Additional Libraries text field or by
including them in a project. To link a C/C++ Win32 Windows application that
includes Fortran object files, use all the libraries normally used by C/C++
and add LIBF.LIB and CONSOLE.LIB. The latter is necessary to provide routines
for file I/O.


32-bit DLLs and 16-bit Applications


There are times when it is desirable to call a 32-bit DLL from a 16-bit
application. In such cases, the 16-bit application could run either under
16-bit Windows 3.x or under the WOW layer on Windows NT. In the first case,
you'd use Win32s and Universal thunk facilities. In the second, you'd turn to
Generic thunk functionality. Since this article is mainly about running on NT
and threading, we'll examine only the second case.
Peter Golde of Microsoft has graciously shared a 16-bit C DLL that can be used
in conjunction with Visual Basic to access 32-bit DLLs under NT. The scope of
this article doesn't permit a discussion of the implementation details of this
solution, but the DLL is provided with a VB example in Listing Five (page xx)
that calls our multithreaded matrix-multiplication DLL. Peter's solution is
elegant because it can be used to call any 32-bit DLL under NT without
creating a new 16-bit DLL each time. DLLs called from 16-bit applications
can't do console I/O (AllocConsole fails), thus requiring the "other" (WIN16$)
case in the final parameter that is passed in to the DLL from the driver
module, and this will suppress all console I/O from the example code.


Conclusion


The advent of SMP machines at affordable prices promises to bring to the
desktop more computational horsepower than ever before. The granularity of the
parallelism they offer needs to be considered, but even simple modifications
to conventional algorithms combined with multithreading can result in a
tremendous boost beyond single-processor expectations.


References


Norwood, John. "Mixed Language Windows Programming." Dr. Dobb's Journal
(October 1991).
Vaidyanathan, Shankar. "Multitasking Fortran and Windows NT." Dr. Dobb's
Sourcebook of Windows Programming (Fall 1993).
Vaidyanathan, Shankar. "Building Windows NT Applications using Fortran."
Proceedings of the Tech*Ed Conference, FR-301, volume 2 (March 1993).
Table 1: Matrix-multiplication results on an SMP.
 Threads   Time in Seconds
 none          21.77
 1             22.21
 2             11.33
 3              7.62
 4              5.91
 5              5.82
 6              5.89
 7              5.82
 8              5.82
 30             5.94
 60             6.02
[LISTING ONE] (Text begins on page 80.)

C The triple DO loop that performs matrix multiplication
 Do i = 1, A_ROWS
 Do j = 1, B_COLUMNS
 Do k = 1, A_COLUMNS
 C(i, j) = C(i, j) + A(i, k) * B(k, j)
 End Do
 End Do
 End Do

[LISTING TWO]

 include 'mt.fi'
 include 'flib.fi'

**** Driver program to do the Matrix Multiplication. Input matrices are *****
**** initialized to random values here. Maximum number of threads to be *****
**** spawned is also identified here. *****
 Program Driver

 include 'flib.fd'
 real*4 ranval
 integer*4 i, j, k, inThreadCount
 integer*4 A_Rows, A_Columns, B_Columns
 real*4 A[Allocatable](:,:), B[Allocatable](:,:), C[Allocatable](:,:)

 A_Rows = 50 ! rows in A (and in C)
 A_Columns = 100 ! columns in A = rows in B
 B_Columns = 100 ! columns in B (and in C)
 inThreadCount = 8 ! number of threads to be spawned

 Allocate (A(A_Rows, A_Columns), B(A_Columns, B_Columns),
 + C(A_Rows, B_Columns) )
 Do i = 1, A_Columns
 Do j = 1, A_Rows
 Call Random (ranval)
 A (j, i) = ranval
 End Do
 Do k = 1, B_Columns
 Call Random (ranval)
 B(i, k) = ranval
 End Do
 End Do
 Call Compute (A, B, C, A_Rows, A_Columns, B_Columns, inThreadCount)
 End
***** Initiate transfers data from the arguments into the common block. *****
 Subroutine Initiate(In_A, In_B, In_A_Rows, In_A_Columns,
 + In_B_Columns, In_Thread_count)
 real*4 In_A(In_A_Rows, In_A_Columns)
 real*4 In_B(In_A_Columns, In_B_Columns)
 integer*4 In_A_Rows, In_A_Columns, In_B_Columns
 integer*4 In_Thread_count, i, j, k
 include 'common.inc'
 MaxThreadCount = In_Thread_count
 A_Rows = In_A_Rows
 A_Columns = In_A_Columns
 B_Columns = In_B_Columns
 Do i = 1, A_Columns
 Do j = 1, A_Rows
 A (j, i) = In_A(j, i)
 End Do
 Do k = 1, B_Columns
 B(i, k) = In_B(i, k)
 End Do
 End Do
 End ! Initiate
***** MatMult is where the actual calculation of a row times a column is *****
***** performed. This is the thread procedure. *****
 Subroutine MatMult (CurrentThread)
 include 'common.inc'
 integer*4 CurrentThread
 automatic
 integer*4 i, j, k
C The loop variable i ranges from the current thread number to the
C maximum number of rows in A in steps of the maximum number of threads
 Do i = CurrentThread, A_Rows, MaxThreadCount
 Do j = 1, B_Columns
 Do k = 1, A_Columns
 C(i, j) = C(i, j) + A(i, k) * B(k, j)

 End Do
 End Do
 End Do
 End ! MatMult
***** Compute does the actual computation by spawning threads. *****
 Subroutine Compute
 + (In_A, In_B, In_C, In_A_Rows, In_A_Columns,
 + In_B_Columns, In_Thread_count)
 real In_A(In_A_Rows, In_A_Columns)
 real In_B(In_A_Columns, In_B_Columns)
 real In_C(In_A_Rows, In_B_Columns)
 integer In_A_Rows, In_A_Columns, In_B_Columns
 integer In_Thread_count
 include 'common.inc'
 external MatMult
 integer*4 ThreadHandle [Allocatable](:), threadId
 integer*4 CurrentThread[Allocatable](:), count
 integer*4 waitResult
 integer*4 i, j
 Call Initiate (In_A, In_B, In_A_Rows, In_A_Columns,
 + In_B_Columns, In_Thread_count)
 Allocate (ThreadHandle(MaxThreadCount),
 + CurrentThread(MaxThreadCount) )
 Do count = 1, MaxThreadCount
 CurrentThread(count) = count
 ThreadHandle(count) = CreateThread( 0, 0, MatMult,
 + CurrentThread(count), 0, threadId)
 End Do
C Can't wait on more than 64 threads
 waitResult = WaitForMultipleObjects(MaxThreadCount,
 + ThreadHandle, .TRUE., WAIT_INFINITE)
C Transfer result from common back into return argument.
 Do i = 1, A_Rows
 Do j = 1, B_Columns
 In_C(i,j) = C(i,j)
 C(i, j) = 0.0
 End Do
 End Do
 Deallocate ( ThreadHandle, CurrentThread )
 End ! Compute

######################################################################
C File Name: common.inc
 include 'mt.fd' ! Data declarations for Multithreading API
 include 'flib.fd' ! Data declarations for runtime library
 real*4 A, B, C ! Input Matrices A & B and Output Matrix C
 integer*4 A_Rows, A_Columns, B_Columns ! Matrix Dimensions
 integer*4 MaxThreadCount ! Maximum number of Threads
 common MaxThreadCount, ! common block
 + A_Rows, ! Rows in A = Rows in C
 + A_Columns, ! Columns in A = Rows in B
 + B_Columns, ! Columns in B = Columns in C
 + A(1000, 1000),
 + B(1000, 1000),
 + C(1000, 1000) ! Maximum Array size is 1000 X 1000

[LISTING THREE]

C This is a variation of the MatMult subroutine Do loops. Loop variable i
C ranges from the current thread number to the maximum number of rows in A in
C steps of the maximum number of threads. Loop variable j ranges across all
C columns of B, but is staggered according to the current thread number to
C minimize memory contention on an SMP machine. Loop variable jj translates
C (maps) the value of j to fall within the permissible range of B, that is,
C from 1 to B_Columns.

 Do i = CurrentThread, A_Rows, MaxThreadCount
 Do j = (CurrentThread-1)*MaxThreadCount,
 + B_Columns + (CurrentThread-1)*MaxThreadCount - 1
 jj = 1 + mod(j, B_Columns)
 Do k = 1, A_Columns
 C(i, jj) = C(i, jj) + A(i, k) * B(k, jj)
 End Do
 End Do
 End Do

[LISTING FOUR]

C File Name: Driver.for
C Include contents of Program Driver from Listing 2 here
C Then modify all occurrences of InThreadCount to InProcCount

######################################################################
C File Name: Compute.for
 include 'mt.fi'
 include 'flib.fi'
C Include contents of Subroutine Initiate from Listing 2 here
C Then modify all occurrences of InThreadCount to InProcCount
C Compute does the actual computation by spawning processes
 Subroutine Compute(In_A, In_B, In_C, In_A_Rows, In_A_Columns,
 + In_B_Columns, In_Proc_Count)
 real*4 In_A(In_A_Rows, In_A_Columns)
 real*4 In_B(In_A_Columns, In_B_Columns)
 real*4 In_C(In_A_Rows, In_B_Columns)
 integer*4 In_A_Rows, In_A_Columns, In_B_Columns
 integer*4 In_Proc_Count
 include 'mt.fd'
 include 'flib.fd'
 include 'common.inc'
 logical*4 ProcHandle ! Process Handle
 integer*4 x, y, count
 character*32 inbuffer [Allocatable] (:)
 record /PROCESS_INFORMATION/ pi ! Process Information
 record /STARTUPINFO/ si ! Startup Information

 si.cb = 56 ! Size of Startup Info
 si.lpReserved = 0
 si.lpDeskTop = 0
 si.lpTitle = 0
 si.dwFlags = 0
 si.cbReserved2 = 0
 si.lpReserved2 = 0

 Call Initiate (In_A, In_B, In_A_Rows, In_A_Columns, In_B_Columns,
 + In_Proc_Count)
 Allocate (inbuffer(MaxProcCount) )

 Do count = 1, MaxProcCount
 write(inbuffer(count),"(A7, 1X, I4)") 'process', count

 ProcHandle = CreateProcess( 0, loc(inbuffer(count)),
 + 0, 0, .TRUE. , 0, 0, 0, loc(si), loc(pi))
 print "('+',a,i5)", "Generating Process # ", count
 End Do

 write(*,*)
 write(*,*)
 Call sleepqq(10000) ! Sleep for 10000 milliseconds
 Do x = 1, A_Rows
 Do y = 1, B_Columns
 In_C(x,y) = C(x,y)
 C(x,y) = 0.0
 End Do
 End Do
 End ! Compute
######################################################################
C File Name: Process.for -- MatMult is the Process that multiplies the
C appropriate Row of A with the appropriate column of B
 Program MatMult
 include 'common.inc'
 automatic
 integer*4 CurrentProc, i, j, k, jj
 character*32 buffer
 integer*2 status
C Obtaining the command line arguments
 Call GetArg (1, buffer, status)
 read (buffer(1:status), '(i4)') CurrentProc
 Do i = CurrentProc, A_Rows, MaxProcCount
 Do j = (CurrentProc-1)*MaxProcCount,
 + B_Columns + (CurrentProc-1)*MaxProcCount - 1
 jj = 1 + mod(j, B_Columns)
 Do k = 1, A_Columns
 C(i, jj) = C(i, jj) + A(i, k) * B(k, jj)
 End Do
 End Do
 End Do
 End
######################################################################
C File Name: Bridge.for -- The common block for shared data must have one data
C item initialized in a DATA statement or it will not be stored in a section
C that can be modified. The LINK /EDIT command is used to rename the .data
C section and set the new section's attributes to read, write, shared. The
C source files should contain only the common declaration and the DATA
C statement. If there are any runtime statements, renaming the .data section
C will cause the code to fail.
 Subroutine dllsub[dllexport]
 real*4 A, B, C
 integer*4 A_Rows, A_Columns, B_Columns
 integer*4 MaxProcCount ! Maximum number of processes
 common /bridge[dllexport]/ MaxProcCount,
 + A_Rows,
 + A_Columns,
 + B_Columns,
 + A(100, 100),
 + B(100, 100),
 + C(100, 100)
 data MaxProcCount /0/
 End
######################################################################

C File Name: Common.inc
C Common Block contents
 real*4 A, B, C
 integer*4 A_ROWS, A_COLUMNS, B_COLUMNS
 integer*4 MaxProcCount
 common /bridge[dllimport]/ MaxProcCount,
 + A_ROWS,
 + A_COLUMNS,
 + B_COLUMNS,
 + A(100, 100),
 + B(100, 100),
 + C(100, 100) ! must match the sizes exported from Bridge.for
######################################################################
# File Name: Makefile

all: bridge.dll process.exe driver.exe
bridge.dll: bridge.obj
 link /edit bridge.obj /section:.data=.bridge,srw
 fl32 /LD bridge.obj
bridge.obj: bridge.for
 fl32 /LD /c bridge.for
process.exe: process.obj bridge.lib
 fl32 /MD process.obj bridge.lib
process.obj: process.for common.inc
 fl32 /MD /c process.for
driver.exe: driver.obj compute.obj bridge.lib
 fl32 /MD driver.obj compute.obj bridge.lib
driver.obj: driver.for common.inc
 fl32 /MD /c driver.for
compute.obj: compute.for common.inc
 fl32 /MD /c compute.for

[LISTING FIVE]

VERSION 2.00
Begin Form Form1
 Caption = "Form1"
 ClientHeight = 6045
 ClientLeft = 1095
 ClientTop = 1485
 ClientWidth = 9180
 Height = 6450
 Left = 1035
 LinkTopic = "Form1"
 ScaleHeight = 6045
 ScaleWidth = 9180
 Top = 1140
 Width = 9300
 Begin CommandButton Compute
 Caption = "Compute"
 Height = 375
 Left = 1200
 TabIndex = 1
 Top = 5040
 Width = 1575
 End
 Begin Grid grdC
 Height = 4335
 Left = 1200

 TabIndex = 0
 Top = 480
 Width = 6495
 End
End
' These declarations set up the two core functions to access the CALL32 DLL:
' Declare32 and CALL32. These are the only two functions you need to use to
' get access to any 32-bit DLL. The Option Base is used to start arrays at
' index 1 just as in Fortran.
Option Base 1
Declare Function Declare32 Lib "call32.dll" (ByVal Func As String, ByVal
Library As String, ByVal Args As String) As Long
Declare Sub Compute Lib "call32.dll" Alias "Call32" (A As Single, B As Single,
C As Single, A_ROWS As Long, A_COLUMNS As Long, B_COLUMNS As Long,
MaxThreadCount As Long, DO_CONSOLE As Long, ByVal id As Long)
Const A_ROWS% = 30
Const A_COLUMNS% = 200
Const B_COLUMNS% = 30
Const DO_CONSOLE% = 3
Dim A(A_ROWS, A_COLUMNS) As Single
Dim B(A_COLUMNS, B_COLUMNS) As Single
Dim C(A_ROWS, B_COLUMNS) As Single
Dim MaxThreadCount As Long
Dim idCompute As Long
Dim i As Long
Dim j As Long
Sub Compute_Click ()
 ' This code simply initializes the two input arrays and then calls the
 ' 32-bit DLL to multiply them. It then puts the result in the grid.
 MaxThreadCount = 8
 Randomize
 For i = 1 To A_COLUMNS
 For j = 1 To A_ROWS
 A(j, i) = Rnd
 Next j
 For k = 1 To B_COLUMNS
 B(i, k) = Rnd
 Next k
 Next i
 Call Compute(A(1, 1), B(1, 1), C(1, 1), A_ROWS, A_COLUMNS, B_COLUMNS,
 MaxThreadCount, DO_CONSOLE, idCompute)
 For i = 1 To A_ROWS
 grdC.Row = i
 For j = 1 To B_COLUMNS
 grdC.Col = j
 grdC.Text = Str$(C(i, j))
 Next j
 Next i
End Sub
Sub Form_Load ()
 ' This code sets up the call to the CALL32 DLL by first using the Declare32
 ' function to get an id number. At this point CALL32 creates a function
 ' pointer to that 32-bit DLL subroutine and all access to the routine will be
 ' through that function pointer. The code also initializes the row and column
 ' numbers and sets the size of the grid fields.
 idCompute = Declare32("COMPUTE", "compute", "pppppppp")
 grdC.Rows = A_ROWS + 1
 grdC.Cols = B_COLUMNS + 1
 grdC.Row = 0
 For i = 1 To B_COLUMNS
 grdC.Col = i
 grdC.Text = Str$(i)

 grdC.ColWidth(i) = TextWidth("123.1234567")
 Next i
 grdC.Col = 0
 For i = 1 To A_ROWS
 grdC.Row = i
 grdC.Text = Str$(i)
 grdC.RowHeight(i) = TextHeight("1") + 10
 Next i
End Sub
End Listings




















































January, 1994
PROGRAMMING PARADIGMS


The Meyer Method




Michael Swaine


In October, this magazine took what you might call a controversial position on
one of the most popular languages in use today. It was the annual
object-oriented issue, and the cover shouted: "Beyond C++: Considering the
Alternatives." The clear implication was that C++ was not the final word on
object-oriented programming languages. Okay, I lied. You wouldn't call that a
controversial position. Nobody who has a clue as to what "object oriented"
means would argue that C++ is the final word on object-oriented languages.
Still, to cast aspersions on a language that seems to be gathering momentum
like a snowball on a Sierra slope is to fling down the defiant mitten of
challenge. And we certainly flung it in that issue's nine articles on
alternatives to C++, which ranged from the lofty and intricate structures of
Eiffel, to no-class Drool. (Sorry, Dave.) The case, or at least one case,
against C++ is that it is not as "pure" an object-oriented language as, say,
Eiffel or Smalltalk. That's a point the language's biggest boosters can grant.
The crucial test, they would say, is: Is it what I need? If what I need is a
better C, what does it matter that C++ isn't pure? And C++ does add certain
desirable features to C.


Comparing Paradigms


You can't argue with that, although you can still argue that C++ is not a
really fine object-oriented language. But what kind of argument is that? Those
who criticize C++ as a less-than-pure object-oriented language are judging it
against standards that its designer may not have had in mind when he designed
it, and that most of its users may not have in mind when they use it. Is it
fair to judge a language by standards that its designer and many of its users
don't hold it to?
Well, is it fair to hold inner-city gang members to laws they may not buy
into? It is if we believe in the enduring and universal value of those laws.
Judging C++ against object-oriented standards begs the question of the value
of those standards.
Right about here I need to confess that I really don't care whether C++ is a
"pure" object-oriented language or not. In fact, I'll have nothing more to say
about the question in the rest of this column.
I brought up the issue as an example of a kind of debate that comes up in
discussions (and the design) of programming languages. It's similar to the
historical arguments about the dangers of GO TO statements. Such arguments
compare one programming paradigm with another, or one aspect of one paradigm
with an aspect of another.
Such arguments usually can't be resolved by writing some code.
It's much the same in science: You can't directly compare one paradigm with
another because they use terms differently, have different goals, and approach
their subjects differently. You usually can't design a clean experiment that
decides which paradigm is the right one, the way that you can decide between
rival theories within a single paradigm on the basis of one crucial
experiment.
It's similar with programming paradigms: You can't decide which one is "right"
because they use terms differently, have different goals, and approach their
subjects differently. In science and in programming, paradigms usually get
replaced because long experience with alternative paradigms shows that one
just seems to work better than another.
Nevertheless, some concrete programming techniques can be used to compare
paradigms. I'll present one shortly.


C++ Blinders


First, though, a very specific criticism of C++: The fact that C++ is a better
C gets in the way of its being seen and used as a truly object-oriented
development environment.
It's possible to buy and use a C++ development environment without ever really
dealing with the object-oriented features of the language. I've been working
lately with the Symantec C++ development environment for the Mac. Some 93
percent of the documentation is generic to Symantec's line of C-based
products. Reading the documentation, you could easily convince yourself that
you had purchased Symantec's Think C compiler. Which, in fact, you have: It's
part of the package. But virtually none of the documentation presents C++ as
an object-oriented development environment. It tells what all the features
are, but not why they're there.
I'm not faulting Symantec; that's how C++ is. It is a consequence of C++'s
being a good enhancement of C that it is easy to use C++ without ever adopting
the object-oriented paradigm; in fact, without ever learning object-oriented
programming.
Merely having a C++ development environment does nothing to educate you about
object-oriented concepts. Having a knowledge of C++ doesn't necessarily mean
that you know object-oriented programming; and it's precisely because of this
that good books on the subject are necessary.


Meyer's Method


Let me point you to Bertrand Meyer, creator of Eiffel and an author with a
solid understanding of the theory of object-oriented programming.
In one of his books, Meyer uses an interesting technique for explaining the
differences between two theoretical constructs in object-oriented
programming--between aspects of two different programming paradigms. It's a
technique that I think should be in any programmer's "intellectual" toolkit.
Here's what Meyer does:
In one chapter of his Object-oriented Software Construction (Prentice Hall,
1988), Meyer compares the concepts of inheritance and genericity. Inheritance,
specific to object-oriented languages, lets you construct modules through
successive specialization and extension. Genericity, a feature of Ada that was
originally introduced in Algol-68, is defined as the ability to define
parameterized modules, with the parameters usually being types. Both
inheritance and genericity are ways of making software components more
extendible and reusable. Both make use of overloading (more than one meaning
for one name) and polymorphism (more than one form for one program entity).
Meyer asks the obvious question: If inheritance and genericity are two
attempts to do the same thing, that is, to make more-flexible modules, how do
they compare? Are they redundant? Incompatible? Complementary? Should one
choose between them, or does it make sense to combine them?
Having asked the question, or questions, Meyer could immediately go on to
answer them. However, he doesn't choose to do that; instead, he works through
what you need to think about in order to answer the questions for yourself.
First, Meyer presents examples of the uses of genericity and inheritance,
carefully chosen to demonstrate the most salient features and consequences of
the two techniques. These are just the kind of examples you'd find in books on
Ada, Eiffel, or Smalltalk programming. His genericity examples include
parameterized routines and packages, and they touch on constrained and
unconstrained genericity. For inheritance, he works through the design of a
general-purpose module library for files, with classes like FILE, TEXT_FILE,
DIRECTORY, DEVICE, and TAPE. The point of the examples is not to evaluate the
techniques, but to examine them in enough depth that you feel that you have a
grasp of their characteristics.
Next, he uses your knowledge of these characteristics to work through the
process of simulating each technique in terms of the other: simulating
inheritance using genericity, and simulating genericity using inheritance.
(Having just worked through the examples makes it easier for you to see what
would constitute an acceptable simulation.)
He approaches the simulation of inheritance by trying to construct inheritance
in Ada, a language that doesn't have it. (Negatives in technology are always
susceptible to time decay; let's say Ada doesn't traditionally have
inheritance, and didn't in the version he used.) He asks whether Ada can be
made, through its mechanisms of genericity, to simulate the characteristics of
inheritance. Overloading, he says, is easy, but polymorphism is a different
story. The closest he can come to simulating polymorphic entities is to use a
record with variant fields, a feature that even Pascal has. This attempt,
though, falls short in several ways. So he concludes that you can't, in fact,
simulate inheritance using genericity.
Next, he shows how to simulate genericity with inheritance, using his own
object-oriented language, Eiffel, as the vehicle. Perhaps not surprisingly, he
demonstrates that genericity can be simulated by inheritance. Inheritance is
the more general concept. The real point, though, is that you see the details
of just how he simulates genericity using inheritance.
It isn't pretty. He needs to employ spurious duplications of code, and the
conceptually simpler of two cases turns out to be just as complex to implement
as the conceptually more difficult.
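In C++ terms rather than Meyer's Eiffel and Ada (purely as an illustrative sketch, with invented class names), the two halves of the exercise look like this: genericity is a template, while its simulation by inheritance needs a common element base class plus one wrapper per type, the kind of spurious duplication Meyer describes.

```cpp
#include <vector>

// Genericity directly: one parameterized module covers every element type.
template <typename T>
class Stack {
    std::vector<T> items;
public:
    void push(const T &x) { items.push_back(x); }
    T pop() { T x = items.back(); items.pop_back(); return x; }
};

// Genericity simulated by inheritance: a stack of pointers to a common
// base, plus a wrapper class per element type. Clients must wrap, unwrap,
// and downcast -- workable, but clumsier than the template.
struct Element { virtual ~Element() {} };
struct IntElement : Element {
    int value;
    explicit IntElement(int v) : value(v) {}
};

class ElementStack {
    std::vector<Element *> items;
public:
    void push(Element *e) { items.push_back(e); }
    Element *pop() { Element *e = items.back(); items.pop_back(); return e; }
};
```

The second version also loses static type safety: nothing stops a caller from pushing one wrapper type and casting the popped pointer to another.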


Meyer's Moral


The moral of Meyer's lesson, or at least the moral that I draw from it, is not
that Eiffel is better than Ada, or that inheritance is better than genericity,
or even that genericity and inheritance are just different approaches to the
same problem and have different strengths and weaknesses. The moral, I think,
is that these techniques embody different ways of thinking about the problem
at hand.
It seems entirely possible that an experienced Ada programmer just getting
started with Eiffel might employ inheritance just as though it were a tool for
simulating genericity. Such a programmer would end up writing unnecessarily
complex, and probably inefficient, code.

And it wouldn't make much difference, I suspect, if that Ada programmer had
been told that the effective use of inheritance requires a different way of
thinking about problems than does genericity. Most of the time, we use the
tools we know how to use in the ways we know how to use them, and we use
unfamiliar tools in the same ways. If it can be used like a hammer, it will
be.
But having worked through the exercise of implementing genericity and
inheritance in terms of one another, that Ada programmer would have the
conceptual background to be able to see why he probably shouldn't pound nails
with the new tool.
Okay, schematically, what Meyer has done is this: To compare the concepts x
and y, he implements x in terms of y and vice versa. This, I claim, is a
special, computational, concrete case of a more general, noncomputational,
abstract technique that you may be familiar with from other contexts: ensuring
that you understand related concepts by defining each in terms of the other.
That technique is an old and a useful trick. The idea is, if you can figure
out how to define x in terms of y, you can be assured that you understand x,
at least in the context of y. But if you can also define y in terms of x, you
have a context-free understanding of the relationship between x and y.
Meyer's technique is exactly the same thing, except that you're not just
writing definitions, you're writing code; so you can be more sure that you've
grasped the relationship: You can test programs more easily than you can test
definitions.
Once you've implemented x in terms of y and vice versa, you are in a position
to be able to see the implications of using one technique or the other.
I mentioned before that one paradigm usually replaces another only on the
basis of people's experience with the two, and the perception that one seems
to "work better." Meyer's method is a shortcut to the relevant experience.
I think that Meyer's trick is an important tool for examining theoretical
issues concretely. The point is not to see how efficient each implementation
is, but to see how it's done: to understand the architecture of each concept.
The efficiency issue is a different thing altogether, part of implementation
evaluation. This, of course, is very important. There's an example of it in
that same October issue: Mike Floyd's comparative implementations of linked
lists in a dozen languages.
Meyer's trick is something else. It's a tool for understanding programming
concepts.


Modules vs. Connections


How else could Meyer's technique be used? How about in attacking the eternal
debate in mental science and artificial intelligence between connectionist and
modular paradigms? Is it possible to implement such models in terms of one
another?
Not easily. Before starting, you would immediately run up against the
complication that neither model solves the problem set: Neither is a model of
the mind, neither tells how to build a Turing-test intelligent system. Still,
it seems worth asking whether some sense can be made of the real practical
differences between these approaches using Meyer's technique. I'm not equipped
to answer the question, but maybe a sketch of the debate will inspire someone
who is.
Actually, the two paradigms have many goals in common, and maybe they aren't
such distinct paradigms at that. But each is a kind of metatheory, making no
predictions, but simply characterizing what acceptable theories can look like.
In that sense, they are not directly comparable and can legitimately be
thought of as distinct paradigms. At least that's how I understand it.
The modular paradigm assumes that intelligence is made up of parts and that
the parts can be understood or implemented separately. Virtually all computer
models of mental phenomena and virtually all work in artificial intelligence
before the advent of neural-network models was modular. Programming languages
are probably inherently biased toward decomposing problems and implementing
solutions through distinct modules.
The connectionist paradigm assumes that intelligence is a matter of which
inputs get hooked up with which outputs, and how. Neural nets are the
programming realization of connectionist thinking.
Both modularity and connectionism apply equally well in principle to
artificial intelligence and to the natural type, but you can only trace the
history of the AI cases back a short distance in history. Most of the history
is in theories of natural intelligence.
Plato and Aristotle were modularists: They both described the soul as
tripartite. John Locke was a more recent philosopher who tried to define the
faculties, or functional modules, of the mind. Phrenology took the idea to a
ridiculous extreme, trying to read personality from bumps on the head, which
presumably were associated with oversized brain modules. Intelligence testing
in this century was an attempt to discover, through the statistical method of
factor analysis, the factors that made up intelligence. Noam Chomsky's claims
for the special nature of speech are consistent with the modular paradigm.
Connectionism is most clearly seen in the psychological school of strict
Skinnerian behaviorism, which takes as its purpose the elucidation of the
connections between inputs (stimuli) and outputs (responses). Neural-network
models aren't as strict (one might say blind) as Skinnerian behaviorism, but
do share some of its biases. Learning is a matter of increasing the strength
of some connections with respect to others. The raw material of mind is
homogeneous. There are no hardwired subsystems of thought.
So: Is it possible to implement a connectionist model using modules, and vice
versa? Obviously, nobody's going to implement a full connectionist model of
the mind in terms of modules or vice versa; but implementing aspects of the
paradigms would be interesting enough. Is that possible?
Yes, apparently. In principle, any modular theory can be modeled by a
connectionist system. In fact, researchers in these fields have done one or
the other, although it's not clear that both sides of the trick have been
done, which is the point.
What I'd like to see is something like a neural-net model of Chomsky's
generative grammar along with a modular model of the connectionist account of
language acquisition.
That would be cool.




































January, 1994
C PROGRAMMING


Symantec C++ Professional




Al Stevens


A few months ago, I wrote about the tech support that Symantec provides on
CompuServe and some problems that I had. In response to that column,
Symantec's technical staff ambushed me at Software Development '93 in Boston.
At least three tech supporters and one vice president cornered me. My message
came through loud and clear, they said. I have since tried using their tech
support again. Let me set the record straight by saying that they have
addressed the problem and are solving it. When you sign onto the Symantec C++
Professional forum (GO SYMDEVTOOLS) you are treated to tech support from the
fountainhead.
Symantec C++ has a long genealogy going back to the mid-eighties. Walter
Bright created Northwest C. Northwest C begat DataLight C, which begat Zortech
C++, which begat Symantec C++ Professional. Walter is now one of the tech
supporters on CompuServe. I don't know how long they can keep it up or how
long Walter will be content doing tech-support penance. For now, though, your
problems get attention and action, not only from Walter, but from a complement
of other Symantec developers as well. Symantec CEO Gordon Eubanks himself
fielded a question one day. How much more attention could you want?
Symantec C++ Professional is a full-featured compiler that supports DOS,
extended DOS, Windows 3.1, Win32, and NT development. My first interest in it
involved the DOS compiler. An earlier version had fatal template problems, and
I could not use it to compile the template exercises in my C++ tutorial book.
The new version has corrected those problems, but some new ones have crept in.
I can, however, code workarounds, which I could not do before. For instance,
Example 1(a) will not compile in SC++. The compiler insists that you code the
member pointer with the <T> qualification like Example 1(b). I don't mind
doing that, but three other compilers, Borland C++, Watcom C++, and Comeau
C++, accept the first idiom as well, and, as near as I can tell, either way is
correct.
The second problem involves template functions. The code in Example 2 does not
compile. The compiler complains that it finds no function to match Foo(int *)
even though the Foo template function is there to match it.
Of all the C++ compilers I have used, SC++ is the only one that declares a
fatal compile error if a non-void function fails to return something. All of
the others, including cfront, allow the program to compile, although some of
them issue a warning. The SC++ compiler will not compile programs such as
Example 3(a).
On the surface, it would appear that SC++ is the only correct compiler in this
respect, because the C++ Annotated Reference Manual (ARM) says that falling
out of the bottom of a type-returning function is illegal. However, the ARM
contradicts that rule when it says that Example 3(b) is a "correct and
complete" program, yet SC++ will not compile it. Most C++ books, including
Stroustrup's The C++ Programming Language, Second Edition, contain many such
programs with implicit int main functions that do not return anything and that
SC++ will not compile.
There are two ways around the problem: One is to always return something from
main; the other is to declare main as void. I don't like either one, although
I have used the latter. The compilers let you get away with declaring main as
void for two reasons. First, the ARM says that main is not "predefined" by the
compiler and that its type is "implementation dependent." Second, a lot of
prior art exists because AT&T's cfront compiler allows it.
On the other hand, I don't like adding superfluous return 0; statements to the
main function in every program that I publish. The majority of compilers do
not insist on the practice, and many respected C++ programmers and writers
don't use it. The extra statements add unnecessary lines of code to books and
articles that need an economy of code to make their points and teach their
lessons. To get around the SC++ error message, I use this statement in the
makefiles:
sc "-Dmain=void main" $*.cpp
The statement coerces a main function into a void main function, and SC++
compiles the program.
By using that compile command and changing the template exercises to avoid the
pointer idiom that SC++ does not like, I was able to compile and run all of
the exercises in my tutorial with SC++. There are some differences in the way
that SC++ implements iostreams, but that's nothing new. All compilers
implement them differently.
Programmers don't care as much about the correctness of code as programming
writers do. The programmer has a job to do, and if there is a workaround to a
problem, the programmer is glad to use it. Issues of portability can be
postponed in the interests of expedience. Writers, on the other hand, try to
write code that works in all of the various platforms readers use. At the same
time, we want it to be correct. C++ is not a standard language. The ARM's
specification, which is the base document for the ANSI committee's
deliberations, has ambiguities. Different implementors make different
interpretations; no one is right, and no one is wrong.
I was not able to use SC++ to build the persistent-object database code from
another book. The program compiles fine, but locks the computer when I run it.
Execution does not get beyond the startup code. I put error traps at the
beginning of main and in the constructor of the only class with a global
object, and neither trap executed. I built the program by including debugging
information according to the docs and ran it under CodeView, but CodeView
failed to recognize the source code and behaved as if it were debugging a
machine-language program without source code. The SC++ documentation for
compiling mentions a Symantec debugger, but I could not find a DOS debugger in
the package. The new, improved Symantec tech support assures me that all such
problems will be addressed. They could be fixed by the time you read
this--Symantec posts patches on CompuServe.


Support Hose


There's no better pipeline for answers, workarounds, and bug fixes than
public, online forums such as those on CompuServe. It is almost the perfect
solution to most tech-support problems. The phone is never busy, and you can
post your questions without waiting for a qualified tech-support person to
become available. Everyone participates at the most convenient time for them.
No one checks your serial number to make sure you registered. Your question is
seen by a host of experts, including other users who can help. The vendor's
quality of support is on display for their customers to evaluate. Patches and
upgrades are there to be downloaded. Other than for your connect time, there
is no charge. What could be better?
A word of advice for programmers who use tech-support forums: patience. It
takes the vendor a while to get around to fixing problems. They have to react
to unexpected bugs and at the same time manage their technical resources
within the framework of release schedules, which are usually driven by
marketing considerations. That's what keeps them in business. Therefore, if
you submit a problem and don't have a patch to fix it within a day or two,
don't get upset. If the problem is a show-stopper (most are not) and you need
to return the product and switch to another vendor, do so, but don't publicly
whine at and chastise the tech-support people. Chances are, you'll have
similar but different problems with the product you switch to.


Tutorial Torture


The exercises in my C++ tutorial book are a minor-league C++ torture test, and
if I were a compiler vendor, I'd get a copy of the book and make sure that my
compiler worked with those exercises. That way I wouldn't have to read about
me in my column when my compiler came out. The book is one of three on the
market named Teach Yourself C++. One of the others was written by my pal
Herb Schildt, who made off with my title in the middle of the night. The third
is subtitled in 21 Days. I haven't read it, but the title makes about as much
sense as C++ for Dummies would. (Oh, no, don't tell me that they're actually
going to....)


False Rumor


I picked up a false rumor at Software Development '93 in Boston. The grapevine
said that Borland's new C++ compilers, now in beta, will support Windows
development only--not DOS development. I called Borland for verification. The
rumor is not true. Borland told me that in trying to effectively apportion
their development resources, they released betas with support for Windows
development only. That, they say, is probably where the rumor started. The
released product will support 16-bit development for DOS as well as 16- and
32-bit Windows, Win32, and NT development.
I don't know how much of a market there is for unadulterated DOS C++ code
these days, so I cannot presume that anyone would make a marketing mistake by
dropping support for DOS development. I do know, however, that if you want to
develop code for a plain-vanilla iostream environment, perhaps to port all
over the place, and you have a PC to do it on, you need a DOS compiler. I will
always want to do that because much of the code that I write is about C++
itself, not about Windows. The programmers who use the code do not all have
Windows. You shouldn't need a Windows-hosted compiler and run-time system to
run the C++ code from a book. Furthermore, when I do write a Windows program,
I write and test the application algorithms in DOS and keep the user interface
separate. Borland's was always my favorite DOS C and C++ compiler with clearly
the best source-level debugger. I'm happy to report that nothing has changed.


See PopPop


I wish Bjarne had named it something else. It's hard to pronounce. Speakers at
programmers' conventions make the microphone go "pop-pop" when they say the
cheek-puffing, lip-smacking "plus-plus." If you sit in the front row at one of
those sessions, you can count on being sprayed.


TED and the D-Flat++ Editor


I wrote some of this column with the TED text editor that I built to check out
the D-Flat++ applications model. Every now and then, TED forgets that he is in
word-wrap mode and starts scrolling the text to the left as a line gets longer
and longer. Then after a time, he remembers and properly wraps the stuff. I
think it has to do with whether or not I am typing a space when the program
gets to the right margin. Sometimes I'll delete a character to correct one of
my many typographical errors. TED deletes the character but then displays the
wrong line of text. I'm doing a lot of document saves.

As I find these problems, I fix them. The tab-expansion logic still has
problems, but I know where they are and what to do about them. Unfortunately,
fixing the problem risks a performance penalty that I haven't figured out how
to avoid. Well, if the big boys can release incomplete and buggy software, so
can I.
D-Flat++ has two text-editor classes. The first is the EditBox class, the
single-line editor that you typically see in dialog boxes. The File Open
dialog box uses one for the filename text-entry control. The second class is
the Editor class, which is derived from EditBox. The Editor class is the
multiline text editor. It wraps words and forms paragraphs.
Listing One, page 134, is editbox.h, the header file that describes the
EditBox class, which is derived from the TextBox class. The EditBox class adds
three data members: a text column number, a flag to indicate the
insert/overwrite mode, and a flag to indicate that the text has been changed
by the user. There are new member functions for moving the cursor from
character to character and word to word, and for inserting and deleting
characters in the buffer. The class overrides some of the TextBox's member
functions, because they will have different behavior, and adds some functions
of its own.
Listing Two, page 134, is editbox.cpp, the source code that implements the
EditBox class. It is surprising how little code it takes to implement a text
editor when much of the behavior is already defined in the base class. The
EditBox class intercepts the SetFocus, ResetFocus, Paint, Move, and ClearText
methods to turn the keyboard cursor on and off. The Keyboard method processes
keys unique to the EditBox and those different from the TextBox.
Next month, we'll discuss the Editor class, which is derived from the EditBox
class. Features I haven't built yet include the ability to mark blocks and do
clipboard operations. When those are finished, D-Flat++ will be completed.


Access Specifiers


Reviewing the D-Flat++ source code, I see that it has evolved in ways that
resemble many medium and large design projects. It needs a number of design
improvements, particularly among the class-member access specifications. You
can look at most class declarations and see public members that should be
private or protected. Some protected members should be private with protected
member functions to support access to them.
It isn't that the design does not work. It works. But parts of the design that
should be hidden are exposed. In a class design of any size and consequence,
such lapses in principle are inevitable. When something starts working, you
tend not to revisit it. But that doesn't mean that you can't do something
about it. Later, when time and schedule allow, I intend to overhaul the access
specification of all the classes. (I can still hear my mother's caution about
pavement and good intentions.)


The D-Flat++ Source Code


D-Flat and D-Flat++ are available to download from the DDJ Forum on CompuServe
and on the Internet by anonymous ftp. Page 3 of this issue has the details. If
you cannot get to one of the online sources, send a diskette and a stamped,
addressed mailer to me at Dr. Dobb's Journal, 411 Borel Ave., San Mateo, CA
94402. I'll send you a copy of the source code. It's free, but if you care to
support my Careware charity, include a dollar for the Brevard County Food
Bank. They support some of the needs of our hungry and homeless citizens.


Migrant Mouse


I'm doing a lot of traveling lately and taking a laptop along. With more and
more of my work being done in Windows, a mouse has become a necessary
traveling companion. If you've ever tried using a mouse on an airplane you
know that the space on a typical tray table is limited, and there isn't enough
room to maneuver your mouse. I checked out those clip-on trackball devices and
found them too hard to use. So, to solve the problem, I slide a note binder
into my lap under the tray table and run the mouse around on the binder. But
now, when I sit there with my hand in my lap, out of sight and moving around
in circles, the other passengers and the flight attendants tend to stare. I
stopped trying to explain. It only makes it worse when I tell them that I'm
manipulating my mouse.
Example 1: The compile problem in (a) is dodged by using the member pointer
with the <T> qualification in (b).
(a) template <class T>
 class Foo {
 Foo *nextfoo;
 // ...
 };


(b) Foo<T> *nextfoo;



Example 2: The template function.
template <class T>
void Foo(T *tp)
{
 *tp = 321;
}
template <class T>
void Bar(T& tr)
{
 Foo(&tr);
}

main()
{

 int x = 123;
 Bar(x);
}

Example 3: The compiler declares a fatal compile error if a non-void function
fails to return something in (a) even though this contradicts the ARM rule in
(b).
(a) #include <iostream.h>
 main()
 {

 cout << "Hello, Timna";
 }


(b) extern f();
 main() { }


[LISTING ONE] (Text begins on page 105.)

// -------- editbox.h
#ifndef EDITBOX_H
#define EDITBOX_H

#include "textbox.h"

class EditBox : public TextBox {
 void OpenWindow();
protected:
 int column; // Current column
 Bool changed; // True if text has changed
 Bool insertmode; // True if in insert mode
 virtual void Home();
 virtual void End();
 virtual void NextWord();
 virtual void PrevWord();
 virtual void Forward();
 virtual void Backward();
 virtual void DeleteCharacter();
 virtual void InsertCharacter(int key);
public:
 EditBox(const char *ttl, int lf, int tp, int ht, int wd, DFWindow *par=0)
 : TextBox(ttl, lf, tp, ht, wd, par)
 { OpenWindow(); }
 EditBox(const char *ttl, int ht, int wd, DFWindow *par=0)
 : TextBox(ttl, ht, wd, par)
 { OpenWindow(); }
 EditBox(int lf, int tp, int ht, int wd, DFWindow *par=0)
 : TextBox(lf, tp, ht, wd, par)
 { OpenWindow(); }
 EditBox(int ht, int wd, DFWindow *par=0) : TextBox(ht, wd, par)
 { OpenWindow(); }
 EditBox(const char *ttl) : TextBox(ttl)
 { OpenWindow(); }
 // -------- API messages
 virtual Bool SetFocus();
 virtual void ResetFocus();
 virtual void SetCursor(int x, int y);
 virtual void ResetCursor();
 virtual void SetCursorSize();
 virtual unsigned char CurrentChar()
 { return (*text)[column]; }
 virtual unsigned CurrentCharPosition()
 { return column; }
 virtual Bool AtBufferStart()
 { return (Bool) (column == 0); }
 virtual void Keyboard(int key);
 virtual void Move(int x, int y);
 virtual void Paint();

 virtual void PaintCurrentLine()
 { Paint(); }
 virtual void ClearText();
 virtual void LeftButton(int mx, int my);
 Bool Changed()
 { return changed; }
 void ClearChanged()
 { changed = False; }
 Bool InsertMode()
 { return insertmode; }
 void SetInsertMode(Bool imode)
 { insertmode = imode; ResetCursor(); }
};
inline Bool isWhite(int ch)
{
 return (Bool)
 (ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r');
}
#endif

[LISTING TWO]

// ------------- editbox.cpp
#include <ctype.h>
#include "desktop.h"
#include "editbox.h"

// ----------- common constructor code
void EditBox::OpenWindow()
{
 windowtype = EditboxWindow;
 column = 0;
 changed = False;
 text = new String(1);
 BuildTextPointers();
}
Bool EditBox::SetFocus()
{
 Bool rtn = TextBox::SetFocus();
 if (rtn) {
 ResetCursor();
 desktop.cursor().Show();
 }
 return rtn;
}
void EditBox::ResetFocus()
{
 desktop.cursor().Hide();
 TextBox::ResetFocus();
}
// -------- process keystrokes
void EditBox::Keyboard(int key)
{
 int shift = desktop.keyboard().GetShift();
 if ((shift & ALTKEY) == 0) {
 switch (key) {
 case HOME:
 Home();
 return;

 case END:
 End();
 return;
 case CTRL_FWD:
 NextWord();
 return;
 case CTRL_BS:
 PrevWord();
 return;
 case FWD:
 Forward();
 return;
 case BS:
 Backward();
 return;
 case RUBOUT:
 if (CurrentCharPosition() == 0)
 break;
 Backward();
 // --- fall through
 case DEL:
 DeleteCharacter();
 BuildTextPointers();
 PaintCurrentLine();
 return;
 default:
 if (!isprint(key))
 break;
 // --- printable keys processed by editbox
 InsertCharacter(key);
 BuildTextPointers();
 PaintCurrentLine();
 return;
 }
 }
 TextBox::Keyboard(key);
}
// -------- paint the editbox
void EditBox::Paint()
{
 TextBox::Paint();
 ResetCursor();
}
// -------- move the editbox
void EditBox::Move(int x, int y)
{
 TextBox::Move(x, y);
 ResetCursor();
}
// --------- clear the text from the editbox
void EditBox::ClearText()
{
 TextBox::ClearText();
 OpenWindow();
 ResetCursor();
}
// ----- move cursor to left margin
void EditBox::Home()
{

 column = 0;
 if (wleft) {
 wleft = 0;
 Paint();
 }
 ResetCursor();
}
// ----- move the cursor to end of line
void EditBox::End()
{
 int ch;
 while ((ch = CurrentChar()) != '\0' && ch != '\n')
 column++;
 if (column - wleft >= ClientWidth()) {
 wleft = column - ClientWidth() + 1;
 Paint();
 }
 ResetCursor();
}
// ---- move the cursor to the next word
void EditBox::NextWord()
{
 while (!isWhite(CurrentChar()) && CurrentChar())
 Forward();
 while (isWhite(CurrentChar()))
 Forward();
}
// ---- move the cursor to the previous word
void EditBox::PrevWord()
{
 Backward();
 while (isWhite(CurrentChar()) && !AtBufferStart())
 Backward();
 while (!isWhite(CurrentChar()) && !AtBufferStart())
 Backward();
 if (isWhite(CurrentChar()))
 Forward();
}
// ---- move the cursor one character to the right
void EditBox::Forward()
{
 if (CurrentChar()) {
 column++;
 if (column-wleft == ClientWidth())
 ScrollLeft();
 ResetCursor();
 }
}
// ---- move the cursor one character to the left
void EditBox::Backward()
{
 if (column) {
 if (column == wleft)
 ScrollRight();
 --column;
 ResetCursor();
 }
}
// ---- insert a character into the edit buffer

void EditBox::InsertCharacter(int key)
{
 unsigned col = CurrentCharPosition();
 if (insertmode || CurrentChar() == '\0') {
 // ---- shift the text to make room for new character
 String ls, rs;
 if (col)
 ls = text->left(col);
 int rt = text->Strlen()-col;
 if (rt > 0)
 rs = text->right(rt);
 *text = ls + " " + rs;
 }
 (*text)[col] = (char) key;
 if (key == '\n')
 BuildTextPointers();
 Forward();
 changed = True;
}
// ---- delete a character from the edit buffer
void EditBox::DeleteCharacter()
{
 if (CurrentChar()) {
 String ls, rs;
 unsigned col = CurrentCharPosition();
 if (col)
 ls = text->left(col);
 int rt = text->Strlen()-col-1;
 if (rt > 0)
 rs = text->right(rt);
 *text = ls + rs;
 changed = True;
 }
}
// ---- position the cursor
void EditBox::SetCursor(int x, int y)
{
 desktop.cursor().SetPosition(
 x+ClientLeft()-wleft, y+ClientTop()-wtop);
}
// ---- left mouse button was pressed
void EditBox::LeftButton(int mx, int my)
{
 if (ClientRect().Inside(mx, my)) {
 column = max(0, min(text->Strlen()-1,
 mx-ClientLeft()+wleft));
 ResetCursor();
 }
 else
 TextBox::LeftButton(mx, my);
}
// ---- set the size of the cursor
void EditBox::SetCursorSize()
{
 if (insertmode)
 desktop.cursor().BoxCursor();
 else
 desktop.cursor().NormalCursor();
}

// ---- reset the cursor
void EditBox::ResetCursor()
{
 SetCursorSize();
 if (visible)
 SetCursor(column, 0);
}
End Listings






















































January, 1994
ALGORITHM ALLEY


The Chip-is-Bad Fever




Tom Swan


Some people collect shells. Others collect stamps. I stockpile random-number
generators. Who knows? Maybe someday, they'll be as valuable as baseball
cards.
I still remember the first random-number generator I acquired. It gave me a
case of the "chip-is-bad" fever, a disease that causes hardware engineers to
repair erratic circuits by replacing every chip in sight. The generator was
implemented as an interrupt service routine in the operating system for an RCA
VIP computer with an 1802 microprocessor. At each clock tick, the subroutine
incremented register R9--one of 16 16-bit registers in the mighty 1802. To
obtain a number at random, you simply used the current register value. One
day, forgetting about that subroutine and noticing R9 values changing
willfully, I concluded the chip had gone bananas. Fortunately, just before
pulling the processor, I remembered the interrupt, which I easily disabled by
modifying a copy of the operating-system code.
Of course, it's hard to imagine a worse random-number generator than an
interrupt-driven clock subroutine! Because most VIP programs were games,
however, user-response times determined when programs accessed values in R9,
and the resulting sequences were as random as alphabet soup. The mix of two
nonrandom processes--a fast cycling register and human interaction--created
apparently random sequences: an excellent example of how combination
generators can produce "more random" results than their individual parts.
Last month, I examined the chi-square test for randomness. This time, I list
three random-number generators you can implement in most programming
languages. I'll also show you how to mix two methods to create a combination
generator. If you like scrambled eggs, this column's for you.


Middle-square Method


Most first-year computer-science students learn the middle-square
random-number algorithm. The method was heavily used in computing's early
days, but its output tends to break down into nonrandom, repeating patterns
or, worse, a stream of zeros. Better random-number algorithms are known, and
the middle-square method is now considered obsolete. It provides a useful
platform, however, for investigating related issues that apply to all
generators.
The middle-square method is simply explained. Square an N-digit seed value,
producing a product with 2N digits, then use the middle N digits of that
product as a random number. Save the result for the next seed. Example 1,
Algorithm #15, shows the middle-square method in pseudocode for eight-digit
decimal random numbers. Because of the potential for overflow in the
multiplication of Seed*Seed, you probably can't implement Algorithm #15
directly on most computers. Squaring the seed value 87654321, for example,
gives 7683279989971041, which is way too large to store in a 16- or 32-bit
register. Because only the middle eight digits of that value (27998997) are
needed, it's possible to avoid overflow by breaking the multiplication into
pieces, as shown in Listing One, MIDSQR.PAS (page 136). The program uses a
technique explained in Robert Sedgewick's Algorithms in C++ (Addison-Wesley,
1992), where p*q is expressed as (10^4*p1+p0)(10^4*q1+q0), assuming p1=p/10^4,
q1=q/10^4, p0=p%10^4, and q0=q%10^4 (with % meaning modulo).
That math and the program's logic are easier to follow if you think of 10^4*v
and v/10^4 as shift operations that move decimal digits left and right. If
v=1234, for instance, then 10^4*v=12340000. If v=87654321, then
v/10^4=00008765. If v=12345678, then v%10^4=00005678. With these formulas, it's possible to
multiply eight-digit values using 32-bit integers without concern for
overflow--an important consideration because integer wraparound may create a
weak link in the random-number chain. Despite that fact, even top-rated C
compilers typically ignore multiplication overflows in their stock
random-number generators. For more trustworthy sequences, use the expressions
in Listing Two, LONGMULT.PAS (page 136), to multiply eight-digit values and
extract the high, middle, or low eight digits (whichever the algorithm
specifies) from the 16-digit product.
Some programs require floating-point (aka real) numbers selected at random so
that 0<=r<1.0. The easy way to create random reals is to imagine a decimal
place to the left of an integer value k. In MIDSQR.PAS, for example, divide
RandomInt by m8 (equal to 10^8). If k=52478293, k/10^8 gives 0.52478293. You may
use a similar technique with any integer random function.


Linear-congruential Method


The hands-down, most popular random-number generator goes by the name
"linear-congruential method." You've probably seen it expressed as the formula
a ab+c mod m where a is a starting seed for each successive value, b, a
constant such as 31415621, c, 0 or 1, and m, the computer's word size. With
careful programming, it's also possible for m to be a power of ten, as in
Example 2, Algorithm #16.
Here again, implementations of Algorithm #16 may fail to produce reliable
random sequences unless you avoid integer wraparound in the multiplication of
Seed*b. In this case, we need only the eight least-significant digits of the
16-digit product, obtained by breaking the multiplication into pieces, as
shown in Listing Three, LINEAR.PAS (page 136), using the same formulas
explained earlier.
The pseudocode in Example 2 also solves a subtle problem ignored by stock
generators that simply return Seed%m. That may be a mistake because, as
LINEAR.PAS demonstrates, the least-significant digits of successive seed
values can be nonrandom. Run the program and examine the sequence produced by
function RandomInt: 88971108, 8878069, 50915850, 46492851, 86225472, 48898113,
85623174.... The rightmost digit in each value cycles from 0 to 9--an obviously
nonrandom, but expected, characteristic of the linear-congruential method.
Instead of truncating the product as is commonly done, a properly written
generator should extract the higher-order digits as shown in function
RandomRange.
Another important consideration is the choice of values for constant b. In
Seminumerical Algorithms, Donald Knuth lists several criteria for selecting a
proper constant, and most generators are written to conform to his proposals.
Just for fun, I examined the source code for several random-number generators
based on Algorithm #16. Without naming sources, some of the constants used
included 22695477, 7654321, 31415821, 3141592621, and 134775813. Not wanting
to be outdone, I derived my own constant, 31415621, for LINEAR.PAS. You might
want to replace that value with others and compare the results using last
month's chi-square analysis.


Fibonacci Method


A relatively obscure method for generating random sequences is based on the
Fibonacci series, 1, 2, 3, 5, 8, 13, ..., in which each successive value, starting
with the third, equals the sum of the preceding two. The technique uses a
small list of unsigned words initialized in reverse to the first 17 Fibonacci
values so that List[1]=2584, List[2]=1597, ..., List[16]=2, List[17]=1. You also
need two index variables, I=17 and J=5. After initialization, use the method
shown in Example 3, Algorithm #17, to produce the next number in a random
sequence. First, assign List[I]+List[J] to a temporary variable, K, and save
that value in List[I]. Then, decrease I and J by one, and if either index
equals 0, reset it to 17.
Unlike the classic Fibonacci sequence, which sums successive numbers,
Algorithm #17 adds the nonadjacent values List[I] and List[J]. The method
produces sequences with extremely long periods--according to one source, of
16,777,088 on 8-bit systems; even longer using 16- or 32-bit registers--and is
well suited for implementation in assembly language. (Values are stored
backwards in List so you can use a fast assembly-language instruction to test
when an index is decremented to 0.) Because it uses no multiplications, the
algorithm cannot suffer from overflow. Listing Four, FIBO.PAS (page 136),
implements Algorithm #17 in Pascal.


Combining the Classics


There is no such thing as the perfect random-number generator--any algorithm
can produce a sequence that fails one or more tests for randomness. For better
results, you might try combining two or more techniques, which would produce a
sequence that often seems "more random" than values that are generated by each
method alone. The VIP's interrupt subroutine coupled with human interaction is
a good example of a combination generator. Another, more-general technique
works with any two integer random functions. Listing Five, COMBINE.PAS (page
137), demonstrates the basic idea. The program defines an array called Values
that holds 100 32-bit integers (the size of a LongInt type). Each value in the
array is initialized using the linear-congruential method. In order to obtain
a random number, the program uses the Fibonacci method to produce an index
from 1 to 100, used as an index V into the Values array. That value is
returned as the next random number, and Values[V] is assigned a new number
using the linear-congruential technique. The downside of combination
generators is that they run more slowly and consume more memory than do other
methods. For Monte Carlo testing and other critical applications, however,
many experts recommend a mix of generators rather than relying on just one.


Your Turn


I've gotten over my case of the chip-is-bad fever, but I haven't lost my
fondness for random-number generators. If you know a good one, or if you have
another algorithm to share, I'd like to hear from you. Write to me in care of
DDJ, or send CompuServe mail to my ID, 73627,3241.
Example 1: Pseudocode for Algorithm #15 (middle-square method, obsolete).
const
 m4=10000;
 m8=100000000;
var

 Seed: Integer;
function NextRandom: Integer;
begin
 Seed := ((Seed*Seed) div m4) mod m8;
 NextRandom := Seed;
end;
Example 2: Pseudocode for Algorithm #16 (linear-congruential method).
const
 m4=10000;
 m8=100000000;
 b=31415621;
 r=65536;
var
 Seed: Integer;
function NextRandom: Integer;
begin
 Seed := ((Seed*b)+1) mod m8;
 NextRandom := ((Seed div m4)*r) div m4;
end;
Example 3: Pseudocode for Algorithm #17 (Fibonacci method).
var
 List: array[1..17] of WORD;
 I, J: Integer;
function NextRandom: Integer;
var
 K: Integer;
begin
 K := List[I]+List[J];
 List[I] := K;
 I := I-1;
 J := J-1;
 if I=0 then I := 17;
 if J=0 then J := 17;
 NextRandom := K;
end;
[LISTING ONE] (Text begins on page 111.)

(* ------------------------------------------------------------------------ *(
** midsqr.pas -- Middle-Square Random Number Generator. Note: This **
** method is considered obsolete and is listed for reference only. **
** Copyright (c) 1993 by Tom Swan. All rights reserved. **
)* ------------------------------------------------------------------------ *)
program MidSqr;
const
 m4 = 10000; { 10 ^^ 4 }
 m8 = 100000000; { 10 ^^ 8 }
var
 Seed: LongInt;
procedure RandomInit(StartingSeed: LongInt);
begin
 Seed := StartingSeed
end;
{ Square the eight-digit seed; return middle eight digits of 16-digit result.
}
function RandomInt: LongInt;
var
 p0, p1: LongInt;
begin
 p0 := seed mod m4;
 p1 := seed div m4;

 Seed :=
 (((p1 * p0 + p1 * p0) + (p0 * p0 div m4)) +
 ((p1 * p1 mod m4) * m4) ) mod m8;
 RandomInt := Seed
end;
var
 N: Integer;
begin
 Writeln;
 Writeln('Middle-Square Random Number Generator (obsolete)');
 Writeln;
 RandomInit(45086273);
 for N := 1 to 100 do
 Write(RandomInt:10);
 Writeln
end.

[LISTING TWO]

(* ------------------------------------------------------------------------- *(
** longmult.pas -- Demonstrate 32-bit multiplication **
** Copyright (c) 1993 by Tom Swan. All rights reserved. **
)* ------------------------------------------------------------------------- *)
program LongMultTest;
const
 m4 = 10000; { 10 ^^ 4 }
 m8 = 100000000; { 10 ^^ 8 }
var
 p, q, p1, p0, q1, q0: LongInt;
 low, mid, high: LongInt;
begin
 p := 12345678; { First value }
 q := 87654321; { Second value }
 p1 := p div m4;
 p0 := p mod m4;
 q1 := q div m4;
 q0 := q mod m4;
 low :=
 (((p0 * q1 + p1 * q0) mod m4) * m4 + p0 * q0) mod m8;
 mid :=
 (((p1 * q0 + p0 * q1) +
 ((p0 * q0) div m4)) +
 ((((p1 * q1)) mod m4) * m4)) mod m8;
 high :=
 ((((((p1 * q0 + p0 * q1) +
 ((p0 * q0) div m4)) +
 ((((p1 * q1)) mod m4) * m4)) div m4) +
 ((((p1 * q1)) div m4) * m4))) mod m8;
 Writeln('low  = ', low);
 Writeln('mid  = ', mid);
 Writeln('high = ', high);
 Writeln;
 Writeln(p, ' * ', q, ' = ', high, low)
end.

[LISTING THREE]

(* ------------------------------------------------------------------------- *(
** linear.pas -- Linear Congruential Random Number Generator **
** Copyright (c) 1993 by Tom Swan. All rights reserved. **
)* ------------------------------------------------------------------------- *)
program Linear;
const
 m4 = 10000; { 10 ^^ 4 }
 m8 = 100000000; { 10 ^^ 8 }
 b = 31415621; { constant multiplier }
var
 Seed: LongInt;
 I: Integer;
function LongMult(p, q: LongInt): LongInt;
var
 p1, p0, q1, q0: LongInt;
begin
 p1 := p div m4;
 p0 := p mod m4;
 q1 := q div m4;
 q0 := q mod m4;
 LongMult :=
 (((p0 * q1 + p1 * q0) mod m4) * m4 + p0 * q0) mod m8
end;
procedure RandomInit(StartingSeed: LongInt);
begin
 Seed := StartingSeed
end;
function RandomInt: LongInt;
begin
 Seed := (LongMult(Seed, b) + 1) mod m8;
 RandomInt := Seed
end;
function RandomRange(R: LongInt): LongInt;
begin
 RandomRange := ((RandomInt div m4) * R) div m4
end;
begin
 Writeln;
 Writeln('Linear Congruential Random Number Generator');
 Writeln;
 RandomInit(1234567);
 for I := 1 to 32 do
 Write(RandomInt:10);
 Writeln;
 for I := 1 to 32 do
 Write(RandomRange(32768):10);
end.

[LISTING FOUR]

(* ------------------------------------------------------------------------- *(
** fibo.pas -- Fibonacci Random Number Generator **
** Copyright (c) 1993 by Tom Swan. All rights reserved. **
)* ------------------------------------------------------------------------- *)
program Fibo;
var
 List: array[1 .. 17] of WORD;
 I, J: Integer;
procedure RandomInit;
var
 N: Integer;

begin
 List[17] := 1;
 List[16] := 2;
 for N := 15 downto 1 do
 List[N] := List[N+1] + List[N+2];
 I := 17;
 J := 5
end;
function RandomInt: WORD;
var
 K: WORD;
begin
 K := List[I] + List[J];
 List[I] := K;
 I := I - 1;
 J := J - 1;
 if I = 0 then I := 17;
 if J = 0 then J := 17;
 RandomInt := K
end;
var
 N: Integer;
begin
 Writeln;
 Writeln('Fibonacci Random Number Generator');
 Writeln;
 RandomInit;
 for N := 1 to 100 do
 Write(RandomInt:10);
 Writeln
end.

[LISTING FIVE]

(* ------------------------------------------------------------------------- *(
** combine.pas -- Combination Random Number Generator. Combines Algorithm **
** #16 (Linear Congruential) with Algorithm #17 (Fibonacci) to create a **
** table-driven combination generator. **
** Copyright (c) 1993 by Tom Swan. All rights reserved. **
)* ------------------------------------------------------------------------- *)
program Combine;
const
 m4 = 10000; { 10 ^^ 4 }
 m8 = 100000000; { 10 ^^ 8 }
 b = 31415621; { constant multiplier }
 maxIndex = 100; { array of Values }
var
 List: array[1 .. 17] of WORD;
 Values: array[1 .. maxIndex] of LongInt;
 I, J: Integer;
 Seed: LongInt;
function LongMult(p, q: LongInt): LongInt;
var
 p1, p0, q1, q0: LongInt;
begin
 p1 := p div m4;
 p0 := p mod m4;
 q1 := q div m4;
 q0 := q mod m4;

 LongMult :=
 (((p0 * q1 + p1 * q0) mod m4) * m4 + p0 * q0) mod m8
end;
function LinearRandomInt: LongInt;
begin
 Seed := (LongMult(Seed, b) + 1) mod m8;
 LinearRandomInt := Seed
end;
procedure RandomInit(StartingSeed: LongInt);
var
 N: Integer;
begin
 Seed := StartingSeed;
 List[17] := 1;
 List[16] := 2;
 for N := 15 downto 1 do
 List[N] := List[N+1] + List[N+2];
 I := 17;
 J := 5;
 for N := 1 to maxIndex do
 Values[N] := LinearRandomInt
end;
function FiboRandomInt: WORD;
var
 K: WORD;
begin
 K := List[I] + List[J];
 List[I] := K;
 I := I - 1;
 J := J - 1;
 if I = 0 then I := 17;
 if J = 0 then J := 17;
 FiboRandomInt := K
end;
function RandomRange(R: LongInt): LongInt;
var
 V: Integer;
begin
 V := 1 + (FiboRandomInt mod 100);
 RandomRange := ((Values[V] div m4) * R) div m4;
 Values[V] := LinearRandomInt
end;
var
 N: Integer;
begin
 Writeln;
 Writeln('Combination Random Number Generator');
 Writeln;
 RandomInit(7654321);
 for N := 1 to 1000 do
 Write(RandomRange(65536):10)
end.

End Listings








January, 1994
UNDOCUMENTED CORNER


The Windows 3.1 Virtual Machine Control Block, Part 1




by Kelly Zytaruk


Kelly graduated with a Bachelor of Science in Electrical Engineering and
Computers from the University of Waterloo in Ontario, Canada. He has spent the
last ten years programming in C and assembler on Intel-based machines. Most
recently he has worked on peripheral hardware design and virtual device
drivers.




Introduction




by Andrew Schulman


Much of the preemptive multitasking needed for Microsoft's forthcoming Chicago
operating system (Windows 4) already exists inside Windows 3.1 Enhanced Mode.
The Windows 3.1 Virtual Machine Manager (VMM) that lives inside WIN386.EXE is
missing features such as threads and mutexes, but it does have a fully
preemptive time-slice scheduler, semaphores, lists of suspended processes,
priorities, and many other features that you'd expect from a preemptive
multitasking operating system. In fact, DOS386.EXE (which contains the VMM and
many of the VxDs that make up the Chicago operating system) is an outgrowth of
WIN386.EXE.
One reason we tend not to think of Windows 3.1 as a full-blown multitasking
operating system is that, ironically, Windows doesn't extend these
capabilities to Windows applications. As Kelly Zytaruk notes in this month's
"Undocumented Corner," the Windows kernel is a simple, non-preemptive
multitasking operating system that runs as a single task within the larger
Windows Enhanced Mode preemptive multitasking operating system.
But Windows Enhanced Mode doesn't preemptively multitask Windows applications;
what, then, are the "tasks" that VMM manipulates? They're Virtual Machines
(VMs). All Windows applications run in a single VM, called the "System VM."
Each DOS Box is a separate VM. Thus, right now, all these preemptive
multitasking facilities largely benefit DOS programs. Things will be different
in Chicago, where the Win32 API provides threads, thread-synchronization
facilities, and preemptive multitasking.
Since each task runs in its own separate VM, Windows multitasking is largely
invisible to applications. Windows doesn't provide DOS and Windows
applications with an API for communicating with other tasks, apart from a
handful of calls such as INT 2Fh AX=1683h (Get Current VM ID) and AX=1685h
(Switch VMs and CallBack) documented in the Windows Device Driver Kit (DDK).
However, Windows 3.1 does extend a multitasking API to Virtual Device Drivers
(VxDs). The DDK documents a set of scheduler services provided by VMM, such as
Adjust_Exec_Priority, Begin_Critical_Section, Wait_Semaphore,
Call_When_Task_Switched, Suspend_VM, Release_Time_Slice, Call_When_Idle, and
so on.
VMM identifies each virtual machine with a VM handle, which is the 32-bit
linear address of a Virtual Machine Control Block (VMCB).
At first glance, the structure of the VMCB appears to be documented in the
VMM.INC file included with the DDK. However, Microsoft documents only the
first four fields of this actually quite large structure.
This month's "Undocumented Corner" is the first of two articles in which Kelly
Zytaruk lays bare the VMCB structure for Windows 3.1. This month, Kelly shows
the overall structure of the VMCB and begins a detailed explanation of each
VMCB field owned by VMM. Next month, he'll explain the remaining VMM fields
and present a Windows VM Explorer application.
Some of the VMCB structure is apparent from even a cursory disassembly of VMM.
For example, Get_Next_VM_Handle expects a VM handle in the EBX register, and
returns the handle of the next VM in the EBX register. The implementation for
Get_Next_VM_Handle starts with MOV EBX, DWORD PTR [EBX+68h]; not surprisingly,
Kelly documents offset 68h in the VMCB as the Next pointer in VMM's linked
list of VMs.
Knowing the VMCB structure raises the question of where you find a VM handle.
In a VxD, it's simple: EBX in a VxD usually points to the current VM. VMM
provides functions such as Get_Sys_VM_Handle and Get_Next_VM_Handle.
But how can something other than a VxD get a VM handle? In "Identify the
Running DOS Application from Windows" (Windows/DOS Developer's Journal,
December 1992), Paul Bonneau showed that Windows 3.1 stores a DOS Box's VM
handle at offset 0FCh in the WINOLDAP data segment. This will likely change in
future versions of Windows, but Figure 1 presents a VM_FROM_HWND() macro that
a Windows application could use to get the VM handle for a DOS program.
A documented way for applications to get VM handles is to use my generic VxD,
which gives normal DOS and Windows programs access to VMM and VxD functions,
including Get_Sys_VM_handle and Get_Next_VM_Handle (see my article, "Call VxD
Functions and VMM Services Using Our Generic VxD," Microsoft Systems Journal,
February 1993). In Part 2 of this article, Kelly will use an improved version
of the generic VxD for his VM explorer.
Once you have a VM handle, how do you get a VMCB? The VM handle is the 32-bit
linear address of a VMCB. To do anything with such an address, a program needs
to turn it into a protected selector:offset pointer. As noted in last month's
"Undocumented Corner," the documented Windows API functions AllocSelector,
SetSelectorBase, and SetSelectorLimit (or their DPMI INT 31h equivalents) can
be lashed together to create a handy map_linear function that lets you turn a
VM handle into a far pointer to a VMCB, then access fields in the structure;
see Figure 1.
Normally, you would think of each VM as having its own address space, but
various fields in the VMCB allow you to access data in other VMs. The
CB_High_Linear field documented in the DDK provides a way to look at real-mode
addresses in other VMs. The LDT field at offset 114h in the VMCB is also quite
useful, as it provides a way to access protected-mode addresses in other VMs.
In the future, I'll present the source code for a PROTDUMP utility (see Figure
2) that uses VMCB+114h to do this. While there isn't room to present the
source code for PROTDUMP here, PROTDUMP.EXE (and VXD.386, the generic VxD
required by certain PROTDUMP command-line options) is available
electronically; see "Availability," page 3.
The VMCB structure Kelly presents here is valid only for the retail version of
Windows 3.1. The LDT is at VMCB+0x11C in the debug version of Windows 3.1, and
at VMCB+0x5C in the Chicago prerelease. Chicago still has a VMM, VxDs, a VMCB,
and so on--in fact, in Chicago these Windows components become even more
important, because they may largely replace real-mode MS-DOS--but of course
all the VMCB offsets have changed. Much of the VMCB contents appear to have
been moved to a Thread Control Block. For example, offset 0 in the VMCB now
appears to hold the initial thread handle, as returned by
Get_Initial_Thread_Handle. (Given a thread, you can get back to the VMCB with
Get_VM_Handle_For_Thread.) As another example, Schedule_VM_Event now calls
Get_Initial_Thread_Handle and then does a Schedule_Thread_Event.
However, the specifics of what is in which VMCB field aren't nearly as
important as simply seeing what's kept in the VMCB to start with; that is,
seeing what VMM maintains on a per-VM basis. The real reason to look at the
VMCB isn't to start using a highly version-specific undocumented structure,
but to help clarify how Windows Enhanced Mode works. Knowing about the VMCB
can also improve your understanding of the DOS Protected Mode Interface
(DPMI); for example, see Kelly's explanation of the CB_PM_App_CB at offset
64h.
Also consider the selector list at offset 48h in the VMCB. The Enhanced Mode
DOS extender in the DOSMGR VxD has to implement certain DOS functions such as
INT 21h AH=52h by returning a protected-mode selector. DOSMGR can't know when
you're done using one of these selectors, so they are permanent. However, if
INT 21h AH=52h is called more than once in a VM, it's important not to
allocate more selectors (only 8192 are available), so such permanent selectors
are allocated with a function that first consults the selector list at
VMCB+0x48. Whenever a program asks this function to map a linear address to a
protected-mode selector, the function first walks the selector list to see if
that linear address already has a corresponding selector. If it does, the
function can just return the same selector without allocating a new one.
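This reuse logic can be sketched as a linked-list lookup keyed on the linear
base address. The following is a toy model, not VMM's actual code; the node
layout, the starting selector value, and the function name are all invented
for illustration:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model of the CB_Selector_List lookup (not VMM's actual code).
   Each node pairs a linear base address with the selector previously
   mapped to it; the starting selector value below is an assumption. */
struct sel_node {
    uint32_t base;                  /* linear base address mapped */
    uint16_t selector;              /* selector returned for that base */
    struct sel_node *next;
};

static struct sel_node sel_storage[64];
static struct sel_node *sel_list = NULL;
static int sel_used = 0;
static uint16_t next_sel = 0x1007;  /* assumed first free LDT selector */

/* Return the cached selector for 'base' if one exists; otherwise
   "allocate" a fresh one and prepend it to the list. */
uint16_t map_lin_to_sel(uint32_t base)
{
    struct sel_node *n;
    for (n = sel_list; n != NULL; n = n->next)
        if (n->base == base)
            return n->selector;     /* reuse: never allocate twice */
    n = &sel_storage[sel_used++];
    n->base = base;
    n->selector = next_sel;
    next_sel += 8;                  /* LDT descriptors are 8 bytes apart */
    n->next = sel_list;
    sel_list = n;
    return n->selector;
}
```

Asking for the same base twice yields the same selector, which is exactly why
such selectors must never be freed behind the list's back.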
In DPMI, this function is INT 31h AX=2 (Segment to Descriptor). The Windows
kernel uses this function to create permanent selectors such as __0040 and
__B800 (see page 37 of Matt Pietrek's Windows Internals, Addison-Wesley,
1993). The DPMI server in VMM implements this function by calling
Map_Lin_To_VM_Addr (documented in the DDK). The DOS extender in DOSMGR
implements calls such as INT 21h AH=52h using the V86MMGR Xlat_Return_Ptr
service, which in turn also calls Map_Lin_To_VM_Addr.
Figure 2 shows a sample selector list for the System VM. Figure 2(a) first
uses protdump -vm to get a list of all VMs. Here, the VMCB for VM #1 (the
System VM) is at 804C1000h. The offset 48h in Figure 2(b) is a pointer (32-bit
linear address) to a VMM linked list; hence the protdump -ptr -list options.
Each selector-list entry is an array of two dwords, so you can display eight
bytes using the protdump -dword option. VM #1 selector 101Fh has a base
address of F0000h. This is the Windows __F000 selector, documented in the
Windows 3.1 SDK. We can examine this selector from a DOS box (a separate
address space) using PROTDUMP. In Figure 2(c), the -prot option indicates that
101F:FFF0 is a protected-mode pointer, not a real-mode address; #1 indicates
VM 1.
Of course F000:FFF0 is just the top of the ROM BIOS, and the beginning of
extended memory. We could have looked at this from any VM, using a real-mode
address. But the fact that PROTDUMP examined it using a protected-mode
selector in another VM means that PROTDUMP (a DOS program) could just as
easily look at any Windows data structure. For example, Program Manager here
happened to have a task handle of 0617h, so PROTDUMP can examine its Task
Database (TDB); see Figure 2(d). PROTDUMP uses the LDT selector at VMCB+114h
to examine protected-mode selectors in other VMs (see CB_LDT in Figure 4).
It is also easy for PROTDUMP to look at real-mode addresses in other VMs,
using the documented CB_High_Linear field in the VMCB. The protdump -all
switch examines the same address in all VMs. In Figure 2(e), PROTDUMP is
showing the current PSP in each VM; "DOS" is a handy indicator for the DOS
data segment, and the current PSP is at offset 330h in DOS 4 and higher.
The documentation for both DPMI INT 31h AX=2 and VMM Map_Lin_To_VM_Addr says
that selectors allocated with these functions "should never be modified or
freed." From Kelly's description of VMCB+48h, you can see why: The next time
someone asked to map a linear address corresponding to the freed or modified
selector, VMM would return the old selector, unless the selector list were
also modified.
In addition to Part 2 of Kelly's article next month, future "Undocumented
Corner" columns will cover topics such as the linear-executable file format
used by VxDs, the W3 format used by WIN386.EXE and DOS386.EXE, the Windows
instance-data manager, undocumented MFC, NetWare Lite, and 386 memory-manager
IOCTL interfaces. Please send your comments and suggestions to me on
CompuServe at 76320,302.
Windows Enhanced Mode is a preemptive multitasking operating system that runs
one or more separate tasks. Each task believes it has sole access to the CPU
and peripherals (keyboard, display, mouse, printer, and so on).
When we talk of multitasking under Windows, we instinctively think of the
Windows kernel and the running of multiple Windows programs. But the Windows
kernel should be seen as a simple operating system running as a single task of
a larger and more complex operating system. The Windows kernel provides a
means by which one or more well-behaved Windows programs can run in the same
address space, sharing I/O and system resources. Task switching is
non-preemptive: It doesn't take place amongst Windows programs until a program
decides to call GetMessage() or a similar function (see "Inside the Windows
Scheduler," by Matt Pietrek, DDJ, August 1992). Any program can effectively
starve the others of CPU time.
The real excitement in Enhanced Mode is in the larger, more-complex operating
system that is run by the Virtual Machine Manager (VMM). Tasks under the VMM
are called Virtual Machines (VMs) because each task appears to have sole
control of the machine (or CPU). It "virtually" owns the machine. Task
switching is done preemptively: The VMM decides when it's time for a task
switch.
The Windows Enhanced Mode operating system actually consists of both the VMM
and Virtual Device Drivers (VxDs). VxDs use services provided by VMM as well
as by other VxDs. These services are documented by Microsoft in the Virtual
Device Adaptation Guide included with the Windows Device Driver Kit (DDK).
These services are used by VxDs to limit access to, control, modify, or
simulate system resources that are used by Windows or DOS programs. The VxD
can make its actions totally transparent to the application, or it can provide
services that an application calls explicitly (that is, nontransparently).
All accesses into the operating system from a VM must first pass through the
VMM. The VMM acts as a kind of distribution system for APIs and faults. The
VMM then passes the request or fault on to the appropriate VxD.
Each VM has its own private address space, interrupt-vector tables, and I/O
ports. With few exceptions, each VM can appear to own all aspects of the
computer while concurrently running with other VMs. The first VM is the System
VM, which runs the Windows kernel, graphical interface, and all Windows
programs. As each non-Windows program is run, a VM is created, and a DOS box
is started within that VM to run the program.
It is the responsibility of the VMM to keep VMs separate and to provide
preemptive task switching and scheduling services. Each VM has an associated
Control Block (VMCB). VxDs can allocate and use portions of the VMCB to
maintain VM-unique data areas. During system initialization, VxDs allocate
areas within the VMCB by calling a documented Allocate_Device_CB_Area provided
by VMM.
Figure 3 shows a sample VMCB. The VMCB size varies from machine to machine as
the configuration changes. Here, the VMCB is 1B40h bytes in size; VMM owns the
first 210h bytes, VPICD owns 0BCh bytes, and so on. This article examines the
contents of the VMM portion at offset 0 in the VMCB, ignoring portions of the
VMCB owned by other VxDs, such as DOSMGR.
Figure 4 shows the VMCB format. VMM keeps all VMs on a linked list, and the
link from one VM to the next is at offset 68h in the VMCB. This link is the
32-bit linear address of the next VMCB. As another example, the selector to a
VM's Local Descriptor Table (LDT) is kept at offset 114h in the VMCB.
This information is accurate only for Windows 3.1 Enhanced Mode retail
version. The Windows DDK provides a debug version of WIN386.EXE (similar to
the SDK's debug versions of KRNL386.EXE and other DLLs). This includes a debug
version of VMM which adds several additional fields early in the VMCB, thus
shifting later fields to different offsets. For example, the Next pointer is
at offset 70h in
Windows 3.1 debug, and the LDT selector is at offset 11Ch. Some of the field
names come from the .VC command in the debug VMM. This command is accessible
from Soft-ICE/Windows and WDEB386.



Documented Fields


The Windows DDK documents only the first four fields of the VMCB; these fields
are part of the block owned by VMM. The following provides some details
missing from the DDK:
0x00. CB_VM_Status. Current execution status of this VM. The execution status
is a bitmap with values documented in the DDK, such as VMStat_Exclusive,
VMStat_Background, VMStat_PM_Exec (VM is currently running a protected-mode
program), VMStat_PM_Use32 (protected-mode program is 32-bit), VMStat_Idle, and
so on. For example, when a program running under Enhanced Mode calls INT 2Fh
AX=1680h (documented in the MS-DOS Programmer's Reference as the "MS-DOS Idle
Call"), VMM calls the Release_Time_Slice service, which turns on the
VMStat_Idle bit in the current VM's CB_VM_Status field. One status bit
(10000h), which indicates that Close_VM has been called for this VM, appears
to be undocumented.
The status bits are, for the most part, informative only. They are set after
the internal state of the VM has been altered. VMM uses them to determine the
current state of the VM and to decide how to change to a different state.
Altering these bits (for example, turning on the VMStat_Background bit) is
unlikely to produce the desired effect.
0x04. CB_High_Linear. Address of VM's real-mode memory in entire VMM linear
address space. When a VM becomes active, its Real Mode address space is mapped
down to linear address 0, which is reserved for the active VM. When the VM is
not active, its memory is still accessible via the CB_High_Linear address. All
access to the VM's real-mode memory should be through the CB_High_Linear
address. In Figure 5, for example, to read the WORD at 0040:0008 in VM #1, you
must convert 0040:0008 to a linear address, then add in the VM's
CB_High_Linear address.
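The arithmetic can be sketched in a few lines. This is a minimal model of the
conversion only; the high-linear value would come from offset 4 of the target
VM's VMCB, and any concrete value is an invented example:

```c
#include <stdint.h>

/* Sketch of addressing another VM's real-mode memory via CB_High_Linear.
   'high_linear' stands in for the dword at offset 4 of that VM's VMCB. */
uint32_t vm_real_to_linear(uint32_t high_linear, uint16_t seg, uint16_t off)
{
    /* real-mode linear address = segment * 16 + offset, then rebased
       into the VMM linear address space */
    return high_linear + (((uint32_t)seg << 4) + off);
}
```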
0x08. CB_Client_Pointer. Linear pointer to Client Register Structure. When a
Virtual Machine makes a call into the operating system, all registers are
saved to the Client Register Structure (CRS). The registers are restored from
the CRS when the operating system returns to the VM. VxDs can examine and
alter the VM registers by accessing the CRS. VMM points the EBP register at
the CRS, which is defined in VMM.INC in the DDK. For example, if a VxD refers
to dword ptr [ebp+1Ch], it generally means Client_EAX.
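The leading dwords of the CRS follow PUSHAD order (EDI lands lowest, EAX
highest), which is why [ebp+1Ch] reaches Client_EAX. A sketch of just those
first eight fields, assuming the standard PUSHAD frame (the struct and field
names here are mock-ups, not VMM.INC's):

```c
#include <stdint.h>
#include <stddef.h>

/* Partial sketch of the Client Register Structure's first eight dwords,
   laid out in PUSHAD order. Later CRS fields are omitted. */
struct client_regs {
    uint32_t edi;    /* +0x00 */
    uint32_t esi;    /* +0x04 */
    uint32_t ebp;    /* +0x08 */
    uint32_t esp;    /* +0x0C (placeholder dword pushed by PUSHAD) */
    uint32_t ebx;    /* +0x10 */
    uint32_t edx;    /* +0x14 */
    uint32_t ecx;    /* +0x18 */
    uint32_t eax;    /* +0x1C -- what [ebp+1Ch] refers to */
};
```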
0x0C. CB_VMID. Unique ID number to identify VM. Each VM has a unique ID
number, starting with 1 for the System VM. Any application can call INT 2Fh
AX=1683h (Get Current Virtual Machine ID) to get the ID for its VM. This is
different from VM handle; VxDs refer directly to the VMCB (VM handle) when
referencing VMs.


Undocumented Fields


The fields of the VMCB described in this section are undocumented.
0x10. CB_PM_Int_Table. Linear address of protected-mode Interrupt Table. This
field is valid only if a protected-mode program is present in the VM (that is,
[ebx+CB_VM_Status] & VMStat_PM_App); it is 0 if a real-mode program is
executing.
While in protected mode, interrupts are processed through the Interrupt
Descriptor Table (IDT), which in many cases points to VMM entry points. If the
VMM decides to pass the interrupt on to the application, it reflects the
interrupt through the address given in PM_Int_Table. This VMM table of 256
8-byte entries is not the same as the IDT (which only holds 60h entries, up to
INT 5Fh). VMM uses this table to exert complete control over interrupt
ownership. Get_PM_Int_Vector returns a value from this table.
Set_PM_Int_Vector inserts a value into this table and, if the VMM permits,
alters the IDT entry as well.
0x14. CB_VM_ExecTime. VM execution time--the number of milliseconds this VM
has actually been active. This is not a measure of the lifetime of the VM but
rather an accounting of how much CPU time the VM has had. This value is
returned by the VMM Get_VM_Exec_Time and Get_Last_Updated_VM_Exec_Time.
0x18. CB_V86_PageTable. Linear address of the page table used by this VM. The
first 1 Mbyte+64K (possibly up to 4 Mbytes) of each DOS box is mapped to
physical memory via this page table. Each VM has its own unique page table.
Thus programs in different VMs can use the same virtual addresses but, via the
page tables, have different linear and physical addresses. When this VM
becomes active, this page table is mapped to linear address 0.
0x1C. CB_Local_Port_Trapping_BitMap. Port Trapping Enable/Disable Array. This
32-byte array is treated as a continuous string of 256 bits. There's one bit
per port: If the bit is set, Local Port trapping has been enabled for this
port; otherwise, it's disabled. If Global Port trapping has been enabled, the
bit will be set in the system Global Port trapping array and in each VM. The
bits in this array are accessed directly by calls to Enable_Local_Trapping,
Disable_Local_Trapping, Enable_Global_Trapping, and Disable_Global_Trapping.
But what about port numbers higher than 255? As it turns out, each bit does
not map one-to-one to a port. VMM passes the port number through a hashing
function that converts the port number to a bit offset. This means you can't
enable local or global I/O trapping on more than 256 different ports.
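The bitmap manipulation can be modeled as follows. Note that VMM's actual
hashing function for ports above 0FFh is undocumented; reducing the port to
its low 8 bits here is purely an assumption made so the sketch is complete:

```c
#include <stdint.h>

/* Toy model of the 32-byte (256-bit) local port-trapping bitmap at
   VMCB+1Ch. The port-to-bit hash is an assumption, not VMM's real one. */
static uint8_t trap_bitmap[32];

static unsigned port_to_bit(uint16_t port)
{
    return port & 0xFF;             /* assumed hash: low byte only */
}

void enable_local_trapping(uint16_t port)
{
    unsigned bit = port_to_bit(port);
    trap_bitmap[bit >> 3] |= (uint8_t)(1u << (bit & 7));
}

int trapping_enabled(uint16_t port)
{
    unsigned bit = port_to_bit(port);
    return (trap_bitmap[bit >> 3] >> (bit & 7)) & 1;
}
```

Under any 256-bit scheme, two ports that hash to the same bit (here, 60h and
160h) necessarily share a trap setting.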
0x3C. CB_Begin_Nested_Exec_List. List handle to Nested_Exec_List. When
Begin_Nested_Exec is called, the current CS:IP and status is saved and the
CS:IP in the CRS is changed to an address that will cause an entry exception
into the operating system. The CS:IP is saved in a list node. This is the
handle to the list; End_Nested_Exec uses this list to restore the CS:IP and
execution state.
0x40. CB_OS_Stack. Linear address of operating-system stack. VMM switches
stacks to an internal stack for VMM calls. Each VM supplies a different
private stack. This is a 32-bit protected-mode stack. While executing within
the VMM, the previous ESP is saved in this field and restored from here upon
exiting.
0x44. CB_Scheduler_Flags. Bitmap of scheduler flags; see Table 1.
0x48. CB_Selector_List. List handle of mapped selectors. The
Map_Lin_To_VM_Addr service maps a linear address to an address in the VM
address space. If the VM is in V86 mode, then the returned address is a
segment:offset pair. If the VM is in protected mode, the service must allocate
an LDT selector. The LDT selector is then linked onto the CB_Selector_List. If
a call is made to Map_Lin_To_VM_Addr with another linear address that can be
satisfied by a previously allocated selector, it will reuse the previous
selector instead of allocating a new one. For this reason, it's important not
to delete selectors allocated by this service. As Figure 2 shows, each
selector entry includes the base address and selector; VMM uses the LSL
instruction to get the selector limit (size).
0x4E, 0x50. CB_Locked_PM_Stack_LDT and CB_Locked_PM_Stack_GDT. LDT selector of
locked protected-mode stack and GDT selector of locked protected-mode stack.
When a protected-mode stack is allocated, both a 16-bit LDT selector and an
equivalent 32-bit GDT selector are allocated. If the VM is currently running a
16-bit protected-mode program, the LDT will be used for the SS register. If
the VM is currently running a 32-bit protected-mode program, the GDT selector
will be used.
0x52, 0x54. CB_Locked_PM_Stack_Prev_SS and CB_Locked_PM_Stack_Prev_ESP. When a
call is made to use the locked protected-mode stack, the current stack-pointer
registers (SS:ESP) are saved in these fields. SS:ESP is restored from here
when the application is finished with the locked PM stack.
0x58. CB_Locked_PM_Stack_hMem. Page handle to protected-mode stack. When a
protected-mode stack is allocated, the page handle is saved in this field so
that the stack can be locked, unlocked, and freed by calling the page
allocator.
0x5C. CB_Locked_PM_Stack_Count. PM stack reference count. Each time a call is
made to Use_Protected_Mode_Stack, this counter is incremented. When
End_Use_Protected_Mode_Stack is called, the counter is decremented. When the
counter reaches 0, the stack is switched back to the original stack as saved
in the CB_Locked_PM_Stack_Prev_SS and CB_Locked_PM_Stack_Prev_ESP fields.
0x60. CB_Locked_PM_Stack_EIP. When Begin_Use_Locked_PM_Stack is called, the
current application EIP is saved in this field. It is restored as the
application EIP when End_Use_Locked_PM_Stack is called.
0x64. CB_PM_App_CB. Protected-mode Application Control Block. The PM App CB is
the DPMI-host private-data area that a program allocates after calling INT 2Fh
AX=1687h, and before switching to protected mode. VxDs can allocate space in
here with the Allocate_PM_App_CB_Area call, which returns an offset into the
PM App CB. The space requested by various VxDs determines the number of
host-data paragraphs returned in the SI register by INT 2Fh AX=1687h.
Because this field holds a single value, rather than a pointer to a list, VMM
can safely run only one DPMI client in a VM at a time. Each call to the DPMI
Switch to Protected Mode entry point will overwrite this field with the
address of the host private-data area allocated by the DPMI client. If DPMI
clients are "nested" (that is, a DPMI client spawns a program that also calls
the DPMI entry point), when the second client exits it will deallocate the
first client's data area.
The first two DWORD fields of the protected-mode Application Control Block are
documented as PMCB_Flags and PMCB_Parent. The VMM uses two DWORD fields in the
descriptor block for DPMI: DPMI_PageList and DPMI_DOSMem_List. The offsets of
the two fields are accessible only from within the VMM, because there is no
way to determine where the VMM portion of the PM_App_CB starts. DPMI_PageList
is a handle to a list of pages allocated for a protected-mode program by calls
made to INT 31h AX=501h (Allocate Memory Block). DPMI_DOSMem_List is a handle
to a list of selectors allocated for a protected-mode program by calls to INT
31h AX=101h (Allocate DOS Memory).
0x68. CB_VM_List_Link. Linear address of next VM. As VMs are created, they are
linked on a list, newest VM first. The System VM is thus always the last on
the list. This field is the linear address of the VMCB for the next VM on the
list. The system VM, as the last in the list, has a value of 0 in this field.
Get_Next_VM_Handle uses this field to determine the next VM handle. This
function gives the appearance of a circular list, as it will return the
address of the first VM in the list when it finds a value of 0 in this field.
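That wrap-around behavior can be sketched in C. This is a mock-up of the
semantics only; the structure and names are invented, and 'next' stands in
for the dword at VMCB+68h:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model of Get_Next_VM_Handle: follow CB_VM_List_Link; a zero link
   (the System VM, always last on the list) wraps back to the list head,
   giving the appearance of a circular list. */
struct vmcb {
    struct vmcb *next;   /* stands in for the linear address at VMCB+68h */
    uint32_t vmid;
};

struct vmcb *get_next_vm(struct vmcb *head, struct vmcb *vm)
{
    return (vm->next != NULL) ? vm->next : head;   /* wrap to first VM */
}
```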
0x6C, 0x70. CB_ListNext (pointer to next VM on a VM list) and CB_ListHead
(pointer to head of a VM list). Throughout its life, a VM can appear on the
Ready list, the Waiting list, or a Blocked-semaphore list. This field is the
linear address of the head of the list to which the VM is currently attached.
The CB_ListNext field points to the next VM on the list. The list is usually
ordered by execution priority, starting with the highest-priority VM.
0x74. CB_BlockedSemaphore. Handle of semaphore blocked on. If the VM has
called WaitSemaphore to access code or data associated with a semaphore but
the semaphore is already in use, then the VM will block on the semaphore. This
field will contain the handle for the semaphore on which the VM is blocked.
0x78. CB_SuspendedList_Head. Head of VM list during Suspend. If the VM was on
a list (see CB_ListHead) when a call was made to suspend the VM, the VM will
be removed from the list, and the head of the VM List will be saved here.
Resume_VM will use this field to place the VM back on the list when it is
runable again.
0x7C. CB_Suspended_BlockedSemaphore. Semaphore handle during Suspend. If the
VM was blocked on a semaphore when a call was made to suspend the VM, then the
semaphore reference count will be decremented by 1, and the handle to the
semaphore will be saved. Resume_VM will use this field to block the VM again
on the semaphore when it is runable.
0x80. CB_Exec_Priority_Bits. Execution Priority. This is the current execution
priority of this VM. It may be the sum of any number of the scheduler "boost"
values documented in the DDK, such as High_Pri_Device_Boost and
Critical_Section_Boost. Increasing or decreasing the boost values raises or
lowers the VM's execution priority as the requirements of the VM change. The
priority can be changed by calling the VMM service Adjust_Exec_Priority with
either a positive or a negative boost in EAX. The lists to which CB_ListHead
(offset 0x70) points use these priority values to order the VMs on the list.
0x88. CB_Suspend_Stack. When a task switch occurs, the state registers are
saved on the stack. When a VM has been suspended, its ESP is saved in this
field so that it can be restored to the same place when the task resumes. Thus
the entire register set can be saved and restored.
0x8C. CB_TSS_ESP0. Task State Segment, Stack pointer 0. This specifies the
linear address of the stack to use when an exception occurs that causes a Ring
0 transition and entry to the 32-bit VMM. The real-mode registers are saved
onto this stack. The CRS field at offset 0x08 points to the bottom of this
stack; this is where the VM registers are saved.
0x90. CB_hMem_Stack. Page handle to stacks. The VMM and VM in combination
require several stacks allocated from the same page-allocate call. The handle
to the page or pages that contain the stacks is held here.
0x94. CB_SuspendedVM_Count. Suspended count. Each time a VM is suspended, this
value is incremented. It's decremented by a call to Resume_VM. This VM will
not be permitted to resume until this value goes to 0.
0x98. CB_SuspendedVM_EventHandle. Suspend event handle. When a VM is
suspended, the VM's locked stacks and other associated memory can be unlocked
and the resources reused, since the VM won't be using them for a while. This
isn't a time-critical operation, however, and the operating system may not be
in a stable state to perform it at the moment of suspension. An event is
therefore scheduled so that the locked memory is unlocked once the VMM
becomes stable. The event handle is stored in this field.
An event is a function called when certain criteria are met and the VMM is in
a stable condition; for example, when interrupts are enabled or when a
critical section is unowned. Events can be scheduled globally or by VM. A
global event will be called when the VMM is stable (usually just before
returning to a VM). A VM event, on the other hand, will be called only when
the VM is current. Events are usually scheduled by hardware interrupts or
other asynchronous events, but they can be scheduled by calls to functions
such as Call_VM_Event or Schedule_Global_Event.


On to Next Month


That's it for this month. In the next "Undocumented Corner," we'll examine the
remaining VMCB fields and create a VM explorer. We'll also uncover a useful
undocumented structure created during VM initialization.

Figure 1: Accessing a VMCB from a Windows program.
// for Windows 3.1 only!
#define VM_FROM_HWND(hWndDosBox) \
 *((LPDWORD) MK_FP(GetWindowWord(hWndDosBox, GWW_HINSTANCE), 0x0FC))

// undocumented Windows API function; see UndocWin, pp. 303-304
extern BOOL FAR PASCAL IsWinOldApTask(HANDLE hTask);
#define ISDOSBOX(hWnd) IsWinOldApTask(GetWindowTask(hWnd))
if (ISDOSBOX(hwnd)) {
 VMCB far *vm_cb = (VMCB far *) map_linear(VM_FROM_HWND(hwnd),
 sizeof(VMCB));
 WORD ldt = vm_cb->CB_LDT;
 // or *((WORD far *) ((BYTE far *) vm_cb + 0x114))
 FreeSelector(FP_SEG(vm_cb));
 // now do something with LDT in other VM
 }

Figure 2: Using PROTDUMP to display a selector list and to examine memory in
other VMs.
 (a) C:\DDJ\VM>protdump -vm
 #1 VMCB=804C1000 high lin=81C00000h
 #2 VMCB=8065A000 high lin=82000000h

(b) C:\DDJ\VM>protdump -ptr -list -dword 804c1048 8
 804007E0 000E0000 0000104F
 804007D4 000D0000 00001047
 804007C8 000C0000 0000103F
 804007BC 000B8000 00001037
 804007B0 000B0000 0000102F
 804007A4 000A0000 00001027
 80400798 000F0000 0000101F
 8040078C 00000400 00001017
 80400770 00000000 0000100F
 804002D4 000009A0 00001007
 804002C8 0001E490 000000C7

(c) C:\DDJ\VM>protdump -prot #1 101f:fff0
 81CFFFF0 EA 5B E0 00 F0 30 36 2F 30 36 2F 39 31 00 FC 00 .[...06/06/91...
 81D00000 00 00 00 56 44 49 53 4B 33 2E 33 80 00 01 01 00 ...VDISK3.3.....

(d) C:\DDJ\VM>protdump -prot #1 0617:00f0
 81C4FEF0 00 00 50 52 4F 47 4D 41 4E 00 54 44 00 00 00 00 ..PROGMAN.TD....

(e) C:\DDJ\VM>protdump -all DOS:330 -word 2
 #1 81C00CD0 29DC
 #2 82000CD0 710B


Figure 3: Layout of a sample VM Control Block. Note that sizes are rounded up
to a multiple of four bytes.
VxD Owner Size Offset into VMCB
VMM 210h 0h
VPICD BCh 210h
VTD 17h 2CCh
VDDVGA 840h 2E4h
VKD E3h B24h
VFD 2h C08h
DOSMGR 4Dh C0Ch
. . .
. . .
. . .
VSERVER 4h 1A9Ch
VMPOLL 14h 1AA0h
VPFD 8Ch 1AB4h
Figure 4: The Windows 3.1 Virtual Machine Control Block (offsets are for the
retail version).
typedef struct {
 DWORD CB_VM_Status; // 00h

 DWORD CB_High_Linear; // 04h
 DWORD CB_Client_Pointer; // 08h
 DWORD CB_VMID; // 0Ch
 DWORD CB_PM_Int_Table; // 10h
 DWORD CB_VM_ExecTime; // 14h
 DWORD CB_V86_PageTable; // 18h
 DWORD CB_Local_Port_Trapping_BitMap[8]; // 1Ch
 DWORD CB_Begin_Nested_Exec_List; // 3Ch
 DWORD CB_OS_Stack; // 40h
 DWORD CB_Scheduler_Flags; // 44h
 DWORD CB_Selector_List; // 48h
 WORD CB_unused0; // 4Ch
 WORD CB_Locked_PM_Stack_LDT; // 4Eh
 WORD CB_Locked_PM_Stack_GDT; // 50h
 WORD CB_Locked_PM_Stack_Prev_SS; // 52h
 DWORD CB_Locked_PM_Stack_Prev_ESP; // 54h
 DWORD CB_Locked_PM_Stack_hMem; // 58h
 DWORD CB_Locked_PM_Stack_Count; // 5Ch
 DWORD CB_Locked_PM_Stack_EIP; // 60h
 DWORD CB_PM_App_CB; // 64h
 DWORD CB_VM_List_Link; // 68h
 DWORD CB_ListNext; // 6Ch
 DWORD CB_ListHead; // 70h
 DWORD CB_BlockedSemaphore; // 74h
 DWORD CB_SuspendedList_Head; // 78h
 DWORD CB_Suspended_BlockedSemaphore; // 7Ch
 DWORD CB_Exec_Priority_Bits; // 80h
 DWORD CB_SchedulerStatus; // 84h
 DWORD CB_Suspended_Stack; // 88h
 DWORD CB_TSS_ESP0; // 8Ch
 DWORD CB_hMem_Stack; // 90h
 DWORD CB_SuspendedVM_Count; // 94h
 DWORD CB_SuspendedVM_EventHandle; // 98h
 WORD CB_ForeGround_TS_Priority; // 9Ch
 WORD CB_BackGround_TS_Priority; // 9Eh
 DWORD CB_Weighted_Priority; // A0h
 DWORD CB_Weighted_Time; // A4h
 DWORD CB_Next_Scheduled_VM; // A8h
 DWORD CB_Last_Weighted_VMTime; // ACh
 DWORD CB_ExtendedErrorCode; // B0h
 DWORD CB_ExtendedErrorRefData; // B4h
 DWORD CB_V86_PgTbl_PhysAddr; // B8h
 DWORD CB_Int_Table_Instance; // BCh
 DWORD CB_hMem_VMDataArea; // C0h
 DWORD CB_Int_Table_hMem; // C4h
 DWORD CB_DeviceV86Pages[9]; // C8h
 DWORD CB_V86PageableArray[8]; // ECh
 DWORD CB_MMGR_Flags; // 10Ch
 DWORD CB_MMGR_Pages; // 110h
 WORD CB_LDT; // 114h
 WORD CB_unused2; // 116h
 DWORD CB_hMem_LDT; // 118h
 DWORD CB_VM_Event_Count; // 11Ch
 DWORD CB_VM_Event_List; // 120h
 DWORD CB_Priority_VM_Event_List; // 124h
 DWORD CB_CallWhenVMIntsEnabled_Count; // 128h
 DWORD CB_CallWhenVMIntsEnabled_List; // 12Ch
 DWORD CB_Next_Timeout_Handle; // 130h
 DWORD CB_Prev_Timeout_Handle; // 134h

 DWORD CB_First_Timeout; // 138h
 DWORD CB_Expiration_Time; // 13Ch
 DWORD CB_IDT_Base_hMem; // 140h
 WORD CB_unused3; // 144h
 WORD CB_IDT_Limit; // 146h
 DWORD CB_IDT_Base; // 148h
 struct { // 14Ch
 DWORD Ex_EIP;
 WORD Ex_CS;
 } CB_Exception_Handlers[32];
 DWORD CB_V86_CallBack_List; // 20Ch
 // end of VMM portion
 // start of VxD CB areas
 } VM_ControlBlock; // size: 210h bytes


Figure 5: Using the documented VMCB CB_High_Linear field to read real-mode
memory in other VMs. The somehow_get_VM_handle() function could be implemented
using VM_FROM_HWND() in Figure 1, or with the generic VxD.
DWORD LinAddr = (0x40L << 4) + 8;
DWORD VMHandle = somehow_get_VM_handle();
VMCB far *vmcb = map_linear(VMHandle, sizeof(VMCB));
LinAddr += vmcb->CB_High_Linear;
FreeSelector(FP_SEG(vmcb));
WORD far *WPtr = map_linear(LinAddr, sizeof(WORD));
WORD W = *WPtr;
FreeSelector(FP_SEG(WPtr));
// W is WORD at 40:08 in other VM


Table 1: VMM scheduler flags.
Flag Description
0001h The VM has either been Suspended or was just created and needs to be
Resumed before it can continue.
0004h A VM event has been scheduled to call the VM when VM interrupts have
been enabled.
0008h The VM has completed VM_Critical_Init.
0010h Begin_Nested_Exec has been called and interrupts from this VM can be
serviced when it is blocked, regardless of whether or not VM interrupts have
been enabled.
0020h When this VM is blocked, do not switch away from it unless another VM
has higher priority.



January, 1994
PROGRAMMER'S BOOKSHELF


Genetic Programming




Peter D. Varhol


Peter is an assistant professor of computer science and mathematics at Rivier
College in Nashua, New Hampshire.


We can all generally agree upon what a good computer program looks like. It
tends to be short, uses computer time efficiently, and certainly leads to the
one right answer under all circumstances. The algorithms it uses are easy to
understand, not convoluted, and expressed in the most concise way possible. We
can look at it, nod our heads, and say, "Yeah, this makes sense."
Well, throw away those preconceived notions of what a proper computer program
looks like when you open John Koza's Genetic Programming: On the Programming
of Computers by Means of Natural Selection. This comprehensive book examines
what genetic programming is all about, how its concepts derive from the roots
of human genetics, and how it can be used to solve a wide variety of problems
in system control, planning, and decision support.
Genetic programming begins with program induction--the idea that, under many
circumstances, problem solving can be thought of as the discovery of a
computer program that produces the proper outputs when given the appropriate
inputs. The program itself is a black box--feed it the inputs, and it produces
the correct outputs for that input set. This obviously isn't true of all
computer programs, but the analogy fits a large number of programs
that perform computations or data manipulations. The process starts off with
the random selection of a number of possible programs. These are tested
against the problem, and the best ones are retained, while the worst are
discarded. The remaining ones are used to generate other programs, all of
which are then once again tested against the problem. If this is done
correctly, the average "fit" of the population increases, although the genetic
algorithms themselves are not concerned with average fitness, but rather, the
fitness of individual programs.
The interesting thing about genetic programming is that the winning program may be
neither the shortest, the most efficient, nor even the best. It is, however,
the most "fit" of those generated, in that it is the closest solution to the
problem, or the shortest, or the best in whatever measure of fitness you care
to use.
It is easy to see the analogy with classical genetics. The fittest programs
survive in each succeeding generation, until a threshold criterion is reached.
The threshold criterion may specify an average error rate, or average time to
reach a solution, or some other quantitative standard that fitness can be
measured against. Of course, unlike genetics in biology, genetic programming
must stop at some point in time, since the goal is to formulate a solution to
a problem.
There is even an analogy to mutation. At each generation of the program, it is
possible to introduce a random change to a random part of one or more of the
programs. In succeeding generations, a mutation may result in programs that
would never have been derived from the original algorithm.
Is genetic programming, then, guaranteed to give the optimum result? No,
according to Koza. It might, after a while, produce an answer that meets the
solution parameters of the problem, but there is no way of telling whether or
not there is a best answer, let alone whether we have found it.
So what good is this technique, if it is not guaranteed to give the optimal or
most-efficient answer? There are many problems, especially those with
nonlinear behavior, that do not easily lend themselves to any analytical
answer using conventional techniques. Chaotic processes such as the weather or
the structural behavior of materials may lend themselves well to genetic
techniques.
How do these genetic techniques work? Koza describes the genetic algorithm in
three steps:
1. Randomly generate an initial population of computer programs capable of
solving the problem.
2. Iteratively perform the following operations:
a. Execute each program in the population and assign it a fitness value
according to how well it solves the problem.
b. Create a new population of computer programs by applying the following
operations:
i. Copy existing programs to the new population.
ii. Create new programs by genetically recombining randomly chosen parts of
two existing programs.
3. Evaluate the resulting population of programs. The best program that
appears in any generation may be an acceptable solution to the problem.
One concern you might have about the genetic approach is that it may lead into
a blind alley. That is, a search path may prove to be promising at first, but
after succeeding generations, it may become clear that the algorithm is not
converging on an optimum range of solutions. No particular approach is
guaranteed to converge on a solution. One solution is the aforementioned
mutation. By randomly mutating one or more characteristics of the program,
it's possible to produce potential solutions that could not have been derived
from the genetic algorithm itself. If the mutations do not improve the overall
fitness of the programs in succeeding generations, they will gradually be
weeded out.
Representing the problem to be modeled genetically is also a concern. Koza
favors binary strings in his initial examples; they have some nice
characteristics from a genetics standpoint. However, he acknowledges that this
is overly simplistic, and does not well represent the characteristics of
computer programs. In more complex cases, he uses equations to represent
behavior, and subsequent generations of programs continually modify the
equations according to genetic principles. Since we're already used to
representing the behavior of a system with equations, this seems to be one
appropriate form of problem representation.
Once the problem representation is selected, the programmer has to develop a
measure of fitness. Fitness can be measured in a number of different ways and
is highly dependent upon the specification of the problem. If the desired
output from the program is known, fitness can be simply the difference between
the desired and actual outputs from one of the programs. If the goal is to
maximize the result, then fitness is measured by the magnitude of the output.
These steps are not automatic, and the programmer's selection of the
representation and the fitness measure can greatly affect the outcome. Koza
provides plenty of examples and explains them well, but setting up a problem
seems just as much an art as a science. An intuitive feel for the problem may
be just as important as good genetic technique. For example, I applied Koza's
explanation of symbolic regression to determine the appropriate model form for
data that predicts the amount of time needed to manufacture a given item,
given the number of items that have already been manufactured. The goal of
symbolic regression is to find the functional form, rather than the functional
coefficients, for a given set of finite data.
I'd expect this data to take a negative log form, since the problem
effectively describes the manufacturing learning curve. However, I defined
several functional forms as candidates: F={+, -, *, SIN, COS, EXP, LOG}.
My measure of fitness is the average error between the predicted times and the
actual times. For each form, I designed (like Koza, using Lisp) a set of four
equations for which I substituted one or more of these transformations. For
example, one of those generated was the simple linear-regression form (+ A (*
B X)). (A and B are the y-intercept and slope, respectively, calculated as a
part of each form.) The result after each generation was 24 different
functional forms. I threw out the 12 worst and combined aspects of the
most-successful 12 in different ways. My best effort, after three generations,
was (+ A (* (COS B) (LOG (* X (- 0 1))))), which is reasonably close to the
negative log answer I'd expect.
The genetic programming approach may be one of the best proposed so far for
the slippery concept of machine learning. The computer produces a range of
possible outputs, examines the algorithms that produced those outputs, keeps
the best ones for another trial, and produces more through the
genetic-recombination process. The system is clearly learning to produce a
reasonable output.
Upon reflection, genetic programming seems comparable to supervised-learning
neural-network techniques. What's the difference between genetic programming
and this class of neural networks? Both have the capability of learning, both
deal well with nonlinear systems, and both can be viewed as a black box. At
one level, the two concepts are clearly related. Through successive trials,
each attempts to converge upon an acceptable solution. There are also
analogies to fitness, in that possible solutions are discarded if they do not
conform well to the expected output.
What is different is the learning algorithm. A neural network adjusts
coefficient values as the error is propagated back through the network. It has
no "memory" and can return to previous states based on succeeding inputs. The
genetic algorithm, on the other hand, tries to improve the overall population
fitness in each succeeding generation. Koza gives an example of a genetic
technique to choose the appropriate neural-network structure to solve a
problem. For anyone who has ever attempted to build a neural net before, this
is an attractive alternative to the usual trial-and-error approach.
Lisp is Koza's language of choice for experimentation in genetic programming.
There are advantages to using Lisp, not the least of which are the easy
manipulation of symbols and the ability to rapidly prototype different genetic
structures.
Genetic programming won't replace traditional, procedural programming anytime
soon. Most of the programming problems we deal with have exact and readily
available solutions. Genetic programming is similar to the expert-system
paradigm in that it seeks acceptable, though not necessarily the best,
solutions to problems that aren't precisely defined or are nonlinear in
nature.
Nonetheless, Genetic Programming is well worth the space on your bookshelf.
More and more problems we deal with have these characteristics, and, though
still in its conceptual infancy, genetic programming is a potentially powerful
approach. Koza's treatment is so comprehensive that, while this may not be the
last word on genetic programming, it may be the only word you'll need.
Genetic
Programming
John R. Koza
MIT Press, 1992, 819 pp, $55.00
ISBN 0-262-11170-5



January, 1994
OF INTEREST
Version 5.37 of Rhetorex's MS-DOS driver for voice and speech applications
provides response times as low as 100 milliseconds, improving accuracy when
differentiating between human speech and ambient noise, silence, and
telephone signals in RDSP multiline voice-processing
applications. MS-DOS, OS/2, and UNIX systems are supported by the RDSP
software, which includes a device driver, firmware loader, configuration
module, C interface module, demos with source code, and various utilities.
Reader service no. 20.
Rhetorex Inc.
200 E. Hacienda Ave.
Campbell, CA 95008
408-370-0881
Greenleaf Software is getting small with the release of ArchiveLib, a
Windows-compatible data-compression and archive library for C or C++
programmers. ArchiveLib is an object-oriented, data-compression, run-time
library that lets programmers compress ASCII or binary data and archive
buffers of data within applications without having to store them as files.
Compressed data can be retrieved into either a disk file or a memory
buffer. ArchiveLib sells for $279.00. Reader service no. 21.
Greenleaf Software Inc.
16479 Dallas Pkwy., Suite 570
Dallas, TX 75248
800-523-9830
A native, ANSI-standard Fortran 90 compiler for UNIX platforms has been
introduced by Edinburgh Portable Compilers (EPC). EPC's Fortran 90 provides
facilities to organize and manage compilation of program units, detects
incorrect compilation order, and automatically binds a program from its
constituent parts. Optimizing techniques provide array scalarization,
dependence analysis, scalar optimizations, and symbolic register allocation.
Other key features such as modules, array operations, derived types,
recursion, generic and internal procedures, pointers and dynamic allocation,
and new control structures are improvements over existing Fortran 77
implementations. This compiler is initially available for SunOS and
SPARC-based systems. Reader service no. 22.
Edinburgh Portable Compilers
20 Victor Square
Scotts Valley, CA 95066
408-438-1851
The Handbook of Information Security Management, published by Auerbach
Publications and edited by Zella G. Ruthberg and Harold Tipton, examines how
security experts handle information-security threats. It describes steps you
can take to protect against charges of negligence in protecting information
resources, ensure cost-effective development, guard against computer abuse,
and establish effective controls over network communications. The 773-page
book costs $125.00. ISBN 0-7913-1636-X. Reader service no. 23.
Auerbach Publications
One Penn Plaza
New York, NY 10119
800-950-1216
Software Technology (And Tools) is offering training courses for
object-oriented analysis and design. The thrust of the curriculum is that
excellence in engineering begins with a proper analysis of the problem. Good
analysis allows the problem to be solved on paper, independent of
implementation strategies and techniques.
The agenda includes the following topics: basis for analysis, object-oriented
analysis, objects, attributes, relationships, state models, model
organization, design of implementation, state-controlled implementation, and
CASE for OOA. Classes are held monthly, and on-site training is available.
Reader service no. 24.
Software Technology (And Tools)
6 West Main Street, Suite C
American Fork, UT 84003
801-756-0839
Phar Lap's TNT DOS-Extender SDK (also known as the 32-bit Extended-DOS 6.0
development toolkit) lets developers use Microsoft's Visual C++ 32-bit Edition
compiler to build 32-bit DOS applications that can take advantage of Windows
NT capabilities, including threads and multitasking. In addition, Microsoft
has included a free, trial-size version of TNT DOS-Extender Lite on every copy
of the Visual C++ 32-bit Edition CD-ROM.
The SDK, which includes the TNT DOS-Extender, a 32-bit version of Microsoft's
CodeView debugger, and all the components of Phar Lap's 386DOS-Extender SDK,
is priced at $495.00. Reader service no. 25.
Phar Lap Software Inc.
60 Aberdeen Ave.
Cambridge, MA 02138
617-661-1510
Math Advantage 5.0, a mathematical library from Quantitative Technology,
contains over 500 routines for matrices, signal, image, vector processing,
complex functions, interpolation, integration, polynomial functions, rational
functions, and root finding. Special functions are included for both Microsoft
and Borland compilers. Math Advantage supports PCs, Macs, and SPARCstations.
Price will vary according to platform. Reader service no. 26.
Quantitative Technology Corp.
331 Page Street, Suite 12
Stoughton, MA 02072
800-633-6284
Analog Devices is set to release a new class of single-chip digital-signal
processors (DSP). The ADSP-21060 will provide 32-bit single precision (or
40-bit extended precision) and an IEEE floating-point DSP core with three
independent, parallel computational units: ALU, multiplier, and shifter.
Performance is equivalent to 40 MIPS, with 120 MFLOPS peak (80 MFLOPS
sustained). On-chip, configurable memory banks, such as dual-ported, 4-megabit
internal SRAM provide fast, independent local memory access for the DSP core,
DMA controller, and I/O processor. Other features include an I/O processor
with DMA controller, a memory mapper, and communications. Ten DMA channels,
used with the dual-ported SRAM, handle background transfers between internal
and external memory, peripherals, host, and serial/link ports--without impact
on the performance of the DSP core.
Interface to off-chip memory supports programmable DRAM. The system-bus
crossbar provides flexible interconnections between a 16/32-bit host CPU, a
DMA device controller, external memory, peripherals, and optional boot EPROM.
Target pricing per 1000-piece quantities is $296.00. Reader service no. 27.
Analog Devices Inc.
DSP Market Development
Norwood, MA 02062
617-461-3881
Adlersparre recently introduced its Stash 3.0 data-compression software. Stash
works on MVS, VM, AS/400, DOS, OS/2, Mac, and most UNIX machines. It
translates files from ASCII to EBCDIC, or vice versa, to maintain consistency
between systems, and multiple files can be compressed and decompressed in one
run. Transactions created on PCs or minis that are batched can be uploaded to
a mainframe for processing, thus reducing upload time. Prices range from
$8500.00 for a mainframe license, to $40.00 for a PC version. Reader service
no. 28.
Adlersparre & Associates
15 North Road
P.O. Box 403
Chesterfield, MA 01012
800-795-9896
Object Master for Think C and C++ provides tools to software developers to
assist them during code development. This program is integrated with Think
Project Manager and works with Think C, Symantec C++, and any other ANSI C or
standard C++ language.
Object Master's Browser window allows access to different pieces of code with
no restriction to physical location. It parses all project files and maintains
a data dictionary of all project components, including classes, functions, and
methods. Changes made to the code are immediately displayed, and no
compilation is necessary. The product sells for $225.00. Reader service no.
29.
ACI US Inc.
10351 Bubb Road
Cupertino, CA 95014
408-252-4444
The SoftProbe 386EX/SIM Simulator/Debugger is a DOS-hosted, bus-level debugger
for real- and protected-mode C/C++ and PL/M applications. The SoftProbe,
recently released by Systems & Software, is designed to run on the 386EX
processor, a version of the 386 specifically targeted for the embedded-systems
market. The SoftProbe performs program execution via software simulation of
all 386EX and 387 functions, without using the actual processor or evaluation
boards.

The SoftProbe 386EX/SIM is compatible with MetaWare C/C++, Watcom/386 32,
MASM, and Intel's C-86/286/386, PL/M-86/286/386, and ASM-86/286/386 compilers.
Reader service no. 30.
Systems & Software Inc.
18012 Cowan, Suite 100
Irvine, CA 92714
714-833-1700
DocuMento 1.1, an online help and documentation tool for OS/2 2.x
applications, has been released by SE International. With DocuMento, only one
information source is used to generate printed documentation, online
documentation, and online help. The tool lets programmers write online
documentation using a standard word processor, then create online help by
converting the document into HLP files without any IPF-Tag language knowledge.
Reader service no. 31.
SE International Inc.
One Park Place, Suite 240
621 NW 53rd Street
Boca Raton, FL 33487
407-241-3428
Win2Mac from Altura is a cross-platform porting tool for Windows developers.
Implementing the Windows 3.1 API as a Macintosh library lets Windows
developers work with a single set of existing Windows source code to recompile
and build Macintosh applications. Reader service no. 32.
Altura Software Inc.
510 Lighthouse Ave., Suite 5
Pacific Grove, CA 93950
408-655-8005
The Annotated ANSI C Standard by Herb Schildt has been published by
Osborne/McGraw-Hill. Schildt annotates the C language as described in the
Standard, presumably making it more understandable to C programmers. The
Annotated ANSI C Standard includes the original, untouched Standard on the
left-hand page, with Schildt's annotations, comments, clarifications, and
examples on the right. The 219-page book sells for $39.95; ISBN:
0-07-881952-0. Reader service no. 33.
Osborne/McGraw-Hill
2600 Tenth Street
Berkeley, CA 94710
800-227-0900
Nohau has released the POD-16Y1 probe, an in-circuit emulator for Motorola's
68HC16Y1 microcontroller. The POD-16Y1 supports all the 68HC16Y1 operating
modes: single chip, partially expanded, and fully expanded. The probe works
with the EMUL16/300-PC emulator board, which also supports the 68300 family of
microcontrollers. The emulator supports the chip at full speed, currently
16.78 MHz. The user interface is a Microsoft Windows 3.x application.
POD-16Y1 comes with emulation RAM from 256 Kbytes in one bank to 4 Mbytes in
four banks. The emulator consists of an emulator plug-in board, a
five-foot-long twisted-pair ribbon cable, a pod board, and an optional trace
board. Prices start at $2495.00. Reader service no. 34.
Nohau Corporation
51 E. Campbell Ave.
Campbell, CA 95008
408-866-1820
ACCENT STP 2.0 from National Information Systems is a translation tool for Sun Open
Look applications. According to a spokesperson, the Sun user community will
now be able to migrate to CDE and COSE without rewriting the graphical user
interface portion of their application. ACCENT STP will translate 80 to 100
percent of the C or C++ application source code produced by XView, OLIT, or
Devguide GIL files--including header files--for which equivalent paradigms are
offered in Motif. The translated output will be recognizable Motif C or C++
source code. Version 2.0 features a number of new enhancements, including
support for "drag and drop," the TTY Widget, and internationalization.
ACCENT STP consists of three separate modules: the Devguide Conversion, XView
Conversion, and OLIT Conversion; each costs $4995.00. A WindowMaker GUI editor
is $1495.00. Reader service no. 35.
National Information Systems Inc.
4040 Moorpark Ave.
San Jose, CA 95117
800-441-5758



January, 1994
SWAINE'S FLAMES


Defense Talks at Foo Bar


Saturdays I man the pumps at an establishment called "Foo Bar" out on Poison
Oak Road. Tonight's pretty quiet except for the guy at the end of the bar in
the Apple T-shirt sobbing into his third Pepsi. I'm going to have to cut him
off soon.
Even Memphis Joe and Corbett are quiet until the car alarm goes off.
All eyes turn to the parking lot, where through the open front door we see a
very surprised cat jump off the hood of a BMW. The Pepsi drinker sniffs,
"Casper: off," and the alarm offs.
"Nice system," Corbett says. "Keyed to the owner's voice, I presume. Pretty
foolproof, really."
Memphis Joe turns back to his mint julep. "It sucks," he grumbles. I check
under the bar for my persuader.
"You're wrong," Corbett argues. "Any security system is just locks and keys.
Here's a key that can only be used by the owner, can't be lost or stolen, and
works at a distance."
"So you think your throat is a safe hiding place," Memphis Joe says. "Fine.
Personally, if somebody holds a knife to my throat, I'll sing like a bird."
"There's a deterrent," I say, trying to lighten things up, but Corbett is off
on one of his tangents.
"Rather than putting the key in your throat or head," he says, his eyes
dreamy, "what you really want to do is put the lock in the crook's head."
Memphis Joe raises his glass. "Corbett, that is, without a doubt, the ripest
load of garbage you've yet delivered."
I cut in. "Criminals sure are more violent these days. Drugs, do you figure?"
Memphis Joe doesn't figure that. "Nah, it's plastic money."
Corbett stares at him. "You're nuts. Before credit cards and ATMs, people
walked around with cash all the time. They were far more vulnerable to crime."
"More vulnerable to pickpockets. Today, the only time a thief can be halfway
sure somebody's carrying cash is when they've just left an ATM machine. And
that's when the victim is on guard, so no pickpocket stands a chance."
"Leading to less crime."
"Leading to the nonviolent dip getting edged out by the violent mugger."
Corbett shrugs. "Anyway, the real cause of the increase in violent crime in
America is the easy availability of military-grade weapons."
"Huh. I might have known you were one of them."
"Face facts, Joe. There's solid scientific evidence that the mere possession
of a gun increases your risk of getting shot."
"Correlational data," Memphis Joe snarls. "The only way to demonstrate a
causal link is by experimental manipulation. You have to give people guns and
observe the effects, contrasted with a control group not given guns."
"Oh, there's a brilliant plan," Corbett sneers.
"Otherwise you can't control the psychological factors."
"Somebody should control your psychological factors."
Memphis Joe growls. Corbett's lip curls. The cat, which has sneaked in to
steal bread, exits hastily, staying clear of the BMW.
Before things can go any farther, I cough and nod meaningfully at the sign
behind the bar, the one that says, "These premises protected by Smith and
Wesson." They shut up just as though a lock has snapped shut in each of their
heads.
I like it quiet.
Michael Swaine, editor-at-large



























February, 1994
EDITORIAL


Software Patents Step One Toe Over the Line


Few announcements have rocked the computer industry more than Compton's
NewMedia crowing about its newly granted patent for what amounts to inventing
multimedia. In particular, Compton's patent (5,241,671) covers "a search
system in which a multimedia database consisting of text, picture, audio and
animated data is searched through multiple graphical and textual entry paths."
Sound familiar? It should, because it describes the underpinnings of just
about every multimedia or interactive database on the market or under
development. Compton's, with patent in hand, will license the patent for
1 to 3 percent of the gross revenues of your interactive products or services.
Although the patent will likely be challenged through the U.S. Patent Office's
re-examination process or courts, there's no question who will be king of the
interactive hill if Compton's defends and holds onto this patent.
The criteria for granting a patent are that an invention be novel and
nonobvious to skilled practitioners of the art. With this in mind, DDJ
reviewed the 40-page patent document, looking for uniqueness. But that's not
what we found. For instance, the patent describes a multientry search system
into a multimedia database. As contributing editor Tom Genereaux points out,
this technique was (and is) used by the Prodigy System and its forebears. The
Prodigy database is a multikeyed tree of intermixed graphics and text. That
the medium supports a higher level of graphical detail is irrelevant in this
case: Prodigy is constrained by the need to deliver its images over telephone
lines using least-common-denominator modems.
The technique claimed by Compton's has also been used in the Orion system at
MCC (first described in 1985) and the EDC system at General Electric
(described in 1982). In fact, multiple-keyed game discs date back to the mid
'80s: Dragon's Lair (a game based on a laser-disc technology) used a similar
system to deliver keyed sequences of moving images.
The menuing system described in Compton's patent has been used in the Star
system from Xerox and in document-imaging systems developed at Wang. Again,
that a higher-bandwidth medium is now available is irrelevant. Keyed animation
playbacks have been in common use in graphics laboratories and companies for a
decade, and the part of the patent that purports to be unique--the keying of
text and graphics--was used by Star and Wang workstations to access images.
Nor does the patent necessarily pass the nonobviousness test: A "textual entry
path means and said graphical entry path means...for assisting a user in
searching said graphical and textual information" is something we're all
familiar with--a user interface. But don't take my word for it: We've uploaded
the bulk of the patent's text into the DDJ Forum on CompuServe so that you can
check it out yourself. While we haven't included the flowcharts, there's still
enough legalese to give you a taste of the patent.
What Compton's has to gain from this is obvious--money, and lots of it,
particularly if the company cuts deals with cable TV and telephone companies.
However, the patent licenses themselves may be a drop in the bit bucket
compared to what the company really hopes to gain. Compton's is willing to
forgive royalties if you license their SDK, form a strategic alliance with
them, or let them be the exclusive distributor of your interactive media.
Getting a hammerlock on the distribution channels is where the real money is,
and the patent can be a carrot or stick to make this possible.
But Compton's road may still be rocky. Not only will the company have to deal
with the inevitable challenges to its patent by competitors who have equally
deep pockets (Microsoft, for one), the multimedia giant wanna-be may also have
to worry about Britannica, its former owner. Compton's NewMedia is owned by
the Tribune Publishing Company, which bought Compton's from Encyclopedia
Britannica in 1993 for $56 million--a surprising amount, approximately twice
the level of revenues. Reportedly, Britannica, which still co-owns the patent,
is miffed over Compton's broad claims and the way in which the patent was
announced.
Compton's patent isn't the real problem, however. When you get down to it, the
issue is how patents are granted. There's no question that all developers and
inventors deserve intellectual property protection, but even U.S. Patent and
Trademark Commissioner Bruce Lehman admits that far too many existing software
patents should never have been granted. Consequently, the Patent Office is for
the first time undertaking a review of the software-patent process, starting
with a pair of public hearings entitled, "Request for Comments on Intellectual
Property Protection for Software-Related Inventions." These hearings will take
place at the San Jose Convention Center (San Jose, California) on January
26--27, 1994, and Crystal Forum (Arlington, Virginia) on February 10--11,
1994. No matter how you feel about software patents, here's your chance--maybe
your only chance--to really be heard.
Jonathan Erickson, editor-in-chief












































February, 1994
LETTERS


More on Discrete-event Simulation




Dear DDJ,


In Peter Varhol's "Extending a Visual Language for Simulation" (DDJ, June
1993), the code for the factorial function should be:
if X < 0 then fact := 0
else
 if X = 0 then fact := 1

The code for procedure Poisson contains variables which are calculated but
unused, as in testY_int. X[0] appears to be positive, based upon comments and
its use in calculating testX_int, but the logarithm Ln(-X[0]) is calculated.
This simulation appears to be a continuous time simulation using VisSim in
which the Poisson distribution is used to determine if an arrival has occurred
in a given time interval. A true discrete-event simulation proceeds from event
to event and would calculate the time interval to the next arrival (using the
exponential distribution for a Poisson process). Such a simulation would
generally be far faster.
Louise Baker
Albuquerque, New Mexico
Peter replies: I've since fixed the code for both the Poisson and exponential
distributions. Now it also lets the user enter a random seed, or lets the
system generate one. You've put your finger on one of the problems of writing
discrete-event libraries for a continuous-simulation engine. Each VisSim clock
tick represents a fixed amount of time, so 100 clock ticks would be equivalent
to 100 seconds in the simulation run. I even started writing it in the way you
suggested, but came to the same conclusion you did--it would be too slow to
represent all but the shortest simulation.
Instead, I did the opposite. In my code, each clock tick represents a
customer-driven event--an enqueue, a dequeue, or a service complete. To make a
long story short, the probability distributions determine the amount of time
that has passed between events. Therefore, a fixed-length VisSim clock tick
actually represents a variable amount of time, depending on values generated
by the distributions.
Performance depends a lot on the system and the complexity of the simulation.
If VisSim does plots, for example, the simulation can take much longer than
otherwise. Interestingly, if there are no outputs, VisSim does nothing. On
reasonably straightforward one- or two-queue simulations with one or two
plots, I can run 1000 transactions in about 10 seconds on a 486/33, which
isn't too shabby. It isn't as good as some of the procedural simulation
languages, but they don't have the ability to view the simulation, or to
change parameters while the simulation is running.
Huffman Compression


Dear DDJ,


The code in Example 1 is extracted from a program I built around Al Stevens's
Huffman encoding routines, which appeared in the October 1992 issue. As Tom
Swan pointed out in his "Algorithm Alley" column (DDJ, July 1993), Huffman
compression routines are slow. My version of compress() is faster than the
original because it is not recursive and because the Huffman tree is traversed
a maximum of 256 times rather than once for each character in the input file.
Maintaining a large file-output buffer in memory boosts performance even more,
because, in addition to the obvious advantage of writing often to memory and
seldom to disk, you can then use the lowest-level file-output routines.
Richard Zigler
Marion, Michigan


Fuzzy Logic By Any Other Name...




Dear DDJ,


Though useful, "fuzzy logic," as discussed by Michael Swaine in "Programming
Paradigms" (DDJ, July 1993), is neither non-Western nor fuzzy. "Gradient
logic" might be a better name for it. Gradients and continuums have been
familiar to Western thinkers for a very long time. After all, who discovered
calculus?
It's just that traditionally, if a system has had more than two truth values,
we haven't called it logic--we've called it arithmetic, or mathematical
modeling. Lotfi Zadeh's accomplishment was to link logic and arithmetic in a
handy way; he did not invent a new form of thought.
Anyhow, there have long been many extensions of Aristotelian logic, such as
Boolean algebra, modal logic, deontic logic, conditional logic, and defeasible
(default) logic. The last of these does much the same job as fuzzy logic,
except that it deals with uncertainty in the inference rules rather than in
the truth values.
As for the idea that "non-Western" thinking transcends ordinary logic, it's
very easy to think that anything transcends ordinary logic if you don't
understand it very well.
Michael A. Covington
Athens, Georgia


It's No Secret




Dear DDJ,



The December 1993 "Editorial" by Jon Erickson, "Cryptography Fires Up the
Feds," has come closer to the truth than he might imagine. As a researcher in
cryptography and an inventor of a patented public-key cryptographic system, I
have had my own visits from the NSA. I have read all of the technical and news
articles regarding U.S. policy and, until recently, have been perplexed by the
rationale behind it.
Conversations with NSA personnel and careful reading of the events of the last
several years culminating in the Clipper/Skipjack initiatives of this year,
lead to the only logical conclusion regarding government policy on encryption
technology: U.S. government attempts to control encryption technology are
directed not at foreign governments, but at United States citizens.
This conclusion is a simple deduction from the facts:
1. Since encryption algorithms (including RSA and DES) cannot feasibly be
contained within our borders, restricting products that contain them is
futile, if the goal is to keep them from being used by foreign governments.
2. Powerful encryption technology has been developed abroad, including some
recent work by an erstwhile enemy.
3. No foreign government will purchase equipment that contains encryption
technology open to U.S. intelligence agencies.
These facts must be obvious even to those bureaucrats in Washington attempting
to dictate policy on the exchange of information. Logically, therefore, the
government's attempts to control encryption technology are directed at its own
citizens. Limiting export of products including encryption technology inhibits
domestic development of the technology, as do acts such as the
Clipper/Skipjack initiative. Without the restrictions the government is
pursuing, in several years we could buy a reasonably priced telephone that
allows us to communicate securely--free from possible government
eavesdropping. With government restrictions, only gangsters and drug dealers
will use secure communications devices purchased abroad. We need to insist to
our legislators and policy makers that we don't wish to purchase a false sense
of security at the expense of our Constitutional liberties.
Walter M. Anderson
Bedford, Massachusetts
Date Redux


Dear DDJ,


In the July 1993 "Letters," Karl Hoppe gives an algorithm for determining the
date of Easter, but he doesn't mention that this algorithm works only for the
Gregorian Calendar. (Readers probably also noticed that there's a
typographical error in the listing: There should be a variable, J, for the
quotient of C/4, and in the next step 21 should be 2J.)
For the Julian calendar (until 1583), the algorithm can be changed to the
following (based on the paper by Chr. Zeller, Acta Math. 9, 1894), using the
same A, B, and C, as before:

Step               Remainder
(19A+15)/30        D
(D+C+C/4-B)/7      E

Easter is then D+7-E days after March 21. The same algorithm works for years
after 1582 if 19A+15 is replaced by 19A+15+B-B/4-B/3 and D+C+C/4-B is
replaced by D+C+C/4+B/4+2-2B. Hoppe's algorithm has the advantage of giving
the month and day of Easter directly, while Zeller's algorithm produces only
the offset of Easter from March 21. On the other hand, Zeller's algorithm is
fully explained in his paper.
Hoppe's letter inspired me to read Peter Meyer's "Julian and Gregorian
Calendars" (DDJ, March 1993). This is an interesting article and the routines
seem to work, but I wish Peter had indicated the advantage of Gregorian-day
numbers over the commonly used Julian-day numbers. He does note that the
Julian-day number for any date is simply the Gregorian-day number plus the
Julian-day number of October 15, 1582. (For this to be true, interpreting
"Julian-day number" in the astronomical sense, you must use the gdn for the
Gregorian calendar on or after October 15, 1582, and make dates from October
5, 1582 through October 14, 1582, invalid.)
Peter comments that, "no function or program can be relied upon unless it is
tested thoroughly," and he includes a program, DATETEST, which presumably does
this. But this routine shows only that date_to_gdn() is the inverse of
gdn_to_date(); this is important information, and if it is false, the
functions are clearly wrong. If it is true, however, the functions are not
necessarily correct. For example, modify date_to_gdn() by replacing the line
"dt->gdn = gdn" with "dt->gdn = gdn/2", and modify gdn_to_date() by replacing
"gdn = dt->gdn" with "gdn = 2*(dt->gdn)". These functions are obviously
different from the ones given in Meyer's code and do not give the correct
values. Nevertheless, DATETEST 0 1 applied to these functions "reveals no
bugs," because the modified date_to_gdn() is the inverse of the modified
gdn_to_date().
The only actual check of date_to_gdn() would be to show that for any given
date, the function sets date.gdn to the number of days before or after October
15, 1582. This requires that either you have a way of determining that number,
which is known to be correct and which is independent of date_to_gdn(), or
that you offer a convincing theoretical proof that the routine does what it is
supposed to do. The numerous "magic numbers" in Meyer's code make this latter
alternative difficult for one who does not know the meanings of these numbers
and the significance of the operations using them. The claimed range of years,
-37,390 to 40,555, makes the first process more difficult, since there probably
doesn't exist a program which is known to give correct results for all years
in this range.
I have checked the Julian-day number and the date produced by Peter's code for
all dates from January 1, -4712 through January 1, 4000 against the values
given by a program I wrote (directly from the definition of Julian Day, with
no magic numbers needed). This check produced no errors, and exercised much,
but not all, of Peter's date_to_gdn() and gdn_to_date() code. Assuming that my
code is correct, this verifies much of Peter's code for the range tested.
Peter's code is useful for converting Gregorian dates to Julian dates and
back; no other program I know of can do so. I believe it works as claimed, but
I'd like to know for sure that it works throughout the large range he gives.
B.J. Ball
Austin, Texas


Example 1: Zigler's compression routine.


typedef struct
 {
 int cnt ;               /* count of nodes to root */
 DWORD path ;            /* bit-encoded path to root */
 } HPATH ;               /* path to root of Huffman tree */

/* ht[], hp, fi, and outbit() come from Stevens's original program */
static int pascal compress ( void )
 {
 register int c ;        /* chars from input file */
 register int h ;        /* follows path through tree */
 int ncnt ;              /* count of nodes to root */
 int child ;             /* child node of current node */
 HPATH * php ;           /* pointer to HPATH array */
 DWORD acc = 0L ;        /* accumulator for code bits */
 /* First pass: walk from each used leaf up to the root once,
    recording the length and bit pattern of its code. */
 for ( c = 0 ; c < 256 ; c++ )
  {
  if ( ht[c].cnt > 0 )
   {
   php = hp + c;
   h = c;
   ncnt = 0;
   acc = 0L;
   do
    {
    ncnt++;
    acc <<= 1;
    child = h;
    h = ht[h].parent;
    if ( child == ht[h].left )
     acc |= 1;
    }
   while ( ht[h].parent != -1 );
   php->cnt = ncnt;
   php->path = acc;
   }
  }
 /* Second pass: stream the input, emitting each character's
    precomputed code low bit first (root edge first). */
 while ( (c = getc(fi)) != EOF )
  {
  php = hp + c;
  ncnt = php->cnt;
  acc = php->path;
  while ( ncnt-- )
   {
   outbit ( (int)acc & 1 );
   acc >>= 1;
   }
  }
 return 0 ;
 }









































February, 1994
Patterns and Software Development


Adding value to reusable software




Kent Beck


Kent is founder of First-Class Software, providing consulting, tools, and
components for Smalltalk developers. He can be reached on CompuServe at
70761,1216.


Patterns are a way of developing and packaging reusable software components.
The idea of patterns is gaining attention in certain programming
circles--especially those based on object-oriented languages and paradigms. At
last fall's OOPSLA '93 conference, the foreground topics focused on mainstream
development methodologies (such as the second-generation versions of Booch,
Rumbaugh, Shlaer-Mellor, and the like), but smoldering in the background was
much discussion around patterns. This subject will likely catch fire in the
coming year.
Driving the discussion of patterns is the ongoing need to create truly
reusable software--the long-awaited benefit of OO languages and methodologies
that has yet to materialize.
In this article, I'll look at patterns as a method of guiding reuse. Although
some of this discussion may be abstract, it draws upon my ten years of
experience as a programmer and current vendor of object tools (Profile/V and
the Object Explorer).
Patterns should not be confused with methodologies. A methodology tells you
how to write down the decisions you have made. A pattern tells you which
decisions to make, when and how to make them, and why they are the right
decisions. Methodologies are free of content: Once you imagine a specific
solution to a problem, a methodology gives you the wherewithal for writing it
down and arriving at a correct implementation. By contrast, patterns are all
content.


Abstractors and Elaborators


I divide the world of software development into two parts: the abstractor,
creating reusable pieces; and the elaborator, massaging those pieces to fit
the needs of a user. Microsoft has lately been promulgating a roughly similar
vision, in which software development is divided into two categories:
component builders (for example, programmers who write DLLs or class
libraries in C or C++), and solution builders (those who use high-level tools
such as Parts, Visual Basic, PowerBuilder, or an application framework in
conjunction with low-level DLL components to construct application-level
solutions for end users). The abstractor/elaborator categorization is more
general, so I'll stick with it.
The economics of reusable software are dominated by the cost of communicating
between abstractor and elaborator. For example, if an abstractor takes 1000
hours to create a piece of reusable software, and 100 elaborators each take
100 hours to understand how to use it, then the elaborators have collectively
spent ten times as many hours as the abstractor. Obviously, these numbers are
hypothetical, but six months to create a reusable component and two-and-a-half
weeks to learn how to use it effectively are well within the realm of
possibility.
Making the abstractor more efficient (by providing, say, a faster compiler or
whizzy debugger) won't reduce the total effort spent on writing software; if
you view the abstractor and the elaborators as belonging to the same economic
domain (say, a large corporation or organization), the equation's total is
little changed. The only way to significantly affect the sum is to either
reduce the number of elaborators (a bad thing, because it implies that
software is not being reused, and thus more work is done from scratch), or
reduce the time they spend figuring out the software.
This is nothing new. The old story of maintenance taking up 70 percent of the
resources is really another way of saying the same thing. The new wrinkle is
that, when you introduce software reuse into the equation, it isn't just one
hapless programmer trying to figure out an obscure piece of code--it's
hundreds.
Constructing a software component so that it is reusable is a step forward,
but nowadays it's not enough. The abstractor needs to do more. Why should the
abstractor care? In one model of reuse, there is a development team within a
company building software components for other teams to use; in this model,
making the elaborators more efficient reduces the development resources
required. The company can then use the freed-up resources to shorten
time-to-market, increase features, reduce development cost, or improve
quality.
In another model of software reuse (the market model), reusable components are
available for developers on the open market (for example, the Visual Basic
add-on market). Here, if you are a VBX vendor (abstractor) and your customers
(elaborators) are able to produce finished applications sooner, you will have
a substantial edge over your competition.
If the time it takes elaborators to figure out reusable software is an
important issue and solving the problem has significant payback, how can we
reduce the time necessary to understand how to reuse software? What is it
that, in the hands of elaborators, would make them more successful, sooner?
Another way of asking the question is, what do abstractors know that they
aren't communicating?
What's missing is a way for abstractors to communicate their intent. The
abstractor, in building a piece of reusable software, is solving a whole set
of future problems. Indeed, most reusable software results from the experience
of being an elaborator several times, then having a flash of insight that
solves a number of elaborator problems once and for all. The abstractor needs
to communicate which problems a reusable component is intended to solve, how
to think about the problem, how this thought process is embodied in the
software, in what order to pursue subissues of the problem, and so on.
Communicating with elaborators is more important than, say, using a better
programming environment.
If you need to communicate what you were thinking about when you wrote your
reusable software, what form would such communication take? Of course there
are the usual mechanisms--a tutorial, reference manual, comments in the code,
the coding conventions used by the source (if it is available to the
elaborator), and, of course, word of mouth--bits of advice passed from guru to
novice.
Researchers and developers have been exploring another approach, which falls
under the rubric of patterns. I'll discuss the abstract definition later;
first, I'll provide a concrete example of how patterns can be used to
communicate the programmer's intent.


A Multicurrency Library


Let's take as an example a class library for handling multicurrency
transactions. There are two principal classes: a Money entity, which has a
value and a currency, and a CurrencyExchange, which can convert a Money in one
currency to a Money in another. How can you use these objects? What is the
intent behind the design? Here are three patterns that describe it. While by
no means complete, a set of 15 or 20 such patterns would provide any
elaborator a good start on reusing the library.
The Money Object Pattern 
Problem: How to represent a monetary value in a system which needs to deal
with many different currencies.
Constraints: One important concern in a system dealing with financial
calculations is efficiency--making sure the calculations run in a timely
manner and use as little memory as possible. The simplest representation of
monetary values, and one which maps well onto the hardware, is representing
them as fixed or floating-point numbers.
While you'd like your system to be as efficient as possible, you'd also like
it to be flexible. For instance, you'd like to be able to decide as late as
possible in which precision computations should occur. The rapidity of change
of most financial systems dictates that flexibility is more important than
efficiency for most applications--you can always buy faster hardware. When
you need real number crunching, you can translate from and to a representation
more flexible than simple numbers.
Another consideration, related to flexibility, is that a system handling
multiple currencies should be as simple to use as possible. Only code
concerned with creating or printing currency values should be aware that many
currencies are possible. The rest of the code should look as much as possible
like you are just using numbers.
Solution: When you need to represent a monetary value, create an instance of
Money whose value is the value you need to represent and whose currency is the
standard, three-character abbreviation (USD for United States dollars, for
instance).
The Money Arithmetic Pattern 
Problem: How can you do arithmetic with Money?
Constraints: Money arithmetic should be as simple as possible. Taking this
constraint to the extreme would lead you to allow Money and numbers to freely
interoperate, perhaps with a default currency to allow conversion of numbers
to Money.
A far more important principle than mere programming convenience is making
sure financial algorithms are correct. Restricting the combinations of values
that can operate together arithmetically can catch many programming errors
which might otherwise produce answers that seem reasonable, but are incorrect.
Solution: Send a Money the message + or - with another Money as the
parameter, or * or / with a number as the parameter. A Money will be the
result of any of these messages. Adding a Money and a number, or multiplying
two Moneys, will result in an error.
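Beck's library is Smalltalk-based; the Money Object and Money Arithmetic
patterns might be rendered in C++ roughly as follows. All names here are
illustrative, not the library's actual interface:

```cpp
#include <stdexcept>
#include <string>

// A Money pairs a value with a three-character currency code, so only
// creation and printing code needs to know that many currencies exist.
class Money {
public:
    Money(double value, const std::string &currency)
        : value_(value), currency_(currency) {}

    // Money + Money and Money - Money require matching currencies;
    // a mismatch is a programming error the library should catch.
    Money operator+(const Money &other) const {
        require_same(other);
        return Money(value_ + other.value_, currency_);
    }
    Money operator-(const Money &other) const {
        require_same(other);
        return Money(value_ - other.value_, currency_);
    }
    // Scaling by a plain number is allowed; there is deliberately no
    // overload for Money + double or for Money * Money.
    Money operator*(double f) const { return Money(value_ * f, currency_); }
    Money operator/(double d) const { return Money(value_ / d, currency_); }

    double value() const { return value_; }
    const std::string &currency() const { return currency_; }

private:
    void require_same(const Money &other) const {
        if (currency_ != other.currency_)
            throw std::invalid_argument("currency mismatch");
    }
    double value_;
    std::string currency_;
};
```

Here Money(10, "USD") + Money(5, "USD") yields a 15 USD Money, while
Money(10, "USD") + 5 simply fails to compile--the compile-time analog of the
runtime error the pattern calls for.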
The Money Print Pattern
Problem: How can you print a Money?
Constraints: The simplest possible design has a single global exchange rate.
Asking a Money to print itself would cause it to convert to the common
currency and print.
This simplest solution ignores the complexity of most financial systems, which
must deal with multiple exchange rates--some historical, some current (perhaps
kept up-to-date with currency exchanges), some projected. By specifying an
exchange rate (in the form of a CurrencyExchange), your printing code will be
slightly more complicated, but much more flexible as a result.
Solution: Print Money by sending CurrencyExchange the message "print" with
Money as an argument. Money will be printed in the CurrencyExchange's
preferred currency. There is a second message, printCurrency, which takes two
arguments. The first is the Money to be printed, and the second is the
currency (again, a three-character string containing a standard abbreviation)
in which to print it.



Patterns


As you can see, a pattern has three parts:
Problem. The first part of every pattern is the problem it solves. This is
stated as a question in a sentence or two. The problem sets the stage for the
pattern, letting readers quickly decide whether the pattern applies to their
situation.
Context. Patterns explicitly describe the context in which they are valid. The
context is the set of conflicting constraints acting on any solution to the
problem. You saw in Money an example of efficiency vs. flexibility. Other
patterns might balance development time and run time, or space and speed.
The constraints aren't just described, however. The pattern also specifies how
the constraints are resolved. Money states that flexibility and correctness
are more important than raw efficiency. Other patterns might find a balance
between two or more constraints, instead of saying that one dominates. The
aforementioned patterns really just sketch the context section. A fully
developed pattern might have two or three pages of analysis to back up its
solution.
Solution. Given the analysis of the constraints in the context section, the
solution tells you what to do with your system to resolve the constraints.
Supporting the solution is an illustration of it at work--either a diagram or
code fragments.


Patterns Form Language


Although patterns are interesting in isolation, it is when they work together,
forming a coherent language, that their power becomes apparent. A few times in
my life I've been fortunate enough to work with someone who just seems to ask
the right questions first. Rather than chasing issues that seem interesting
but are ultimately secondary, some people can zero in on the one issue at any
given moment that will allow the most progress. A language of patterns can
function in much the same way.
By choosing the order in which the patterns are considered, the pattern writer
has the chance to guide the reader in dealing with issues in the right order.
In the patterns above, I have chosen to ignore efficiency for the moment,
confident that should the issue arise later, it can be dealt with locally (I
can imagine a later pattern which tells how to temporarily suspend the
flexibility of Money to gain efficiency). In general, a good pattern language
will lead you to address issues with wide scope early, and those with limited
impact later.
How can you write your own patterns? The bad news is that applying patterns to
programming is a new enough technique that there isn't anything like a body of
experience to draw on. However, the Hillside Group has made progress with
patterns. (See the accompanying text box entitled, "Pattern Resources.")
The first step in writing a pattern is a process of discovery. You notice
yourself making the same decision over and over. You might find yourself
saying, "Oh, this is just a such and so," or, "Oh, we don't have to worry
about that now." These are the moments that you can capture as patterns.
Once you have noticed a recurring decision, you have to invent the pattern
that encodes it. First, you must catalog the constraints that make the
solution right. You will often find in exploring the constraints that you
don't quite have the solution right--either it isn't the right solution, or
you've described it too specifically or too generally. Finally, you have to
find a problem statement that will help a reader choose when the pattern is
appropriate.


A Pattern Checklist


After you have a pattern, you need to evaluate and refine it. Here is my
checklist when I'm looking at a new pattern:
Does it read well? Does it have a sense of tension and release? Two thirds of
the way through the context section of a good pattern you should be saying, "I
never thought of this problem in quite this way. Now that I see all the
constraints that have to be satisfied, I can't understand how there is any
solution." Then, when you read the solution, you should blink your eyes, drop
your shoulders, and give a sigh. Strongly literary patterns will make a bigger
impact on the reader, and are likely to be based on deeper insight and clearer
thinking than patterns that don't read like a story.
Does it tell me what to do? In the early stage of finding a pattern, I often
find that I have really only described a solution without having stated the
problem. The typical symptom of these solution-oriented patterns is that they
don't tell you what to do and when to create the solution. Solution patterns
leave the hard work to the reader--figuring out when a solution is appropriate
and how to create it. As a pattern writer, you have this information tucked
away in your head somewhere. Introspecting enough to pin it down and express
it is what will make your patterns (and the code they support) valuable.
Does it stand without being easily broken into other patterns? I have heard
"client-server" suggested as a pattern. While I can imagine a description of
it that would read well, it fails the atomicity test. There is really a
language of patterns which create client-server architectures. Somewhere in
there are the decisions that divide responsibility for computation and storage
between a shared server and multiple clients. Just saying "client-server,"
though, is too vague; it captures too many decisions to be a pattern.
Does it fit with other patterns to solve a larger problem? On the one hand, a
pattern needs to stand on its own, without being further decomposable.
However, for a pattern to be complete it must work in harmony with others to
solve a larger problem. If I can't imagine how a pattern could be part of a
larger language, either it isn't a good pattern, or other patterns are out
there waiting to be discovered.
Using patterns to enhance reuse is just one of the ways patterns are being
applied to programming.


Pattern Resources


The idea of patterns capturing design expertise originated with the architect
Christopher Alexander. His books The Timeless Way of Building and A Pattern
Language (both from Oxford University Press) are required reading for anyone
who wants to
get serious about patterns. A forthcoming Addison-Wesley book, Design
Patterns: Micro-architectures for Object-Oriented Design, by Erich Gamma et
al., catalogs some of the most common object patterns.
The Hillside Group is a nonprofit corporation founded to promote communication
to and through computers by all potential users, focusing initially on
patterns as a strategy. The founding members are myself, Ken Auer, Grady
Booch, Jim Coplien, Ralph Johnson, Hal Hildebrand, and Ward Cunningham. Our
sponsors are Rational and the Object Management Group. In August 1994 we will
sponsor the first annual Pattern Languages of Programs conference. For more
information, contact plop94@ee.pdx.edu. The Hillside Group also has a mailing
list, which you can contact at patterns-request@cs.uiuc.edu.
--K.B.

























February, 1994
Designing an Application Framework


Reusability is what's important




Grady Booch


Grady is chief scientist at Rational and author of Object-Oriented Analysis
and Design with Applications (Benjamin/Cummings, 1994), on which this article
is based. Grady
can be contacted at egb@rational.com.


A major benefit of C++, Smalltalk, and similar object-oriented programming
languages is the degree of reuse they make possible in well-engineered
systems. Achieving a high level of reuse means that you write and maintain
less code for each new application.
Reusing individual lines of code is the simplest form of reuse (all of us have
copied and pasted code from one application into another), but it offers the
fewest benefits (that code must be replicated across applications). You can do
better by specializing existing classes through inheritance. Better yet, you
can reuse whole groups of classes organized into a "framework"--a collection
of classes that provide a set of services for a particular domain.
Domain-neutral frameworks, such as general foundation-class libraries, math
libraries, GUI libraries, and the like, apply to a wide variety of
applications. Vertical application frameworks--hospital patient records,
securities and bonds trading, general business management, telephone switching
systems, and so on--are specific to a vertical domain. Wherever there's a
family of programs that solve substantially similar problems, there's an
opportunity for an application framework. In this article, I'll present a
foundation-class library design with an adaptable architecture. A number of
requirements must be met when writing such a foundation-class library. In
general, the library must provide a collection of domain-independent data
structures and algorithms sufficient to cover the needs of most
production-quality C++ applications. Additionally, it must be:
Complete. The library must provide a family of classes, united by a shared
interface, but each employing a different representation so that developers
can select the ones with the time and space semantics most appropriate to
their given application.
Adaptable. All platform-specific aspects must be clearly identified and
isolated so that local substitutions may be made. In particular, developers
must have control over storage-management policies, as well as the semantics
of process synchronization.
Efficient. Components must be easily assembled (efficient in terms of
compilation resources), must impose minimal run-time and memory overhead
(efficient in execution resources), and must be more reliable than hand-built
mechanisms (efficient in developer resources).
Safe. Each abstraction must be type-safe, so that static assumptions about the
behavior of a class may be enforced by the compilation system. Exceptions
should be used to identify conditions under which a class's dynamic semantics
are violated; raising an exception must not corrupt the state of the object
that threw the exception.
Simple. The library must use a clear and consistent organization that makes it
easy to identify and select appropriate concrete classes.
Extensible. Developers must be able to add new classes independently, while at
the same time preserving the architectural integrity of the framework.
Unfortunately, these requirements are open-ended: A library that provides
abstractions for all the foundation classes required by all possible
applications would be huge. Since problems like this could easily suffer from
analysis paralysis, you should focus on providing the most generally useful
library abstractions and services.


Domain Analysis


Your first step is domain analysis: Survey the theory of data structures and
algorithms, then harvest abstractions found in production programs. For
example, start the analysis session by organizing abstractions into
structures, which contain all structural abstractions, or into tools, which
contain all algorithmic abstractions. There is a "using" relationship between
these two categories: Certain tools build upon the more primitive services
provided by some structures.
In the second phase of the domain analysis, study the foundation classes used
in a variety of application areas (the wider the spectrum, the better). Along
the way, you may discover common abstractions that overlap with what you
encountered in the first phase: This indicates that you've discovered truly
general abstractions, so you'll keep these within the boundary of your
problem. You may also find certain domain-biased abstractions--currency,
astronomical coordinates, measures of mass and size, and the like. You can
reject these abstractions because they are either difficult to generalize
(such as currency), highly domain specific (such as astronomical coordinates),
or so primitive that it is hard to find compelling reasons to turn them into
first-class citizens (measures of mass and size, for example). On the basis of
this analysis, you can settle on the kinds of structures in Table 1.
Organizing the abstractions represented by this list is a problem of
classification. I chose this particular organization because it clearly
separates behavior among each category of abstractions.
You might settle upon the kinds of tools shown in Table 2, which are also
based upon the domain analysis. Many of these abstractions have obvious
functional variations. For example, you may distinguish among many different
kinds of sorting agents (for quick sorting, bubble sorting, heap sorting, and
so on), as well as among different kinds of searching agents (those
responsible for sequential searching; binary searching; and pre-, in-, and
post-order tree searching).


Design


Coggins's Law of Software Engineering states that, "pragmatics must take
precedence over elegance, for Nature cannot be impressed." A corollary is that
design can never be entirely independent of language--the particular features
and semantics of a language influence architectural decisions. Ignoring these
influences leaves you with abstractions that do not take advantage of the
language's unique facilities, or with mechanisms that cannot be efficiently
implemented in any language.
Object-oriented languages offer three basic facilities for organizing a rich
collection of classes: inheritance, aggregation, and parameterization.
Inheritance is the most visible (and most popular) aspect of object-oriented
technology; however, it is not the only structuring principle to consider.
Indeed, parameterization combined with inheritance and aggregation can lead
you to a very powerful, yet small, architecture.
The elided declaration of a C++ domain-specific queue class in Example 1(a) is
a concrete realization of the abstraction of a queue of events: a structure in
which you can add event objects to the tail of the queue, and remove them from
the front. C++ encourages abstraction by letting you state the intended public
behavior of a queue (expressed via the operations clear, add, pop, and front),
while hiding its exact representation. Certain uses of this abstraction may
demand slightly different semantics--you may need a priority queue, in which
events are added to the queue in order of their urgency. You can take
advantage of the work you've already done by subclassing the base queue class
and specializing its behavior, as in Example 1(b).
Virtual functions encourage abstraction by allowing you to redefine the
semantics of concrete operations (such as add) from a more generalized
abstraction. In combination with parameterized classes, you can craft even
more general abstractions. The semantics of queues are the same, whether it's
a queue of cabbages or a queue of kings. With template classes, you can
restate the original base class, as in Example 1(c). This is a common strategy
when applying parameterized classes: Take an existing concrete class, identify
the ways in which its semantics are invariant according to the items it
manipulates, and extract these items as template arguments.
You can combine inheritance and parameterization in very powerful ways. For
example, you can restate the original subclass, as in Example 1(d). Type
safety is the key advantage of this approach. You can instantiate any number
of concrete queue classes, as in Example 2. The language will enforce
abstractions, so that you can't add events to the character queue, nor
floating-point values to the event queue.
Figure 1 illustrates this design by showing the relationships among a
parameterized class (Queue), its subclass (PriorityQueue), one of its
instantiations (PriorityEventQueue), and one of its instances (mailQueue).
This example leads to this library's first architectural principle: Except for
a few cases, the classes you provide should be parameterized. This decision
supports the library's requirement for safety.


Macro Organization


Classes are a necessary but insufficient vehicle for decomposition. This
certainly applies to the class library I'm designing here. One of the worst
organizations you could devise would be a flat collection of classes, through
which developers would have to navigate to find the classes needed. It's far
better to place each cluster of classes into its own category, as in Figure 2.
This helps satisfy the library's requirement for simplicity. A quick domain
analysis suggests the opportunity for exploiting the representations common
among the classes in this library. Consequently, you assert the existence of
the globally accessible category named Support to organize lower-level
abstractions. You'll also use this category to collect the classes needed in
support of the library's common mechanisms.
This leads to the library's second architectural principle: Distinguish
clearly between policy and implementation. In a sense, abstractions such as
queues, sets, and rings represent particular policies for using lower-level
structures such as linked lists or arrays. For example, a queue defines the
policy whereby items can only be added to one end of a structure and removed
from the other. A set, on the other hand, enforces no such policy. A ring does
enforce an ordering, but sets the policy that the front and the back of its
items are connected. I'll therefore use the support category for those more
primitive abstractions, upon which I can formulate different policies.
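The policy/implementation split described above can be sketched as follows. This is a hypothetical illustration, not the library's actual code: std::list stands in for whatever primitive linked structure the Support category would supply, and the class names are my own.

```cpp
#include <cassert>
#include <list>

// Hypothetical sketch: Queue imposes a policy (add at the tail,
// remove at the head) on a lower-level Support structure. Clients
// see only the policy; the representation is never exposed.
template<class Item>
class Queue {
public:
    void add(const Item& item) { rep.push_back(item); }  // tail only
    void pop()                 { rep.pop_front(); }      // head only
    const Item& front() const  { return rep.front(); }
    int length() const         { return static_cast<int>(rep.size()); }
private:
    std::list<Item> rep;  // implementation detail from Support
};
```

A Ring or a Deque could be layered over the same representation by exposing a different subset of its operations; that is the sense in which these abstractions are policies rather than structures.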
By exposing this category to library builders, you support the library's
requirement for extensibility. In general, application developers need only
worry about the classes found in the categories for structures and tools.
Library developers and power users, however, may wish to use the more
primitive abstractions found in Support, from which new classes may be
constructed, or through which the behavior of existing classes may be
modified.
As Figure 2 suggests, you organize this library as a forest of classes, rather
than as a tree, since there's no single base class, as with languages such as
Smalltalk. Although not shown in the figure, the classes in the Graphs,
Lists, and Trees categories are subtly different from the other structural
classes. Abstractions such as deques and stacks are monolithic in that they're
treated as a single unit: There are no identifiable, distinct components.
Referential integrity is therefore guaranteed. Alternatively, a polylithic
structure (such as a graph) permits structural sharing. For example, you may
have objects that denote a sublist of a longer list, a branch of a larger
tree, or individual vertices and arcs of a graph. The fundamental distinction
between monolithic and polylithic structures is that, in monolithic
structures, the semantics of copying, assignment, and equality are deep; in
polylithic structures, copying, assignment, and equality are shallow
operations (aliases may share a reference to a part of a larger structure).
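The deep-versus-shallow distinction can be made concrete with a small sketch. This is an assumption-laden illustration (ListNode, deepCopy, and sublist are my own names, and std::vector/std::shared_ptr stand in for the library's monolithic and polylithic representations):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Polylithic: a list built from reference-counted nodes, so an
// alias can share a tail of a longer list.
struct ListNode {
    int head;
    std::shared_ptr<ListNode> tail;
};

// Deep semantics (monolithic): copying yields an independent structure.
inline std::vector<int> deepCopy(const std::vector<int>& s) { return s; }

// Shallow semantics (polylithic): the "copy" shares the underlying nodes.
inline std::shared_ptr<ListNode> sublist(const std::shared_ptr<ListNode>& l) {
    return l->tail;  // shares, rather than duplicates, the tail
}
```

Mutating through the sublist alias is visible through the original list, which is exactly why referential integrity can be guaranteed for monolithic structures but not for polylithic ones.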


Class Families



A third principle central to the design of this library is the concept of
building families of classes, related by lines of inheritance. For each kind
of structure, I'll provide several different classes, united by a shared
interface (such as the abstract base class Queue). Each class has several
concrete subclasses, each having a slightly different representation, and
therefore having different time and space semantics. In this manner, the
library's requirement for completeness is supported. You can select the one
concrete class whose time and space semantics best fit the needs of a given
application, yet still be confident that, no matter which concrete class is
selected, it will be functionally the same as any other concrete class in the
family. This intentional and clear separation of concerns between an abstract
base class and its concrete classes allows you to initially select one
concrete class and later, as the application is being tuned, replace it with a
sibling concrete class with minimal effort. (The only real cost is the
recompilation of all uses of the new class.) You can be confident that the
application will still work because all sibling concrete classes share the
same interface and the same central behavior. Another implication of this
organization is that it makes it possible to copy, assign, and test for
equality among objects of the same family of classes, even if each object has
a radically different representation.
In a simple sense, an abstract base class captures the relevant public design
decisions about the abstraction. Another important use of abstract base
classes is to cache a common state that might otherwise be expensive to
compute. This can convert an O(n) computation to an O(1) retrieval. The cost
is the cooperation required between the abstract base class and its
subclasses, to keep the cached result up to date.
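The caching arrangement can be sketched like this. The class names are hypothetical; the point is the cooperation protocol, in which the abstract base class does the bookkeeping and concrete subclasses route all mutation through it:

```cpp
#include <cassert>
#include <list>

// Hypothetical sketch: the abstract base class caches the element
// count so that length() is an O(1) retrieval rather than an O(n)
// walk of the representation.
template<class Item>
class AbstractQueue {
public:
    virtual ~AbstractQueue() {}
    virtual void add(const Item& item) { ++cachedLength; insertItem(item); }
    virtual void pop()                 { --cachedLength; removeItem(); }
    int length() const                 { return cachedLength; }  // O(1)
protected:
    AbstractQueue() : cachedLength(0) {}
    virtual void insertItem(const Item&) = 0;  // supplied by a concrete form
    virtual void removeItem() = 0;
private:
    int cachedLength;  // the cached state kept up to date by add/pop
};

// One concrete form, backed by a linked representation.
template<class Item>
class UnboundedQueue : public AbstractQueue<Item> {
public:
    const Item& front() const { return rep.front(); }
protected:
    void insertItem(const Item& item) { rep.push_back(item); }
    void removeItem()                 { rep.pop_front(); }
private:
    std::list<Item> rep;
};
```

The cost noted above is visible here: a subclass that bypassed add or pop and touched its representation directly would silently invalidate the cache.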
The various concrete members of a family of classes represent the forms of an
abstraction. You must consider two fundamental forms of most abstractions when
building a serious application: the form of representation (which establishes
the concrete implementation of an abstract base class) and the choice of
in-memory structures (is the structure stored on the stack or on the heap?).
In the "bounded" form of an abstraction, the structure is stored on the stack
and thus has a static size at the time the object is constructed; in the
"unbounded" form, the structure is stored on the heap and thus may grow to the
limits of available memory.
Because both bounded and unbounded forms share a common interface and
behavior, you can make them direct subclasses of the abstract base class for
each structure. The second important variation concerns synchronization. Many
useful applications are "sequential systems"--they involve only a single
thread of control. Other applications, especially those involving real-time
control, are "concurrent systems"--they require the synchronization of several
simultaneous threads of control within the same system. The synchronization of
multiple threads of control is important because of mutual exclusion. Simply
stated, it is improper to allow two or more threads of control to directly act
upon the same object at the same time, because they may interfere with the
state of the object, and ultimately corrupt its state. For example, consider
two active agents that both try to add an item to the same Queue object. The
first agent might start to add the new item and be preempted, leaving the
object in an inconsistent state for the second agent.
There are three possible design alternatives: sequential, guarded, and
synchronous. Each requires different degrees of cooperation among the agents
that interact with a shared object. The interactions among the abstract base
class, the representation forms, and the synchronization forms yield the same
family of classes for every structure, as shown in Figure 3. This architecture
explains why I've chosen to organize this library as a family of classes
rather than having a singly rooted tree:
It accurately reflects the regular structure of the various component forms.
It involves less complexity and overhead when selecting one component from the
library.
It avoids the endless ontological debates engendered by a "pure
object-oriented" approach.
It simplifies integrating the library with other libraries.
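A guarded form of the queue might look like the following sketch. This is an assumption on my part, not the library's code: C++ had no standard threading facilities in 1994, so std::mutex stands in for whatever platform-specific semaphore the Support category would isolate.

```cpp
#include <cassert>
#include <mutex>
#include <queue>

// Hypothetical sketch of the "guarded" form: every operation on the
// shared structure is bracketed by a lock, so two threads of control
// cannot interleave and corrupt the object's state.
template<class Item>
class GuardedQueue {
public:
    void add(const Item& item) {
        std::lock_guard<std::mutex> guard(lock);
        rep.push(item);
    }
    void pop() {
        std::lock_guard<std::mutex> guard(lock);
        rep.pop();
    }
    Item front() const {
        std::lock_guard<std::mutex> guard(lock);
        return rep.front();  // by value: a reference could dangle
    }
private:
    mutable std::mutex lock;  // mutable so const queries can lock
    std::queue<Item> rep;
};
```

In the scenario described above, the first agent's add would now run to completion before the second agent's add could begin, so the object can never be observed in an inconsistent state.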


Conclusion


Building frameworks is hard. In crafting general class libraries, you must
balance the needs for functionality, flexibility, and simplicity. Strive to
build flexible libraries, because you can never know exactly how programmers
will use your abstractions. Furthermore, it is wise to build libraries that
make as few assumptions about their environment as possible so that
programmers can easily combine them with other class libraries. You must also
devise simple, efficient abstractions that programmers can understand. The
most profoundly elegant framework will never be reused, unless the cost of
understanding it and then using its abstractions is lower than the
programmer's perceived cost of writing them from scratch. The real payoff
comes when these classes and mechanisms get used over and over again,
indicating that others are gaining leverage from the developers' hard work,
allowing them to focus on the unique parts of their own particular problem.
Table 1: Structures for the class library.
Structure Description
Bag Collection of (possibly duplicate) items.
Collection Indexable collection of items.
Deque Sequence of items in which items may be added and removed from either
end.
Graph Unrooted collection of nodes and arcs, which may contain cycles and
cross-references; structural sharing is permitted.
List Rooted sequence of items; structural sharing is permitted.
Map Dictionary of item/value pairs.
Queue Sequence of items in which items may be added from one end and removed
from the opposite end.
Ring Sequence of items in which items may be added and removed from the top of
a circular structure.
Set Collection of (unduplicated) items.
Stack Sequence of items in which items may be added and removed from the same
end.
String Indexable sequence of items, with behaviors involving the manipulation
of substrings.
Tree Rooted collection of nodes and arcs, which may not contain cycles or
cross-references; structural sharing is permitted.
Table 2: Tools determined by domain analysis.
Tool Description
Date/Time Operations for manipulating date and time.
Filters Input, process, and output transformations.
Pattern matching Operations for searching for sequences within other
sequences.
Searching Operations for searching for items within structures.
Sorting Operations for ordering structures.
Utilities Common composite operations that build upon more primitive
structural operations.

Example 1: (a) Abstraction of an event-queue class; (b) subclassing from
EventQueue; (c) restating EventQueue using a template class; (d) PriorityQueue
as a template class.
(a) class NetworkEvent...

 class EventQueue {
 public:

 EventQueue();
 virtual ~EventQueue();

 virtual void clear();
 virtual void add(const NetworkEvent&);
 virtual void pop();

 virtual const NetworkEvent& front() const;
 ...
 };

(b) class PriorityEventQueue : public EventQueue {
 public:


 PriorityEventQueue();
 virtual ~PriorityEventQueue();

 virtual void add(const NetworkEvent&);
 ...
 };

(c) template<class Item>
 class Queue {
 public:

 Queue();
 virtual ~Queue();

 virtual void clear();
 virtual void add(const Item&);
 virtual void pop();

 virtual const Item& front() const;

 ...
 };

(d) template<class Item>
 class PriorityQueue : public Queue<Item> {
 public:

 PriorityQueue();
 virtual ~PriorityQueue();

 virtual void add(const Item&);
 ...
 };

Example 2: Instantiating concrete queue classes.
Queue<char> characterQueue;
typedef Queue<NetworkEvent> EventQueue;
typedef PriorityQueue<NetworkEvent> PriorityEventQueue;


 Figure 1: Inheritance and parameterization.
 Figure 2: Foundation-class library categories.
 Figure 3: Class families.



















February, 1994
Computer-Aided Software Testing


Simulating user interaction via interclient communication




Birger Baksaas


Birger is a software consultant based in Tønsberg, Norway. He can be reached
by phone at +47 94 26 27 52 or +47 33 31 97 64 or by fax at +47 22 55 33 77.


According to the Quality Assurance Institute, as much as 80 percent of all
software is tested manually. Generally, this is expensive, boring, and
inefficient. Consequently, testing is usually kept to a minimum and often
squeezed in at the end of the development process. Yet, to be competitive, all
parts of software products must be thoroughly tested and verified.
Furthermore, quality-assurance programs are necessary to meet emerging
standards like the ISO 9000-3. (For more on this subject, see the accompanying
text box entitled "ISO 9000-3 and Software Quality Assurance.")
Test drivers, most often written to test individual routines, can be applied
to the complete application. Especially with modular designs (such as with X
Windows), it's possible to write a computer-aided software-testing system that
simulates user interactions. The system can perform a sequence of operations
predefined in script files.
The advantages of even a simple automatic test system are obvious: Repeating
a script automatically enables long-duration testing; memory leaks and other
errors that appear only after long, heavy use become easier to track down;
and rerunning the scripts facilitates regression testing.


The X Windows Approach


In X Windows, the X server is the software that manages the keyboard, screen,
and
mouse. When the user presses a button or key, the X server sends events to the
right window of the client (application). The server is responsible for
managing the hierarchy of overlapping windows on the screen and gives notice
about changes to the client's windows via events. The events are true messages
sent over an asynchronous communication channel.
The test system I'll describe here simulates user interaction by producing
events and sending them to the applications. The applications'
event-dispatching mechanism doesn't distinguish between events generated by
the server and those sent by another client; it reacts similarly to both. Most
of the code is placed in the test driver, a separate process that sends events
to all applications initialized for the test system, based on button or key
commands from a script file.
Example 1 shows the format of a typical script file. The process_name token is
the name of the application program. The widget_name is the name of the widget
that will eventually receive and act on the events. The input file is parsed
using strtok. (The process is straightforward and is therefore not detailed in
this article.) A test script for opening a file is shown in Example 2.


The Application


External test drivers keep the extra test code linked into the applications to
a minimum. The production system can then be released with that code still in
place exactly as it was tested.
Some test-related code is needed in the application because the test driver
doesn't know the IDs or internal addresses of all the application's widgets
and gadgets. The test driver sends the events to one known widget in each
application; code associated with that widget reroutes all test events to the
individual widgets within the same process.
First, let's look at initialization. During this process, the test driver is
able to pick up the IDs of the application's main widgets via two interclient
communication facilities: properties and atoms.
The main widgets of each application (in Motif, the XmMainWindow) receive all
the events from the test driver. The window IDs of these MainWindows are
stored as properties--data maintained by the X server and available to all
other processes running on the display that know the properties' names.
The window IDs of the MainWindows are obtained by calling XtWindow with a
widget structure as input. To identify properties, X uses unique resource IDs,
called "atoms" which are created by the Xlib function XInternAtom. The input
to XInternAtom is a string (in this case the name of the receiving
application) taken directly from the test scripts. The routine XChangeProperty
creates the properties, which are also associated with windows. We use the
root window because the data is shared by multiple windows and processes; see
Listing One (page 78). Listing Two (page 78) is the general routine for
sending an event from the test process to the applications.
In each application, a table is created to keep track of all of its widgets.
In this system, this array is called the WidgetTable. Symbolic widget names
are #defined (Listing Five, page 80) and used as indexes into WidgetTable.
When the widgets are created, the returned addresses are put into the array.
Apart from being necessary for this test system, WidgetTable simplifies all
later operations on widgets and gadgets and also follows the example set by
the demo programs shipped along with Motif.


The Test Driver


The test driver declares and sends ClientMessage events to the MainWindow
widget of the application processes. ClientMessage events carry a data area
which is a union structure of 20 bytes, 10 shorts, or 5 long values. By
selecting one of these data types, the events can be used to send an array
between clients on the display. In this case, we use them to give information
to the applications about the commands in the test script.
The test driver has to translate a widget name found in the test script to the
number used as an index into the WidgetTable of the relevant application. This
is done most easily by scanning a table, as shown in Listings Six and Seven
(page 80). The test driver puts the widget index into one of the locations of
the ClientMessage event's data field. The type of operation is put into the
next data-field location, and in case of key events, the keycode is put into
the third.
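The packing scheme just described can be sketched with a stand-in for the event's data area (the real XClientMessageEvent union lives in <X11/Xlib.h>; this mock merely mirrors its documented layout, and packCommand and the Operation values are my own hypothetical names):

```cpp
#include <cassert>

// Stand-in mirroring the XClientMessageEvent data union: the same
// 20 bytes viewable as 10 shorts or 5 longs (on a 32-bit system of
// the era; a modern 64-bit long widens the union, which is harmless
// for this sketch).
union ClientMessageData {
    char  b[20];
    short s[10];
    long  l[5];
};

enum Operation { BUTTON_PRESS = 1, KEY_PRESS = 2 };  // hypothetical codes

// Pack the widget index, operation type, and (for key events) the
// keycode into successive long slots, as the test driver does.
inline ClientMessageData packCommand(long widgetIndex, long op, long keycode) {
    ClientMessageData data;
    data.l[0] = widgetIndex;  // index into the application's WidgetTable
    data.l[1] = op;           // operation to simulate
    data.l[2] = keycode;      // only meaningful for key events
    data.l[3] = 0;
    data.l[4] = 0;
    return data;
}
```

On the receiving side, the event handler reads the same slots back out of the ClientMessage it is handed, which is why no further protocol between driver and application is needed.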
Then the test driver sends the XClientMessage events to the application via
the XSendEvent function call to the window ID (found as a property in the X
server; see Listing Three, page 78).
To enable a MainWidget to receive events, XtAddEventHandler must be called for
it. ClientMessage events should not be specifically selected by XSelectInput;
they are always received by the window they are sent to.
When the MainWidget receives a ClientMessage, the event handler uses the
destination widget's index placed in the data field to look up the widget
structure in the WidgetTable. The receiving-window ID is obtained by calling
XtWindow on this widget structure. The desired event type (XButtonEvent, for
example) is declared, initialized, and passed to the intended window by
XSendEvent and is therefore handled by Xlib's ordinary event-dispatch
mechanism. The receiving widgets react by calling their callback routines as
if the events were coming from the server; see Listing Four, page 78. The
application is also able to process events sent by the X server. Manual user
interaction is therefore possible, whether automatic tests are running or not.
This automatic test method gives a true simulation of user action because the
exact same code is executed in the application in both situations. A system
like this won't replace manual test methods, but it can certainly improve
their performance.


ISO 9000-3 and Software Quality Assurance


ISO 9000 was established to facilitate the international exchange of goods and
services. ISO 9000-3 (officially titled "Guidelines for the Application of ISO
9001 to the Development, Supply, and Maintenance of Software") is a subset of
the original standard established to set international guidelines for software
quality assurance. Although ISO 9000-3 addresses several issues, standardized
software testing and validation procedures are central to the proposal. It
also covers related areas, including internal audits, purchasing
specifications, training, and the like.
Generally speaking, ISO 9000-3 describes uniform software-development methods
that meet client requirements, as defined by a specification. To be ISO 9000-3
compliant, the development process must adhere to a standard set of procedures
by documenting the use of a formal life-cycle model governing the design,
development, and maintenance processes. These formal procedures are referred
to in terms of the development framework (in-house quality-assurance
programs), life-cycle activities (the overall software-development process),
and supporting activities (those necessary to qualify, conform, and confirm
that the software product was developed properly). Conformance to ISO 9000 is
certified by third-party auditors, and certification is valid for three years.
If you're interested in implementing ISO 9000 guidelines, a video course
entitled "Understanding and Implementing ISO 9000," developed by The Media
Group (Williston, VT), is one place to start. The training video, which is one
in a series of ISO 9000-related courses, details the 20 elements of ISO
9000-1, including subsets ISO 9000-2 and 9000-3, and includes practical
suggestions from companies such as Polaroid and Bachman, which comply with the
standard. For more information, call The Media Group at 800-678-1003.
As with many ISO standards, 9000-3 has gained more momentum in Europe than in
North America. However, U.S. software vendors with plans for cultivating a
global client base may find it necessary to seek ISO 9000 certification.
Likewise, implementing a certified software quality-assurance program may
also provide a competitive edge when cracking non-U.S. markets. In any event,
establishing well-defined software testing and quality-assurance programs
makes good sense for all software developers.
--editors

Example 1: Typical test script.


B[uttonPress]
<process_name>
<widget_name>
K[eyPress]
<process_name>
<widget_name>
<key>


Example 2: Sample test-script syntax.
ButtonPress myapp file-cascade
ButtonPress myapp open-button
KeyPress myapp open-file-field n
KeyPress myapp open-file-field e
KeyPress myapp open-file-field w
KeyPress myapp open-file-field .
KeyPress myapp open-file-field t
KeyPress myapp open-file-field x
KeyPress myapp open-file-field t
ButtonPress myapp open-ok-button
[LISTING ONE] (Text begins on page 36.)

void APPstoreMainID(Widget W, /* I: Widget of MainWindow */
 char *ProcessName) /* I: Name of the property */
{
 /* Code to initialize the test system within the application. Stores
 the Main Widget window ID as a property in the X server */

 Atom atom;
 Display *display;
 Window window;

 display = XtDisplay(W);
 window = XtWindow(W);
 /* Atoms are the addressing mechanism for properties */
 atom = XInternAtom(display, ProcessName, 0);
 /* Adding property for ProcessName. The property is associated
 with the root window */
 XChangeProperty(display, DefaultRootWindow(display), atom,
 XA_WINDOW, 32, PropModeReplace, (unsigned char *)&window, 1);
 }

[LISTING TWO]

int GENsendEvent(Widget FromW, /* Widget the event is sent from,
 used to get the display ID. */
 XEvent *Event, /* Event to be sent */
 char *ProcessName) /* Destination process name */
/* Description: General routine to send an event, here used to send events
 from the test process to the applications. */
{
 int ret, format;
 unsigned long nitems, left;
 Display *display=NULL;
 Window window, *retData=NULL;
 Atom atom, type;

 long eventMask = 0;

 display = XtDisplay(FromW);
 /* Get property identifier. */
 atom = XInternAtom(display, ProcessName, 1);
 /* Get the receiving window ID, stored as a property in the server. */
 ret = XGetWindowProperty(display, DefaultRootWindow(display), atom,
 0, 4, False, XA_WINDOW, &type, &format, &nitems,
 &left, (unsigned char **)&retData);
 /* Check ret: Success value is special for XGetWindowProperty, see Ref Man.*/
 if (retData != NULL) {
 window = *retData;
 /* ButtonPressMask or ButtonReleaseMask could also be used here;
 NoEventMask suffices because ClientMessage events are always
 delivered to the destination window regardless of the mask. */
 eventMask = NoEventMask;
 Event->xany.window = window;
 Event->xany.display = display;
 ret = XSendEvent(display, window, TRUE, eventMask, Event);
 XFree((caddr_t)retData);
 }
 return ret;
}

[LISTING THREE]

void TESTsendButtonEvent(int Down, /* I: TRUE if ButtonPress */
 int WidgetIndex, /* I: Destination widget number
 converted from the name read
 from the test script via a table */
 char *ProcName, /* I: Name of application read
 from test script*/
 Widget FromWidget) /* I: Main widget of test driver */
 /* Declares and sends events from the Test Driver to the applications */
{
 int eventType;
 XClientMessageEvent event;
 if (Down)
 eventType = ButtonPress;
 else
 eventType = ButtonRelease;
 event.type = ClientMessage;
 event.format = 32;
 event.data.l[0] = eventType;
 event.data.l[1] = WidgetIndex;
 GENsendEvent(FromWidget, (XEvent *)&event, ProcName);
 /* GENsendEvent is shown in Listing Two */
}

[LISTING FOUR]

#include <Xm/PushBGP.h>
void APPdistributeEvent(Widget W, XtPointer ClientData,
 XClientMessageEvent *Event,
 Boolean *ContToDispatch)
/* Receives events from the test driver and passes the events to individual
 widgets based on the message received by the test driver. This is a standard
 callback routine. Therefore the parameters are not explained. */
{
 int ret, format, widgetIndex, eventType, keyCode;
 Display *display;

 Window window;
 long eventMask = 0;
 XButtonEvent newEvent; /* Used both for button and key events. The structures
 are identical, except for the button/code field. */
 eventType = (int)Event->data.l[0];
 widgetIndex = (int)Event->data.l[1];
 if ((eventType == KeyPress) || (eventType == KeyRelease)) {
 keyCode = (int)Event->data.l[2];
 }
/* Checks should be performed to ensure that widget index is within legal
 range, skipped in this listing */
 if (XtIsWidget(WidgetTable[widgetIndex])) {
 display = XtDisplay(WidgetTable[widgetIndex]);
 window = XtWindow(WidgetTable[widgetIndex]);
 /* Position is set inside the window */
 newEvent.x = 4;
 newEvent.y = 4;
 }
 else { /* gadgets */
 display = XtDisplay(((XmGadget)WidgetTable[widgetIndex])->object.parent);
 window = XtWindow(((XmGadget)WidgetTable[widgetIndex])->object.parent);
 /* Gadgets are part of their parent widget's window. Below is code to find
 where gadget is positioned inside parent. Event is sent to parent
 in the case of gadgets. */
 newEvent.x = ((XmGadget)WidgetTable[widgetIndex])->rectangle.x + 4;
 newEvent.y = ((XmGadget)WidgetTable[widgetIndex])->rectangle.y + 4;
 }
 newEvent.window = window;
 newEvent.display = display;
 newEvent.subwindow = 0;
 newEvent.root = XRootWindowOfScreen(XtScreenOfObject(W));
 newEvent.type = eventType;
 newEvent.time = CurrentTime;
 newEvent.state = 0;
 if ((eventType == KeyPress) || (eventType == KeyRelease)) {
 /* The field below is the only difference between the event types used. */

 newEvent.button = keyCode;
 }
 else {
 newEvent.button = Button1;
 }
 ret = XSendEvent(display, window, TRUE, eventMask, (XEvent *)&newEvent);
 /* Error checks should be done on ret */
}

[LISTING FIVE]

/* Application include file which defines widget names used as
index into WidgetTable */

#define FILE_CASCADE 1
#define OPEN_BUTTON 2
#define OPEN_FILE_FIELD 3
#define OPEN_OK_BUTTON 4

[LISTING SIX]

/* Table used to find widget indexes of application App1 in test driver */

TEST_ITEM App1TestTable[] = {
{"file-cascade", 1},
{"open-button", 2},
{"open-file-field", 3},
{"open-ok-button", 4}
};
int App1TableSize = (sizeof App1TestTable / sizeof App1TestTable[0]);

[LISTING SEVEN]

/* Definition of the TEST_ITEM structure */
typedef struct {
 char *WidgetName;
 int WidgetIndex;
} TEST_ITEM;
End Listings














































February, 1994
The Black Art of GUI Testing


Automated testing in an event-driven environment




Laurence R. Kepple


Dr. Kepple is president of Segue Software. He can be reached at
kepple@segue.com or on CompuServe at 71670,467.


When developing an application with a character-based user interface (CUI),
the standard automated-test strategy is to use record/playback to drive the
application and bitmaps to validate the application's state. Graphical user
interfaces change this, however. The richness of the GUI and the complexity of
its object-oriented, message-passing paradigm have greatly increased the
complexity of the testing problem to the point that testing GUI-based software
can be as difficult as developing it. In fact, GUI testing is so technically
difficult that software developers are taking on the role of software tester
when they're asked to create tests that validate program modules and test
code. In turn, software testers are becoming test-code developers just to keep
up with the magnitude of the tasks before them. This article describes how the
shift from CUIs to GUIs affects test automation, and why programming, rather
than record/playback, is a superior solution.


The CUI Test-tool Paradigm


In the standard test-automation strategy for CUI, the tester records a
live-interaction session with the target software, later playing back the
recording using bitmaps (taken at recording time) to validate the application
state after or during playback. CUI-paradigm tools bypass the logical
information known by the GUI about application objects. Instead, CUI tools
rely on bitmaps to provide information about the application to the tester.
Both recordings and bitmaps expect application components to remain at the
same screen location over time--a fairly reasonable assumption for most CUI
applications. In the CUI environment, one app owns the screen (often writing
directly to it), the arrangement of screens is fixed, and graphical elements
such as fonts are weak or missing altogether.


Context Sensitivity


Because software recording captures a live-interaction session between the
tester and the target software, it is "context sensitive"--it captures the
context that existed when the recording was made. The total context that a
recording captures is extensive, consisting of timing, screen location, fonts,
and the like.
However, context sensitivity is a problem in GUI environments. Ironically, in
GUI software recording most of the information you record actually works
against you at playback time. Application-object attributes such as screen
location and font are constantly changing, yet a recording that captures all
of this temporary context at creation time naturally replays the same context
at playback time. Context identity between record time and playback time is a
special case in GUI environments--it can happen. But the general case is that
the playback context will be different from the creation context. The
resulting context conflict limits the usefulness of the automated recording
approach when applied to GUIs; see Figure 1.
Instead of depending on context-sensitive components such as bitmaps, the GUI
paradigm demands an approach that focuses on "logical object
functionality"--what an object essentially does, rather than how it happens to
look on the screen. For example, when given a valid filename, a typical File
Open dialog box brings up the specified file in a new window. The File Open
dialog box retains this essential functionality no matter where it appears on
the screen, no matter what system font happens to be selected, and no matter
which color scheme the user may currently have selected. A recorded test
(especially one validated by bitmaps) buries this logical object functionality
under irrelevant contextual data that relates to the temporary screen
appearance of the object. Context conflict can cause playback failure or false
indications of error; it is the single biggest obstacle to effective GUI test
automation using record/playback technology.
Test-tool manufacturers who are trying to retrofit GUI compatibility onto test
systems originally designed for CUI environments have devised several means to
cope with context sensitivity. Often, however, these strategies are complex,
error prone, and resource intensive. For example, some manufacturers
compensate for the variable screen location of objects by scanning screen
bitmaps and finding the wayward objects in their new locations.


Synchronization Strategies


The second major obstacle in adapting CUI test systems to GUI environments is
synchronization. Test-tool manufacturers have circumvented the problem in two
ways. In the first, the tester may direct the test tool to "sleep" at various
points during the test. These sleep intervals break a recording into short
spurts of activity surrounded by long periods of inactivity. Thus, timing
differences between a recording's creation and playback contexts are
obliterated by the long waits. But hardcoding timing assumptions into
automated tests is a poor practice, leading to failure-prone, unmaintainable
code. In addition, this approach dramatically decreases the speed of automated
testing.
Another strategy for overcoming timing problems is to use bitmaps to "pace" a
recording at playback. Pausing playback until the application's screen matches
a stored bitmap compensates for inherent timing incompatibilities. Like the
"sleep" strategy, however, bitmap pacing dramatically slows testing speed.
Bitmaps are large objects, and constant bitmap loads and compares are
expensive, slow operations. Since bitmaps are highly context sensitive,
changes to screen appearance render stored bitmaps useless for pacing.
Therefore, automation that depends on rigid stability of screen appearance
puts those who depend on it at great risk in real-world GUI projects.
Both the sleep and bitmap pacing strategies make it impossible to use the
resulting test automation for performance testing. This is because both
approaches deliberately slow the target application down so much that the
inherent timing incompatibilities between record and playback are overwhelmed.
Consequently, it's impossible to use such automation to time the performance
of the target or to see how fast it can process input.


Programming a Response


While traditional test tools grind away at bitmap analysis, the GUI holds the
very information the test tool needs--the current location of the desired
object. A simple call to the GUI can determine the current location of this
screen object, but using this and similar strategies means rethinking the test
tool in terms of the GUI paradigm.
Programming languages such as C and C++ provide facilities to name GUI objects
and drive and validate their operation. In my case, however, I've written a
higher-level language called "4Test" (part of my QA Partner test tool). 4Test
is an object-oriented language that interacts with GUI objects via a class
library that defines the properties and methods associated with each class of
GUI object. 4Test uses a suite of GUI drivers that turn the logical test
actions requested by the test programmer into the object- and GUI-specific
event streams needed to drive and validate the tests. Before acting on an
object, the GUI driver asks the GUI for the current location of the target
object. Since the GUI is the ultimate authority on object location in real
time, the test tool always knows where to find an object.
Checking for the object's current location also allows the test tool to
perform positive object identification. This means that the test tool will
perform, in effect, an assertion check on each object named in the test
program. Is it available? Is it in the right state for the desired test
action? Even a simple click on the OK button of a dialog box involves
extensive state validation to determine whether the right dialog box is up and
whether the OK button is clickable or grayed out. This powerful and automatic
state-checking mechanism is an invaluable aid to testers drilling down through
layers of menus and dialog boxes in complex GUI applications.
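The assertion check described above can be sketched in plain C. This is a
hypothetical illustration of the idea, not QA Partner's actual mechanism; a
caller-supplied is_ready predicate stands in for real toolkit queries (such
as checking that the right dialog box is up and its OK button is clickable),
and the timeout models the interval beyond which the test program awakens
with an error:

```c
#include <assert.h>
#include <time.h>

/* Predicate standing in for GUI queries on the target object. */
typedef int (*ReadyFn)(void *object);

/* Poll the GUI until the object is available and in the right state
   for the desired test action, or until timeout_sec expires.
   Returns 1 if the object became ready, 0 on timeout. */
int waitForObject(void *object, ReadyFn is_ready, double timeout_sec)
{
    clock_t start = clock();
    while (!is_ready(object)) {
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (elapsed > timeout_sec)
            return 0;   /* test program reports an error */
    }
    return 1;           /* safe to act on the object now */
}
```

A real tool would be notified by the GUI rather than busy-polling, but the
contract is the same: the test action proceeds only once the object has
passed its state check.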


Eliminating Synchronization Problems


The two-tiered architecture, composed of the test-program language process and
the driver process, also allows GUI-paradigm test programs to be event driven.
After the test program requests an action against an application object (a
click on an OK button, for example), the process is suspended until the GUI
notifies the test driver that the target object is now available and that the
desired test action was successfully executed. This event-driven architecture
eliminates the synchronization problem. The tester simply decides on an
acceptable time-out interval beyond which the test tool should not wait for an
object to become available. After that interval expires, the test program
awakens with an error.
This triggered-on-object synchronization frees test programs from timing
dependencies. The same GUI test suite that runs on a 25-MHz machine will run
unchanged on a 66-MHz PC, making it possible to reuse a GUI application's
test suite across a wide range of systems as part of a standard
system-validation process. By accessing GUI objects solely
through the medium of the GUI, event-driven test programs are safe at any
speed.
The event-driven approach also allows any regression suite to become a
performance test without any additional work. By simply setting the time-out
interval to a desired threshold and rerunning the regression suite, a tester
can determine if system response time, at any point during the test, falls
below the specified threshold.


Conclusion



Effective software development for GUI environments requires tool-supported,
automated testing strategies grounded in the GUI paradigm. Tests should be
event driven and focused at the level of logical object functionality, not
temporary screen appearance. Test portability should be a major concern and
will pay off handsomely with increased reusability of tests across both GUI
and hardware boundaries.


Capture/Playback Techniques




George J. Symons




George is vice president at Software Research and can be contacted at
symons@soft.com.


While Windows-based applications have become the norm, they have complicated
testing. User interfaces now give users more aesthetic options, but because
those options can be invoked in any order, the testing environment is far
more complex; inconsistencies across platforms are now possible in colors,
fonts, screen size, and general look and feel.
Capture/playback tools can be operated in a variety of modes, and no vendor
implements all of these modes today. It is important to understand the
strengths and weaknesses of each mode because testing is not a single task--it
is a process that goes on throughout the life of an application, and each mode
has its benefits at different times during that process. The following are the
three capture/playback modes.


True Time


With true time, keyboard and mouse inputs are replayed exactly as recorded by
the tester. Playback timing is duplicated from the server's own timing
mechanism, allowing tests to be run as if executed by a real user. The results
of the tests indicate any variance from the baseline cases, permitting the
tester to determine the implication of those differences. Therefore, if a
button moved to a different location in the window, it would be flagged as an
error, and the tester must then determine its significance. For instance, the
movement of a button will affect documentation, even though the program still
runs as it did before.


Character Recognition


Character recognition allows the test to search for items that may have moved
or fonts that may have changed since a previous version of the application was
tested. Character recognition helps extend the life of a test script by
allowing it to adjust for minor changes in window layout or fonts being used.
The downside of character recognition is that it requires some additional time
to create the scripts. It also may pass a test even if an error should have
been reported. In this case, a moving button may not be caught, and the
documentation will go out unchanged. Character recognition can also be used to
take a portion of a screen image and convert it to ASCII characters to be
saved in a file for printing or comparing with other values as part of the
test-verification procedure.


Widget Playback


The final mode is widget, or object-level, playback. With widget playback, the
X and Y coordinates on the screen are no longer significant, as the
application's widgets are activated directly. Widget testing is the only
reasonable way to do portability testing.
The same test script can run on multiple hardware and operating-system
platforms. Such tests will not check for GUI correctness, but will check that
the application's engine ran successfully. With widget testing, tests might
pass despite conditions in which a user could not operate the application
interface, such as a command button being hidden behind a window. Therefore,
even if widget testing has been run, it is still important to do user-level
testing, either manually or with the true-time capture/playback mode.
 Figure 1: Record/playback context conflict.






















February, 1994
Software Testing Cycles


Overcoming testing bottlenecks




N. Scott Bradley


Scott is director of technical marketing at Mercury Interactive and can be
reached at 408-987-0100 or scott@merc-int.com.


Over the past decade, application programmers have seen a dramatic improvement
in GUI development tools as we've moved from manual coding and API programming
to GUI builders, application generators, and user-interface management
systems. Although end users have benefited tremendously from enhanced
application features and program size, quality-assurance (QA) engineers must
test an order of magnitude more code. Consequently, testing is often a
bottleneck, as each code change, enhancement, bug fix, and platform port
necessitates retesting the entire application to ensure a quality release.
The key to testing more code in less time is a clearly defined software test
cycle. The test cycle I'll discuss in this article is composed of four steps:
test generation, playback, verification, and reuse (maintenance and porting).


Test Generation


Test generation is the only phase still driven entirely by human developers,
even when an automated test tool is involved. With test generation, there are
two methods for creating tests--programming and recording (capturing)--and two
levels of representing test commands--analog and object oriented.
Programming is most frequently used by testing experts. QA engineers who are
not constrained by time can use programming to create robust test suites,
often beginning test development before the application is finished.
Recording, on the other hand, is used by application experts. This category
includes QA engineers less proficient at programming, customer-support
engineers, and beta-site customers who have neither the time nor the expertise
to program tests. Recording is also used by developers and porting groups who
work under short time constraints.
Programming is the only way to create the general framework or structure of a
test suite; it allows tests to be generated before the application is ready.
Creating programmed tests at the object level enables the test tool to query
widget and object information such as existence, state, and contents. However,
this approach requires the test developer to program the physical widget
descriptions and keep track of the UI class hierarchies and state-handling
mechanisms--similar to handcoding the GUI.
The main drawback is the long test-development time required. All of the UI
internal-class hierarchies and state-handling mechanisms must be included in
the programmatic test script. Testing something as simple as selecting an item
from a menu or clicking a button results in lengthy and awkward test scripts
that are complex to maintain. Developing UI tests using a programming-only
tool requires the test developer to reprogram the tests or create aliases and
huge mapping files for each platform. The complexity involved in maintaining
these test suites can be as great as that of developing the initial
application code. Other shortcomings include problems in code maintainability,
complexity of use, and necessary expertise level.
Analog recording--recording mouse movements and keyboard strokes--is the
simplest and most productive way to generate tests. The drawback here is that
the tests are position-dependent: Each movement of the mouse is to a specific
X-Y location. Thus, object-oriented recording was developed, in which test
scripts are recorded and reverse-engineered at the widget level, rather than
at the analog level. While the user interacts with the application, his
actions are automatically mapped to application functions; see Figure 1.
Object-oriented recording by itself does not solve the problem of reuse,
however. To ensure test reusability, the test tool must use what I call the
"data-model" approach, which separates the GUI description from the body of
the test. If, for example, a menu-item label changes from "Open" to "Start," a
single modification to the data model causes all tests to be automatically
remapped by the testing tool. By contrast, the "code-model" approach, used by
most commercially available test tools, requires physical descriptions to be
manually programmed and changed throughout the test script as the application
changes. The test script must be constantly modified and recompiled since the
GUI elements and their descriptions are embedded in the actual test script.
Aliases and large, complex mapping files can be created, although they must be
updated as well. The code model does not differentiate between the logical and
physical descriptions, and it creates a test script that is very difficult to
read and maintain.
The data-model approach separates the test script from the GUI data, which can
change from release to release and between platforms. It enables the logical
name information to be separated from the physical description. The logical
data is a clear description of the GUI object, while the physical data is a
detailed list of the object's attributes used for unique identification of the
object. The test tool is responsible for resolving logical and physical
descriptions during run time. The GUI test tool automatically remaps the rest
of the test suite. The data model provides a basis for the automation of GUI
testing and makes the test script more readable.
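The data-model separation described above can be sketched as a simple lookup
table. The GuiMap entries and resolvePhysical helper below are hypothetical
illustrations (loosely modeled on the article's MSW_* examples), not any real
tool's format:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The GUI map: logical names used by test scripts, paired with the
   physical attributes that uniquely identify each object at run time. */
typedef struct {
    const char *logical;    /* name written in the test script */
    const char *physical;   /* attribute list identifying the object */
} GUI_MAP_ENTRY;

static GUI_MAP_ENTRY GuiMap[] = {
    {"open", "[MSW_class:button, label:open, MSW_id:1376]"},
    {"ok",   "[MSW_class:button, MSW_id:1445]"}
};
static const int GuiMapSize = sizeof GuiMap / sizeof GuiMap[0];

/* Resolve a logical name to its physical description. A label change
   (say, "Open" becoming "Start") is a single edit to GuiMap; no test
   script changes. Returns NULL if the name is unmapped. */
const char *resolvePhysical(const char *logical)
{
    int i;
    for (i = 0; i < GuiMapSize; i++)
        if (strcmp(GuiMap[i].logical, logical) == 0)
            return GuiMap[i].physical;
    return NULL;
}
```

Because scripts mention only logical names, porting to a new platform means
relearning the physical side of the map while every test script stays intact.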


Test Playback


An automated test tool must be able to replay unattended overnight, and it
must replay correctly 100 percent of the time. If the test suite loses
synchronization with the application after the first few tests, then all
subsequent results will be erroneous and the benefits over manual testing will
be lost.
The problem of synchronization is partially solved by working at the object
level, since tests can keep pace with the application by determining when
windows and widgets become active and inactive. This method relies on the
operating-system mechanics, assuming that when the test tool gets processing
time from the operating system, the application is ready to receive more test
inputs. Working at the object level solves synchronization problems for most
applications running stand-alone on a non-preemptive operating system.
Unfortunately, this approach fails with a true multitasking operating system
or on the server side of a client/server application.
The solution to testing client/server systems is output synchronization, which
keeps pace with the output by "watching" the screen for visual cues, enabling
the tool to emulate a human tester and avoid timing errors. The test tool
monitors the application, waiting for data from the server or for a response
such as "query complete" before continuing, thus maintaining the integrity of
the test suite and the accuracy of the results.


Verification


Speed in developing and updating a set of baseline results is critical, as
the baseline contains the expected responses against which the test tool
compares all future results. A test tool should have built-in baseline
acquisition and
verification; the baseline acquisition should be automatic and in conjunction
with the initial test development. Any new results required should be
automatically updatable. A test tool also needs multimethod verification,
which enables the testing of graphics, text, files, and widgets and objects.
Bitmap verification, also known as "image verification," is the only effective
means of testing a graphically oriented application. The test tool compares
expected and actual areas of the GUI, representing the difference as an
overlap (exclusive-OR) of the two images.
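The exclusive-OR comparison can be sketched over raw pixel buffers. This is a
minimal illustration, assuming the two captures are the same size and format;
the bitmapDiff helper is hypothetical, not a real tool's API:

```c
#include <assert.h>
#include <stddef.h>

/* Bitmap verification sketch: XOR the expected and actual pixel
   buffers byte by byte. Any nonzero byte in diff marks a difference,
   so the diff image itself shows where the two captures disagree.
   Returns the number of differing bytes (0 means the images match). */
size_t bitmapDiff(const unsigned char *expected,
                  const unsigned char *actual,
                  unsigned char *diff, size_t nbytes)
{
    size_t i, ndiff = 0;
    for (i = 0; i < nbytes; i++) {
        diff[i] = expected[i] ^ actual[i];
        if (diff[i])
            ndiff++;
    }
    return ndiff;
}
```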
Text verification, also known as "value verification," tests the values in an
application, not just the GUI. It uses a form of optical character recognition
(OCR) to "read" the bits on the screen and translate them into alphanumeric
characters. This is the only way to test the server side of a client/server
application.
File verification validates input and output data files processed in the
background and not sent to the GUI level.
GUI-level verification reads the window and widget resources to check the
existence of a particular field/button and/or determine its contents. The
state of a widget (active/disabled) can be checked also. This method works
well for standard GUI elements on the client side of client/server
applications.
All four methods are necessary for complete application testing.


Test Reusability


Maintaining test code requires substantial time and effort for upkeep. As a
result, reusability--immunity to GUI changes in the application--is critical
in successful application testing. In a new version of an application,
buttons, widgets, and objects may have moved, and items may have been added or
removed from menus. Much as GUI builders automated GUI design (tracking
user-interface descriptions, object-class hierarchies, and state handling),
GUI Test Builders automatically accommodate changes in the user-interface
hierarchy via a dynamic GUI map.
For example, translating a GUI-based software application from English to
French changes all buttons, labels, and menus, but the underlying application
functionality remains the same. Manually programmed tests require the test
developer to check through hundreds or thousands of test scripts, finding and
replacing each button, label, and menu name; see Figures 2(b) and 3(b). The
GUI Test Builder lets the test designer make logical-name changes once. The
tool automatically remaps the changes to the physical attributes that identify
the GUI objects. The new logical names are remapped to constant physical
attributes (such as [MSW_class: button, MSW_id:1445, MSW_style:BA0000,
owner:'WinBurger']). The resulting test script is clear and concise; see
Figures 2(a) and 3(a).
Path-specific programmed tests must be rewritten to port an application from
one platform to another (Solaris to Windows, for example). All the physical
pathnames must be changed throughout the complete suite of tests. By contrast,
the GUI Test Builder lets each window of the ported application be "learned"
automatically; then the original logical names can be mapped directly to the
new physical names. If, on the Solaris platform, "open" originally mapped to
[X_class: vcontrolToggle, X_attachedname: OK, X_path: table_main; work_area;
frame_22;option_Btn; open], on the new Windows platform it may map to
[MSW_class:button, label:open, MSW_id:1376, MSW_style:BB0000,
owner:"WinBurger"]. The GUI test tool will make the platform-dependent changes
in the GUI map automatically. When the test script is replayed, it reads the
logical names of the objects and uses the GUI map to translate the logical
names into the physical descriptions. The physical descriptions are then used
to identify the GUI objects, allowing the test scripts to be independent of
the platform.
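The remapping mechanism described above can be sketched as a small lookup table in C. This is an illustrative sketch only: the structure, function name, and attribute strings below are invented and are not the actual tool's API.

```c
#include <string.h>

/* Hypothetical sketch of a GUI map: each logical name used in a test
   script is bound to a platform-specific physical description. The
   entries and attribute strings are invented for illustration. */
struct GUIMapEntry {
    const char *logical;   /* name the test script uses */
    const char *physical;  /* attributes identifying the real object */
};

static struct GUIMapEntry gui_map[] = {
    { "open", "MSW_class:button, label:open, MSW_id:1376, owner:WinBurger" },
    { "save", "MSW_class:button, label:save, MSW_id:1377, owner:WinBurger" },
};

/* Translate a logical name into its physical description, or NULL
   if the name is not in the map. */
const char *map_lookup(const char *logical)
{
    size_t i;
    for (i = 0; i < sizeof(gui_map) / sizeof(gui_map[0]); i++)
        if (strcmp(gui_map[i].logical, logical) == 0)
            return gui_map[i].physical;
    return NULL;
}
```

Porting then reduces to replacing the physical strings in the map; every test script keeps using the unchanged logical names.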



Conclusion


Today's automated software-testing tools can automate construction and
maintenance of all GUI-related information during testing. They support visual
management of GUI tests, interactive editing for generating and manipulating
GUI descriptions, automatic porting of GUI test files across multiple
platforms, and the like. With tools like these, application testers can
finally match the gains achieved by application developers who use GUI
builders.

Figure 1: (a) Object-oriented; (b) analog scripts.
(a) menu_select (File;Open)

(b) move_loc_abs (120,230);
 button_press ("Left");
 move_loc_rel (0, 25);
 button_press ("Left");

Figure 2: (a) Object-oriented recording; (b) programming.
(a) menu_select ("File;Open");

(b) const WAPP_WND = "/[WndBorder]WinBase - (Point.rev)";
 const WAPP_MENU = "{WAPP_WND}/$Menu";
 const FILE_MENU = "{WAPP_MENU}/File";
 const OPEN="{FILE_MENU}/Open";
 MenuGrab (OPEN);


Figure 3: (a) Object-oriented recording; (b) programming.
(a) button_press ("Ok");

(b) const OPEN_WND = "/[WndBorder]WinBase - (Point.rev)/[DialogStyleBox]Open
Test";
 const OK_BUTTON = "{OPEN_WND}/[PushButton]OK";
 PushButtonClick(OK_BUTTON);





February, 1994
Signal Analysis via the Bootstrap


A resampling algorithm for error estimation




Gary McGrath


Gary has a BS in physics from the University of California at Irvine and a PhD
from the University of Hawaii. He is currently doing research work at the
Fermi National Accelerator Laboratory and can be reached at gary@master.ps.uci.edu.
Many quantities are natural variables to examine statistically but are useless
as test statistics because their error isn't easily estimated. Still, many
possible test statistics are easily computed. For example, computing the
median value of a data set requires little more than sorting the data and
finding the middle value. The second moment of a distribution is calculated by
simply summing the square of the difference between each value and the mean
value. These types of quantities are frequently used as some sort of decision
criteria, but a single value itself does not allow rejection of a hypothesis.
Instead, hypotheses are rejected based on confidence intervals; however,
calculating the confidence interval for the median or moments is extremely
difficult using conventional techniques. What we need is a calculable and
accurate estimation of probability distributions for complex quantities.
Conventional error estimation requires assumptions about the shape of the
distributions involved. Much effort has been dedicated to formulating more
precise approximation methods for error estimation and propagation, but they
inevitably rely on many assumptions. For example, if C=A+B and A and B are
sets of measured data, the standard method for estimating the value and error
of C is to calculate the mean values for A and B and use those values to
calculate the mean value for C. Then, the error on mC is found with the
formula in Example 1, where s^2_AB is the covariance between the two
quantities and s^2_A and s^2_B are the variances. We use A and B to calculate
the confidence
interval for C, and the distribution of C is assumed to be Gaussian. These
formulas are valid only under certain assumptions about the distributions of A
and B. Furthermore, calculating the confidence interval with standard
techniques asserts the probability distribution for C and uses the calculated
mean and standard deviation.
Systematic errors resulting from bad assumptions are propagated throughout the
calculation, leaving an ambiguous result. Accurate error propagation using
conventional methods is a delicate process, sensitive to assumptions. If the
true distribution is not approximated well by a Gaussian, but is instead
skewed with a long tail, treating that distribution as a Gaussian induces a
large systematic uncertainty in the final result. Many analyses are proven
faulty due to bad assumptions and thereby cast doubt on all tenuous results.
The significance required for a formidable result has increased throughout the
years to a level that precludes many experiments.
Proper signal analysis requires a technique which minimizes assumptions. If
the final result is calculated without assuming distribution functions or
using crude approximation methods for propagating the error through an
equation, the calculated significance is reliable. Such a method allows the
analysis of tenuous signals because a borderline result lacks calculational
uncertainties. It exactly states the probability of the result occurring
randomly without many assumptions convolved into the calculation.
When Efron extended the Jackknife technique to form the Bootstrap technique
(see "References"), he provided a method to easily propagate errors through
complex calculations, resulting in an estimate of the error with few
assumptions. The Bootstrap is a resampling algorithm that estimates the error
on quantities by resampling the data in random ways. The astrophysics
community adapted this technique to search for tenuous signals in
multidimensional data, in which the mechanisms and local efficiencies are not
well understood. Generating a background via the Bootstrap allows a sensitive
analysis without assumptions convolved into the calculation. The measured data
is assumed to be a discrete sampling of the background distribution and can
thus be used to calculate the background. By using the data itself to
calculate the background, local efficiencies and systematic uncertainties are
directly incorporated into the calculation. This technique provides a way to
analyze signals without assuming efficiency functions or probability
distributions. When it's crucial to detect signals without many assumptions
convolved into the calculation, the Bootstrap provides a sensitive technique.


The Bootstrap


Signals are often regarded as a correlation between measured quantities;
however, the most basic signal emerges as a measured value that differs from
our expectation. The Bootstrap provides signal detection by generating the
probability distribution for the measured value. Put the data through the same
process many times, but sample the quantities with replacement from the data
each time. This process will generate many estimates, which are binned to make
a distribution. Use that distribution to test the significance of the result.
Instead of knowing perfectly the components needed to calculate the
background, the data itself is used to generate the background and thereby
accounts for local biases.
The Bootstrap begins by sampling values randomly with replacement from the
data to form a fake set. Suppose, for example, that the data is 2,5,7. You
select three random numbers between 1 and 3, say, 1,3,1. Then you choose the
data values at the location pointed to by the random numbers to form the fake
set 2,7,2. Use this data to calculate the test variable. Each time the
procedure is repeated, a different background estimate results. As the
different values are found, bin them into a distribution and keep running
totals for mean and variance calculations.
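The resampling step just described can be sketched in a few lines of C. This is a minimal sketch: rand() stands in for a better generator such as ran1, which the article recommends for real work.

```c
#include <stdlib.h>

/* One Bootstrap resample: fill fake[] by drawing n values from data[]
   with replacement. rand() is a stand-in for a well-tested generator
   such as ran1 from Numerical Recipes. */
void bootstrap_resample(const double *data, double *fake, int n)
{
    int i, index;
    for (i = 0; i < n; i++) {
        /* uniform index in [0, n-1] */
        index = (int)((double)n * rand() / ((double)RAND_MAX + 1.0));
        fake[i] = data[index];
    }
}
```

With the example data 2,5,7, one call might produce the fake set 2,7,2; each fake set then yields one estimate of the test variable.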
The Bootstrap recognizes that the data is a discrete sampling of the
background distribution; thus, the background distribution can be built from
the data. Binning these background estimates forms the entire probability
distribution for the measured variable. This distribution tests the
statistical significance of the measured value. Probability theory states that
integrating the probability distribution from a value to infinity measures the
probability of randomly obtaining that value. So, with the entire distribution
in hand, find the significance by integrating the distribution out to the
expected value.
If the distribution is approximately Gaussian, it may be more convenient to
express the results using the mean and standard deviation rather than the
entire distribution. It is therefore recommended to keep the running totals
necessary for mean and variance calculations. Just remember, expressing the
results in this way adds another assumption into the process. It's preferable
to express the random-occurrence probability, since the mean and standard
deviation suppose a Gaussian distribution.
When doing conventional theoretical background calculations, the distributions
of the input quantities are previously measured or calculated theoretically,
and those distributions are used to obtain discrete values for the
calculation. Furthermore, some sort of noise is added to make the calculation
more realistic. Using the measured quantities in hand that represent a
sampling of that distribution with the correct noise and errors included is
more economical than performing a conventional background calculation.


Implementation


As with any Monte Carlo calculation, the Bootstrap technique relies heavily on
a pseudorandom-number generator. Since this calculation supposedly removes
correlation, it's essential that no correlation exist in the random numbers
out of the generator. To achieve this, you must use a good random-number
generator, which eliminates most system-supplied generators. These generators
tend to have a slightly nonuniform distribution and often have bad
time-occurrence correlations. This is undesirable for small signals, but
tolerable for everyday calculations. In this article, I use ran1 from
Numerical Recipes, which does well, but is much slower than most system
routines. Whichever generator you use, it must be thoroughly tested for both
uniform distribution and uncorrelated time occurrence. Knuth's spectral test
is great for detecting correlated occurrence.
The error on the mean background estimate converges slowly, necessitating many
repetitions of this procedure to get a precise estimate. Statistics show that
the error of the mean of a sampled distribution is sigma = s.d./sqrt(N), where
N is the number of points in the sample. Therefore, the calculated mean
approaches the "true" mean as 1/sqrt(N). For a data sample of 100 numbers, the
Bootstrap yields an estimate with roughly 10 percent error. For an accurate
value, repeat the
Bootstrap procedure many times. There is one exception to this rule: The fewer
the data points, the smaller the total number of permutations. Generating many
more estimates than there are permutations is wasteful. Repeating too many
times calculates the same numbers repetitively, doing little to enhance the
accuracy of the calculation.
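To make the permutation limit concrete: drawing n points with replacement can produce at most n^n distinct ordered fake sets, so a 3-point sample admits only 27. A small helper (illustrative only; the name is invented) counts them in floating point to avoid integer overflow:

```c
/* Count the ordered fake sets obtainable by sampling n points with
   replacement: n^n. Beyond this count, further Bootstrap repetitions
   merely regenerate resamples already seen. */
double num_ordered_resamples(int n)
{
    double total = 1.0;
    int i;
    for (i = 0; i < n; i++)
        total *= (double)n;   /* accumulate n^n */
    return total;
}
```

For the 20 grades in the first example, 20^20 vastly exceeds the million repetitions used, so repetition is not wasted there.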
With the probability distribution in discrete form, the confidence intervals
are easily calculated. Instead of integrating the binned distribution, do the
integral as the estimates are generated. Counting the number of background
estimates above and below the test value, P=Nabove/(Nbelow+Nabove),
immediately gives the probability for the value to occur randomly greater than
the mean background estimate. If the estimates are binned, the probability is
also available from the histogram, but the square shape of the bins causes a
slight loss of precision. That error could be reduced with other methods, like
trapezoid integration, but the counting method is clearly the best.
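The counting method can be sketched as a small helper. The array form here is only for illustration; the article's listings avoid storing the estimates at all by keeping a running numOver count as each estimate is generated.

```c
/* Counting significance: the one-tail random-chance probability is the
   fraction of background estimates at or above the measured test value,
   P = Nabove / (Nabove + Nbelow). */
double one_tail_prob(const double *estimates, int n, double test_value)
{
    int i, n_above = 0;
    for (i = 0; i < n; i++)
        if (estimates[i] >= test_value)
            n_above++;
    return (double)n_above / (double)n;
}
```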
The first example demonstrates the original Bootstrap technique by calculating
the median and its error for a measured set of grades for a hypothetical
university course called Intro to Computer Science 101. If an administrator
believes a well-run class results in a median grade of 50 percent, determining
if ICS 101 is taught well is very difficult using conventional
techniques--unless the median is exactly 50 percent. Since there's only one
sample, the error on the median grade must be determined to see if the median
grade is statistically different from 50 percent. Since a median higher or
lower than 50 percent is tested for, a two-tail confidence interval is
appropriate. The grades for ICS 101 are as follows: 45, 67, 73, 27, 44, 96,
49, 38, 51, 63, 11, 87, 67, 46, 76, 57, 53, 61, 57, 64. Figure 1 shows the
distribution of grades. The median grade is 57. The program in Listing One
(page 81) calculates the median grade distribution (also shown in Figure 1)
using the Bootstrap technique. The median grade for ICS 101 had a 13 percent
probability of occurring randomly, well above the 5 percent significance
level chosen by the administrator, so the median cannot be judged
statistically different from 50 percent.


Correlation Signals


Signals in multidimensional data appear as a significantly large measurement
correlated with particular values for other measured quantities. To detect the
significance or strength of the signal, it must be compared to the background.
Conventional testing schemes assume that the background follows some fit
function, and the errors are given by an assumed probability distribution.
Otherwise, they calculate a theoretical background and assume the error on the
measured values. These techniques are appropriate for large signal-to-noise
ratios, but untrustworthy otherwise. When searching for small signals in a
noisy background, extending the Bootstrap to correlated signals provides a new
method that relies on fewer assumptions to calculate the background
distribution.
Signals in multidimensional data are seen by a correlation in measurements.
The most prevalent examples are signals seen by an increase in one variable
when another variable is changed to a particular value. For example, if a
directional microphone were turned through 360 degrees and there happened to
be a loud car at 37 degrees, the decibel meter would peak at 37 degrees. Thus,
there's an interdependence between the angle and decibel measurements.
Conventional techniques fit the background to a function with the supposed
signal removed. Unfortunately, the shape of the fit function often correlates
strongly with the choice of where the signal begins and ends. The error on the
background-map points is then assumed to follow a distribution with a
"reasonable" width; if the data is counting data, the map points have a Poisson
distribution. In any event, the end result is untrustworthy for small signals.
Some experiments attempt to calculate the background theoretically with some
sort of noise added to make the calculation more realistic. This technique is
usually the least sensitive possible, since few of the fundamental processes
are understood well enough to yield a calculation with small systematic
uncertainty.
Searching for small correlation signals in noisy data and systematic biases
lends itself to a variation in the Bootstrap technique; the result is a new,
powerful method of signal analysis. Instead of sampling the distribution with
replacement, you randomly shuffle one quantity with respect to the others to
eliminate the correlation. As before, repeat this process many times to build
up an estimate of the background distribution at each point on the map. The
distributions have the signal included in them and thus give a conservative
estimate. This process is also better for large data sets than traditional
sampling because the events are shuffled in the same memory location rather
than allocating new memory for the sampled events.
Remove the point-by-point correlation by considering the two sets of measured
quantities separate and independent. Because they're independent, the values
can be shuffled into any random order. This technique essentially averages
over one variable while including the biases in the other variables. To sample
uncorrelated events, it's efficient to shuffle the values with respect to each
other using a technique developed by Knuth. After shuffling, read the arrays
of data sequentially for each point. Reading the entire shuffled data set and
making a background map provides one background estimate.
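The shuffle step can be sketched as follows. Note that the index j must start at the last valid array index, n-1, so the swap never reads past the array; rand() again stands in for a generator such as ran1.

```c
#include <stdlib.h>

/* Knuth (Fisher-Yates) shuffle: walk j from the last index down to 1,
   swapping element j with a uniformly chosen element in [0, j]. Each
   permutation of a[] is then equally likely. */
void knuth_shuffle(double *a, int n)
{
    int j, index;
    double temp;
    for (j = n - 1; j > 0; j--) {
        /* uniform index in [0, j] */
        index = (int)((double)(j + 1) * rand() / ((double)RAND_MAX + 1.0));
        temp = a[index];
        a[index] = a[j];
        a[j] = temp;
    }
}
```

Shuffling one measured quantity this way leaves its values intact (so the biases stay in) while destroying its point-by-point pairing with the other quantities.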
As the median example showed, repeating the Bootstrap process many times
yields many discrete samplings of the background distribution. Bin these
samples to see the distribution or use them to directly determine the
significance. Using these samples to compute the significance of the measured
values is easy. Count the number of estimates above and below the test value
for the probability, or calculate the mean and standard deviation and assume a
Gaussian distribution.
Because this procedure makes no systematic selection of values, the signal is
included in the background estimate. One benefit of using the data with the
signal included is that the resulting background estimate is conservative.
The common criticism of background subtraction results is that slight
variations in the assumed background shape can wildly vary the results, which
indicates that the results are not trustworthy. The Bootstrap calculates the
background with the signal values included, generating a conservative, stable
estimate. As previously demonstrated, this estimate is also free of the many
assumptions normally required. Altogether, this technique yields a sensitive
and trustworthy result.
The next example uses the Bootstrap to see a small signal from a star in a
telescope. A simple telescope that measures nothing more than total light
level sweeps past a star to detect whether or not it can see the light from
the star; see Figure 2(a). The two measured values are the angle of the
telescope (q), measured from some reference point, and the light level in the
telescope. The signal is detected by a correlation between the light level and
the angle at which the star is located (q=0). A turbulent atmosphere further
complicates the measurement by randomly fluctuating the light level. Table 1
shows the measured data.
The program in Listing Two (page 81) calculates the background distributions
by randomizing the angle with respect to the light level. Doing so essentially
averages the light levels over the different angles. Figure 3 shows the
results. Notice that the data shows a measurement above the background at the
angle of the star. The program indicates the random chance probability at q=0
is 8 percent; thus, the star seems to be detected by this telescope with 92
percent confidence. The signal might appear stronger than estimated to the
human eye, but that illustrates the problem with fitting functions to make a
background. This result shows that there is a signal, given no assumptions.
Although this example usefully demonstrates the procedure, it understates the
advantages of this technique.
Perhaps the greatest benefit from this type of analysis is the ease with which
systematic biases are handled. Since the data includes biases, using the data
to calculate the background includes the systematic biases, provided you
choose to randomize about the correct variables. If there were a strong bias
against a combination of two angles, it would not be preferable to randomize
out that correlation. Instead, randomizing the set of those angles with
respect to another parameter provides the needed calculation while including
the bias in angles. To illustrate this point, the next example adds a bias
into the data.
Now let's extend the previous discussion to account for the earth's rotation.
Table 2 shows the angle of the earth F, the angle of the telescope q, and the
light level measured by the telescope shown in Figure 2(b). In this example,
there is a large bias to the measurement. When the data was taken, there
happened to be a tree branch overhead shadowing telescope angles between -5
and +5 degrees. The experimenter looks at the distributions shown in Figure 4
and sees there is a bias, which leads him or her to randomize the earth's
angle. Randomizing the earth's angle separately from the telescope angle and
light level grouped together leaves the strong telescope-angle bias in the
calculation. The program in Listing Three (page 81) calculates the background
in this way.
Figure 5 shows the results of the analysis. The histogram represents the
measured values, the plotted points show the mean values of the background
distribution, and the error bars represent the standard deviation of the
background distribution. The program also generates the random-chance
probabilities for the measured variables. The star is seen at a=30 degrees
with a 95 percent confidence.
Although the last example performed well, this technique is actually best for
counting data. If the data in Table 2 is converted to counting data, both
biases are handled optimally. For instance, treat the first light level of 23
to actually mean there are 23 photons collected; replace that one point with
23 values of F=0 and q=-10. After doing this for all the points, repeat the
process with only two changes. First, instead of incrementing the background
map by the light level associated with the q value, increment it by one for
each F,q pair. Second, calculate the mean m as done previously, but set
sigma = sqrt(m), since this is counting data. Counting data follows a Poisson
distribution, which can be approximated by a Gaussian with a variance equal to
the mean. Listing Four (page 82) converts the data from the previous example
into counting data and then performs the suggested algorithms.
The results from this example handle the biases perfectly. Figure 6 shows the
results. The random-chance probability for the observation is extremely small,
P(a=30 degrees) ~ 0. Once again, notice that the Bootstrap slightly
overestimates the
background because the signal is included. This final result does not assume
that any local-angle distributions are uniform. Instead, it entirely accounts
for the distributions with only the angle-to-angle interdependence removed. By
doing so, it provides a formidable result relying on few assumptions.



Conclusions


The Bootstrap is the best possible choice for many signal-analysis situations.
It is one of few ways to propagate errors through complex transformations with
little difficulty, and it requires few assumptions. Its ease of use, accuracy,
excellent handling of local biases, and O(n) run time make it an essential
tool for contemporary signal analysis.


References


Efron, B. "Bootstrap Methods: Another Look at the Jackknife." Annals of
Statistics, 7, 1979.
----. The Jackknife, the Bootstrap and Other Resampling Plans, CBMS 38, 1982.
Flannery, B.P., et al. Numerical Recipes. Cambridge: Cambridge University
Press, 1986.
Knuth, D. The Art of Computer Programming. Reading, MA: Addison-Wesley, 1973.
 Example 1: Error on mC is found with this formula: s^2_mC = (s^2_A + s^2_B +
2 s^2_AB)/N.
 Figure 1: Grade distribution (wide histogram) and Bootstrap-generated
median-probability distribution (small histogram).
 Figure 2: (a) The measured values are the angle of the telescope q, measured
from some reference point, and the light level in the telescope. The signal is
detected by a correlation between the light level and the angle at which the
star is located (q=0); (b) large bias to the measurement.
 Figure 3: Observed (histogram) and predicted (points) light levels vs. q. The
source is at q=0 degrees.
 Figure 4: (a) Light level vs. telescope angle; (b) light level vs. earth
angle.
 Figure 5: Observed (histogram) and predicted (points) light levels vs. a. The
source is at a=30 degrees.
 Figure 6: Observed (histogram) and predicted (points) light levels vs. a. The
source is at a=30 degrees.
Table 1: Data from a telescope that measures total light level.
 q Light Level
 -5 1.1
 -4 0.3
 -3 1.9
 -2 1.6
 -1 2.3
 0 3.3
 1 1.8
 2 1.0
 3 1.7
 4 1.6
 5 0.8
Table 2: The angle of the earth (F), the angle of the telescope (q), and the
light level measured by the telescope shown in Figure 2(b).
 F q Light Level
 0 -10 23
 0 -5 11
 0 0 10
 0 5 10
 0 10 20
 10 -10 19
 10 -5 9
 10 0 13
 10 5 10
 10 10 24
 20 -10 20
 20 -5 11
 20 0 11
 20 5 13
 20 10 35
 30 -10 22
 30 -5 13
 30 0 17
 30 5 14
 30 10 25
 40 -10 34
 40 -5 13
 40 0 9
 40 5 10
 40 10 19
 50 -10 22
 50 -5 10
 50 0 8
 50 5 11
 50 10 25
[LISTING ONE] (Text begins on page 48.)

#include<stdio.h>
#include<math.h>
#include<stdlib.h>

#define NumOfGrades 20
#define NumOfBins 20
#define TestMedian 50
#define TGA {45,67,73,27,44,96,49,38,51,63,11,87,67,46,76,57,53,61,57,64}
#define nBSLoops 1000000


main()
{
int i, j, index, numOver=0;
int grades[NumOfGrades]=TGA, fake[NumOfGrades], iseed = -534;
float sum=0, sumSq=0, median, fmedian, ave, sdev, prob1Tail, prob2Tail;
float hist[NumOfBins];
extern int cmp();
extern float ran1(), FindMedian();

 /*Make sure histogram starts at 0*/
 for( i=0; i<NumOfBins; i++ )
 hist[i]=0.0;
 /*First, find the true median*/
 median = FindMedian( grades );
 printf( "median=%f\n", median );

 for( i=0; i<nBSLoops; i++ )
 {
 /*Make a fake array by sampling the grades with replacement*/
 for( j=0; j<NumOfGrades; j++ )
 {
 index = (int)( (float)NumOfGrades * ran1( &iseed ) );
 fake[j] = grades[ index ];
 }
 /*Find the median for that fake set of grades*/
 fmedian = FindMedian( fake );
 /*Find the components needed later for mean and probability*/
 sum += fmedian;
 sumSq += fmedian*fmedian;
 if( fmedian >= TestMedian )
 numOver++;

 /*Build histogram*/
 index = (int)( (fmedian - 30.0)/2.0 );
 if( (index >= 0) && (index < NumOfBins) )
 hist[index]++;
 }
 /*Output the histogram for printing purposes*/
 printf( "median probability distribution\n" );

 for( i=0; i<NumOfBins; i++ )
 printf( "%d %f\n", (2*i+31), ((float)hist[i]/(float)nBSLoops));
 /*The probability of random occurrence*/
 prob1Tail = 1.0 - (float)(numOver)/(float)(nBSLoops);
 prob2Tail = 2.0*prob1Tail;
 printf( "1-tail-prob=%f 2-tail-prob=%f\n", prob1Tail, prob2Tail );
 /*Find the mean and stdev*/
 ave = sum/nBSLoops;
 sdev = sqrt( (sumSq-ave*ave*nBSLoops)/( (float)nBSLoops-1.0 ) );
 printf( "average=%f sdev=%f\n", ave, sdev );

exit(0);
}
float FindMedian( int array[NumOfGrades] )
{
float median;
extern int cmp();
 qsort( array, NumOfGrades, sizeof(int), cmp );
 /*With 20 grades, the median is the average of the two middle values*/
 median = ( (float)(array[9]+array[10]) / 2.0 );
return( median );
}
int cmp( const void *in1, const void *in2 )
{
 /*qsort comparison function: sort ints in ascending order*/
 return( *(const int *)in1 - *(const int *)in2 );
}

[LISTING TWO]

#include<stdio.h>
#include<math.h>

#define NumOfAngles 11
#define TMA { 1.1, 0.3, 1.9, 1.6, 2.3, 3.3, 1.8, 1.0, 1.7, 1.6, 0.8 }
#define TAA { -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 }
#define nBSLoops 1000000

main()
{
int i, j, index, numOver[NumOfAngles];
int iseed = -534;
int angle[NumOfAngles]=TAA;
float measure[NumOfAngles]=TMA, fmeasure[NumOfAngles]=TMA;
float sum[NumOfAngles], sumSq[NumOfAngles], mean[NumOfAngles], temp;
float sdev[NumOfAngles], prob1Tail[NumOfAngles], fm, tempMeasure;
extern float ran1();
 for( i=0; i<NumOfAngles; i++ )
 {
 sum[i] = 0;
 sumSq[i] = 0;
 numOver[i] = 0;
 }
 for( i=0; i<nBSLoops; i++ )
 {
 /*Make a fake array by performing Knuth shuffling; j starts at the*/
 /*last valid index, and the swap partner is drawn from [0..j]*/
 j = NumOfAngles - 1;
 do
 {
 index = (int)( (float)(j+1) * ran1( &iseed ) );
 tempMeasure = fmeasure[ index ];
 fmeasure[ index ] = fmeasure[ j ];
 fmeasure[ j ] = tempMeasure;
 j--;
 }
 while( j > 0 );
 /*Find the components needed later for mean and probability*/
 for( j=0; j<NumOfAngles; j++ )
 {
 fm = fmeasure[j];
 sum[j] += fm;
 sumSq[j] += fm*fm;
 if( fm >= measure[j] )
 numOver[j]++;
 }
 }
 for( i=0; i<NumOfAngles; i++ )
 {
 /*The probability of random occurrence*/
 prob1Tail[i] = ((float) numOver[i])/nBSLoops;
 /*Find the mean and stdev*/
 mean[i] = sum[i]/nBSLoops;
 temp = (sumSq[i] - mean[i]*mean[i]*nBSLoops)/( nBSLoops-1.0 );
 sdev[i] = sqrt( temp );
 printf( "\nAt angle=%d\n", angle[i] );
 printf( "measured=%f mean=%f sdev=%f\n", measure[i],
 mean[i], sdev[i] );
 printf( "Prob: 1t=%f\n", prob1Tail[i] );
 }
exit(0);
}

[LISTING THREE]

#include<stdio.h>
#include<math.h>

#define NumOfAngles 30
#define NumOfSMAngles 7
#define TEA {0,0,0,0,0,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,\
40,40,40,50,50,50,50,50}
#define TTA {-10,-5,0,5,10,-10,-5,0,5,10,-10,-5,0,5,10,-10,-5,0,5,10,\
-10,-5,0,5,10,-10,-5,0,5,10}
#define TLL {23,11,10,10,20,19,9,13,10,24,20,11,11,13,35,22,13,17,14,25,\
34,13,9,10,19,22,10,8,11,25}
#define nBSLoops 1000000

main()
{
int i, j, k, index, numOver[NumOfAngles], temp, alpha;
int iseed = -534;
int SkyMap[NumOfSMAngles], FakeSkyMap[NumOfSMAngles];
int EarthAngle[NumOfAngles]=TEA;
int TeleAngle[NumOfAngles]=TTA;
int LightLevel[NumOfAngles]=TLL;
float sum[NumOfSMAngles], sumSq[NumOfSMAngles], mean[NumOfSMAngles];
float sdev[NumOfSMAngles], prob1Tail[NumOfSMAngles], fsm;
extern float ran1();

 /*First, make sure certain arrays are zero*/
 for( i=0; i<NumOfSMAngles; i++ )
 {
 sum[i] = 0;
 sumSq[i] = 0;
 numOver[i] = 0;
 SkyMap[i] = 0;
 }
 /*Make the Measured SkyMap*/
 for( i=0; i<NumOfAngles; i++ )
 {
 alpha = (EarthAngle[i]+TeleAngle[i])/10+1;
 if( (alpha >= 0) && (alpha < NumOfSMAngles) )
 SkyMap[ alpha ] += LightLevel[i];
 }
 /*With the measured SkyMap in hand, begin generating background maps*/
 for( k=0; k<nBSLoops; k++ )
 {
 /*Make a fake EarthAngle array by performing Knuth shuffling; j*/
 /*starts at the last valid index, and the swap partner is in [0..j]*/
 j = NumOfAngles - 1;
 do
 {
 index = (int)( (float)(j+1) * ran1( &iseed ) );
 temp = EarthAngle[ index ];
 EarthAngle[ index ] = EarthAngle[ j ];
 EarthAngle[ j ] = temp;
 j--;
 }
 while( j > 0 );
 /*Zero out the FakeSkyMap*/
 for( i=0; i<NumOfSMAngles; i++ )
 FakeSkyMap[i] = 0;
 /*Make the Fake SkyMap*/
 for( i=0; i<NumOfAngles; i++ )
 {
 alpha = (EarthAngle[i]+TeleAngle[i])/10+1;
 if( (alpha >= 0) && (alpha < NumOfSMAngles) )
 FakeSkyMap[ alpha ] += LightLevel[i];
 }
 /*Find the components needed later for mean and probability*/
 for( j=0; j<NumOfSMAngles; j++ )
 {
 fsm = (float)FakeSkyMap[j];
 sum[j] += fsm;
 sumSq[j] += fsm*fsm;
 if( FakeSkyMap[j] >= SkyMap[j] )
 numOver[j]++;
 }
 }
 for( i=0; i<NumOfSMAngles; i++ )
 {
 /*The probability of random occurrence*/
 prob1Tail[i] = ((float) numOver[i])/(float)nBSLoops;
 /*Find the mean and stdev*/
 mean[i] = sum[i]/nBSLoops;
 /*temp is an int, so compute the variance in floating point directly*/
 sdev[i] = sqrt( (sumSq[i] - mean[i]*mean[i]*nBSLoops)/( nBSLoops-1.0 ) );
 }
 /*First, look at the observed distribution*/

 for( i=0; i<NumOfSMAngles; i++ )
 printf( "%d %d\n", (10*(i-1)), SkyMap[i] );
 /*Then, the predicted*/
 for( i=0; i<NumOfSMAngles; i++ )
 printf( "%d %f %f\n", (10*(i-1)), mean[i], sdev[i] );
 /*Now, the random chance probability*/
 for( i=0; i<NumOfSMAngles; i++ )
 printf( "alpha:%d Prob=%f\n", (10*(i-1)), prob1Tail[i] );
exit(0);
}

[LISTING FOUR]

#include<stdio.h>
#include<math.h>

#define NumOfAngles 30
#define NumOfSMAngles 7
#define TotalLightLevel 491 /*sum of the TLL light levels*/
#define TEA {0,0,0,0,0,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,\
40,40,40,50,50,50,50,50}
#define TTA {-10,-5,0,5,10,-10,-5,0,5,10,-10,-5,0,5,10,-10,-5,0,5,10,\
-10,-5,0,5,10,-10,-5,0,5,10}
#define TLL {23,11,10,10,20,19,9,13,10,24,20,11,11,13,35,22,13,17,14,25,\
34,13,9,10,19,22,10,8,11,25}
#define nBSLoops 100000

main()
{
int i, j, k, index, numOver[NumOfAngles], temp, alpha;
int iseed = -534;
int SkyMap[NumOfSMAngles], FakeSkyMap[NumOfSMAngles];
int EarthAngle[NumOfAngles]=TEA, DEarthAngle[TotalLightLevel];
int TeleAngle[NumOfAngles]=TTA, DTeleAngle[TotalLightLevel];
int LightLevel[NumOfAngles]=TLL;
float sum[NumOfSMAngles], mean[NumOfSMAngles];
float prob1Tail[NumOfSMAngles];
extern float ran1();
 /*First, make sure certain arrays are zero*/
 for( i=0; i<NumOfSMAngles; i++ )
 {
 sum[i] = 0;
 numOver[i] = 0;
 SkyMap[i] = 0;
 }
 /*Make the Measured SkyMap*/
 for( i=0; i<NumOfAngles; i++ )
 {
 alpha = (EarthAngle[i]+TeleAngle[i])/10+1;
 if( (alpha >= 0) && (alpha < NumOfSMAngles) )
 SkyMap[ alpha ] += LightLevel[i];
 }
 /*Make the discrete angle sets*/
 k = 0;
 for( i=0; i<NumOfAngles; i++ )
 for( j=0; j<LightLevel[i]; j++ )
 {
 DEarthAngle[k] = EarthAngle[i];

 DTeleAngle[k] = TeleAngle[i];
 k++;
 }
 /*With the measured SkyMap in hand, begin generating background maps*/
 for( k=0; k<nBSLoops; k++ )
 {
 /*Make a fake EarthAngle array by performing Knuth shuffling; j*/
 /*starts at the last valid index, and the swap partner is in [0..j]*/
 j = TotalLightLevel - 1;
 do
 {
 index = (int)( (float)(j+1) * ran1( &iseed ) );
 temp = DEarthAngle[ index ];
 DEarthAngle[ index ] = DEarthAngle[ j ];
 DEarthAngle[ j ] = temp;
 j--;
 }
 while( j > 0 );
 /*Zero out the FakeSkyMap*/
 for( i=0; i<NumOfSMAngles; i++ )
 FakeSkyMap[i] = 0;
 /*Make the Fake SkyMap*/
 for( i=0; i<TotalLightLevel; i++ )
 {
 alpha = (DEarthAngle[i]+DTeleAngle[i])/10+1;
 if( (alpha >= 0) && (alpha < NumOfSMAngles) )
 FakeSkyMap[ alpha ]++;
 }
 /*Find the components needed later for mean and probability*/
 for( j=0; j<NumOfSMAngles; j++ )
 {
 sum[j] += (float)FakeSkyMap[j];
 if( FakeSkyMap[j] >= SkyMap[j] )
 numOver[j]++;
 }
 }
 for( i=0; i<NumOfSMAngles; i++ )
 {
 /*The probability of random occurrence*/
 prob1Tail[i] = ((float) numOver[i])/(float)nBSLoops;

 /*Find the mean*/
 mean[i] = sum[i]/nBSLoops;
 }
 /*First, look at the observed distribution*/
 for( i=0; i<NumOfSMAngles; i++ )
 printf( "%d %d\n", (10*(i-1)), SkyMap[i] );
 /*Then, the predicted*/
 for( i=0; i<NumOfSMAngles; i++ )
 printf( "%d %f %f\n", (10*(i-1)), mean[i], sqrt( mean[i] ) );
 /*Now, the random chance probability*/
 for( i=0; i<NumOfSMAngles; i++ )
 printf( "alpha:%d Prob=%f\n", (10*(i-1)), prob1Tail[i] );

exit(0);
}
End Listings






February, 1994
Your Own Token-Ring Network Manager


An IEEE 802.5 MAC-layer manager with a Windows front end




Andy Yuen


Andy is a software engineer in Sydney, Australia and specializes in automation
and system management. Andy can be reached through the DDJ offices.


When a local area network (LAN) has problems, the first troubleshooting tool
you usually reach for is either Novell's LANalyzer or Network General's
Sniffer. While both are useful for analyzing large networks interconnected
with routers and gateways, there's a class of token-ring network problems that
can be solved using the IEEE 802.5 built-in network-management functions.
These functions include gathering the network-configuration data, detecting
marginally operating token-ring cards and cabling systems, and contending with
other physical network problems.
In this article, I present a token-ring network-management application
consisting of a Windows front end called TRMGR and a network-management agent
called TRAGN. Among other things, this tool lets you list all the active
token-ring adapters on the ring, associate descriptive names with token-ring
adapters, identify soft errors and the fault domain, monitor the network
status, record the fault domain when a fault or hard error (beaconing) occurs,
and remove a token-ring adapter from the ring. I've tested TRAGN and TRMGR on
a 33-MHz 386 with the IBM 4/16 token-ring adapter, DXMA0MOD.SYS/DXMC0MOD.SYS
drivers, and the Novell IPX/NETX LAN requester.


An IEEE 802.5 MAC-layer Network-management Primer


The IEEE 802.5 Token Ring Access Method Standard defines both the physical
layer and the medium access control (MAC) layer. The physical layer defines
physical properties such as data-symbol encoding, decoding, and timing. The
MAC layer defines the control and mediation on ring access and station
management.
The 802.5 standard defines a set of data collection and distribution points,
called "servers," to collect reports from other network stations. You use
these servers to manage the stations. The configuration-report server (CRS)
collects information on station-configuration changes due to insertion or
removal of stations, and removes stations on the ring according to the
manager's request. The ring-error monitor (REM) gathers and analyzes both
hard- and soft-error reports sent by stations on a ring and assists in fault
isolation and correction. Finally, the ring-parameter server (RPS) provides a
set of operating parameters to stations during the insertion process.
Any token-ring station can declare itself to be a CRS, REM, or RPS by setting
the functional address, a set of locally administered group addresses; see
Table 1(a). Functional addresses are analogous to TCP/IP sockets, in which a
particular socket number identifies the application. A station can declare
itself to be both a CRS and an REM simply by logically ORing their addresses
together: X'00000018', for example.
Stations send reports to the servers via MAC frames; see Figure 1. The
information field in a frame consists of a vector (the fundamental unit of
information) that contains a length field, a function-identifier field, and
zero or more subvectors. Only one vector is allowed in a MAC frame. Figure 2
shows the MAC information-field structure. There are 23 vectors and 21
subvectors defined by the IEEE 802.5 standard. Luckily, we only need the
vectors and subvectors in Table 2 to implement the token-ring network manager.
The most-significant byte of the vector identifier contains the destination
class (high nybble) and source class (low nybble). The least-significant byte
contains the vector code. The class information provides a means to route the
frame to the appropriate management function. Table 1(b) lists the function
classes. For example, the report-error vector X'6029' means that the frame is
sent by a ring station (source class X'0') to REM (destination class X'6'),
and the vector code is X'29' (report error).
The BCN (beaconing) vector is sent by a station which detects a beaconing
condition. It contains the UNA subvector and the BCN type subvector. The
reporting station and its nearest active upstream neighbor (NAUN) constitute
the fault domain.
The AMP, SMP, and Report SUA Change vectors all contain the UNA subvector,
which can be used to find out which stations are active on the ring. The AMP
MAC frame is normally generated every three seconds by the active monitor. SMP
frames are then sent by standby monitors to complete the neighbor-notification
process.
The report-error vector contains the isolating error count, nonisolating error
count, and UNA subvectors. These subvectors provide information on the type
and number of soft errors which have occurred on the ring. Together, the
source address of the MAC frame and the UNA subvector define the fault domain.
The remove-ring-stations vector is used to remove a ring station from the
ring. It doesn't contain subvectors.


Implementing TRMGR


TRMGR gets information about the network from TRAGN and presents it to you.
TRMGR also requests the agent to carry out any action in the management of the
network. The agent is simply a real-mode DOS program started before Windows
that performs all the token-ring API calls for the Windows front end.
TRMGR and TRAGN communicate via a software interrupt, much as applications
call DOS services through INT 21h. I arbitrarily chose an unused interrupt,
INT 78h, as the agent service interrupt. True, the Windows front end still
requires the DPMI interface to invoke a real-mode interrupt, but this is
easier than doing everything via DPMI.
TRMSG.H (Listing One, page 84) shows the message structure. There are ten
messages--five requests and five data responses. TRMGR actively polls TRAGN by
sending it any of the five supported request messages. Table 3 lists the
request messages, while Table 4 lists the messages sent by TRAGN in response
to TRMGR's MSG_REQUEST_MSG request.
TRMGR is written in C and C++. I used Zortech C++ 3.0 for development and Gpf
(from GPF Systems, Moodus, CT) for the user interface. Gpf, a visual tool for
generating GUI apps, functions much like Visual C++'s AppStudio and AppWizard
except that it generates C, not C++, code. I used the OS/2-hosted Version 1.3
of Gpf, which has code generators for 16- and 32-bit OS/2 and 16-bit Windows.
I used the Gpf editor to design TRMGR's GUI and the Gpf code generator to
generate a skeleton program which handles the Windows GUI. I actually use Gpf
only as an event dispatcher, so that when certain events (double clicking,
selecting a menu item, and the like) occur, it passes control to one of my
functions. The events handled by my functions are shown in Table 5. TRMGR
source files include TRMGR.H, TRMGR.C (Gpf-generated Windows GUI skeleton),
TRMGR.EXT (Gpf-generated header file), TRMGR.IDS (Gpf-generated resource
identifiers), TRMGR.RC (Gpf-generated resource file), TRFNS.H, TRFNS.C
(functions to handle all Windows events that TRMGR is interested in), TRSMT.H,
TRSMT.CPP (the only C++ module, which provides services to TRFNS.C and
accesses TRAGN via TRMSG for LAN management), TRMSG.H, and TRMSG.C (message
layer which interfaces to TRAGN via DPMI). All of these files, along with
executables, are available electronically; see "Availability," page 3.
TRSMT and TRMSG constitute the network-management part of the GUI front end.
TRSMT is written in C++ because I wanted to use the C++ hashed search-table
classes zHashTable and zGHSearch that come with the Zortech compiler. A hash
table is used to implement a database within TRSMT to record information about
token-ring stations (adapters), soft errors, and faults. The information
recorded (defined in the structure AdapterEntry in TRSMT.C) includes the
adapter's address and symbolic name, soft-error counts, beacon type, and
occurrence time of soft error and fault. The structure members pNaun and pNext
link the information regarding adapters, soft errors, and faults into separate
linked-list structures. Figures 3 and 4 show how these linked lists work. The
entries in these figures are labeled Entry 1, 2, 3, and so on, to identify the
entry in the discussion. They don't represent the relative position in the
database because, in a hash-table implementation, the entries are bound to
scatter within the hash table.
pRingHead always points to the TRMGR adapter entry. Only the pNaun fields of
the entries are used to form a linked list. By going through this linked list,
you can find out which stations are active on the ring. In Figure 3, the ring
configuration in network order is: entry 1, 2, 5, 6, and 4. The adapter at
entry 3 is not active on the ring.
The soft-error and fault linked lists are formed differently from the
configuration linked list. In the soft-error list, pHead points to an entry in
the database, say, entry 1. The pNext fields are used to link all the adapters
which have reported errors. For each adapter, pNaun points to its NAUN. As you
may recall, the reporting adapter, its NAUN, and the medium between them
constitute the fault domain. The time at which the soft errors occurred can
also be found in the adapter entry. In Figure 4, soft errors have occurred in
fault domains which consist of adapter pairs (1, 2), (3, 4), (4, 5), and (5,
2).
A hash table is used because all reports or data messages have the adapter
field set. Adapter addresses are unique, making them good keys for locating an
item quickly in a database. Remember that each MSG_DATA_CONFIG message has
information only on the reporting adapter and its NAUN, and it takes several
reports to figure out who is on the ring. Since these reports are started by
the AMP MAC frame every three seconds, we must locate the database entry for
each adapter quickly. A hash table allows just that. Basically, TRSMT provides
TRFNS access to this database for displaying the network information.
TRSMT also provides TRFNS with the Poll function, which is called whenever
Windows is idle; that is, it is called within the PeekMessage loop, which
replaces the standard Windows GetMessage loop in TRMGR.C. Poll keeps sending
MSG_REQUEST_MSG to TRAGN to solicit network reports until either TRAGN says
there are no more reports (by returning an ERR_NO_DATA return code) or Poll
has processed the maximum number of messages in a row. In either case, Poll
returns control to PeekMessageLoop and gives up the CPU so that other Windows
applications can run. If any report is returned from TRAGN, Poll processes it
and updates the database.
All database information items, except for adapter addresses and their
symbolic names saved in the text file TRMGR.CFG, are discarded when you
terminate TRMGR. The TRMGR.CFG file is read by PeekMessageLoop at startup time
such that names associated with adapters in a previous TRMGR session are
maintained.
TRMSG.C is the module which interfaces between TRSMT and TRAGN. Since Windows
is running in protected mode, TRMsg uses DPMI to invoke TRAGN's functions.
It passes the message identifier msg in the EAX register and simulates a
real-mode interrupt using the Zortech dpmi_SimRealModeInterrupt function (INT
31h function 300h). The first time this function is called, it only passes msg
to TRAGN and not the whole message. (MSG_REQUEST_ACTIVATE should always be the
first message sent to TRAGN.) Upon returning from the interrupt, the address
to a block of DOS memory which holds the message structure is returned in the
EDX:EAX register pair. TRMsg converts the real-mode SEG:OFFSET address into a
protected-mode SELECTOR:OFFSET pointer using dpmi_SegToDescriptor (INT 31h
function 0002h) and saves it for all future use. From then on, all message
exchanges use this memory block. If TRMGR were a real-mode app, I would have
passed the message pointer from TRMGR to TRMSG, instead of the other way
around. In Windows protected mode, however, I couldn't allocate DOS memory
using DPMI. Hence, I let TRAGN provide the memory block for communication.
When a MSG_REQUEST_INACTIVATE message is encountered, TRMsg frees the
descriptor by calling dpmi_FreeDescriptor (INT 31h function 0001h).
The advantage of this message-based architecture is that I can go on to
program the Windows front end before writing any code for the
network-management agent. The UI can be tested simply by faking message
exchanges. In this project, I wrote the entire UI before the agent, and the
two fit together almost plug-and-play once the agent was ready. Under
Windows, you can allocate
DOS memory using the Windows GlobalDosAlloc, which returns both a DOS segment
and a protected-mode selector. I chose not to use it because I wanted to
isolate all Windows-related code to TRFNS.C. In retrospect, this isn't really
necessary, and using GlobalDosAlloc may prove even more convenient.


Implementing TRAGN


To implement TRAGN, I started with the programs on the diskette that comes
with IBM's Technical Reference: Token-ring Network PC Adapter manual, although
I eventually wrote my own version of the API headers. Overall, the TRAGN
program consists of: DLCCONST.H, DLCPTBLS.H (token-ring API constants and
structures); TRAGN.C (source file for the network-management agent; TRUTL.H,
TRUTL.C (a set of utility functions); TRDIR.H, TRDIR.C (a set of functions to
access the token-ring API's direct interface); TRAPP.H, TRAPP.ASM (assembly
program to set up API appendages); and TRMSG.H (message definitions). All of
these files are available electronically.
TRAGN handles all message exchanges with TRMGR and carries out TRMGR's
requests: activating the token-ring adapter, removing a station, reporting on
soft errors and faults, and the like. It interfaces to the token-ring API via
TRDIR.
When run, TRAGN first checks to see if token-ring API support has been
installed. If not, TRAGN terminates with an error message. In general, all
applications using the token-ring API should check for its availability by
verifying that the INT 5Ch interrupt vector has been set and, if so, that it
actually points to the token-ring API support. TRAGN then
installs an interrupt handler for INT 78h (chosen arbitrarily), which is used
for message exchange with TRMGR, and sets up the data received and ring-status
appendages by calling appinit in TRAPP.ASM. TRAGN then terminates and stays
resident.
The Zortech compiler comes with a TSR package to facilitate writing TSRs.
However, it's available only in the small memory model, and programming the
token-ring API is easier using the large model. The 8086 assembly-language
module TRAPP.ASM installs interrupt handlers and handles appendages because the
Zortech compiler doesn't support the interrupt keyword like the Microsoft
compiler. Again, Zortech provides an interrupt package, but I couldn't get it
to work in the large memory model.

When TRMGR requests activation of the token-ring adapter via INT 78h, TRAGN
will go through the following steps:
1. Issue a dir_interrupt call to check if the token-ring adapter has been
initialized. If so, go to step #3.
2. Call dir_initialise to initialize the adapter; return error code to the
caller if it fails.
3. Call dir_openadapter to physically open the adapter, set various open
parameters, and register the ring-status appendage so that TRAGN will be
informed of all ring-status changes. Return error code to caller if it fails.
4. Call dir_set_func_address to set the functional address for CRS and REM so
that TRAGN can receive configuration and error reports. Return error code to
caller if operation fails.
5. Call receive to start receiving MAC frames and set the data-received
appendage such that TRAGN will be informed when MAC frames are received.
Return error code to caller if operation fails.
When a MAC frame is received, the data-received appendage gets control. It
checks if the MAC frame is one of the ones we want--BCN, AMP, SMP, Report SUA
Change, or Report Error. If so, it saves the information in a message block
and queues it. The MAC frame is passed to us in a buffer from the
direct-interface buffer pool (set up in the dir_openadapter call). Whether or
not the MAC frame is useful, you still need to return it to the buffer pool.
Unfortunately, if it is released by calling buffer_free within the appendage,
it will cause a system crash. I therefore added the message MSG_HOUSEKEEPING.
A housekeeping message is created with the buffer address stored in the
buffer_one field if the MAC frame isn't the one we want. It also gets queued.
Whenever a MSG_REQUEST_MSG message is received, the first element in the
queue will be retrieved. The message is passed back to TRMGR if it is a
management report, and the memory for both the message and the MAC frame is
freed. If it is a housekeeping message, it frees the MAC frame and message
buffers and retrieves the next message until either no messages remain or the
next one is a management report. Remember that the MAC-frame vector
identifier and length are in big-endian format, unlike that of the PC, which
uses Intel's little-endian format. The ring-status appendage is handled in a
similar fashion, except it does not come from a MAC frame. Consequently, you
don't have to release any memory back to the direct buffer pool.
When TRMGR requests deactivation of the token-ring interface, TRAGN simply
calls receive_cancel to cancel the receive command. It does not actually close
the adapter. When TRMGR requests the removal of an adapter, TRAGN builds a MAC
frame and sends it via dir_transmit.


Conclusions


With the IEEE 802.5 network-management tool presented here, you can begin
snooping around your LAN, discovering and fixing potential problems. However,
you should be aware of some of this application's limitations. For instance,
802.5 MAC frames don't normally go through a bridge from one LAN segment to
another. Consequently, TRMGR can only troubleshoot the LAN on which it is
connected. Nor can TRMGR analyze the traffic load. (Use Sniffer or LANalyzer
along with specially modified token-ring cards for this.)
TRMGR may become confused when its own adapter is removed from the ring and
then automatically reset by the token-ring adapter driver. In the Novell
environment, for example, where the IPX/NETX LAN requester has just such a
"feature," TRMGR works much more reliably if the LAN requester is not
activated. Finally, although a station
may use up to two token-ring adapters, TRMGR supports only the primary
token-ring adapter.
Table 1: (a) Functional addresses for declaring CRS, REM, or both; (b)
function classes.
(a) Function Name     Functional Address
    Active Monitor    X'00000001'
    CRS               X'00000010'
    REM               X'00000008'
    RPS               X'00000002'
(b) Function Class    Value
    Ring Station      X'0'
    CRS               X'4'
    RPS               X'5'
    REM               X'6'
Table 2: (a) Vectors used by TRMGR (*does not contain any subvector); (b)
subvectors used by TRMGR.
(a) Vector Name                Vector Identifier
    BCN (beacon)               X'0002'
    Active monitor present     X'0005'
    Stand-by monitor present   X'0006'
    Remove ring stations*      X'0408'
    Report SUA change          X'4026'
    Report error               X'6029'
(b) Subvector Name             Identifier  Length  Value
    BCN Type                   X'01'       4       2-byte beacon type
    Upstream neighbor's        X'02'       8       6-byte node address
      address
    Isolating error count      X'2D'       8       Five 1-byte error counters
                                                   and one reserved byte
    Nonisolating error count   X'2E'       8       Five 1-byte error counters
                                                   and one reserved byte
Table 3: TRMGR request messages.
Message Description
MSG_REQUEST_ACTIVATE This is the first message TRMGR sends to TRAGN. It
requests TRAGN to prepare the token-ring adapter and establish itself as both
REM and CRS so that it can receive management reports. No other parameter in
the message structure is required. On return, the message member adapter
contains the token-ring node address of the TRMGR station.
MSG_REQUEST_MSG TRMGR uses this to get management reports from TRAGN. On
return, the msg and information fields will be changed to the appropriate
message type and report, respectively, if there is any report available. If no
report is available, retcode will be set to ERR_NO_DATA.
MSG_REQUEST_RESET TRMGR uses this to reset the adapter on which TRMGR is
running. Necessary if TRMGR adapter gets removed automatically as a result of
fault isolation or by another station running TRMGR or another network
manager.
MSG_REQUEST_REMOVE The only active network-management facility in TRMGR. The
adapter field contains the address of the ring station to be removed. TRMGR
uses this message to remove a station from the ring if the administrator has
determined that a station is causing performance problems by generating many
soft or hard errors.
MSG_REQUEST_INACTIVATE TRMGR sends this to TRAGN before it terminates to tell
TRAGN it can stop gathering management reports.
Table 4: Messages sent by TRAGN in response to TRMGR's MSG_REQUEST_MSG
request.
Message Description
MSG_DATA_ERROR Fields adapter and naun contain the fault domain. isoerr and
noniso contain the soft-error counts.
MSG_DATA_INDICATION Field ringstatus contains current status of the ring.
Generated only when there is a change in ring status.
MSG_DATA_CONFIG Field adapter contains reporting-station address; naun
contains its neighbor's address. Collecting this report for all stations on
the ring allows the ring's overall configuration to be determined.
MSG_DATA_BEACON Fields adapter and naun make up the fault domain, and
beacontype contains the beacon type.
MSG_HOUSEKEEPING For TRAGN's internal use only. Used for release of MAC-layer
buffers. Not included in original design--added later due to certain
limitations of the DOS token-ring API.
Table 5: Function/event table.
Function Event Window/Control

ChangeName BN_CLICKED Config/ChangeName button
FillAdapters WM_INITDIALOG Config
FillErrors WM_INITDIALOG Errors
FillFaults WM_INITDIALOG Faults
PaintMain WM_PAINT Main
QuitTRMGR WM_CLOSE Main
RecordAdapter LBN_DBLCLK Config/Errors/Faults list box
ResetAll BN_CLICKED Config/Reset button
ResetErrors BN_CLICKED Errors/Clear button
ResetFaults BN_CLICKED Faults/Clear button
SelectAdapter LBN_SELCHANGE Config list box
SetOption BN_CLICKED Display menu: dynamic, static
ShowDetails LBN_DBLCLK Config/Errors/Faults list box
Figure 1: MAC frame format.
SD Starting Delimiter (1 byte)
AC Access Control (1 byte)
FC Frame Control (1 byte)
DA Destination Address (6 bytes)
SA Source Address (6 bytes)
RI Routing Information (0--18 bytes)
INFO Information (0 or more bytes)
FCS Frame Check Sequence (4 bytes)
ED Ending Delimiter (1 byte)
FS Frame Status (1 byte)
SD AC FC DA SA RI INFO FCS ED FS
Figure 2: MAC frame information-field format.
VL Vector Length (2 bytes)
VI Vector Identifier (2 bytes)
SVL Subvector Length (1 byte)
SVI Subvector Identifier (1 byte)
SVV Subvector Value (n bytes)
VL VI SVL SVI SVV _ SVL SVI SVV
 Figure 3: Linked-list structure for Config.
 Figure 4: Linked-list structure for soft errors and faults.
[LISTING ONE] (Text begins on page 58.)

#ifndef __TRMSG
#define __TRMSG

//#include "trdir.h"
#ifndef DLCPTBLS

typedef unsigned char BYTE;
typedef unsigned int WORD;
typedef unsigned long DWORD;
typedef unsigned char *ADDRESS;

#endif

#define SERVINT 0x78

/* lower half to upper half messages */
#define MSG_HOUSEKEEPING 0
#define MSG_DATA_ERROR 1
#define MSG_DATA_INDICATION 2
#define MSG_DATA_BEACON 3
#define MSG_DATA_CONFIG 4

/* upper half to lower half messages */

#define MSG_REQUEST_MSG 5
#define MSG_REQUEST_RESET 6
#define MSG_REQUEST_REMOVE 7
#define MSG_REQUEST_ACTIVATE 8
#define MSG_REQUEST_INACTIVATE 9

#define MAXMSGS 9

/* error return codes */
#define ERR_OK 0
#define ERR_INVALID_MSG 1
#define ERR_FAILURE 2
#define ERR_NO_DATA 3

/* ring status constants */
#define STS_SIGNALLOSS 0x8000
#define STS_HARDERROR 0x4000
#define STS_SOFTERROR 0x2000
#define STS_XMITBEACON 0x1000
#define STS_WIREFAULT 0x0800
#define STS_REMOVAL1 0x0400
#define STS_REMOVERECVD 0x0100
#define STS_CNTOVERFLOWED 0x0080
#define STS_SINGLESTN 0x0040
#define STS_RECOVERY 0x0020



/* definitions of all messages for communicating with the upper half */
typedef struct node{
 struct node *next; //points to the next message
 ADDRESS buffer_one; //DLC buffer address
 WORD msg; //message type
 BYTE retcode; //return code
 BYTE dir_retcode; //direct interface return code
 BYTE more; //number of messages still queued
 BYTE beacontype; //beacon type
 WORD ringstatus; //ring status
 BYTE adapter[6]; //reporting adapter's address
 BYTE naun[6]; //and its neighbour
 BYTE isoerr[5]; //isolating error counts
 BYTE noniso[5]; //non-isolating error counts
} Msgs;

#if __cplusplus
extern "C" {
#endif

int TRMsg(Msgs *msg);
void TRPrintMsg(Msgs *msg);

#if __cplusplus
}
#endif

#endif
End Listing





February, 1994
Examining the Windows Setup Toolkit


Taking the pain out of installation




Walter Oney


Walter is a freelance developer and software consultant based in Boston. He
specializes in system tools and in interfacing complex applications to
Windows, DOS, and NT. You can reach him on CompuServe at 73730,553.


One of the best-kept secrets in the Microsoft Windows Software Development Kit
(SDK) is the Setup Toolkit. In this article, I'll describe the contents of the
Setup Toolkit and explain how you can use it to quickly build high-quality
setup programs for your Windows application.
Writing setup programs is often a thankless task. If the installation goes
smoothly, the users talk about the product instead; if it goes badly, your
customer-support department won't let you forget the flood of complaints. In
writing the install program, you must design a creative, visually appealing
introduction to your product--and at the same time be utterly paranoid in
planning for all the varied configurations and preferences that the install
program will confront. Moreover, setup programs must often be written under
absurd time pressure, after everything else is done.
Good Windows setup programs are especially hard to write. All of the work of
determining what to install and where and of prompting the user through a
series of diskettes has to be done with an event-driven GUI. You must let an
impatient user cancel the process at any time. Often you have to provide
special drivers, only one of which is appropriate to any given user
configuration. Context-sensitive help may be an additional requirement. You
may need to update one or more profile files, and, if you're going to be
working with an OLE server, the Windows registry database. You may have to
install a redistributable system component like TOOLHELP.DLL or Win32s, and
you must then take care to put files into the right directories after
verifying that the user doesn't already have a more recent version from some
other vendor.
Most major vendors of Windows applications have built their own install
programs to enable them to wring the last possible custom nuance out of their
GUI. These one-of-a-kind programs are major development undertakings and are
beyond the reach of most developers. Unless you've written your application in
Visual Basic 3.0 (which lets you use Setup Wizard to build an install
program), you're probably stuck.
But as you'll see, the Setup Toolkit in the Windows SDK lets you build a
pretty good setup program with minimum fuss--by just writing a Basic program
and customizing a few dialog templates. You won't get all the effects of a
fully handcrafted setup program, but you will get a credibly professional
result. Although you have to write some code, the toolkit solves the harder
problems: interacting with the user, reading and parsing control files,
interfacing with DOS to check disk space and the like, and recovering from
errors. The toolkit is also available in a 32-bit version for Windows NT,
which allows your install program to port easily from 16-bit Windows. However,
the toolkit's documentation is sketchy at best.


The Minimal Install


I investigated the Setup Toolkit by creating an installable version of a small
application--in this case, Paul Yao's MIN, a minimum Windows program discussed
in his book Windows 3.1 Power Programming Techniques (Bantam, 1992). The
overall flow of my setup program is as follows:
1. User inserts the single installation disk and then launches SETUP from
Windows.
2. SETUP reads the SETUP.LST file from the installation disk, creates a
temporary working directory on the user's hard disk, and copies and
decompresses the rest of the setup program into the working directory.
3. SETUP invokes the Microsoft Test engine to interpret the customized setup
script for the MIN sample.
4. Script announces itself to the user and copies eight source files, plus one
prebuilt executable file, to the C:\MIN directory.
5. Script builds a Program Manager group with an icon for MIN.
In a nutshell, the steps in creating a MIN setup program are to customize the
MSCUISTF.DLL dynamic link library, which contains all dialogs presented to the
end user, then create an installation script named MIN.MST. This script is
essentially a Basic program that will be executed by the Microsoft Test engine
in order to perform the installation. (Note that the Microsoft Test program is
not included, only its script engine; to debug scripts, you might find it
helpful to obtain the complete Test package and work with it directly.) You
next create a MIN.INF file that describes all the files that will be
installed, on which installation disk they'll be found, to where they will be
copied, and so on. Next, create a SETUP.LST file that directs the SETUP.EXE
program to copy and decompress certain files from the first installation disk
to a temporary directory on the user's system. Finally, you create images of
the installation disks. A disk-layout program in the toolkit substantially
automates this often-tedious process.
First, you use a tool such as Microsoft's Dialog Editor or Borland's Resource
Workshop to modify the dialog templates contained in DIALOGS.RES (in the
BLDCUI subdirectory). DIALOGS.RES contains templates for 16 different dialog
boxes. I changed the icon located at the top-left corner of the WELCOME dialog
by modifying the IDC_SETUP entry (in DIALOGS.RC) that specifies the icon
filename. Once the dialog has been changed, you build MSCUISTF.DLL in the
BLDCUI subdirectory, using the makefile from the toolkit. However, the
makefile was designed for Microsoft C 6.x instead of Visual C++ (which I use).
The linkers in these two packages are slightly incompatible, so I had to
remove the /NOP link flag and add the /NOE option.
The next step is creating an installation script. The Toolkit provides three
samples: SAMPLE1.MST, SAMPLE2.MST, and SAMPLE3.MST. These illustrate
progressively more complex operations that can be done with the Setup Toolkit.
I took SAMPLE1 and removed a great deal of code to end up with MIN.MST, shown
in Listing One (page 86). As mentioned, this is just a Basic program to be
executed by the language engine included with Microsoft Test. The very first
line in MIN.MST includes SETUPAPI.INC. This file contains many useful
functions, described further in the Toolkit documentation. The next several
lines of the setup script declare manifest constants, global variables (such
as DEST$, which contains the name of the installation directory on the user's
machine), and local functions (Install and MakePath). Obviously, a more
complicated setup script would have more code in this section of the file.
Immediately after the INIT label is a section containing initialization code.
The SetBitmap and SetTitle functions establish the background and title for
the main frame window of the setup program. In the example, the bitmap
originates as BITMAP.DIB in the BLDCUI subdirectory and contains a "Microsoft
Setup" logo on a shaded blue background. We'll change this later. The
GetSymbolValue function retrieves a string from a symbol table that can also
be accessed by the C programs in the DLLs associated with the setup program.
The setup script wants to determine the pathname of the MIN.INF file. If SETUP
was invoked with a command-line argument, STF_SRCINFPATH is that pathname.
Otherwise, the script assumes that SETUP copied and decompressed MIN.INF into
the current working directory, the name of which is the value of the
STF_CWDDIR symbol. The last step of initialization is to call ReadInfFile to
read the .INF file into memory so that other setup helper functions can access
it.
By this time, the user has seen both a dialog announcing that setup is being
initialized (generated by SETUP.EXE under control of options in the SETUP.LST
file) and a frame window with the title and background bitmap established by
the INIT section of code. The WELCOME section of code calls the UIStartDlg
function to display the WELCOME dialog from the resources in MSCUISTF.DLL.
This is the dialog I customized earlier. The first two arguments to UIStartDlg
let you specify a DLL and a dialog resource integer ID within that DLL. The
third argument names a dialog function exported by the DLL. The fourth and
fifth arguments specify the dialog and dialog function that FInfoDlgProc will
use to react to a Help request. (FInfoDlgProc is one of the generic dialog
functions in bldcui\dlgprocs.c--it responds to the Continue, Exit, and Help
buttons in the WELCOME dialog.) If the user dismisses the WELCOME dialog by
pressing Continue, the setup script regains control and calls the UIPop
function to remove the dialog from the screen.
If the user elects to continue with setup, the setup script calls its own
internal Install function to do the actual work of installing the product. The
CreateDir function creates the installation directory (C:\MIN) if it doesn't
already exist. AddSectionFilesToCopyList and CopyFilesInCopyList work together
to actually install files from the installation diskettes. The "copy list" is
a list of files to be installed. You build this list from pieces of the .INF
file. For example, to add MIN.CUR, MIN.DEF, and the other entries in the
[Files] section to the copy list, you write: AddSectionFilesToCopyList
"Files", SrcDir$, DEST$.
Once you've built the copy list, call CopyFilesInCopyList to actually install
them. This function first sorts the copy list by source disk so that the user
doesn't have to insert disks that were previously used and removed. Then it
prompts the user through the disk sequence and copies the files to their
respective destinations. It also updates the thermometer-style progress bar to
show the percentage of completion.
After installing all the necessary files, Install creates a Program Manager
group and adds an icon for the MIN sample program. This is accomplished via
functions in SETUPAPI.INC that use DDE messages to communicate with the
Program Manager. Having written C code to do the same thing, I can attest to
how much easier these functions are to use. CreateProgmanGroup creates a group
named "Samples." There is an undocumented limit of 24 characters for the group
name, by the way. ShowProgmanGroup displays the group so the user can watch as
icons are added to it. CreateProgmanItem adds an individual icon to the group.


Laying Out the Disks


The setup program requires two auxiliary files. The first, SETUP.LST, is used
by SETUP.EXE in order to initialize the setup program on the user's machine.
Although you have to create SETUP.LST before you lay out your disks, I'll
defer talking about it. The second file, which has an extension of .INF, lists
all the files to be installed, their destinations on the user's hard disk,
attributes of the files, and so on. You could build this file by hand, and the
toolkit manual gives you instructions on how to do this. It's easier, however,
to use the disk-layout tools Microsoft provides instead. You do this in two
steps. First, you run the DSKLAYT tool to specify the files which will be
installed. Then you run the DSKLAYT2 tool to build actual disk images and the
.INF file. Both programs are in the DISKLAY subdirectory of the toolkit.
DSKLAYT is a Windows program that works with the files that make up your
product, letting you specify layout-time and install-time options for each of
your product files. A layout-time option is one that affects how you build the
install-disk images. An install-time option takes effect when the SETUP
program is actually running on the end user's machine. In the case of MIN.EXE,
I specified the same handling for all my files: Files can be placed on any
diskette (in this instance, there was only one); every file is a "vital" file
(that is, it must be successfully copied to avoid an install-time error); all
files are stored in compressed form on the installation diskette; none of the
files possess a meaningful version resource; all files should show their real
date and time; and finally, all files have read/write permission on the user's
disk (otherwise it's harder to delete them later).
DSKLAYT makes some undocumented assumptions about how you've set up your
development environment, and it will bite you if you do not fulfill those
assumptions. You'll want to place all of the files that belong on your
installation disks (and no others) in a single top-level directory (which I
call the "layout working directory") on your disk. You can have a subdirectory
hierarchy, but keep the names of your subdirectories short; the list box
DSKLAYT gives you for file selection doesn't scroll horizontally. You can have
extraneous files in this directory tree, but you'll have to specifically
exclude them from the installation disks. Finally, copy the files you need
from the setup-toolkit directory (see Table 1) into your layout working
directory before you run DSKLAYT.
After you specify file properties, DSKLAYT saves the resulting layout in the
DISKLAY subdirectory (in my case, with the filename MIN.LYT). Then, make
DISKLAY the current directory (in a DOS box) and run the DSKLAYT2 utility to
build an image of the one-and-only installation disk like this:
dsklayt2 min.lyt ..\min.inf /d \setup /f /k 144
DSKLAYT2 creates an .INF file (MIN.INF in the parent directory) and builds
images of the installation disks in \SETUP\DISK1, \SETUP\DISK2, and so on. The
/f option tells DSKLAYT2 to overwrite the .INF file. The /k 144 option tells
it to lay out 1.44-Mbyte, high-density, 3.5-inch disks.
There's a chicken-and-egg problem that DSKLAYT2 deals with quite nicely. The
.INF file belongs on the first disk. You can't build the .INF file, though,
until you know how files are going to be distributed across the installation
disk set (because one of the parameters in each file description is the disk
number on which the file is stored). But you can't lay out the first disk
until you know how long the .INF file is. DSKLAYT2 solves this problem by
first building and measuring a dummy .INF file and then automatically copying
the finished .INF file (in compressed form) onto the first disk image. This
automatic behavior is the reason you don't need to mention your .INF file when
you run DSKLAYT.
Unfortunately, the toolkit's automation breaks down at this point in two
respects. The first has to do with the .INF file. Remember that you need to
tell DSKLAYT about all of the setup-program files, in addition to your product
files. This is so DSKLAYT2 will place the setup program on the first disk,
where it belongs. DSKLAYT2 also puts the setup-program files into the .INF
file, however, and this causes them to be installed onto the end user's
computer. Of course, you may want the setup program installed on the user's
machine--to make it easier for the user to modify the installation. In the MIN
example, however, I didn't really want to clutter up the user's disk with 11
extraneous files. To avoid having setup install itself, therefore, I had to
hand-edit MIN.INF and compress it into the disk image by hand. Although this
is only a minor headache, it must be done every time DSKLAYT2 is used to lay
out a new set of disk images.
The second breakdown of automation concerns the SETUP.LST file. This file must
be present (in uncompressed form) on the first setup disk. It tells SETUP.EXE
how to launch the Test engine against the right setup script. It also lists
the files which should be copied from the first disk into the temporary
working directory that SETUP.EXE will create. Example 1 shows the SETUP.LST
file for this example.
If you look at the [Files] section of SETUP.LST, you might wonder about the
files whose names end with an underscore. The trailing underscore denotes
files which are compressed on the setup disk. You are required to type the
filenames in this way. One would think DSKLAYT2 could build this portion of
SETUP.LST, since it knows which files are compressed and which aren't, but
that's not the case.


Customizing the Installation



The example so far results in a workable but rudimentary setup program. You
can customize it to be more eye-catching. One way is to change the background
screen. Recall that SetBitmap is responsible for this, and we merely need to
give it the name of a DLL and a resource identifier within that DLL. In my
case, I created a monochrome bitmap using Windows Paintbrush, then added the
statement 2 BITMAP MIN.BMP to DIALOGS.RC in the BLDCUI subdirectory. After
rebuilding MSCUISTF.DLL, I changed the SetBitmap call in the setup script to refer to
bitmap 2 instead of 1.
Another worthwhile enhancement is to let the user choose where product files
are installed. This can be accomplished with the setup-script code in Listing
Two (page 86), which introduces another setup API
function--IsDirWritable()--the purpose of which should be clear. The handling
of the REACTIVATE return from the dialog is, however, anything but clear. This
return occurs when the user switches focus to the setup program after having
been away from it for a while. Some (not all) of the dialog functions in the
toolkit then return this special value. You need to respond by redisplaying
the dialog box, as shown in the code fragment.


Version Information


With Windows 3.1, Microsoft introduced a standard for controlling software
versions on end-user machines. This standard addresses the situation in which
several products from one company share components such as DLLs, font files,
device drivers, and so on; alternatively, there may be products from multiple
software vendors that rely on a third-party DLL or device driver. For example,
vendors of 32-bit Windows applications will likely redistribute Microsoft's
Win32s components.
Plainly, you want only one copy of such shared files installed on the end
user's machine. Further, if your product relies on the latest version of a
particular file, you don't want some other vendor's install program to wipe
out the copy you installed with some previous version. Yet, without some way
of checking version levels, somebody else's install program might do just
that--simply because that program was built before the newer version was
available.
Microsoft's answer is to promulgate a standard way of describing software
versions within a program's resources and to provide a set of version-checking
APIs packaged in the redistributable file VER.DLL. If you're writing C
programs, your routines will use VER.DLL directly. With the Setup Toolkit,
however, you don't need VER.DLL; the toolkit incorporates the necessary code
in another way.
An example of how to version-stamp your application is shown in Listing Three
(page 86). This shows a portion of an RC file which you'd customize for your
application.
Once the version information is present in your executable files, users can
query the database with a utility like MSD. The more important use of the
version information, of course, happens at the time an install program decides
whether and where to install the file. If you specify the "Overwrite Older"
option in DSKLAYT and use the CopyFilesInCopyList API to copy files, the
toolkit will automatically check version information to avoid overwriting a
later version of a file. The user is never informed that your older version is
being skipped. If you need finer control, you can use the GetVersionOfFile()
API to interrogate a version resource.


Additional Features


The Setup Toolkit provides other features for enhancing your installation
programs that we can only touch on here. Do you want to provide a private
profile file to record installation options and user preferences? Just use
CreateIniKeyValue to create and modify it. How about imprinting the installed
copy with the name of the user? There's a StampResource subroutine in the
toolkit that may help. Will your product open an especially large number of
files? Use GetConfigNumFiles to verify that the user's CONFIG.SYS has
specified a large-enough FILES parameter. Is your product an OLE server? Use
the extensive set of registry functions to interrogate and modify the Windows
Registration Database concerning your product.
And finally, do you need to accomplish a function that isn't already provided
for by a Basic function in the Toolkit? You can call many DLL functions
directly from Basic simply by declaring them. For example, to call the
WinExec() function in the Kernel portion of the Windows API, you would first
declare it as follows:
DECLARE FUNCTION WinExec LIB "KERNEL.EXE" (cmd$, show%) AS INTEGER
The ability to call functions and subroutines within DLLs also
allows you to write your own DLLs, of course, thereby extending the
capabilities of your install script to include anything possible in Windows.
Example 1: Contents of SETUP.LST.
[Params]
 WndTitle = Windows Power Programming Sample Setup
 WndMess = Initializing Setup...
 TmpDirSize = 500
 TmpDirName = ~msstfqf.t
 CmdLine = _mstest min.mst /C "/S %s %s"
 DrvModName = DSHELL

[Files]
 min.ms_ = min.mst
 min.in_ = min.inf
 setupapi.in_ = setupapi.inc
 mscomstf.dl_ = mscomstf.dll
 msinsstf.dl_ = msinsstf.dll
 msuilstf.dl_ = msuilstf.dll
 msshlstf.dl_ = msshlstf.dll
 mscuistf.dl_ = mscuistf.dll
 msdetstf.dl_ = msdetstf.dll
 _mstest.ex_ = _mstest.exe

Table 1: Toolkit files which must be present on your installation disk (in
addition to your application's files). The toolkit files must reside on Disk
#1.

File Compress?
setup.exe No
setup.lst No
_mstest.exe Yes
mscomstf.dll Yes
mscuistf.dll Yes
msdetstf.dll Yes
msinsstf.dll Yes
msshlstf.dll Yes
msuilstf.dll Yes
setupapi.inc Yes
min.mst Yes
[LISTING ONE] (Text begins on page 68.)

-----------------------------------------------------
'$INCLUDE 'setupapi.inc''


CONST WELCOME = 100
CONST APPHELP = 900

GLOBAL DEST$

DECLARE SUB Install
DECLARE FUNCTION MakePath (szDir$, szFile$) AS STRING
-----------------------------------------------------
INIT:
 CUIDLL$ = "mscuistf.dll"
 HELPPROC$ = "FHelpDlgProc"

 SetBitmap CUIDLL$, 1
 SetTitle "Windows Power Programming Samples"
 szInf$ = GetSymbolValue("STF_SRCINFPATH")
 IF szInf$ = "" THEN
 szInf$ = GetSymbolValue("STF_CWDDIR") + "MIN.INF"
 END IF
 ReadInfFile szInf$
 DEST$ = "C:\MIN"
WELCOME:
 sz$ = UIStartDlg(CUIDLL$, WELCOME, "FInfoDlgProc", APPHELP, HELPPROC$)
 IF sz$ = "CONTINUE" THEN
 UIPop 1
 ELSE
 GOTO QUIT
 END IF
 Install
QUIT:
 END
-----------------------------------------------------
SUB Install STATIC
 SrcDir$ = GetSymbolValue("STF_SRCDIR")
 CreateDir DEST$, cmoNone
 AddSectionFilesToCopyList "Files", SrcDir$, DEST$
 CopyFilesInCopyList
 CreateProgmanGroup "Samples", "", cmoNone
 ShowProgmanGroup "Samples", 1, cmoNone
 CreateProgmanItem "Samples", "Minimum Windows Program", MakePath(DEST$,"min.exe"), "", cmoOverwrite
END SUB
-----------------------------------------------------
FUNCTION MakePath (szDir$, szFile$) STATIC AS STRING
 IF szDir$ = "" THEN
 MakePath = szFile$
 ELSEIF szFile$ = "" THEN
 MakePath = szDir$
 ELSEIF MID$(szDir$, LEN(szDir$), 1) = "\" THEN
 MakePath = szDir$ + szFile$
 ELSE
 MakePath = szDir$ + "\" + szFile$
 END IF
END FUNCTION

[LISTING TWO]

-----------------------------------------------------
GETPATH:
 SetSymbolValue "EditTextIn", DEST$

 SetSymbolValue "EditFocus", "END"
-----------------------------------------------------
GETPATHL1:
 sz$ = UIStartDlg(CUIDLL$, DESTPATH, "FEditDlgProc", APPHELP, HELPPROC$)
 DEST$ = GetSymbolValue("EditTextOut")
 IF sz$ = "CONTINUE" THEN

 IF IsDirWritable(DEST$) = 0 THEN
 GOSUB BADPATH
 GOTO GETPATHL1
 END IF
 UIPop 1
 ELSEIF sz$ = "REACTIVATE" THEN
 GOTO GETPATHL1
 ELSE
 GOTO QUIT
 END IF
-----------------------------------------------------

BADPATH:
 sz$ = UIStartDlg(CUIDLL$, BADPATH, "FInfo0DlgProc", 0, "")
 IF sz$ = "REACTIVATE" THEN
 GOTO BADPATH
 END IF
 UIPop 1
 RETURN

[LISTING THREE]

#include <windows.h>
#include <ver.h>

VS_VERSION_INFO VERSIONINFO
FILEVERSION 1,0
PRODUCTVERSION 2,10
FILEFLAGSMASK VS_FFI_FILEFLAGSMASK
FILEFLAGS (VS_FF_PRERELEASE|VS_FF_DEBUG)
FILEOS VOS_DOS_WINDOWS16
FILETYPE VFT_DLL
FILESUBTYPE VFT2_UNKNOWN
BEGIN
 BLOCK "StringFileInfo"
 BEGIN
 BLOCK "040904E4"
 BEGIN
 VALUE "CompanyName", "Your Company Name\0"
 VALUE "FileDescription", "Your File Description\0"
 VALUE "FileVersion", "1.0\0"
 VALUE "InternalName", "Name from .DEF file\0"
 VALUE "LegalCopyright", "Copyright \251 1993 ...\0"
 VALUE "LegalTrademarks", "...\0"
 VALUE "ProductName", "Name of your product\0"
 VALUE "ProductVersion", "2.1\0"
 END
 END
 BLOCK "VarFileInfo"
 BEGIN
 VALUE "Translation", 0x0409, 1252
 END

END
End Listings
February, 1994
NT-Style Threads for MS-DOS


Phar Lap's TNT 386|DOS-Extender makes it possible




Al Williams


Al is the author of DOS and Windows Protected Mode and Commando Windows
Programming, both published by Addison-Wesley. Al can be contacted at 310 Ivy
Glen Court, League City, TX 77573.


Mention features such as threads, DLLs, and 32-bit memory allocation, and
you'll probably think about operating systems such as Windows NT or OS/2 2.1.
By using special tools, however, you can take advantage of features such as
these in an MS-DOS program. Better still, your source code can be compatible
with Windows NT or Win32s.
In this article, I'll examine Phar Lap's TNT 386|DOS-Extender, which provides
a subset of the Win32 programming API. You can use many features from the
Win32 base API specification, including multiple threads, DLLs, and 32-bit
memory allocation.
TNT also allows you to mix standard DOS and BIOS interrupts (and C library
functions that use them) with Win32 functions in your programs. Of course,
doing so prevents you from compiling your program for native Windows NT or
Win32s. However, a mixed program can still run in a DOS box under Windows or
NT.


About DOS Extenders


DOS extenders let you write programs for MS-DOS that take advantage of
protected-mode features; a typical 386 extender supports access to four
gigabytes of virtual memory, for example. Most extenders supply functions that
stand in for the common DOS and BIOS interrupts. Of course, you can't use an
ordinary 16-bit compiler to generate 32-bit programs. Instead, you need a
special 32-bit compiler.
Traditionally, programmers have turned to extenders when the need arose for
additional memory. However, the new hybrid extenders (like TNT) offer much
more than simple extended-memory management. They mimic other operating
systems and provide modern operating-system features for MS-DOS programs.
In this article, I'll develop a DOS program that removes a directory tree.
Instead of recursing down the tree, I'll use NT-style threads to process the
directories in parallel. The resulting program, XPRUNE, will work with TNT or
Windows NT.


About Multithreading


In simple terms, a thread is a piece of code that can execute in parallel with
other threads. Threads share global variables, but not local (stack-based)
ones. For example, if your program is performing a complex calculation, it
might spawn a thread to check for keyboard input. If this thread detects a key
press, it aborts the calculation. Ordinarily, you would have to code the
calculation to occasionally scan for input. With the thread method, your
calculation continues unimpeded until the keyboard-scanning thread aborts it.
Multiple threads can share the same code. For example, a game program might
have a function that draws an asteroid on the screen. This function tracks the
asteroid's position, color, and spin rate. Although this is one function, it
could run in multiple threads to create several asteroids at once.
Windows NT contains a call for creating new threads (CreateThread()), but you
won't use it often from inside C/C++ programs. Instead, you should use the C
function _beginthread() (declared in PROCESS.H), because only by starting a
thread this way can you call C library functions. Figure 1 shows the details
of the _beginthread() call.
The _beginthread() function returns a thread handle. You can use this handle
to query the thread's status or control the thread. For example, you can use
WaitForSingleObject() with a thread handle to wait for the thread to
terminate. Under NT, you can also use WaitForMultipleObjects() to wait for
multiple threads, but TNT doesn't implement this call; see Table 1.


Using Threads


XPRUNE (see Listing One, page 88) is a TNT program that uses threads. XPRUNE
accepts a directory name and removes the directory and all the files and
directories under it.
Traditionally, programs like XPRUNE would use simple recursion, deleting each
directory tree in sequence. With threads, XPRUNE can delete all the trees in
parallel. Of course, you can't remove a directory until all of its contents
are gone, so XPRUNE needs some synchronization.
Under DOS, deleting directories in parallel won't help the XPRUNE program go
faster--DOS is inherently single-tasking. Still, when the program runs on an
NT system (perhaps even a multiprocessor system), things should speed up
considerably.
The erase_all() function is the heart of XPRUNE. This function removes a
directory and its contents. It sends each file that it finds in the initial
directory to del(). When del() detects a subdirectory, it creates a new thread
using erase_all().
Since you can only pass one pointer to a thread function, erase_all() accepts
a pointer to structure (struct pkt) as an argument. Callers must allocate this
pointer using malloc(). In general, the pointer you pass to the thread should
not point to a local or global variable; the local variable may go out of
scope, and the global variable's value may get changed unexpectedly.


Synchronization


XPRUNE can't remove a directory until it deletes all the files in it.
Therefore, erase_all() must wait for all the del() calls (and their threads)
to complete before removing the directory.
Creating a list of thread handles and using WaitForSingleObject() is one way
to wait for the multiple del() threads to complete, but this is awkward. A
better method is to use semaphores (see the accompanying text box entitled,
"About Semaphores").
Many multitasking operating systems (including Windows NT) support semaphores.
Since TNT does not, I wrote TNTSEM (see Listings Two and Three, page 88).
TNTSEM implements semaphores using events--an NT feature that TNT does
support. Of course, TNTSEM will work fine under Windows NT, too.


Compiling XPRUNE



You can compile XPRUNE with any TNT-supported compiler. I used Microsoft
Visual C++ 32-bit edition (the version that runs under DOS and generates
Win32s code). From the command line, enter:
cl /MT /Zi xprune.c tntsem.c
To make the resulting executable work with TNT, enter:
rebind xprune


Other TNT Capabilities and Limitations


As you can see in Table 1, TNT supports numerous NT features. For example, you
can create and use DLLs. You can also call DOS and BIOS functions directly. Of
course, doing so prevents your program from running under Windows NT (except
as a DOS program in the NT DOS box). If you program in C, you won't use many
of these functions anyway--the compiler's run-time library uses them. You
continue to use fopen(), malloc(), and other familiar calls.
TNT can still produce files compatible with previous versions of Phar Lap's
386 extenders. However, you can't make NT-style calls from these programs. The
NT-style programs allow you to mix old and new API calls along with DOS and
BIOS function calls--the best of both worlds.
TNT can help you write DOS-extended programs that may also run under Windows
NT. However, if you are an experienced NT programmer, you'll quickly be
frustrated with some omissions in the TNT API (semaphores, for example). Phar
Lap promises to continue adding API functions in subsequent releases.
Also, TNT does not support Unicode. Only the ANSI character-set functions are
available. Still, for most programmers this isn't a problem.
If you are coming from a DOS or Windows environment, you'll find TNT's
functions quite rich. You can allocate large amounts of memory and perform
high-level file operations by making simple TNT calls.
You may find that using features like threading can slow down a program under
DOS (this is a problem with DOS, not TNT). However, the time difference is
usually not very large. If you compile XPRUNE with the /DSINGLE option, it
will not use threads. Then you can compare the time difference for yourself.


Summary


TNT is worth looking at if you need to write DOS-extended programs using
widely available tools or programs that run under DOS, Windows, and Windows
NT.
If you've tried DOS extenders before, you'll find that 32-bit tools are
finally in the mainstream. Using NT development tools means not having to
compromise--you'll have full-featured debuggers and libraries.
If you need to support code for DOS and Windows NT, TNT can simplify your life
considerably. You may have to roll some of your own API functions for now, but
that is easier than trying to do it all from scratch.


References


Win32 Applications Programming Interface. Redmond, WA: Microsoft, 1992.
Williams, Al. DOS and Windows Protected Mode. Reading, MA: Addison-Wesley,
1992.
--. "Your Own Disk Duplication Program." Dr. Dobb's Journal (January, 1992).
--. "Programming with Phar Lap's 286|DOS-Extender." Dr. Dobb's Journal
(February, 1992).
--. "Roll Your Own DOS Extender." Dr. Dobb's Journal (October/November, 1990).
Figure 1: (a) _beginthread() call; (b) return value; (c) arguments.
(a) unsigned long _beginthread(void (*f)(void *), unsigned stksiz, void *arg);
(b) Thread handle (cast to type HANDLE). If an error occurs, the return value
is -1.
(c) f, pointer to thread function. Function's prototype is void f(void *arg).
 stksiz, stack size in bytes. If 0, use default size.
 arg, void pointer to pass to thread function.
Table 1: NT API functions available for use with TNT.
Console I/O
FlushConsoleInputBuffer
GetCommandLine
GetConsoleCP
GetConsoleMode
PeekConsoleInput
ReadConsole
ScrollConsoleScreenBuffer
SetConsoleCtrlHandler
SetConsoleCursorPosition
SetConsoleMode
SetConsoleTitle
WriteConsole
Debugging
ContinueDebugEvent
DebugBreak
OutputDebugString
QueryPerformanceCounter
QueryPerformanceFrequency
ReadProcessMemory

UnhandledExceptionFilter
WaitForDebugEvent
WriteProcessMemory
File Manipulation
CopyFile
CreateDirectory
CreateFile
DeleteFile
DosDateTimeToFileTime
FileTimeToDosDateTime
FileTimeToLocalFileTime
FileTimeToSystemTime
FindClose
FindFirstFile
FindNextFile
FlushFileBuffers
GetCurrentDirectory
GetDiskFreeSpace
GetDriveType
GetFileAttributes
GetFileInformationByHandle
GetFileSize
GetFileTime
GetFileType
GetFullPathName
GetLogicalDrives
GetStdHandle
GetSystemDirectory
GetVolumeInformation
_lclose
LockFile
MoveFile
OpenFile
ReadFile
RemoveDirectory
SearchPath
SetCurrentDirectory
SetEndOfFile
SetFileAttributes
SetFilePointer
SetFileTime
SetHandleCount
SetStdHandle
SystemTimeToFileTime
UnlockFile
WriteFile
Memory Management
CreateFileMapping
GlobalAlloc
GlobalFree
HeapAlloc
HeapFree
HeapReAlloc
HeapSize
LocalAlloc
LocalFree
LocalReAlloc
LocalSize
MapViewOfFile

UnmapViewOfFile
VirtualAlloc
VirtualFree
VirtualQuery
Miscellaneous
Beep
CloseHandle
DuplicateHandle
GetCPInfo
GetEnvironmentStrings
GetEnvironmentVariable
GetLastError
GetLocalTime
GetSystemTime
GetTimeZoneInformation
GetVersion
IsDBCSLeadByte
RaiseException
SetEnvironmentVariable
SetErrorMode
SetLocalTime
SetSystemTime
Sleep
Module Management
FreeLibrary
GetModuleFileName
GetModuleHandle
GetProcAddress
LoadLibrary
Process/Thread Management
CreateEvent
CreateProcess
CreateThread
DeleteCriticalSection
EnterCriticalSection
ExitProcess
ExitThread
GetCurrentProcess
GetCurrentThread
GetCurrentThreadID
GetExitCodeProcess
GetPriorityClass
GetProcessHeap
GetStartupInfo
GetThreadContext
GetThreadPriority
GetThreadSelectorEntry
InitializeCriticalSection
LeaveCriticalSection
OpenEvent
OpenThread
PulseEvent
ResetEvent
ResumeThread
SetEvent
SetPriorityClass
SetThreadContext
SetThreadPriority
TerminateProcess

WaitForSingleObject


About Semaphores


Semaphores are a way to allow threads to wait for several events to occur. The
TNTSEM implementation differs from true NT semaphores, but the ideas are the
same. Each semaphore has an associated count. A thread blocking on the
semaphore will not execute until the count is 0.
When you create a semaphore (using sem_create()), you specify its initial
count. You can modify the count using sem_signal(). When you want to wait for
the semaphore to return to 0, call sem_wait().
XPRUNE creates a TNTSEM for each directory it wants to delete. Every
subdirectory increments its parent's semaphore. When the semaphore returns to
0, XPRUNE can remove the directory. Without the semaphore, XPRUNE could try to
remove the directory before its child threads have erased all of the files in
it.
--A.W.
Products Mentioned

TNT 386|DOS-Extender
Phar Lap Software Inc.
60 Aberdeen Avenue
Cambridge, MA 02138
617-661-1510

[LISTING ONE] (Text begins on page 74.)

/* XPRUNE.C -- (T)NT directory removal -- Williams */

#include <windows.h>
#include <stdlib.h>
#include <stdio.h>
#include <process.h>
#include <string.h>
#include "tntsem.h"

/* Arguments to thread function */
struct pkt
 {
 char *dir;
 TNT_SEM sem;
 };
void erase_all(struct pkt *);
void del(char *,WIN32_FIND_DATA *,TNT_SEM);
/* Delete a file or directory */
void del(char *dir,WIN32_FIND_DATA *fd,TNT_SEM wait)
 {
 char path[MAX_PATH];
 strcpy(path,dir);
 strcat(path,fd->cFileName);
 if (!(fd->dwFileAttributes&FILE_ATTRIBUTE_DIRECTORY))
 {
 if (!DeleteFile(path))
 {
 printf("Failed to delete file: %s\n",path);
 }
 }
 else
 {
/* Build arguments to new thread */
 struct pkt *packet;
 if (fd->cFileName[0]=='.') return; /* skip . and .. */
 packet=malloc(sizeof(struct pkt));
 strcat(path,"\\");
 packet->dir=strdup(path);
 packet->sem=wait;
/* Bump semaphore count up by 1 */
 sem_signal(wait,1);

/* Launch new thread to delete subdirectory */
#ifndef SINGLE
 _beginthread(erase_all,8192,packet);
#else
 erase_all(packet);
#endif
 }
 }
void erase_all(struct pkt *packet)
 {
 char path[MAX_PATH];
 WIN32_FIND_DATA fd;
 HANDLE findhandle;
 BOOL found=TRUE;
 TNT_SEM wait;
/* Wait on subthreads before deleting directory */
 wait=sem_create(0);
 strcpy(path,packet->dir);
 strcat(path,"*.*");
/* Find all files and call del() */
 for (findhandle=FindFirstFile(path,&fd);
 findhandle!=INVALID_HANDLE_VALUE&&found;
 found=FindNextFile(findhandle,&fd))
 {
 del(packet->dir,&fd,wait);
 }
 if (findhandle!=INVALID_HANDLE_VALUE) FindClose(findhandle);
/* Wait */
 sem_wait(wait,-1);
 sem_delete(wait);
/* Remove backslash */
 packet->dir[strlen(packet->dir)-1]='\0';
 if (!RemoveDirectory(packet->dir))
 {
 printf("Failed to remove directory: %s\n",packet->dir);
 if (GetLastError()==5)
 printf("Directory probably contains hidden or read only files.\n");
 };
/* Signal parent thread that we are done */
 sem_signal(packet->sem,-1);
/* Clean up malloc'd pointers (only this thread owns them) */
 free(packet->dir);
 free(packet);
 }
int main(int argc,char *argv[])
 {
 int i;
 struct pkt *p;
 TNT_SEM sem;
 char dir[MAX_PATH];
 if (argc>=2)
 {
 strcpy(dir,argv[1]);
 if (dir[strlen(dir)-1]!='\\') strcat(dir,"\\");
 p=malloc(sizeof(struct pkt));
 p->dir=strdup(dir);
 p->sem=sem=sem_create(1);
 erase_all(p);
 sem_wait(sem,-1);
 sem_delete(sem);

 }
 else
 {
 printf("XPRUNE by Al Williams\nUsage:\n"
 "XPRUNE directory_name\n\nRemoves directory and "
 "all files within it.");
 }
 return 0;
 }



[LISTING TWO]

/* TNT semaphores -- usable with NT, too. Al Williams */
#ifndef _TNT_SEM
#define _TNT_SEM

typedef struct _tnt_sem
 {
 int count; /* count */
 HANDLE event; /* semaphore wait event */
 CRITICAL_SECTION cs; /* controls access to count */
 } *TNT_SEM;
TNT_SEM sem_create(int count);
void sem_delete(TNT_SEM p);
int sem_signal(TNT_SEM p,int how);
int sem_wait(TNT_SEM p,int timeout);

#endif

[LISTING THREE]

/* TNT semaphores -- usable with NT, too. Al Williams */

#include <windows.h>
#include <stdlib.h>
#include "tntsem.h"

/* Create a semaphore */
TNT_SEM sem_create(int count)
 {
 TNT_SEM p=(TNT_SEM)malloc(sizeof(struct _tnt_sem));
 if (p)
 {
 p->count=count;
 p->event=CreateEvent(NULL,TRUE,count==0?TRUE:FALSE,NULL);
 if (!p->event)
 {
 free(p);
 p=NULL;
 }
 else
 InitializeCriticalSection(&p->cs);
 }
 return p;
 }
/* Destroy semaphore */
void sem_delete(TNT_SEM p)
 {
 if (p&&p->event) CloseHandle(p->event);

 if (p) DeleteCriticalSection(&p->cs);
 free(p);
 }
/* Signal a semaphore
 how == -x ; decrement by x
 how == 0 ; read semaphore count
 how == x ; increment by x
*/
int sem_signal(TNT_SEM p,int how)
 {
 if (!p) return 0;
 EnterCriticalSection(&p->cs);
 p->count+=how;
 if (p->count)
 ResetEvent(p->event);
 else
 SetEvent(p->event);
 how=p->count; /* capture the count while still inside the critical section */
 LeaveCriticalSection(&p->cs);
 return how;
 }
/* Wait for semaphore to reach zero count */
int sem_wait(TNT_SEM p,int timeout)
 {
 if (!p) return WAIT_FAILED;
 return WaitForSingleObject(p->event,timeout);
 }
End Listings




































February, 1994
PROGRAMMING PARADIGMS


Natural Language




Michael Swaine


I got dem I.O. blues. That's I.O., not I/O. I/O is input/ output and Io is a
satellite and "I owe" is motivation for working, but I.O. is an
ever-more-common syndrome in this Age of Information: Information Overload. I
suffer from it; maybe you do, too.
If you do, you know that we, the afflicted, get little sympathy, since
everybody thinks that we do this to ourselves. And maybe we do.


Info Gets Routed


Still, a lot of us who never went to either library school or trucking school
are wondering why we spend so much of our time moving information from one
location to another, figuring out where to stash the latest load of
information, tracking down information that got lost in the stacks or in the
information warehouse or out on the information highway. I'm sure that as we
build more capable agents to move our information down that information
highway they'll just get lost or stoned, make unscheduled stops, pick up
hitchhikers, unionize, and strike.
I had somehow, naively, imagined that the Age of Information would bring about
a different kind of work, a more intellectual labor; that soldiers would turn
into video-game players, longshoremen into poets, and ditchdiggers into
satellite engineers. Instead, it seems that the heavy lifting has just rolled
onto our LANs and floated into our wetware. It's still manual labor: manual
labor in the head.
And not all in the head, either. Let me tell you about my magazines.
I subscribe to, at last count, umpteen magazines, and the shelves on which
they reside fill the long wall of my office, floor to ceiling, plus one wall
of the spare room. Prior to their entombment, they hang on the 50 bars of a
large magazine rack in the living room, the overflow spilling onto end tables
and, occasionally, the floor, running ahead of my ability to keep up, like
corpses in the plague years.
But I read them all.
Understand, I'm a professional wordsmith and feel free to use the word "read"
in all its nuances simultaneously, from the somewhat superficial skim that I
give to that Software License Agreement just before I rip open the package, to
the impressive thoroughness that Zelda, our 11-week-old Labrador retriever,
brings to her scrutiny of any periodical that happens to spill off that end
table.
That said, I repeat: I read them all, or at least all that Zelda doesn't get
to first. That's why I feel justified in occasionally opening the locks and
letting some of this flood of information run off before it soaks into the
water table of my library stacks, if you'll forgive a sloppy metaphor that may
become more literal than I'd like if I don't get the office roof patched.
But I digress. This month, then, the locks are open.


Connections Get Mooted


I've learned to watch for the word "amid" in stories in the Wall Street
Journal.
It's one of those weasel words that fake profundity by allowing the writer to
seem to be saying more than he or she is actually prepared to say. The Wall
Street Journal doesn't have a lock on the use of "amid," but the word does
seem to crop up awfully often in economic reporting, for what you may agree,
when you see what I'm driving at, are obvious reasons. The context is
typically something like this:
Stocks tumbled as skittish investors bailed out of utility and financial
stocks amid fears that interest rates are headed higher. [Wall Street Journal,
November 4, 1993]
Notice that the sentence does not explicitly state any connection between the
fall of these particular stocks and the fears of unspecified persons regarding
interest rates, except that they occupied more or less the same time frame;
that is, they coincided. The strong implication, though, is that there is a
causal connection; otherwise what's the point of drawing attention to the
coincidence?
But you can be sure that the writer has no credible evidence for a causal
connection; otherwise, why not state it? "Amid" seems almost always to be a
signal that the writer is about to indulge in guesswork. The guesswork may be
eminently plausible, but that only makes it that much easier for readers to
overlook the fact that it's just guesswork.
My advice is: Watch out for those "amid"s.
What in the ever-lovin' blue-eyed world, you inquire, does any of this have to
do with the price of debuggers?
Trust me, I reassure, wading deeper into it.
There are more legitimate ways for writers to suggest a connection between
ideas without actually stating the connection. One is juxtaposition: placing
the ideas next to each other and letting the reader figure out the connection,
if any.
Harper's magazine does this extremely well in its "Harper's Index" feature. In
case you haven't seen the feature or any of the half-executed imitations of
it, it consists of a list of factoids, like this:
Chances that an unemployed European has not worked in more than a year: 1 in 2.
Chances that an unemployed American has not worked in more than a year: 1 in 9.
[Harper's, November 1993]
Harper's editors construct this list carefully so that there are connections
between adjacent factoids, but they leave the discovery of the
connection--causal, contrastive, ironic--to the reader.
I like "Harper's Index," and it reminds me of why I like the semicolon. I may
write an article about the semicolon some day; if I do, it will sound
something like this:
There are basically three ways to deal with relationships between ideas in
running prose (as opposed to formal structures like "Harper's Index" that
convey information in their structure). You can make the relationship
explicit, for example, by using conjunctive adverbs like "therefore" or
"nevertheless." Or you can leave it for the reader to discover the existence
and nature of the relationship, by simply putting each idea in its own
sentence. Or you can use a semicolon.
The first two choices, in my opinion, encourage passive reading. In one case
the reader is given the relationship; in the other the relationship can easily
be overlooked. Only the semicolon has the virtue of making the existence of a
relationship between two ideas explicit without hinting what that relationship
is. It signals plainly that there's something left unstated. It invites the
reader to examine the connection between two ideas. The semicolon engages the
reader; it makes prose more interactive.


Gabriel's Horn Gets Tooted


By jing, you persist, this writin' stuff ain't got jellybeans to do with
programmin'.
Well dog my cats, I remonstrate, if Dick Gabriel can get away with it, why
can't I, huh, once in a while? And get away with it Gabriel did, in the
October 1993 installment of his "Critic-at-Large" column in Journal of
Object-Oriented Programming. Not only that, he justified it.
Gabriel had just returned from a week-long nature poetry workshop at which he
was intrigued to hear poet Gary Snyder exhort would-be poets to "get the
science right."
Gabriel has long encouraged scientists and engineers to "get the writing
right." Most computer scientists, he complains, are persistent dilettantes.
Despite the fact that they spend a quarter to half their professional careers
writing, they do not approach it professionally, or seriously, and as a result
do not communicate their ideas effectively.
Gabriel gives his list of tips on how to improve your writing, ending with one
that his readers may find surprising: Start a writing workshop. Professional
writers know that writing workshops are the fastest way possible to get a lot
of useful feedback on your writing. They are exceptionally useful to most
beginning writers (and more writers than would like to admit it are
beginners).

A writer's workshop consists of a few to a couple dozen writers sitting in a
circle criticizing each other's writing. There are few rules, but the few are
important. Gabriel presents one set of rules, but there are others.
Science-fiction writers seem to be especially good at workshops; what is
called the "Clarion" model is excellent. In the Clarion model, each
participant reads and critiques each other participant's work, while the
victim remains silent until all criticisms are heard.
Gabriel suggests that computer scientists get together with colleagues in
related fields to hold workshops, critiquing papers before submitting them to
conferences. Then this year, he implies, there may not be, as there was last
year, a 91 percent rejection rate on OOPSLA submissions.
I think this is a very good idea. Is anybody out there doing it? Let me know.


Plauger's Language Gets Booted


On a higher linguistic plane, P.J. Plauger drew criticism from a computational
linguist in the November 1993 issue of C User's Journal.
Plauger had published an article on natural-language processing in the April
1993 issue of the magazine, and an interesting piece it was. Reader M. Boot, a
computational linguist by profession, took Plauger to task for the simplistic
level of the piece. His criticisms were, roughly:
1. The author uses the terminology of computational linguistics in the
article, but the associated code doesn't live up to the language of the
article.
2. The techniques demonstrated are 19 years out of date.
3. This is adventure-game linguistics.
Maybe Plauger should read Computational Models of American Speech by M.
Margaret Withgott and Francine R. Chen (University of Chicago Press, 1993),
which Jon Erickson reviewed in this journal in October 1993.
Maybe a lot of us should.
Boot's beefs don't mean that the C User's Journal piece wasn't interesting and
useful. In fact, Plauger, no dummy, found it interesting; he claims that his
readers did; and I admit that I did, too. So I, for one, am happy to judge it
a good article for its intended audience, but what about that audience? Are we
all ignorant?
In this one area, yeah, I suspect that we are. This is only a guess (maybe I
should work in an "amid"), but I suspect that the distance between academic
and commercial work in computational linguistics is greater than the
corresponding gap in a lot of other areas of computer science.
If true, doesn't that suggest an opportunity? Isn't it possible that
computational linguistics could be a fruitful area for the kitchen-table
software entrepreneur?
Granted, if the distance between academic and commercial work in computational
linguistics is greater than the corresponding gap in a lot of other areas of
computer science, it may be because computational linguistics is a lot harder
than a lot of other areas of computer science. But Fermat's Last Theorem was
hard, and its cracking last year just demonstrates that hard problems can
often be broken down into smaller, more manageable problems.
Maybe there are small advances to be made in computational linguistics that
are open to the kitchen-table programmer. And, not to be overlooked, maybe
these advances could become successful commercial products. Many
natural-language applications do not require a complete model of the English
language.
Computational linguistics is an area of interest to me, but I'm sure M. Boot
would judge me to also be 19 years out of date. If any DDJ readers are doing
interesting work in this area, and are willing to talk about it, I'd love to
hear from you.


Negroponte Gets Hooted


In the November 1993 issue of New Media, editor-in-chief David Bunnell
ridiculed the idea that there is a convergence happening in the area of
multimedia, and passed along the intelligence that the word "convergence" was
invented by Nicholas Negroponte as a marketing gimmick for his MIT Media Lab.
Did Southern Pacific Railroad and U.S. Rubber merge to create the auto
industry, he asked, or G.E. team up with the Royal Shakespeare Company to
launch the movie industry?
Historically, new industries are created and dominated by new companies, and
Bunnell predicted that the multimedia heroes will be new companies, still in
the garage today.
Ah, you say, but the new industry of multimedia depends on content, and the
big companies are buying up all the content. But Bunnell also questions the
notion of repurposing existing content.
Nicholas Negroponte had his own say on the issue of repurposing in the
November 1993 issue of Wired. At least that's what I think he was talking
about. I honestly believe that Negroponte consciously tries to write like
Marshall McLuhan. I'd better let him speak for himself:
Modern multimedia must include the automatic transcoding from one medium into
another, or the translation of a single representation into many media. Books
that read themselves when you are dozing off, or movies that explain
themselves with text are good examples.
I don't know about you, but I'm encouraged. I've been writing this column all
along so that it would read itself if you fell asleep.


Issues Get Disputed


Also in that November issue of Wired, which is the first monthly issue, are an
interview with Alvin Toffler (touching on such Toffleresque predictions as the
breakup of China, a Constitutional crisis in the USA, a global revolt of the
rich, and niche wars with personal nukes) and a whole slew of what Wired calls
idées fortes and any other magazine would call "viewpoints."
One of these idées mused on the issue of the viability of copyright out on the
information highway. Another was billed as being about digital archaeology,
and darned if that wasn't a fair description. Can we assume that we are
leaving a readable record behind as we generate all this electronic data?
Anyone who can read German can read the first book ever printed, but I can't
read my Osborne 1 disks. What will information archaeologists of the future
make of our era, and on the basis of what data?
I cite these idées as evidence that discussion of the social implications of
technological change is alive in computer publications. But Wired is a special
case, and not actually written by or for the agents of that change.
There are magazines that are written by and for, et cetera. This one, for
example. And I observe with pride that the issues of several programmer's
magazines that I have before me do indeed touch on these social issues.
Here's the October/November issue of PC Techniques, in which editor-in-chief
Jeff Duntemann debates encryption legislation, drug policy, and crime control
with a reader. Here's November's Windows Tech Journal, in which Zack Urlocker
talks about copyright law. And as we know, Jon's editorials often delve into
the social consequences of technological change and of governmental reaction
(or lack thereof) to that change.
Two thoughts about this: 1. It's important, because ignorance is power, placed
in the hands of others. What you don't know can hurt you and what others don't
understand can, too; 2. the best such discussions tend to be among the most
technically knowledgeable. It's encouraging that the technical community is
thinking about these things, and it's a laugh in the face of the common view
that engineers and technologists don't consider the consequences of their
work.
Which brings us back to writing, since ideas poorly expressed are not well
understood. And it brings a chance for writer/editor/programmer P.J. Plauger
to redeem himself.
In his "State of the Art" column in the November 1993 issue of Embedded
Systems Programming, Plauger talks about the "other" interfaces of embedded
systems. Most products are designed to be easy for the daily user, he says.
But there are also the rare reconfiguration uses that may crop up monthly or
yearly, and these typically sport interfaces and documentation that are all
but unusable to anyone but a trained technician. Bad. As he puts it:
Any interface you provide that gets only occasional use had better do lots of
prompting. Favor menu-style choices over open-ended command sets that must be
memorized or looked up in a manual. Provide at least brief hints about what
each option actually means.
In other words, consider your audience. The ultimate practical advice for
writers and software developers.














February, 1994
C PROGRAMMING


D-Flat++ Editor Class




Al Stevens


The D-Flat++ library is close to being finished. I'm working on some
touch-ups--the clipboard, some common dialogs, and a help system--and there
are still a few small bugs underfoot. I'll have a new version ready for
download by the time you read this.
This month's topic is the Editor class, which implements a multiple-line text
editor. Last month I described the EditBox class, the single-line text-editor
control that dialog boxes use. The Editor class is derived from the EditBox
class, adding the things that a multiline editor needs, such as word wrapping.
The EditBox class is derived from the TextBox class, which takes care of
paging and scrolling and also handles marking selected text blocks for
clipboard operations.
Text editors are a favorite topic of conversation among programmers. To see
why, post a CompuServe message about your favorite editor. You will launch a
thread of many messages and many differing and unshakable opinions. The Middle
East peace accord is a strawberry festival compared to getting programmers to
agree about editors. Writers get similarly chauvinistic about their word
processors, although to a lesser extent. No matter which features you build
into an editor, they will be either not enough, too much, or implemented
incorrectly. The editor will be too big, too slow, and too much unlike the
[your editor's name here] program. That said, let's get on with the D-Flat++
editor.
Listings One and Two, page 123, are editor.h and editor.cpp, the source files
that declare and define the Editor class. The class adds three data members: a
tab-expansion value, a flag to indicate whether or not the editor is in
word-wrap mode, and an integer holding the text document's row number. The base
EditBox class already takes care of the column.
The Editor class has member functions to handle moving the cursor up, down,
forward, and backward and to handle paging. These operations move the editor's
insertion pointer, which supports keyboard input as well as other functions.
They also change the display by paging and scrolling when necessary. The
EditBox class has forward and backward cursor operations, and the Editor class
uses them, but it adds logic to move the insertion pointer forward past the
end of a line to the next one and backward past the beginning of a line to the
previous one. The TextBox class handles paging, but the Editor class
intercepts the operations to update its insertion pointers.


Editor Tabs


The Editor class has private member functions to manage tabs in the text.
Readers were always asking why the D-Flat C library's editor doesn't handle
tab expansion, so I decided to build the feature into D-Flat++. An
understanding of what that required reveals why I didn't hasten to put the
feature in the C library.
It's not that easy to keep track of tabs in text while the text is in memory.
DF++ text classes store text as a null-terminated buffer of lines, each one
terminated by a newline character. Paragraphs are terminated by blank lines.
Each tab character in the text is followed by pseudotab characters that
represent the number of spaces to the next tab stop. The program computes the
number of such insertions, which depends on how far it is to the next tab.
That number is a function of both where in the text line the tab character
occurs and the width of the tabs themselves. The Paint function recognizes
these characters and displays them as spaces. The cursor-movement functions
make sure that the cursor does not land on one of the pseudotab expansion
positions on the screen. The delete-character functions delete the expansion
characters whenever the user deletes a tab character.
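The arithmetic behind those insertions is worth making concrete. This little function is my own illustration, not taken from the D-Flat++ source; it computes how many pseudotab characters must follow a tab that falls at a given column:

```cpp
#include <cassert>

// Number of pseudotab characters that must follow a tab found at
// zero-based column 'col', given tab stops every 'tabs' columns.
// The tab character itself occupies one screen position, so we
// subtract one from the distance to the next tab stop.
int pseudotabs(int col, int tabs) {
    int toNextStop = tabs - (col % tabs);  // columns to the next stop
    return toNextStop - 1;                 // minus the tab itself
}
```

A tab at column 0 with 4-column stops needs three fillers; a tab at column 3 lands one short of the stop and needs none.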
The tab-expansion logic gets hairy when the user inserts or deletes characters
on a line or reforms a paragraph. In the first case, the expansions for
subsequent tabs on the same line have to be adjusted. In the case of paragraph
reforms, which occur during word wrapping or by user command, the program
would have to adjust the tabs for the whole paragraph to maintain tab
integrity. For now, it simply "de-tabs" the paragraph, which means the tabs
and tab-insertion characters are replaced by spaces.
There are trade-offs between the way that the Editor class stores text and the
way that other editors do it. One technique stores paragraphs as
newline-terminated lines, word-wraps on the fly, and does tab expansion and
collapsing only when painting the screen. This technique involves more complex
text line and tab management and cursor movement than the one that I used. If
I were to build a serious word processor with D-Flat++ (not likely, I might
add) I'd probably build such a class just to get the performance improvements
it would offer. The DF++ Editor class is more than adequate for simple
text-editing applications, however, and uses simpler code than other editors.


The D-Flat++ Source Code


D-Flat and D-Flat++ are available to download from the DDJ Forum on CompuServe
and on the Internet by anonymous ftp. See page 3 in this magazine for details.
If you cannot get to one of the online sources, send a diskette and a stamped,
addressed mailer to me at Dr. Dobb's Journal, 411 Borel, San Mateo, CA 94402.
I'll send you a copy of the source code. It's free, but if you want to support
my Careware charity, include a dollar for the Brevard County Food Bank. They
help hungry and homeless citizens.


C++ Book Report


Practical C++ by Mark Terribile (McGraw-Hill, 1994) is addressed to the C
programmer who wants to learn C++ and to "write it well." The book is not a
tutorial. It isn't a reference book either. Nor is it a treatise on
programming issues. At different times it is all of those things, but never
only one of them, and certainly none of them comprehensively. If the work
seems to lack focus and organization, perhaps it does, but that is neither its
weakness nor its strength. It's just how things are. This is clearly the best
C++ book about the language itself that I've run across.
The book explains many of the more abstruse language features and programming
issues. For example, we used to have a clear understanding of the difference
between declaration and definition in our programming jargon because the two
were distinct. C++ muddies that distinction by allowing declarations to be
definitions and vice versa depending on the context in which a declarator is
used. It is not uncommon for declare and define to be used inconsistently and
incorrectly in C++ conversations and literature (mine included). Practical C++
has the best explanation of these terms that I've seen.
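To make the blur concrete, here is a small sketch of my own (the names are illustrative only) showing declarators that declare, define, or do both:

```cpp
#include <cassert>

extern int counter;        // declaration only: no storage allocated
int counter = 0;           // definition (and also a declaration)

class Widget {
public:
    int size() { return n; }   // member defined inside the class declaration
    int grow(int by);          // declared here, defined out of line
private:
    int n = 0;
};

int Widget::grow(int by) { return n += by; }  // out-of-line definition
```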
The author is apparently active in the ANSI and ISO C++ committees. The book
describes several proposed and/or pending changes to the language, including
some inventions of the committees. One example is called Namespace Management,
which organizes external names into named groups and introduces new keywords.
Another is the addition of const_cast<>, static_cast<>, reinterpret_cast<>,
and dynamic_cast<>, all used to invoke or sidestep a particular implicit
conversion. Just when you thought you knew it all.
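Those cast operators did eventually become standard C++. A brief sketch of the first two in use (the wrapper functions are my own, added only to frame the casts):

```cpp
#include <cassert>

double ratio(int a, int b) {
    // static_cast performs a well-defined conversion, checked at compile time
    return static_cast<double>(a) / static_cast<double>(b);
}

int readOnly(const int *p) {
    // const_cast strips constness; safe here only because we never write
    // through the pointer (writing to a truly const object is undefined)
    int *writable = const_cast<int *>(p);
    return *writable;
}
```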
Although targeted at the C programmer who is learning C++, a more appropriate
audience for this book might be C++ programmers who want to understand the
language better. It isn't an easy read, but you can select a subject at random
from the table of contents, open to the discussion, and almost always find
something new to learn, if only a different perspective on the topic. But you
need to understand the language before you start. Everything is covered, but
it takes some plowing through the text to find it all. You probably won't
start at the beginning and proceed to the end. If you are not already a C++
programmer with some object-oriented design knowledge, you'll get bogged down
early on.
There are no programs in the book for you to type in and run. The code
consists of fragments that demonstrate the points under discussion. As such,
most of them would not compile except in the larger context of a program where
you used them. The author apparently did not always take the time to do that,
because there are a few code errors. Some are due to the author's oversight.
Others are production problems, such as when the page layout word-wraps a
double-slash comment. Fortunately, an experienced reader will readily
recognize the code problems and make allowances.


The Great Debate: To Preprocess or Not


Much C++ literature tells us that C++ makes the C preprocessor obsolete. As a
rule, the preprocessor is to be avoided, except for #include and compile-time
directives that change the compiled output (to exclude debug code, for
example). Many authors argue that inline functions make macros unnecessary and
undesirable because they eliminate side effects and type-check the parameters.
So they do. The authors state further that const objects can replace #define
for associating values with global symbols. As with inline function parameter
lists, they contend, const objects include types and, therefore, enjoy
type-safety, too. So they do. (I always wondered why they call them "const
variables." How can a constant object be variable?)
Given all this warm, comfortable type-checking, most programmers
understandably resist a return to the old days of reckless C programming. They
conclude that something as bad as the typeless #define is never again
necessary.
But there are always exceptions. Sometimes you run into them just when you are
trying to do the right thing. Consider this familiar macro:

#define min(a,b) ((a)<(b)?(a):(b))

There can be side effects, as seen when you code this use of the macro:

a = min(b++, c++);

The variable holding the smaller value is evaluated, and therefore
incremented, twice. C programmers learned early on not to write code that
could generate
such side effects. Sometimes, though, you can't be sure. For example, some
compilers implement the ctype family of functions as macros; others build them
as functions.
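A short sketch makes the double evaluation visible; the macro is renamed MIN here only to avoid colliding with the standard library's min:

```cpp
#include <cassert>

#define MIN(a,b) ((a)<(b)?(a):(b))  // classic side-effect-prone macro

// Demonstrate the double increment: after the expansion, b++ appears
// twice (once in the test, once in the chosen branch), so b ends at 3.
int demo() {
    int b = 1, c = 5;
    int a = MIN(b++, c++);
    (void)a;       // a gets the value of the second b++ (2)
    return b;      // b was incremented twice
}
```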

C++ programmers solve the side-effect issue by using inline functions such as
this one:
inline int min(int a, int b)
{
 return (a < b ? a : b);
}

There are no side effects with the inline version of the macro/function.
Furthermore, there is type-checking, which is both good and bad. Type-checking
is inherently good. However, you can't use the inline min function unless both
argument types can be implicitly converted to integer values. Other types that
do not have conversion constructors to convert them to numeric types do not
work with this inline min function. You would need an overloaded min function
for each such type.
C++ solves the many-function problem with a function template, as shown here.

template<class T>
T min(T a, T b)
{
 return (a < b ? a : b);
}

Now, one little function fits all types as long as they have the less-than
relational operator. You can code the following statements by using the same
template function.
int a, b, c;
a = min(b, c);

String s1, s2, s3;
s1 = min(s2, s3);

You don't save any room by using templates. Each different type usage
generates its own copy of the run-time code, but at least you don't have to
maintain several different source-code versions of the same macro.
The template solution is not perfect. The objects being compared must be of
the same type, regardless of implicit or programmed conversion rules. And
that's where the const vs. #define issue comes in.
I use the template version of the min macro in D-Flat++. It rose up and bit me
when I tried to use it with a const argument. The example editor program that
I use to test and demonstrate D-Flat++ derives its specific application class
from the DF++ generic Application class. Users can resize application windows,
and I needed to establish a minimum size so that the window didn't get too
small to hold its menu bar and other things. That's easy enough to do:
Intercept the Size message function and adjust its height and width parameters
before letting it pass. I coded it something like this:

const int MinimumWidth = 40;
Ted::Size(int x, int y)
{
 // ...
 x = min(x, MinimumWidth);
}

The compiler rejected the usage. There is, it said, no min function that
expects an int and a const int as parameters. Casting the x argument to const
int didn't work either. The Borland compiler ignored the cast. I'm not sure
why, but that's what it did.
Incidentally, this discussion is based on how Borland C++ 3.1 works. Other
compilers exhibit different behavior, and I'll discuss them next month.
There are other solutions, one very simple one and some others that are
somewhat contrived. (Even if the cast had worked, it, too, would have been a
contrivance.) Assume for this discussion that the x variable needs to be
non-const. You could declare the MinimumWidth object as non-const, but then
you would have to put it into an executable source code file. You'd need an
extern declaration in the header to make it globally visible. You can assign
the const object to a non-const variable and use the non-const variable in the
min call. Or vice versa. You can return the const int value from a non-const
member function of the class and use that value. You can find many such
workarounds, each of which involves additional code. You can even rewrite the
template function like this:
template<class T, class U>
T min(T a, U b)
{
 return (a < b ? a : b);
}
On the surface, this would appear to be the most elegant of the contrivances,
and it works in some cases, but it can turn up some nasty side effects of its
own. Whichever type you code as the first argument in a call to this template
function becomes the type of the result, which can cause unwanted conversion
or truncation. For example, the following expression, when using that
template, would assign the value 1 to the foo variable.

float foo = min(4, 1.23);

Reverse the arguments, and the expression returns 1.23. That's the kind of
inconsistent behavior that makes for deep-space, black-hole bugs.
Those are the contrived solutions, and you can use one of them. On the other
hand, you can choose the simple solution--the unsafe, old-fashioned, obsolete
one that you are not supposed to need or want anymore--and the one that works.
You can change the MinimumWidth declaration to this statement:

#define MinimumWidth 40

Sometimes a little serendipity just beats the pants off of a lot of
conventional wisdom.
[LISTING ONE] (Text begins on page 97.)

// -------- editor.h

#ifndef EDITOR_H
#define EDITOR_H

#include "editbox.h"


const unsigned char Ptab = '\t'+0x80; // pseudo tab expansion
#define MaxTab 12 // maximum tab width

class Editor : public EditBox {
 int tabs; // tab expansion value
 Bool wordwrapmode; // True = wrap words
 int row; // Current row
 void OpenWindow();
 void ScrollCursor();
 void AdjustCursorTabs(int key = 0);
 void ExtendBlock(int x, int y);
 void InsertTab();
 void AdjustTabInsert();
 void AdjustTabDelete();
 void WordWrap();
 Bool AtBufferStart()
 { return (Bool) (column == 0 && row == 0); }
protected:
 virtual void Upward();
 virtual void Downward();
 virtual void Forward();
 virtual void Backward();
 virtual void DeleteCharacter();
 virtual void BeginDocument();
 virtual void EndDocument();
 virtual Bool PageUp();
 virtual Bool PageDown();
 virtual void InsertCharacter(int key);
 virtual void PaintCurrentLine()
 { WriteTextLine(row); }
 virtual Bool ResetCursor();
 virtual void WriteString(String &ln,
 int x, int y, int fg, int bg);
 virtual void LeftButton(int mx, int my);
public:
 Editor(const char *ttl, int lf, int tp, int ht, int wd,
 DFWindow *par=0)
 : EditBox(ttl, lf, tp, ht, wd, par)
 { OpenWindow(); }
 Editor(const char *ttl, int ht, int wd, DFWindow *par=0)
 : EditBox(ttl, ht, wd, par)
 { OpenWindow(); }
 Editor(int lf, int tp, int ht, int wd, DFWindow *par=0)
 : EditBox(lf, tp, ht, wd, par)
 { OpenWindow(); }
 Editor(int ht, int wd, DFWindow *par=0)
 : EditBox(ht, wd, par)
 { OpenWindow(); }
 Editor(const char *ttl) : EditBox(ttl)
 { OpenWindow(); }
 virtual void Keyboard(int key);
 virtual unsigned char CurrentChar()
 { return *(TextLine(row) + column); }
 virtual unsigned CurrentCharPosition()
 { return (unsigned)
 ((const char *)
 (TextLine(row)+column) - (const char *) *text);
 }

 virtual void FormParagraph();
 virtual void AddText(const String& txt);
 virtual const String GetText();
 virtual void ClearText();
 virtual int GetRow() const { return row; }
 int Tabs()
 { return tabs; }
 void SetTabs(int t);
 Bool WordWrapMode()
 { return wordwrapmode; }
 void SetWordWrapMode(Bool wmode)
 { wordwrapmode = wmode; }
 virtual void DeleteSelectedText();
 virtual void InsertText(const String& txt);
};

#endif

[LISTING TWO]

// ----- editor.cpp

#include "editor.h"
#include "desktop.h"

// ----------- common constructor code
void Editor::OpenWindow()
{
 windowtype = EditorWindow;
 row = 0;
 tabs = 4;
 insertmode = desktop.keyboard().InsertMode();
 wordwrapmode = True;
 DblBorder = False;
}
// ---- keep the cursor out of tabbed space
void Editor::AdjustCursorTabs(int key)
{
 while (CurrentChar() == Ptab)
 key == FWD ? column++ : --column;
 ResetCursor();
}
// -------- process keystrokes
void Editor::Keyboard(int key)
{
 int svwtop = wtop;
 int svwleft = wleft;
 switch (key) {
 case '\t':
 InsertTab();
 BuildTextPointers();
 PaintCurrentLine();
 ResetCursor();
 break;
 case ALT_P:
 FormParagraph();
 break;
 case UP:
 Upward();

 TestMarking();
 break;
 case DN:
 Downward();
 TestMarking();
 break;
 case CTRL_HOME:
 BeginDocument();
 TestMarking();
 break;
 case CTRL_END:
 EndDocument();
 TestMarking();
 break;
 case '\r':
 InsertCharacter('\n');
 BuildTextPointers();
 ResetCursor();
 Paint();
 break;
 case DEL:
 case RUBOUT:
 visible = False;
 EditBox::Keyboard(key);
 visible = True;
 BuildTextPointers();
 PaintCurrentLine();
 ResetCursor();
 break;
 default:
 EditBox::Keyboard(key);
 break;
 }
 if (svwtop != wtop || svwleft != wleft)
 Paint();
}
// --- move the cursor forward one character
void Editor::Forward()
{
 if (CurrentChar()) {
 if (CurrentChar() == '\n') {
 Home();
 Downward();
 }
 else
 EditBox::Forward();
 AdjustCursorTabs(FWD);
 }
}
// --- move the cursor back one character
void Editor::Backward()
{
 if (column)
 EditBox::Backward();
 else if (row) {
 Upward();
 End();
 }
 AdjustCursorTabs();

}
// ---- if cursor moves out of the window, scroll
void Editor::ScrollCursor()
{
 if (column < wleft || column >= wleft + ClientWidth()) {
 wleft = column;
 Paint();
 }
}
// --- move the cursor up one line
void Editor::Upward()
{
 if (row) {
 if (row == wtop)
 ScrollDown();
 --row;
 AdjustCursorTabs();
 ScrollCursor();
 }
}
// --- move the cursor down one line
void Editor::Downward()
{
 if (row < wlines) {
 if (row == wtop + ClientHeight() - 1)
 ScrollUp();
 row++;
 AdjustCursorTabs();
 ScrollCursor();
 }
}
// --- move the cursor to the beginning of the document
void Editor::BeginDocument()
{
 row = 0;
 wtop = 0;
 EditBox::Home();
 AdjustCursorTabs();
}
// --- move the cursor to the end of the document
void Editor::EndDocument()
{
 TextBox::End();
 row = wlines-1;
 End();
 AdjustCursorTabs();
}
// --- keep cursor in the window, in text and out of empty space
Bool Editor::ResetCursor()
{
 KeepInText(column, row);
 if (EditBox::ResetCursor()) {
 if (!(row >= wtop && row < wtop+ClientHeight())) {
 desktop.cursor().Hide();
 return False;
 }
 }
 return True;
}

// ------- page up one screenfull
Bool Editor::PageUp()
{
 if (wlines) {
 row -= ClientHeight();
 if (row < 0)
 row = 0;
 EditBox::PageUp();
 AdjustCursorTabs();
 return True;
 }
 return False;
}
// ------- page down one screenfull
Bool Editor::PageDown()
{
 if (wlines) {
 row += ClientHeight();
 if (row >= wlines)
 row = wlines-1;
 EditBox::PageDown();
 AdjustCursorTabs();
 return True;
 }
 return False;
}
// --- insert a tab into the edit buffer
void Editor::InsertTab()
{
 visible = False;
 if (insertmode) {
 EditBox::InsertCharacter('\t');
 while ((column % tabs) != 0)
 EditBox::InsertCharacter(Ptab);
 }
 else
 do
 Forward();
 while ((column % tabs) != 0);
 visible = True;
}
// --- When inserting char, adjust next following tab, same line
void Editor::AdjustTabInsert()
{
 visible = False;
 // ---- test if there is a tab beyond this character
 int savecol = column;
 while (CurrentChar() && CurrentChar() != '\n') {
 if (CurrentChar() == '\t') {
 column++;
 if (CurrentChar() == Ptab)
 EditBox::DeleteCharacter();
 else
 for (int i = 0; i < tabs-1; i++)
 EditBox::InsertCharacter(Ptab);
 break;
 }
 column++;
 }

 column = savecol;
 visible = True;
}
// --- test for wrappable word and wrap it
void Editor::WordWrap()
{
 // --- test for word wrap
 int len = LineLength(row);
 int wd = ClientWidth()-1;
 if (len >= wd) {
 const char *cp = TextLine(row);
 char ch = *(cp + wd);
 // --- test words beyond right margin
 if (len > wd || (ch && ch != ' ' && ch != '\n')) {
 // --- test typing in last word in window's line
 const char *cw = cp + wd;
 cp += column;
 while (cw > cp) {
 if (*cw == ' ')
 break;
 --cw;
 }
 int newcol = 0;
 if (cw <= cp) {
 // --- user was typing last word on line
 // --- find beginning of the word
 const char *cp1 = TextLine(row);
 const char *cw1 = cw;
 while (*cw1 != ' ' && cw1 > cp1)
 --cw1, newcol++;
 wleft = 0;
 }
 FormParagraph();
 if (cw <= cp) {
 // --- user was typing last word on line
 column = newcol;
 if (cw == cp)
 --column;
 row++;
 if (row - wtop >= ClientHeight())
 ScrollUp();
 ResetCursor();
 }
 }
 }
}
// --- insert a character at the current cursor position
void Editor::InsertCharacter(int key)
{
 if (insertmode) {
 if (key != '\n')
 AdjustTabInsert();
 }
 else if (CurrentChar() == '\t') {
 // --- overtyping a tab
 visible = False;
 column++;
 while (CurrentChar() == Ptab)
 EditBox::DeleteCharacter();

 --column;
 }
 visible = False;
 EditBox::InsertCharacter(key);
 visible = True;
 ResetCursor();
 if (wordwrapmode)
 WordWrap();
}
// --- When deleting char, adjust next following tab, same line
void Editor::AdjustTabDelete()
{
 visible = False;
 // ---- test if there is a tab beyond this character
 int savecol = column;
 while (CurrentChar() && CurrentChar() != '\n') {
 if (CurrentChar() == '\t') {
 column++;
 // --- count pseudo tabs
 int pct = 0;
 while (CurrentChar() == Ptab)
 pct++, column++;
 if (pct == tabs-1) {
 column -= tabs-1;
 for (int i = 0; i < tabs-1; i++)
 EditBox::DeleteCharacter();
 }
 else
 EditBox::InsertCharacter(Ptab);
 break;
 }
 column++;
 }
 column = savecol;
 visible = True;
}
// --- delete the character at the current cursor position
void Editor::DeleteCharacter()
{
 if (CurrentChar() == '\0')
 return;
 if (insertmode)
 AdjustTabDelete();
 if (CurrentChar() == '\t') {
 // --- deleting a tab
 EditBox::DeleteCharacter();
 while (CurrentChar() == Ptab)
 EditBox::DeleteCharacter();
 return;
 }
 const char *cp = TextLine(row);
 const char *cw = cp + column;
 const Bool delnewline = (Bool) (*cw == '\n');
 const Bool reform = (Bool) (delnewline && *(cw+1) != '\n');
 const Bool lastnewline =
 (Bool) (delnewline && *(cw+1) == '\0');
 int newcol = 0;
 if (reform && !lastnewline) {
 // --- user is deleting \n, find beginning of last word

 while (*--cw != ' ' && cw > cp)
 newcol++;
 }
 EditBox::DeleteCharacter();
 if (lastnewline)
 return;
 if (delnewline && !reform) {
 // --- user deleted a blank line
 visible = True;
 BuildTextPointers();
 Paint();
 return;
 }
 if (wordwrapmode && reform) {
 // --- user deleted \n
 wleft = 0;
 FormParagraph();
 if (CurrentChar() == '\n') {
 // ---- joined the last word with next line's
 // first word and then wrapped the result
 column = newcol;
 row++;
 }
 }
}
// --- form a paragraph from the current cursor position
// through one line before the next blank line or end of text
void Editor::FormParagraph()
{
 int BegCol, FirstLine;
 const char *blkBegLine, *blkEndLine, *blkBeg;

 // ---- forming paragraph from cursor position
 FirstLine = wtop + row;
 blkBegLine = blkEndLine = TextLine(row);
 if ((BegCol = column) >= ClientWidth())
 BegCol = 0;
 // ---- locate the end of the paragraph
 while (*blkEndLine) {
 Bool blank = True;
 const char *BlankLine = blkEndLine;
 // --- blank line marks end of paragraph
 while (*blkEndLine && *blkEndLine != '\n') {
 if (*blkEndLine != ' ')
 blank = False;
 blkEndLine++;
 }
 if (blank) {
 blkEndLine = BlankLine;
 break;
 }
 if (*blkEndLine)
 blkEndLine++;
 }
 if (blkEndLine == blkBegLine) {
 visible = True;
 Downward();
 return;
 }

 if (*blkEndLine == '\0')
 --blkEndLine;
 if (*blkEndLine == '\n')
 --blkEndLine;
 // --- change newlines, tabs, and tab expansions to spaces
 blkBeg = blkBegLine;
 while (blkBeg < blkEndLine) {
 if (*blkBeg == '\n' || ((*blkBeg) & 0x7f) == '\t') {
 int off = blkBeg - (const char *)*text;
 (*text)[off] = ' ';
 }
 blkBeg++;
 }
 // ---- insert newlines at new margin boundaries
 blkBeg = blkBegLine;
 while (blkBegLine < blkEndLine) {
 blkBegLine++;
 if ((int)(blkBegLine - blkBeg) == ClientWidth()-1) {
 while (*blkBegLine != ' ' && blkBegLine > blkBeg)
 --blkBegLine;
 if (*blkBegLine != ' ') {
 blkBegLine = strchr(blkBegLine, ' ');
 if (blkBegLine == NULL ||
 blkBegLine >= blkEndLine)
 break;
 }
 int off = blkBegLine - (const char *)*text;
 (*text)[off] = '\n';
 blkBeg = blkBegLine+1;
 }
 }
 BuildTextPointers();
 changed = True;
 // --- put cursor back at beginning
 column = BegCol;
 if (FirstLine < wtop)
 wtop = FirstLine;
 row = FirstLine - wtop;
 visible = True;
 Paint();
 ResetCursor();
}
// --------- add a line of text to the editor textbox
void Editor::AddText(const String& txt)
{
 // --- compute the buffer size based on tabs in the text
 const char *tp = txt;
 int x = 0;
 int sz = 0;
 while (*tp) {
 if (*tp == '\t') {
 // --- tab, adjust the buffer length
 int sps = Tabs() - (x % Tabs());
 sz += sps;
 x += sps;
 }
 else {
 // --- not a tab, count the character
 sz++;

 x++;
 }
 if (*tp == '\n')
 x = 0; // newline, reset x
 tp++;
 }
 // --- allocate a buffer
 char *ep = new char[sz+1];
 // --- detab the input file
 tp = txt;
 char *ttp = ep;
 x = 0;
 while (*tp) {
 // --- put the character ('\t', too) into the buffer
 *ttp++ = *tp;
 x++;
 // --- expand tab into '\t' and expansions ('\t' + 0x80)
 if (*tp == '\t')
 while ((x % Tabs()) != 0)
 *ttp++ = Ptab, x++;
 else if (*tp == '\n')
 x = 0;
 tp++;
 }
 *ttp = '\0';
 // ---- add the text to the editor window
 EditBox::AddText(String(ep));
}
// ------- retrieve editor text collapsing tabs
const String Editor::GetText()
{
 char *tx = new char[text->Strlen()+1];
 const char *tp = (const char *) *text;
 char *nt = tx;
 while (*tp) {
 if (*(const unsigned char *)tp != Ptab)
 *tx++ = *tp;
 tp++;
 }
 *tx = '\0';
 String temp(nt);
 delete [] nt;
 return temp;
}
// --- write a string to the editor window
void Editor::WriteString(String &ln,int x,int y,int fg,int bg)
{
 String nln(ln.Strlen());
 int ch;
 for (int i = 0; i < ln.Strlen(); i++) {
 ch = ln[i];
 nln[i] = (ch & 0x7f) == '\t' ? ' ' : ch;
 }
 EditBox::WriteString(nln, x, y, fg, bg);
}

// ---- left mouse button pressed
void Editor::LeftButton(int mx, int my)
{
 if (ClientRect().Inside(mx, my)) {

 column = mx-ClientLeft()+wleft;
 row = my-ClientTop()+wtop;
 ResetCursor();
 }
 TextBox::LeftButton(mx, my);
}
// --------- clear the text from the editor window
void Editor::ClearText()
{
 row = 0;
 EditBox::ClearText();
}
// ------ extend the marked block
void Editor::ExtendBlock(int x, int y)
{
 row = y;
 EditBox::ExtendBlock(x, y);
}
// ---- delete a marked block
void Editor::DeleteSelectedText()
{
 if (TextBlockMarked()) {
 row = BlkBegLine;
 column = BlkBegCol;
 if (row < wtop || row >= wtop + ClientHeight())
 wtop = row;
 if (column < wleft || column >= wleft + ClientWidth())
 wleft = column;
 EditBox::DeleteSelectedText();
 FormParagraph();
 }
}
// ---- insert a string into the text
void Editor::InsertText(const String& txt)
{
 EditBox::InsertText(txt);
 FormParagraph();
}
// ---- set tab width
void Editor::SetTabs(int t)
{
 if (t && tabs != t && t <= MaxTab) {
 tabs = t;
 if (text != 0) {
 String ln;
 // ------- retab the text
 for (int lno = 0; lno < wlines; lno++) {
 // --- retrieve a line at a time
 ln = ExtractTextLine(lno);
 int len = ln.Strlen();
 String newln(len * t);
 // --- copy string, collapsing old tabs
 // and expanding new tabs
 unsigned int ch;
 for (int x2 = 0, x1 = 0; x2 < len; x2++) {
 ch = ln[x2] & 0xff;
 if (ch == Ptab)
 // --- collapse old tab expansion
 continue;

 // --- copy text
 newln[x1++] = ch;
 if (ch == '\t')
 // --- expand new tabs
 while ((x1 % t) != 0)
 newln[x1++] = Ptab;
 }
 newln[x1] = '\0';
 newln += "\n";

 // --- compute left segment length
 unsigned seg1 = (unsigned)
 (TextLine(lno) - (const char *) *text);
 // --- compute right segment length
 unsigned seg2 = 0;
 if (lno < wlines+1)
 seg2 = (unsigned) text->Strlen() -
 (TextLine(lno+1) - (const char *) *text);
 // --- rebuild the text from the three parts
 String lft = text->left(seg1);
 String rht = text->right(seg2);
 *text = lft+newln+rht;
 BuildTextPointers();
 }
 Paint();
 }
 }
}
End Listings


February, 1994
ALGORITHM ALLEY


Algorithm Grab Bag




Tom Swan


What I like best about writing this column is hearing from many readers who
have suggestions, tips, and sample code to contribute. Over the past several
months, I've collected a grab bag of algorithms and comments passed along by
"Algorithm Alley" readers. So, this month, it's time to clean house (or disk,
that is) and share some mail I've received on past columns.


Aibohphobia


In November's column on palindrome text encryption, I questioned the value of
an encryption algorithm for which there's no recovery method. Jeff Pipkins
reminded me that such algorithms may be put to good use after all. He writes:
An irreversible encryption method can be used to encrypt passwords before
storing them in a file. It's mathematically impossible to figure out the
password, given only the ciphertext. You merely encrypt passwords entered by
users and compare the results against the encrypted words stored in the file.
Since the algorithm that checks a password for validity always has the
encrypted passwords, it's never necessary to decode the text.
There's a flaw in this logic, however. Since the encryption isn't reversible,
the encryption function is not one-to-one. That means there's necessarily more
than one password that will produce identical encryptions. A hacker doesn't
have to figure out a "real" password; it's only necessary to determine one of
the many "valid" entries for any given account.
My favorite palindromes, by the way, are "Flee to me, remote elf" and "A man,
a plan, a canal--Panama!" I'm an adjunct to a local college where I sometimes
teach a course in C. One of my assignments is to write a function that detects
palindromes. Before the assignment is complete, some of the students develop
aibohphobia, "an irrational fear of palindromes."


Loss Leader


Mort Bernstein, who wrote on that same topic, offered this explanation of why
a decryption function was unlikely to be found:
I was fascinated by your description of palindrome encryption, particularly
the fact that you doubted the existence of an equivalent decryption method.
So, I wrote a small C program (see Listing One, PALTEST.C, page 127) to see
what I could learn. The program's output explains why you can't decrypt the
text. In the test, the encryption function maps 676 possible inputs (Aa
through Zz) onto 51 outputs. The second table from the output is the frequency
of each of the 51 outputs. BB, for example, occurs 26 times.
To generalize on your observation that ABCD produces the encrypted result A5
A5 A5 A5, any alphabetically sequential input AB through ABC_XYZ always
produces an output string in which all of the characters are the same. Thus
ABCDEFG results in A8 A8 A8 A8 A8 A8 A8 and ABC_XYZ results in a string of 26
BBs. This means it is highly probable that two different inputs of the same
length can result in the exact same output. That's real information loss! It
would be interesting to find [the algorithm for] a set of inputs that result
in the same output. Perhaps some of your readers would like to give it a try.


Open-ended Searching


Dean Gienger wrote to ask whether I knew of a method for performing a binary
search on a sorted database of unknown length, containing variable-size
records. The question is interesting because, using the classic binary-search
algorithm in which data is repeatedly divided into chunks until the target key
is found, it is necessary to know how many records the database contains.
Before I could come up with a possible answer, however, Dean wrote back with
his own solution. If any readers know of other algorithms for open-ended
binary searching, please let me know. Dean writes:
I've found an interesting problem you may wish to present to your readers. The
problem is, given a sorted list L in which the number of records is unknown, I
need a function get_record(index,key) that returns true if an indexed record
exists, and also returns that record's key. Assume the list contains
variable-length records in a disk file, meaning you can't read to the end of
file to determine the

number of records it contains. FINDKEY.TXT (Listing Two, page 127) shows some
sample code I developed to investigate possible answers. I need to search an
ordered list stored on disk along with a sparse index (roughly a pointer to
every 2000 records), so the code doesn't have to search every record
sequentially.


Sample Bug


Alexander J. Oss took me to task (rightly) for a bug in Algorithm #12,
Selection Sampling, from the October 1993 column. He explains:
The problem [with the code] is that the distribution of odds is incorrect. In
this case, it's a relatively minor mistake (one of those "off-by-one" errors).
Try selecting three of four records. If your stopping condition is that the
number to select has been selected, you will always pick the first three, the
likelihood of which should be 0.25. If your stopping condition is end of file,
you will always pick all four (even though you said you only wanted three).
When implemented correctly, the algorithm should work properly with either
stopping condition--that is, after the correct number of records has been
selected, the likelihood of choosing more should be 0.
There are actually two errors in the original algorithm, and they are repaired
and reprinted in Example 1. The While loop must test for two nonterminating
conditions: that the number of selected records is less than the number
requested, and that the end of file has not been reached. A more serious
mistake occurs in the If statement. Because r is a random number between 0 and
1 (but never equal to 1), if no records have been selected, it is not possible
for the next record to be skipped. Selecting nine out of ten records,
therefore, always causes the first record to be selected, resulting in the
same nine selections on every run. In order to select a different set of nine
out of ten, the corrected program uses M+1 rather than M in the If statement
control expression.


Windows Bitmap Compression


Neil Galarneau questioned the necessity of my bitmap-compression utility in
the August 1993 column. He writes, "I assume that when Windows reads a
compressed bitmap into memory, it decompresses the image, thus compression
saves only disk space, not memory. Do you agree?"
Sorry, no. Strangely, Windows does not recognize its own bitmap-compression
algorithm. For example, the Paintbrush utility cannot read a compressed
bitmap, nor can it create one. Bitmap compression is the responsibility of the
output device driver to which the bitmap is passed. Bitmap-compression
routines are not built into Windows, and therefore, you need separate
compression and decompression utilities, or a compatible driver, for reading
and writing compressed .BMP files. Unfortunately, not all output device
drivers handle compressed bitmaps correctly.


More on Bitmaps



Waldeck Schutzer from Brazil sent the following comments and suggested repair
to the BPACK program in that same article on Windows bitmaps. He writes:

I'm a fourth-year mathematics student at the University of São Paulo, and my
interests include data compression, digital image processing, cryptography,
and correlates. Since 1988 I've been a Dr. Dobb's reader, and I enjoy its fine
articles.
The length of a line in a Windows bitmap file is not even, but is a multiple
of four. I tested the BPACK program with a scanned image of 382x400 and the
output was damaged by the loss of two bytes at the end of each line. After
making the following adjustment, the program worked correctly. In function
WriteBitmapBits, replace Example 2(a) with 2(b).
The expression (4-np&3) is equivalent to (4-np)%4, and produces the extra
length value needed to complete a double word. By the way, I'm seeking
documentation for the TIFF file format. Can you help?
Steve Rimmer's book, Bit-Mapped Graphics, Second Edition,
(Windcrest/McGraw-Hill, 1993) explains the TIFF (Tagged Image File Format)
layout and includes sample code in C. I highly recommend the book to anyone
who wants to know more about TIFF files and other graphical image formats.


Your Turn


That's it. My grab bag runneth empty. Send me your algorithms in care of DDJ,
or send CompuServe mail to 73627,3241. By the way, I occasionally have trouble
with my CompuServe connection (must have something to do with the fact that
I've moved aboard a sailboat, and data communications offshore are iffy at
best). If you sent a message and haven't received a reply, try again, or send
mail to me at DDJ. Until someone comes up with a reliable and affordable
method for mobile data communications, I'm temporarily offline. Am I the only
programmer who wants to get away from it all but still have e-mail?

Example 1: Algorithm #12, Selection Sampling (rev. 1.01).
const
 M=1000; { Input records }
 N=128; { Subset (N<=M) }
var
 requested,
 examined,
 selected: Integer;
 r: Real;
begin
 requested:=N;
 examined:=0;
 selected:=0;
 while (selected<requested) and
(not EOF) do
 begin
 examined:=examined+1;
 r:=Random;
 if (M+1-examined)*r
 >=(requested-selected)
 then skip next input record
 else begin
 selected:=selected+1;
 use next input record
 end
 end
end.
Example 2: Correcting Windows bitmap compression.
(a) if (Odd(np))
 slSize=np+1;
 else
 slSize=np;
(b) slSize=np+(4-np&3);

[LISTING ONE] (Text begins on page 103.)

/****************************************************************************
This program computes all possible inputs to a Palindromic encoder and the
frequency of all of the possible encrypted results. (C) 1993 Mort Bernstein
****************************************************************************/
#include <stdio.h>

int i, j, k = 0;
unsigned char U[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
unsigned char L[] = "abcdefghijklmnopqrstuvwxyz";

unsigned char F[256], t;

void main(void)
{
 for(i = 0; i < 256; i++) F[i] = 0;
 printf(" ");
 for(j = 0; j < 26; j++) printf(" %c", L[j]);
 printf("\n");
 for(i = 0; i < 26; i++)
 {
 printf("%c", U[i]);
 for(j = 0; j < 26; j++)
 {
 t = (U[i] + L[j]);
 F[(int)t] += 1;
 printf(" %02X", (unsigned char)t);
 }
 printf("\n");
 }
 printf("\n");
 k = 0;
 for(i = 0; i < 256; i++)
 {
 if(F[i] == 0) continue;
 printf("[%02X]%3u", i, F[i]);
 if(++k > 9)
 {
 printf("\n");
 k = 0;
 }
 else printf(" ");
 }
}

[LISTING TWO]

!==============================================================================
! Find key record in sorted list of unknown length.
! Dean Gienger, 1993
!==============================================================================

integer FIND_KEY(the_key)
!==============================================================================
! Perform a modified binary search (since we don't know where end of list is!)
! Using step values of 1,2,4,8,16,... until we pass the value or pass the end
! of the list. When we pass value or get to end of list, start cutting the
! step in half. Return 0 if key not found, else place in list where key is
! found.
!==============================================================================

integer lo,hi,step,maximum_hi

lo = 1
hi = 1
step = 1
maximum_hi = MAX_INT

! Get first record and bail out if there isn't one
record = GET_RECORD(lo)
IF record.number = 0 THEN RETURN 0


! If desired key is before first message, bail out
IF the_key < record.key THEN RETURN 0

LOOP
 ! fetch record [hi]
 record = GET_RECORD(hi)
 ! double the step
 step = step+step
 ! if there is no such message or we passed the desired key, step back
 IF (record.number = 0) OR (the_key < record.key) THEN
 step = step/4
 IF step = 0 THEN RETURN 0 ! either we found it or it's beyond eof
 maximum_hi = hi
 hi = lo
 ENDIF
 ! choose next probe point (hi)
 lo = hi
 hi = hi + step
 ! don't go beyond the highest reasonable boundary known to date
 IF hi > maximum_hi THEN
 hi = maximum_hi
 ! cut step back to reasonable value
 LOOP
 WHILE (lo+step) > hi
 step = step/2
 ENDLOOP
 ENDIF
 ENDLOOP
RETURN lo
END PROCEDURE
End Listings



February, 1994
UNDOCUMENTED CORNER


The Windows 3.1 Virtual Machine Control Block Part 2




Kelly Zytaruk


Kelly graduated with a Bachelor of Science in Electrical Engineering and
Computers from the University of Waterloo in Ontario, Canada. He has spent the
last ten years programming in C and assembler on Intel-based machines. Most
recently he has worked on peripheral hardware design and virtual device
drivers.




Introduction




by Andrew Schulman


For the past few months, this column has focused on the Virtual Machine
Manager (VMM) and Virtual Device Drivers (VxDs) in Windows 3.1 Enhanced mode.
Windows and DOS programmers pay too little attention to this low-level aspect
of Windows. As I've been noting for the past few months, VMM and VxDs will be
the basis for most of Microsoft's key additions to Windows 4 and DOS 7, also
known as "Chicago." Microsoft is extending the WIN386.EXE file from Windows
3.1 Enhanced mode to create the DOS386.EXE file, which is the basis for
Chicago. Indeed, VxDs and VMM (which resides in WIN386.EXE and DOS386.EXE) are
the Chicago operating system.
Since the December 1993 column, a number of readers have asked me why I said
that "What TSRs were in the mid-eighties, VxDs will be in the mid-nineties."
Let me elaborate. First, Microsoft wants all real-mode DOS device drivers and
TSRs to be replaced with VxDs. I recently received a form letter from
Microsoft's Hardware Vendor Relations group which underlines this: "In future
versions of Microsoft Windows operating systems, the preferred device driver
will be a protect mode VxD (Virtual Device Driver) because it will be able to
offer multi-threading, asynchronous I/O, dynamic loading, and other benefits."
So VxDs are much more "Chicago friendly" than real-mode drivers or TSRs.
Second, since at its lowest levels the Chicago operating system is really VMM
plus a collection of VxDs, clearly this is the place for adventurous
programmers to do the sort of "impossible" things that TSRs were once known
for.
Before I let Kelly Zytaruk get on with Part 2 of his article on the Windows
Virtual Machine Control Block (VMCB), I need to make a number of points about
Chicago and clarify some issues from the previous two columns.
To reiterate a point that Kelly made last month, the VMCB structure described
here is slightly different from that found in the debug version of Windows
3.1, and totally different from the VMCB in Chicago. For example, in Chicago
VMCB+10h is a dword signature 62634D56h (VMcb'). The new thread-control block
has a dword signature 42434854h (THCB') at offset 0. Why didn't Microsoft put
the VMcb' signature at offset 0 in the VMCB too? Because there are documented
fields there! Offset 10h is the first undocumented location in the VMCB, and
hence the first place it is safe to change and put the signature. This shows
that, if you rely on obscure undocumented fields in one version of an
operating system, all bets are off in the next version!
Last month, I said that in Chicago "offset 0 in the VMCB now appears to hold
the initial thread handle." How incredibly stupid! Offset 0 in the VMCB is a
documented field (see VMM.INC) and is not changing. I have no idea where I got
this bizarre notion about VMCB+0; of course, it's still CB_VM_Status.
Because VMM in Chicago provides a THCB in addition to the old VMCB, I have
improved my PROTDUMP utility from last month so that it now shows threads as
well as VMs; this update is available electronically (see "Availability," page
3). PROTDUMP uses five new VMM functions to display all the threads for each
VM: Get_Cur_Thread_Handle, 10108h; Get_Sys_Thread_Handle, 1010Ah;
Get_Initial_Thread_Handle, 1010Dh; Get_VM_Handle_For_Thread, 10111h;
and Get_Next_Thread_Handle, 10113h.
Another change to PROTDUMP involves its ability to look at protected-mode
selector:offset addresses in other VMs. As discussed last month, this requires
the LDT selector stored at VMCB+114h. However, in the current prerelease of
Chicago, the LDT is located at VMCB+5Ch. Again, you see how unreliable
undocumented interfaces can sometimes be! I changed PROTDUMP so that, under
Windows 4 and higher, it tries to get the LDT from VMCB+5Ch. But this could
easily change again, so I also added an LDT command-line switch so you can
override this with a different offset.
So much for PROTDUMP. Next, I have some changes to the VXDLIST program from
the December 1993 issue. First, there was a bug: The program would fail to
call FreeSelector in the (admittedly unlikely) case of finding the string
"VMM" that doesn't actually belong to the VMM Device Descriptor Block (DDB).
Second, it turns out that in Chicago there is a VMM function that returns
exactly the information that my Get_First_VxD function in the December 1993
issue goes to such trouble to find. This new function, VMM_GetDDBList
(1013Fh), simply returns the 32-bit address of the VMM DDB in EAX. To call
this function from a Windows or DOS program (rather than from a VxD) would
currently require the VxDCall function from my generic VxD; see Example 1.
Third, an anonymous reader sent me a list of new VMM and VxD function
numbers for Chicago, and I have added these to the VXDLIST database; a new
version of VXDLIST is available electronically. This, in turn, has enabled me
to examine DOS386.EXE and describe the new functions mentioned earlier.
Fourth, it turns out that there is an entirely new type of interface in
Chicago, called "Win32 services." The new 32-bit kernel (KERNEL32.DLL) and
other Win32 DLLs use these services to communicate directly with VxDs via a
set of VxDCall functions. For example, the Win32 Console API appears to be
implemented by calling down to VCOND (the virtual CON device). VXDLIST will
now show which VxDs provide Win32 services, and how many; see Table 1.
I don't currently have names for any of the services, though their purpose can
often be figured out easily enough. For example, the Win32 file-system
functions in KERNEL32.DLL put values that look like DOS INT 21h function
numbers in AH and issue a VxDCall 2A0018h. VxD 2Ah is VWIN32, and Win32
service 18h clearly provides an operating-systems interface that looks a lot
like a 32-bit MS-DOS. I would imagine that VWIN32 figures out whether the call
really needs to be passed down to DOS or whether it (hopefully) can be handled
entirely in "VxD land." If no real-mode DOS device drivers or TSRs have been
loaded in CONFIG.SYS or AUTOEXEC.BAT, Windows 4 should be able to remove
itself entirely from real-mode DOS, relying on the VMM/VxD operating system.
For example, Chicago (and even Windows for Workgroups 3.11) has a VFAT virtual
device that provides a 32-bit file system independent of real-mode DOS.
With all this talk of Chicago, I ought to mention that I have not yet signed
Microsoft's nondisclosure agreement for its Win32 conference. This is the only
reason I can talk about any of this. (It's worth noting, however, that the
January 1994 Microsoft Systems Journal has a major article entitled, "Chicago:
A First Look at Its Core Architecture," so in a sense, this subject is now
wide open.) I doubt I will sign it, either, since it appears to last for three
years! Microsoft recently sent the following note to publishers:
IMPORTANT NOTE: All conference attendees must sign a non-disclosure agreement
which explicitly prohibits reverse engineering, disassembly, decompilation,
etc., of Chicago: "...you will not reverse engineer (including to discover
the internal architecture or functionality of the software), decompile, or
disassemble the software."
This very explicit ban on reverse engineering the prerelease software is
especially notable given Microsoft's suit against Stac Electronics, in which
Redmond is charging that Stac reverse-engineered MS-DOS to figure out how to
make Stacker 3.1 fully compatible with DOS 6.
Turning to Part 2 of Kelly's article, I should first point out that, in
addition to finishing up his coverage of the VMCB, Kelly has also uncovered an
undocumented VM initialization structure. The WINEXP application Kelly
presents this month uses this structure to display ASCII names for each
virtual machine.
WINEXP can access this VM initialization structure because the application
uses something called the "generic VxD." The generic VxD (VXD.386) enables
"normal" Windows and DOS applications to call VMM and VxD services otherwise
callable only from a VxD. In other words, the generic VxD is a surrogate,
which calls functions on behalf of less-privileged applications not allowed to
call those functions on their own. The generic VxD first appeared in Microsoft
Systems Journal (February 1993); the article, along with an old version of the
code, also appears on the Microsoft Developer Network (MSDN) CD-ROM.
The generic VxD is identical in concept to DEVHLP.SYS, a generic device driver
that I wrote several years ago for OS/2, when I still believed in developing
for that system ("Opening OS/2's Backdoor," DDJ, October 1990). Whereas OS/2
programs use generic IOCTL to call down to DEVHLP.SYS, Windows programs use
INT 2Fh function 1684h to access the generic VxD.
Kelly has modified the generic VxD in several ways; the new version is
available electronically, along with his WINEXP program. First, Kelly found an
embarrassing bug: The generic VxD uses self-modifying code, yet it is missing
the obligatory JMP SHORT $+2 to clear the instruction prefetch queue; this
results in erratic behavior, as the processor sometimes ends up executing the
old, nonmodified instructions left in the queue.
Next, Kelly added some additional services to the generic VxD. Most important
is a new service to get the VM_InfoBlock, an undocumented structure that Kelly
has uncovered. For example, this allows PROTDUMP -VM to now display ASCII
names for each DOS box. Kelly's code for getting the name from the
VM_InfoBlock also works under Chicago.
The generic VxD itself works under Chicago. But the generic VxD may soon
become a lot less important. I have just received an incredible article and
sample program by Alex Shmidt showing how normal 16-bit Windows programs can
use callgates to call 32-bit Ring 0 code, including VMM and VxD functions.
With Alex's RINGO.DLL, Windows programs would no longer need the generic VxD
in order to call a VMM function such as Get_Cur_VM_Handle (or
Get_Cur_Thread_Handle for that matter). The 32-bit Ring 0 code in RINGO.DLL
even manages to link itself into the VxD chain (yes, it shows up in the output
from VXDLIST). Alex says he named the program RINGO, in honor of both the
Beatle and Matt Pietrek's RING0 program from Microsoft Systems Journal (May
1993). Matt's article showed how to use callgates to call 16-bit privileged
Ring 0 code; Alex cleverly extends this to calling 32-bit Ring 0 code.
On the other hand, requiring a VxD may not be such an inconvenience under
Chicago because VXDLDR provides dynamic VxD loading/unloading capabilities;
perhaps VxDs won't have to be installed via an error-prone device= statement
in SYSTEM.INI. In addition, it looks as if Chicago will support writing VxDs
in C, and perhaps also allow calling VxDs from Win32 applications.
Another interesting upcoming article is Klaus Mueller's piece on instance
data. Although instance data will undoubtedly change dramatically in Chicago,
many of you have asked for an explanation of it. In the meantime, please
continue to send your articles, article ideas, and comments to me on
CompuServe at 76320,302.
Last month, I got most of the way through a blow-by-blow description of each
field that makes up the Virtual Machine Control Block (VMCB) in Windows 3.1
Enhanced mode. This month, I'll finish describing the fields, and present a VM
Explorer application for Windows; this application uses an undocumented
structure I call the VM_InfoBlock.
Please remember that these field offsets are only valid for the retail version
of Windows 3.1 Enhanced mode. In the debug version of VMM, the fields are
generally moved out by eight bytes. For example, while the LDT selector is at
VMCB+114h in the retail version, it is at VMCB+11Ch in the debug.
Incidentally, detecting the debug version of VMM is not as easy as you might
think! Windows programmers will reflexively answer that you should call
GetSystemMetrics(SM_DEBUG), but that only tells you whether the debug Windows
kernel is running; the VMM in WIN386.EXE is totally different from the kernel
in (for example) KRNL386.EXE. In fact, you must call a VMM function such as
Test_Debug_Installed or Get_VMM_Version.


More Undocumented Fields


Here are the remainder of the undocumented VMCB fields:
0x9C, 0x9E. CB_ForeGround_TS_Priority and CB_BackGround_TS_Priority.
Foreground and background time-slice priorities. These values are priorities
only in the sense that they determine the fraction of the total CPU time the
VM will receive. When a VM executes in the foreground, its priority is
CB_ForeGround_TS_Priority; in the background, it is CB_BackGround_TS_Priority.

The total system time-slice priority is the sum of all current priority
values. At any point in time the total can change as different VMs become
foreground or background processes.
The fraction of time that a VM gets to execute can be calculated by taking
the VM's current time-slice priority (either foreground or background) and
dividing it by the total system time-slice priority. The relationship between
the VM's time-slice priority and the total thus determines the priority
weighting for execution.
The time-slice priority is not the same as the execution priority. The VM with
the highest execution priority is the VM that will run at any given point in
time. The time-slice priority is a value that can be used as a decision in
determining when to change the execution priority.
0xA0. CB_Weighted_Priority. Weighted priority = (total of all VM
priorities * 16) / VM priority. This priority is inverted: The lower the number,
the higher the priority. It is also weighted: It is used as a multiplier to
determine the fraction of the total time that this VM is permitted to run.
0xA4. CB_Weighted_Time. Weighted execution time, calculated by taking the time
that a VM has been running and multiplying it by CB_Weighted_Priority times a
last-action fudge factor. If the VM has released the time slice or it is
blocked on a semaphore, the last-action multiplier is a smaller value, giving
less weight to the amount of time used. It is a cumulative value based on the
previous CB_Weighted_Time+(the difference in time since the last time it was
calculated*CB_Weighted_Priority). Each time it is calculated, the current
CB_VM_ExecTime is saved in the CB_Last_Weighted_Time field.
This value determines how much CPU time (relative to other VMs) the VM has
used, based on the time-slice priorities of the VMs. VMM uses this value to
determine whether or not it is time to give another VM its chance to run. The
higher the time-slice priority of the VM, the slower this value will increase.
0xA8. CB_Next_Runnable_VM. Handle of next runnable VM. All VMs capable of
running are linked together on a list. The first VM in the list is that with
the Execution Focus, also called the "foreground VM." If this VM is not
running in Exclusive mode, a VM will be linked on this list if it is
background executable or if it has a high-priority background. If the Focus VM
is running in Exclusive mode, the only other VMs on this list will be those
with high background priority.
0xAC. CB_Last_Weighted_VMTime. VM time that the CB_Weighted_Time was last
updated. This is subtracted from the CB_Exec_Time and multiplied by the
CB_Weighted_Priority to determine the weight of the amount of time the VM has
just been executing. The weighted time is added to CB_Weighted_Time to get a
total weighted time. This value is compared against that for other VMs to
determine if a task switch should occur.
0xB0, 0xB4. CB_DetailedErrorCode and CB_DetailedErrorRefData. Detailed error
code and reference data associated with it. These fields are associated with
the GetSetDetailedVMError function, documented in the DDK.
The first set of errors (high word=0001) is used when a VM is crashed
(VNE_Crashed or VNE_Nuked bit set on VM_Not_Executable). The device which sets
the error initially always sets the error with the high bit clear. The system
will then optionally set the high bit depending on the result of the attempt
to "nicely" crash the VM. This bit allows the system to tell the user whether
or not the crash is likely to destabilize the system. The second set of errors
(high word=0002) is used when a VM startup fails (VNE_CreateFail,
VNE_CrInitFail, or VNE_InitFail bit set on VM_Not_Executable).
0xB8. CB_V86_PgTbl_PhysAddr. Physical address of CB_V86_PageTable. This is the
physical address that matches the linear address at VMCB+18h (see DDJ, January
1994). It is maintained as a physical address so that the Page Directory can
be updated.
0xBC. CB_Instance_Table. Linear address of instance-data buffer. Instance Data
is common data local to each VM. A perfect example is the DOS
current-directory structure (CDS). Each VM must have its own CDS within DOS,
but DOS knows nothing about multitasking and VMs. Each VM has an instance-data
buffer. The buffer contains the instance data itself; its target location and
size are kept elsewhere, in a private VMM data structure not accessible from
the VMCB.
0xC0. CB_hMem_VMDataArea. Page handle to VM data area. The VM data area is
PageAllocated and consists of the 4K page for the V86 Page table, followed by
the VM CB itself, followed by the VxD CB areas described last month. This
value is the handle returned by the VMM _PageAllocate function.
0xC4. CB_Int_Table_hMem. Page handle to instance data. This is a page handle
as returned by _PageAllocate. VMCB+BCh contains the linear address of the same
data. VMM saves the page handle so that instance data pages can be freed when
the VM exits.
0xC8. CB_DeviceV86Pages. Device V86 page bitmap. An array of 110h bits
representing the first 110h pages (one Mbyte+64K) of memory in a VM. If a bit
is set, the corresponding page has been assigned. The VMM
Get_Device_V86_Pages_Array function returns the nine dwords stored here.
0xEC. CB_V86PageableArray. V86 pageable array. An array of 100h bits
representing the first 100h pages (one Mbyte) of memory in a VM. If a bit is
set, the normal lock/unlock behavior for the corresponding page has been
disabled. _GetV86PageableArray returns the eight dwords stored here.
0x10C. CB_MMGR_Flag. High memory-area flag. If this flag is set, the A20
high-memory area (HMA) has been enabled. This determines the validity of page
accesses and requests above one megabyte.
0x110. CB_MMGR_Pages. Page handle to MMGR-allocated pages. When pages are
allocated for a VM beyond the one page table (4 megabytes) normally used in
the DOS box, they are linked through this field. Each page handle has a pointer to the
next page handle in the list. So it's a simple matter to determine how many
and what pages are owned by a VM. Pages can be allocated either by a VxD for
the VM or by a protected-mode program running in the VM.
0x114, 0x118. CB_LDT and CB_hMem_LDT. Local descriptor table (LDT) and page
handle to LDT memory. When a protected-mode program is running in a VM, it
requires an LDT to provide selectors. The LDT is allocated from the global
descriptor table (GDT). CB_LDT is simply the selector (in the GDT) that points
to a block of memory that contains the local descriptors. Programs can use
this to access protected-mode selector:offset pointers in other VMs. This
field is 0 if the VMStat_PM_Exec bit is not set in the CB_VM_Status.
The LDT is PageAllocated, and CB_hMem_LDT is the handle to the page for the
LDT; the handle allows the page to be freed when the protected-mode program
exits.
0x11C, 0x120. CB_VM_Event_Count and CB_VM_Event_List. Count of VM events on
the event list for this VM and linked list of VM events to be serviced by this
VM. Before the VMM returns to this VM, VMM will check the VM_Event_Count. If
the event count is nonzero, it will call the events linked on this list. The
nodes on the list include fields for a call address and reference data to be
passed to the called function. Events can be added to this list with
Schedule_VM_Event.
0x124. CB_Priority_VM_Event_List. Linked list of priority VM events for this
VM. Events can be added with Call_Priority_VM_Event. This service differs from
Schedule_VM_Event in that it combines Call_When_VM_Ints_Enabled,
Call_When_Not_Critical, and Adjust_Exec_Priority. If all of the required
conditions are met, the event is called immediately; otherwise the event is
scheduled.
0x128, 0x12C. CB_CallWhenVMIntsEnabled_Count and
CB_CallWhenVMIntsEnabled_List. Count of calls and linked list of calls to make
when VM interrupts are enabled. When interrupts are enabled within a VM, any
events linked on this list will be called. Events are placed on this list if
interrupts are disabled at the time of a call to Call_When_VM_Ints_Enabled.
0x130. CB_Next_Timeout_Handle. Handle to a time-out event. Set_VM_Time_Out
schedules a time-out to occur after a given period of time has elapsed. The
handle to the time-out is linked on this list. The first time-out on the list
will be that with the shortest expiration time.
0x134. CB_Prev_Timeout_Handle. Handle to a time-out event. This is possibly a
pointer to the previous time-out handle on the list, that is, a pointer to the
last time-out.
0x138. CB_First_Timeout. Time in milliseconds until the first time-out will
expire.
0x13C. CB_Expiration_Time. Expiration time overrun in milliseconds. If the
time-out cannot be serviced immediately when it occurs, this field will
maintain a count in milliseconds of the time elapsed since the time-out event
actually did occur. Thus, when the time-out is finally serviced, it can tell
how long ago the actual time-out happened.
0x140, 0x146, 0x148. CB_IDT_Base_hMem, CB_IDT_Limit and CB_IDT_Base. Page
handle to CB_IDT_Base, limit in bytes of IDT for this VM, and linear address
of IDT for this VM. Whether executing a protected-mode or V86-mode program in
a VM, the underlying VMM system is still running in protected mode. All
interrupts are processed through an interrupt-descriptor table (IDT). If the
application is a V86-mode program, the VMM "catches" the interrupt and
reflects it into real mode, if it chooses to, through the use of a real-mode
interrupt table at address 0000:0000 in the real-mode VM address space. This
field will then point to the system IDT.
If the application is a protected-mode program, the IDT will be a copy of the
system IDT modified specifically for this VM, with this field pointing to the
copy. The page handle allows the IDT to be locked, unlocked, and freed when
the protected-mode application finishes executing.
0x14C. CB_Exception_Handlers. An array of 32 DPMI exception handlers; each
entry is six bytes, identical to the CX:EDX return value from DPMI INT 31h
function 0202h. Installing an exception handler with either DPMI function
0203h or the ToolHelp InterruptRegister function changes this array.
0x20C. CB_V86_CallBack_List. Handle to V86 callback list. When a
protected-mode program running in a VM allocates a V86 callback (DPMI INT 31h
function 0303h), a callback structure is defined and linked to this handle.
V86 callbacks are a limited resource; this field allows them to be freed when
a VM exits.


The VM Explorer


As Figure 1 shows, the VM Explorer (WINEXP) neatly displays each field in the
VMCB. Even though WINEXP is a Windows application, and therefore running
inside the System VM (VM #1), it can display the contents of any VM.
More than this, WINEXP provides a constantly updated real-time display. It is
educational to run a windowed DOS box over WINEXP while WINEXP is displaying
the DOS box's VMCB. For example, if you run DOS programs that use DPMI, such
as VXDLIST or PROTDUMP, WINEXP will show fields change in the VMCB as the DOS
program enters and exits protected mode.
Normally, a VM Control Block is accessed from a VxD, which works with 32-bit
"flat" linear addresses ranging from 0 to 4GB. WINEXP is a normal 16-bit
Windows application using selector:offset pairs. To gain access to the VMCBs
and their associated fields, WINEXP uses Andrew Schulman's generic VxD. I've
made several enhancements to the original.
First, in Figure 1 WINEXP knows the ASCII name of each DOS box, such as
"COMMAND" or "MS-DOS Prompt". This name does not come from the VMCB. While the
name "System VM" is supplied by WINEXP itself, the DOS-box names are not.
Where do these come from?
When a new VM is created, VMM passes a Create_VM control message to each VxD
in the system. The DDK says that this message passes the VM handle of the new
VM to each VxD, permitting the VxD to perform any VM-specific initialization.
What the DDK doesn't say is that, passed along with the message, there is also
an undocumented pointer: ESI points to a VM_InfoBlock structure created from
the PIF settings (see DDJ, July 1993) for the VM. The VM_InfoBlock is shown in
Figure 2.
The field we are interested in here is AppName, which comes directly from the
"Window Title" PIF setting (if the window title later changes, AppName is not
updated). WINEXP uses this field to provide a name for each DOS box.
I've modified the generic VxD to handle the Create_VM message and save the
entire VM_InfoBlock in the VMCB, reserving space for it with the
Allocate_Device_CB_Area service. I have added a new VxD_Get_InfoBlock service
to the generic VxD to return this information. As Figure 2 shows, besides
AppName, the VM_InfoBlock includes other interesting information such as the
window handle (hWnd) for each DOS box. Note that this information is
read-only; you cannot modify the DOS box's state by changing this structure.
WINEXP keeps its display constantly updated by creating a Windows timer that
goes off four times a second. The WM_TIMER handler then calls a function that
updates the window. Unfortunately, the WINEXP code is far too long to print
here. However, the many C files that make up WINEXP are available
electronically. Much of the code is not directly related to VMs; instead it
is devoted to displaying strings at a given line number in a window,
highlighting the line, scrolling, and so on.
As a smaller example, Listing One (page 129) shows an extremely simple Windows
program (it uses Borland's EasyWin or any similar stdio facility for Windows)
that shows how to access the VMCB and VM_InfoBlock.
Besides displaying a list of VMs and the contents of an individual VMCB,
WINEXP also has an option to display time-slice information. This includes the
scheduler list and various scheduler values for each VM. One thing to keep in
mind while viewing the time-slice info: To display the data at all, the
Windows program (more accurately, the System VM) must be the active VM. So
even if the focus is on another VM, WINEXP will always show the System VM as
the current VM. This is not a bug. Think about it!
Example 1: Calling the VMM_GetDDBList function.
#define VMM_GetDDBList 0x1013FL
DWORD Get_First_VxD(void) {
 VxDParams p;
 // should check if Windows 4+
 p.CallNum = VMM_GetDDBList;
 VxDCall(&p);
 return p.OutEAX;
 }

Table 1: Number of Win32 services provided by each VxD.

VxD       Win32 services
VMM       29
VWIN32    39
VCOMM     21
VCOND     41
Figure 1: WINEXP, the Windows VM explorer.
Figure 2: The VM_InfoBlock. A pointer to this structure appears in ESI during
a Create_VM message.
#pragma pack(1)

typedef struct { DWORD lin; WORD sel; } SELLIN; // use lin; ignore sel

typedef struct {
 DWORD SGVMI_Flags; /* initial SHELL_GetVMInfo flags */
 DWORD PIF_Bits;
 SELLIN Comspec; /* COMSPEC string */
 SELLIN CmdLine; /* command line */
 SELLIN CurDrive; /* current drive string */
 WORD MaxAllocPages, MinAllocPages;
 WORD ForeGroundPrio, BackGroundPrio;
 WORD MaxEMS_Kbytes, MinEMS_Kbytes;
 WORD MaxXMS_Kbytes, MinXMS_Kbytes;
 WORD hWnd; /* Window handle for WinOldAp */
 WORD Offset2C; /* ??? */
 char AppName[32]; /* Name from PIF "Window Title" */
} VM_InfoBlock;


[LISTING ONE] (Text begins on page 107.)

// VM.C -- Display information about Windows Virtual Machines
// bcc -W -2 vm.c (Borland EasyWin) or cl -Mq -G2 vm.c (Microsoft QuickWin)
// Andrew Schulman, December 1993

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <dos.h>
#include "windows.h"

#define VXD_VxDCall 1
#define VxD_Get_InfoBlock 7
#define Generic_Dev_ID 0x28c0

#define Get_Next_VM_Handle 0x01003BL
#define Get_Sys_VM_Handle 0x010003L

typedef void far *FP;

// from ddk vmm.inc
typedef struct {
 DWORD CB_VM_Status, CB_High_Linear, CB_Client_Pointer, CB_VMID;
 // etc.: we don't need any undocumented VM_CB fields for this
 } VM_CB; // Virtual Machine Control Block

#pragma pack(1)

///// VM_InfoBlock: see Figure 2 /////

typedef struct {
 DWORD CallNum, Reserved1;

 DWORD InEAX, InEBX, InECX, InEDX, InEBP, InESI, InEDI;
 DWORD Reserved2, Reserved3;
 DWORD OutEAX, OutEBX, OutECX, OutEDX, OutEBP, OutESI, OutEDI;
 WORD OutFS, OutGS, OutEFLAGS;
 } VxDParams;

static FP API = (FP) 0;
void show_vm(DWORD vm, VM_CB far *vm_cb);
VM_InfoBlock far *get_vm_info(DWORD vm);
WORD GetVMInfoBlockOffset(void);
void InitVxDAPI(void);
FP GetVxDAPI(WORD vxd_id);
FP map_linear(DWORD lin_addr, DWORD num_bytes);
void free_mapped_linear(FP fp);
DWORD GetSysVMHandle(void), GetNextVMHandle(DWORD vm);
BOOL VxDCall(VxDParams far *fp);

void fail(char *s) { puts(s); exit(1); }

main()
{
 VM_CB far *vm_cb;
 DWORD sys_vm = GetSysVMHandle();
 DWORD vm = sys_vm;

 while (vm_cb = (VM_CB far *) map_linear(vm, sizeof(VM_CB)+0x2000))
 {
 show_vm(vm, vm_cb);
 free_mapped_linear(vm_cb);
 if ((vm = GetNextVMHandle(vm)) == sys_vm)
 break; // GetNextVMHandle makes the VM list look circular
 }
 return 0;
}
void show_vm(DWORD vm, VM_CB far *vm_cb)
{
 VM_InfoBlock far *info;
 printf("#%lu\tVMCB=%08lX high_lin=%08lX",
 vm_cb->CB_VMID, vm, vm_cb->CB_High_Linear);
 // could show undoc information here too
 if (info = get_vm_info(vm))
 {
 char buf[33];
 _fstrncpy(buf, info->AppName, 32);
 buf[32] = '\0';
 printf(" %04Xh \"%s\"", info->hWnd, buf);
 free_mapped_linear(info);
 }
 printf("\n");
}
VM_InfoBlock far *get_vm_info(DWORD vm)
{
 DWORD ofs = (DWORD) GetVMInfoBlockOffset();
 if (! ofs)
 return (VM_InfoBlock far *) 0; // call not supported
 ofs += vm; // add in handle to VMCB
 return (VM_InfoBlock far *) map_linear(ofs, sizeof(VM_InfoBlock));
 // caller must call free_mapped_linear
}

WORD GetVMInfoBlockOffset(void)
{
 WORD ofs = 0;
 InitVxDAPI();
 _asm mov ax, VxD_Get_InfoBlock
 _asm call dword ptr [API]
 _asm jc done
 _asm mov ofs, bx
done:
 return ofs;
}
void InitVxDAPI(void)
{
 if (! API) // one-time initialization
 if (! (API = GetVxDAPI(Generic_Dev_ID)))
 fail("This program requires device=VXD.386");
}
FP GetVxDAPI(WORD vxd_id)
{
 _asm push di
 _asm mov ax, 1684h
 _asm mov bx, vxd_id
 _asm xor di, di
 _asm mov es, di
 _asm int 2fh
 _asm mov ax, di
 _asm mov dx, es
 _asm pop di
 // returns in DX:AX
}
FP map_linear(DWORD lin_addr, DWORD num_bytes)
{
 WORD sel;
 _asm mov sel, ds
 if ((sel = AllocSelector(sel)) == 0)
 return (FP) 0;
 SetSelectorBase(sel, lin_addr);
 SetSelectorLimit(sel, num_bytes - 1);
 return MAKELP(sel, 0);
}

void free_mapped_linear(FP fp) { FreeSelector(FP_SEG(fp)); }
DWORD GetSysVMHandle(void)
{
 VxDParams p;
 p.CallNum = Get_Sys_VM_Handle;
 return (VxDCall(&p)) ? p.OutEBX : 0;
}
DWORD GetNextVMHandle(DWORD vm)
{
 VxDParams p;
 p.CallNum = Get_Next_VM_Handle;
 p.InEBX = vm;
 return (VxDCall(&p)) ? p.OutEBX : 0;
}
BOOL VxDCall(VxDParams far *fp)
{
 InitVxDAPI();
 _asm les bx, dword ptr fp

 _asm mov ax, VXD_VxDCall
 _asm call dword ptr [API]
 _asm jc error
 return TRUE;
error:
 return FALSE;
}

End Listing





February, 1994
PROGRAMMER'S BOOKSHELF


Teachers, Schools, and Computers




Peter D. Varhol


Peter is an assistant professor of computer science and mathematics at Rivier
College in Nashua, New Hampshire.


The day I finished Seymour Papert's The Children's Machine, I received the
property-tax bill for my house in New Hampshire. A quick calculation of the
rates revealed that over $1700 was allocated for the local school district.
Certainly, I would gladly pay this if I thought that it was going to improve
the ability of our children to comprehend and improve our society. But, given
current reports of our system of public education, there seems to be little
chance of that.
There's no question that public education as an institution in America is
broken. Nationwide, the dropout rate is 30 percent, although it's as high as
50 percent in some cities. Often, those who complete school are ill-prepared
to participate in society, as evidenced by a recent Department of Education
study which suggests that almost 50 percent of the adult population are not
literate enough to fully function in the modern world. In a world where
information is the new universal currency and knowledge is a prerequisite for
achievement and success, public education has become little more than a large
and inefficient day-care center. Seymour Papert proposes the computer as the
logical solution to these educational problems.
Papert, developer of the Logo programming language, former codirector of the
MIT AI Lab, and founder of the MIT Media Lab, is well known enough to command
attention for his ideas. Still, he goes one step further, invoking the
thoughts of such diverse thinkers as Marvin Minsky, John Kemeny, Noam Chomsky,
and Jean Piaget to describe how children learn by discovery, and how the
computer is the ultimate discovery tool.
Papert begins with a compelling thought: Take a surgeon and a teacher from a
hundred years in the past and place them in their respective workplaces of the
present. The surgeon is unlikely to even begin to appreciate the concepts and
complexities of modern surgery. The teacher, on the other hand, would feel
right at home in today's classroom, if not necessarily with the specific
subject matter. The point is that teaching as a method has barely changed in a
long time. Some may argue that education is a basic process that does not
change. However, while learning may not change, the way we present education
(as distinct from learning) is certainly ripe for some drastic changes.
Computers are already in most mainstream schools. According to Papert,
however, the educational establishment has embraced computers not as a method
of change, but as a subject of study in its own right. The curriculum includes
the math lesson, the English lesson, and the computer-literacy lesson. As
such, the establishment can claim, with some justification, that it has
integrated advanced technology into education. Computers are not an academic
subject, Papert contends, but rather a mechanism for exploring and integrating
all of the traditional subjects.
This is precisely what is wrong with computer-aided instruction (CAI), which
presents lessons to students, who are then able to learn at their own pace, go
back and review old concepts, and choose the subject of study. Researchers
have found that these techniques are able to improve students' standardized
test scores and are considered to be innovative and worthwhile under many
circumstances. Papert argues, with some force, that CAI is simply using
computers to reinforce old and obsolete teaching paradigms. The student is
still the recipient of fixed information, rather than the discoverer of new
concepts.
Papert makes a distinction between teachers (who, as individuals, run the
spectrum from unable to innovate to demanding continual innovation) and school
(the primary purpose of which is to perpetuate its own existence and
methodologies). Teachers aren't the problem, he argues, school is. The
bureaucracy is more interested in ensuring compliance with teaching
methodologies.
At first, I thought this distinction specious. You have to have something concrete
to battle against, and pointing the finger at the institution means that no
individuals need take any responsibility for change. Then I raised Papert's
arguments to one of my graduate classes. One student, a high-school math
teacher, immediately became defensive at the prospect that there was something
wrong with the way she taught, especially using computers. As I watched the
interaction, it occurred to me that Papert was right. No matter how sincere
and innovative this woman may be, she is constrained by an oppressive
collection of rules and procedures that all but dictate what is to be taught
in the classroom and how.
Central to Papert's concepts of childhood education is the Logo programming
language. In Papert's view, Logo is less a way of controlling the computer
than it is a method of exploring different but related concepts, such as
mathematics, art, culture, and design. The computer, through Logo, becomes the
medium for learning things that might, at first glance, be thought of as
unrelated to programming.
Curious, I dusted off (literally) an old Logo, Coral Software's Object Logo
for the Macintosh (Coral's product has since been taken over by Paradigm
Software, updated, and remarketed). When I first used this Logo, about seven
years ago, I was new to programming and tried to use it in the same
step-by-step manner I was learning with other languages at the time. I quickly
concluded that Logo was limiting and difficult to use compared to other
languages, and set it aside.
This time, I simply used Logo for the fun of it. First, I drew a straight
horizontal line with the turtle. Then, I began sending lines off in any old
direction, up, down, diagonally, making a mess of the drawing window, but
quickly learning how the language behaved. I then cleared the drawing window
and tried some simple programs. Logo is an interpreted language, which some
purists would claim gives it a performance handicap. However, I realized that
this very feature lets children start out simply, with one-line programs that
actually perform a visible action, then gradually increase their size and
complexity until they are doing something unusually sophisticated.
Within two hours, my own explorations led me to produce a fractal-like drawing
using the code in Example 1.
I learned two things from this exercise. First, when I originally experimented
with Logo, I was concerned with learning the syntax and structure of the
language and failed to appreciate that the language itself was a means to do
other things. I was also caught up with learning about "objects" from the
viewpoint of a procedural language, rather than recognizing that an object is
far more intuitive than a record. At the time, I was perhaps not childlike
enough in my way of thinking (now, later in life, presumably I am). In short,
I didn't play with it, but treated it as a subject for serious study.
Second, it is wrong to think of what children do with Logo as "programming,"
even though we would recognize it as such. Many psychologists doubt that
children can program in the traditional sense prior to achieving a certain
level of logical and mathematical sophistication. Developmental psychologists
and computer scientists may debate among themselves what exactly is going on,
but children, who do not understand what the fuss is all about, simply
surprise everyone with the complexity and insight of their creations. I am
reminded of the Bugs Bunny cartoon where, after performing yet another
impossible feat, Bugs declares, "I know this violates the law of gravity, but
then, I never studied law." Many educators, including my student, demand to
know where the money will come from to integrate computers into public
education in the way Papert describes. Even though the amount spent per
student has doubled over the last ten years, schools still cannot find the
right formula to improve education. It is clear that they have the money; it
is school itself that must change to use it better. Papert claims that a
wealth of computers is not a requirement; even a few computers could well
support some effective group efforts at computer learning.
One last point of interest is that the computer revolution in education began
with the introduction and availability of the microcomputer. Papert's early
experiments used teletype terminals connected to large and expensive
mainframes. This setup was possible for small-scale experiments, but was not
going to change education in any way. The microcomputer made it possible for
small groups or even individual children to inexpensively use both computer
concepts and graphics to explore virtually any academic subject.
I teach mostly working adults in professional graduate programs, but I see the
need for change, even at this level. The classroom lecture model is expensive,
inefficient, and dominated by teaching, rather than learning. Back in my
undergraduate days, I had a psychology professor who instinctively knew this,
claiming that "my goal is to get you excited enough about the subject so that
you will leave the classroom and go learn something about it." Papert focuses
on learning, rather than cost, but I am certain that any future endeavors will
have to both encompass all students in a learning experience, and do it
without sending our property taxes any higher than they are now.
Our educational institutions at all levels have to find new ways to package
and deliver education. For my part, I am trying to sell my powers-that-be on
multimedia and virtual classrooms. Wherever the school of the future ends up,
we--as computer professionals--should be leading the way.
The Children's Machine: Rethinking School in the Age of the Computer
Seymour Papert
BasicBooks, 1993, 241 pp., $22.50
ISBN: 0-465-01830-0
Example 1: Object Logo procedure to create a fractal drawing.
to fractal.pattern :level :inc
  hideturtle
  publicmake "window first turtlewindows
  ask :window [setwsize [500 290]
               wselect
               startrgn]
  clearscreen
  fractal 260 :level
  insetfractal ask :window [getrgn] :inc
end

to fractal :size :level
  penup
  back :size / 2
  pendown
  repeat 3 [fractal1 :size :level right 120]
end

to fractal1 :size :level
  if :level = 0 [forward :size stop]
  fractal1 :size / 3.0 :level - 1
  left 60
  fractal1 :size / 3.0 :level - 1
  right 120
  fractal1 :size / 3.0 :level - 1
  left 60
  fractal1 :size / 3.0 :level - 1
end

to insetfractal :rgn :inc
  if emptyrgnp :rgn [stop]
  ask :window [framergn :rgn]
  insetfractal insetrgn :rgn :inc :inc :inc
end

fractal.pattern 6 4
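For readers without an Object Logo interpreter handy, the recursion at the heart of Example 1 (each segment replaced by four segments one-third as long, Koch-style) can be sketched numerically in Python. This is a minimal sketch, not the author's code: the function name and coordinate scheme are my own invention, and it computes line segments rather than drawing them.

```python
import math

def koch_segments(p0, p1, level):
    """Recursively replace the segment p0->p1 with four Koch-style segments."""
    if level == 0:
        return [(p0, p1)]
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = (x1 - x0) / 3.0, (y1 - y0) / 3.0
    a = (x0 + dx, y0 + dy)             # one-third point
    b = (x0 + 2 * dx, y0 + 2 * dy)     # two-thirds point
    s, c = math.sin(math.radians(60)), 0.5
    apex = (a[0] + dx * c + dy * s,    # middle third rotated 60 degrees
            a[1] - dx * s + dy * c)
    segs = []
    for q0, q1 in ((p0, a), (a, apex), (apex, b), (b, p1)):
        segs += koch_segments(q0, q1, level - 1)
    return segs

# A level-n curve has 4**n segments, each 1/3**n of the original length,
# so total length grows by (4/3)**n -- the "fractal-like" behavior.
segs = koch_segments((0.0, 0.0), (243.0, 0.0), 3)
print(len(segs))   # 64
```

As in the Logo version, running the same procedure three times with 120-degree turns closes the figure into a snowflake-like triangle.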




















































February, 1994
OF INTEREST
C++/Views 3.0 from Liant is an object-oriented development tool that speeds up
and simplifies the creation and porting of GUI applications among Windows,
OS/2 Presentation Manager, OSF/Motif, Macintosh, and DOS. It is an
application framework that combines over 100 ready-to-use classes with
productivity tools such as interface, data, event, printer, and extended GUI
classes. C++/Views includes C++/Views Constructor, which allows users to view
and edit C++ code.
You can try out the software package for 30 days for $50.00 (to be credited
toward the purchase price). Pricing varies from $499.00 to $2000.00, depending
on platforms. Reader service no. 20.
Liant Software Corp.
959 Concord Street
Framingham, MA 01701
508-872-8700
The Motorola 68000 Microcontroller Development Tools Directory, published by
MW Media, lists hundreds of development tools for the M68HC05, M68HC08,
M68HC11, and M68300 microcontroller families. Details of each chip--part
numbers, ROM, RAM, timers, serial, bus speed, packaging, and the like--are
presented in tabular form. The 126-page book includes data sheets, in-circuit
emulators, debuggers, and so on. The book is available free of charge,
although a $5.00 shipping and handling fee is required ($8.00 international).
Reader service no. 21.
MW Media
50 West San Fernando
San Jose, CA 95113
408-286-4200
Microsoft has released Visual C++ 1.5, which includes support for OLE 2.0 and
ODBC. If nothing else, this release should be a shot in the arm for OLE 2.0 as
much as Visual C++. OLE 2.0 has been haltingly adopted by developers in large
part because of its complexity. VC++ 1.5 addresses this complexity, making OLE
2.0 much more approachable by providing a set of classes for OLE-based
development. The AppWizard for VC++ 1.5 supports containers, miniservers,
full servers, and similar applications, as well as visual editing and
drag-and-drop.
Database support is through a group of data-access classes that implement an
ODBC API. For example, a new base class called RecordView displays data in a
form. VC++ 1.5 also includes royalty-free ODBC drivers for a variety of
off-the-shelf database packages.
Version 1.5 now runs under Windows NT; some of the requirements are: MS-DOS
5.0, Windows 3.1, a 386 or higher, and 8 Mbytes of RAM. Reader service no. 22.
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052-6399
206-882-8080
The Second Annual Object-Oriented Numerics Conference, sponsored by Rogue Wave
Software, is scheduled for April 24--27, 1994 in Sunriver, Oregon. This
conference will provide a forum where computer scientists and scientific
programmers can discuss how to use object-oriented programming techniques to
more effectively write complex scientific code. For more information, contact:
Margaret Chapman, Program Coordinator. Reader service no. 23.
Rogue Wave Software
P.O. Box 2328
Corvallis, OR 97339
503-754-3010
amc@roguewave.com
Inmark Development announced the release of the zApp Interface Pack, an add-on
product to its C++ class libraries. zApp is a portable C++ application
framework that provides cross-platform portability through object-oriented C++
classes. zApp 2.0 supports Windows, NT, DOS text/graphics, and OS/2.
The Interface Pack sits on top of zApp 2.0 and provides, among other things, a
set of objects for creating a spreadsheet-like table with a variety of cell
types that allow for easy display of text, numeric, and image data in a matrix
format. Additional classes create tool bars and status lines with a 3-D look.
3-D custom controls include frames, panels, static text, radio buttons, and
check boxes. These classes also support customizable bitmap buttons. Reader
service no. 24.
Inmark Development Corp.
2065 Landings Drive
Mountain View, CA 94043
415-691-9000
The Windows version of the Victor Image Processing Library 3.1 from Catenary
Systems now provides functions for reading and writing grayscale and color
images using JPEG compression. The DOS version has been enhanced to include
the ability to load and save BMP and JPEG files, in addition to
TIFF/PCX/GIF/TGA/bin file formats. The DOS version also has new functions for
conversion between bilevel, grayscale, palette color, and RGB color images.
The DOS package supports Microsoft and Borland C/C++ compilers and sells for
$195.00, while the Windows version is a DLL ($295.00). The library is royalty
free and source code is available. Reader service no. 25.
Catenary Systems
470 Belleview
St. Louis, MO 63119
314-962-7833
Lotus has announced the Windows Add-In Development Kit (ADK) for customizing
and enhancing 1-2-3 Release 4 for Windows. Add-ins are combinations of the
user's C programs, 1-2-3 code, and Windows code. The ADK includes a library of
over 350 C language functions, which provide access to spreadsheet events, and
custom user-interface building tools. It requires an IBM PC or compatible (286
and higher), 4 Mbytes of RAM, and Windows 3.0 or higher. The 1-2-3 Release 4
for Windows ADK sells for $49.95. Reader service no. 26.
Lotus Development Corp.
55 Cambridge Parkway
Cambridge, MA 02142
617-577-8500
PacketView 1.10, a low-cost protocol analyzer for DOS-based PCs from Klos
Technologies, recognizes Ethernet, Token Ring, ARCNET, and FDDI networks.
PacketView includes protocol-decoding capabilities for TCP/UDP/IP, SNMP,
IPX/SPX/NCP, NetBIOS, XNS, Vines IP, and AppleTalk protocols. Users can also
create their own protocol decoders.
PacketView 1.10 sells for $299.00. A demo version is available via BBS
(14.4/8N1) at 603-429-0032, or on the Internet via anonymous FTP at
mv.mv.com:pub/users/klos/pvdemo.zip. Reader service no. 27.
Klos Technologies Inc.
604 Daniel Webster Highway
Merrimack, NH 03054
603-424-8300
Inform Software, a fuzzy-logic development-tool vendor, has teamed up with
Texas Instruments and Intel to support processors from both companies. In
particular, the Windows-hosted fuzzyTech 3.1 development environment will
generate C code for TI's TMS320C2/5 digital-signal processors and Intel's MCS
96, MCS 51, and 80C186 microcontrollers. (TI's implementation of fuzzyTech is
called fuzzyTech 3.1 MCU-320, while Intel's is fuzzyTech 3.0 NeuroFuzzy.)
fuzzyTech is a graphical CASE tool that supports design steps for fuzzy-system
engineering. It provides simulation and optimization and displays system
performance in multiple ways to give you efficient optimization options.
Finally, it generates C/assembler code optimized for the target processor.
TI's fuzzyTech MCU-320 Explorer version sells for $199.00, and the fuzzyTech
MCU-320 Edition version for $1890.00. Intel's fuzzyTech 3.0 NeuroFuzzy module
(an extension for all fuzzyTech 3.0 Editions) sells for $900.00. Reader
service no. 28.
Inform Software
1840 Oak Ave.
Evanston, IL 60201
800-929-2815
Intel Corp.
Literature Center
Document #272340

800-548-472
Texas Instruments Inc.
Literature Response Center
800-477-8924 x4500
MKS and Prentice-Hall have jointly published A DOS User's Guide to the
Internet, by James Gardner. The book provides hands-on instruction on how to
access the Internet using MKS's UUCP software for DOS, which is included with
the book. This software is a subset of MKS Toolkit 4.1, and includes the uucp
and mailx programs. The book sells for $34.95. Reader service no. 29.
Mortice Kern Systems Inc.
35 King Street North
Waterloo, ON
Canada N2J 2W9
519-884-2241
The MainWin SDK from MainSoft is a cross-platform development tool for porting
Windows applications to UNIX workstations. MainSoft claims that the underlying
code of the original application does not have to change and that developers
can upgrade all versions--Windows and UNIX--simultaneously. MainSoft further
claims that in benchmarks of common Windows programs, MainWin-based
applications run 10 to 20 times faster than comparable WABI-based apps.
MainWin requires minimal rewriting of C/C++ code, focusing on 16-bit to 32-bit
dependency conversion. MainWin initially supports AIX, HP-UX, Solaris 2.2,
SunOS 4.1, and IRIX 5.1.
The MainWin SDK sells for $5000.00 for the first copy, $2000.00 for subsequent
copies. MainWin for Workstations, the end-user environment, sells for $195.00.
Reader service no. 30.
MainSoft Corp.
883 North Shoreline Blvd., Suite C-100
Mountain View, CA 94043
415-966-0600
On the Visual Basic front, VideoSoft has announced VSVBX 3.0, a set of custom
controls for VB 3.0; Pinnacle has released Code.Print Pro, a VB printing
utility; and Baldar has released a set of data-type functions for VB.
VSVBX includes three custom controls: VSIndexTab, a container control that
implements the notebook tab metaphor; VSElastic, a control that automatically
resizes, aligns, and frames its contents; and VSAwk, a general parsing routine
based on the UNIX awk utility. The VideoSoft Custom Control Library sells for
$45.00. Reader service no. 31.
Code.Print Pro's formatting controls allow you to select fonts and otherwise
enhance printouts, including source-code listings. It also automatically
generates cross-referenced indexes. The tool sells for $99.00. Reader service
no. 32.
The Baldar Data Type Functions bring to VB the MKI$, CVI, MKD$, and CVD
functions, which were available for previous implementations of Microsoft
Basic, but not for Visual Basic. Programs written in other Microsoft Basics
can be imported into VB without rewriting code or logic. The Data Type
Functions sell for $49.00. Reader service no. 33.
VideoSoft
2625 Alcatraz Ave.
Berkeley, CA 94705
800-547-7295
Pinnacle Publishing
P.O. Box 888
Kent, WA 98035-0888
206-251-1900
Baldar
P.O. Box 4340
Berkeley, CA 94704
510-841-2474
Q/Media for Windows Version 1.2 from Q/Media Software is a stand-alone tool
for creating multimedia presentations. The tool can also be used in
conjunction with other common graphics packages. With the Clip List the user
is able to drag and drop graphics, animation, sound, and video onto the screen
for repositioning, resizing, or synchronizing without the need for scripting,
reprogramming, or file conversions. One added feature of Version 1.2 makes it
possible to preview files prior to bringing objects into the editing screen
and before loading them into the Clip List program. Q/Media for Windows costs
$99.00. Reader service no. 34.
Q/Media Software Corp.
312 E. 5th Avenue
Vancouver, BC
Canada V5T 1H4
604-879-1190
A document-imaging development tool-kit has been released by Diamond Head
Software. ImageBasic 1.1 features scanning, image processing, and image
manipulation to create production-quality imaging applications. It provides
optical recognition of typed text, hand-printed characters, and bar code. A
typical application that includes image manipulation, scanning, image
post-processing, and optical recognition of typed text requires about 30 lines
of code in ImageBasic. The integrated modules provide, among other essential
functions, display control, scanning control, and ScanFix control. The bundled
toolkit sells for $2500.00. Reader service no. 35.
Diamond Software Inc.
Ocean View Center, Suite 630
707 Richards Street
Honolulu, HI 96813
808-545-2377


















February, 1994
SWAINE'S FLAMES


Allegorizing Sun




a novel by Michael Frighten


I was working Hacker Services out of bunko--liaison work with computer
programmers who come in contact with the law. Helping lost programmers find
their way home. Translating jargon to English. It was my first week on the
job. My name is Smith, Peter J. I'm a cop.
8:45. I was at home studying WordPerfect when the call came in. Report was
there had been a homicide at Sun Jose, some kind of computer conference. The
officer on the scene had requested Hacker Services. That's me.
"You may want some backup on this one," Dispatch told me. "You know John
Connor?"
"Minor, age 14," I said. "Mother, Sara, institutionalized. Real nutso. Thinks
her son is being targeted by androids from the future."
It turned out Dispatch meant another John Connor, a retired Hacker Services
officer who, some thought, had gotten in too deep, become half-hacker himself.
9:05. I picked Connor up at his place, the basement of a warehouse adjacent to
a Coca-Cola bottling plant. His visible furniture consisted of two computers,
a bare mattress, and a '50s-vintage Coke machine. He was the real thing, all
right.
On the way to the crime scene Connor gave me some advice on handling hackers.
"Don't throw your authority around. That's very bad form. You can throw the
law at them if you know it cold; they respect a proper application of code.
They never lie, but their answers may not mean what you think. And don't
expect them to volunteer relevant information; they probably won't realize
it's relevant."
"Uh-huh. Can I talk to them?"
"Yes, but cut it off if I cough, and let me take over. Like this." He coughed.
9:20. We arrived at Sun Jose Center. The officer on the scene, Tom Graham, met
us. "It's two flights up. I been using the stairs. Maybe you guys can work the
elevator."
Connor looked at it. "It's binary. We just press 1-0."
"Shouldn't that be 1-1?" I asked as we ascended.
"They start counting at zero."
Tom Graham snapped, "They ought to count in English. Ain't this still
America?"
"Barely," Connor answered, as the doors opened on the crime scene.
A slovenly Caucasian, apparently the head hacker, approached us.
"Hello, Lieutenant. How's your daughter's cold?"
"How do you know about my daughter's cold?" I asked.
"Oh, from our tap on your phone. Sorry to hear about the divorce."
Connor coughed. "I'll handle this. Talk to the caterer, Lieutenant Smith."
The caterer turned out to be the only nonprogrammer present. While I was
questioning him, a female hacker approached with a piece of paper.
"Lieutenant, we intercepted the report from your lab. Would you like it on
disk, or just the hard-copy fax?"
I held out my hand. "Just the fax, ma'am."
Connor was there suddenly. "Let me see that," he said. "Mm-hmm. Just as I
suspected. Lieutenant Smith, arrest this man." He was looking at the caterer.
11:45. Back downtown later, I asked Connor what it was in the lab report that
tipped him off to the caterer.
He looked embarrassed. "It was a, er, sex crime, Lieutenant," he said.
"Programmers don't have sex."


Afterward


Some say that Michael Crichton's book Rising Sun capitalizes on American
xenophobia. If so, Crichton needn't have looked so far away for his aliens.
Despite learning to accept computers in the past decade, most Americans still
think computer programmers are about as human as Spock.
Michael Swaine
editor-at-large















March, 1994
EDITORIAL


Will Reverse Engineering Take a Step Backwards?


Reverse engineering is a fundamental software development process which almost
all programmers dive into at one time or another, whether with their own or
someone else's code, whether they want to or not. The courts have defined
reverse engineering as "a fair and honest means of ... starting with the known
product and working backwards to divine the process which aided in its
development or manufacture." As for software reverse engineering, the courts
go on to say it's "the process of starting with a finished product and working
backwards to analyze how the product operates or how it was made."
Reverse-engineering expert Andy Johnson-Laird points out that both definitions
focus on the reverse-engineering process, not the resulting product, adding
that "the static and dynamic examinations of a computer program are the only
two activities that are viewed by programmers as being reverse engineering."
Even though it's protected by copyright law, reverse engineering is
nonetheless under fire. In fact, if the federal government--pressured by
influential software companies--gets its way, reverse engineering may be
limited to inspection of software you write, and outlawed in all other
instances.
Of course, reverse engineering isn't unique to software development. When an
anonymous caveman fashioned the first wheel, an enterprising hardware guy in
the next cave probably figured out how to turn a similar hunk of rock into a
similar cylinder. From computer chips to potato chips, every industry engages
in reverse engineering. Still, it is software reverse engineering that's at
the eye of the storm.
Much of the current furor over software reverse engineering revolves around
international intellectual-property protection. Recent changes in European
Community law allow you to reverse engineer when you want to create software
that interoperates with, but does not replace, the original program. In light
of the EC provisions (not to mention U.S. cases such as Sega vs. Accolade and
Atari vs. Nintendo, both involving reverse engineering), Japan is re-examining
its intellectual-property laws, and U.S. software companies fear that the
Japanese will be equally lenient when it comes to reverse engineering. Spurred
on by IBM, Microsoft, Apple, and others, U.S. trade representatives have sent
letters to the Japanese expressing "grave concern" over the possibility that
Japan will relax reverse engineering laws.
Confused by the intellectual-property debate, most developers react with a
knee-jerk prohibition of reverse engineering. Take a look at the shrink-wrap
licenses of software on your shelf (including that of the Dataware
search/retrieval engine at the heart of the Dr. Dobb's/CD) and you'll find
what Walter Oney (who writes this month's "Programmer's Workbench") calls
"draconian" license restrictions forbidding you to modify, translate, adapt,
reverse engineer, decompile, or disassemble the software. Amazingly, you
sometimes don't even get to read the license until after breaking the seal. Of
course, the validity of this shrink-wrap ban on reverse engineering has yet to
be tested in court, although both Illinois and Louisiana have statutes
legalizing such prohibitions.
The limits of reverse engineering may eventually be determined by a pivotal
court case in which Microsoft claims that Stac Electronics reverse engineered
betas of MS-DOS 6.0. It's ironic that, on one hand, Microsoft enjoins reverse
engineering of its software, while on the other it implicitly condones the
process by publishing articles such as Matt Pietrek's excellent "A Look under
the Hood of the Windows 3.1 Global Heap and the Functions that Maintain It" in
Microsoft Systems Journal (March, 1993). While he doesn't use the "R" word in
the article, Pietrek clearly states that the article was adapted from his book
Windows Internals, where he openly acknowledges that the information was
gained via reverse engineering.
This isn't to say that the right to reverse engineer software hasn't been
abused. However, reverse engineering is too difficult, time consuming, and
expensive for software thieves (the convenient targets of those who would ban
reverse engineering) to bother with. It's more cost-effective for pirates just
to copy software.
There are good reasons for going to the trouble of reverse engineering:
recycling old software for new systems, rooting out bugs coded by programmers
no longer around, and ensuring compatibility with existing software. Perhaps
even more important in these days of intellectual-property litigation, reverse
engineering provides a way to identify patented technology so that you can
avoid infringing upon it.
In the long run, U.S. companies may do themselves more harm than good by
stifling reverse engineering, particularly if it's restricted in the U.S. and
legal elsewhere in the world. In the short term, software companies have to
realize that they can't have it both ways: If you don't want programmers
stepping all over your intellectual property, you can't bar them from
discovering what that property is--and reverse engineering is the best way to
find out.
Jonathan Erickson
editor-in-chief












































March, 1994
LETTERS


Comparing Object-oriented Languages




Dear DDJ,


Referring to "Comparing Object-oriented Languages" by Michael Floyd (DDJ,
October 1993), lines of code (LOC) is not at all a poor measure of programmer
productivity. On the contrary, it is an excellent measure of unprofessional
and careless programming.
For example, if a certain piece of code can be done in 100 lines in two days
(50 LOC/day), and someone does it in 1000 lines in four days (250 LOC/day), he
isn't five times more productive--he's 50 percent less productive. The elapsed
time is double, and the product is ten times larger with the corresponding
additional costs for verification, distribution, and maintenance.
Assuming that these additional costs are half the cost of the project, and
that they grew just a factor of 5 because the larger code is simpler (which is
not necessarily true), the cost of the project is now three times what it
should have been.
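The letter's arithmetic is easy to check. A minimal sketch (the 50/50 cost split and the factor-of-5 downstream growth are the letter writer's stated assumptions, not established figures):

```python
# Original project: cost normalized to 1.0, split evenly between
# development and downstream costs (verification, distribution, maintenance).
base_dev, base_downstream = 0.5, 0.5

# The verbose version: elapsed development time doubles, and the
# downstream costs grow "just" a factor of 5 despite 10x the code.
new_dev = base_dev * 2
new_downstream = base_downstream * 5
total = new_dev + new_downstream       # roughly the letter's "three times"

# LOC/day rates from the letter: 100 lines in 2 days vs. 1000 lines in 4 days.
rate_terse, rate_verbose = 100 / 2, 1000 / 4

print(rate_terse, rate_verbose, total)   # 50.0 250.0 3.5
```

The LOC metric rewards the verbose programmer fivefold while the total cost of the project more than triples, which is exactly the letter's point.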
Professor E.W. Dijkstra once rightly said that programmers should be charged
for every line of code they write, rather than get paid for it.
If you look at the numbers in Table 1 of the article, someone whose pay is
based on LOCs would think that the "best" object-oriented language is Eiffel.
Assuming experienced programmers require about the same effort, the Eiffel
programmer was 80 percent more "productive" than the C++ programmer!
Jony Rosenne
Tel Aviv, Israel
Michael responds: Thanks for your comments, Jony. I believe that you present
valid arguments when coding with procedural languages. But object-oriented
programming adds a new variable to the equation, namely reusability. Designing
a class to be reusable takes longer than its procedural counterpart. The
process usually involves an analysis of the overall design to identify
components that may be useful in more than one context. Such components are
likely candidates for abstraction. Often, however, this is an iterative
process of designing, coding, redesigning, and recoding. Ultimately, a
full-featured class designed for reuse is sure to increase the overall line
count. The short-term result is that building reusable classes takes longer,
costs more to develop each line of code, and increases the overall line count.
However, the long-term benefit is that the same code costs no more to use
again.
The point of my ending comment was to raise the reader's awareness of a new
problem. Assuming you can measure code reuse in the first place, is the pain
of developing reusable classes worth the gain? And, if you're ready to buy
into it, how do you promote reuse in your company? More to the point, how do
you reward the programmer for developing reusable code?


Dear DDJ,


I was dumbfounded by your inclusion of an obscure language such as Drool and
exclusion of as popular a language as Actor from your article in the October
1993 issue of DDJ. There are over 30,000 registered users of Actor. It is
up-to-date with Windows 3.1, and it's commercially available today (Version
4.0) despite serious disinterest by its new owner, Symantec.
Actor is very much in use and very much an issue in Windows-based OO
development. It was, after all, the first OOPL available commercially for
Windows development. It was and is still a purely object-oriented language. A
number of its most vocal enthusiasts have begun a constant drumbeat in the
technical media, which we hope will culminate in the language's separation
from Symantec and reentry into the Windows commercial-development arena (see
my article in the August 1993 Windows/DOS Developer's Journal on Actor 4.1).
Most of Actor's third-party supporters--Tempus Software, BOK Technologies, and
The Windows Wurx--have agreed to bundle their Actor add-ons with the product
at virtually no cost to infuse some new life.
I find it very hard to believe DDJ doesn't keep up on Actor at all but manages
to keep in touch with the likes of Drool, Beta, Sather, and Eiffel.
Richard L. Warren
CompuServe 70750,3436


Dear DDJ,


Your October, 1993 article, "Comparing Object-oriented Languages" provided
just the sort of information that keeps me buying DDJ. Articles about OOP are
always of interest to me, and it is refreshing to see languages other than
C/C++ receive coverage.
Another object-oriented language readers may be interested in is "Mops." Mops
is the work of one (very talented) individual, Michael Hore of Australia. The
language is consistent and easy to learn, benefitting from one man's vision.
The basic syntax is derived from a commercial product that first appeared in
1984 (Neon), so the language is "proven."
Example 1 shows a Mops coding example that's very short because it
conveniently already has number (var) and point classes that respond in a
meaningful way to a print message (print:). Example 2 shows how the point
class might be defined in Mops.
Doug Hoffman
CompuServe 72310,1743


Dear DDJ,


I read your October 1993 issue with great dismay--here is an issue ostensibly
devoted to object-oriented programming languages that does not mention
Oberon-2. I am disturbed because: 1. Somehow Oberon-2 was not selected for
illustration; and 2. it was not even mentioned in the discussion of OOP
languages. While I understand that there are many such languages and that in
depth coverage of all would likely be impossible, I am puzzled since some of
the languages you did see fit to cover probably do not have a volume of
written code comparable to that of Oberon-2. Several excellent code examples
as well as the Oberon operating system illustrate the capabilities of this
well-thought-out evolution of structured, modular programming languages.
Also not included was Modula-3. It is distressing that in the article,
"Comparing Object-Oriented Languages," Ada is used to provide an approximation
of the doubly linked list example. Modula-3 would have been, in my opinion, a
better choice. In fact, there is an excellent set of software components that
includes the doubly linked list case, but also provides iterators and the like
for accessing and operating on this list of possibly heterogeneous node types:
It is written with Modula-2! (The Modula-2 Software Component Library, C.
Lins, Springer-Verlag.)
The second matter that concerns me is the lexicon that has developed in the
object-oriented programming arena. I have difficulty when simple concepts such
as type extension and procedure variables (and their combinations) become
mired in terminology that largely obscures rather than describes or defines. I
will agree that OOP comes into its own under large, complex system conditions
and certainly impacts program design in a fundamental manner, but then this is
analogous to the effect on programming that record, enumeration and pointer
types, and structured-programming constructs had at their introduction.
Unfortunately, in the case of object-oriented programming languages, we are
speaking of enhancements to existing concepts, principally type extension and
broad use of procedure variables, rather than the introduction of new
concepts. Yet these relatively simple (and powerful) notions are confounded or
lost in the egregious sea of OOP-speak.
Perhaps I must simply give in and recognize that we (in the U.S., anyway) are
living in a "mono-C-istic" programming society. I hope, however, that this is
not, and will not be the case.
Michael A. McGaw
Fairview Park, Ohio
DDJ responds: Our thanks to the many readers who wrote in with similar
concerns about the coverage of so-called "obscure" languages like Drool,
Beta, Sather, and Parasol. Because space considerations prevent analysis of
every object-oriented language in a single issue, the October issue provided a
cross-section with special attention given to nonproprietary languages. In
particular, Drool and Parasol are in the public domain with complete source
code available in the DDJ Forum on CompuServe.


Now On To Forth...





Dear DDJ,


Forth is an extensible language, but Michael Swaine's example in DDJ (November
1993) doesn't extend Forth any more than writing a function extends C.
Extending a language would be like adding the syntax:

LOOP
 ...
 EXIT IF (condition)
 ...
ENDLOOP

to Fortran, or making complex a new elementary data type in C with exactly the
same standing as int (reserved word and all).
The syntax of Forth is extremely simple: "Words" are character strings
delimited by whitespace. The semantics of Forth are simple, too; nearly all
words are imperative verbs:
Some few "defining" words add words to the dictionary, so the first time a
user-defined word appears, the defining word puts it in the dictionary.
Thereafter, the newly defined word is an imperative verb.
A few "immediate" words are executed during compilation.
Most words are not executed during compilation--they are "compiled" into the
dictionary during compilation.
Because of these two simplicities, the Forth compiler can be exposed to the
programmer. (The source code for the fig-Forth text interpreter, compiler, and
keyboard debugger fits on a screen of 16 lines of 64 characters with plenty of
whitespace.) Extending Forth consists of defining: 1. new immediate words
which have a compile-time effect, and 2. new defining words.
A truly marvelous feature of Forth allows the programmer to specify the
compile-time effect of new defining words and the run-time effect of words
defined by the new defining word. The corresponding chore in, for example, C
would require:
Defining the new syntax so that it isn't ambiguous and doesn't conflict with
existing syntax (nontrivial).
Defining the semantics of the new syntax (pretty easy).
Altering the parser and diagnostics of the compiler to recognize and diagnose
the new syntax (without impact on parsing or diagnosing old syntax!).
Altering the code generator so that the new target code implements the new
semantics (without impacting such things as register usage, optimizers, and so
forth).
The fig-Forth text interpreter, keyboard debugger, and compiler look something
like Example 2. While some programmers are attracted to Forth by its
simplicity and compactness, others are attracted by extensibility. Some of the
first object-oriented languages I ever saw were implemented by extension of
Forth.
William E. Drissel
Grand Prairie, Texas

Example 1: Mops code sample.
:class point super{ object } \ begin class definition, superclass is object
 int x \ declare an instance variable of type integer, named x
 int y \ declare an instance variable of type integer, named y

:m put: ( x y -- ) \ begin put: method definition. x, y passed via Forth stack
 put: y \ use put: methods of the int ivars to store values from stack
 put: x
;m \ end method definition

:m get: ( -- x y ) \ stuff between parentheses is just a Forth stack picture
 get: x \ x and y will be "gotten" to the Forth stack, ready for any use
 get: y
;m

:m print:
 get: self . . \ finally, we actually use some Forth, the dot (.) to print
;m

;class \ end class definition

Example 2: A Forth interpreter, compiler and debugger, in pseudocode.

forever {
    get the next word (delimited by whitespace)
    look it up in the dictionary
    if found
        if we are compiling
            if the word is immediate
                execute the word
            else
                "compile" the word into the dictionary
        else
            execute the word
    else // (not found, must be a number or undefined)
        if it's a number
            "compile" a literal into the dictionary
        else
            send word followed by "?" to screen
            stop compiling
            flush interpreter input buffer
} // end of forever
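The pseudocode above maps readily onto real code. As a rough illustration (not fig-Forth itself; the dictionary, its two words, and the bracketed trace output are all invented for this sketch), a minimal C version of the outer interpreter loop might look like this:

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Toy model of the outer interpreter in Example 2. The dictionary,
   its contents, and the trace markers are invented for illustration;
   a real Forth keeps far more state than this. */

typedef void (*code_t)(void);

struct word { const char *name; int immediate; code_t code; };

static int compiling = 0;                 /* are we inside a definition? */

static void do_hello(void) { printf("hello "); }
static void do_bye(void)   { printf("bye\n"); }

static struct word dict[] = {
    { "hello", 0, do_hello },
    { "bye",   1, do_bye   },             /* pretend "bye" is immediate */
};

static struct word *find(const char *name) {
    for (size_t i = 0; i < sizeof dict / sizeof dict[0]; i++)
        if (strcmp(dict[i].name, name) == 0)
            return &dict[i];
    return NULL;                          /* not in the dictionary */
}

static void interpret(const char *src) {
    char buf[64];
    int n;
    while (sscanf(src, "%63s%n", buf, &n) == 1) {  /* next whitespace-delimited word */
        src += n;
        struct word *w = find(buf);                /* look it up in the dictionary */
        if (w) {
            if (compiling && !w->immediate)
                printf("[compile %s] ", w->name);  /* would append to the dictionary */
            else
                w->code();                         /* execute the word */
        } else {
            char *end;
            long lit = strtol(buf, &end, 10);
            if (*end == '\0')
                printf("[literal %ld] ", lit);     /* number: "compile" a literal */
            else
                printf("%s ?\n", buf);             /* undefined: word followed by "?" */
        }
    }
}
```

Calling `interpret("hello 42 bye")` executes hello, compiles the literal 42, and executes the immediate word bye, tracing each decision the pseudocode describes.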


















































March, 1994
Binary-Data Portability


The DDR compiler--a tool for binary-file portability




Jos Luu


Jos is vice president of engineering at Mainsoft Corp. He can be reached via
Internet mail at jluu@mainsoft.com.


Sooner or later, all developers writing cross-platform applications to run on
80x86/Windows and RISC/UNIX platforms have to grapple with binary-file
portability. Binary files saved by a PC usually have a data representation
natural to the 80x86 architecture. If the application program is ported to the
UNIX environment, the data representations in the new system are likely to
have a different byte order, alignment, and size. To minimize programming
efforts, you want the file read/write routines to work in both 80x86 and RISC
environments and produce compatible data files, all without an elaborate
rewrite of the source code.
Programmers choose to write binary data because it's easy. Usually, the format
of the binary file is the same as the data image stored in memory. The code
necessary to write binary data to a file is often as simple as a few
instructions. There is no need to convert the data into some complex format.
The raw data is just sent to the disk. It is simple, fast, and concise, but it
also tends to be nonportable. One of the most popular forms of binary data,
for instance, is the binary graphic image (bitmap). The graphic image consists
of a header structure indicating how the data is arranged, followed by the
data itself.
With MainWin for Workstations, a tool that provides the Windows API on UNIX
workstations, one of our goals was to create a truly portable way to handle
the differences in binary-data representation in all environments, while
retaining a single source-code base and without having to manually add a lot
of elaborate data-translation code. We knew programmers cross-developing
PC-based Windows applications to run on these workstations would often need to
deal with binary-file portability.
Our solution to the binary-data conundrum was to create an underlying
technology we call "DDR compiler." While DDR is short for "DOS data
representation," it could more precisely be referred to as the "Intel 16-bit
data representation."
Our scheme works best when programmers simply write the contents of a
structure to disk; see Example 1. Using DDR-based tools, you can automatically
generate file read/write macros that handle the differences in binary-data
formats between the 80x86 and a number of RISC workstation platforms (IBM
RS/6000, HP700, Sun, SGI, and the like). You can also extend the tool to
translate the data representation to or from practically any architecture.
This is because the tool is built from a simple script that uses a common UNIX
network utility found on most workstations, rpcgen.
The DDR compiler takes for its input the data typedefs and structures from
your application's header files. Its output is a new header file and a C
source file that implements custom read/write functions. This file can be
easily incorporated into the application, making binary data available across
all supported platforms. Because the new read/write macros are conditionally
defined, they're expanded back into their original forms when compiled on the
PC, allowing a single code base to be used for both the PC and RISC platforms.
This article details the basic DDR architecture. The complete source code for
the DDR compiler described here is available free of charge at
ftp.mainsoft.com. Although Mainsoft retains all rights to the source code, you
can download a copy, modify it, use it for your own purposes (commercial or
otherwise), and include the object code in your library. We do ask that you
don't upload the software to a BBS or pass it along to others.


Sharing Binary Data Between Systems Is Not New


Of course, movement of data between systems with different data
representations isn't new. It has been a perennial problem, and there are many
schemes for managing such operations. The one we employed is based on that
used by the Sun Remote Procedure Call (RPC) protocol.
The idea behind RPC is to allow programs to easily access compute resources
across a network. A program running in one machine makes a call to a procedure
in another machine. To make this work, a scheme had to be devised to
standardize the data representations of the procedures' calling arguments and
the results. To execute a procedure in another system, calling parameters are
converted to a binary neutral format understood by the other system. The
return values from the call are also provided in the binary neutral format.
Each system provides the logic to translate data to and from the common binary
format. This process, which is illustrated in Figure 1, is referred to as
"data marshaling." There are a number of formats of this kind, each
corresponding to a given protocol, including Courier data representation of
the Xerox Network Protocol, Sun's XDR, and the X.409 ISO standard.
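The core idea can be made concrete in a few lines. A minimal encode/decode pair for a 32-bit value in a big-endian, XDR-style neutral format (the function names here are invented for illustration, not part of any of the protocols above) might look like:

```c
#include <stdint.h>

/* Sketch of data marshaling: encode a 32-bit value into a
   byte-order-neutral (big-endian, XDR-style) buffer, and decode
   it back, independent of the host CPU's native byte order. */

static void encode_u32(uint32_t v, unsigned char buf[4]) {
    buf[0] = (unsigned char)(v >> 24);   /* most significant byte first */
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)(v);
}

static uint32_t decode_u32(const unsigned char buf[4]) {
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because both sides shift bytes explicitly rather than overlaying memory, the same source compiles to correct translation code on both little-endian and big-endian machines.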
To accomplish data translation to and from the standard neutral format, you
have to write the necessary translation routines. Inventors of these protocols
were quick to realize that this was a mechanical task that could easily be
automated. That's how they came up with structure compilers that would write
the translation procedures for them. The most widely available compiler is the
rpcgen tool, so we chose it to develop the DDR compiler. Rpcgen is the
structure compiler used for developing RPC procedures compliant to the Sun RPC
protocol, so it is potentially available on many platforms.
Rpcgen's basic functionality is exactly what's needed to translate the 80x86's
common data format into a form usable on RISC workstations. Rpcgen generates
the high-level translation routines for the complex structures. However,
nothing within rpcgen's domain allows for direct translation between
particular data representations. For that, we developed our own low-level
DOS/UNIX translation library routines. Figure 2 illustrates how we use rpcgen
for DDR functionality.


How DDR Works


The DDR tool consists of the ddrgen program and associated library and header
files. Ddrgen is simply a script that wraps around a call to rpcgen and
subsequently modifies the files generated by rpcgen so that calls are made to
the ddr library instead of the xdr library.
Ddrgen takes as input the structure definition of the binary data, defined
with approximately the same syntax as a C header file. I say "approximately"
because the ddrgen parser (that is, the rpcgen parser) does not allow complex
structure declarations.
You might need to simplify some of the data-structure definitions. For
instance, the rpcgen front end does not support the declaration of nested
structures. In such situations, the nested structure can be declared at the
top level and its name used inside other structures in order to achieve the
same results.
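For example (the type names here are hypothetical, not from the DDR sources), a structure that would naturally be written with a nested declaration can be fed to ddrgen by hoisting the inner structure to the top level and referring to it by name:

```c
/* Hypothetical example: the rpcgen front end will not accept a struct
   declared inside another struct, so the inner type is declared at the
   top level and then used by name inside the outer structure. */

struct POINT {            /* nested structure hoisted to the top level */
    int x;
    int y;
};

struct SHAPE {
    int kind;
    struct POINT origin;  /* inner structure referenced by name */
};
```

The memory layout is the same either way, so no change is needed in the application code that reads and writes the data.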
The structure definitions drive ddrgen. As a result, ddrgen outputs three
files that can be incorporated into your application source code:
filename_ddr.c, a C source file containing the read and write translation
routines constructed from the input-file data definitions.
filename_ddr.h, a header file for filename_ddr.c.
filename_rw.h, a header file that must be inserted into all application source
modules that read or write binary data using the DDR scheme.
The C source file, which acts as a data translator, is made up of a series of
simple function calls, one for each data element in the structure definition
that drove the process. For instance, your input structure might look like
Example 2(a), where struct1 is a typedef that has been defined before.
The DDR compiler outputs a C file with the ddr_MYSTRUCTURE function call,
which contains a series of function calls that encode or decode each data
element. Since the input structure contains three data elements, the resulting
function has three translation calls, arranged in the same order as the
structure definition. Thus, the output file would look like Example 2(b).
Ddrgen creates the translation calls by constructing the function-call names
from the data types called out in the input structure definition. The
low-level functions for integers, shorts, and the like are provided in the
translation library. To call the ddrgen-generated functions, you replace your
binary file fread and fwrite function calls with new DDR calls that know about
the translation code. (This assumes that the program is written in the way I
described earlier, simply reading and writing data structures directly to a
file. If the program does otherwise, some code would have to be rewritten to
employ this scheme.)
Four new macros replace the standard functions.
ddr_write replaces write.
ddr_fwrite replaces fwrite.
ddr_read replaces read.
ddr_fread replaces fread.
The process of replacing the read and write routines with their
DDR-equivalents is not much more than a simple search-and-replace process with
your editor. You also have to tell each DDR routine which data structure to
read or write. Example 3 shows read and write routines compared to their DDR
equivalents. You can see that the functions are similar, but there are two
additional parameters to pass, MYSTRUCTURE and nStatus. The DDR system block
diagram (Figure 3) shows how the various elements of the application are
linked to form an executable.
To see how all the elements of the DDR system work together, I'll examine how
a single data translation takes place, starting at a ddr_fread function call
and tracing the process all the way through. I'll assume that you ran the DDR
compiler and did the edits to your source files.
The ddr.h file, which is included by filename_ddr.c, contains many of the core
definitions of the DDR system, including the conditional statements (#ifdefs)
that select between the PC-compatible read/write routines and the new
translation routines. The code is set up so that if it is compiled in a PC
environment, the original fread and fwrite routines are used. If compiled for
a supported RISC machine, the DDR routines are used. In this example, assume
that I'm cross-developing to a 32-bit RISC machine.
In this case, the ddr_fread macro is expanded, as in Example 4(a). ddr_fread
creates a DDR handle (ddrs) using the ddrstdio_create function.
ddrstdio_create will initialize the ddr handle for reading from the file
stream and set up compatible data-translation function calls in the DDR
structure. In addition, it knows this will be a DDR_DECODE operation, and it
stores the DDR_DECODE operation code in the DDR handle to use later to select
a read routine.
The next line constructs a call to the user's structure-translation function.
In the example, the user's structure is called MYSTRUCTURE; this parameter is
substituted for ##Name to produce Example 4(b).
Recall that this function was created by ddrgen (in the filename_ddr.c). It
sequentially executes a translation of each element in the structure. For this
example, focus on just the first element since they all act approximately the
same; see Example 4(c).
The function ddr_int in Example 4(c) is a call to the DDR int translation
function. (The function ddr_struc1 is a call to a function that has also been
constructed by the ddr compiler when it has processed the previously defined
struc1. This is how we handle nested data structures.)

The translation functions are constructed using a few primitives keyed to the
80x86 architecture: Read routines read 80x86 data, and write routines write
80x86 data. These primitives include read/write routines for byte (8 bits),
short (16 bits), and long (32 bits).
ddr_int uses the DDR_DECODE opcode to select the DDR_GETSHORT macro. Example
5(a) shows the code that does it in ddr.c. ddr_int creates a short storage
space (sValue) to receive the int from the selected macro. Notice that if the
opcode was DDR_ENCODE, the DDR_PUTSHORT routine would have been selected.
DDR_GETSHORT is a macro that expands Example 5(b). The macro DDR_GETSHORT is
expanded to a call to the function stored in the getshort field of the
operations_vector of ddrs.
Recall that I chose ddr_fread as the example, and that the macro expanded
to open our DDR handle with the function call ddrstdio_create(). As a part of
its process, ddrstdio initialized ddrs->operations_vector to a whole list of
low-level function calls designed specifically for the fread translation. This
gives the flexibility to have other getshort operations read data from
different sources. For instance, creating the DDR handle with ddrmem_create
would allow translating data to or from a memory buffer instead of a file.
You can see in Listing One (page 88) that operations_vector->getshort is a
pointer to a function. It was initialized by the create function to call
ddrstdio_getshort. This routine fetches the next int from the input stream and
does the conversion, storing the result in the target structure. Example 6(a)
shows this step.
The routine first freads the next short into a little 2-byte buffer, sBuffer.
(Notice that this is the fread that you replaced with the ddr_fread macro. The
DDR compiler has built all this superstructure above it and a little code
below it also.)
Finally, MoveShort is called; see Listing Two (page 88). MoveShort has been
selected by #ifdef architecture to be the translation macro for the selected
target architecture. For this example, I'm assuming a Sparc or other
big-endian RISC machine. The actual byte-by-byte translation now takes place;
see Example 6(b). (This implementation happens to be a macro, but it could
also be implemented as a function.) This little gem takes bytes from pSrc
(source) and stores them in pDest (destination), swapping the two bytes, as
required by the RISC machine. pDest points to the short sValue allocated in
the ddr_int routine. ddr_int will eventually get control back and put the
short translated value (sValue) at the location pointed by ip (an integer
pointer--4 bytes) in the application data structure. This implements the 2- to
4-byte extension required for integers.
To recapitulate, I substituted ddr_fread for fread. I then went from ddr_fread
to ddrstdio_create and ddr_MYSTRUCTURE. I then chose to follow the first ddr
translation, an int, which took us to ddr_int, and then to DDR_GETSHORT.
DDR_GETSHORT expanded to ddrs->operations_vector->getshort in the DDR
structure, which was initialized earlier (by ddrstdio_create) to
ddrstdio_getshort. It read a 2-byte element from the input stream, translated
it with a version of MoveShort (selected by an #ifdef architecture), and
returned.
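The net effect of that chain can be condensed into a few lines. The sketch below is a standalone illustration, not code from the DDR library: it takes a 2-byte little-endian 80x86 int as it sits in the file, reassembles it in host order, and widens it to the RISC side's 4-byte int, just as ddr_int's final *ip = sValue does.

```c
#include <stdint.h>

/* Standalone sketch of the ddr_int decode path: the file holds a
   2-byte little-endian (80x86) int; the translation reassembles it
   independent of host byte order and widens it, with sign extension,
   to the 4-byte int used on the RISC side. */

static int decode_dos_int(const unsigned char file_bytes[2]) {
    /* rebuild the 16-bit value: low byte first, as on the 80x86 */
    int16_t s = (int16_t)(file_bytes[0] | (file_bytes[1] << 8));
    return (int)s;   /* the 2- to 4-byte extension */
}
```

Working byte-by-byte means the same function is correct whether the host is big- or little-endian, which is exactly why the MoveShort/MoveLong primitives are written that way.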


Conclusion


For all its internal complexity, the DDR compiler turns out to be simple to
use. It is a very useful tool for creating platform-independent read/write
routines for binary data. The overall scheme is presented here; if you want to
investigate it further, you can download the complete DDR compiler source code
via ftp at ftp.mainsoft.com.

Example 1: Writing contents of a structure to disk.
struct MYSTRUCTURE {
 int mydata1;
 struc1 mydata2;
 long mydata3;
} mydata;

nStatus = fwrite (&mydata, sizeof(MYSTRUCTURE), 1, stream);

 Figure 1: The data-marshaling process.
 Figure 2: The use of rpcgen simplifies the translation process.

Example 2: (a) A typical input structure; (b) C code generated by the DDR
compiler.
(a) struct MYSTRUCTURE { int mydata1; struc1 mydata2; long mydata3;};
(b) bool_t ddr_MYSTRUCTURE (ddrs, objp)
    DDR *ddrs;
    MYSTRUCTURE *objp;
{
    if (!ddr_int(ddrs, &objp->mydata1)) return (FALSE);
    if (!ddr_struc1(ddrs, &objp->mydata2)) return (FALSE);
    if (!ddr_long(ddrs, &objp->mydata3)) return (FALSE);
    return (TRUE);
}


Example 3: Standard read and write routines compared to DDR equivalents.
nStatus = write (fd, &mydata, sizeof(mydata));
ddr_write (fd, &mydata, sizeof(mydata), MYSTRUCTURE, &nStatus);
nStatus = fwrite (&mydata, sizeof(mydata), 1, stream);
ddr_fwrite (&mydata, sizeof(mydata), 1, stream, MYSTRUCTURE, &nStatus);
nStatus = read (fd, &mydata, sizeof(mydata));
ddr_read (fd, &mydata, sizeof(mydata), MYSTRUCTURE, &nStatus);
nStatus = fread (&mydata, sizeof(mydata), 1, stream);
ddr_fread (&mydata, sizeof(mydata), 1, stream, MYSTRUCTURE, &nStatus);

 Figure 3: DDR system block diagram.

Example 4: (a) The expanded ddr_fread macro; (b) substituting MYSTRUCTURE for
##Name; (c) the ddr_MYSTRUCTURE routine sequentially translates each structure
element.
(a) ddr_fread (pData, nSize, nNumber, stream, Name, pnStatus)
{
    DDR ddrs;
    ddrstdio_create (&ddrs, stream, DDR_DECODE);
    if (ddr_##Name(&ddrs, (void *)(pData)))
        *(pnStatus) = ddrs.nCount;
    else
        *pnStatus = -1;
    ddrstdio_destroy (&ddrs);   /* kill the handle when done */
}
(b) if (ddr_MYSTRUCTURE(&ddrs, (void *)(pData))) ...
(c) bool_t ddr_MYSTRUCTURE (ddrs, objp)
    DDR *ddrs;
    MYSTRUCTURE *objp;
{
    if (!ddr_int(ddrs, &objp->mydata1)) return (FALSE);
    if (!ddr_struc1(ddrs, &objp->mydata2)) return (FALSE);
    /* etc... */
    return (TRUE);
}


Example 5: (a) The ddr_int routine uses the DDR_DECODE opcode to select the
DDR_GETSHORT macro; (b) the DDR_GETSHORT macro.

(a) bool_t ddr_int(DDR *ddrs, int *ip)
{
    short sValue;
    switch (ddrs->operation) {
    case DDR_ENCODE:
        return (DDR_PUTSHORT(ddrs, ((short *) ip)));
    case DDR_DECODE:
        *ip = 0;   /* clear whole int because the reading will
                      only affect the lower part */
        if (!DDR_GETSHORT(ddrs, &sValue)) {
            return (FALSE);
        }
        *ip = sValue;   /* this is how the translated int is passed back up */
        return (TRUE);
    }
    return (FALSE);
}
(b) #define DDR_GETSHORT(ddrs, shortp) \
    (*(ddrs)->operations_vector->getshort)(ddrs, shortp)


Example 6: (a) fetching the next int from the input stream and converting it,
storing the result in the target structure; (b) the MoveShort
macro.
(a) bool_t ddrstdio_getshort(DDR *ddrs, short *sp)
{
    short sBuffer;
    if (fread((caddr_t) &sBuffer, sizeof(short), 1,
              (FILE *) ddrs->ddr_private) != 1)
        return (FALSE);   /* if we can't get the int, return a fail code */
    MoveShort(&sBuffer, sp);
    ddrs->nCount += sizeof(short);
    return (TRUE);
}
(b) #define MoveShort(pSrc,pDest) (*(char *)(pDest) = *((char *)(pSrc)+1), \
        *((char *)(pDest)+1) = *(char *)(pSrc))


[LISTING ONE] (Text begins on page 18.)

typedef struct {
 enum ddr_op operation;
 struct ddr_ops *operations_vector;
 void * ddr_public; /* for application usage */
 char * ddr_private; /* for internal use 1 */
 char * ddr_base; /* for internal use 2 */
 int nCount; /* for internal use 3 */
} DDR;

struct ddr_ops {
 BOOL (*getlong)();
 BOOL (*putlong)();
 BOOL (*getshort)();
 BOOL (*putshort)();
 BOOL (*getbytes)();
 BOOL (*putbytes)();
 void (*destroy)();
} ;

enum ddr_op {
 DDR_ENCODE = 0,
 DDR_DECODE = 1
} ;

[LISTING TWO]

#ifdef unix /* used in Ddr.h (via makeheaders) */

#if defined(sparc) || defined(rs6000) || defined(hp700) || defined(m88k)
#define BIG_ENDIAN
#define ALIGNED
#elif defined(i86) || defined(vax)
#define LITTLE_ENDIAN
#elif defined(mips)
#define LITTLE_ENDIAN
#define ALIGNED
#endif

/* Move<type>(from,to) */

/*
 * Byte moving, with byte-swapping.
 * For use on 68000, sparc and most of the riscs
 */

#if defined(BIG_ENDIAN)
#define MoveByte(pSrc,pDest) (*(char*)(pDest) = *((char*)(pSrc)),1)
#define MoveShort(pSrc,pDest) (*(char*)(pDest) = *((char*)(pSrc)+1),\
 *((char*)(pDest)+1) = *(char*)(pSrc),2)
#define MoveLong(pSrc,pDest) (*(char*)(pDest) = *((char*)(pSrc)+3),\
 *((char*)(pDest)+1) = *((char*)(pSrc)+2),\
 *((char*)(pDest)+2) = *((char*)(pSrc)+1),\
 *((char*)(pDest)+3) = *(char*)(pSrc),4)

#elif defined(LITTLE_ENDIAN)
#ifndef ALIGNED

 /*
  * Low-level byte moving, without byte-swapping. Use these definitions
  * for the 386 and other little-endian CISCs that don't have alignment
  * constraints (vax).
  */

#define MoveByte(pSrc,pDest) (*(char *)(pDest) = *(char*)(pSrc),1)
#define MoveShort(pSrc,pDest) (*(short *)(pDest) = *(short*)(pSrc),2)
#define MoveLong(pSrc,pDest) (*(long *)(pDest) = *(long*)(pSrc),4)
#else

 /* use this for mips (Ultrix) and maybe other RISCs (alpha) */

#define MoveByte(pSrc,pDest) (*(char *)(pDest) = *(char *)(pSrc),1)
#define MoveShort(pSrc,pDest) (*(char*)(pDest) = *((char*)(pSrc)),\
 *((char*)(pDest)+1) = *((char*)(pSrc)+1),2)
#define MoveLong(pSrc,pDest) (*(char*)(pDest) = *((char*)(pSrc)),\
 *((char*)(pDest)+1) = *((char*)(pSrc)+1),\
 *((char*)(pDest)+2) = *((char*)(pSrc)+2),\
 *((char*)(pDest)+3) = *((char*)(pSrc)+3), 4)
#endif
#else
#error "unknown machine architecture"
#endif
End Listings









































March, 1994
The WRAPI Toolkit


A multilanguage toolkit for C libraries




Gregory C. Sarafin


Greg is a consultant and multilanguage developer. He is also the author of the
In-Press tool marketed by Dabiwa. He can be contacted on CompuServe at
73747,3112.


Like many software developers, I program in multiple languages. In addition to
C/C++, I regularly use Visual Basic, Clipper, FoxPro, and Paradox. After many
years of development, my office shelves began to sag under the accumulated
weight of third-party support libraries. At last count, I had four
communications libraries (one each for Clipper, FoxPro, C, and Visual Basic),
multiple printing tools, and a variety of other libraries. After reflecting on
the time it takes to learn and remember four or five ways of accomplishing the
same thing, I decided to create a language-independent API that generates
language-specific libraries. I call it WRAPI, short for "wrapped API."
By inserting a translation layer between the host-language API and the
internal functions of a library, WRAPI creates a higher-level access to what
has typically been the domain of C and ASM programmers. A C library written
with the WRAPI toolkit will find a much wider audience than a C-only library.
The number of non-C developers, particularly in the high-level languages, is
growing dramatically, and WRAPI provides a way to leverage costly development
efforts into this large market.
The host languages WRAPI supports include C, Clipper, FoxPro (DOS and
Windows), Visual Basic (DOS and Windows), Pascal, Clarion, Fortran, and many
Windows tools via a generic DLL. Certain types of libraries lend themselves to
WRAPI. Anything that is currently called an API (such as Microsoft's Mail API
or the ODBC API) is a candidate for WRAPI. WRAPI can also be used with
universal functions such as mathematics, communications, printing, and
multimedia.
WRAPI is proven technology. It's currently the core technology in Dabiwa's
In-Press tool, which provides a set of functions for developers who need to
produce published-quality output from within host applications. Because
In-Press is built from WRAPI, it works with all of the major development
languages on the PC platform. With this article, I'm putting WRAPI technology
into the public domain, making the source code available electronically
through DDJ; see "Availability," page 3.


WRAPI Defined


You usually don't talk in terms of an API when writing a C function library.
Such functions are compiled into a LIB and subsequently integrated into an
executable whole by a linker. As long as symbol-naming and parameter-passing
conventions are uniform among the various components, the executable will
indeed execute.
Instead, the term API is generally associated with the high-level languages.
FoxPro provides API functions for everything from memory allocation to
low-level I/O. Clipper provides API functions for parameter passing. Visual
Basic for DOS requires that strings be dereferenced with StringAddress(),
arguably an API function.
The term API in the context of WRAPI stands for the interface layer of the
host language. In C, that layer is simply the call stack and the C standard
library. In reality, there is no API. For the purposes of WRAPI, the term
"API" is a fiction of convenience. Throughout the rest of this article, I'll
refer to the host API when I wish to reference the interface layer of the
target development language.
I use the term "wrap" because WRAPI resolves differences by "wrapping" them in
a macro or a data structure. WRAPI provides a mechanism for hiding the various
methods of parameter passing that exist among the supported host APIs. It also
provides uniform mechanisms for error handling, string manipulation, memory
allocation, low-level I/O, and printing. It accomplishes this through a
combination of preprocessing tricks and a set of support functions which I'll
call the "WRAPI library."
Incidentally, the first wrapping functions I wrote were patterned after the
original work of Dabiwa's David Karasek, who used compiler directives (#IFDEF,
#ELIF, and so on) and boilerplate code to write functions that conditionally
compiled to C, Clipper, FoxPro, and Visual Basic. Dave's work provided a good
starting place to further the concepts of multilanguage support. In working
with his functions, I realized that most of the compiler directives were
required to resolve differences in parameter passing, so that is where I began
my attack.


Wrapping Functions


The first task in developing WRAPI was to create the specification for
"wrapping" functions--a translation layer that sits between the host API and
the underlying "core" function; see Figure 1.
A wrapping function grabs parameters from the host API, type-checks them (for
weakly typed languages), calls the core function, traps any error condition,
and returns a value to the host API. All serious work is handled by the core
function, including the task of domain checking parameters. In the initial
iteration of WRAPI, I made the mistake of putting all parameter checking in
the wrapping function. It seemed logical at the time, but I subsequently
learned that wrapper functions work best when they're free of semantic
content.
Traditionally, parameters and return values are placed on the call stack.
FoxPro and Clipper don't pass parameters in this manner. Xbase is a weakly
typed language, and the mechanisms for passing parameters are structured
accordingly. Clipper provides a set of API functions to grab parameters off of
the Clipper "Eval" Stack (an internally maintained Clipper stack). FoxPro
passes a pointer to a structure which itself points to an array of structures,
each of which contains a parameter reference. Both Clipper and FoxPro include
a set of functions for returning values through the host API. Listing One
(page 89) illustrates the differences among C, Clipper, and FoxPro parameter
passing.
Encapsulating the Xbase parameter mechanisms was straightforward. The WRAPI
library includes a set of functions for grabbing and type checking Clipper and
FoxPro parameters. The trick was reconciling the call-stack method of
parameter passing with the explicit parameter grabbing of the Xbase languages.
After some fiddling, I came up with the function template in Example 1, which
works uniformly well for both.
All uppercase keywords are WRAPI macro definitions. Many, such as TYPE_?, are
keyed to a basic data type. WRAPI supports six such data types: C, char; S,
string; B, Boolean; I, short int; L, long int; and D, double float.
Thus the macro TYPE_I indicates that the wrapping function returns a short
integer, while the macro GRAB_S() grabs a string-type parameter from the API.
Listing Two (page 89) is a simple wrapping function written with WRAPI macros.
The function has been stripped of the error-handling mechanisms so that you
can better understand the basics of parameter wrapping. The corresponding
translations to the C, Clipper, and FoxPro APIs are illustrated in Listing
Three (page 89).
The form of the wrapping function may seem a little bizarre at first, but it
is a bit of preprocessing trickery that greatly increases code readability.
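To make the template concrete, here is one hypothetical way the return-type macros might expand for two hosts. These definitions are guesses patterned on the translations in Listing Three, not the shipping WRAPI macros; _retni() is the real Clipper API call.

```c
/* Hypothetical expansions of the WRAPI return macros for a short-int
   wrapper. Only the default (plain C) branch is exercised here. */
#ifdef _API_CLIP
  #define TYPE_I      void             /* Clipper wrappers return nothing... */
  #define DECL_I(v)   short v = 0
  #define SEND_I(v)   do { _retni(v); return; } while (0)  /* ...values go back via the API */
#else /* plain C: ordinary call-stack semantics */
  #define TYPE_I      short            /* the wrapper returns the value itself */
  #define DECL_I(v)   short v = 0
  #define SEND_I(v)   return (v)
#endif

/* a trivial wrapper-shaped function using the macros */
TYPE_I TwoX ( short i ) {
  DECL_I ( iReturn ) ;
  iReturn = (short)( i * 2 ) ;
  SEND_I ( iReturn ) ;
}
```

Under Clipper the same source would compile to a void function that returns its result through _retni(); under plain C it compiles to an ordinary value-returning function.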


String Handling


After reconciling the different parameter-passing conventions, the next step
was to deal with strings. In C, we generally work with zstrings (short for
"zero-terminated strings"). Many of the host languages, including Visual Basic
for DOS, FoxPro, Clarion, and Pascal, pass buffered strings of one type or
another. My first inclination was to convert all strings to zstrings at the
wrapping-function level.
This approach would have allowed core functions to work exclusively with
zstrings, the advantages of which are obvious. The strategy was simple enough:
Allocate a chunk of memory one byte longer than the buffered string, copy the
buffered string, and zero terminate it. But some APIs provide precious little
available heap, so a function that receives a 48K string would have to burn up
another 48K of heap space simply to add a zero terminator.
The alternative was a data structure that could handle the various string
types with equal aplomb. This structure, the WSTR, is shown in Figure 2. The
PREP_S() and GRAB_S() macros automatically create and load a WSTR structure
with the necessary components of the API string. All API strings are resolved
to an address stored in the cp component and a string length stored in the
uiLen component. Some APIs pass strings via a handle. In such cases, the
handle is stored in the ulHnd component.
The WSTR approach adds a level of complexity to writing core functions. Most
would agree that it is easier to work with zstrings than a buffered hybrid
such as WSTR. Unfortunately, the wrapping macros worked much more efficiently
using the WSTR, and that tipped the balance. Listing Four (page 89) shows a
simple string-manipulation function. The associated C, Clipper, and FoxPro
translations are shown in Listing Five (page 89). These translations are quite
involved and are beyond the scope of this article.
Once I decided to wrap strings in a WSTR structure, it became necessary to
write a set of manipulation functions. The WRAPI library provides a basic set
of WSTR functions; see Table 1.
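As a sketch of what such a function involves, here is a possible szwcmp(). The real WRAPI version ships with the library source and may differ; the point is that a WSTR carries an explicit length, so the comparison cannot rely on a zero terminator.

```c
#include <string.h>

typedef struct {          /* the WSTR structure from Figure 2 */
  char *cp ;              /* actual pointer to the string */
  unsigned short uiLen ;  /* length of string */
  unsigned long ulHnd ;   /* host API handle if applicable */
} WSTR ;

/* compare a zstring to a WSTR, returning -1, 0, or 1 */
short szwcmp ( char *sz , WSTR *wsp ) {
  unsigned short uiLen = (unsigned short) strlen ( sz ) ;
  unsigned short uiMin = uiLen < wsp->uiLen ? uiLen : wsp->uiLen ;
  int iCmp = memcmp ( sz , wsp->cp , uiMin ) ;
  if ( iCmp )
    return (short) ( iCmp < 0 ? -1 : 1 ) ;
  if ( uiLen == wsp->uiLen )      /* same prefix, same length: equal */
    return 0 ;
  return (short) ( uiLen < wsp->uiLen ? -1 : 1 ) ;  /* shorter sorts first */
}
```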


The C Standard Library


String manipulation highlights one of the bigger drawbacks of WRAPI
development. There are effectively no C standard-library functions. Staples of
the C programmer such as strcpy(), sprintf(), and strtok() simply don't exist
in Clipper, FoxPro, or Visual Basic. However, many APIs provide alternate
functions. Consider memcpy(): The FoxPro API includes a function _MemCpy(),
and the Clipper standard library includes an undocumented internal _bcopy().
To make WRAPI more C-friendly, I developed a strategy of wrapping C standard
functions with a macro of the same name in upper case. Thus, memcpy() became
MEMCPY(). The macro translates to the C standard function or to the
corresponding API function. Where an API didn't provide the equivalent C
standard function, I wrote one. Table 2 lists the C standard functions that I
include with WRAPI. You may wish to add to the list. My goal was to minimize
dependence on the C standard library, so the list remains quite short.
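A minimal sketch of the strategy follows. The _API_FOX and _API_CLIP symbols are taken from the listings; _MemCpy() and _bcopy() are the API functions named above, and the exact macro bodies are assumptions.

```c
#include <string.h>

/* one name for every host: wrapper code calls MEMCPY() and the
   preprocessor selects the implementation at compile time */
#if defined(_API_FOX)
  #define MEMCPY(d,s,n)   _MemCpy ( (d) , (s) , (n) )
#elif defined(_API_CLIP)
  #define MEMCPY(d,s,n)   _bcopy ( (d) , (s) , (n) )
#else
  #define MEMCPY(d,s,n)   memcpy ( (d) , (s) , (n) )
#endif
```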

You may notice the low-level I/O functions deviate slightly from the strategy
of wrapping standard functions in uppercase macros. The open() function became
F_OPEN_RW(), F_OPEN_RO(), and F_OPEN_WO(). Likewise, lseek() became F_GOBOF(),
F_GOEOF(), F_GOTO(), and so on. Low-level I/O flags differed among the
supported APIs, and the easiest solution was to eliminate them entirely by
creating multiple macros.


Memory Allocation


The next order of business was to provide a uniform method of memory
allocation. The malloc() function cannot be used in all APIs. Again, the main
culprits are the Xbase dialects. Both allocate memory using an API function
that returns a handle. The handle is easily decoded to an address. When it is
time to free the memory, it is the handle that must be supplied, not the
decoded address. Thus, the handle must be retained somehow.
The WRAPI library includes ALLOC() and FREE() macros. ALLOC() increases the
size of each memory allocation by six bytes and returns an address six bytes
in from the actual start of allocated memory. The first six bytes are used to
store a 4-byte handle (if needed) and a 2-byte length. This allows FREE() to
function transparently.
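The header trick can be sketched as follows, with malloc() and free() standing in for the host allocator; the real ALLOC() and FREE() macros resolve to the appropriate API calls instead.

```c
#include <stdlib.h>
#include <string.h>

/* allocate six extra bytes and hide them in front of the block:
   bytes 0-3 hold the 4-byte API handle (unused with malloc),
   bytes 4-5 hold the 2-byte length */
void *wrapi_alloc ( unsigned short uiLen ) {
  unsigned char *p = (unsigned char *) malloc ( (size_t) uiLen + 6 ) ;
  if ( p == NULL )
    return NULL ;
  memset ( p , 0 , 4 ) ;                /* no handle needed for malloc */
  memcpy ( p + 4 , &uiLen , 2 ) ;       /* remember the requested length */
  return p + 6 ;                        /* caller never sees the header */
}

void wrapi_free ( void *vp ) {
  if ( vp )                             /* step back to the true start */
    free ( (unsigned char *) vp - 6 ) ;
}
```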


Printing


Under DOS, printing is handled through low-level I/O functions. This allows
printing to be directed to either a device or a file. Under Windows, printing
to a file is also accomplished through low-level I/O, but printing to a print
device requires calls to GDI functions.
In designing WRAPI, I adhere to the DOS printing model. Four constants are
used to specify the print devices PRN and LPT1 through LPT3 (no support for
COM devices other than a redirected PRN). These constants all have negative
values. This allows a single function to be used when selecting a print
destination. Either a valid device constant or a file handle (a positive
number) can be passed. WRAPI then decodes this information into the
appropriate print destination.
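The decoding step might look like this. The article states only that the device constants are negative, so the specific values below are assumptions.

```c
#include <stddef.h>

#define WPRN_PRN   (-1)   /* hypothetical values: the article says only */
#define WPRN_LPT1  (-2)   /* that the device constants are negative     */
#define WPRN_LPT2  (-3)
#define WPRN_LPT3  (-4)

/* negative means standard print device, positive means DOS file handle */
int is_std_device ( short iDest ) {
  return iDest < 0 ;
}

const char *device_name ( short iDest ) {
  switch ( iDest ) {
    case WPRN_PRN  : return "PRN"  ;
    case WPRN_LPT1 : return "LPT1" ;
    case WPRN_LPT2 : return "LPT2" ;
    case WPRN_LPT3 : return "LPT3" ;
    default        : return NULL ;  /* a file handle, not a device */
  }
}
```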
Like many things in WRAPI, the print destination requires a wrapping
structure. Figure 3 shows the WPRN structure. Under DOS it is sufficient to
store the file handle and a Boolean flag indicating that output is directed to
a standard print device. Under Windows it is also necessary to store the HDC
(handle to a device context) and the state of the spool flag in WIN.INI.
(WRAPI automatically turns the spooler off before printing and restores it to
its previous state when finished.)
The WRAPI print functions are listed in Table 3. After receiving the desired
print destination (in the form of a signed integer) from the host application,
iWPrnStart() must be called to decode the destination into a WPRN structure.
If the destination is a standard device, iWPrnStart() opens the device under
DOS, or creates an HDC under Windows. When printing is completed, iWPrnEnd()
must be called to clean up. Failure to call iWPrnEnd() under Windows will
result in an orphaned HDC--a very bad idea.
The remaining print functions are used to send data directly to the print
device. No effort is made to format output under GDI. All output is sent via
the GDI Escape() function as DEVICEDATA. This effectively turns Windows
printing into DOS printing. Incidentally, this is also the reason the print
spooler must be turned off.


Error Handling


The error handling in WRAPI reflects my belief that the return value of a
function should not be the source of error information. WRAPI provides an
error-trapping mechanism that delivers the error to the host API in the most
appropriate manner. Table 4 lists the different error mechanisms employed by
WRAPI.
The biggest compromise in error handling is caused by host APIs that don't
have a standard error-trapping mechanism. In such cases, it is necessary for
the host application to pass a function reference to the core library. The
core library must then make that function reference available to the WRAPI
error-trapping mechanism. The error trap simply calls the referenced function.
Unfortunately, this destroys an otherwise clean separation between the host,
the wrapping, and the core.
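The mechanism can be sketched as a registered callback. All of the names here are hypothetical stand-ins; they only illustrate the shape of the function-reference bridge, not the In-Press internals.

```c
/* sketch of the callback-style error bridge for hosts that lack a
   native error-trapping mechanism */
typedef void (*ERRFN)(int iCode, const char *szWhere);

static ERRFN s_pfnErr = 0;            /* the function reference from the host */
static int   s_iLast  = 0;            /* last code seen, for the demo handler */

void RegErrSketch ( ERRFN pfn ) {     /* hypothetical registration call */
  s_pfnErr = pfn ;
}

void TrapErrSketch ( int iCode , const char *szWhere ) {
  if ( iCode != 0 && s_pfnErr )       /* without a registered handler the */
    s_pfnErr ( iCode , szWhere ) ;    /* application is blind to the error */
}

/* a handler the host application might register */
void DemoHandler ( int iCode , const char *szWhere ) {
  (void) szWhere ;
  s_iLast = iCode ;
}
```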
This shortcoming of the error system is best illustrated with an example.
In-Press provides two functions: IpErrFName(), which takes the name of a
FoxPro function designated as the error handler; and IpErrFPtr(), which takes
the address of a C function (cast as a long) designated as the error handler.
Failure to register an error function with IpErrFName() under FoxPro or
IpErrFPtr() under C effectively defeats In-Press error handling. In effect,
the application is blind to In-Press errors.
Error handling blurs the line between the core and the wrapping. All core
libraries must follow certain error conventions and provide a standard set of
error macros to the wrapping functions. These macros, listed in Table 5, act
as a call back into the core library. Listing Six (page 90) is an In-Press
wrapping function and its corresponding C translation. Note the references to
the In-Press error-handling internals.


GET/SET Functions and Nil Values


A WRAPI library design is constrained by several factors. For one thing, the
names of wrapping functions are limited to ten characters (as are the names of
any support constants). This limitation is imposed by FoxPro, and there is no
way around it. More importantly, the data types are limited to the six basic
ones listed earlier: character, string, Boolean, short integer, long integer,
and double float.
In C, there is a tendency to keep related data in a structure. This works
nicely because the structure can be passed by reference. WRAPI precludes that
type of interaction between a core library and the host API. How then do you
structure a useful library if data moves in and out as simple data types?
My solution was to store the data structure internally and provide discrete
access to each component with a GET/SET function. For those unfamiliar with
the term GET/SET, it is a function that can simultaneously change the state of
an internal value while returning its previous state. A GET/SET function can
also query the state of an internal value without actually changing it by
passing a NIL value.
Although you don't need to use GET/SET functions in the design of a WRAPI
library, they are quite useful. WRAPI includes logic for NIL values and a
sample GET/SET engine.
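A minimal GET/SET function for one short-integer component might look like this. The NIL sentinel value is an assumption for illustration; WRAPI's actual NIL logic ships with the source.

```c
#define NIL_I  (-32768)      /* hypothetical sentinel meaning "query only" */

static short s_iStyle = 0 ;  /* the internally stored component */

/* set a new value (unless NIL) and always return the previous state */
short iGetSetStyle ( short iNew ) {
  short iOld = s_iStyle ;
  if ( iNew != NIL_I )
    s_iStyle = iNew ;
  return iOld ;
}
```

A caller can therefore swap in a new style and read back the old one in a single call, or pass the NIL value to query without changing anything.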


Necessary Tools for WRAPI Development


Table 6 lists all of the tools needed to develop and test a WRAPI library. It
is no small undertaking. In addition to purchasing and installing the listed
products, you will want a mechanism for automating the build process. A set of
batch files and a sample makefile are included with the source diskette to
help you get started.


Writing WRAPI Functions


Any library requires a bit of planning. WRAPI simply adds to the process. The
general scheme for developing a WRAPI library is:
1. Map out the function list.
2. Use GET/SET functions where possible. Functions that need to return
composite values (such as arrays or structures) can either return them as
strings (using the WRAPI stringify() function) or divide the return values
among several GET-only functions.
3. Create function prototypes and constant definitions for the various
languages.
4. Map out which functions will be in which source files.
5. Write the core error-handling functions first. (They should be patterned
after the error-handling functions in the sample library.)
6. Write the error-wrapping functions and test them in a couple of languages.
7. If you'll be using GET/SET functions, set up the GET/SET engine.
8. Write any other core functions that are required for underpinning the
library.
9. Write wrapping functions in related groups. Debug under DOS first--C is
easiest. Next debug under Windows--Visual Basic works well. Always test in
FoxPro.
10. When the library is ready to ship, compile using optimization.



Testing and Debugging WRAPI Functions


It can take time to test a WRAPI library. Each change to a library must be
tested under numerous languages. It is not uncommon to spend an entire
afternoon checking the results of a function that took only one hour to write.
WRAPI certainly doesn't lend itself to an ad hoc development cycle. After you
gain familiarity with WRAPI development you learn where to take some
shortcuts. (If it works under Clipper and FoxPro for Windows, for example, it
probably works under everything else.)
WRAPI development seems to yield robust libraries. The various host APIs
expose different types of problems. I found that general debugging was easiest
in the DOS environment using C or Clipper as the host language. Once all of
the obvious errors were exorcised, it was on to the protected-mode languages,
either FoxPro for Windows or Visual Basic for Windows, to find the memory
violations. I was able to use the CodeView debuggers quite successfully.


Conclusion


WRAPI is a work in progress and there are shortcomings to the technology that
you may want to improve. For instance, parameters must be passed by value,
although it should be possible to wrap parameters passed by reference. WRAPI
has no mechanism for wrapping interrupts. The general lack of C standard
library functions will certainly impede conversion of existing libraries to
WRAPI. Also, WRAPI has not been extended to OS/2, NT, or UNIX.
There are several inherent limitations to WRAPI. The ten-character limit on
symbol names can make function names and support constants a bit terse.
The technology is not quite a "black box." Thus, the WRAPI developer is
required to understand the APIs of the supported languages. And WRAPI
functions are by their nature limited to "driver" type applications.
Proprietary hooks into a particular O/S or a development environment defeat
the purpose of WRAPI.
Even with these limitations, WRAPI is a good starting point if you need to
support multiple languages. The source files include everything you need. I
will maintain and update WRAPI on the CompuServe DDJ Forum. I hope you will
join me there as I answer questions and attempt to improve the technology
based on the suggestions of other WRAPI developers.
 Figure 1: WRAPI sits between the host API and the underlying core function.
Figure 2: The WSTR data structure handles all variations of strings.
/* WSTR typedef */
typedef struct { // WRAPPED-API STRING STRUCTURE
 char *cp ; // Actual pointer to the string
 ushort uiLen ; // Length of string
 ulong ulHnd ; // Host API handle if applicable
} WSTR ;


Example 1: The "wrapping" function template.
TYPE_? <Function Name> ( PARMLIST ) {
 DECL_? ( <Return Value> ) ;
 PREP_? ( <Parameter #> , <Parameter Name> ) ; ...
 SET_ERR_NONE ( ) ;
 GRAB_? ( <Parameter #> , <Parameter Name> ) ; ...
 if ( IS_ERR_NONE ( ) )
 <Return Value> = <Core Function> (
<Parameters> ) ;
 SET_ERR_LOC ( "<Function Name>" ) ;
 TRAP_ERR ( ) ;
 SEND_? ( <Return Value> ) ;
 }


Table 1: WSTR functions.
 Function Description

 short szwcmp ( char* , WSTR* ) compares zstring to WSTR
 short swzcmp ( WSTR* , char* ) compares WSTR to zstring
 short swwcmp ( WSTR* , WSTR* ) compares WSTR to WSTR
 void szwcpy ( char* , WSTR* , ushort ) copies WSTR to zstring
 void swzcpy ( WSTR* , char* ) copies zstring to WSTR
 void swwcpy ( WSTR* , WSTR* ) copies WSTR to WSTR


Figure 3: The WPRN structure.
/* WPRN typedef */
typedef struct { // WRAPPED PRINT DEVICE STRUCTURE
 bool bStd ; // a STD print device?
 fhandle fh ; // handle to a DOS file/device
 #ifdef _FAMILY_DLL
 HDC hdc ; // handle to a device context
 bool bSpool ; // indicates that spooler was hooked up
 #endif
} WPRN ;


Table 2: C standard functions included with WRAPI.
 WRAPI function C standard Meaning

 STRCPY strcpy Copy zstring
 STRCMP strcmp Compare zstrings
 STRLEN strlen Length of zstring
 MEMCPY memcpy Copy memory block
 MEMCMP memcmp Compare memory blocks
 MEMSET memset Fill memory block with char
 ATOI atoi zstring to short
 ATOL atol zstring to long
 ATOF atof zstring to double
 ITOA itoa Short to zstring
 LTOA ltoa Long to zstring
 UTOA utoa Unsigned short to zstring
 ULTOA ultoa Unsigned long to zstring
 FTOA Double to zstring
 ROUND Round double to decimal place
 F_OPEN_RW open Open device for read/write
 F_OPEN_RO open Open device for read only
 F_OPEN_WO open Open device for write only
 F_CREATE creat Create file
 F_READ read Read from device
 F_WRITE write Write to device
 F_GOBOF lseek Go to beginning of file
 F_GOEOF lseek Go to end of file
 F_GOTO lseek Go to offset in file
 F_GOBACK lseek Go to negative offset from EOF
 F_POS lseek Position pointer in file
 F_MOVE lseek Move pointer through file
 F_CLOSE close Close device


Table 3: WRAPI print functions.
 Function Description

 short iWPrnStart ( WPRN* , short ) initialize print context
 short iWPrnEnd ( WPRN* ) end print context
 bool bWPrnC ( WPRN* , char ) print a char
 bool bWPrnSZ ( WPRN* , char* ) print a zstring
 bool bWPrnSB ( WPRN* , char* , ushort ) print a buffer
 bool bWPrnWSP ( WPRN* , WSTR* ) print a WSTR


Table 4: WRAPI error mechanisms.
 Host API Mechanism

 C (DOS) Call to a function pointer
 Clarion Error posted in queue
 Clipper S'87 Call to MISC_ERROR()
 Clipper 5.x Error event
 Generic DLL Error message
 FoxPro Call to a named function
 Visual Basic ON ERROR event



Table 5: WRAPI error macros.
 Error Macro Description

 SYSTEM_NAME() Returns name of WRAPI library (for example, "In-Press")
 IS_ERR_NONE() Returns Boolean True if no error condition
 IS_ERR_ERR() Returns Boolean True if error in error handler
 GET_ERR_CODE() Returns the current error code
 GET_ERR_FPTR() Returns pointer to a C function designated as the error
handler
 GET_ERR_FNAME() Returns name of FoxPro function designated as error handler
 GET_ERR_HOST() Returns host API error designator (used for Visual Basic)
 GET_ERR_LOC() Returns current error location (name of wrapping function)
 GET_ERR_TEXT(i) Returns the text associated with error code i
 SET_ERR_NONE() Sets error code for: no error condition
 SET_ERR_TYPE() Sets error code for: invalid data type
 SET_ERR_COUNT() Sets error code for: invalid parameter count
 SET_ERR_ALLOC() Sets error code for: unable to allocate memory
 SET_ERR_OUT() Sets error code for: problem writing to output device
 SET_ERR_GDI_CMD() Sets error code for: problem writing to GDI
 SET_ERR_LOC(sz) Sets the current location (name of wrapping function)
 CALL_ERR_VBD() Invokes Visual Basic for DOS error bridge
 CALL_ERR_CLIP4() Invokes Clipper S'87 error bridge
 CALL_ERR_MSG() Invokes function to post a Windows error message


Table 6: Tools for developing and testing WRAPI libraries.
 Tool Needed for

 Borland C 3.x Borland C
 Microsoft C 5.1 Clipper (MSC 7.0 can be substituted)
 Microsoft C 7.0 Microsoft C, generic DLL, Visual Basic
 TopSpeed C TopSpeed, Clarion
 Watcom C 9.x Watcom C, FoxPro
 Clarion Testing (order with TopSpeed)
 Clipper S'87 Testing (if you can find it)
 Clipper 5.2x Testing
 Paradox for Windows Testing
 Visual Basic for DOS Testing
 Visual Basic for Windows Testing
 Microsoft CodeView & NMake Debugging and making (comes with MSC)


[LISTING ONE] (Text begins on page 26.)

/* C Parameters passed on call stack */
double SquareIt ( double dValue ) {
 return ( dValue * dValue ) ;
 }

/* Clipper Parameters acquired through API functions */
CLIPPER SquareIt ( void ) {
 double dValue = 0.0 ;
 // if one parameter of type numeric
 if ( _parinfo ( 0 ) == 1
 && _parinfo ( 1 ) == NUMERIC
 )
 {
 // grab parameter from Clipper
 dValue = _parnd ( 1 ) ;

 }
 // send return value to Clipper
 _retnd ( dValue * dValue ) ;
 return ;
 }
/* FoxPro Parameters passed in a data structure */
void SquareIt ( ParamBlk *pParamBlk ) {
 double dValue = 0.0 ;
 // if one parameter of type numeric
 if ( (*pParamBlk).pCount == 1
 && (*pParamBlk).p[0].val.ev_type == 'N'
 )
 {
 // grab parameter
 dValue = (double) (*pParamBlk).p[0].val.ev_real ;
 }
 // send return value to FoxPro
 _RetFloat ( dValue * dValue , 12 , 4 ) ;
 return ;
 }

[LISTING TWO]

/* Setup for wrapping functions */
#ifdef _FAMILY_C // parameters passed on call stack
 #define SquareIt(PARMLIST) SquareIt ( double dValue )
#endif
#ifdef _API_FOX // FoxPro requires a global data structure
 FoxInfo myFoxInfo[] =
 { { "SQUAREIT" , (FPFI) SquareIt , 1 , "N" } } ;
 FoxTable _FoxTable =
 { (FoxTable*) 0
 , sizeof(myFoxInfo) / sizeof(FoxInfo)
 , myFoxInfo
 } ;
#endif
/* Prototypes */
TYPE_D SquareIt ( PARMLIST ) ;
double _SquareIt ( double ) ;
/* Wrapping function (without error handling) */
TYPE_D SquareIt ( PARMLIST ) { // ( dValue )
 // declare return value (and any other local values)
 DECL_D ( dReturn ) ;
 // prepare to accept parameter(s)
 PREP_D ( 1 , dValue ) ;
 // grab parameter(s) from host API
 GRAB_D ( 1 , dValue ) ;
 // call core function
 dReturn = _SquareIt ( dValue ) ;
 // send return value to host API
 SEND_D ( dReturn ) ;
 }
/* Core function */
double _SquareIt ( double dValue ) {
 return ( dValue * dValue ) ;
 }

[LISTING THREE]


/* C translation of wrapping function */
double SquareIt ( double dValue )
 {
 // DECL_D ( dReturn ) ;
 double dReturn = 0.0 ;
 // PREP_D ( 1 , dValue ) ;
 ;
 // GRAB_D ( 1 , dValue ) ;
 ;
 // call to core function
 dReturn = _SquareIt ( dValue ) ;
 // SEND_D ( dReturn ) ;
 return dReturn ;
 }
/* Clipper translation of wrapping function */
void SquareIt ( void )
 {
 // DECL_D ( dReturn ) ;
 double dReturn = 0.0 ;
 // PREP_D ( 1 , dValue ) ;
 double dValue = 0.0 ;
 // GRAB_D ( 1 , dValue ) ;
 dValue = dAPIgrab ( 1 ) ;
 // call to core function
 dReturn = _SquareIt ( dValue ) ;
 // SEND_D ( dReturn ) ;
 _retnd ( dValue ) ;
 return ;
 }
/* FoxPro translation of wrapping function */
void FAR SquareIt ( ParamBlk* gpFoxParm )
 {
 // DECL_D ( dReturn ) ;
 double dReturn = 0.0 ;
 // PREP_D ( 1 , dValue ) ;
 double dValue = 0.0 ;
 // GRAB_D ( 1 , dValue ) ;
 dValue = dAPIgrab ( 1 , gpFoxParm ) ;
 // call to core function
 dReturn = _SquareIt ( dValue ) ;
 // SEND_D ( dReturn ) ;
 _RetFloat ( dReturn , 12 , 4 ) ;
 return ;
 }

[LISTING FOUR]

/* Setup for wrapping functions */
#ifdef _FAMILY_C // parameters passed on call stack
 #define FlipIt(PARMLIST) FlipIt ( APISTR wspText )
#endif
#ifdef _API_FOX // FoxPro requires a global data structure
 FoxInfo myFoxInfo[] =
 { { "FLIPIT" , (FPFI) FlipIt , 1 , "C" } } ;
 FoxTable _FoxTable =
 { (FoxTable*) 0
 , sizeof(myFoxInfo) / sizeof(FoxInfo)
 , myFoxInfo
 } ;
#endif
/* Prototypes */
TYPE_S FlipIt ( PARMLIST ) ;
WSTR* _FlipIt ( WSTR* ) ;
/* Wrapping function (without error handling) */
TYPE_S FlipIt ( PARMLIST ) { // ( wspText )
 // declare return value (and any other local values)
 DECL_S ( wspReturn ) ;
 // prepare to accept parameter(s)
 PREP_S ( 1 , wspText ) ;
 // grab parameter(s) from host API
 GRAB_S ( 1 , wspText ) ;
 // call core function
 wspReturn = _FlipIt ( wspText ) ;
 // send return value to host API
 SEND_S ( wspReturn ) ;
 }
/* Core function */
WSTR* _FlipIt ( WSTR* wspText )
 {
 // storage for return value
 static char szaReturn [ 256 ] ;
 // WSTR wrapper for return value
 static WSTR wsReturn ;
 // pointers for string flip loop
 char *cpOld , *cpNew ;
 // counter for string flip loop
 ushort u ;
 // point return WSTR at return buffer
 wsReturn.cp = szaReturn ;
 // set return length to incoming length
 wsReturn.uiLen = wspText->uiLen ;
 // limit return length to size of return buffer
 if ( wsReturn.uiLen > ( sizeof ( szaReturn ) - 1 ) )
 wsReturn.uiLen = ( sizeof ( szaReturn ) - 1 ) ;
 // set up loop variables
 u = wsReturn.uiLen ;
 cpNew = szaReturn ;
 cpOld = wspText->cp + u - 1 ;
 // flip old into new
 while ( u-- )
 *cpNew++ = *cpOld-- ;
 // zero terminate regardless of host API
 *cpNew = '\0' ;
 // return WSTR pointer
 return ( &wsReturn ) ;
 }

[LISTING FIVE]

/* C translation of wrapping function */
char* FlipIt ( char* wspText )
 {
 // DECL_S ( wspReturn ) ;
 WSTR wsReturn ; WSTR *wspReturn = &wsReturn ;
 // PREP_S ( 1 , wspText ) ;
 WSTR wsBuff1 ;
 // GRAB_S ( 1 , wspText ) ;
 if ( wspText == NULL )
 {
 wsBuff1.cp = (char*) "\xFF" ; // defined as NIL
 wsBuff1.uiLen = (ushort) 1 ;
 }
 else
 {
 wsBuff1.cp = (char*) wspText ;
 wsBuff1.uiLen = (ushort) strlen ( wsBuff1.cp ) ;
 }
 wsBuff1.ulHnd = 0L ;
 wspText = &wsBuff1 ;
 // call to core function
 wspReturn = _FlipIt ( wspText ) ;
 // SEND_S ( wspReturn ) ;
 return wspReturn->cp ;
 }
/* Clipper translation of wrapping function */
void FlipIt ( void )
 {
 // DECL_S ( wspReturn ) ;
 WSTR* wspReturn = (WSTR*) (void*) 0L ;
 // PREP_S ( 1 , wspText ) ;
 WSTR wsBuff1 ; WSTR* wspText = &wsBuff1 ;
 // GRAB_S ( 1 , wspText ) ;
 _bcopy ( wspText , wspAPIgrab ( 1 ) , sizeof ( WSTR ) ) ;
 // call to core function
 wspReturn = _FlipIt ( wspText ) ;
 // SEND_S ( wspReturn ) ;
 _retc ( wspReturn->cp ) ;
 return ;
 }
/* FoxPro translation of wrapping function */
void FAR FlipIt ( ParamBlk *gpFoxParm )
 {
 // DECL_S ( wspReturn ) ;
 WSTR* wspReturn = (WSTR*) (void*) 0L ;
 // PREP_S ( 1 , wspText ) ;
 WSTR wsBuff1 ; WSTR* wspText = &wsBuff1 ;
 // GRAB_S ( 1 , wspText ) ;
 _MemCpy ( wspText
 , wspAPIgrab ( 1 , gpFoxParm )
 , sizeof ( WSTR )
 ) ;
 // call to core function
 wspReturn = _FlipIt ( wspText ) ;
 // SEND_S ( wspReturn ) ;
 _RetChar ( wspReturn && wspReturn->cp
 ? wspReturn->cp
 : ""
 ) ;
 while ( (*gpFoxParm).pCount )
 if ( (*gpFoxParm).p[--(*gpFoxParm).pCount].val.ev_type == 'C' )
 _HUnLock ( (*gpFoxParm).p[(*gpFoxParm).pCount].val.ev_handle ) ;
 return ;
 }

[LISTING SIX]

/* An In-Press wrapping function (with error handling) */

TYPE_I IpBoStyle ( PARMLIST ) { // ( [iStyle] )
 DECL_I ( iReturn ) ;
 PREP_I ( 1 , iStyle ) ;
 SET_ERR_NONE ( ) ;
 GRAB_I ( 1 , iStyle ) ;
 if ( IS_ERR_NONE ( ) )
 iReturn = GETSET_I ( HND_BO_STYLE , iStyle ) ;
 SET_ERR_LOC ( "IpBoStyle" ) ;
 TRAP_ERR ( ) ;
 SEND_I ( iReturn ) ;
 }
/* The C translation of an In-Press wrapping function */
short IpBoStyle ( short iStyle ) {
 // DECL_I ( iReturn ) ;
 short iReturn ;
 // PREP_I ( 1 , iStyle ) ;
 ;
 // SET_ERR_NONE ( ) ;
 ( _iIpErrCode ( 0 ) ) ;
 // GRAB_I ( 1 , iStyle ) ;
 ;
 // if ( IS_ERR_NONE ( ) ) ...
 if ( ( _iIpErrCode ( -32767 ) == 0 ) )
 iReturn = ( * ( (short*) _vpIpSet ( 80 + 512 , &iStyle ) ) ) ;
 // SET_ERR_LOC ( "IpBoStyle" ) ;
 ( _wspIpErrLoc ( szaaswsp ( "IpBoStyle" ) ) ) ;
 // TRAP_ERR ( ) ;
 if ( ! ( _iIpErrCode ( -32767 ) == 0 ) )
 vErrTrapC ( ) ;
 // SEND_I ( iReturn ) ;
 return iReturn ;
 }

End Listings




























March, 1994
Multiplatform .INI Files


Portable profile functions




Joseph J. Graf


Joe has been programming for eleven years. He is currently working on an
embedded system, writing PC and Windows-based development tools for the
embedded OS. He can be reached at jg@usorder.com.


Portable-interface toolkits allow you to write code once and recompile with
the appropriate libraries for other target platforms. Most of these interface
libraries are implemented with a lowest-common-denominator feature subset.
This means that if you depend heavily on a specific operating-system feature,
the only option available is to develop your own portable version of the
routines.
One of the features in the Windows API that I continually rely on is its
profile functions, which encapsulate the reading and writing of information to
.INI files. For example, I often use .INI files to maintain a list of
last-used settings for "smart" dialog boxes. This feature has been so popular
in the Windows version of my applications that I implemented the same for my
DOS and UNIX programs. The routines presented in this article are portable
versions of the Windows functions GetPrivateProfileString(),
GetPrivateProfileInt(), and WritePrivateProfileString(). My functions only
work with private profile strings for the obvious reason that other operating
environments do not have a WIN.INI file.


Reading from the .INI File


To read an entry's value in a section of an .INI file, you use my functions
get_private_profile_int() and get_private_profile_string(). Listing One (page
91) presents the header file that contains the prototypes for these functions.
The two functions use the same parameters as the equivalent Windows versions,
allowing you to incorporate them easily into your existing code. You can use
#ifdefs at the point of call, as in Example 1. Alternatively, you can use
#defines, for example, to make get_private_profile_string resolve to
GetPrivateProfileString when compiling for Windows and to remain unchanged
when compiling for DOS and UNIX. Thus, you can maintain a single source module
for multiple platforms--at least in the case of profile-string manipulations.
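The #define variant might look like the fragment below. The stand-in body for the non-Windows path is purely illustrative; the real implementation appears in Listing Two.

```c
#ifdef WINDOWS
  /* resolve the portable names to the native Windows API */
  #define get_private_profile_int      GetPrivateProfileInt
  #define get_private_profile_string   GetPrivateProfileString
  #define write_private_profile_string WritePrivateProfileString
#else
  /* stand-in body so this sketch is self-contained; the real
     non-Windows implementation is in Listing Two */
  int get_private_profile_int ( char *section , char *entry ,
                                int def , char *file ) {
    (void) section ; (void) entry ; (void) file ;
    return def ;    /* real code would search the .INI file */
  }
#endif
```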
Windows .INI files are divided into sections and entries. A section appears
within the file as [section_name], followed by any number of entries using the
format entry_name=entry_value. For example, the [ports] section of your
WIN.INI file contains entries such as COM1:=9600,n,8,1,x.
So the first task in obtaining an entry's value is to find the appropriate
section within the file. Your code passes the section name to either
GetPrivateProfileInt() or GetPrivateProfileString() as a regular string, for example,
"section_name". The Windows function then turns this parameter into
"[section_name]". Likewise, get_private_profile_int() and
get_private_profile_string() also enclose the section-name parameter within
square brackets. Once the section-name string is in the correct format, the
routines search the .INI file line by line until a matching section name is
found. If one is not found, the value specified in the default parameter will
be returned for the integer function; the string function copies the default
string into the buffer and returns the default-string length.
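The bracketing-and-scanning step described above can be sketched as:

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 80   /* matches the value in Listing One */

/* wrap the section name in brackets, then scan line by line;
   returns 1 with the file positioned just past the section header */
int find_section ( FILE *fp , const char *name ) {
  char want [ MAX_LINE_LENGTH + 3 ] ;
  char line [ MAX_LINE_LENGTH + 1 ] ;
  sprintf ( want , "[%s]" , name ) ;
  while ( fgets ( line , sizeof line , fp ) != NULL ) {
    line [ strcspn ( line , "\r\n" ) ] = '\0' ;  /* strip the newline */
    if ( strcmp ( line , want ) == 0 )
      return 1 ;
  }
  return 0 ;   /* caller falls back to the default value */
}
```

The helper name find_section() is my own for this sketch; the published routines fold this logic into the get and write functions themselves.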
If the correct section was found in the .INI file, the profile routines search
for a matching entry. The entry name is compared up to the equal sign using
the length of the entry parameter. If the search traverses the section without
finding the specified entry, the default value or string is returned. If an
entry is found, a pointer is set to point to the value portion of the string.
The get_private_profile_int() function copies the entry's value string into a
temporary string buffer as long as the entry contains numbers, then returns
the numeric value of the string using the standard library function atoi().
The get_private_profile_string() function copies the value portion of the
entry to the buffer up to a maximum of buffer_len characters. If the entry
contains more characters than the buffer can hold, it is truncated to prevent
memory overruns. This function returns the number of characters copied into
the buffer.


Writing to .INI Files


The Windows profile routines only support the writing of strings to .INI
files. To maintain compatibility with the Windows routines, my code has the
equivalent limitation. In an effort to keep the code straightforward, my
write function creates a temporary file and copies the .INI file to it, as the
.INI file is read and searched. At the end, when the file has been completely
written, the old .INI file is unlinked and the temporary filename is changed
to the supplied filename. This may not be the most efficient method, but it
was easy to implement and will be easy to maintain.
The write function tries to open the file for reading first. If it can't do
so, the file must not exist, so write creates one and then writes the section
information, followed by the entry and its associated string.
The write_private_profile_string() function searches the .INI file for the
correct section as with the get functions, but this function copies its input
to the temporary file as it goes. If it never finds the section asked for, the
function appends the section and entry to the end of the file. If the function
finds the correct section, it then searches for the correct entry, copying the
file in the process. If the end of the section comes without finding the
entry, my routine adds the entry at the end of the section, and the remainder
of the file is then copied. In the case where the entry was found, its old
value is discarded, replaced by the value passed into the function.
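Stripped of the entry matching, the copy-and-rename skeleton looks like this; the temporary-filename handling is simplified for the sketch.

```c
#include <stdio.h>

/* copy the .INI file to a temporary file (the real code edits the
   matching entry on the way through), then unlink the original and
   rename the temporary file into its place */
int rewrite_ini ( const char *path , const char *tmp_path ) {
  FILE *in  = fopen ( path , "r" ) ;
  FILE *out = fopen ( tmp_path , "w" ) ;
  char line [ 80 + 1 ] ;
  if ( out == NULL ) {
    if ( in ) fclose ( in ) ;
    return 0 ;
  }
  if ( in != NULL ) {              /* a missing file is simply created */
    while ( fgets ( line , sizeof line , in ) != NULL )
      fputs ( line , out ) ;       /* real code edits the matching entry */
    fclose ( in ) ;
  }
  fclose ( out ) ;
  remove ( path ) ;                /* unlink the old .INI file */
  return rename ( tmp_path , path ) == 0 ;
}
```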
Listing One contains the prototypes for the functions plus the #define
MAX_LINE_LENGTH, which is currently set at 80. Listing Two (page 91)
implements the functions read_line(), get_private_profile_int(), and
get_private_profile_string(). Listing Three (page 91) contains the
write_private_profile_string() function.
These functions were designed for ease of reading. There are places where the
code could be made smaller by creating subroutines from redundant code,
although the savings would be nominal at best. Also, you could write
additional functions to handle reading/writing longs, floats, and so on. I
didn't implement these additional functions because Windows itself does not
support them, and compatibility was of utmost importance. The goal of
write-once-port-many is often elusive, but achieving it in this particular
case is not too difficult.

Example 1: Calling a routine to read a profile string, using a compile-time
conditional to invoke either a Windows version or a portable version.
#ifdef WINDOWS
GetPrivateProfileString(ini_section,ini_entry,default_str,buffer,buffer_len,ini_file);
#else
get_private_profile_string(ini_section,ini_entry,default_str,buffer,buffer_len,ini_file);
#endif

[LISTING ONE] (Text begins on page 36.)

/******************************************************************************
 PORTABLE ROUTINES FOR WRITING PRIVATE PROFILE STRINGS -- by Joseph J. Graf
 Header file containing prototypes and compile-time configuration.
******************************************************************************/

#define MAX_LINE_LENGTH 80

int get_private_profile_int(char *, char *, int, char *);
int get_private_profile_string(char *, char *, char *, char *, int, char *);
int write_private_profile_string(char *, char *, char *, char *);

[LISTING TWO]


/***** Routines to read profile strings -- by Joseph J. Graf ******/
#include <stdio.h>
#include <string.h>
#include <ctype.h>  /* isdigit() */
#include <stdlib.h> /* atoi() */
#include "profport.h" /* function prototypes in here */

/*****************************************************************
* Function: read_line()
* Arguments: <FILE *> fp - a pointer to the file to be read from
* <char *> bp - a pointer to the copy buffer
* Returns: TRUE if successful, FALSE otherwise
******************************************************************/
int read_line(FILE *fp, char *bp)
{ int c = '\0'; /* int rather than char, so EOF is distinguishable */
 int i = 0;
 /* Read one line from the source file */
 while( (c = getc(fp)) != '\n' )
 { if( c == EOF ) /* return FALSE on unexpected EOF */
 return(0);
 bp[i++] = (char)c;
 }
 bp[i] = '\0';
 return(1);
}
/**************************************************************************
* Function: get_private_profile_int()
* Arguments: <char *> section - the name of the section to search for
* <char *> entry - the name of the entry to find the value of
* <int> def - the default value in the event of a failed read
* <char *> file_name - the name of the .ini file to read from
* Returns: the value located at entry
***************************************************************************/
int get_private_profile_int(char *section,
 char *entry, int def, char *file_name)
{ FILE *fp = fopen(file_name,"r");
 char buff[MAX_LINE_LENGTH];
 char *ep;
 char t_section[MAX_LINE_LENGTH];
 char value[6];
 int len = strlen(entry);
 int i;
 if( !fp ) return(def); /* Can't open the file; use the default */
 sprintf(t_section,"[%s]",section); /* Format the section name */
 /* Move through file 1 line at a time until a section is matched or EOF */
 do
 { if( !read_line(fp,buff) )
 { fclose(fp);
 return(def);
 }
 } while( strcmp(buff,t_section) );
 /* Now that the section has been found, find the entry.
 * Stop searching upon leaving the section's area. */
 do
 { if( !read_line(fp,buff) || buff[0] == '\0' )
 { fclose(fp);
 return(def);
 }
 } while( strncmp(buff,entry,len) );
 ep = strrchr(buff,'='); /* Parse out the equal sign */
 if( ep == NULL ) /* Malformed entry with no '=' */
 { fclose(fp);
 return(def);
 }
 ep++;
 if( !strlen(ep) ) /* No setting? */
 { fclose(fp);
 return(def);
 }
 /* Copy only numbers; fail on characters */

 for(i = 0; isdigit(ep[i]) && i < (int)sizeof(value) - 1; i++ )
 value[i] = ep[i];
 value[i] = '\0';
 fclose(fp); /* Clean up and return the value */
 return(atoi(value));
}
/**************************************************************************
* Function: get_private_profile_string()
* Arguments: <char *> section - the name of the section to search for
* <char *> entry - the name of the entry to find the value of
* <char *> def - default string in the event of a failed read
* <char *> buffer - a pointer to the buffer to copy into
* <int> buffer_len - the max number of characters to copy
* <char *> file_name - the name of the .ini file to read from
* Returns: the number of characters copied into the supplied buffer
***************************************************************************/
int get_private_profile_string(char *section, char *entry, char *def,
 char *buffer, int buffer_len, char *file_name)
{ FILE *fp = fopen(file_name,"r");
 char buff[MAX_LINE_LENGTH];
 char *ep;
 char t_section[MAX_LINE_LENGTH];
 int len = strlen(entry);
 if( !fp ) /* Can't open the file; use the default */
 { strncpy(buffer,def,buffer_len - 1);
 buffer[buffer_len - 1] = '\0';
 return(strlen(buffer));
 }
 sprintf(t_section,"[%s]",section); /* Format the section name */
 /* Move through file 1 line at a time until a section is matched or EOF */
 do
 { if( !read_line(fp,buff) )
 { fclose(fp);
 strncpy(buffer,def,buffer_len - 1);
 buffer[buffer_len - 1] = '\0';
 return(strlen(buffer));
 }
 }
 while( strcmp(buff,t_section) );
 /* Now that the section has been found, find the entry.
 * Stop searching upon leaving the section's area. */
 do
 { if( !read_line(fp,buff) || buff[0] == '\0' )
 { fclose(fp);
 strncpy(buffer,def,buffer_len - 1);
 buffer[buffer_len - 1] = '\0';
 return(strlen(buffer));
 }
 } while( strncmp(buff,entry,len) );
 ep = strrchr(buff,'='); /* Parse out the equal sign */
 if( ep == NULL ) /* Malformed entry with no '=' */
 { fclose(fp);
 strncpy(buffer,def,buffer_len - 1);
 buffer[buffer_len - 1] = '\0';
 return(strlen(buffer));
 }
 ep++;
 /* Copy up to buffer_len chars to buffer */
 strncpy(buffer,ep,buffer_len - 1);

 buffer[buffer_len - 1] = '\0'; /* strncpy may not terminate; do it here */
 fclose(fp); /* Clean up and return the amount copied */
 return(strlen(buffer));
}

[LISTING THREE]


/***** Routine for writing private profile strings --- by Joseph J. Graf
*****/
#include <stdio.h>
#include <string.h>
#include <unistd.h> /* unlink(); DOS compilers declare it in <io.h> */
#include "profport.h"

/***************************************************************************
 * Function: write_private_profile_string()
 * Arguments: <char *> section - the name of the section to search for
 * <char *> entry - the name of the entry to find the value of
 * <char *> buffer - pointer to the buffer that holds the string
 * <char *> file_name - the name of the .ini file to read from
 * Returns: TRUE if successful, otherwise FALSE
 ***************************************************************************/
int write_private_profile_string(char *section,
 char *entry, char *buffer, char *file_name)

{ FILE *rfp, *wfp;
 char tmp_name[L_tmpnam]; /* L_tmpnam is the size tmpnam() requires */
 char buff[MAX_LINE_LENGTH];
 char t_section[MAX_LINE_LENGTH];
 int len = strlen(entry);
 tmpnam(tmp_name); /* Get a temporary file name to copy to */
 sprintf(t_section,"[%s]",section);/* Format the section name */
 if( !(rfp = fopen(file_name,"r")) ) /* If the .ini file doesn't exist */
 { if( !(wfp = fopen(file_name,"w")) ) /* then make one */
 { return(0); }
 fprintf(wfp,"%s\n",t_section);
 fprintf(wfp,"%s=%s\n",entry,buffer);
 fclose(wfp);
 return(1);
 }
 if( !(wfp = fopen(tmp_name,"w")) )
 { fclose(rfp);
 return(0);
 }

 /* Move through the file one line at a time until a section is
 * matched or until EOF. Copy to temp file as it is read. */

 do
 { if( !read_line(rfp,buff) )
 { /* Failed to find section, so add one to the end */
 fprintf(wfp,"\n%s\n",t_section);
 fprintf(wfp,"%s=%s\n",entry,buffer);
 /* Clean up and rename */
 fclose(rfp);
 fclose(wfp);
 unlink(file_name);
 rename(tmp_name,file_name);
 return(1);
 }
 fprintf(wfp,"%s\n",buff);
 } while( strcmp(buff,t_section) );

 /* Now that the section has been found, find the entry. Stop searching
 * upon leaving the section's area. Copy the file as it is read
 * and create an entry if one is not found. */
 while( 1 )

 { if( !read_line(rfp,buff) )
 { /* EOF without an entry so make one */
 fprintf(wfp,"%s=%s\n",entry,buffer);
 /* Clean up and rename */
 fclose(rfp);
 fclose(wfp);
 unlink(file_name);
 rename(tmp_name,file_name);
 return(1);

 }

 if( !strncmp(buff,entry,len) || buff[0] == '\0' )
 break;
 fprintf(wfp,"%s\n",buff);
 }

 if( buff[0] == '\0' )
 { fprintf(wfp,"%s=%s\n",entry,buffer);
 do
 {
 fprintf(wfp,"%s\n",buff);
 } while( read_line(rfp,buff) );
 }
 else
 { fprintf(wfp,"%s=%s\n",entry,buffer);
 while( read_line(rfp,buff) )
 {
 fprintf(wfp,"%s\n",buff);
 }
 }

 /* Clean up and rename */
 fclose(wfp);
 fclose(rfp);
 unlink(file_name);
 rename(tmp_name,file_name);
 return(1);
}

End Listings





















March, 1994
Portability by Design


Mapping structures are one place to start




Michael Ross


Michael is a senior compiler developer for MetaWare with over 16 years of
compiler development and management experience. He can be contacted at
408-429-6382.


When it comes to writing software for multiple operating systems and
microprocessors, you really only have two choices: design portability into
your code from the start, or wait until you are forced into expensive
rewrites.
Where I work (MetaWare), we often have to quickly port a compiler from one
platform to another. It isn't unusual to have a compiler that compiles itself
on the new platform (that is, a particular processor/OS combination) in six to
eight weeks, with one to two people working on the project. This is possible
because portability--the ability to compile, run, and achieve close to the
same results as well as the same look-and-feel across platforms--was designed
into the compiler from the very beginning. In this article, I'll discuss some
of the techniques we've developed that you can use to enhance the portability
of your code.
MetaWare got started in the compiler business in 1983 because we were having
difficulty porting our parser generator to various Pascal compilers. These
problems motivated us to write a portable compiler of our own. In moving to
various processors and operating systems, we distilled some knowledge of the
problem areas in writing transportable code.
Some people believe that if you write in C or C++ and stick to the latest ANSI
or de facto standard, your code is automatically portable. If they're a bit
more savvy, they may allow that you need some kind of platform-specific
GUI-development package. These are the same people likely to storm into your
office when you tell them that it will take six months and a rewrite of some
modules to port application XYZ to a new platform.
To keep tempests out of your office, here are some things to think about when
writing code:
Don't delay coding for portability. There's a temptation to say "I'll get it
working on one platform, then worry about the others." If you take this
approach, you could find yourself doing a significant rewrite.
Assume your application will live forever. Try to predict which platforms your
application might port to in the future, and take their peculiarities into
account.
Isolate the user-interface portion of your code from everything else. Try to
abstract it so that the rest of the application doesn't know whether it is
writing to a dumb terminal, working with X Windows, or interacting with
Microsoft Windows.
Stick to ANSI standard or very widely implemented features in your source
code. For C++, avoid using templates and exception handling, since these
constructs are not widely available. Use your compiler's ANSI checking switch
for any language that has an ANSI standard. This isn't a cure-all, but it does
help. Use some form of lint, and take the warnings seriously. Make your code
compile with no warnings.
Are all the tools you need available on each platform? Assembler? Profiler?
Debugger? GUI manager? We frequently find buggy assemblers when we port.
Make code involving floating point tolerant of answers that lie within a known
correct range. To see if your code is likely to have problems in this area,
compile it with two different compilers on the same architecture. If the
results vary at all, then moving the code to a new architecture is likely to
magnify the differences.
Listing One (page 93) shows a structure used to encapsulate machine and
operating-system dependencies. The mapping structure, with its associated
macros, communicates dependencies between our C or Pascal front end and the
code generator for the target processor. You can broadly categorize the
platform characteristics encapsulated in this mapping structure as:
Alignment of data.
Special machine instructions, such as transcendentals.
Size of various fundamental data types.
"Endianness," or byte order.
Floating-point format.
Listing Two (page 94) includes an ideal representation of a floating-point
number, with a mantissa of 128 bits. The compiler uses this for internal
manipulation of floating-point constants and then converts them to whatever
external floating-point representation is required on the target machine. The
code uses the knowledge of the longword size of the host machine where the
compiler is running to ensure that we don't shift by an amount that will yield
invalid results on the host system.
Your code should contain safety checks to prevent you from relying too much on
a particular word size or floating-point representation.
The problem with code that does a lot of floating-point calculation is that
the precision of the results and the execution speed of the code may vary
widely from one architecture to another. Even on a machine that has some
variation of IEEE floating point, the defaults as to which kinds of overflow
traps are signaled or ignored can vary. On the 80387 and Motorola 68040, the
varying frequencies with which compilers store floating-point results to
memory can significantly affect the accuracy of your results. Are the
potential target platforms flexible--through library calls or some other
mechanism--in determining how rounding will occur, when results are stored,
and what traps are signaled? If not, you should consider implementing your own
extended-precision floating-point operations. The MetaWare C compiler, for
example, maintains its own internal version of a floating-point constant and
constant operations. When necessary, the compiler converts floating-point
constant results to the representation of the target architecture.
"Portable" applications should have roughly the same look-and-feel across
platforms. When you port an application that is interactive and heavily
dependent on floating point (a CAD package, for example) to a new
architecture, some algorithms run too slowly. This is because they depended
upon some capability of the original architecture (such as built-in
transcendental instructions) for reasonable run-time performance. With the
trend toward RISC architectures, you may want to examine your code for such
hot spots and change the algorithm. You might even write your own
transcendental routines to insulate the application from performance changes
across platforms.
Alignment of data is a problem mainly when your application depends on
transmitting binary data across platforms, or uses different language
processors on the same platform. Aside from the obvious endianness
difficulties, there are variations in field alignment. This often necessitates
representing the binary data in textual form or having a data broker perform
the necessary transformations between the two architectures. Consider the
"bad" C code in Example 1(a), which was intentionally written to demonstrate
alignment problems. The alignment of the various structure fields will vary
from one compiler to another and from one architecture to another. Imagine
trying to write this to a binary file and read it back on another platform!
Even with the same language processor, this structure will vary in size, due
to memory "holes"; for example, on SPARC sizeof(mystruct)=24 bytes, on 80386
sizeof(mystruct)=16 bytes, and on HP-PARISC sizeof(mystruct)=24 bytes.
The problem is worse, of course, in languages like Fortran. If you write code
like that in Example 1(b), it may compile and run on some architectures, while
others will cause a trap at run time. On the Intel i860 processor, doing a
load of a double-precision value that is not aligned on a double-precision
boundary is difficult and slow, and some compilers don't support it. Other
RISC architectures have similar problems. Note that the Fortran code in
Example 1(b) fits within the ANSI standard; it does not fit our definition of
portability, however. You should examine your data structures, regardless of
the programming language you use, and ensure that structure and union fields
are aligned on common, natural boundaries.
There may soon be some light on the horizon to help with some of the
binary-data compatibility problems, in the form of SOM (system object model),
which allows a common binary format for objects across architectures and
languages. SOM meshes with the activities of CORBA, which may eventually allow
different C++ compilers to share a common object format.
The Fortran programming community that depends on de facto characteristics of
old compilers may be in for a rude shock as vendors write new Fortran-90
compilers. I'm always amazed at code such as that in Example 1(c) that lives
on in many applications. This code depends on zero initialized memory and
static allocation of local variables. With its RECURSIVE keyword and
name-space rules, Fortran-90 encourages automatic allocation of local
variables. Most optimizers will not perform up to their potential if you use
statically allocated variables.
C programmers are not exempt from writing absurdities, either. The most common
used to be the use of the register specifier. Most optimizing compilers now
ignore the register specifier altogether. If you move your code from one
architecture to another, you can't expect the same set of variables to be
maintained in registers across architectures. Usually, the register allocator
can do a better job than a person of choosing registers and keeping the
choices optimal as the source code changes.


Operating-system Dependencies


Occasionally, an application has to manipulate data that is inherently
operating-system dependent. Listing Three (page 94) shows a class definition
for managing processes. This code encapsulates everything your application
might need to know about a process, regardless of the underlying operating
system. Note that operating-system dependent actions such as stepping the
process by an instruction are deferred to member functions that can be
extremely operating-system specific. The remainder of the application,
however, uses these operations through the class definition in Listing Three,
allowing you to isolate the OS-specific code from the rest of the application.
Virtual functions (which are not used here) can also help you to isolate
system dependencies by allowing you to fill in a specific member function that
matches a particular OS.


User-interface Dependencies


If the application's user interface is more complicated than a simple
command-line switch interpreter, you'll need to consider a GUI manager. Since
X Windows is available on almost all UNIX systems, you could create a
relatively portable system by taking into consideration X Windows, Microsoft
Windows, OS/2, character-based MS-DOS, and Apple Macintosh. However, if you
can find a reliable vendor who has already encapsulated these APIs for you,
it's probably worth your money to buy the class libraries. Be aware, however,
that some GUI vendors depend on a Motif layer, which is not supplied with the
product, and not bundled with the OS by many vendors.
Another consideration in GUI class-library selection is the trend toward
software internationalization. If possible, examine the source code and see
how hard this is going to be for you. A good technique is to group all of the
messages that a user might see in a single table or separate resource file.
Then the input and output can be translated without affecting the remainder of
the application. If your system allows, you may even be able to have different
national-language versions of the user interface in DLLs that can be loaded on
demand.
Taking all of these different factors into consideration as you write a new
application or port an old one seems like a lot of extra effort. However, if
you design portability in from the beginning, you'll save yourself a lot of
effort later.

Example 1: (a) Bad code that intentionally demonstrates problems in alignment;
(b) nonportable Fortran code which will compile and run on some architectures,
but on others will cause a trap at run time; (c) Fortran-90 encourages
automatic allocation of local variables.
(a)
#include <stdio.h>
struct bad_layout
{ char direction;
  union {
    char var;
    float fvar;
    double dvar;
  } mess;
  int myint;
};
main()
{ struct bad_layout mystruct;
  printf("sizeof(mystruct) = %d\n", sizeof(mystruct));
}

(b)
      COMMON /SLOW/ I,D,L
      EQUIVALENCE (D,N(2)),(I,N(1))
      INTEGER I,N(2)
      DOUBLE PRECISION D
      LOGICAL L
      PRINT *,D
      END

(c)
      SUBROUTINE USELESS(I,J)
      INTEGER I,J
      IF (J.EQ.0) GOTO 10
      I = J ** 2
      GOTO 20
10    I = 3
      J = 10
20    CONTINUE
      END

[LISTING ONE] (Text begins on page 40.)

/* Machine dependent information required by front-end */
#ifndef CALLCONV_H
#include "callconv.h"
#endif

#ifndef LANGUAGE_H
#include "language.h"
#endif

#ifndef FP_H
#include "fp.h"
#endif
#ifndef TCLASSES_H
#include "tclasses.h"
#endif

typedef unsigned long inline_flags_type;

typedef enum {
 CPU_unknown,

 CPU_I386, /* Intel 386 */
 CPU_370, /* IBM 370 */
 CPU_RT, /* IBM RT */
 CPU_AM29K, /* AM29000 */
 CPU_I860, /* Intel 860 */
 CPU_MC68000, /* Motorola 68000 */
 CPU_MC68020, /* Motorola 68020 */
 CPU_VAX, /* DEC VAX */
 CPU_MDD, /* McDonnell Douglas */
 CPU_SPARC, /* SPARC (Sun 4) */
 CPU_R3000, /* MIPS R3000 */
 CPU_HOBBIT,
 CPU_PARISC, /* HP9000-600/700/800 PA-RISC */
 CPU_Ix86, /* 80x86 -- 16-bit Intel */
 CPU_RS6000, /* IBM RS-6000 */
 CPU_PPC /* IBM Power PC */
 } CPU_type;

/*Object module formats*/

typedef enum {
 NOFORMAT, /* OMF not applicable or unknown*/
 COFF, /* Basic AT&T COFF */
 AMDCOFF, /* AMD version of COFF */
 LAMDCOFF, /*AMD version of COFF (little-endian mode) */
 INTELOMF, /* Intel OMF (MSDOS) */
 ADOTOUT, /* BSD a.out */
 ELF, /* SVR4 ELF format */
 MACH, /* MACH object module format */
 HPUXCOFF, /* HPUX version of COFF */
 } objformat_type;


/* Operating systems */
typedef enum {
 NOOS, /* OS unknown */
 AIX_OS, /* IBM's UNIX */
 BSD_OS, /* Berkeley UNIX BSD4.x */
 SUN_OS, /* Sun OS */
 ATT_OS, /* AT&T UNIX System V.3 */
 EPI_OS, /* EPI-OS (AM29000) */
 VMS_OS, /* DEC */
 MSDOS_OS, /* MS-DOS */
 OS2_OS, /* OS/2 */
 XNX_OS, /* Xenix */
 ISIS_OS, /* Isis */
 ATT4_OS, /* AT&T UNIX System V.4 */
 NeXT_OS, /* NeXTStep from NeXT */
 NEWS_OS, /* Sony NEWS-OS */
 MSNT_OS, /* Microsoft NT */
 SOL_OS, /* Solaris */
 HPUX_OS, /* HP UNIX System V. */
 } os_type;

/* Inline transcendentals */
#define SIN_INLINE 0x00000001
#define COS_INLINE 0x00000002
#define ASIN_INLINE 0x00000004 /* Arcsin or Arccos */
#define SQRT_INLINE 0x00000008
#define LOG_INLINE 0x00000010 /* Natural or common log */
#define EXP_INLINE 0x00000020
#define TAN_INLINE 0x00000040
#define ATAN_INLINE 0x00000080
#define ATAN2_INLINE 0x00000100 /* Arctan(a,b) */
#define SINH_INLINE 0x00000200
#define COSH_INLINE 0x00000400
#define TANH_INLINE 0x00000800 /* tanh */
#define FABS_INLINE 0x00001000 /* fabs */
#define ASM_INLINE 0x00002000 /* ASM directive supported */
#define INS_INLINE 0x00004000 /* _inline(bytes) */

/* Status of pragma processing: */
typedef enum {
 OKAY_DELETE_p, /* Pragma fully processed; do not pass to backend */
 OKAY_PASS_p, /* Pass pragma to back-end */
 TOO_LATE_p, /* Pragma specified too late */
 ERROR_p, /* Suboperands of pragma incorrect */
 /* badptr set to bad parameter */
 IGNORED_p, /* Already specified */
 } PRAG_STATUS;
struct struct_aligns {char if_its_this_big, align_to_this;};

extern struct mapping_entry{
 /* False if structs are packed by default */
 bool align_members;
 ubyte shortsize; /* size of "short" in bytes */
 ubyte intsize; /* size of "int" in bytes */
 ubyte longsize; /* size of "long" in bytes */
 ubyte floatsize; /* size of "float" */
 ubyte doublesize; /* size of "double" */
 ubyte longdoublesize;/* size of "long double" */
 ubyte min_parm_align;/* Min alignment of a passed parm */

 bool unsigned_char; /*"char" unsigned by default? */
 bool pure_32_bit; /* All integer operations in 32 bits?*/
 ubyte code_ptr_size; /* Code pointer size */
 ubyte data_ptr_size; /* Data pointer size */
 /* Size of "near" pointer to code */
 ubyte near_code_ptr_size;
 /* Size of "far" pointer to code */
 ubyte far_code_ptr_size;
 /* Size of "near" pointer to data */
 ubyte near_data_ptr_size;
 /*Size of "far" pointer to data */
 ubyte far_data_ptr_size;

 ubyte short_align; /* Alignment of shorts */
 ubyte long_align; /* Long alignment */
 ubyte int_align; /* int alignment */
 ubyte ptr_align; /* pointer alignment */
 ubyte double_align; /* double alignment */
 ubyte max_field_align; /* Maximum field alignment */
 bool msb_first; /*integers stored MSB first? */
 /* Align all vars to int boundary (even chars) */
 bool word_align_vars;
 bool off_boundary_refs;/* Are off-boundary refs supported?*/
 char *version; /* Version number of compiler */
 /* Align structs 3 bytes or longer to word boundary */
 bool word_align_structs;
 calling_convention_set default_calling_convention;
 /* If true varargs must be explicitly specified with "..." syntax */
 bool ansi_varargs_only;
 /* Which transcendentals may be inline? */
 inline_flags_type inline_flags;
 ubyte max_parm_align;/* Max alignment of a passed parm */
 ubyte offset_size; /* Pascal offset size */
 ubyte area_size; /* Pascal area size */
 /* Largest array in bytes (Pascal) */
 unsigned long max_data_size;
 bool signed_halfwords_preferred; /*True for 370. (Pascal)*/
 bool range_checks_require_range; /*Do we need a lower and upper
 value for rangecheck?*/
 /* How are Pascal sets mapped? */
 /* Also see "set_unit_size" below */
 bool sets_mapped_LSB_first;
 /* used in machines where pointer arithmetic will not work unless the
 * computed address is a multiple of machine's word size. */
 ubyte address_resolution;
 /* In creating a common block for Professional Pascal interface packages,
 * we use the name of the package, prefixed and suffixed with a
 * special character. */
 /* char to be prefixed to Pascal package block*/
 char package_prefix;
 /* char to be suffixed to Pascal package block*/
 char package_suffix;
 /* size of "char" in bytes, believe it or not char is 8 bytes on NAM */
 ubyte charsize;
 /* Format of floating point */
 real_form floating_point_format;
 /* If off-boundary references are NOT supported, do we
 * handle packed structures nevertheless? */
 bool packed_structs_supported;

 ubyte char_align; /*Alignment of "char" */
 ubyte float_align; /*Alignment of "float" */
 ubyte longdouble_align; /*Alignment of "long double" */
 char bits_per_int; /*bits per integer used for bitfields */
 /* Implies the machine can convert (signed/unsigned)(char/short) to float
 * in a way more efficient than to double. The result is that the front
 * end is careful to generate FLT vs FLTU in this context so back end can
 * tell from register length and FLT/FLTU whether to generate the better
 * code and which kind of better code (it differs from U to nonU). */
 bool short_to_float;
 /* Given the case index of a pseudo-function that the target machine
 * usually supports, do we support it with the current configuration? */
 bool (*recognize_pseudo_function)(func_case_index);
 CPU_type cpu; /* Target CPU */
 os_type os; /* Target OS */
 objformat_type omf; /*Target OMF*/
 /* For non-padded structs, search this array, terminated by {0,0}.
 * If struct size is >= first member, align to at least 2nd member.
 * Thus "word_align_structs" could be {3,4},{2,2},{0,0}. */
 struct struct_aligns *struct_aligns, *array_aligns;
 bool use_INFO; /* Can code generator tolerate INFO
 * rather than LINE/FILE/STAB ? */
 type_class wchartype; /* this can be any integer type */
 ubyte wcharsize; /* sizeof (wchar_t) */
 /* We changed the way debugger information is passed to code generator.
 * If "debug_info_scheme" is 0, then old way is used. Otherwise, the
 * new way. */
 ubyte debug_info_scheme;
 bool try_supported;
 bool wchar_is_short; /* wchar_t is short int */
 bool wchar_is_long; /* wchar_t is long int */
 /* Given a toggle that may have been defined in
 * establish_machine_dependencies, is it still good at this point? */
 /* Return TRUE if so. Otherwise, the compiler will issue */
 /* "Toggle cannot be specified here." */
 bool (*toggle_is_valid)(toggle t, bool turn_it_on);

 /* "init", if not zero, is called when the first non-pragma is seen.
 * Its primary purpose is to adjust other fields in mapping that are
 * dependent on pragmas and toggles. E.g. 1167 toggle for 386 causes long
 * doubles to be 8 bytes instead of 12 */
 void (*init)(void);
 /* The default global aliasing convention. NULL implies "%r". */
 char *global_aliasing_convention;
 /* Are "far" variables supported? That is, does it make
 * sense to qualify a variable declaration with "_far"? */
 /* This must be FALSE on ESA/370. */
 bool far_variables_supported;
 /* Are "far" functions supported? If false, compiler will */
 /* ignore "_far" and issue a warning. */
 bool far_functions_supported;
 /* Does the target machine's OMF handle multiple named */
 /* control sections (segments)? */
 /* If this is false, then the front-end will ignore */
 /* "pragma code", "pragma literals(xx)", and "pragma static_segment()". */
 /* (Pragma data is handled on UNIX by converting to named blocks). */
 bool multiple_named_sections_supported;
 /* Can control sections of class "common" be initialized? */
 /* On DOS and VMS this is true. */

 bool common_section_initialization_supported;
 /* Are we generating code for a strict IEEE machine? */
 /* If true, we must generate unordered comparisons. */
 /* I.e., X < Y is a different operation than !(X >= Y). */
 bool IEEE_unordered_compares_supported;
 /* code_sections_supported -- if true, the target machine */

 /* has separate I & D spaces but is able to reference the */
 /* I space as data. (E.g.,386 with its segment override). */
 /* If set to TRUE, the toggles "const_in_code" and "literals_in_code"
 * will be supported for putting const variables into instruction space.*/
 bool code_sections_supported;
 /* Is the calling convention "CALLEE_POPS_STACK" supported? */
 bool callee_pops_stack_supported;
 /* Is the calling convention "REVERSE_PARMS" supported? */
 /* (Applies to those machines that have a "PUSH" instruction only. */
 bool reverse_parms_supported;
 /* Default data aliasing convention for Professional Pascal */
 /* Default routine aliasing convention for Professional Pascal */
 char *data_aliasing_convention;
 char *routine_aliasing_convention;
 /* set_unit_size in conjunction with the "sets_mapped_LSB_first" */
 /* flag above determine how (Pascal) sets are mapped in storage. */
 /* High C/Professional Pascal ordinarily maps sets as a sequence
 * of halfwords. Microsoft maps them as an even number of bytes,
 * MSB first. */
 ubyte set_unit_size;
 /* type_checking is true if type information should be kept even when
 * g_flag is off. This is used for 370/ESA type checking at link time. */
 bool type_checking;
 /* Size of a "long long" type. If such types are not supported,
 * then the size should match "long". */
 ubyte longlongsize;
 ubyte longlong_align; /* Alignment of long long */
 /* What is the largest auto-variable of type struct that can be mapped
 * directly into a series of one or more registers? Value is in bytes. */
 uint largest_aggregate_in_registers;
 /* When doubles or long doubles are just plain variables (or arrays
 * thereof), what should the alignment be? For Pentium, it's much
 * better for them to be 8-byte aligned. */
 ubyte double_variable_align; /*double variable alignment */
 /*Alignment of "long double" variable */
 ubyte longdouble_variable_align;
 bool supports_call_lit; /* Code generator supports call-lit construct.*/
 /* These tell if we require certain calling convention bits: */
 /* Must have cc_CALLEE_POPS_STACK.*/
 bool callee_pops_stack_required;
 bool reverse_parms_required;/* Must have cc_REVERSE_PARMS.*/
 /* For C++ constructors/destructors of static/global variables, does
 * get handle the initialization level protocol? True for systems
 * that support .init/.fini sections. */
 bool support_initialization_order;
 /* Use Sun's and AT&T's convention for mapping bit fields? */
 bool ABI_bit_fields;
 /* SYS_SIZETTYPE takes on one of the values s, i, l for short, int, long.
 * SYS_SIZETSIGNED takes one of the values s, u, or n for signed,
 * unsigned or non-signed (for "char"). */
 char sizet_signed; /* s, u, n. */
 char sizet_type; /* s, i, l. c not supported currently. */


 } mapping;
extern const char *machine_name; /*Name of machine*/
#define LONGLONG_SUPPORTED (SYS_LONGLONGSIZE > SYS_LONGSIZE)
#define SYS_PACKEDSTRUCT (!mapping.align_members)
/* CHARSIZE is always the same on all machines. Wrong!! not on NAM */
#define SYS_CHARSIZE mapping.charsize
#define SYS_INTSIZE mapping.intsize
#define SYS_FLOATSIZE mapping.floatsize
#define SYS_DOUBLESIZE mapping.doublesize
#define SYS_EXTENDEDSIZE mapping.longdoublesize
#define SYS_LINKSIZE mapping.near_data_ptr_size
#define SYS_LONGLONGSIZE mapping.longlongsize
#define SYS_SHORTSIZE mapping.shortsize
#define SYS_LONGSIZE mapping.longsize
#define SYS_SIGNEDCHAR (!mapping.unsigned_char)

/*Pointer that is loaded into reg*/
#define SYS_PTRSIZE SYS_LINKSIZE
#define SYS_DPTRSIZE mapping.data_ptr_size
#define SYS_CPTRSIZE mapping.code_ptr_size
#define SYS_WCHARSIZE mapping.wcharsize
#define SYS_WCHARTYPE mapping.wchartype
#define SYS_WCHARSHORT mapping.wchar_is_short
#define SYS_WCHARLONG mapping.wchar_is_long
#define SYS_SIZETSIGNED mapping.sizet_signed
#define SYS_SIZETTYPE mapping.sizet_type
#define BITS_PER_BYTE 8
#define BITS_PER_INT mapping.bits_per_int
#define BITS_PER_LONG (sizeof(long)*BITS_PER_BYTE)
#define SYS_INTAL mapping.int_align
#define SYS_LONGAL mapping.long_align
#define SYS_LONGLONGAL mapping.longlong_align
#define SYS_SHORTAL mapping.short_align
#define SYS_FLOATAL mapping.float_align
#define SYS_CHARAL mapping.char_align
#define SYS_DOUBLEAL mapping.double_align
#define SYS_LONGDOUBLEAL mapping.longdouble_align
#define SYS_DOUBLE_VARIABLE_AL mapping.double_variable_align
#define SYS_LONGDOUBLE_VARIABLE_AL \
mapping.longdouble_variable_align
#define SYS_PTRAL mapping.ptr_align
#define SYS_OFF_BOUNDARY_REFS mapping.off_boundary_refs
#define SYS_STRINGAL 1 /*Alignment of strings*/
#define SYS_MSB_FIRST mapping.msb_first
#define SYS_LSB_FIRST (!SYS_MSB_FIRST)
#define SYS_STACK_AL mapping.min_parm_align
#define SYS_32_BIT_ARITHMETIC_ONLY mapping.pure_32_bit
#define SYS_MAX_FIELD_AL mapping.max_field_align
#define SYS_WORD_ALIGN_STRUCTS mapping.word_align_structs

#define SYS_ADR_RESOLUTION mapping.address_resolution
#define SEGMENT_REG_SIZE (mapping.far_data_ptr_size - \
SYS_INTSIZE )
#define NEW_DEBUG_INFO (mapping.debug_info_scheme> 0)

/* Establish_machine_dependencies -- initializes "mapping" to correspond
 * to "machine". Returns FALSE if machine is not recognized. */
extern bool establish_machine_dependencies(
 const char *machine, language_type l);
/* Name of the target operating system, passed by the driver; it determines
 * which object-module formatter should be used. */
extern const char *targetos_name;

[LISTING TWO]

#define LS (sizeof(long)*8) /* Maximum bits for left shift */
#define BS (sizeof(long long)*8) /* Maximum bits for long long shift */
struct longlong {  /* Structure for systems that don't support long long */
 long lo;
 long hi;
 };
typedef struct longlong Big_int;
 /* Grab a mantissa from a floating point number */
 static long extract_mantissa(FP_number F){
 Big_int b;
 int cnt, rbit;
 cnt = BS -1 - F->Exp; /* Determine how far to shift to find exponent */
 if (cnt >= LS){ /* If shifting at least a longword */
 cnt -= LS; /* subtract longword length from shift count */
 if (cnt == 0)
 rbit = (long)b.lo < 0; /* Guard bit needed? */
 b.lo = b.hi;
 b.hi = 0;
 }
 if (cnt){ /* Anything left to shift? */
 rbit = (b.lo & (1L << (cnt-1))) !=0;
 if (cnt == LS){ /* Some machines can't shift by long word length */
 b.lo = 0; /* Defend against this. Result would be zero */
 b.hi = 0;
 }
 else{
 b.lo = ((unsigned long)b.lo >> cnt) | (b.hi << (LS-cnt));
 b.hi = (unsigned long) b.hi >> cnt;
 }
 }
 }

[LISTING THREE]

#ifndef Process_h
#define Process_h

#include "addrvect.h"
#include "disassem.h"
#include "itemnumb.h"
#include "linklist.h"
#include "memacces.h"
#include "modulemg.h"
#include "stmt.h"
#include "symbolta.h"
#include "tempbkpt.h"


typedef int Outcome;
#define user_failure -1
#define failure 0

#define success 1

class Declaration;
class DiFile;
class EventMgr;
class ExecSpec;
class ExprObj;
class StackFrame;
class Status;
class Thread;

enum PSA1 { psa1_none,
 psa1_replace_breakop,
 psa1_replace_libpt,
};
enum PSA2 { psa_none,
 psa_run_to_retaddr,
};
class Process : public ItemNumber {
 Key key;
 Thread * current; // 0 in ctor
 LinkList threadlist;
 TempBkptMgr tempbkptmgr;
 EventMgr * eventmgr; // set in ctor
 MemAccess memaccess;
 Disassembler disassembler;
 Symboltable symboltable;
 ModuleMgr modulemgr;
 AddrVector destvector;
 Stmt startstmt;
 Boolean now_executing;
 DiFile * target_difile;
 PSA1 psa1;
 PSA2 psa2;
 char *current_lang; // Temporary hack to be able
 // to set the language.
 Thread * lookup_thread( unsigned int );

 Outcome check_state();
 Outcome set_execspec( const ExecSpec & );
 Boolean goal_attained();

 Outcome start_step_into( Thread * );
 Outcome start_step_over( Thread * );
 Outcome start_step_retaddr( Thread * );
 Outcome start_run( Thread * );

 Outcome analyse_stmt_into( Boolean & );
 Outcome analyse_stmt_over( Boolean & );

 Outcome run_to_return_addr( Thread * );
 Outcome execute_to_return_addr( Thread * );
 Outcome run_to_caller( Thread * );

 Outcome end_instr_step_into( Thread * );
 Outcome end_instr_step_over( Thread * );
 Outcome end_stmt_step_into( Thread * );
 Outcome end_stmt_step_over( Thread * );
 Outcome end_run( Thread * );


 Outcome check_instr_step_into( Thread * );
 Outcome check_instr_step_over( Thread * );
 Outcome check_stmt_step_into( Thread * );
 Outcome check_stmt_step_over( Thread * );
 Outcome check_run( Thread * );

 Outcome respond_to_hop_completion( Thread * );
 Outcome respond_to_retpoint( Thread * );
 Outcome respond_to_destpoint( Thread * );
 Outcome respond_to_breakpoint( Thread * );
 Outcome respond_to_watchpoint( Thread * );
 Outcome respond_to_exception( Thread * );
 Outcome respond_to_step_completion( Thread * );
 Outcome respond_to_suspension( Thread * );
 Outcome respond_to_exec( Thread *, DiFile * );
 Outcome respond_to_libpt( Thread * );

 Outcome respond_to_module_load( const Status & );
 Outcome respond_to_module_unload( const Status & );
 Outcome respond_to_thread_creation( const Status & );
 Outcome respond_to_thread_destruction( const Status & );

 Outcome start_stmt_step_into( Thread * );
 Outcome start_stmt_step_over( Thread * );
public:
 Process( unsigned int, DiFile * );
 ~Process();

 Outcome get_status( Status & );
 Outcome wait_status( Status & );
 Outcome update_status( const Status & );
 Boolean is_executing() { return now_executing; }

 Outcome run( const ExecSpec & );
 Outcome instr_step_into( const ExecSpec & );
 Outcome instr_step_over( const ExecSpec & );
 Outcome stmt_step_into( const ExecSpec & );
 Outcome stmt_step_over( const ExecSpec & );
 Outcome resume();
 Outcome stop();

 Thread * current_thread() { return current; }

 Outcome disassemble( const Addr &, Instruction &, Addr & );
 Outcome find_stmt( const Addr &, Stmt & );
 Outcome find_function( const Addr &, Declaration & );

 Outcome evaluate( char *, StackFrame &, ExprObj & );
 Outcome set_language(char *input_language);
 char *get_lang_string();
 Process * next() { return (Process *)Item::get_next(); }
};
#endif
End Listings







March, 1994
Unicode and Software Globalization


Software-design guidelines for international application development




David Van Camp


David is a senior consultant with Cap Gemini America. He specializes in
Windows, NT, and OS/2 development. You can contact him on CompuServe at
70323,3510.


Developers have traditionally produced software for the domestic U.S. market,
then remarketed it overseas after a few simple modifications. This practice,
however, is fast becoming unacceptable as more and more non-U.S. users expect
software to "speak" to them in their native tongue and conventions.
Furthermore, non-U.S. computer markets are growing rapidly, forcing software
developers to begin thinking internationally. Currently, a variety of
character standards and proprietary extensions are being used to address
application internationalization. In many cases, however, the result is often
inefficient code that's difficult to maintain, hard to port, and expensive to
produce--all problems the Unicode standard addresses.


The Unicode Standard


Unicode is an international character code standard which replaces single-byte
ASCII and ANSI, multibyte ANSI, and nearly every other character-code standard
currently defined. Unicode contains a one-to-one mapping of codes for ASCII
and the ISO 8859/1 Latin character set used by Microsoft Windows, NextStep,
and others. Additionally, Unicode contains a nearly complete superset of all
fixed-size and multibyte standards currently in use and provides for symbol
sets such as mathematical operators, technical symbols, geometric shapes, and
dingbats. The initial release of Unicode, in fact, contained 26,000
characters. More importantly, when producing code, the Unicode character
standard provides a global solution for application localization, which solves
many of the problems associated with multibyte support. Specifically, it
provides the ability to prepare your application code for supporting nearly
all languages with little or no modification, at least in terms of data
storage. (See the accompanying text box entitled, "The Unicode Consortium.")
The good news is that it's relatively simple to modify most applications to
transparently support multiple character standards. I now support Unicode
whenever possible in my applications. The degree of difficulty for supporting
Unicode depends on which and how many character standards were supported by
the original code, what future requirements are expected, and how much support
is provided by the underlying graphics engine and operating system.
Microsoft's NT operating system uses Unicode internally to represent all
characters and strings. In addition to long filenames and extended attributes,
NT supports Unicode characters in filenames. Consequently, it's possible for a
single filename to contain a mix of characters from hundreds of modern
languages and even a few archaic ones. Fortunately, Unicode prevents our
having to switch between character standards to process such filenames. This
simplifies many string operations.
For example, in the common problem of splitting a filename off the end of a
fully qualified path, you typically start at the rightmost character and
traverse back to the first backslash, colon, or the beginning of the string;
see Example 1.
For standards which use a variable number of bytes per character (such as
Shift-JIS), it's impossible to cycle backwards through a string. You can't
determine if any single byte is the first or second byte of a multibyte
character or if it's a single-byte character. Consequently, algorithms must be
rewritten to start at the left and work toward the right. But with Unicode (or
other fixed-size character-code standards), you can traverse backwards, saving
you from having to rewrite code for international support.
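With a fixed-size character type, the right-to-left scan can be written once and compiled for either character size. The following is a minimal sketch, with the TCHAR typedef from Figure 2 reduced to its ASCII case; the function name is illustrative, not from any API:

```c
#include <string.h>

typedef char TCHAR;  /* ASCII build; under UNICODE this would be wchar_t */

/* Return a pointer to the filename portion of a full pathname by
 * scanning backwards to the last backslash or colon. The backward
 * scan is valid only because every TCHAR is the same fixed size. */
const TCHAR *find_filename(const TCHAR *path)
{
    const TCHAR *p = path + strlen(path);
    while (p > path && p[-1] != '\\' && p[-1] != ':')
        --p;
    return p;
}
```

Under a Shift-JIS string the same loop would be wrong, since `p[-1]` might be the trailing byte of a two-byte character.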


Implementing Unicode Support


A basic technique for implementing Unicode is transparent-character support.
This is a technique I used to develop multiple tape backup applications for
NT. The goal of the project was to write and maintain a single, common set of
source code such that Unicode would be supported if the underlying operating
system provided sufficient support, and any other single-byte character set
would be supported otherwise. This support had to be as portable and
transparent as possible. Code which must specifically process strings of a
known character standard, therefore, had to be isolated to a few specific
files and code sections.
In order to achieve this goal, the following conditions had to be met:
Transparent characters, macros, and functions had to be used.
Operations which expect a specific byte size had to be avoided or
encapsulated.
I've discovered a number of nonobvious stumbling blocks in implementing
transparent-character support, particularly if you're converting a non-Unicode
application to either Unicode or transparent support. Also, you must consider
how to program today for a non-Unicode environment, while preparing for an
eventual migration to Unicode.
Multibyte character sets may be supported as well, but you must employ the
same localization techniques you previously used. Those techniques need not be
modified; you just follow the simple rules outlined here. However,
localization for multibyte character sets may reduce the efficiency of your
code.
Code written following the guidelines in Figure 1 will transparently support
Unicode, ASCII, or any other fixed-size character standard for all major
languages and cultures with only a minimal localization effort. The basic rule
of transparent-character support is that the size of a character code is
determined at compile time. If a preprocessor macro named UNICODE is defined
in your compiler's command-line arguments and the platform supports Unicode,
your application is compiled to support Unicode. Otherwise your application
will support any 1-byte character set supported by the platform. You need only
to define another macro and modify your header files to support another 2- or
4-byte character standard if your platform provides sufficient support for
that standard. If you do not compile for Unicode, the particular character
standard supported is determined by the strings contained in your
application's resources and the default code page selected by the user. This,
of course, presumes that all strings which require language translation are
placed in resource files. All modules should be compiled with Unicode
explicitly enabled or disabled and you should avoid mixing modules compiled
for different character sizes whenever possible.
Programs written to support multiple-character standards transparently require
a transparent-character type. Many assumptions typically made about character
variables are no longer valid. You must take care to ensure that code is truly
transparent and, if multiplatform support is required, portable. The solution
provided by transparent-character support is based on never presuming the size
of a character. Since a transparent character may be of any fixed size, a new
type is used, TCHAR, which represents a character of unknown (but fixed) size.
Other special types include:
LPTSTR, a pointer to a NULL-terminated transparent character string.
LPTCHAR, a pointer to a transparent character or array.
Figure 2 provides definitions of these types for Unicode and ASCII/ANSI.
You can't use the standard C char type to store transparent characters,
because char is the smallest atomic type, which is usually a byte. Neither can
you use the ANSI wchar_t type, since it's a "wide" character type that will
always be at least two bytes in size--too large for ASCII and other
single-byte standards.
Consequently, the first steps performed when converting your code for
transparent character support are:
1. Determine which data elements of type char, char *, or any other type
derived from those are text (and not binary) data.
2. Change all of those to use TCHAR, LPTSTR, or LPTCHAR.
These steps can be trivialized if you adopt and rigidly enforce the following
coding standard early in your product's development: Define a type, such as
BYTE, always use it to store binary data, and never store any
nontext data in a TCHAR. In the absence of such a standard, these steps can be
tedious and time consuming.


Common Problems with Pointer Arithmetic


While most character-pointer arithmetic is still valid, a number of widely
accepted coding practices will no longer work when using transparent
characters. These practices must be avoided or you may experience many wasted
hours correcting them.
One of the most common invalid assumptions is that the length of a string plus
one is equal to the number of bytes required to store it. This can be a
particularly annoying and sometimes difficult problem to deal with when
porting existing applications. You must learn to differentiate between string
lengths and buffer sizes, or serious programming errors will result. These
errors can be very difficult to resolve.
The next steps are usually easy, but can be difficult if a string length is
stored for later use or passed to other functions. The steps are:
Search for all calls to strlen(), and, if the return value is used to
determine a byte size, multiply by sizeof (TCHAR).
Search for all memory-allocation calls (malloc, GlobalAlloc, and so on), and
change those which allocate space for characters based on a character count to
use a byte length.
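The distinction between character counts and byte sizes shows up most often at allocation time. Here is a sketch of the corrected pattern, again with TCHAR reduced to its ASCII case for illustration (the helper name is hypothetical):

```c
#include <stdlib.h>
#include <string.h>

typedef char TCHAR;  /* wchar_t under UNICODE; the arithmetic is identical */

/* Duplicate a string, sizing the buffer in bytes rather than characters. */
TCHAR *dup_string(const TCHAR *src)
{
    size_t cch = strlen(src) + 1;               /* characters, incl. NUL  */
    TCHAR *copy = malloc(cch * sizeof(TCHAR));  /* bytes, not characters  */
    if (copy != NULL)
        memcpy(copy, src, cch * sizeof(TCHAR));
    return copy;
}
```

Compiled for Unicode, the same source allocates twice as many bytes with no edits.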
Array indexing and general pointer arithmetic to determine an offset usually
work fine, so changes are not normally required: *(achPtr+nIndex)=chSomeVal;.
In this example, the contents of the nIndexth entry of the array, achPtr,
would be assigned the correct value for any fixed-size character standard. As
a general rule, whenever dealing with operations involving transparent
characters and arrays, simply keep in mind that the size of each element may
be larger than a byte.

Another common violation is using a character as an index into a 256-element
array. Since a Unicode character is two bytes in length, you must either
increase the size of the array to 65,536, or modify your algorithm. Also,
remember that subtracting two byte (char *) pointers yields the number of
bytes between them, not the number of transparent characters. Always multiply by sizeof
(TCHAR) when calculating the byte size of a transparent character buffer.
Likewise, always divide by sizeof (TCHAR) when calculating the number of
transparent characters which can fit in a specified number of bytes.


Wide-character Functions


Just as Unicode and transparent characters required new types, a set of new
functions is needed to process them. It's critical that, for any function
employed to process single-byte characters and strings, you have an equivalent
wide-character function. Windows NT and the Microsoft C libraries provide a
good starting set of routines. For the Microsoft C library, all wide-character
string routines utilize one of two simple naming conventions to distinguish
them from standard ASCII variants. In the first convention, as specified by
the ANSI C standard, all ASCII string functions beginning with str have
wide-character equivalents which begin with wcs. For example, the
wide-character version of strlen is wcslen; in both cases, the function returns the
number of nonzero characters in a string.
The other naming convention used is to identify wide-character functions by
either prefixing or embedding a w in the function name. For example, the wide
version of printf is wprintf, while vswprintf is the wide version of vsprintf.
The header files included with the Microsoft C compiler for NT provide a
complete list of these functions. The Microsoft Win32 API uses a different
naming convention to distinguish ASCII and wide-character variants of the API
procedures. All ASCII/ANSI versions of the Win32 API end with the letter "A."
Similarly, all Unicode (or wide) versions end with a "W." Therefore,
CreateWindowA expects 1-byte characters, while CreateWindowW expects Unicode
characters. This is an excellent naming convention in that it is easy both to
remember the name of any character-specific routine, and to identify all
character-specific (that is, nontransparent) code at a glance.


Transparent-character Macros


What makes the Win32 naming convention particularly appealing is that, with
only a few exceptions, whenever you use a Win32 API procedure without an A or
a W suffix--that is, when you use the same name as the one used for Windows 3
applications--you are, by default, supporting transparent characters.
Consequently, code modifications to Windows API procedures usually aren't
required to globalize an existing application. The exceptions are those Win32
API file procedures considered obsolete but provided for backwards
compatibility to Windows 3; OpenFile and _lopen, for example. Consequently,
you must use the Win32 CreateFile family of functions to support Unicode
filenames. For C library functions, however, things aren't so easy. The
transparent naming convention for the C library replaces wcs with tcs and the
prefixed or embedded w with t to create transparent equivalents. To convert your code for
transparent-character support, therefore, you must replace all calls to these
string functions to use the transparent name instead. This usually entails a
great deal of work.
There is an alternate method which requires less work for most existing
applications. Instead of modifying the source code, you create a header file,
which maps the standard string function names to the wide-character
equivalents when compiling for Unicode, as in Example 2. By doing this for
each standard string function used in your code, you can often reduce the
number of changes required. However, this technique effectively removes all of
the standard character functions when you compile for Unicode. If you need to
process ASCII data, you'll have to write your own ASCII-specific replacements.
Some functions--those which take filenames, atoi(), and a few others--have no
wide-character equivalent at all; for each of these, you must implement the
wide version yourself or avoid using them.
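For a function such as atoi() that may lack a wide counterpart in your library, the wide version is simple to supply yourself. A minimal sketch follows (decimal digits and an optional sign only, no overflow checking; the name is illustrative):

```c
#include <wchar.h>

/* Minimal wide-character equivalent of atoi()/atol(): optional
 * leading sign, decimal digits only, stops at the first nondigit. */
long wide_atol(const wchar_t *s)
{
    long sign = 1, val = 0;
    if (*s == L'-') { sign = -1; ++s; }
    else if (*s == L'+') ++s;
    while (*s >= L'0' && *s <= L'9')
        val = val * 10 + (long)(*s++ - L'0');
    return sign * val;
}
```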


Mixing and Converting Character Standards


There are times when code must be written to support multiple character
standards, either for backwards compatibility or for transferring data between
platforms. Care must be taken when mixing transparent coding techniques with
character-specific data types, or the resulting code will be very difficult to
maintain.
Code that supports a specific character type is not transparent and should be
isolated from transparent code, preferably by placing the code in a separate
function. One method of handling this is to create two versions of your
function--one that accepts Unicode strings and another for ASCII
strings--along with a transparent mapping macro, as described earlier. An
alternate method is to write a single function that takes and returns
transparent characters. On entry to this function, the TCHAR parameters are
converted to the required character standard and processed. Any resulting
characters are then converted back to TCHAR, and the function returns; see
Example 3.
Also, notice how the routines use the standard ANSI C functions wcstombs and
mbstowcs, which convert single- or multibyte strings to wide-character strings
and vice versa, using the current code page for the multibyte string. Keep in
mind that a Unicode string converted using wcstombs may contain multibyte
characters, which can complicate processing.


Reading and Writing Unicode Text Files


Unicode contains a special character code, called the "byte-order mark," which
has a value of 0xfeff and is used to identify a text file or data stream as a
collection of Unicode characters. It is important to use this mark whenever
reading or writing text files. Additionally, this mark provides the
information necessary to transfer data between Big- and Little-endian systems.
Whenever a Unicode text file is created, or a stream of Unicode text is passed
to another application or system, it is important to ensure that the first
character written is the byte-order mark. Likewise, whenever you open a text
file, or receive a stream of text, you should check for this mark. Then, you
will need to convert the text as it is read to transparent characters before
your application can use it. For this reason, you should encapsulate all file
operations in a simple API.
The byte-order mark value was chosen because 0xfeff is highly unlikely to
occur in the first two bytes of any non-Unicode text stream. Also, its mirror
image, 0xfffe, which is not a Unicode character,
provides useful information for transferring data between Big- and
Little-endian systems, since their byte orders are reversed. Therefore, if a
text stream starts with 0xfffe, there is a good chance that it contains
byte-swapped Unicode text, so you will need to reverse the byte order of each
character as it is read.
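The check described above reduces to a small helper. Here is a sketch (the names are illustrative; 16-bit values are used directly rather than TCHARs so the byte order is explicit):

```c
#include <stdint.h>

enum bom_kind { BOM_NONE, BOM_NATIVE, BOM_SWAPPED };

/* Classify the first 16-bit value read from a text stream. */
enum bom_kind classify_bom(uint16_t first)
{
    if (first == 0xFEFF) return BOM_NATIVE;   /* byte-order mark, our order */
    if (first == 0xFFFE) return BOM_SWAPPED;  /* mirror image: swapped text */
    return BOM_NONE;                          /* probably not Unicode text  */
}

/* Reverse the two bytes of one character from a byte-swapped stream. */
uint16_t swap_bytes(uint16_t w)
{
    return (uint16_t)((w >> 8) | (w << 8));
}
```

A reader would call classify_bom on the first value, then run every subsequent character through swap_bytes when BOM_SWAPPED was detected.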


The Future of Unicode


The Unicode standard is still new and not yet complete. The few issues that
remain to be resolved will not significantly affect the development of
transparent code. However, another character standard which defines a superset
of the Unicode standard is under development--ISO 10646. Fortunately,
applications developed to meet the guidelines presented in this article will
require little or no modification to support this standard as well. ISO 10646
comes in two flavors: two-byte characters, which are essentially the same as
Unicode, and 4-byte characters, which will eventually represent every
character used in every known language. However, it's doubtful that ISO 10646
will ever become a common standard; it will likely be limited to
special-purpose systems.
Unicode will probably become the predominant standard for all future systems.
Unicode systems are currently under development by Microsoft, Apple, Novell,
Next, Taligent, Metaphor, and others. By using transparent-character support
in your applications, you will be able to exploit the benefits of these
systems as they become available and quickly enter new language markets with a
minimum of effort, while maintaining compatibility with current systems and
standards.
Example 1: Splitting off the filename from a pathname.
for (pszTemp=szFName+strlen(szFName)-1;
 pszTemp>szFName && *pszTemp != '\\' && *pszTemp != ':';
 --pszTemp);


Figure 1: Guidelines for transparent-character support.
1. Place all strings that require language translation in your application's
resources.
2. Declare all text (not binary) data using TCHAR, LPTSTR, LPTCHAR.
3. Use transparent functions and macros for all TCHAR-type data.
4. Avoid using ASCII string functions which have no wide-character equivalent.
5. Check calls to strlen() and multiply by sizeof(TCHAR) if result is used as
a byte size.
6. Ensure all memory allocations for strings are based on byte size, not
character count.
7. Check all transparent-character pointer arithmetic for validity; avoid
using characters as indexes to 256-element arrays; remember that subtracting
two byte (char *) pointers yields a byte size, not a character count; always
divide or multiply by sizeof(TCHAR) to convert between character counts and
byte sizes.
8. Isolate operations on specific character-code standards from transparent
code.
9. Use wcstombs() and mbstowcs() to convert between Unicode and
ASCII/ANSI-specific character codes.
10. Use the byte-order mark whenever processing text files and data streams.
11. Define UNICODE macro when compiling transparent modules for Unicode
support.
12. Avoid mixing modules compiled to support different default character-code
standards.


The Unicode Consortium



The Unicode Consortium is responsible for defining and promoting the Unicode
character standard. Consortium members include all the major international
computer systems providers: Adobe, Apple, Borland, DEC, Go, Hewlett Packard,
IBM, Lotus, Microsoft, Next, Novell, Sun, Symantec, Taligent, Unisys,
WordPerfect, Xerox, and others.
Unicode derives its name from its three main characteristics: universal--it
addresses the needs of world languages; uniform--it uses fixed-width codes for
efficiency; and unique--duplication of character codes is minimized.
The Unicode Consortium was formally incorporated in 1991 as a joint effort
between a number of the major computer software and hardware companies.
However, the Unicode standard first began at Xerox in 1985 when Huan-mei Liao,
Nelson Ng, Dave Opstad, and Lee Collins began working on a database to map the
relationships between identical Japanese and Chinese characters. This led to
a technique called "Han Unification," which Unicode uses to minimize the codes
it requires.
At the same time, discussions on the development of a universal character set
got underway at Apple. In September of 1987, Joe Becker of the Xerox group and
Mark Davis of Apple began discussions on multilingual issues with a new type
of character encoding as a major topic. Later that year, Becker coined the
term "Unicode."
The following year saw Collins working at Apple on Davis's new
character-encoding proposals. By February Collins had incorporated the basic
architecture for Unicode, and Becker presented it to a /usr/group
international subcommittee meeting in Dallas. Apple decided to incorporate
Unicode support into TrueType.
In February of 1989, bimonthly meetings to discuss Unicode began between these
companies; Sun, Adobe, Claris, and Pacific Rim Connections joined in. In
addition, Glenn Wright of Sun set up the unicode@sun.com mailing list on the
Internet for Unicode discussions. These discussions led to the decision to incorporate all
composite characters in existing ISO standards. In 1990, there was a flurry of
activity, culminating in December, when Becker presented Unicode at a UNIX
International meeting. Cooperating with Apple on TrueType, Microsoft became
interested in Unicode and assigned Asmus Freytag to attend meetings. By March
all non-Han work had been completed, and work began on cross mappings and
order. Meanwhile, Wright and Mike Kernaghan of Metaphor started the process of
incorporating the Unicode Consortium, and the work continues today.
Much effort has been expended to meet the demands of each language represented
by Unicode and each company and country involved in its development.
International politics have often played a significant part in the definition
of this standard. The result is a highly consistent and easily usable standard
which will be around for a long time.
For more information, or to become a member of the Unicode Consortium,
contact: Unicode Consortium Inc., 1965 Charleston Road, Mountain View, CA
94043. (Phone 415-961-4189, fax 415-966-1637, or Internet
unicode-inc@HQ.M4.Metaphor.com.)
--D.V.C.
Figure 2: Type definitions for transparent characters.
#ifdef UNICODE // TCHAR's are Unicode
typedef wchar_t TCHAR;
typedef wchar_t * LPTSTR;
typedef wchar_t * LPTCHAR;
#else // TCHARS are ASCII/ANSI
typedef char TCHAR;
typedef char * LPTSTR;
typedef char * LPTCHAR;
#endif


Example 2: Portion of header file which maps the standard string function
names to the wide-character equivalents.
#ifdef UNICODE
#define strlen wcslen
#define sprintf swprintf
#endif

Example 3: Converting between transparent and ASCII characters.
VOID MyFunction ( LPTSTR pszXparent, int cchMaxBuf )
{
 char * pszMbyte;
 /* convert transparent parameter to ASCII if necessary */
#ifdef UNICODE
 /* allocate a buffer for the ASCII conversion of pszXparent */
 pszMbyte = malloc ( cchMaxBuf * sizeof (TCHAR) );
 wcstombs ( pszMbyte, pszXparent, cchMaxBuf * sizeof (TCHAR) );
#else
 /* string is already ASCII/ANSI so just use it directly */
 pszMbyte = pszXparent;
#endif
 /* perform ASCII-specific operations.... */
 . . .
 /* then convert the result back to a transparent string... */
#ifdef UNICODE
 mbstowcs ( pszXparent, pszMbyte, cchMaxBuf );
 free ( pszMbyte );
#endif
 return;
}











March, 1994
Writing Non-SCSI CD-ROM Device Drivers


NT doesn't support non-SCSI drives--but you can!




Sing Li


The author is a product architect with Media Synergy in Toronto. He develops
for embedded systems, GUI-portability platforms, and device drivers. You can
contact him at 70214.3466@compuserve.com.


One advantage of SCSI interfaces is their high data-transfer rate--anywhere
from 5 to 20+Mbytes/second. But when you consider the 300 Kbytes/second
data-transfer rate typical of CD-ROM drives, you wonder, "Why bother?" The
answer is, of course, compatibility. Although usually more expensive and often
slower, standard interfaces such as SCSI do allow you to mix-and-match CD-ROM
drives. Software development is also easier because you only support one
standard. This explains, in part, why the initial release of Microsoft Windows
NT and IBM OS/2 2.x only supported SCSI-based CD-ROM hardware.
However, there are still times when you'll want to use non-SCSI CD-ROM drives
with these operating systems. For one thing, non-SCSI CD-ROM drives often sell
for several hundred dollars less than their SCSI cousins. In other instances,
you may already have a number of proprietary-interface drives and may not want
to reinvest in new ones that provide essentially the same capabilities.
Whatever the case, I'll present in this article a non-SCSI CD-ROM device
driver for NT. But first, some background on CD-ROM formats and NT device
drivers.


CD Physical and Logical Format


The physical format of audio CDs (detailed in the "Red Book" from Philips) is
commonly referred to as the "CD-DA," or compact-disc digital-audio format. A
CD-DA disc is divided into sectors, each containing 2352 bytes of audio data.
Additionally, each sector contains 784 bytes of error-correction data and 98
control bytes. On all CD drives, these 784+98 bytes are handled exclusively by
the read hardware.
During audio play of all audio CD drives and most CD-ROM drives, Red Book
audio data is fed directly to decoding digital-to-analog conversion hardware.
Consequently, direct computer reading of Red Book audio from audio CDs usually
isn't possible. Some CD-ROM drive manufacturers make drives capable of
directly accessing Red Book audio. This gives you the capability to enhance,
manipulate, and edit commercial music.
While the Red Book describes the CD-DA format, the "Yellow Book" from
Philips/Sony describes the recording format for digital computer data on
CD-ROMs. Data on Yellow Book sectors is laid out in two "modes" based on the
2352 data bytes available on a CD-DA sector. Most currently available CD-ROM
titles are recorded in Mode 1, which specifies 2048 bytes of user data, 288
bytes of error-detection/correction data, and 16 bytes of synchronization and
header information; see Figure 1(a). Mode 1 sectors can be recorded on a disc
containing CD-DA audio--a "mixed-mode disc." The first "audio track" of a
mixed-mode disc is reserved for computer data recorded in Yellow Book Mode 1
sectors. Many popular multimedia titles, including Loom, 7th Guest, and Just
Grandma and Me, use mixed-mode format to incorporate professionally recorded,
stereo-CD audio tracks.
Mode 2 format is the base format for CD XA (Extended Architecture) discs. With
this mode, 2336 bytes are available for user data on each sector, meaning
that the 288 bytes of Mode 1 error-detection/correction data are removed.
Since the chance of an uncorrectable error is higher, this mode is seldom used for raw
computer data. Potentially, it can be used to store raw audio, image, or video
information in which single-bit errors aren't critical. The XA format builds
on this mode, making it more robust for handling data; see Figure 1(b).
The CD XA specification, specifically designed for multimedia, is an extension
to the Yellow Book, although hardware (CD-ROM drives and controllers) that can
handle the standard is currently scarce. In the future, Kodak Photo CD,
Philips Video CD, and the MPC 2 standard may create more demands for CD-ROM
titles recorded in XA format. On top of a Yellow Book Mode 2 sector, the CD XA
standard can record data onto the 2336 available user data bytes in two
different forms. Form 1 has 2048 bytes of data, with an 8-byte leading
subheader, and 280 bytes of error-detection/correction information; see Figure
1(c). Form 1 is typically used for storing computer data on XA discs and is
similar to the Mode 1 format, with 2048 bytes of error-corrected data. The
Form 2 specification replaces the 280 bytes EDC/ECC of Form 1 with only four
bytes of error-detection data; see Figure 1(d). Lacking robust
error-correction capabilities, Form 2 sectors are typically used for audio or
video data only. The XA standard calls for ADPCM compression of audio data,
and the playback is usually assisted by decoding hardware.
On an actual XA CD disc, Form 1 and Form 2 sectors can be interleaved--this is
the XA's key advantage for multimedia applications. XA provides for
interleaving of data with audio or video, allowing playback of high-quality
multimedia content directly off the CD-ROM without expensive hardware
configurations. Distinction between sectors of the two forms is typically
handled via hardware by scanning the 8-byte subheader.
Since Mode 1, Mode 2, and XA sectors are based on the Red Book sector format,
it's theoretically possible to use an audio CD mechanism for CD-ROM purposes.
Some early CD-ROM drives took advantage of this. These designs quickly became
obsolete because of the slow audio-mechanism access time (typically 600
milliseconds to 1 second) and the intensive software decoding necessary to
extract data. Almost all CD-ROM drives use custom-designed CD-ROM mechanisms
with fast seek times and hardware data detection.
The logical format of a CD-ROM specifies how files and directory structures
are laid out over the physical sectors. The most commonly used logical format
is the ISO-9660 standard. As a cross-platform standard, ISO-9660 CD-ROMs can
be read by DOS, Macintosh, VMS, and UNIX workstations. Consequently, ISO-9660
inherits all the restrictions of the differing file systems, the most severe
being the eight-character filename and eight-level directory restrictions.
Since this is especially troublesome for UNIX, several UNIX-based companies
have proposed an extension to ISO-9660 standard--the Rock Ridge Extensions.
(For more on the Rock Ridge Extensions, see Lynne Jolitz's "Extending
Standards for CD-ROM" in the July 1993 issue of DDJ.)
Full ISO-9660 provides enhancements including security, multivolume sets, and
multiple-extent files (interleaving). In practice, most operating-system
extensions (such as MSCDEX on MS-DOS) don't currently support these features.
Consequently, few published CD-ROM titles use these enhancements.
Kodak introduced the Photo CD as its flagship product in the popular digital
photography market. Anyone can get up to 100 photo prints stored on a digital
Photo CD. Each picture is stored in six different resolutions, the highest
approaching the definition of the original 35mm negative. Besides direct
viewing of the photos on television, Photo CD can also be used to make prints
and even to recreate the original negative. Photo CD discs can be read on
XA-compatible CD-ROM drives, Philips CD-I systems, or consumer Photo CD
players.
The Kodak Photo CD format is compatible with the CD XA specifications. The
actual format is called "CD-I Bridge," and it is proprietary to Philips and
its licensees. Photographs are digitized via high-resolution scanners and
stored on the Photo CD in ISO-9660 format over CD XA Mode 2 Form 1 sectors.
The actual disc medium is a writable compact disc. Unlike injection-molded
"silver" CD-ROMs, Kodak Photo CDs are made with a gold substrate on pregrooved
discs writable via a modulated high-power laser beam.
Multisession recording lets you add new files (photos) to the write-once
medium each time you bring the Photo CD back to your photo finisher. Since the
medium is not rewritable, the directory structure of the original disc can't
be changed. Instead, on a multisession Photo CD, a new directory structure for
the entire disc is created as each new session is written to the disc.
Multisession-compatible CD-ROM drives will always seek out and use the latest
directory structure when referring to the contents of the disc; noncompatible
drives can read only the contents of the original session.


NT Device Drivers


As Figure 2 illustrates, NT is a layered, packet-driven system in which every
I/O request is represented as an I/O request packet. Since NT supports
multiple operating-system environments via subsystems, I/O must be flexible
and generic enough to handle demands from native NT, 16-bit Windows, DOS,
OS/2, and POSIX applications.
To simplify device-driver design for layered device drivers, NT uses Filter,
Class, Port, and Miniport drivers. Devices such as CD-ROM drives have common
requirements which can be abstracted. The only code that absolutely needs to
be rewritten for each device is the device-specific commands and protocol
handling. Under NT, monolithic drivers (which handle a single device
type/model or a list of specific devices, each requiring its own specific
driver code) can be divided into cooperating driver layers. For
example, to control a SCSI CD-ROM device, you have a SCSI CD-ROM Class driver
that handles generic, data-related CD operation common to all CD-ROM drives.
To perform its task, the SCSI CD-ROM Class driver calls on the SCSI Port
driver, which provides a high-level generic SCSI command send/retrieval
interface, shielding any adapter-specific details from the Class driver. The
Port driver also understands the SCSI protocol and its implementation. To
control the device through a SCSI bus, the Port driver calls on the
adapter-specific SCSI Miniport to implement the low-level send, signal
control, and synchronization tasks that must be individually designed for each
adapter. Additionally, the SCSI Miniport driver makes use of services provided
by the SCSI Port driver to carry out its duties. In this way, the SCSI
Miniport is wrapped and will be transparent to changes within the SCSI Port
driver and the operating system. Figure 3 illustrates the relationship between
the layered drivers.
The SCSI Port and Miniport driver stack can service any SCSI device other than
CD-ROM, including hard disks, tapes, and the like. The CD-ROM driver only has
to be written once for any combination of SCSI CD-ROM drive and adapter. In
actual implementation, each SCSI CD-ROM drive may have its own unique features
not found in other devices. This is especially true when it comes to nondata
command handling.
Since the SCSI interface was designed to handle storage devices, the
storage-device management command has become a de facto standard, and most
CD-ROM devices comply with it. Extension commands (such as audio control,
multi-session handling, and so forth) are a different story, as most drive
manufacturers have their own extensions. Fortunately, the SCSI CD-ROM Class
driver can handle this. However, this requires modifying the SCSI CD-ROM Class
driver each time a new drive type is added. This approach is impractical since
Microsoft retains ownership of the SCSI CD-ROM Class driver under NT. Thus was
born the "CDAudio Filter driver," a collection of exceptions implemented as a
middle-layer device driver and presented in this article. The function of the
Filter driver (which sits between the I/O manager and the SCSI CD-ROM Class
driver) is to recognize and intercept any device-specific commands and process
them without passing them to the SCSI CD-ROM driver.
The upcoming SCSI 2 standard addresses the audio-control problem by defining
audio-control commands. This means that SCSI 2-compliant CD-ROM drives will
respond to the same set of audio-control commands, allowing such extensions to
be implemented in the SCSI Port driver and freeing the SCSI CD-ROM Class and
CDAudio Filter drivers from the task.
The most obvious approach (and that recommended by Microsoft) is to hook into
the layers at the lowest level. This means writing a Miniport driver to
receive the SCSI command, decode it, handle the request, repackage the
response into SCSI status blocks, and transmit it back to a SCSI Port
driver--a software emulation of a SCSI adapter. While inefficient, this
approach does have the advantage of operating-system independence. This is the
approach I'll describe here.


The Driver Program


This driver handles data-read operations for ISO-9660 Mode 1 discs and is
sufficient for installing Windows NT from the distribution CD-ROM. For this
purpose, the driver will also assume that the CD-ROM is inserted into the
drive before the driver is started, and that there'll be no CD changes during
the operation of the driver.
The Windows NT installation procedure consists of an initial text setup during
which the user boots the setup floppy disk, which in turns loads in the
minimal keyboard, mouse, video, machine/HAL, disk, and SCSI drivers. This is
where you install this CD-ROM driver. TXTSETUP.OEM, the file which provides
the NT setup program with information concerning the Miniport driver, is an
ASCII script file included with the Windows NT DDK.
Once the drivers are loaded, the setup procedure lets you prepare the hard
disk and select location for installation of Windows NT. After this, the setup
program copies a working set of files from the CD onto the Windows NT
partition. My CD-ROM driver will also be copied during this process. Upon
completion, the hard-disk version of Windows NT can be booted to continue the
second phase of installation where NT is booted from the hard disk and the
driver is loaded in the operating system. The setup program then prompts you
to select the remaining installation option. Finally, the rest of the NT
system files are copied from the CD onto the hard disk.
The driver consists of CDNOW.C (Listing One, page 110), which contains common
device-independent code that is identical for any CD-ROM drive, and
CDREAD.C, which contains code specific to the drive. In this case, the code is
specific to the Panasonic/MKE CR522b drive. This approach isolates the code
that must be changed when adapting the code for other drivers so that only
CDNOW.H and CDREAD.C need to be modified. The Panasonic CD-ROM drive
controller is typically configured as four I/O port locations which map into
the registers in Table 1. The drive and controller support only the polling
mode of data transfer. CDREAD.C and other relevant files and programmer notes
are available electronically; see "Availability," page 3.
To initially determine where the drive is installed, the CDREAD.C routine
LocateAdapterSpecific() scans all the legal I/O locations, looking for the
actual address of the drive. It then allocates the address from NT before
accessing it, and frees it if the drive isn't found at a specified location.
CDNOW.C contains routines which follow the classic SCSI Miniport format.
Microsoft provides various Miniport source-code examples with the Windows NT
DDK which can be used as reference. Since we aren't actually dealing with a
SCSI device, we have to provide emulation at the appropriate spots. By
design, we'll only deal with data reads; other function requests are left
unimplemented.
When Windows NT loads the driver, DriverEntry is called. For a SCSI Miniport
driver, the routine initializes the function tables with the Initialization(),
StartIo(), Interrupt(), FindAdapter(), and ResetBus() function pointers. It
also allocates and initializes a private area in the device extension for
storage of private data. The last thing the routine does is call
ScsiPortInitialize() (supplied by the SCSI Port driver) and return the status.
When ScsiPortInitialize() is called, the Port driver enumerates the adapters,
resets them, and locates all the devices on each adapter. It does this by
calling the functions registered in the functions table. FindAdapter
initializes required system data structures, and calls findAdapterPort() which
in turn calls LocateAdapterSpecific() in CDREAD.C to locate the adapter.

Once the adapter is located, NT calls StartIo() with various SCSI requests.
These come in the form of SCSI Request Blocks (SRBs). Within StartIo(), we
must emulate the action of a SCSI host-bus adapter and a connected SCSI CD-ROM
drive.


SCSI Emulation


For the purposes of this driver, it's sufficient to handle Bus Reset, Inquiry,
Read Capacity, and Read Data requests. The current Bus Reset function does
nothing, but may be modified to reset the CD-ROM drive. For the Panasonic
drive, this isn't necessary since the drive is reset via hardware each time
the system restarts.
In handling the Inquiry request, you must emulate the existence of a CD-ROM
drive at a specific SCSI target ID. In the emulation, the target ID of 3 is
used. For Read Capacity requests, return a hardcoded capacity for a CD-ROM
disc. Since not all drives support the read-capacity command, you don't have
to call Panasonic's read-capacity command.
For Read Data requests, ReadCDRom() is called. This decodes the read request
and converts it to a form usable by CDRead() in CDREAD.C. To keep the driver
design simple, each read request is completed immediately. The CDRead()
function issues the command, waits until the data becomes available, and fills
the data buffer with the requested sector before returning.
Every request that we handle in StartIo() is completed synchronously. In
situations where interrupts and/or DMA are used, it's possible to return a
request pending status that allows the system to proceed with other tasks
while data is being read. In these cases, the interrupt-service routine must
queue a deferred-processing routine where the data can be transferred into the
buffer area and the request packet can be completed via
ScsiPortNotification(). Queuing of requests is handled by the I/O Manager/SCSI
Port driver in these cases.


Conclusion


The driver presented here is for installing the operating system. To modify it
for normal data access, device-state changes must be tracked and handled
carefully. Requests, other than data read from higher-level software, must
also be considered.
For example, all audio functions can be implemented. To track device-state
changes, there are a total of five different types of popular CDs: audio only,
data only, data and audio mixed mode, XA single session, and XA multisessions.
The driver should theoretically handle all changes from one CD to another, as
well as changes between CDs of the same type. This gives a total of 120
possible state changes. To code a robust driver, all 120 cases
should be handled and tested individually. In reality, typical drivers handle
only a couple of dozen cases via ad hoc design.
SCSI Miniport emulation is only one way of implementing the driver. It's also
possible to replace the entire SCSI Filter/Class/Port/Miniport stack with a
single monolithic driver, or to design your own CD class/IDE port/IDE miniport
driver stack. Such designs can be much more attuned to the actual hardware
controlling the device, and stand to gain in both configuration flexibility
and performance.
Figure 1: (a) Yellow Book Mode 1 Sector; (b) Yellow Book Mode 2 Sector; (c) CD
XA Form 1; (d) CD XA Form 2.
(a) 16 bytes Sync Header + 2048 bytes User Data + 288 bytes EDC/ECC data =
2352 bytes

(b) 16 bytes Sync Header + 2336 bytes User Data = 2352 bytes

(c) 8 bytes subheader + 2048 bytes data + 280 EDC/ECC data = 2336 bytes

(d) 8 bytes subheader + 2324 bytes data + 4 bytes EDC data = 2336 bytes


 Figure 2: NT I/O architecture.
 Figure 3: The layered NT device driver.
Table 1: Panasonic CD-ROM drive I/O port registers.
 I/O Port Read Write

 Base Port Status code register Command register
 Base Port+1 Hardware status bits Not used
 Base Port+2 Data register Hard reset
 Base Port+3 Not used Configuration


[LISTING ONE] (Text begins on page 133.)

/* CALLGATE.H */
typedef DWORD (FAR PASCAL *GATEPROC)(WORD svc, WORD cnt, DWORD extra);

/* SeeYouAtRing0 services */
#define Get386_Svc 0 //get system info
#define PhysToLin_Svc Get386_Svc + 1 //map phys to linear
#define Register_Hwnd_Svc PhysToLin_Svc + 1 //register HWND
#define Unregister_Hwnd_Svc Register_Hwnd_Svc + 1 //unregister HWND
#define StopVM_Svc Unregister_Hwnd_Svc + 1 //toggle DOS box exec
#define RemapGate_Svc StopVM_Svc + 1 //remap call gate

typedef struct { /* call gate procedure parameters */
 DWORD G_Dword; // Dword parameter
 WORD G_Word; // Word parameter
 WORD G_Svc; // service number
}GPARAM;

/* RingoInit functions */

#define EXITRINGO 0xFFFF
#define INITRINGO 0

[LISTING TWO]

/* RINGO.C -- excerpts */
//#define CALLGATE_386 //define CALLGATE_386 to get gates from CALLGATE.386

#include <windows.h>
#include "386.h"
#include "callgate.h"

#ifdef CALLGATE_386
GATEPROC GetFirstCallGateVxD (FARPROC entrypoint,BYTE paramcount);
void DestroyInitGateVxD (WORD callgateselector);
#endif

VOID WINAPI RingoInit(void);
VOID WINAPI SeeYouAtRing0(void);
VOID WINAPI MakeSureOurSegIsInMemory(void);
GATEPROC GetLdtRing0CallGate (FARPROC entrypoint,
 BYTE paramcount,WORD callgate);
GATEPROC GDT_Gate,LDT_Gate;

int FAR PASCAL LibMain ( HANDLE hInstance, WORD wDataSeg,
 WORD cbHeapSize, LPSTR lpszCmdLine )
{
 FARPROC ri = (FARPROC) RingoInit;

 if (!(GetWinFlags () & WF_ENHANCED)) /*VxDs exist in enhanced mode only*/
 return 0;

#ifdef CALLGATE_386 // get the GDT call gate from CALLGATE.386
 if (!(LDT_Gate = GetFirstCallGateVxD (ri, sizeof(GPARAM)/4)))
#else // get the LDT call gate with INT 2F, AX=168A
 if (!(LDT_Gate = GetLdtRing0CallGate (ri, sizeof(GPARAM)/4, 0)))
#endif
 return 0;

 /*** get the main call gate in GDT ***/
 GDT_Gate = (GATEPROC)LDT_Gate (INITRINGO, sizeof(GPARAM)/4,
 (DWORD)SeeYouAtRing0);
 if (cbHeapSize)
 UnlockData (0);
 return (1);
}

char vendor[] = "MS-DOS"; // Microsoft's signature

GATEPROC GetLdtRing0CallGate (FARPROC gproc, BYTE params,WORD gatesel)
{
#define VENDOR_SPECIFIC_API 0x168a
WORD ldt_map; // LDT selector, which maps LDT itself
WORD (far * entryp)(void); // entry point to get the above
LPCALLGATEDESCRPT CGateDescriptor; // build call gate descriptor with this
WORD RW_ldt_map; /* ldt map selector fixes segment read-only problem */
WORD CGateSelector; // to be a call gate selector
DWORD initgate_flat; // callgate procedure's linear address


 _asm {
 mov si, offset vendor
 mov ax, VENDOR_SPECIFIC_API
 int 2fh
 or al, al
 jnz no_vendor
 mov word ptr [entryp], di /* private entry point */
 mov word ptr [entryp+2], es
 mov ax, 100h /* magic number */
 }

 ldt_map = entryp(); /* returns LDT map selector */

 _asm jnc vendor_ok
no_vendor:
 return 0;

vendor_ok:
 // When run under SoftICE/W LDT alias returns read_only, give us a good one
 if (!(RW_ldt_map = AllocSelector(SELECTOROF((void FAR *)&GDT_Gate))))
 return 0;
 SetSelectorBase(RW_ldt_map, GetSelectorBase(ldt_map));
 SetSelectorLimit(RW_ldt_map, GetSelectorLimit(ldt_map));
 if ((CGateSelector = gatesel) == 0) // we might already have one
 if (!(CGateSelector = AllocSelector(0))) // Get a selector for the gate
 {
 FreeSelector (RW_ldt_map);
 return 0;
 }

 // create a pointer to write into the LDT
 CGateDescriptor = MAKELP(RW_ldt_map,CGateSelector & SELECTOR_MASK);

 // build 32-bit ring 3-to-0 call gate
 #define MK_LIN(x) (GetSelectorBase(SELECTOROF(x)) + (DWORD)OFFSETOF(x))
 initgate_flat = MK_LIN(gproc);
 CGateDescriptor->Offset_0_15 = LOWORD (initgate_flat);
 CGateDescriptor->Offset_16_31 = HIWORD (initgate_flat);
 CGateDescriptor->Selector = 0x28; // ring0 flat code seg
 CGateDescriptor->DWord_Count = params & CALLGATE_DDCOUNT_MASK;
 CGateDescriptor->Access_Rights = GATE32_RING3; //pres,sys,dpl3,32CallGate
 FreeSelector (RW_ldt_map); // don't need you any more
 return ((GATEPROC)MAKELP(CGateSelector,0));
}

DWORD WINAPI _export MapPhysToLinear (DWORD physaddr, WORD mapsize)
{
 return (GDT_Gate)(PhysToLin_Svc,mapsize,physaddr); /* DPMI alternative */
}

[LISTING THREE]

;;; RINGO.INC -- excerpts

GPARAM struc ; parameters
 G_Dword dd ?
 G_Word dw ?
 G_Svc dw ?
GPARAM ends

CALLGATE_FRAME struc ; stack frame at the time of ring transition
 CG_pushbp dd ?
 CG_Old_EIP dd ? ; this is where we came from
 CG_Old_CS dd ? ; and will get back
 CG_Params db (type GPARAM) dup (?) ; call gate parameters
 CG_Old_ESP dd ? ; caller's
 CG_Old_SS dd ? ; stack
CALLGATE_FRAME ends

BuildGateStackFrame macro dataseg
 push ebp
 mov ebp,esp
 push gs
 push ds
 push es
 push fs
 push esi
 push edi
 ifidni <dataseg>,<_DATA>
 mov ax,ds
 mov gs,ax ; we'll access our data seg via gs
 endif
 mov ax,ss
 mov ds,ax ; ring 0 flat data delector
 mov es,ax
 mov fs,ax
 ifdifi <dataseg>,<_DATA>
 call GetRingoGdtDataSel
 endif
endm

ClearGateStackFrame macro cleanup
 pop edi
 pop esi
 pop fs
 pop es
 pop ds
 pop gs
 pop ebp
 ret cleanup
endm

movoffs macro reg,off32 ; run-time fixup
 mov reg, offset &off32
 add reg,gs:[ringo_flat]
endm

[LISTING FOUR]

;;; CALLGATE.ASM -- excerpts

.386p
 include vmm.inc
 include ringo.inc
 include 386.inc
public RingoInit,SeeYouAtRing0,MakeSureOurSegIsInMemory
_GATESEG segment dword use32 public 'CODE'
 assume cs:_GATESEG,gs:_DATA
RingoInit proc far

 BuildGateStackFrame _DATA
 cmp [ebp].CG_Params.G_Svc,EXITRINGOCALL
 jnz short @f
 call RingoExit ; deallocate everything we've got
 jmp short retini
@@: call RelocateRingo ; run-time relocation and fixups
 jc short init_ret
 call DynalinkTrick ; get the VxD chain root
 call InsertRingoDDB ; welcome to the VxD club
 call CreateRingoGDTGate ; GDT call gate to SeeYouAtRing0
retini: mov edx, eax ; prepare return values for the ring 3
 shr edx, 16
 ClearGateStackFrame <size CG_Params> ; clear both ring stack frames
RingoInit endp

SeeYouAtRing0 proc far ; The callgate service proc
 BuildGateStackFrame
 VMMCall Get_Cur_VM_Handle ; always helpful
 movzx eax, [ebp].CG_Params.G_Svc ; service dispatcher
 cmp eax,LASTSVC
 ja @f
 call gs:Gate_Service_Table[eax*4]
@@: mov edx, eax
 shr edx, 16
 ClearGateStackFrame <size CG_Params>
SeeYouAtRing0 endp

CreateRingoGDTGate proc
 movzx edx, word ptr [ebp].CG_Params.G_Dword ; offset16
 add edx,gs:[ringo_flat] ; fixup
 mov ax, cs ; VMM code selector
 mov cx, [ebp].CG_Params.G_Word ; parameter count
 and cx, CALLGATE_DDCOUNT_MASK ; make sure it's a reasonable number
 or cx, GATE32_RING3 ; call gate type
 call BuildCallGateDWords
 VMMCall _Allocate_GDT_Selector,<edx,eax,20000000h> ; undocumented flag
 ror eax,16
 ret
CreateRingoGDTGate endp

BeginProc DestroyGDTCallGate,public
 movzx eax,[ebp].CG_Params.G_Word
 VMMCall _Free_GDT_Selector,<eax,0>
 ret
EndProc DestroyGDTCallGate

BuildCallGateDWords proc
 movzx eax, ax
 shl eax, 16 ; selector
 mov ax, dx ; offset 0-15
 mov dx, cx ; offset 16-31 + type + count
 ret
BuildCallGateDWords endp

;****************************************************************************
; To get the VxD Base (VMM DDB ptr) we're using the undocumented fact that
; VMM's dynalink handler (considered a "fault 20h" in DDK spec parlance)
; returns it in ecx. The idea is to hook VMM fault 20h, call any VMM service
; to get our fault handler receive control, call VMM's dynalink directly,
; store ecx in a static variable, and hook fault 20h again, this time
; with fault handlers reversed.
;****************************************************************************

BeginProc DynalinkTrick
 mov esi, gs:[OurDynalinkHandler]
twice: mov eax, 20h
 VMMCall Hook_VMM_Fault ; install our handler
 mov gs:[OLD_DYNALINK_HANDLER], esi
 VMMCall Get_VMM_Version ; need one call get it executed
 cmp esi, gs:[OurDynalinkHandler]
 jnz twice
 mov eax, gs:[VXD_FIRST]
 ret
EndProc DynalinkTrick

Ringo_Dynalink_Handler proc
 call gs:[OLD_DYNALINK_HANDLER]
 mov gs:[VXD_FIRST], ecx ; DDB pointer
 ret
Ringo_Dynalink_Handler endp

PhysToLin proc ; physical to linear address mapping
 movzx ecx, [ebp].CG_Params.G_Word
 VMMcall _MapPhysToLinear,<[ebp].CG_Params.G_Dword,ecx,0>
 ret
PhysToLin endp

ringo_flat dd 0 ; run-time space base
OLD_DYNALINK_HANDLER dd 0
VXD_FIRST dd 0 ; VxD chain root
OurDynalinkHandler dd offset Ringo_Dynalink_Handler
Ringo_DDB VxD_Desc_Block <,,,1,0,,'Ringo   ',,offset RingoControlProc,,,,,,,>
Gate_Service_Table label dword
 dd offset Get386
 dd offset PhysToLin
 dd offset RegisterHWND
 dd offset UnregisterHWND
 dd offset StopVM
 dd offset RemapCallGate
_GATESEG ends
end
End Listings


March, 1994
The Advanced SCSI Programming Interface


Building SCSI support into your code




Brian Sawert


Brian is an independent programmer specializing in device drivers and software
for SCSI peripherals. He can be reached at bsawert@grdpnt.flagstaff.az.us or
on CompuServe at 72027,2143.


In the near future, as CD-ROM drives, optical drives, and scanners become
standard fare, SCSI devices will likely dominate the PC peripheral market. And
as the market for SCSI peripherals grows, so will the demand for software that
supports them.
But SCSI is not without its downside. SCSI protocol has generally been
complex, a clear-cut standard for SCSI PC hardware has yet to emerge, and DOS
doesn't offer direct support for SCSI. The Advanced SCSI Programming Interface
(ASPI), however, offers a straightforward way to build SCSI capability into
your code. In this article, I'll examine the ASPI standard and discuss how to
use ASPI to solve SCSI-support problems.


The Problem with SCSI


Although the SCSI standard defines a communications protocol and a command
set, the missing link for the DOS programmer is the interface to the SCSI bus.
If you tackle the problem through low-level hardware programming, you restrict
your software to a particular type or class of SCSI host adapter.
ASPI, proposed by Adaptec, defines a set of high-level functions for
communicating with devices attached to the SCSI bus. It's designed to protect
you from the pitfalls of hardware-level programming and provide a standard
interface to SCSI host adapters.


The Advanced SCSI Programming Interface


Adaptec, Trantor, and Future Domain all offer ASPI compatibility with their
host adapters, ensuring hardware independence and a broader market for
software written to the standard. Even unusual host adapters such as Trantor's
MiniSCSI parallel-port adapter offer ASPI support.
ASPI offers many advantages to the SCSI programmer. Because it is widely
supported, it provides hardware and platform independence. Furthermore, ASPI
manager software is available for DOS, OS/2, Novell NetWare, and other
operating systems. Best of all, ASPI provides a high-level function set that's
easy to use and can dramatically shorten your development time. ASPI hides the
inner workings of SCSI protocol from the programmer. The ASPI manager handles
the gritty details of arbitration, selection, and message passing, returning
status codes and sense data when appropriate. You still need a basic knowledge
of SCSI structures and protocol to use ASPI effectively, but it can shorten
your learning curve by making experimentation easy.
ASPI does have its shortcomings. For example, the ASPI manager does not return
a transfer count. If your device does not support the illegal-length indicator
(ILI) and residual count in the sense data, there's no way to check how much
data was transferred. Some ASPI managers fail to recognize certain devices,
restricting the operations you can perform on them. And the number of
different status codes returned by an ASPI call requires additional error
checking by the calling program.


How ASPI Works


Under DOS, ASPI takes the form of a character device driver. The driver
installs the ASPI manager--a device named SCSIMGR$. At startup, the ASPI
manager polls the SCSI bus, looking for attached devices. A device must be
turned on before bootup in order for ASPI to recognize it. The only time you
access the ASPI driver through DOS is when you initialize it. At other times,
a far call to the ASPI-manager entry point gives access to the ASPI function
set. This makes ASPI functions safe to use from within a device driver or
memory-resident program.
Using ASPI requires knowledge of the SCSI command set and capabilities for the
device you wish to support. Keep in mind that ASPI provides interface
functions, not high-level SCSI functions. The ASPI manager merely passes data
through to the SCSI device without modifying or inspecting it. A point of
confusion sometimes arises because ASPI parameters such as transfer length are
in Intel (little-endian) order, while SCSI command-descriptor block (CDB)
parameters appear MSB first.
The ASPI function set is small, but powerful. Some of the functions return
information about the driver and environment, while others actually
communicate with the SCSI device. The ASPI specification defines the functions
in Table 1. Probably the most useful function in the ASPI set is Function 2
(execute SCSI I/O request). This function passes a SCSI CDB to the device,
handles data transfer, and returns messages and status codes. If the request
results in a Check Condition status, ASPI requests sense data from the device,
making it available to the calling program. Consider this function a direct
channel to the SCSI bus, as you will use it for most of your SCSI requests.
To initialize the ASPI manager, get a file handle to the ASPI driver by
issuing a DOS open call to the SCSIMGR$ device. Next, use a DOS IOCTL read to
obtain the ASPI entry-point address. This is a far address you call to execute
any of the ASPI functions. Last of all, close the file handle. You won't need
to access the driver through DOS again. Listing One (page 159) illustrates
these steps.
All ASPI functions use a SCSI request block (SRB). The SRB has a common 8-byte
header for each function, containing an ASPI command code (see Table 2),
host-adapter number, and request flags. The header also contains a status byte
that the ASPI manager uses to return the outcome of the ASPI request. Calling
an ASPI function requires filling in the proper SRB for the function, pushing
the far address of the SRB on the stack, and making a far call to the ASPI
entry point.
Functions that only access the ASPI manager return immediately. Functions that
communicate with the SCSI device may return with the status byte set to 0,
indicating that the request is still in progress. ASPI offers two methods for
determining whether a SCSI request has completed: polling and posting.
Polling, as the name implies, simply means periodically checking the status
byte for a nonzero value. Posting, on the other hand, tells the ASPI manager
to execute a post-processing routine when the SCSI request has completed. You
enable posting by setting the Post bit in the request flags and passing the
address of the routine in the SRB. The ASPI documentation recommends that
applications use polling, reserving posting for device drivers or TSR
programs, which may be interrupt driven.


SCSI I/O Under ASPI


The most complex function in the ASPI specification is also the most powerful.
Function 2 (execute SCSI I/O request) is the gateway to the SCSI bus. The SRB
for this function contains the data-buffer address and size, the target SCSI
ID, and the SCSI CDB. Function 2 is also the most awkward to use. In addition
to the ASPI status byte, it returns status codes for both the host and the
target device. It also returns sense data if the SCSI request produces any.
Finding the sense data can be tricky. It does not reside at a fixed offset in
the SRB, but appears immediately after the SCSI CDB. You must keep track of
the size of the CDB to locate the sense data. Once you are aware of these
issues, using this function becomes quite simple. Just fill in the SRB, call
the ASPI entry point, and watch the status byte until the request completes.
Listing Two (page 159) demonstrates how to call the ASPI I/O function using
polling.


ASPI Under Windows


Using ASPI under Windows is a bit more complex. Because the ASPI manager
expects real-mode addresses in segment:offset form, the protected-mode
selector:offset type of address will not work properly. You must pass
real-mode addresses for the SRB and data buffers.
The entry-point address the ASPI manager returns is also a real-mode address.
This means that under protected mode, you cannot call this address directly.
The solution to both problems is to use Windows' DPMI services, but there is a
catch here that may not be obvious: These services are not available to a
Windows application; you can only use them from within a DLL.
If you're starting to get discouraged, take heart. Accessing the ASPI manager
requires only three DPMI functions. Function 100h (allocate DOS memory block)
and function 101h (free DOS memory block) manage real-mode memory for the ASPI
SRB and data buffers. The Windows functions GlobalDOSAlloc and GlobalDOSFree
use these services, and they are easier to work with than the corresponding
DPMI functions. Be careful, however, that you don't use real-mode buffers too
freely. Allocating real-mode memory means taking it from the address space
below 1 megabyte, where memory is a limited resource.

The only DPMI function you must use directly is function 301h (call real-mode
procedure with far-return frame). Since the ASPI entry point is a real-mode
address, you risk a protection fault if you try to call it from within
Windows. Function 301h lets you call the entry point from protected mode,
although the procedure is somewhat cumbersome. The calling function must fill
a structure with register values for the real-mode procedure and set up a
stack for passing parameters. The DPMI server can provide a default stack, but
the size is limited. Stack usage may vary between ASPI drivers, so you may
wish to provide your own.
Listing Three (page 160) recasts the call to the ASPI entry point as a DLL
function. The call requires some inline assembly code and uses the DPMI
default stack for simplicity. The real-mode call structure duplicates the 32-
and 16-bit CPU registers. Setting up the call requires filling the CS:IP
fields in the structure with the real-mode address of the ASPI entry point.
Pass the pointer to the SRB on the protected-mode stack, setting CX to the
number of words pushed. The DPMI server will create a real-mode stack with
these parameters. Last of all, execute an INT 31h to call the procedure.
If you want to know more about DPMI functions, download the DPMI spec from the
CompuServe Intel forum. For examples of using Windows DPMI services, look in
the Microsoft Software Library on CompuServe.


Conclusion


The complete listings (available electronically, see "Availability," page 3)
for these examples include source code for an ASPI library and an application
to inventory the SCSI bus. For Windows programmers, the listings also include
source code for an ASPI DLL.
If you want to investigate ASPI further, Adaptec sells the ASPI Software
Developer's Kit, which contains the specification and programming guides for
DOS, Windows, OS/2, and NetWare, as well as utilities and sample code. (Call
Adaptec at 800-442-7274 or call their BBS at 408-945-7727.)
Table 1: ASPI functions.
 Function Description
 Function 0 Host-adapter inquiry
 Function 1 Get device type
 Function 2 Execute SCSI I/O request
 Function 3 Abort SCSI I/O request
 Function 4 Reset SCSI device
 Function 5 Set host-adapter parameters
 Function 6 Get disk-drive parameters
Table 2: The ASPI DOS specification.
 Command Code Description
 Command Code 0 (Host-adapter Inquiry) Returns information about installed
host adapters: the number of adapters installed and strings identifying the
ASPI manager and host adapter.
 Command Code 1 (Get Device Type) Returns the peripheral device-type code for
the device at a given SCSI target ID.
 Command Code 2 (Execute SCSI I/O Request) Executes a SCSI command, passing a
CDB to a device and managing data transfer and status return. Requests sense
data if a Check Condition status occurs.
 Command Code 3 (Abort SCSI I/O Request) Aborts a pending SCSI operation.
 Command Code 4 (Reset SCSI Device) Resets the device at a given SCSI target
ID.
 Command Code 5 (Set Host-adapter Parameters) Sets unique parameters for a
host adapter. Few manufacturers implement this function.
 Command Code 6 (Get Disk-drive Information) Returns information for a SCSI
disk drive: preferred head and sector translations and flags for INT 13h
support. This is a recent addition to the ASPI spec and is not widely
supported.
[LISTING ONE] (Text begins on page 154.)

// ------ Initializing the ASPI driver. -------
#include "aspi.h" // ASPI definitions and constants
#include "scsi.h" // SCSI definitions and constants
// -------------------- defines and macros --------------------
#define IOCTL_READ 2 // IOCTL read function
#define ASPI_NAME "SCSIMGR$" // SCSI manager driver name
// -------------------- global variables --------------------
void (far *ASPIRoutine) (aspi_req_t far *); // far address of ASPI routine
BYTE f_installed; // flag for ASPI existence

aspi_req_t *srb; // SCSI Request Block pointers
// ----------------------------------------------------------------------
// Routine to check for and initialize ASPI driver.
// Usage: int aspi_open(void);
// Returns nonzero on success, 0 if ASPI driver not found or init failed.
int aspi_open(void)
 {
 int dh; // ASPI driver handle
 if (!f_installed)
 { // not initialized
 if ((dh = open(ASPI_NAME, O_RDONLY | O_BINARY)) != -1)
 { // open device driver
 if (ioctl(dh, IOCTL_READ, (void *) &ASPIRoutine,
 sizeof(ASPIRoutine)) == sizeof(ASPIRoutine))
 { // got ASPI entry point
 srb = (aspi_req_t *) malloc(sizeof(aspi_req_t));
 if (srb != NULL)

 { // allocated SRB buffers
 f_installed++; // set installed flag
 }
 }
 close(dh); // close device driver
 }
 }
 return(f_installed);
 }
// ----------------------------------------------------------------------
// Routine to close and clean up.
// Usage: void aspi_close(void);
// Called with nothing. Returns nothing.
void aspi_close(void)
 {
 if (f_installed)
 { // already initialized
 free(srb); // deallocate buffers
 f_installed = 0; // clear installed flag
 }
 return;
 }


[LISTING TWO]
// ------ Calling the ASPI entry point and ASPI I/O. -------
#include "aspi.h" // ASPI definitions and constants
#include "scsi.h" // SCSI definitions and constants

// -------------------- defines and macros --------------------
#define BUSY_WAIT 0x1000 // repeat count for busy check
#define MIN_GRP_1 0x20 // minimum SCSI group 1 command
// -------------------- global variables --------------------
void (far *ASPIRoutine) (aspi_req_t far *); // far address of ASPI routine
BYTE f_installed; // flag for ASPI existence
BYTE aspi_stat; // ASPI status byte
BYTE host_stat; // host status byte
BYTE targ_stat; // target status byte
BYTE host_num; // host adapter number (0 default)

aspi_req_t *srb; // SCSI Request Block pointers
sense_block_t *srb_sense; // pointer to SRB sense data
// ----------------------------------------------------------------------
// Routine to call ASPI driver.
// Usage: int aspi_func(aspi_req_t *ar);
// Called with pointer to SCSI Request Block (SRB).
// Returns nonzero on success, 0 on error.
int aspi_func(aspi_req_t *ar)
 {
 int retval = 0;

 if (f_installed)
 { // ASPI manager initialized
 ASPIRoutine((aspi_req_t far *) ar); // call ASPI manager
 retval++;
 }
 return(retval);
 }
// ----------------------------------------------------------------------

// Execute SCSI I/O through ASPI interface.
// Usage: int aspi_io(BYTE *cdb, BYTE far *dbuff, WORD dbytes,
// BYTE flags, BYTE id, WORD *stat);
// Called with pointer to data buffer, data buffer size, pointer to CDB,
// request flags, and target ID. Returns ASPI status on success, -1 on error.
// Fills stat variable with host status in high byte, target in low byte.
int aspi_io(BYTE *cdb, BYTE far *dbuff, WORD dbytes, BYTE flags,
 BYTE id, WORD *stat)
 {
 int cdbsize;
 int timeout; // timeout counter for polling
 int retval = -1;

 memset(srb, 0, sizeof(aspi_req_t)); // clear SRB
 srb->command = SCSI_IO; // set command byte
 srb->hostnum = host_num; // set host adapter number
 srb->su.s2.targid = id; // set target SCSI ID
 srb->reqflags = flags; // set request flags
 srb->su.s2.databufptr = dbuff; // set pointer to data buffer
 srb->su.s2.datalength = dbytes; // set data buffer length
 srb->su.s2.senselength = sizeof(sense_block_t);
 // set sense data buffer length
 cdbsize = sizeof(group_0_t); // assume 6 byte CDB
 if (*((BYTE *) cdb) >= MIN_GRP_1 && *((BYTE *) cdb) < MIN_GRP_6)
 { // CDB size is 10 bytes
 cdbsize = sizeof(group_1_t);
 }
 srb->su.s2.cdblength = cdbsize; // set CDB length
 memcpy(srb->su.s2.scsicdb, cdb, cdbsize); // copy CDB to SRB
 srb_sense = (sense_block_t *) (srb->su.s2.scsicdb + cdbsize);
 // point to sense data buffer
 if (aspi_func(srb))
 { // ASPI call succeeded
 timeout = BUSY_WAIT; // set timeout counter

 while (srb->status == REQ_INPROG && timeout > 0)
 { // request in progress - keep polling
 timeout--; // decrement timeout counter
 }
 retval = aspi_stat = srb->status; // save ASPI status
 if (aspi_stat != REQ_INPROG)
 { // request completed
 host_stat = srb->su.s2.hoststat; // save host status
 targ_stat = srb->su.s2.targstat; // save target status
 *stat = ((WORD) host_stat << 8) | targ_stat;
 // return combined SCSI status
 }
 }
 return(retval);
 }

[LISTING THREE]
// ------ Initializing and using ASPI in a DLL. ------
#include "aspi.h" // ASPI definitions and constants
#include "scsi.h" // SCSI definitions and constants
// -------------------- defines and macros -------------------
#define IOCTL_READ 2 // IOCTL read function
#define ASPI_NAME "SCSIMGR$" // SCSI manager driver name
#define DPMI_INT 0x31 // DPMI interrupt number

#define REAL_CALL 0x301 // real mode call function

typedef struct RealCall
 { // struct for real mode call
 DWORD edi, esi, ebp, reserved, ebx, edx, ecx, eax;
 WORD flags, es, ds, fs, gs, ip, cs, sp, ss;
 } RealCall_t ;
// -------------------- global variables -------------------
void (far *ASPIRoutine) (aspi_req_t far *); // far address of ASPI routine
BYTE f_installed; // flag for ASPI existence
aspi_req_t _FAR *srb; // SCSI Request Block pointers
DWORD dwPtr; // return from GlobalDOSAlloc
// -------------------- local functions -------------------
void far *MaptoReal(void far *pptr); // map to real mode address
int AspiCall(void far *aspiproc, aspi_req_t far *ar); // DPMI function
// ----------------------------------------------------------------------
// Routine to check for and initialize ASPI driver.
// Usage: int FUNC aspi_open(void);
// Returns nonzero on success, 0 if ASPI driver not found or init failed.
int FUNC aspi_open(void)
 {
 int dh; // ASPI driver handle
 UINT wSRB; // selector for buffer memory
 if (!f_installed)
 { // not initialized
 if ((dh = open(ASPI_NAME, O_RDONLY | O_BINARY)) != -1)
 { // open device driver
 if (ioctl(dh, IOCTL_READ, (void far *) &ASPIRoutine,
 sizeof(ASPIRoutine)) == sizeof(ASPIRoutine))
 { // got ASPI entry point
 dwPtr = GlobalDosAlloc(sizeof(aspi_req_t));

 if (dwPtr != 0)
 { // allocated SRB buffer
 wSRB = LOWORD(dwPtr); // extract selector
 GlobalPageLock(wSRB); // lock memory
 srb = (aspi_req_t _FAR *) MAKELP(wSRB, 0);
 f_installed++; // set installed flag
 }
 }
 close(dh); // close device driver
 }
 }
 return(f_installed);
 }
// ----------------------------------------------------------------------
// Routine to close and clean up.
// Usage: void FUNC aspi_close(void);
// Called with nothing. Returns nothing.
void FUNC aspi_close(void)
 {
 UINT wSRB; // selector for buffer memory
 if (f_installed)
 { // already initialized
 wSRB = FP_SEG(srb); // extract selector from pointer
 GlobalPageUnlock(wSRB); // unlock buffer
 GlobalDosFree(wSRB); // deallocate buffer
 dwPtr = 0;
 srb = NULL;

 f_installed = 0; // clear installed flag
 }
 return;
 }
// ----------------------------------------------------------------------

// Routine to call ASPI driver.
// Usage: int aspi_func(aspi_req_t _FAR *ar);
// Called with pointer to SCSI Request Block (SRB).
// Returns nonzero on success, 0 on error.
int aspi_func(aspi_req_t _FAR *ar)
 {
 aspi_req_t far *arptr; // real mode pointer
 int retval = 0;
 if (f_installed)
 { // ASPI manager initialized
 if ((arptr = (aspi_req_t far *) MaptoReal(ar)) != NULL)
 { // got a valid real mode pointer
 retval = AspiCall(ASPIRoutine, arptr); // call ASPI through DPMI
 }
 }
 return(retval);
 }
// ----------------------------------------------------------------------
// Routine to map protected mode pointer to real mode.
// Usage: void far *MaptoReal(void far *pptr);
// Returns real mode pointer on success, NULL on error.
void far *MaptoReal(void far *pptr)
 {
 WORD sel; // protected mode selector
 void far *ptr = NULL; // real mode pointer
 sel = FP_SEG(pptr); // get selector value
 if (sel == LOWORD(dwPtr))
 { // found matching selector
 ptr = MAKELP(HIWORD(dwPtr), FP_OFF(pptr));
 // build real mode pointer
 }
 return(ptr);
 }
// ----------------------------------------------------------------------
// Call ASPI manager through DPMI server.
// Usage: int AspiCall(void far *aspiproc, aspi_req_t far *ar);
// Returns 1 on success, 0 otherwise.
int AspiCall(void far *aspiproc, aspi_req_t far *ar)
 {
 RealCall_t rcs; // real mode call struct
 int retval = 0;
 int npush;
 void far *sptr;
 memset(&rcs, 0, sizeof(RealCall_t)); // clear call structure
 rcs.cs = HIWORD(aspiproc); // point to real mode proc
 rcs.ip = LOWORD(aspiproc);
 npush = sizeof(aspi_req_t far *) / sizeof(WORD); // words to pass on stack
 sptr = (void far *) &rcs; // far pointer to call structure
 asm {
 push di // save registers
 push es
 sub bx, bx // don't reset A20 line
 mov cx, npush // number of words to copy to stack

 les di, sptr // point ES:DI to call structure
 mov ax, REAL_CALL // load function number
 push WORD PTR [ar + 2] // push SRB address
 push WORD PTR [ar]
 int DPMI_INT // call DPMI server
 pop ax // restore stack count
 pop ax
 pop es // restore registers
 pop di
 jc asm_exit // DPMI error
 mov retval, 1 // return 1 for success
 }
 asm_exit:
 return(retval);
 }

End Listings













































March, 1994
Emulating Non-DOS Systems Under MS-DOS


Creating a development environment for working within the familiar confines of
DOS




Dan Troy


Dan is a software engineer at Granite Communications Inc., 9 Townsend West,
Nashua, NH 03063. He can be reached at 603-881-8666, or by fax at
603-881-4042.


One of the challenges embedded-systems designers have grappled with for years
is developing for operating systems that require unfamiliar, nonstandard
development tools. Coincidentally, this is the problem that is now confronting
many programmers writing software for the emerging class of hand-held,
wireless, personal-data communications devices such as the Newton MessagePad
from Apple, Zoomer from Casio, and, in this case, my company's VP5 personal
digital-communication device. Still, history has taught us that:
No matter how slick it is, a hardware device won't be accepted by users if
there's no software for it.
Programmers won't write software for a hardware device unless the development
environment is at least approachable.
Realizing this, we wanted to create a development environment for the VP5 that
would allow programmers to work within the familiar confines of MS-DOS using
DOS-based tools. Furthermore, we wanted programmers to test applications under
MS-DOS before loading the cross-compiled program into the VP5. Just to
complicate matters, we also needed to develop a simple mechanism for adding
new operating-system function-emulation capability as existing products are
enhanced, or as new products are developed. This article presents the solution
we devised for the VP5 development environment.
Our approach to remotely executing an application isn't specific to the VP5,
however. You should be able to apply it to any similar situation involving
incompatible operating systems. Because the emulation code is written in ANSI
C, it is portable to other microprocessor/operating-system environments, as
long as the necessary hooks exist between the development platform and the
remote communication device. The emulation code also contains a simple,
table-driven function-execution mechanism that is applicable to any other
remote-execution environment written in C.


Our Solution


The VP5 is a wireless, hand-held, touchscreen personal data communicator (see
Figure 1) designed to allow VARs and programmers to develop custom touchscreen
applications written in C. To emulate as many videopad functions as
possible--including the touchscreen and display--we decided to remotely
execute the proprietary videopad operating-system functions from MS-DOS on the
PC. The developer could then see the VP5 screen exactly as it appears when the
application is cross-compiled, linked, loaded, and executed in the VP5 itself.
Also, the developer could actuate developer-defined, touchscreen-sensitive
keys to make sure the application did what it was expected to do. This gives
WYSIWYG feedback in the development process. To accomplish this goal, we
remotely commanded the VP5 via the serial ports on the PC and VP5. In order to
allow easy emulation expandability for future enhancements and portability to
new products, we developed a table-mapping mechanism that would handle C
operating-system functions and their associated parameters.
The overall emulation logic is summed up in Figure 2. To support all 89
videopad operating-system calls, we implemented a table-driven mechanism. The
table is defined as an array of structures of the type shown in Listing One
(page 58). Listing Two (page 58) shows a portion of the table. According to the
table, the inputs to the define_key() function are character, integer,
character, character, character, character, character, string. There are no
outputs (NULL string). This corresponds to the function prototype in Listing
Three (page 58). The execution sequence to remotely define a key on the VP5
keyboard would be like that in Listing Four (page 58).
On the PC side, the VP5 operating-system library function looks like Listing
Five (page 58), which calls process_cmd(); see Listing Six (page 58). This, in
turn, sets up and sends a message to the special VP5 emulation application and
receives the response from the VP5.
The counterpart of the define_key() function resides on the VP5. It uses the
arg_table[] identical to that on the PC-emulation software. Listing Seven
(page 60) is the VP5 emulation-command processor.
The spawn_define_key() function is a spawned function located in the
arg_table[]; see Listing Eight (page 62). No return values need to be passed
back to the define_key() function in the customer application. However, in get_key(),
the videopad must return the key value pressed on the videopad. In this case,
the VP5 operating-system library function looks like Listing Nine (page 62).
Since the get_key() function has no inputs, none are required to set up the
call to process_cmd(). The actual key pressed on the videopad is in c[1]. But
why are there two outputs--c[0] and c[1]--in the get_key() table entry, when
there's only one output from get_key()? Because in this case, get_key() on the
videopad will wait forever for a key to be pressed under normal operation. If
the application-program developer decides to take his or her time pressing a
key after get_key() has been called, then the emulation software on the PC
will eventually time out. To avoid this and allow the developer to spend as
much time as he or she desires before pressing a key on the videopad, we
implemented the algorithm in Listing Ten (page 62) on the VP5. Here, the
spawned function spawn_get_key() is found in the arg_table[]. If no key has
been pressed, the function immediately returns a False condition for c[0]. The
emulated get_key() function on the PC loops until a key is pressed, thus
maintaining constant communication between the PC and the VP5.
One drawback of executing the application remotely is that the VP5 RS-232
port is not available to the application, since the emulator is using it. To
emulate the VP5's COM serial-port communications within the emulated
application, the other RS-232 port on the PC, COM2, is used. In other words,
data that would normally be transferred on the VP5 COM port is instead
transferred via COM2 on the PC, since the VP5 COM port is being used for
application emulation. We created a special cable so that the VP5 COM port is
now represented by p4 (see Figure 3). The p4 connector is the same as the COM
port on the VP5.


A Sample Program


Listing Eleven (page 62) is a typical VP5 program in which the
touch-sensitive, boxed key appears on the VP5 and the VP5 waits until the
defined key is pressed. When the defined key is activated by the user
touchscreen, the screen clears, and the message displayed in the WELCOME_KEY
is replaced with the "Requesting Info" message. The VP5 then logs on to the
base station, sends a message to the host, and waits until a message is
received back from the host. Then the received message is displayed in the
defined text window on the VP5 screen for ten seconds. This sample program can
be tested under CodeView, utilizing its full functionality and incorporating
DOS calls such as printf() while debugging.
 Figure 1: Non-DOS VP5 handheld device.
Figure 2: Flow of control in VP5 emulation environment.
VP5 operating-system function call on PC

VP5 function in a library

VP5 function on PC sends command for this function to VP5 via RS-232 port

Special application on VP5 receives command for this function

Special application on VP5 executes this function

Special application on VP5 returns any parameters from function call back to
PC

VP5 function in library on PC returns any parameters to actual application
function call
 Figure 3: VP5 RS-232 COM serial-port configuration.
[LISTING ONE] (Text begins on page 52.)

struct args
{

 unsigned char *input; /* string of inputs to function */

 unsigned char *output; /* string of outputs from function */
 void (*spawn_function)(); /* emulated VP5 OS function */
};

[LISTING TWO]

const struct args arg_table[] =
{

/* COMMAND RESPONSE SPAWN COMMAND */
/* ARGS ARGS FUNCTION OFFSET */

 "cicccccs", "", spawn_define_key, /* 0 */
 "cs", "", spawn_display_key_label, /* 1 */
 "ccc", "", spawn_display_window, /* 2 */
 "", "cc", spawn_get_key, /* 3 */
 "c", "c", spawn_rf_close, /* 4 */
 "cic", "cc", spawn_rf_open, /* 5 */
 "i", "cis", spawn_rf_read, /* 6 */
 "is", "c", spawn_rf_write, /* 7 */
 "", "", spawn_turn_off, /* 8 */
 "", "", spawn_wait_1_sec, /* 9 */
};

[LISTING THREE]

 void define_key(char key_name, int key_attributes, char
 start_row, char start_col, char end_row, char end_col,
 char font_table, char *key_label).

[LISTING FOUR]

#define WELCOME_KEY 'W'
 .
 .
 .
define_key(WELCOME_KEY, BOXSENSITIVE, 'A', '1', 'B', '8',
 STANDARD_FONT,
 "WELCOME to GRANITE'S VP5\nPRESS HERE to BEGIN");
 .
 .
 .
[LISTING FIVE]

void define_key(char key_name, int key_attributes, char
 start_row, char start_col, char end_row, char end_col,
 char font_table, char *key_label)
{
 c[0] = key_name;
 i[0] = key_attributes;
 c[1] = start_row;
 c[2] = start_col;
 c[3] = end_row;
 c[4] = end_col;
 c[5] = font_table;
 strcpy(s[0], key_label);
 cmd = DEFINE_KEY;

 process_cmd();
}

[LISTING SIX]

void process_cmd(void)
{
 .
 .
 .
 /* frame up command to send to videopad */
 while(element_type = arg_table[cmd].input[element++])
 {
 switch(element_type)
 {
 /* argument type of character */
 /* pack it into global cmd_buff via pack_char() */
 case 'c': pack_char(c[c_cnt++]);
 cmd_buff_length++;
 break;
 /* argument type of string */
 /* pack it into global cmd_buff via pack_char() */
 case 's': str_length = 0;
 do
 {
 temp_char = s[s_cnt][str_length++];
 pack_char(temp_char);
 }
 while(temp_char);
 s_cnt++;
 cmd_buff_length += str_length;
 break;

 /* argument type of integer */
 /* pack it into global cmd_buff via pack_char() */
 case 'i': pack_char((char)(i[i_cnt] >> 8));
 pack_char((char)(i[i_cnt++] & 0xff));
 cmd_buff_length += 2;
 break;
 }
 }
 /* send framed up function command to videopad */
 if(link_layer_send(cmd_buff, cmd_buff_length) !=
 TX_OK) process_fatal_error(NO_LINK_RESPONSE);
 /* look for link layer response from videopad - time out
 if none is found within expected time */
 while((rx_status = link_layer_received()) != READY &&
 rx_status != TIMED_OUT);
 if(rx_status == TIMED_OUT) process_fatal_error(TIMED_OUT);
 /* process response from videopad */
 .
 .

 .
 /* if first byte of response buffer (status) is
 not a status command character, then process error */
 if(response_buff[0] != RESPONSE_STATUS_COMMAND)
 process_fatal_error(NON_STATUS_RETURNED);
 while(element_type = (arg_table[cmd].output[element++]))

 {
 switch(element_type)
 {
 /* argument of type character */
 /* unpack it and put into global c[] array */
 case 'c': c[c_cnt++] = unpack_char();
 break;
 /* argument of type string */
 /* unpack it and put into global s[][] array */
 case 's': str_length = 0;
 while(((s[s_cnt][str_length++] =
 unpack_char()) != 0) && (str_length <=
 MAX_STRING_SIZE));
 break;
 /* argument of type integer */
 /* unpack it and put into global i[] array */
 case 'i': i[i_cnt] = (int)(unpack_char() << 8);
 i[i_cnt++] |= (int)unpack_char();
 break;
 }
 }
 .
 .
 .
}
[LISTING SEVEN]

void main(void)
{
 int status;
 init_link_layer();
 for(;;)
 {
 /* look for link layer command from PC - time out if
 none is found within expected time */
 while((rx_status = link_layer_received()) != READY &&
 rx_status != TIMED_OUT);
 if(rx_status == TIMED_OUT)
 process_fatal_error(TIMED_OUT);
 /* process command from PC */
 /* first byte of command message is command */
 cmd = cmd_buff[0];
 while(element_type = (arg_table[cmd].input[element++]))
 {
 switch(element_type)
 {
 /* argument type character */
 /* unpack it and put into global c[] array */
 case 'c': c[c_cnt++] = unpack_char();
 break;
 /* argument type string */
 /* unpack it and put into global s[][] array */
 case 's': str_length = 0;
 while(((s[s_cnt][str_length++] =
 unpack_char()) != 0) && (str_length
 <= MAX_STRING_SIZE));
 break;
 /* argument type integer */
 /* unpack it and put into global i[] array */

 case 'i': i[i_cnt] = (int)(unpack_char() << 8);
 i[i_cnt++] |= (int)unpack_char();
 break;
 }
 }
 /* look up spawning function corresponding to cmd */
 function_address = arg_table[cmd].spawn_function;
 /* execute spawning function */
 (*function_address)();
 /* prepare response message to last received cmd */
 response_buff[0] = STATUS_OK;

 /* reset counters, then frame up response to send back to PC */
 element = c_cnt = s_cnt = i_cnt = 0;
 while(element_type = arg_table[cmd].output[element++])
 {
 switch(element_type)
 {
 /* argument type of character */
 /* pack it into global response_buff via
 pack_char() */
 case 'c': pack_char(c[c_cnt++]);
 response_buff_length++;
 break;
 /* argument type of string */
 /* pack it into global response_buff via
 pack_char() */
 case 's': str_length = 0;
 do
 {
 temp_char = s[s_cnt][str_length++];
 pack_char(temp_char);
 }
 while(temp_char);
 s_cnt++;
 response_buff_length += str_length;
 break;
 /* argument type of integer */
 /* pack it into global response_buff via
 pack_char() */
 case 'i': pack_char((char)(i[i_cnt] >> 8));
 pack_char((char)(i[i_cnt++] & 0xff));
 response_buff_length += 2;
 break;
 }
 }
 /* send framed-up response back to PC */
 if(link_layer_send(response_buff,
 response_buff_length) !=
 TX_OK) process_fatal_error(LINK_ERROR);
 }
}


[LISTING EIGHT]

void spawn_define_key(void)
{
 define_key(c[0], i[0], c[1], c[2], c[3], c[4], c[5], s[0]);
}









[LISTING NINE]

char get_key(void)
{
 do
 {
 cmd = GET_KEY;
 process_cmd();
 }while(!c[0]);

 /* return actual key pressed on videopad here */
 return(c[1]);
}


[LISTING TEN]

void spawn_get_key(void)
{
 if(key_pressed())
 {
 c[0] = TRUE;
 c[1] = get_key();
 }
 else c[0] = c[1] = FALSE;
}


[LISTING ELEVEN]

/* special vp5 library header files */
#include <key.h>
#include <times.h>
#include <control.h>
#include <rf.h>
#include <display.h>

/* standard library, for strlen() */
#include <string.h>

#define WELCOME_KEY 'W'
#define BASE_STATION_NUMBER 7
#define PERIOD 30

char *msg_to_send = "Please send me the info";
unsigned int unsol_size;

unsigned char unsol_msg[2048];
unsigned char msg_rcvd[2048];  /* buffer for the host's response */

void main(void)
{
 unsigned char key;
 int i;
 unsigned char remote_number;

 unsigned int msg_length;
 /* define a boxed, touch sensitive key on rows A and B of
 the VP5 screen */
 define_key(WELCOME_KEY, BOXSENSITIVE, 'A', '1', 'B', '8',
 STANDARD_FONT, "WELCOME to GRANITE'S VP5\n"
 "PRESS HERE to BEGIN");
 /* wait until a sensitive VP5 key is touched */
 key = get_key();
 /* replace message in WELCOME_KEY with a new message */
 display_key_label(WELCOME_KEY, "Requesting Info\n"
 "Waiting for response");
 /* log on to base station via RF */
 rf_open(BASE_STATION_NUMBER, PERIOD, unsol_msg,
 &unsol_size, unsol_handler, &remote_number);
 /* send the message via RF to the host */
 rf_write(strlen(msg_to_send), msg_to_send);
 /* wait for the response from the host */
 rf_read(&msg_length, msg_rcvd);
 /* display response from host on VP5 screen from line 3
 through line 12 */
 display_window(TEXT_WINDOW, 3, 12);
 for(i = 0; i < msg_length; i++) display_char(msg_rcvd[i]);
 /* log off the base station via RF */
 rf_close(BASE_STATION_NUMBER);
 wait_1_sec(10); /* display rcvd msg for 10 seconds */
 turn_off(); /* shut off VP5 */
}


End Listings





March, 1994
Cross-Platform Development with Visual C++


A familiar API for UNIX, Windows, and more




Chane Cullens


Chane works with Wind/U at Bristol Technology and can be reached at
chane@bristol.com or 203-438-6969.


The current crop of hardware architectures and operating environments, each
with its own particular set of features, offers exciting possibilities for
software developers. However, timely development of applications that take
advantage of the unique capabilities of platforms ranging from DOS, Windows
3.1, the upcoming Windows 4.0, and NT, to OS/2, UNIX, and Macintosh can be a
challenging undertaking. Toss new CPUs such as the Pentium and PowerPC into
the ring, and you're faced with some serious development decisions.
The most common approach to tackling such challenges includes using
cross-platform APIs (such as XVT, Neuron Data Open Interface, and Visix
Galaxy) or cross-platform application frameworks (Inmark's zApp, C++/Views,
and the Zinc Framework, for example). These tools can solve most of your
portability problems, but programmers often end up wanting a familiar API
that's available across a wide variety of operating environments and hardware
architectures.
Of all the available programming interfaces, the Microsoft Windows API has
become the most pervasive. Furthermore, one of the benefits of using the
Windows API is that a large number of high-quality tools and class libraries
are available, including those that enable you to maintain a single set of
source code for different platforms. For example, with Wind/U from Bristol
Technology (the company I work for), you recompile Visual C++ code so that it
runs as an X/Motif app on UNIX. The Mirrors toolkit from Micrografx, on the
other hand, lets you recompile Windows code generated by Microsoft C 6.x or
Watcom C++ 16-bit for OS/2 applications. Likewise, Microsoft's Wings, an
announced--yet unreleased--toolkit based on the Win32 API, will someday allow
you to port Windows applications to the Macintosh. (Wings will likely include
the Microsoft Foundation classes, associated libraries, code generator, and
cross-compiler to the 680x0 architecture.)
Although generally regarded as a DOS/Windows development tool, Microsoft's
Visual C++ and the Microsoft Foundation Class (MFC) library can be used as a
cross-platform development tool. This article discusses how you can use Visual
C++/MFC as the cornerstone of your cross-platform development efforts. If you
write code applying the guidelines presented here, you can more easily cross
architectural hurdles when using cross-platform APIs, cross-platform
frameworks, or current and future portability toolkits.
There are several technical reasons for choosing the Windows API over Motif,
particularly for UNIX applications. Even without considering portability,
Windows offers much richer GUI components and paradigms. The typical Motif
application is about as sophisticated as Windows 1.0 programs were. Most
applications don't print (X/Motif has no built-in printing model), provide
online help, use tool/status bars, do much in the way of graphical drawing, or
cleanly support multiple documents (there's no MDI in Motif). In other words,
portability is only one reason why UNIX developers should consider the Windows
API as a development environment.


Portability Pitfalls


Simply choosing a cross-platform class library or API does not solve all the
portability problems involved in writing an application. You must also
consider compiler differences, API nuances, and hardware-architecture
dependencies.
In general, UNIX compilers are based on the AT&T cfront implementation, and PC
compilers are implemented to be cfront 3.0 compatible. Visual C++ is largely
compatible with the UNIX C++ 3.0 compilers supplied by HP, IBM, and Sun, but
not identical. One sure way to minimize the differences is to compile with
verbose warning messages on all architectures and update the source code to
remove these warnings. The following figures identify some minor differences
between compilers and easy workarounds to remove the problems.
Figure 1, for instance, shows how Visual C++ allows typecasting using
function-call syntax. cfront compilers only support the C syntax for
typecasting (more on typecasting in the following sections).
Visual C++ allows type int and user-defined type BOOL to be interchanged. With
other platforms, user-defined types may be defined slightly differently.
Consistent use of the user-defined types will avoid any problems; see Figure
2.
In Figure 3, Visual C++ allows variable declaration in switch statement cases
without requiring a new scope. Likewise, Visual C++ allows extra semicolons in
class definitions, as in Figure 4. HP C++ 3.05, on the other hand, does not
correctly handle nested macro expansion; see Figure 5. Nor are the Visual C++
compiler #pragma warning(disable: 4xxx) directives--pragmas used in MFC to
eliminate warning messages during compiles--available in UNIX.
Templates and exceptions normally lead to nonportable code. Although VC++
doesn't directly support templates or exceptions, Microsoft supplies a
template generator and includes exception classes with MFC which are portable
and can be used on all platforms.


API Differences


Of the various Windows API flavors (Win16 for Windows 3.1 for 16-bit
applications, Win32 for 32-bit NT apps, and Win32s for portable 32-bit
applications), the Win32s API is the cross-platform Windows API. Win32s is
available on Windows 3.1 with the Win32s DLLs and on Windows NT, Macintosh
System 7 from Microsoft, and UNIX from Bristol Technology.
Additionally, MFC allows you to have a single set of source for Windows 3.1,
Windows 3.1 with Win32s DLLs, Windows NT, UNIX, and Macintosh. The Win32s API
builds on the Win16 API by adding features from Win32 and does not include
nonportable functions from Win16.
Consequently, you shouldn't make calls to the Win16 functions in Table 1 since
they're not included in Win32s. The Win32 functions in Table 2, however, have
been included, as have the Win32 messages in Table 3. Finally, the Win16
functions in Table 4 have been changed in Win32s.


Word Sizes, Structure Packing, and Byte-Ordering Issues


Independent of the cross-platform toolkit, you must pay attention to
differences in byte ordering, word sizes, and structure packing.
In the Windows 3.1 environment, integers are normally 16 bits wide; in most
other environments, they are 32 bits wide. Example 1 is nonportable (but
working) 16-bit Windows code. Porting this code to NT or UNIX would cause
problems if the value of nOne was ever greater than 65,535 because it would
suddenly become too large to fit into wTwo (which is only 16 bits wide); the
wTwo variable would wrap and start back at 0. Normally, C++'s strong type
checking will not allow code like this to survive, so 16/32-bit problems are
not common in C++ unless typecasting is used.
Another common 16/32-bit problem is structure packing. On 16-bit systems,
compilers pack structures based on 16-bit boundaries by default. On 32-bit
systems, the compilers use 32- or 64-bit boundaries (they waste a byte here
and there to ensure that the elements of a structure are aligned properly).
The end result is that the sizeof operator will return different results in
16- and 32-bit environments. Structure packing can cause problems if you read
structures to and from binary files. MFC does not write structures to file,
but does not prevent the programmer from doing so.
The other common portability problem between Windows and UNIX is byte
swapping. Some UNIX workstations, such as the Sun SPARCstation, have
Big-endian (vs. Intel's Little-endian) byte ordering. This means that you
can't make assumptions about the order of bytes in structures. C++ does not
protect the programmer from these problems. Example 2(a) shows the
byte-swapping problem using classes from MFC. This code makes the fatal
mistake of assuming that data in the DWORD dwPoint will be ordered exactly the
same as the tagPoint structure. To fix the problem, the typecasting is
replaced by the Windows LOWORD and HIWORD macros to deconstruct a DWORD
properly. Example 2(b) is a portable version of the CPoint::CPoint(DWORD)
constructor.


Conclusion


When it comes to cross-platform application development, the Windows API is
more than a least common denominator. This, coupled with C/C++ standards, makes
it an attractive environment for programmers who have to support more than one
platform.
With the great strides that software development tools are making, a year from
now a cross-platform solution may be as easy as selecting a radio button in
your visual development environment's Build Options dialog box.

Figure 1: (a) Sample error message; (b) nonportable statements; (c) portable
statements (change to normal C-style typecasting).


(a) file.C, line 100: error: syntax error

(b) int Number = 26;
    unsigned char Letter;
    ...
    Letter = unsigned char (Number);

(c) Letter = (unsigned char)Number;


Figure 2: (a) Sample error message; (b) nonportable statement; (c) portable
statement (change the return value to match the base-class return-value type).
(a) file.C, line 100: error: WinCalApp::ExitInstance() type mismatch: int
WinCalApp::ExitInstance() and BOOL WinCalApp::ExitInstance()

(b) BOOL WinCalApp::ExitInstance()

(c) int WinCalApp::ExitInstance()
Figure 3: (a) Sample error message; (b) nonportable statements; (c) portable
statements (enclose the statements in a pair of braces to explicitly define
the scope of the new variable).

(a) file.C, line 100: error: jump past initializer (did you forget a '{ }'?)

(b) default: int Number = GetSomeNumber(); ... Doit(Number)

(c) default: { int Number = GetSomeNumber(); ... Doit(Number) }


Figure 4: (a) Sample error message; (b) nonportable statement; (c) portable
statement (the trailing semicolon after the macro must be deleted).


(a) file.C, line 100: error: syntax error

(b) DECLARE_DYNAMIC(ClassName);

(c) DECLARE_DYNAMIC(ClassName)

Table 1: WIN16 functions not included in Win32s.
AccessResource
AllocDiskSpace
AllocDSToCSAlias
AllocFileHandles
AllocGDIMem
AllocMem
AllocResource
AllocSelector
AllocUserMem
Catch
ChangeSelector
ClassFirst
ClassNext
CloseComm
CloseDriver
CloseSound
CountVoiceNotes
DefDriverProc
DeviceCapabilities
DeviceMode
DirectedYield
DlgDirSelect
DlgDirSelectComboBox
DOS3Call
ExtDeviceMode
FlushComm
FreeAllGDIMem

FreeAllMem
FreeAllUserMem
FreeSelector
GetAspectRatioFilter
GetBitmapDimension
GetCodeHandle
GetCodeInfo
GetCommError
GetCommEventMask
GetCurrentPDB
GetCurrentPosition
GetDCOrg
GetEnvironment
GetInstanceData
GetKBCodePage
GetMetaFileBits
GetModuleUsage
GetSelectorBase
GetSelectorLimit
GetSystemDebugState
GetTempDrive
GetTextExtent
GetTextExtentEx
GetThresholdEvent
GetThresholdStatus
GetViewportExt
GetViewportOrg
GetWindowExt
GetWindowOrg
GetWinFlags
GlobalDosAlloc
GlobalDosFree
GlobalEntryHandle
GlobalEntryModule
GlobalFirst
GlobalInfo
GlobalNext
GlobalPageLock
GlobalPageUnlock
InterruptRegister
InterruptUnRegister
LocalFirst
LocalInfo
LocalNext
LockInput
MemManInfo
MemoryRead
MemoryWrite
ModuleFindHandle
ModuleFindName
ModuleFirst
ModuleNext
MoveTo
NetBIOSCall
NotifyRegister
NotifyUnRegister
OffsetViewportOrg
OffsetWindowOrg
OpenComm

OpenDriver
OpenSound
PrestoChangoSelector
Prof* (8 functions)
QuerySendMessage
ReadComm
ScaleViewportExt
ScaleWindowExt
SetBitmapDimension
SetCommBreak
SetCommEventMask
SetCommState
SetEnvironment
SetMetaFileBits
SetResourceHandler
SetSelectorBase
SetSelectorLimit
SetSoundNoise
SetViewportExt
SetViewportOrg
SetVoice* (6 functions)
SetWinDebugInfo
SetWindowExt
SetWindowOrg
StackTraceCSIPFirst
StackTraceFirst
StackTraceNext
StartSound
StopSound
SwapRecording
SwitchStackBack
SwitchStackTo
SyncAllVoices
SystemHeapInfo
TerminateApp
Throw
TransmitCommChar
UnAllocDiskSpace
UnAllocFileHandles
UngetCommChar
ValidateCodeSegments
ValidateFreeSpaces
WaitSoundState
WriteComm
Yield


Figure 5: (a) Sample error message; (b) nonportable statements; (c) portable
statements.


(a) file.C: 100: Overflowed replacement buffer.

(b) #define DEBUG_NEW new(__FILE__, __LINE__)
    #if DEBUG
    #define new DEBUG_NEW
    #endif
    CObject *obj = new CObject;

(c) #define DEBUG_NEW new(__FILE__, __LINE__)
    #if DEBUG
    #define MYnew DEBUG_NEW
    #else
    #define MYnew new
    #endif
    CObject *obj = MYnew CObject;

Table 2: Win32 functions included in Win32s (some functions are no-ops).
AbnormalTermination
AddFontModule

AdjustTokenGroups
Beep
CallNextHookEx
CloseHandle
CompareFileTime
ContinueDebugEvent
CopyCursor
CopyFile
CopyIcon
CreateDirectory
CreateFile
CreateFileMapping
CreateProcess
DeleteCriticalSection
DeleteFile
DosDateTimeToFileTime
DrawEscape
DuplicateHandle
EnterCriticalSection
EnumFontFamProc
EnumResLangProc
EnumResNameProc
EnumResourceLanguages
EnumResourceNames
EnumResourceTypes
EnumResTypeProc
EnumThreadWindows
ExitProcess
ExitThread
ExtEscape
FileTimeToDosDateTime
FileTimeToSystemTime
FindClose
FindFirstFile
FindNextFile
FlushFileBuffers
FreeDDElParam
GetCommandLine
GetCurrentDirectory
GetCurrentProcess
GetCurrentProcessId
GetCurrentThread
GetCurrentThreadId
GetDiskFreeSpace
GetEnvironmentStrings
GetEnvironmentVariable
GetExpandedName
GetFileAttributes
GetFileSize
GetFileTime
GetFileType
GetFullPathName
GetLastError
GetLogicalDrives
GetProcessExitCode
GetSaveFileName
GetStartupInfo
GetStdHandle
GetSystemTime

GetTempPath
GetThreadContext
GetVolumeInformation
HeapAlloc
HeapCreate
HeapDestroy
HeapFree
HeapSize
InitializeCriticalSection
IsWindowUnicode
LeaveCriticalSection
LockFile
MapViewOfFile
MapViewOfFileEx
MoveFile
NetBios
PackDDElParam
PeekMessageEx
PostThreadMessage
PrintDlg
RaiseException
ReadFile
ReadProcessMemory
RegCloseKey
RegOpenRegistry
ReleaseMutex
ReleaseSemaphore
RemoveDirectory
RemoveFontModule
ReuseDDElParam
SearchPath
SetBrushOrgEx
SetCurrentDirectory
SetEndOfFile
SetEnvironmentVariable
SetFileAttributes
SetFilePointer
SetFileTime
SetLastError
SetLastErrorEx
SetStdHandle
SetSystemTime
SetThreadContext
Sleep
SystemTimeToFileTime
TlsFree
TlsGetValue
TlsSetValue
UnhandledExceptionFilter
UnlockFile
UnmapViewOfFile
UnpackDDElParam
VirtualAlloc
VirtualFree
VirtualQuery
WaitForDebugEvent
WordBreakProc
WriteFile




Table 3: Win32 messages included in Win32s.
BM_GETIMAGE
BM_SETIMAGE
DM_GETDEFID
DM_SETDEFID
EM_GETTHUMB
WM_CTLCOLOR_BTN
WM_CTLCOLOR_DLG
WM_CTLCOLOR_EDIT
WM_CTLCOLOR_LISTBOX
WM_CTLCOLOR_MSGBOX
WM_CTLCOLOR_SCROLLBAR
WM_CTLCOLOR_STATIC
WM_GETHOTKEY
WM_HOTKEY
WM_MOUSEENTER
WM_SETHOTKEY

Table 4: Win16 functions in Win32s.
AddFontResource
GetClassWord
GetWindowWord
RemoveFontResource
SetClassWord
SetWindowWord


Example 1: Nonportable, 16-bit Windows code.
typedef unsigned short WORD;
int function()
{
 int nOne;
 WORD wTwo;
 ...
 ...
 wTwo = (WORD)nOne;
}


Example 2: (a) Byte-swapping problem using MFC classes; (b) a portable version
of the CPoint::CPoint(DWORD) constructor.
(a) struct tagPOINT { short x; short y; };
    class CPoint : tagPOINT
    {
        ...
        CPoint(DWORD);
        ...
    };
    CPoint::CPoint(DWORD dwPoint)
    {
        *(DWORD *)this = dwPoint;
    }

(b) CPoint::CPoint(DWORD dwPoint)
    {
        x = LOWORD(dwPoint);
        y = HIWORD(dwPoint);
    }






March, 1994
Database Development and Visual Basic 3.0


Putting a pretty face on client/server architectures




Ken North


Ken has developed DBMS projects for mainframe, mini, PC, and client/server
systems and is currently writing a book about Windows multi-DBMS programming
for Wiley & Sons. Contact him at Resource Group Inc., 2604B El Camino Real,
#351, Carlsbad, CA 92008, or on CompuServe at 71301,1306.


Although Microsoft calls the current incarnation of Visual Basic the "Visual
Basic 3.0 Professional Edition," it wouldn't be a stretch to call it "the
Visual Basic Database Edition" instead, given the database orientation of its
tools and functionality. In this article, I'll examine the
Visual Basic 3.0 environment and related tools, presenting in the process a
multimedia database application that uses a local Access database (MDB). I'll
also discuss how you can revise this database so that you can use it with ODBC
and a local dBase (DBF) file. Of course, you'll need an MDB, DBF, and ODBC
data source with comparable table definitions.


Visual Programming


You produce Visual Basic (VB) applications by combining components in a
systematic manner. The visual programming model promotes a division of labor
between specialists who write custom controls and those who build
applications. In most cases, the process of building an application involves
selecting components and writing code that binds the pieces together. With VB,
you create screens by defining forms containing Windows controls such as
scroll bars, list boxes, text boxes, labels, and other user-interface objects.
For each object, you set properties and generate code (methods) that executes
in response to events.
VB 3.0 includes an integrated database engine (the Access Engine), a database
manager that works with multiple DBMS formats, support for OLE 2.0, messaging
(MAPI), database connectivity (ODBC), multimedia (MCI), graphics (Graphics
Server), and report generation (Crystal Reports). VB does not support pure
object-oriented programming, DOS text mode, or an Xbase dialect. The large
number of plug-in custom controls from third-party developers, however, covers
a range of functionality including telecommunications, fax, spreadsheets,
calendars, imaging, graphics, data management, and GUI objects; see the text
box entitled, "Add-on Database Tools for Visual Basic."


Custom Controls


Custom controls are essentially plug-and-play components packaged in DLLs.
Microsoft supports a specialized DLL--the VBX (Visual Basic eXtension)--with
its VB and Visual C++ (VC++) compilers. The Professional Edition of these
compilers includes 19 custom controls and a Control Development Kit (CDK).
Developers who write custom controls using the CDK may produce three levels of
controls based on compatibility with VB 1.0, 2.0, and 3.0. VC++ programmers
can use VBXs compatible with VB 1.0.
Also noteworthy is that Microsoft Foundation Classes 2.0 (included with VC++)
contain classes that implement a VBX-emulation layer for VB 1.0 controls.
Controls used with C++ do not have the full functionality of VB 2.0/3.0
controls in several areas, including drag-and-drop behavior, run-time
controls, container controls, and control arrays.


The Database Connection


One of the benefits of VB 3.0 is the separation of the front end from the back
end of a database application. In A Guide to Developing Client/Server SQL
Applications (Morgan Kaufmann, 1992) Khoshafian et al. describe 12 rules for
client software that include a client-autonomy rule whereby client software
shall behave the same whether processing data at the client or server. VB 3.0
provides that type of transparency: A user can run an application using
virtually the same user interface whether the application operates in native
mode on an MDB, attaches to local tables in other database formats, or
connects to an ODBC server. VB provides a good vehicle for prototyping and
debugging using the Access Engine for projects that run on an SQL server in
their final form.
JET, the Access 1.1 Engine, reads and writes databases in a variety of formats
including Access, dBASE III and IV, FoxPro 2.x, Btrieve, Paradox 3.x, and a
number of SQL (ODBC) databases. The Access Engine also provides query
optimization, transaction processing, optimistic and pessimistic locking,
distributed multitable joins, and so on. However, although VB programmers can
manipulate Access's data, VB doesn't support unique Access features such as
forms and macros or the ability to run Access applications. Further, VB
programs cannot implement Access-style security from within VB.
To develop a database application with VB, you can either use the Object Layer
or data controls, both of which are supported by the Access Engine. The Object
(or Programmatic) Layer is VB's option for programmers who feel comfortable
writing code. Data-aware controls (or more commonly, data controls) are an
easy-to-use noncode solution that constitutes VB's Visual Layer. On the other
hand, some developers bypass the JET engine because they've chosen other means
for building database applications. VB provides alternatives such as
third-party database libraries and the ability to write directly to the API of
database DLLs such as ODBC.
Bound controls--data-aware controls that simplify access to a database--are a
major new enhancement for database developers. VB associates a bound control
with a specific database column or field. You create the association by
binding the control to the data control--a scroll bar associated with a
dynaset query (a dynamic query result set), SQL statement, or QueryDef. Bound
controls work with the Access Engine while the data control is associated with
Dynaset, Snapshot, or tables. The Professional Edition has eight bound
controls, and the Toolbox shows icons for Check box, Image, Label, Picture
box, Text box, 3-D check box, 3-D panel, and Masked edits. Combos and list
boxes are not bound.
Finally, it's possible to bypass the engine using the SQL pass-through option,
a tool for obtaining better performance and executing stored procedures at the
server. However, you must manage your own connections if you use ExecSQL. VB
and Access write to level 1 of the ODBC API so not all of the new features
work with every ODBC driver. If you work with core-level ODBC drivers, you
must continue to write directly to the ODBC Call Level Interface (CLI) and
forego the ODBC abstraction and new controls (often preferable for performance
reasons). Access and VB provide an option to bypass the Access Engine's SQL
parse of the SQL and pass it through to the ODBC data source. To use the
pass-through option and bypass the parse, set the data control's Option
property to SQL_PASSTHROUGH. VB 3.0 supports transaction processing with three
statements (BeginTrans, CommitTrans, and Rollback) that are preferred to the
database methods from prior releases.
VB's data-access capabilities are not the only features useful for building
database applications. Others include extensions for communications, object
linking, report writing, graphics, and mail. VB includes Pinnacle's graphics
library, Graphics Server SDK, and the Crystal Reports database report writer.
Crystal Reports for Visual Basic includes a VBX that provides approximately 15
of the 60 calls in the full API for the Crystal Reports print engine. The VBX
is available at design time but hidden at run time. Crystal Reports for Visual
Basic doesn't include the report compiler of Crystal's Pro Edition, which
provides the ability to compile reports into a stand-alone executable.
Notably, Borland also bundles Crystal Reports with its database products.
Another feature, OLE automation, permits objects to publish a set of commands
and functions available to other apps (CreateObject) and subscribe to those
features in other applications (GetObject). OLE automation is a standard
feature in Microsoft applications; see the accompanying text box entitled,
"Visual Basic for Applications." You can use the OLE tool to add OLE 2.0
automation to your applications, but VB doesn't support the full set of OLE
2.0 features.
The VB 3.0 CDK lets C++ programmers create new VB controls. The principal
change to the CDK for 3.0 was the addition of functions and messages for
data-aware controls.
VB isn't the best platform for collaborative projects because version-control
tools that support code sharing for multiprogrammer projects don't measure up
to those for other languages. Microsoft has announced an upcoming version-control
system, but not a release date. Consequently, the most common current approach
is to save forms in ASCII.


A Multimedia Database


Video and database technologies are a natural marriage for some applications.
One such example is an industrial database, where video is used as a tool to
familiarize workers with physical layouts to minimize exposure to hazards or
radiation. The program presented in this article is a typical database that
includes identifiers and evaluation data. The example is for a tennis academy,
although you can easily modify it to fit different applications (a personnel
database, for example). Once the application has been created, I'll examine
the revisions needed so you can use it with ODBC and a local dBase (DBF) file.
The files you'll need for this project are available electronically; see
"Availability," page 3. The example also uses the MCI custom control and
Microsoft's Video for Windows to play video clips. To use this feature, you
must install the files from the \RUNTIME directory of the Drivers diskette of
Video for Windows. Alternatively, you can download VFWRUN.ZIP from the WINEXT
or MSAPPS forum libraries on CompuServe.
The first step in creating the participant form (PERSON.FRM) is to use a data
control, which resembles VCR controls. This 3.0 control permits a user to step
through query data using four buttons: MoveFirst, MovePrevious, MoveNext, and
MoveLast. To associate a form with a table in the database, select the
Properties window, double-click on the DatabaseName property and select an
Access filename from the DataBaseName dialog box. You must follow a similar
procedure to select the query to run against the database by specifying a
table name, an SQL statement, or a stored SQL query for the RecordSource
property.
The next step in the process is to add a number of bound controls linking the
field on the form to a column in the table. Associate all data-aware controls
with the data control and set all of the DataSource properties to the Data1
control. To do this, place the control on the form and complete its DataField
property. VB will display a list of tables in the database. Following the
design of the form, the next step is to add data-validation logic. Click on
the data control, select the Validate event and enter the code for your field
validation. The person form (PERSON.FRM) of our example project validates the
information in three bound text controls. The row or record must have a
surname, a gender of M or F, and a category or rating between 10 and 70.
To test the ODBC links, I used a previously installed testbed of ODBC software
and drivers. The next step was to install and test using more recent drivers
from Q+E Software's ODBC Pack and Microsoft's Desktop Database Driver Pack. To
use the participant form with an ODBC database, you'll need to blank the
DatabaseName property. Then, you need to complete the information in the
Connect property. For example, if the name of the ODBC data source is
PERSONID, enter ODBC;DSN=PERSONID in the Connect property field and select the
appropriate table name from the list associated with the RecordSource
property.
If you set the Connect property to ODBC and don't enter a DSN string, VB will
attempt an SQLDriverConnect, which will prompt you with a list of ODBC data
sources. If you want to create a result set (dynaset) from an SQL query, enter
the SQL statement in the RecordSource property. For example, it might be
common to view participants in rating order by entering SELECT * FROM IDENT
ORDER BY CATEGORY for the RecordSource property. To use an attached dBase
table, you need to set the Connect property to dBase III or dBase IV, specify
the directory name for the DatabaseName property and specify the table name as
the RecordSource property.


Conclusion



Visual Basic is a versatile tool for client and client/server applications.
The application presented here requires only minor changes to run with a
variety of database formats; the data controls provide quick and easy access
to data even though they are not an optimal solution for high-performance
applications.
As an environment, VB often provides more than one solution, so you must
evaluate the trade-offs between using VB at a high level of abstraction or
getting down and dirty while writing to APIs. VB invites trade-offs because
there are some projects where performance is critical, and others where
ease-of-development is paramount. If you opt to write at the API level, you
must learn hundreds of functions (including 50 for Crystal Reports and another
51 for ODBC) instead of pointing and clicking at the VBXs' counterparts. When
evaluating whether to jump in with VBXs or write directly to the APIs,
consider the sheer volume of information: The ODBC 1.0 reference manual, for
example, is some 629 pages, the Graphics Server SDK almost 400 pages, Video
for Windows about 250 pages, and the OLE 2.0 manual nearly 700 pages. Throw in
the Win16 and Win32 APIs and you won't want to see a bookstore anytime soon.


Visual Basic for Applications


Visual Basic for Applications, formerly called "Object Basic," is the embedded
language of Microsoft's applications. Access Basic, WordBasic, and Visual
Basic are older cousins of the latest VB edition, which includes enhancements
areas like macro recording, transportability, and object support. VBA is the
new programming tool for Excel 5.0 on the PC and the Macintosh, so the macro
recorder now generates Visual Basic.
The concept of programming with objects is familiar to developers using VB
dialects. Access Basic and Visual Basic include objects such as Form, Dynaset,
Report, and Database. Excel now has Cells, ActiveCells, Ranges, and
collections like Sheets, Worksheets, Charts, and Workbooks. Each edition of VB
introduces more features familiar to programmers using OOP languages such as
Smalltalk, Actor, and C++. The implementation of VB available with Excel
includes an Object Browser, a familiar tool to OOP developers. Although the
OOP-language browsers usually depict a class hierarchy within an application
context, the Object Browser in VBA shows objects from other applications. An
Excel user might browse and use objects from Word for Windows, FoxPro,
PowerPoint, Mail, Access, Project, Publisher, or other applications that
support OLE 2.0. VB programs reference objects in other applications by
statements such as that shown in Figure 1(a). It is possible to use objects
from EXEs or DLLs and to activate other applications and send keystrokes; see
Figure 1(b).
Programmers familiar with the various implementations of Lisp and REXX may be
surprised to learn the extent to which VB has become a full-featured
programming language for multiple computing platforms. In addition to VBA
development, Microsoft is porting the VB Programming System to the Macintosh
and Windows NT. NT 3.1 and OLE 2.0 are the forerunners to Cairo, Microsoft's
object-oriented operating system.
Microsoft is clearly positioning VB as a flexible tool for development with
objects (OLE, Windows, and NT today; Mac and Cairo in the future). In the
not-too-distant past there was a lot of interest in microprogrammable
computers, those adaptable to specific application needs by using a writable
control store and microcoded machine instructions. Lisp and Pascal machines
were computers whose microcode gave the hardware a Lisp or Pascal personality.
The industry today is moving to RISC, not extensible architectures, but you
wonder what the planners at Redmond would do with an object-Basic machine.
--K.N.

Figure 1: (a) Referencing objects in other applications from Visual Basic; (b)
using an object from an EXE or DLL.
(a) Set MSWord = CreateObject(class:="Word.Basic")
    Set Picasso = GetObject("guernica", "MSDraw.Metafile")
    Set LinkedDrawing = Sheet1.OLEObjects.Add(filename:="guernica", link:=True)
    LinkedDrawing.Update

(b) AppActivate title:="Terminal"
    SendKeys String:="{TAB},{F1},{enter}"



Add-on Database Tools for Visual Basic


The Visual Basic database add-on community includes many tool vendors
competing to fill specific niches. One factor that gives some of these tools
an edge is their usability with other languages. Developers working in
multiple languages, or groups that include programmers with different language
skills, can benefit from the ability to add libraries or custom controls that
work with more than a single programming language.
Microsoft bundles Crystal Reports from Crystal Services (Vancouver, British
Columbia) and Graph from Pinnacle Publishing (Kent, Washington) with Visual
Basic Professional Edition. However, both Visual C++ and Borland C++ 4.0
developers can add similar functionality. The Graphics Server SDK from
Pinnacle Publishing produces almost 20 graph types using installable
components for C/C++, SQLWindows, Turbo Pascal, PowerBuilder, Actor, and
Superbase. Graphics Server's architecture is similar to some Windows SQL
engines. A single copy of the server sits in memory and services graphic
requests in the form of DDE conversations or calls to one of the approximately
170 functions in its DLL (GSWDLL.DLL). Graphics Server manages the windows,
and your application works with views in a manner similar to that of
non-Windows graphics software. The coordinates in your program are relative to
a view, not a window. GS SDK's geometric functions include lines, circles,
ellipses, filled areas, and polygons. The text functions include titles, text,
and numeric labels and legends.
Q+E MultiLink/VB from Q+E Software (Raleigh, North Carolina) is a "middleware"
tool that links VB applications to PC and SQL databases--Oracle, Sybase,
Ingres, SQL Server, dBase, Clipper, Paradox, and Btrieve. The tool, which
supports ODBC, includes 20 database drivers and 60 new properties.
Applications using these database drivers may be distributed royalty free.
MultiLink/VB also provides a number of development tools including a database
manager, custom controls, and the Q+E Builder. New properties include edit
masking, mapped list, combo boxes, and string searches in combo and list
boxes.
VBAssist from Sheridan Software (Melville, New York) adds more than 25 tools
to the VB environment, including a Data Assistant to bind custom controls to
databases via drag-and-drop; a Form Wizard that generates database forms which
can contain a bound label, text fields, and navigation buttons; and DB
Assistant, a tool that allows the creation, testing, and maintenance of an
Access database from within VB. Data Widgets, also from Sheridan, is a set of
bound controls for creating front ends to database applications.
Integra VDB from Coromandel (Forest Hills, New York) includes an SQL engine
and VBX controls for VC++ and VB. It includes Architecture controls and Visual
controls. The Architecture controls are invisible abstractions of database
objects. The Visual controls include a grid, text box, frame, radio button,
combo box, list box, and a database-operations control. The Architecture VBXs
include application, data-source, query, form-definition, and form-builder
controls. Integra VDB also has a call-level interface of approximately 50
functions, including a trigger function which supports master-detail lookups
that fire when the user activates bound visual controls.
TrueGrid from Apex Software (Pittsburgh, Pennsylvania) allows VB programmers
to create database browse tables that are editable and fully configurable. To
create a browse table, you drop TrueGrid onto a VB form and set the form's
DataSource property. TrueGrid supports the same database formats as VB, but
also allows you to create browse tables for custom databases. The tool also
includes an expression string syntax for calculating fields, and a design-time
layout editor for interactive design.
Finally, because the Access Engine does not support Clipper index files
(NTXs), A.S. Inc. (Minot, North Dakota) has created vxBase 3.0--a shareware
function library for VB or C/C++ programmers. vxBase includes a number of
functions, expressions, and operators whose syntax is similar to Xbase. A VB
programmer can manage Clipper and dBase III/III+ data, memo, and index files
with familiar functions such as vxAppendBlank, vxPack, and vxZap. vxBase
includes multiple DBF, NTX, xBase, Multiuser, Record Navigation, Memo,
Logical, Date, Numeric, Field, Char, Record I/O, File, Browse, Memory, and
Windows Interface functions. Programmers opting for the commercial product can
order a developer's kit that includes a royalty-free run-time DLL.
--editors






























March, 1994
Porting from DOS to Windows


WINGate's client/server framework minimizes recoding




Walter Oney


Walter is a Boston-based freelance developer and consultant specializing in
system tools and interfaces between applications and NT, Windows, and DOS. He
can be reached on CompuServe at 73730,553.


Although 16-bit Windows doesn't support preemptive multitasking of Windows
apps, it does allow for preemptive, hardware-supported multitasking of "DOS
boxes." As you may know (especially if you have been following Andrew
Schulman's "Undocumented Corner" column in DDJ), this is accomplished by a
protected-mode, virtual-machine operating system (the Virtual Machine Manager,
or VMM) that kicks in when running Windows in Enhanced mode. Armed with this
knowledge and faced with an aging DOS application crying for a modern user
interface, what developer wouldn't want the ability to graft a Windows
front-end client onto a concurrently executing DOS back-end server? This is
the premise behind WINGate from WINGate Technologies.
WINGate allows a DOS program to communicate directly with a Windows
application using a transaction-based API. Once this channel is established,
it opens up many interesting ways of exploiting the Windows environment.


The Components


WINGate Version 1.3x consists of a set of development components for building
your own client/server applications, a Windows Virtual Device Driver (VxD) for
coordinating client and server processes, and a set of prebuilt client and
server applets that you can use to perform a few Windows operations--such as
launching and killing Windows applications--from the command level in a DOS
box. Libraries are available for 16-bit C and Visual C++, Pascal, Basic,
Visual Basic, Clipper, FoxPro, dBase, 32-bit C (Watcom, Zortech, and
MetaWare), and more. WINGate requires Enhanced-mode Windows, which of course
implies an 80386 CPU.
WINGate's target market is developers who want to preserve the value of an
existing DOS application by attaching a Windows GUI. Accordingly, I tried it
out by using a simple client/server database. I found this aspect of WINGate
to be eminently usable and well thought-out. I did, however, encounter a
performance problem that might require attention in a commercial-grade product
(more on this later). The WINGate package, in addition to its libraries,
includes ancillary components such as prebuilt applets and an installation
program.
WINGate comes on a single 3.5-inch diskette with a Windows-based installation
program, which copies the files and adds a DEVICE= line to your SYSTEM.INI.
While generally smooth, the install program does suffer from minor glitches.
For example, unless you prevent it, it replaces your DOS PATH setting with a
single entry for the WINGATE directory. Another problem has to do with version
stamping. The Windows 3.1 API provides a standard facility for stamping
executable files with version information via the resource script. An install
program should compare the version of a diskette-based file with any
pre-existing copy on your hard drive, to avoid overwriting a more recent
version. WINGate includes files that will presumably be redistributed by many
developers. Since these files lack the resource version information, it's
possible that the wrong version could end up on an unsuspecting end user's
machine.


The Simplest Client/Server Application


To investigate WINGate, I built a simple client/server database application.
The "database server" (Listing Two, page 100) is a real-mode DOS application
that provides, on request, the name of the capital city of any country in the
known universe--provided the country is within a small table located within
the program. The client (Listing One, page 100), a simple database-query
generator, is a Windows app that asks the user for a country and then displays
that country's capital.
This simple project was a breeze. I used C8 (the command-line compiler that
forms the backbone of Visual C++ 1.0) on a 486/33 and had a working example in
about two hours. The printed manual contains excellent and accurate
documentation of the 32 functions in the API; the API routines themselves are
aptly named. A short overview of client and server programming at the
beginning of the manual gives a step-by-step cookbook for building the
programs, and sample code illustrates all the steps. The sample code shows a
DOS-based client and a Windows-based server, however. Since I was building a
DOS-based server and a Windows-based client, an example of this converse
situation would have helped.
Many programmers may have trouble using WINGate successfully, I fear. This has
much less to do with WINGate itself than with the difficulty of client/server
applications in general and communication-based protocols in particular. I
find WINGate easier to use than most network protocols because it imposes much
less of a burden on me to establish the initial connection.
The code for a Windows-based client is found in CLIENT.C. As you can see, the
client registers itself as a WINGate user during processing of a WM_CREATE
message via a call to WGRegister(), passing an argument (in this case,
client), which is a unique, system-wide name identifying this particular
WINGate client. My example's identifier isn't guaranteed to be unique; you
might want to use wsprintf to generate a unique string based on your task or
instance handle.
My example does not do much error checking; you certainly would need to add
this in a production application. For example, a production application should
ensure that the corresponding server is up and running. You could use
WinExec() to launch the server and then use various WINGate services to
establish a connection. Alternatively, you could use the WGServerInfo API to
get a list of available servers and verify that the one you wanted is up and
running. For this demonstration, I manually started the server in a DOS box.
Selecting a country within the list box which occupies the client area
triggers a database query using the code in Example 1. The first three API
calls set up a transaction packet to ask the database server what the capital
of the selected country is. WGExecute transmits the packet to the server, and
WGGetResponse waits until a response occurs. WGGetTransString extracts the
answer to the query as a null-terminated string, and WGDestroyTrans cleans up
by releasing the resources associated with the transaction.
The server-side code that responds to queries is equally simple. You use a
DOS-only API function named WGInit to initialize the WINGate package and
another function called WGRegisterServer to register the program as a WINGate
server. You now enter a loop in which you poll for transactions; see Example
2. The additional API calls in the server are WGPostResponse, which sends a
response back to the client, and WGGetTransID, which gets the query
transaction's ID for use in posting the response. I detect a shutdown or query
request within the transaction-processing code by checking for a zero-return
from WGGetTransValue(trans, 1, &code).
A WINGate transaction includes eight 32-bit data fields whose use is entirely
up to the client and server programs. WGSetTransValue is used to set these
fields, and WGGetTransValue, to get their values. I used field number 1 as an
opcode, with 0 meaning "shutdown" and 1 meaning "query the database." Looking
at Listing One, you'll notice that the WM_DESTROY message sends a transaction,
to which no response is expected, containing a 0 opcode. This is the only
normal way the server can be made to exit. The use of 0 here is deliberate, by
the way, since WGGetTransValue also returns 0 if there's an error.
Within the DOS-based server program, I've shown you the call to WGGetTrans
that polls for a query transaction. But suppose no transaction is available at
a particular time. The WINGate manual doesn't say what to do. Normally, the
right thing to do is to give up your time slice by issuing interrupt 2F/1680,
and that's what my sample program does. The Windows scheduler then allows
other virtual machines to run. Unfortunately, the scheduler then also imposes
a 50-millisecond execution penalty before the yielding virtual machine is
again eligible to run. In the case of my sample, I saw a noticeable delay
between selecting a country and seeing the response, and I infer this is due
to the scheduling penalty.
The WINGate VxD could overcome the scheduling penalty by using the Wake_Up_VM
and Set_Execution_Focus services judiciously. Alternately, WINGate could
include a "wait-for-transaction" API that would cause the VM to block on a
semaphore. The driver does appear to use the Suspend_VM and No_Fail_Resume_VM
services to suspend and resume a DOS virtual machine in some cases (perhaps
when a DOS process does a WGGetResponse), but these services perform more
slowly than VxD semaphore services.


Prebuilt Clients and Servers


WINGate includes several matched pairs of client and server applets. For
example, there is a DOS-mode client program, WINSPAWN, that communicates with
a WinApp server named WGSPAWN in order to launch new DOS or Windows programs.
You're supposed to use these applets by starting the Windows-side server
program (perhaps using the LOAD= directive in WIN.INI) ahead of time. You then
use the DOS-side client program as necessary to direct the server to perform
some function or another.
I found the rationale for these applets a bit mysterious and also encountered
a problem when using one of them. After first launching all the necessary
Windows-side server programs, I started a DOS session and used the WINSPAWN
client to launch a copy of NOTEPAD; see Example 3(a). An instance of NotePad
then appeared on my desktop. Presumably, the number (11591) displayed on the
confirmation line is NotePad's task handle, in decimal. I then used WINCTRL to
resize and move the NotePad window; see Example 3(b). Predictably, the NotePad
window moved. Finally, I used WINKILL to close out this copy of NotePad; see
Example 3(c). NotePad then asked me if I wanted to lose my unsaved changes
(I'd done some typing in the NotePad window) and exited. Throughout it all,
the DOS "client" commands executed asynchronously from the Windows side. In
other words, WINSPAWN returned before NotePad was actually up and running,
WINCTRL returned before NotePad's window was moved, and so on. This sequence
left my Windows session in a sorry state--I couldn't pull down any menus!
Accelerator keys worked correctly, however, so I tried to exit from Windows.
At this point a system modal dialog told me that Windows was extremely low on
memory, whereupon I rebooted my computer.
Even assuming my experience was atypical, I still wonder whether these little
utilities are useful. Since you obviously won't type these commands in a DOS
box if you can just move the mouse, they will likely be used from a .BAT file.
It isn't easy, however, to capture stdout output like "Execution OK  11591"
to save the task handle you'll need for later commands. You can, of course,
simply use the name of an application in WINCTRL or WINKILL, but you then face
the possibility of moving (or closing!) the wrong window if multiple instances
are active. Moreover, no synchronization primitives are available at the
command level that let you wait until an app is launched before you try to
kill it, for example.


The Clipboard Applets


At first glance, it seems that the WINCLIP/WGCLIP pair of applets provides a
useful function. Normally in Windows, if you want to paste a bitmap or
spreadsheet file from one app to another, you first open an application that
understands the file, mark the portion you want to copy, copy it to the
clipboard, paste it into the target application, and then close the
originating app. With WINCLIP, you can put the whole file directly onto the
clipboard with a single DOS command: winclip dib to @\windows\cars.bmp.
Leaving aside the messy command syntax, I find the implementation of WINCLIP
disappointing. The Windows shell already exports an INT 2F/17xx interface by
which a DOS box can interact with the clipboard. (See Tom Olsen's article,
"Making Windows and DOS Programs Talk," Windows/DOS Developer's Journal, May
1992.) Apparently, WINCLIP doesn't use this interface. If it did, it wouldn't
need to have a matching server or use the WINGate VxD. There would, moreover,
possibly be more clipboard formats available.


Conclusion



Despite my quibbles about the ancillary pieces of WINGate, I think it is a
very worthwhile tool. The WINGate concept of coupling DOS and Windows
applications with a client/server API mediated by a VxD has considerable
technical charm. The author of WINGate has done a good job of creating an API
and a set of libraries to facilitate the task.
Many commercial-product developers will prefer to move whole hog to a Windows
implementation without considering the potential simplification that products
like WINGate offer. On the other hand, I've fielded enough questions at
industry forums to know that corporate developers and individual consultants
will appreciate a less thorough-going and more pragmatic alternative.
Example 1: A Windows client creating a transaction using WINGate.
WGTRANS trans = WGCreateTrans("client", 80);
WGSetTransString(trans, country[i]);
WGSetTransValue(trans, 1, 1);
long tid = WGExecute(trans, "server", 1000,
 WG_STAT_WAIT_RESPONSE, &code);
WGGetResponse(trans, tid, WG_STAT_WAIT_RESPONSE);
WGGetTransString(trans, capital, sizeof(capital));

WGDestroyTrans(trans);


Example 2: The corresponding DOS server responding to the transaction.
WGTRANS trans = WGGetTrans();
if (trans)
{ // process transaction
 WGGetTransString(trans, country, sizeof(country));
 ... // [code that looks up capital city name]
 WGTRANS response = WGCreateTrans("server",strlen(capital)+1);
 WGSetTransString(response, capital);
 long tid = WGGetTransID(trans, &code);
 WGPostResponse(response, tid);
 WGDestroyTrans(response);
}

Example 3: (a) Using WINSPAWN to launch Windows NotePad from DOS; (b) using
WINCNTRL to resize the NotePad window; (c) using WINKILL to shut down NotePad.
(a) C:\WINGATE>winspawn notepad
 Execution OK ==> 11591

(b) C:\WINGATE>winctrl -ms #11591 300 240 300 240
 Execution OK ==> 0

(c) C:\WINGATE>winkill #11591
 Execution OK ==> 0



Products Mentioned
WINGate 1.38
WINGate Technologies Inc.
High Street Court, Suite 303
Morristown, NJ 07960
800-946-4283


[LISTING ONE] (Text begins on page 82.)

/**********************************************************************/
/* CLIENT.C -- Database client program (Windows app). By Walter Oney */
/**********************************************************************/

#include <windows.h>
#include <windowsx.h>
#include <string.h>

#include <wingate.h>

static HINSTANCE hInst;

LRESULT CALLBACK MainWndProc(HWND, UINT, WPARAM, LPARAM);

#define arraysize(p) (sizeof(p)/sizeof((p)[0]))
/**********************************************************************/
int NEAR PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrev,
 LPSTR lpCmd, int nShow) // WinMain
{ HWND hwnd; // main window handle
 MSG msg; // current message
 WNDCLASS wc; // window class descriptor

 if (hPrev) // only allow 1 instance at a time
 return 0;
 hInst = hInstance;
 memset(&wc, 0, sizeof(wc));
 wc.lpszClassName = "clientwindow";
 wc.hInstance = hInstance;
 wc.lpfnWndProc = MainWndProc;
 wc.hCursor = LoadCursor(NULL, IDC_ARROW);
 wc.hIcon = LoadIcon(NULL, IDI_APPLICATION);

 wc.hbrBackground = (HBRUSH) (COLOR_WINDOW + 1);
 if (!RegisterClass(&wc))
 return 0;
 hwnd = CreateWindow("clientwindow", "WINGate Database Demonstration",
 WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT,
 CW_USEDEFAULT, 0, NULL, hInstance, NULL);
 if (!hwnd)
 return 0;
 ShowWindow(hwnd, nShow);

 while (GetMessage(&msg, 0, 0, 0))
 { // process messages
 TranslateMessage(&msg);
 DispatchMessage(&msg);
 } // process messages end
 return msg.wParam;
} // WinMain end
/**********************************************************************/
LRESULT CALLBACK MainWndProc(HWND hwnd,UINT msg,WPARAM wParam,LPARAM lParam)
{ // MainWndProc
 static char *country[] =
 { "Afghanistan","Algeria","Angola","Argentina","Australia","Austria"
 };
 static HWND hwndList; // list of countries
 static BOOL bRegistered; // TRUE if registered with WINGate
 switch (msg) // process message
 { case WM_CREATE: // WM_CREATE
 { int i;
 if (WGRegisterClient("client"))
 { // couldn't initialize WINGate
 MessageBox(hwnd, "Unable to initialize WINGate", "ERROR",
 MB_OK | MB_ICONHAND);
 return -1;
 } // couldn't initialize WINGate
 bRegistered = TRUE;

 hwndList = CreateWindow("listbox", "",
 WS_CHILD | WS_VISIBLE | WS_VSCROLL | LBS_NOTIFY,
 0, 0, 0, 0, hwnd, (HMENU) 100, hInst, NULL);
 if (!hwndList)
 return -1;
 for (i = 0; i < arraysize(country); ++i)
 ListBox_AddString(hwndList, country[i]);
 break;
 } // WM_CREATE end
 case WM_SIZE:
 { RECT rc; // WM_SIZE
 GetClientRect(hwnd, &rc);
 if (hwndList)
 MoveWindow(hwndList, rc.left, rc.top, rc.right-rc.left,
 rc.bottom-rc.top, TRUE);
 break;
 } // WM_SIZE end
 case WM_COMMAND:
 switch (LOWORD(wParam)) // select on control id
 { case 100: // the list box

 switch (HIWORD(lParam)) // select on notification code
 {
 case LBN_SELCHANGE: // LBN_SELCHANGE
 { char msgbuf[80]; // message assembly buffer
 int i; // selection (country) index
 char capital[80]; // response from server
 WGTRANS trans; // query transaction
 int code; // error code
 long tid; // query transaction id
 i = ListBox_GetCurSel(hwndList);
 trans = WGCreateTrans("client", 80);
 WGSetTransString(trans, country[i]);
 WGSetTransValue(trans, 1, 1);
 tid = WGExecute(trans, "server", 1000,
 WG_STAT_WAIT_RESPONSE, &code);
 WGGetResponse(trans, tid, WG_STAT_WAIT_RESPONSE);
 WGGetTransString(trans, capital, sizeof(capital));
 WGDestroyTrans(trans);
 wsprintf(msgbuf, "The capital of %s is %s",
 (LPSTR) country[i], (LPSTR) capital);
 MessageBox(hwnd, msgbuf, "Geographical Information",
 MB_OK | MB_ICONINFORMATION);
 break;
 } // LBN_SELCHANGE end
 } // select on notification code end
 } // select on control id end
 break;
 case WM_DESTROY:
 if (bRegistered)
 { int code; // close out WINGate connection
 WGTRANS trans = WGCreateTrans("client", 0);
 WGSetTransValue(trans, 1, 0);
 WGExecute(trans, "server", 1000, WG_STAT_NO_RESPONSE, &code);
 WGDestroyTrans(trans);
 WGUnregisterClient("client");
 } // close out WINGate connection end
 PostQuitMessage(0);
 break;

 default:
 return DefWindowProc(hwnd, msg, wParam, lParam);
 } // process message end
 return 0;
} // MainWndProc end

[LISTING TWO]

/*********************************************************************/
/* SERVER.C -- DOS-based database server for WINGate demo. */
/* Written by Walter Oney */
/*********************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <wingate.h>

static void error(int code);


#define arraysize(p) (sizeof(p)/sizeof((p)[0]))
/*********************************************************************/
void main() // main
{ short code; // error code
 if ((code = WGInit(0, 16384)))
 error(code);
 if ((code = WGRegisterServer("server", 1024, 0, 0)))
 { // can't register server
 WGTerm();
 error(code);
 }
 while (1) // until told to quit
 { WGTRANS trans = WGGetTrans();
 if (trans) // process transaction
 { char country[80]; // name of country being queried
 int i; // loop index
 long tid; // transaction id
 WGTRANS response; // response to query
 static char *key[] =
 {
 "Afghanistan", "Algeria", "Angola", "Argentina",
 "Australia", "Austria"
 };
 static char *value[] =
 {
 "Kabul", "Algiers", "Luanda", "Buenos Aires",
 "Canberra", "Vienna"
 };
 tid = WGGetTransID(trans, &code);
 if (WGGetTransValue(trans, 1, &code) == 0)
 break; // error or "quit" request
 WGGetTransString(trans, country, sizeof(country));
 for (i = 0; i < arraysize(key); ++i)
 if (strcmp(country, key[i]) == 0)
 break; // found it
 response = WGCreateTrans("server", strlen(value[i])+1);
 WGSetTransString(response, value[i]);
 WGPostResponse(response, tid);
 WGDestroyTrans(response);

 } // process transaction end
 _asm // yield time slice
 { mov ax, 1680h
 int 2Fh
 } // yield time slice end
 } // until told to quit
 WGUnregisterServer("server");
 WGTerm();
} // main end
/*********************************************************************/
static void error(int code)
{ // error
 printf("WINGate error %d\n", code);
 exit(1);
} // error end
End Listings














































March, 1994
PROGRAMMING PARADIGMS


Developing for Newton




Michael Swaine


Last year, Apple introduced the Newton MessagePad, the first of its personal
digital assistants, and with it a new user-interface model, a new development
platform, and a new object-oriented language, NewtonScript. What follows is a
look at what Newton really is (and isn't), a view of the first Newton Platform
Development Conference, a peek at Newton development using the Newton Toolkit
(NTK), and some observations on the unique challenges or opportunities in
building and selling software for Newton.


What is Newton?


That's the question Apple was asking in its pre-Christmas commercials. Apple's
answer, predictably, was that Newton is what the world needs now: love and
understanding, peace and productivity.
After buying a Newton MessagePad and accessories; reading numerous magazine
articles, online chats, advertisements, press releases from both Apple and
third-party vendors, and technical docs from Apple's PIE division; attending
the first Newton Platform Development Conference; working through the sample
code in the beta Newton Toolkit (NTK); and playing around at writing simple
Newton apps, I've come to the conclusion that the real issue is what Newton is
not.
Newton isn't simply a handheld personal digital assistant manufactured by
Sharp and sold under Apple and Sharp logos. The Apple Newton MessagePad (or
Sharp ExpertPad, which is essentially the same device) is only the first
product based on Newton Technology. Others will be announced this year.
Newton isn't one form factor. Future Newtonian devices can be expected to look
like anything from telephones to clipboards to blackboards.
Sorry, I shouldn't have said, "look like telephones." I should have said, "be
telephones." At that point, these devices will start to be actual consumer
products, which Newton isn't today, despite the charter of Apple's PIE
division (to which Newton belongs).
Newton isn't just a line of Apple products. Apple is licensing the Newton
software technology to other vendors, who will build their own devices and
sell them under their own labels. Thus, the future of Newton is not solely in
Apple's hands.
Newton isn't dependent on a single processor. Although current devices use the
ARM610 RISC chip, the applications written for Newton compile to a byte-code
representation that is completely processor independent. No doubt, the Newton
software itself will be implemented on other processors. My guess is that the
PowerPC chip will be an early port. (Yes, Newton pays for processor
independence in speed--more about this shortly.)
Newton is an operating system. MacWeek columnist Don Crabb has even suggested
replacing the current Macintosh Finder with something more Newtonian. But it
isn't a computer operating system. Newton devices aren't computers and aren't
designed to do things that computers do. The concept behind Newton,
apparently, was something like this: Who hasn't said, "I can handle the big
things; it's the nagging little things that hang me up"? Newton is for the
little things.
While I'm saying what Newton isn't, I will also address some of the initial
(mis?)perceptions of the MessagePad.
Perception: The handwriting recognition is inadequate.
Reality: The MessagePad suffered for not being what editors and writers wanted
it to be, taking some really excessive abuse for the perceived deficiencies in
its handwriting recognition. But it's not supposed to be a substitute for a
reporter's notebook. In a more appropriate use, say as a tool for a
field-service technician, handwriting recognition is less important. That
said, I agree that MessagePad's handwriting recognition is inadequate. But
that doesn't make the device unusable.
Perception: Some pieces are missing: If this is a communication device,
where's the communication?
Reality: Dead on. But since release, Newton has collected some communication
aids. The Connection Kit lets you move files to and from a Mac or a Windows
machine (there are two versions) and supports some file synchronization.
There's a 9600-baud fax modem the size of a cigarette pack, a modem on a
PCMCIA card, a pager on a card (plug the card in and the page messages become
text you can save or edit), Apple's NewtonMail electronic-mail service with
links to most other services, a third-party service that bridges from an
e-mail service to a paging service so you can get e-mail via the pager card,
and some other things. These all cost extra, of course. A future version of
the MessagePad may have more communication capabilities built in.
Perception: At $700, it costs too much.
Reality: By the time you've added the necessary connection kit, fax modem, RAM
card, and battery pack, it'll cost you a lot more than $700.


Universal Attraction


In December, MessagePad in hand, I attended the first Newton Platform
Development Conference. Usually I cover these conferences as press, but this
time I was there as a developer.
Nonetheless, my journalistic antennae were up, scoping out the crowd. As soon
as I arrive at one of these events, I ask myself, who are these people? In
this case, I got the impression of a motley group. There were surely more
women than at any developer conference I've attended. And more nontechnical or
at least less-technical people. There were more people who did not identify
themselves as developers. A man with whom I ate lunch seemed typical: He had
hit upon a vertical-market niche about which he knew a thing or two, he said,
and was planning to sell Newton MessagePads, with his software installed,
directly into this market.
That was one type of attendee, the vertical expert. There were also a lot of
professional developers and quite a few marginal hackers (like me, doing my
bit to pull down the curve). But on average, this seemed to be not quite the
technical audience you'd see at OOPSLA.
Another attendee told me afterward that he had talked to several Windows
developers at the show. Were they concerned that Apple hadn't released the
Windows version of the NTK yet? No, they were willing to buy a Mac to program
on. Although not interested in developing for the Mac, they had no objection
to developing on the Mac. At the very least the story shows that Newton is
being perceived correctly as a new platform, having nothing to do with the
Macintosh.
The conference was organized along three tracks: International Newton
Marketplace (marketing), Orientation to Newton Development (summaries of
chapters in the NTK manual), and Advanced Newton Development (the good stuff).
I skipped back and forth between sessions in the last two tracks and picked
other people's brains during breaks to get the goods on the marketing
sessions.


I've Hit the Wall, Wally!


I attended sessions on NewtonScript and the Newton object model (soups and
stores are the key words there) and view systems; prototypes and
communications; and a session on intelligent assistance, or IA.
IA is a technology that you can implement in your apps that allows the user to
select some text (or sometimes let IA guess what text should be selected), tap
the Assist button, and have the text interpreted as a command, which IA will
then attempt to execute. Commands that IA already understands include things
like:
Remember trim hedges (adds "trim hedges" to your to-do list).
Call Mom at work (finds any name entries that include "Mom" as part of their
data, offers you the list if more than one matches, retrieves the work phone
number for the selected Mom, checks to see if area code or dialing prefix is
necessary, and offers to dial Mom).
Lunch with Jake Tue (schedules a 12:00 appointment with Jake for next
Tuesday).
Fax this to Larry (prepares a fax cover sheet and offers to fax the current
document to Larry, using his fax number, of course).
Birthday Mike 12/10 (creates an annual reminder of Mike's birthday on December
10).
Players of adventure games will recognize the syntax. You can add vocabulary
that your application knows about and have IA pass commands along to it. But
there are some tricky issues in this. The user doesn't have to be in your app
to use its vocabulary. Suppose two games both implement the verb "play." IA
could get confused if they don't do it right; and, frankly, IA can get
confused in any case.
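The verb-conflict scenario can be sketched in miniature. This toy registry (my own illustration in C++, not Newton's actual API, with hypothetical app names) shows how two applications claiming the same verb leave the assistant with an ambiguity to resolve:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy vocabulary registry (hypothetical; not Newton's API). Each app
// registers the verbs it understands; IA-style dispatch then has to
// cope when more than one app claims the same verb.
static std::multimap<std::string, std::string> vocabulary;

void registerVerb(const std::string& verb, const std::string& app) {
    vocabulary.insert({verb, app});
}

// Returns every app claiming the verb; more than one means ambiguity.
std::vector<std::string> candidates(const std::string& verb) {
    std::vector<std::string> apps;
    auto range = vocabulary.equal_range(verb);
    for (auto it = range.first; it != range.second; ++it)
        apps.push_back(it->second);
    return apps;
}
```

Whether IA asks the user, picks the frontmost app, or just guesses wrong is exactly the kind of design question the session raised.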
Nevertheless, I think IA is an interesting interface element. Consider: The
user may invoke your app via some verb like "pay" without ever clicking on an
icon or really thinking about the fact that your application is what's doing
the job. Applications that use IA effectively can feel like disembodied
capabilities. Intriguing.
On the last day of the conference, there was a special, unannounced session
typical of Apple conferences. It was announced early on the first day--typical
for unannounced Apple events. Speculation was rife that this would be a sneak
peek at the clipboard-format Newton. Nope. It was, atypically, something with
more substance than flash.

The problem: Compiling to byte code for portability incurs a cost in execution
speed. Newton apps that I have seen tend not to be horribly handicapped by
this, but even the bundled apps can be annoyingly slow, and it's not hard to
imagine apps that will drag the machine to a grinding halt.
The solution: The Father of NewtonScript, Walter Smith, is working on an
enhancement to the Toolkit that will allow native-mode compilation and
optimization for specific hardware, on a function-by-function basis. He showed
some examples. A QuickSort routine, written in NewtonScript, was speeded up by
a factor of 40 by taking it native. A compute-and-draw routine from an early
version of the Maze game Claris is distributing was speeded up by a factor of
seven.
This isn't going to be of any use for Toolbox calls, but for compute-intensive
code, it ought to be very nice.
Here's how it will work when Newton is ported to other processors: You write
one app, but flag the functions that you want to have run native. You deliver
it on various platforms. On those platforms that have implemented native-code
optimization, this will be applied to your routine; on others, you'll just get
the usual byte code.
Right. In neither case do you get to program in the native language of the
target processor and do your own optimizing. That's the price of portability.


Lifting the Toolkit Lid


As I run through some of the components and capabilities of the NTK
development system, please understand that this is not a review. NTK is not,
as I write this, a released product. Some chapters in the doc are yet to be
written in the version that I have, and much of the sample code is "unblessed"
(in other words, back up your system before you download it).
NTK consists, primarily, of the NTK application, something called BookMaker,
scores of pieces of sample code, and what's shaping up to be pretty decent
printed documentation.
NTK itself looks a little like Symantec's Think environment, but has a flavor
all its own.
As with Think, your application in progress is called a "project." Its
components can be viewed and accessed via a Project window; there's also a
Layout window, any number of Browser windows that you create, and an Inspector
window, where you can enter NewtonScript code, execute it remotely on a
connected Newton, and get back results.
The Layout window is a visual representation of the Newton screen; although it
is not an emulator, it does have a preview mode that shows how objects will
appear on the screen. There's also a palette of object prototypes (not the
right terminology) that can be dragged onto the Layout.
There's an additional file named MessagePad, which you drag into a folder
named Platforms. When there are other platforms, you will be able to drag a
different platform file into the folder, and the Layout window will reflect
its form factor.


Making Book on Newton


NTK will let a lot of people develop applications who would not be able to do
so for the Mac or for Windows. Simple applications can be put together via
visual programming methods, using supplied components.
The protoApp template is one such component. In NTK, you can select the
protoApp template from a long list of templates or click on its icon in a
palette, then drag out a rectangle in the Layout window. This gives you a
standard application with frame, title, and go-away box. Standard buttons and
text-entry windows can be dragged into the application window just as easily.
These visual elements are called "views." They are the visual representations
of objects whose properties reside partly in RAM and partly in ROM; dragging
one view inside another establishes an inheritance relationship between them.
But the beginning developer doesn't need to know all this. Nor does he or she
need to know that building from templates draws on a different, independent
inheritance scheme, or how these inheritance schemes interrelate. The
beginning developer can just drag the pieces into place--adding text to
titles, picking properties from lists--and build the app without writing any
code.
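For the curious, the dual lookup can be modeled in a few lines. This C++ toy (my sketch, not Apple's implementation) searches a template ("proto") chain first and then retries the whole search from the enclosing ("parent") view, roughly the way NewtonScript resolves a slot:

```cpp
#include <map>
#include <string>

// Toy model (not Apple's code) of the two inheritance chains: a view
// inherits from the template it was built from (the "proto" chain) and
// from the view that contains it (the "parent" chain, created by
// dragging one view inside another).
struct Frame {
    std::map<std::string, int> slots;
    const Frame* proto = nullptr;   // template inheritance
    const Frame* parent = nullptr;  // containment inheritance

    // Search the whole proto chain first; if the slot isn't found,
    // restart the search from the parent frame.
    const int* lookup(const std::string& name) const {
        for (const Frame* f = this; f; f = f->proto) {
            auto it = f->slots.find(name);
            if (it != f->slots.end()) return &it->second;
        }
        return parent ? parent->lookup(name) : nullptr;
    }
};
```

The point of the passage stands: the drag-and-drop developer never has to know any of this is happening.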
As a result, some simple Newton apps have appeared very quickly. Some of these
are highly content oriented. NTK includes the BookMaker application (not
supplied with the first betas) which is particularly accessible to
nonprogramming developers, like those people I saw at the conference.
BookMaker is an aid to creating electronic books that are to be delivered on
Newton platforms. Is this a good format for reading books? I consider it part
of my job to take such questions seriously, so I actually read, on a Newton
MessagePad, Joseph Conrad's Heart of Darkness, which someone has poured into
an electronic book. My conclusion: Fewer typos would have made this homage to
Conrad more convincing, but it was easy enough to read, way more convenient
than a PowerBook. (And yes, I have done the dirty work to support that
comparison: Last summer, I read Jurassic Park on a PowerBook, so you don't
have to.)
But the kinds of books that really make sense on this platform are reference
books for communications and travel, and job-related manuals for sales and
field-service people. A book of 800 numbers, for example. You can already find
some good examples of such books on various online services.
With BookMaker, you mark up a word-processing document with a special markup
language (simple dot commands, like .Title Heart of Darkness and .Author
Joseph Conrad), then run it through BookMaker. You get a Newton package, which
you can augment using NTK. Every element is scriptable with the full power of
NewtonScript. You can add any degree of power to the book you create through
this method, including adding IA. But the markup language itself has a lot of
capabilities, such as automatic index and table of contents generation, mixed
font specification, inclusion of PICTs and bitmaps, and bookmarks and other
navigational aids.
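For instance, the start of a source document might look like this (only the .Title and .Author commands are drawn from the description above; the body is ordinary word-processor text):

```text
.Title Heart of Darkness
.Author Joseph Conrad
The Nellie, a cruising yawl, swung to her anchor without a flutter of
the sails, and was at rest.
```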
Of course, BookMaker also has limitations. You have to work around the
file-size limitation of Claris XTND technology, on which BookMaker relies for
reading files. And you are currently limited to the Geneva and New York fonts,
which are in the Newton ROM. In the future, you'll be able to download fonts.
That's the low end of Newton development. You can develop real commercial apps
without much effort and with zero to very little programming. That model won't
work for most interesting apps, though. These will require that you get
seriously into NewtonScript, an interesting language. We'll do just that in a
future column. You can put a significant amount of effort into a Newton app,
particularly if you get into communications and device control. But nowhere
does Newton development approach the complexities of Windows or Mac
development.


Channel Surfing


It was encouraging to hear Michael Spindler speak, as he did at the
conference, about the need for a new approach to software-distribution
channels. He spoke of the need to open things up for the small developer and
made it sound like that was a priority for Apple. I hope it is, and that he
wasn't just giving a sales pitch for Apple's new online service. This Newton
software market is not, or had better not be, a shelf-space battleground.
There seem to be indications that Apple is serious about small developers in
the Newton software market. Of the various distribution methods discussed at
the conference (PCMCIA cards, NewtonMail, Apple's StarCore and PIE Partners
programs, online services, Mac or PC disks), StarCore sounds particularly
interesting. It's a comarketing program, but the StarCore folks seem willing
to entertain quite a variety of different plans, tailored to the particular
needs of the developer. I heard some really small developers who had made
their pitch to StarCore and had not received a definite No.








March, 1994
C PROGRAMMING


Templates, Patents, Docs, and Phones




Al Stevens


Last month I talked about how my min and max template functions do not work
when one argument is const and the other is not. The point of the diatribe was
that sometimes nothing works as well as reverting to old habits, in this case
the use of a #define macro. The behavior that caused the problem is that
exhibited by Borland's C++ 3.1. I ran similar tests against Watcom C++ and did
not have those problems. I cannot tell you which compiler is correct, because
I do not know what the ANSI C++ standard will say when it is published
sometime in the next century. Borland's version 4.0 with its nonsensical
no-nonsense license is out, but I have not yet installed the shipping version.
If you have not heard about the license, you should know that it attempts to
restrict the types of applications you can write (no operating systems or
databases, for instance) and requires you to obtain a royalty-free license if
you distribute more than 10,000 copies per year of your application. As you
might expect, this has ignited a firestorm amongst programmers worldwide. I'll
delve into this in depth in next month's column. The beta version has the same
problem, and compounds it by including its own min and max template functions
with const parameters. Now, the usage works only when both arguments are
const. I had to comment out their templates and use my own.
My min and max templates work very well, thank you. I am thinking about
patenting them. No? Read on.


Patently Absurd


Last month's editorial in DDJ told you about the patent Compton's NewMedia
received for access to multimedia databases. Although the Patent Office has
since decided to review the Compton's patent, the implications of that patent
and the intentions of its holder are enough to shake the foundations of
software developers everywhere. Here's another one, though on a smaller
scale. I am concerned about what it implies. Not only might D-Flat's example
Memo-pad program be subject to the licenses claimed by the holder of this
patent, but so might the products of every major software developer in the
country.
Psytronics of Princeton, New Jersey holds patent 4,540,292. I learned about it
when an associate received a letter from a lawyer suggesting that my friend's
software product, a contact manager, might be subject to licensing by the
patent.
The patent abstract describes an "electronic calendar display in which each
column always corresponds to a particular day of the week...." According to the
patent, all prior-art patents were for devices where the first day of the
month is always in the upper-left corner of the calendar matrix, sliding the
days of the week around to match the current month. Seeing a need for a
calendar display that resembles the real thing, the company designed one,
built it, and patented it in 1985.
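For what it's worth, the "invention" amounts to a few lines of code. Here is a hypothetical C++ sketch of the fixed-weekday-column layout (mine, obviously, not the patented device):

```cpp
#include <vector>

// The layout the patent describes: each column is fixed to a weekday
// (Sun..Sat), so day 1 lands in the column matching its actual day of
// the week -- unlike the "prior art" layout that always puts day 1 in
// the upper-left cell. firstWeekday: 0 = Sunday ... 6 = Saturday.
std::vector<std::vector<int>> layoutMonth(int daysInMonth, int firstWeekday) {
    std::vector<std::vector<int>> grid;
    std::vector<int> week(7, 0);        // 0 marks an empty cell
    int col = firstWeekday;
    for (int day = 1; day <= daysInMonth; ++day) {
        week[col] = day;
        if (++col == 7) {               // row full: start a new week
            grid.push_back(week);
            week.assign(7, 0);
            col = 0;
        }
    }
    if (col != 0) grid.push_back(week); // flush a partial last week
    return grid;
}
```

For a 31-day month beginning on a Tuesday, the first row reads blank, blank, 1, 2, 3, 4, 5, just as on a wall calendar.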
My friend's contact manager displays a calendar on the screen, similar to the
one in Windows Calendar, Borland's Sidekick, Casio Boss, and every other
hardware and software product that includes a calendar, including D-Flat's
Memopad. I called John Olivo, the lawyer who sent the letter, and asked if he
believed that the patent covered software-generated screen displays of
calendars. He said that my question was a complex one but that they had no
position on that issue at the moment. I asked many questions about their plans
and intentions with respect to the enforcement of this patent as it would
apply to us programmers, and he was politely evasive. He would not tell me if
they sent similar letters to Borland, Microsoft, or anyone else. He did,
however, know about their calendar-displaying products.
Mr. Olivo hastened to assure me that the letter was meant as a friendly
inquiry to potential licensees and that my friend should not be concerned. He
would not, however, say whether there would be any further legal activity with
respect to companies who chose to ignore or respond with an equally polite "no
thanks" to their invitation to license the technology.
Perhaps this one has gone as far as it will go. Mr. Olivo told me that
Psytronics is a small company. I cannot guess about their intentions or their
resources. But the granting of this patent portends a much bigger concern for
us. The patent office is handing out patents for so-called inventions that
they do not understand, and it seems that anybody can patent anything if it
involves or implies software. What does that mean to us programmers? There we
sit at our PC, building the next great American application. We use algorithms
taken from our own experience and research and from the libraries and
literature of the industry. We use them trustingly, never knowing which of
them is covered by a patent or a patent pending filed for by a naive
"inventor" and granted by an incompetent or uninformed bureaucrat. The
application hits the big time. It catches the attention of the dozen or so
patent holders who sit quietly and watch patiently for their opportunity. Like
parasites, they line up to feed on our revenues. Each one expects a license
fee, a percentage of our gross sales. Get caught by enough of them, and the
leech percentages add up until we are out of business.
My friend asked what he should do. I gave him the only counsel that I could
give, the one piece of advice that sets my teeth on edge and sends shivers up
my spine. I told him to call a lawyer.


Creative Bookmaking


The task of writing a D-Flat++ programmer's guide is upon me, and I am faced
with a common dilemma. How do you effectively document a class library?
Approaching this from the view of the user, the programmer who will use the
classes, I remember problems I've had reading the class-library documentation
for OWL, MFC, and others. They describe the class interface in terms of the
public member functions that are in the class itself, making you search up the
tree for inherited functions. I sure don't want a hernia from carrying the
document from the shelf to my desk, but I need all the information when I need
it, and therein lies the problem.
As a class user, a programmer wants to know the nature and behavior of the
class's public interface. You are not always concerned about whether a
particular member function is inherited, overridden, or unique to the class
being used. There are exceptions, of course. If you are dealing with pointers
or references to a base class, you need to understand the differences between
the behavior of the virtual functions in the base and the derived classes.
But, in most cases, you really just need to know how the methods behave in the
context of the object that you instantiate.
Reading about the derived class in the typical programmer's guide, you learn
about its member functions. Then, you must look at all of the public member
functions of all the base classes up the tree to see all of the methods
available to the derived object. Class definitions are sometimes presented in
alphabetical order, and the search involves leafing through one or more books,
holding your fingers, pencil, comb, mouse, coffee spoon, or whatever is handy
between the pages to mark your place. Other times the classes are grouped in
categories, and you have to do all of the above, plus root through the index
and hope that it is correct. In either case, you have a desk and lap full of
books with bookmarks sticking out everywhere.
The documents could save you all that book-jumping by repeating the
descriptions of all of the base methods for all of the derived classes. Two
things would happen. First, you would need a front-end loader to carry the
books around. Second, class documentation would get out of sync when base
classes changed, particularly if the project suffers from a less than
conscientious documentation effort. The usual approach is for each class to
document its own methods, so nothing is duplicated. It solves the bulk problem
but is less convenient for the programmer, who must search a complex class
hierarchy to find all the methods available to an object.
These are the problems of paper documentation, and it makes me wonder why we
still deal with paper. Isn't paper finally obsolete? All of the manuals for
Borland, Microsoft, and Symantec compiler products now come on CD-ROM along
with the software itself. They use effective presentation and search engines
to display the documentation on the screen. But they also deliver books, which
contain the same information and have no search engines beyond the traditional
tables of contents and indexes. Why do they go to the expense of duplicating
this information, using up trees, weighing down the mail person's sack, and
filling our book-cases? Part of it is our fault. Programmers gripe if the
vendor cuts back on documentation, even when the text is available
electronically and with an intelligent search engine. We perceive the value of
a product to be relative to the bulk of the package. Marketing strategists
know this. They like it when the product is in a big box. Buyers are
subliminally induced to attach greater credibility to the bigger package. It
gets more room on the retailer's shelf and, therefore, more visibility. There
is also the belief that products with documentation endure less piracy because
the users want the books.
Electronic documentation is a better idea. It weighs ounces. It is more
current. (No more README.DOC files to tell us what changed since press time.)
You can find things by using intelligent Boolean queries. Electronic
documentation can have hypertext links. It can talk, play music, and show
video clips. It can include examples that you can compile and run on your
computer. You can click on an element in a class-hierarchy diagram and see its
interface methods. Click a method and see its description. When a new version
of the compiler or tool comes out, your dog can use the old CD-ROM as a
Frisbee. Save trees, landfills, and cervical vertebrae.
What is more, a properly designed class browser would group the methods under
each class that can use them, regardless of where they come from in the class
hierarchy. It would use single copies of the function descriptions, applying
them according to the class declarations taken from the same source-code
header files that the compiler uses.
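The browser's core operation is simple. Here is a sketch in C++ (my own, with hypothetical class names) that flattens a class's visible methods by walking its base chain:

```cpp
#include <map>
#include <set>
#include <string>

// What the browser needs to know about each class: its own public
// methods and its base class (empty string if none). Single
// inheritance only, for the sake of the sketch.
struct ClassInfo {
    std::set<std::string> methods;
    std::string base;
};

// Collect every method visible on a class, inherited or not, so the
// browser can present them all in one place.
std::set<std::string> visibleMethods(
        const std::map<std::string, ClassInfo>& hierarchy,
        const std::string& name) {
    std::set<std::string> all;
    for (auto it = hierarchy.find(name); it != hierarchy.end();
         it = hierarchy.find(it->second.base)) {
        all.insert(it->second.methods.begin(), it->second.methods.end());
        if (it->second.base.empty()) break;
    }
    return all;
}
```

A real tool would, as suggested above, build the hierarchy map by parsing the same header files the compiler reads, so the documentation could never drift out of sync with the code.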
As I ponder the business of writing D-Flat++ documentation, I am happy that
developers of public-domain software are not bound by the parameters of a
market-driven economy. I don't need to, and will not, print a manual. At the
very least, the document will be a text file. With time, however, I will build
a class browser that does what I just described.
Back to the issue of commercial programmer's tools. Ask yourself this. If the
Zippo C++ Compiler and Applications FrameUp product came in a mailer with only
a CD-ROM and a booklet that says to run SETUP from the Program Manager, would
you still buy it? How about at a reduced cost that allows the vendor to
recover development costs and still make a profit? What if you could buy the
bound documentation as an option? Or in a bookstore? If you would not do any
of these things, if you insist that every product and every upgrade comes with
all the manuals printed and bound, ask yourself why.


Where Is Don Ameche Now That We Really Need Him?


Looking to the 21st century, we foresee an era driven by technology. The White
House is on the Internet. The information highway leads directly to the
pinnacles of power, and anyone can ride. Astronauts are walking in space,
replacing the Hubble's 80286 with a 386, fixing things so that we can see to
the edges of the Universe. Up and down Silicon Valley, programmers of compiler
tools are chiseling out new and innovative ways for me to add sculptured radio
buttons to my cold-steel, shaded, sculptured, three-dimensional dialog boxes.
These are exciting times.
I'm living this week in the heart of the Valley, having come to spend some
time in the editorial offices of DDJ. This is where the guardians of the
future of our technology live, work, and shape our lives. I should be
exhilarated just to be here where it all happens. At first I was.
Then I tried to make a telephone call from a Best Western motel room in San
Mateo to a residence in Los Gatos, about 40 miles to the south.
The phone in the motel room has an insert with printed instructions for using
a credit card. Dial 9-0, then the area code and phone number, it says. I do.
After several clicks, beeps, rings, and other electronic R2D2 sounds, an
operator comes on the line. How may he help me? I want to charge this call to
my credit card. What number are you calling? The one that I dialed. Would you
tell me what it is? I do, and he asks for my credit-card number.
I should have been suspicious. Back East, you don't have to talk to these
people. You dial the way you are told, the phone goes "boing," and you dial
the card number. "Thank you for using AT&T," the synthesized lady says, and
the call goes through. When you involve the real people, the rates go up. No
matter, I told him my credit-card number. Sorry, says he, I can't bill to that
credit card. Why not? It's for a different carrier. Yeah, says I, AT&T. We are
a private carrier, says he, and you will have to call an AT&T operator.
The phone's instructions say to dial 9-000 for AT&T. I do and get an AT&T
operator, a real person again. I repeat the drill, and the phone goes dead. I
try again and tell the new operator what happened. She promises to stay with
me for the duration. I repeat the phone number and credit card number for the
third time. "Sorry, I can't place this call." What? Is my card invalid? No,
your card is fine, but I can't place the call to that number. Why not, is it a
local call? No it's a long-distance call, but it's not long distance enough.
That, dearhearts, is a direct quote. Here I am, says I, with a telephone in
one hand and an AT&T credit card in the other, and you won't put the call
through; what can I do? Her supervisor comes on the line. Now I have a person
with authority. Call a Pacific Bell operator, he says, they can bill to your
AT&T card. How do I get them? Dial 9-0.
I dial 9-0. The operator answers, and I start over again. Sorry, says he, I
can't bill to that number. It's déjà vu all over again. Are you a Pacific Bell
operator? No I'm a CTS operator, and we are a private carrier. How do I get a
Pacific Bell operator? Dial 9-0. That's how I got you! May I suggest that you
call your motel operator for assistance? You may.
The motel operator listens patiently. She has, she says, heard this story many
times. No, she does not know how I can get a Pacific Bell operator from that
phone. She herself uses Pacific Bell at home and cannot make a long-distance
call from the motel. She doesn't know how things got to be that way. We decide
that I should walk to Denny's and use the pay phone. It's raining today.
Somewhere in this fabled land Judge Greene, having retired and left the bench,
has no further use for the telephone and its conveniences. He treasures his
place in history, though, because he is secure in the knowledge that he
performed a vast service for his country by breaking up the finest
communications system that the world has ever seen or will ever see.







March, 1994
ALGORITHM ALLEY


Adaptive Block Coding




Ernie F. Deel


Ernie Deel is an independent computer programmer and consultant with a
background in engineering, CAD, and computer applications in a technical
environment. He can be reached via CompuServe at 72627,3026.




Introduction




by Tom Swan


Mother said there would be days like this. We were traveling south in our
sailboat on the Neuse River in North Carolina. It started to rain, then it
poured. The temperature was in the upper 30s, and a 25-knot wind screamed its
warning of worse to come. We were a teensy bit lost and needed to find a
particular buoy with numbers that, at a distance, appear about as large as the
letters in this sentence. As darkness approached, the waterproof seals on the
binoculars decided to give up their struggle against rain and salt water,
fogging the lenses and causing us to wonder what on earth had made us leave
our house and warm fireplace to head south for the winter using this
particular mode of travel. I half jokingly attempted to beam us up to the
Enterprise, but alas, Scotty wasn't listening.
We may not have a transporter on board, but fortunately, we own a global
positioning system (GPS) that can pinpoint our location to within 10 meters
(100 meters when the signal is scrambled by the military to foil its use by
our enemies, which of course also foils its use by us). Thanks to the GPS, we
were never actually lost, but we still needed to sight that buoy to avoid a
dangerous shoal--going aground 100 meters off course is little better than
hitting land two miles out of the way. Isn't that always the case? Technology
enhances, but never replaces, the simple tools that led to the technology in
the first place. Despite our GPS, finding that buoy would mean the difference
between arriving in port and spending the night outside in the rain--not an
attractive prospect. As you can imagine, searching for tiny, unlit,
red-and-green markers (some are only about three feet tall) in a driving
rainstorm with fogged binoculars tests the limits of the brain's
pattern-recognition capabilities.
Eventually, we found our marker and rounded the jetty into the harbor of a
town with the unusual name of Oriental. I "fixed" the binoculars by
disassembling them, managing in the process to crack one lens and partially
defog the other. At least now we had a monocular we could use to continue to
the next town, where we would purchase a new pair.
Speaking of pattern recognition, I received the following article from Ernie
F. Deel on a data-compression technique that uses pattern-recognition analysis
to achieve fast results and efficient compression ratios. I wish someone would
build pattern-recognition algorithms into navigational instruments. Maybe
then, during the next storm, I could retire below, have a cup of tea, and let
the boat find the darn buoys.
Generic block coding is a data-encoding technique for subdividing a data
stream into multiple smaller blocks, which are then analyzed and encoded
independently. The technique is adaptable to many purposes, but is ideally
suited for compressing graphical images. Two of the most powerful
image-compression schemes currently available--JPEG and fractal
compression--both use block coding along with sophisticated data analysis and
modeling.
The algorithm presented here demonstrates basic block-coding techniques along
with some relatively simple data analysis and modeling. The code uses the
method to create tools for image compression and decompression. Images
compressed with this algorithm are comparable in size to images stored in the
popular GIF format.


The Algorithm


Adaptive block coding (ABC) analyzes individual data blocks to identify
predefined data patterns. Compression is achieved if a block contains a
pattern that can be identified and encoded in fewer bytes than the original
data. By using the most efficient, best-fit data pattern to encode each block,
the algorithm adapts to changing conditions within the data stream, maximizing
the compression level. Block size, number, and type of data patterns, as well
as the encoding methods, can all be modified to experiment with new patterns
or to better adapt the algorithm to a specific type of data.
For work with graphical images, a data block is a set of pixel-color values
from a small 2-D section of the image. As I will show later, however,
block-encoding techniques are not limited to graphical 2-D images. 1-D block
coding can be used to further compress the 2-D output.
I originally developed ABC encoding for an application where partial-screen,
photo-type images were to be captured from video-camera displays and
incorporated into a database. The images were to be mixed with text from the
database. Color was not essential; therefore, I used VGA mode 12h (640x480
resolution) with 16 gray-scale shades. Contrary to conventional wisdom, this
mode can display good-quality gray-scale photographs. The mode's high
resolution also allows excellent-quality text to be included with the image.
The basic algorithm presented here is limited to use with either color or
gray-scale images in mode 12h, but it can be adapted to work with other modes.


2-D Coding


For the initial compression phase, the algorithm divides the image data into
2-D, 8x8 pixel blocks. For each input block, three data items are generated:
A block code, which identifies the general pattern type used to encode the
block.
A pattern descriptor, which describes the specifics of the block using a
simple binary code.
Color coefficients, which define color magnitudes at critical points in the
pattern.
Separate arrays hold these three items, which can be stored in a disk file.
Adding a file header and the image color palette completes the ABC disk-file
structure; see Figure 1. To facilitate image decompression and reconstruction,
the file header also contains the size in pixels of the original image and the
size in bytes of each of the five sections of the disk file.


2-D Data Patterns


Six predefined 2-D block-data patterns are used. In addition, a block code of
0 is used for blocks lacking a defined pattern that can be efficiently
encoded. Based upon my experience with actual images, a very small percentage
of blocks fall into that category. Other data patterns are easily added to
those listed here. Example 1(a) shows a text representation of the horizontal
block-code pattern.
The numbers in the example indicate the row-oriented sequence in which the
pixel block is analyzed for the existence of uniformly colored pixel groups.
The program builds a binary pattern descriptor by assigning a code of 0 to
each pixel that has the same color value as the preceding pixel. Unequal
pixels are coded as 1. A complete pattern descriptor for an 8x8 pixel block
therefore takes eight bytes.
For each pixel encoded as 1, the program writes to the array a coefficient
corresponding to the color of the pixel. To reconstruct the original block,
color coefficients are either read from the array or duplicated as dictated by
the pattern descriptor. Essentially, this is a binary-coded form of run-length
encoding. In photo-type images, very short runs of uniformly colored pixels
are common and can be efficiently encoded. The block patterns in Examples 1(b)
through 1(d) show variations on this theme using different analysis and
encoding sequences.
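The row-by-row encoding just described can be sketched in a few lines. The following Python fragment is an illustration only (the article's implementation is in BASIC), with invented function names; note that the first pixel of a block, having no predecessor, is always coded as 1:

```python
def encode_horizontal(block):
    # block: row-major list of 64 pixel-color values (an 8x8 block)
    bits, coeffs = [], []
    prev = None
    for pixel in block:
        if pixel == prev:
            bits.append(0)          # repeat of the preceding pixel
        else:
            bits.append(1)          # color change: store a coefficient
            coeffs.append(pixel)
        prev = pixel
    return bits, coeffs             # 64 bits pack into 8 descriptor bytes

def decode_horizontal(bits, coeffs):
    out, it = [], iter(coeffs)
    for b in bits:
        out.append(next(it) if b else out[-1])
    return out

block = [3, 3, 3, 7, 7, 5] + [5] * 58   # short runs, as in photo images
bits, coeffs = encode_horizontal(block)
assert decode_horizontal(bits, coeffs) == block
assert coeffs == [3, 7, 5]              # only three coefficients stored
```

Only the three color changes in this sample block cost coefficients; the other 61 pixels are carried entirely by descriptor bits.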
Example 1(b) is the same as Example 1(a), but it examines the data stream
using a vertical, column-oriented analysis. Example 1(c), on the other hand,
analyzes the input data in a zigzag pattern.

The zigzag-block code efficiently encodes blocks dithered with pixels of
alternating colors. "Dithering" is a technique used in graphics software to
create apparent colors. For example, a simple dithering scheme might alternate
red and yellow pixels to create the appearance of orange. Dithering is also
heavily used by analog-to-digital converters such as scanners and
video-capture cards. Signal fluctuations, round-off errors, and other
phenomena within the hardware and software can also produce a dithering
effect.
The block code in Example 1(d) is similar to that in Example 1(c), but the
zigzag path begins at the upper right. Sometimes, simply analyzing the data in
the other direction can produce more efficient compression.
The numbers in the block code of Example 1(e) represent actual hexadecimal
color values. At first glance, there doesn't appear to be a consistent
pattern, but by counting like-colored pixels, you find that 26 out of 64
pixels have the color value 0A, called the prime color. The pattern can be encoded
using a simple binary code. For example, following a horizontal row-oriented
path similar to Example 1(a), every prime-colored pixel is encoded as 0.
Nonprime colored pixels are encoded as 1. Only the prime-color coefficient and
the color values for nonprime pixels are written to the output.
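The prime-color idea can be sketched as follows, again as illustrative Python rather than the BASIC in COMP.BAS; picking the prime color by simple majority matches the 26-of-64 example above:

```python
from collections import Counter

def encode_prime(block):
    # The most frequent value is the prime color; however often it
    # appears, it costs only one stored coefficient.
    prime = Counter(block).most_common(1)[0][0]
    bits = [0 if p == prime else 1 for p in block]     # descriptor
    coeffs = [p for p in block if p != prime]          # nonprime values
    return prime, bits, coeffs

def decode_prime(prime, bits, coeffs):
    it = iter(coeffs)
    return [prime if b == 0 else next(it) for b in bits]

block = [0x0A, 0x05, 0x0A, 0x0A, 0x09] * 12 + [0x0A] * 4
prime, bits, coeffs = encode_prime(block)
assert prime == 0x0A
assert decode_prime(prime, bits, coeffs) == block
```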
As in the preceding pattern, Example 1(f) represents actual pixel-color
values. Again, no consistent pattern is obvious. Upon close inspection,
though, every pixel has one of only four color values: 03, 05, 07, or
09. Since only four different colors are present, the colors in this block can
be encoded using only two bits instead of the usual four required to encode
the full set of 16 colors. Each of the four colors is assigned a 2-bit code.
The pattern descriptor then becomes a simple listing of 2-bit codes that
describe color values of pixels in a row-oriented fashion. The four-color
coefficients are output in the same sequence as the 2-bit codes, cutting the
size of the block almost in half.
Additional block codes reflect actual numbers of colors found within the image
block. Table 1 lists the bits per pixel used to encode the colors within the
pattern descriptor. Block codes 14 and 15 are not used.
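The Bits/Pixel column of Table 1 is simply the number of bits needed to distinguish the colors actually present in the block. A hypothetical one-liner (Python, for illustration only):

```python
import math

def bits_per_pixel(ncolors):
    # A single color needs no bits at all; otherwise ceil(log2(n))
    # bits are enough to index n distinct colors.
    return 0 if ncolors == 1 else math.ceil(math.log2(ncolors))

# Reproduces the Bits/Pixel column of Table 1 for 1 through 8 colors:
assert [bits_per_pixel(n) for n in range(1, 9)] == [0, 1, 2, 2, 3, 3, 3, 3]
```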


Pattern Analysis and Selection


For each pixel block, the program calculates the final encoded block size for
each of the six basic patterns. Encoded block size is measured in terms of the
equivalent number of noncompressed pixels. This provides a consistent method
of comparing different patterns, including the nonencoded pattern, 0. The most
efficient, best-fit pattern is obviously the one that provides the smallest
encoded-block size. Thanks to the simplicity of the data patterns, compression
is relatively fast even with this simple, brute-force analysis.
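As a rough illustration of this brute-force selection, the sketch below scores just two candidate patterns (horizontal and vertical runs) with an assumed byte-count cost model, 8 descriptor bytes plus one 4-bit coefficient per color change, and falls back to pattern 0 (32 bytes for a raw 8x8 block of 4-bit pixels) when no pattern wins. COMP.BAS performs the real accounting in equivalent noncompressed pixels over all six patterns:

```python
def run_cost(seq):
    # Assumed cost model: 8 descriptor bytes plus one nibble per change.
    changes = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return 8 + (changes + 1) // 2

def best_pattern(block):
    # block: 8x8 pixels as a row-major list of 64 values
    rows = block
    cols = [block[r * 8 + c] for c in range(8) for r in range(8)]
    candidates = {1: run_cost(rows), 2: run_cost(cols)}  # 1=horiz, 2=vert
    code, cost = min(candidates.items(), key=lambda kv: kv[1])
    return (0, 32) if cost >= 32 else (code, cost)       # 0 = no pattern

assert best_pattern([5] * 64) == (1, 9)           # uniform block: tiny
assert best_pattern(list(range(64))) == (0, 32)   # no usable run pattern
```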


One-dimensional Coding


After the 2-D compression phase, the data has lost any possible geometric
interpretation. It can now only be viewed as a one-dimensional sequence of
bytes. For reasons that are not clear to me, a significant number of short
runs of bytes typically occurs in each of the three output arrays. To take
advantage of this redundancy, I created a one-dimensional block-encoding
algorithm that accepts as input one of the 2-D data arrays. If any additional
compression is possible, the input array is returned in compressed format;
otherwise, the array is returned unchanged.
The 1-D block-encoding algorithm closely parallels the 2-D block-encoding
algorithm. The input data stream is first subdivided into 1-D blocks of 16
bytes each. Each block is then analyzed for the presence of a simple pattern.
As in the 2-D case, the following three items are written to the output for
each block:
A block code, which identifies the general pattern type used to encode the
block.
A pattern descriptor, which defines actual pattern specifics for the block
using a simple binary code.
Byte or character coefficients, which define byte or character values at
critical points in the pattern.
For 1-D coding, only two data patterns are used. A pattern code of zero
represents blocks that do not contain an encodable pattern. 1-D block codes 0,
1, and 2 are therefore encoded using only two bits.
Again, a binary pattern descriptor is used. Similar to several of the 2-D
patterns, a byte that repeats the preceding byte is coded as 0; unequal bytes
are coded as 1. Only the nonrepeating bytes are stored. The original block is
reconstructed by either reading bytes from a byte-coefficient array or
duplicating the previous byte as dictated by the pattern descriptor. Figure 2
shows an example of a data block encoded according to the 1-D algorithm. As
with the 2-D prime-color pattern, the idea here is to encode the most common
(prime) character/byte found within the block (0A, in this case).
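The same repeat-coding idea, reduced to a 16-byte 1-D block with a verbatim fallback, can be sketched as follows (illustrative Python with invented names; the real module also implements the prime-byte pattern shown in Figure 2):

```python
def encode_block_1d(block16):
    # Descriptor bit 0 = byte repeats its predecessor; bit 1 = new byte,
    # whose value is stored in the coefficient array.
    bits = [0 if i > 0 and b == block16[i - 1] else 1
            for i, b in enumerate(block16)]
    coeffs = [b for bit, b in zip(bits, block16) if bit]
    if 2 + len(coeffs) < 16:           # 2 descriptor bytes + stored bytes
        return 1, bits, coeffs         # block code 1: run pattern
    return 0, None, list(block16)      # block code 0: stored verbatim

def decode_block_1d(code, bits, coeffs):
    if code == 0:
        return coeffs                  # verbatim block
    out, it = [], iter(coeffs)
    for bit in bits:
        out.append(next(it) if bit else out[-1])
    return out

block = [0x0A, 0x0A, 0x0A, 0x41, 0x41, 0x0A, 0x0A, 0x0A,
         0x0A, 0x07, 0x07, 0x07, 0x0A, 0x0A, 0x0A, 0x0A]
code, bits, coeffs = encode_block_1d(block)
assert code == 1 and len(coeffs) == 5
assert decode_block_1d(code, bits, coeffs) == block
```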
Following compression, the data array is composed of three different sections,
closely paralleling the segmentation of the 2-D output. To facilitate
decompression, the first two bytes in the array indicate the total number of
1-D blocks. A decompression program can use that information to determine
whether or not a section has been compressed by 1-D coding.
Figure 3 illustrates the output array after 1-D compression. The first two
bytes in each section of the array indicate the total length of the section.
Based upon my experience with actual images, the 1-D compression phase
typically improves the overall compression ratio by about 10 percent or more.


The Source Code


The included ABC source code demonstrates both the use and development of
reusable software components with Microsoft's MS-DOS Basic. The source code
will also run with minimal changes under Visual Basic for Windows. I used
components from Crescent Software's Graphics Workshop to build the ABC
compression and decompression modules.
The program COMP.BAS is the ABC compression-subroutine module, while
DECOMP.BAS contains the ABC decompression subroutine; both programs are
available electronically; see "Availability," page 3. PCX2ABC.BAS in Listing
One (page 148) demonstrates how to use the compression module. The program
displays a PCX image, simulating video-camera output. The image is captured
from the screen, compressed, and stored in a disk file. Listing Two
(SHOWABC.BAS, page 148) demonstrates ABC decompression. The program simply
redisplays compressed files produced by PCX2ABC.
The modules take advantage of Basic's built-in memory management to allocate
dynamic storage arrays. As implemented, the sample programs' arrays are
limited to 64K; therefore, it may not be possible to compress some larger
images with the programs listed here. Basic fully supports the use of huge
arrays (>64K), but I didn't need that capability for my application. If
available storage is exceeded, the compression module returns a negative
number for compressed image size.
Listings Three (COMP.DCL, page 149) and Four (DECOMP.DCL, page 149) contain
function prototype declarations for COMP.BAS and DECOMP.BAS. These files must
be included in programs that use the compression and decompression modules.
Example 1: (a) Horizontal-block code; (b) vertical-block code; (c) zigzag-left
block code; (d) zigzag-right block code; (e) prime-color block code; (f)
variable-length block code.
(a) 01 02 03 04 05 06 07 08
    09 10 11 12 13 14 15 16
    17 18 19 20 21 22 23 24
    25 26 27 28 29 30 31 32
    33 34 35 36 37 38 39 40
    41 42 43 44 45 46 47 48
    49 50 51 52 53 54 55 56
    57 58 59 60 61 62 63 64

(b) 01 09 17 25 33 41 49 57
    02 10 18 26 34 42 50 58
    03 11 19 27 35 43 51 59
    04 12 20 28 36 44 52 60
    05 13 21 29 37 45 53 61
    06 14 22 30 38 46 54 62
    07 15 23 31 39 47 55 63
    08 16 24 32 40 48 56 64

(c) 01 02 06 07 15 16 28 29
    03 05 08 14 17 27 30 43
    04 09 13 18 26 31 42 44
    10 12 19 25 32 41 45 54
    11 20 24 33 40 46 53 55
    21 23 34 39 47 52 56 61
    22 35 38 48 51 57 60 62
    36 37 49 50 58 59 63 64

(d) 29 28 16 15 07 06 02 01
    43 30 27 17 14 08 05 03
    44 42 31 26 18 13 09 04
    54 45 41 32 25 19 12 10
    55 53 46 40 33 24 20 11
    61 56 52 47 39 34 23 21
    62 60 57 51 48 38 35 22
    64 63 59 58 50 49 37 36

(e) 05 0A 09 0A 05 07 0A 09
    0A 05 0A 07 0A 03 07 0A
    03 0A 03 0A 09 05 0A 07
    0A 03 09 0A 03 0A 07 09
    07 05 0A 03 09 0A 05 0A
    0A 09 07 05 0A 03 0A 05
    05 0A 0A 05 0A 07 03 09
    0A 03 07 0A 07 0A 09 0A

(f) 05 07 09 03 05 07 03 09
    09 05 03 07 07 03 07 05
    03 09 03 05 09 05 09 07
    05 03 09 07 03 05 07 09
    07 05 03 03 09 07 05 03
    03 09 07 05 03 05 03 05
    05 05 03 05 09 07 03 09
    09 03 07 03 07 05 09 07


 Figure 1: ABC file structure.
Table 1: Bits per pixel for color encoding.
 Code Colors Bits/Pixel
 6 1 0
 7 2 1
 8 3 2
 9 4 2
 10 5 3
 11 6 3
 12 7 3
 13 8 3
 Figure 2: Sample 1-D encoded data.
 Figure 3: 1-D compression output array.
[LISTING ONE] (Text begins on page 127.)

'*** BASIC Adaptive Block Coded (ABC) Image Compression
'*** (c)1993, E.F.Deel, CIS 72627,3026
'*** PCX2ABC.BAS - Compression Demo Module, displays PCX to simulate video
'*** camera output, captures image from screen, compresses & stores.
'*** Link with compression module, COMP.BAS
'*** NOTE: VGA is required and assumed

DEFINT A-Z

'--- Include declarations for compression module
'$INCLUDE: 'COMP.DCL'

'--- BASIC DOS/BIOS Interrupt routine
DECLARE SUB InterruptX (IntNumber, Registers AS ANY)

'--- External assembler components from Graphics WorkShop by
'    Crescent Software, used to display PCX and work with palette
DECLARE SUB SetPaletteEGA (BYVAL PalReg%, BYVAL Value%)
DECLARE SUB SetPalTripleVGA (BYVAL PalReg%, BYVAL Red%, BYVAL Green%, _
   BYVAL Blue%)
DECLARE SUB DispPCXVE (BYVAL Display%)
DECLARE FUNCTION OpenPCXFile% (Filename$, Header$)

'--- BASIC sub-program to handle palette and display PCX
DECLARE SUB ShowPCX (Filein$, XSize, YSize)

'--- Share compression statistics among modules (optional)
COMMON noc, hrc, vrc, pcc, zlc, zrc, vlc, bct&, pct&, plt&, GPDat%()

TYPE RegType
 AX AS INTEGER
 BX AS INTEGER
 CX AS INTEGER
 DX AS INTEGER
 BP AS INTEGER
 SI AS INTEGER
 DI AS INTEGER
 FL AS INTEGER
 DS AS INTEGER
 ES AS INTEGER
 SS AS INTEGER
 SP AS INTEGER
 BusyFlag AS INTEGER
 Address AS INTEGER
 Segment AS INTEGER
 ProcAdr AS INTEGER
 ProcSeg AS INTEGER
 IntNum AS INTEGER
END TYPE

DIM SHARED Registers AS RegType


Filein$ = COMMAND$
IF LEN(Filein$) = 0 THEN
 CLS
 PRINT "SYNTAX: PCX2ABC Filename.PCX [-]"
 PRINT "        - = Use lossy preprocessor"
 PRINT "        Output written to Filename.ABC"
 END 1
END IF
x = INSTR(Filein$, "-")
IF x THEN
 Lossy = -1
 Filein$ = LEFT$(Filein$, x - 1)
END IF
FileOut$ = Filein$
x = INSTR(FileOut$, ".")
IF x THEN FileOut$ = LEFT$(FileOut$, x - 1)
FileOut$ = FileOut$ + ".ABC"

CALL ShowPCX(Filein$, XSize, YSize)

CALL Compress(0, 0, XSize, YSize, Lossy, rsize&, csize&)

IF csize& < 0 THEN
 Registers.AX = &H3                     'switch to text mode
 CALL InterruptX(&H10, Registers)
 PRINT "ERROR! Out of memory."
 END 1
END IF

CALL SaveABC(FileOut$, XSize, YSize)

Registers.AX = &H3                      'switch to text mode
CALL InterruptX(&H10, Registers)

'--- Print compression statistics

PRINT "Raw image size = "; rsize&; "bytes ("; XSize; "X "; YSize; "pixels)"
PRINT "Compressed image = "; csize&; "bytes"
PRINT "Compression Ratio = 0."; csize& * 100 \ rsize&
PRINT
PRINT "Pattern #Blks"
PRINT "--------------------"
PRINT "None = "; noc
PRINT "Horiz. Run = "; hrc
PRINT "Vert. Run = "; vrc
PRINT "Prime Color = "; pcc
PRINT "ZigZag Left = "; zlc
PRINT "ZigZag Right = "; zrc
PRINT "Vari. Length = "; vlc
PRINT

END 0                  '-------------- End Program -----------

SUB ShowPCX (Filein$, XSize, YSize)

 Hdr$ = SPACE$(68 + 768)
 IF NOT OpenPCXFile(Filein$, Hdr$) THEN
 PRINT "File Not Found"
 END 1
 END IF
 XMin = CVI(MID$(Hdr$, 5, 2))
 YMin = CVI(MID$(Hdr$, 7, 2))
 XMax = CVI(MID$(Hdr$, 9, 2))
 YMax = CVI(MID$(Hdr$, 11, 2))
 XSize = XMax - XMin + 1
 YSize = YMax - YMin + 1

 NumPlanes = ASC(MID$(Hdr$, 66, 1))
 PixelBits = ASC(MID$(Hdr$, 4, 1))
 IF (NumPlanes < 2) OR (PixelBits = 2) OR (PixelBits = 8) THEN
 PRINT "PCX must be 640x480x16"
 END 1
 END IF

 Registers.AX = &H12                    'Switch to graphics
 CALL InterruptX(&H10, Registers)

 i = 17
 FOR k = 0 TO 15
 CALL SetPaletteEGA(k, k)
 t$ = MID$(Hdr$, i, 1)
 r = ASC(t$) \ 4
 i = i + 1
 t$ = MID$(Hdr$, i, 1)
 g = ASC(t$) \ 4
 i = i + 1
 t$ = MID$(Hdr$, i, 1)
 b = ASC(t$) \ 4
 i = i + 1
 CALL SetPalTripleVGA(k, r, g, b)
 NEXT
 CALL DispPCXVE(0)

END SUB

[LISTING TWO]

'*** BASIC Adaptive Block Coded (ABC) Image Compression
'*** (c)1993, E.F.Deel, CIS 72627,3026
'*** SHOWABC.BAS - Demo De-Compression Module, re-displays
'*** compressed images from disk.
'*** Link with de-compression module, DECOMP.BAS

DEFINT A-Z

DECLARE SUB InterruptX (IntNumber, Registers AS ANY)

'$INCLUDE: 'DECOMP.DCL'

TYPE RegType
 AX AS INTEGER
 BX AS INTEGER
 CX AS INTEGER
 DX AS INTEGER
 BP AS INTEGER
 SI AS INTEGER
 DI AS INTEGER
 FL AS INTEGER
 DS AS INTEGER
 ES AS INTEGER
 SS AS INTEGER
 SP AS INTEGER
 BusyFlag AS INTEGER
 Address AS INTEGER
 Segment AS INTEGER
 ProcAdr AS INTEGER
 ProcSeg AS INTEGER
 IntNum AS INTEGER
END TYPE

DIM Registers AS RegType

FileIn$ = COMMAND$
IF LEN(FileIn$) = 0 THEN
 CLS
 PRINT "SYNTAX: ShowABC Filename"
 END 1
END IF

Registers.AX = &H12                     'Switch to graphics
CALL InterruptX(&H10, Registers)

CALL DeCompress(FileIn$, 0, 0, OK)      'Decompress & display file

IF OK THEN
 DO: LOOP UNTIL LEN(INKEY$)             'wait for keypress
 Registers.AX = &H3                     'switch to text
 CALL InterruptX(&H10, Registers)
 END 0
ELSE
 Registers.AX = &H3                     'switch to text
 CALL InterruptX(&H10, Registers)
 PRINT "ERROR! Invalid file/file not found."
 END 1
END IF

END

[LISTING THREE]

'*** ABC Compression Module Declarations

'--- External components from Graphics WorkShop by Crescent Software
DECLARE SUB GMove4VE (BYVAL FromCol%, BYVAL FromLine%, BYVAL Cols%, _
   BYVAL Lines%, BYVAL DestSegment%, BYVAL Direction%)
DECLARE SUB LineBF2VE (BYVAL x1%, BYVAL y1%, BYVAL x2%, BYVAL y2%, _
   BYVAL LineColor%)
DECLARE SUB GetPalTripleVGA (BYVAL PalReg%, Red%, Green%, Blue%)

'--- BASIC Compression/Decompression SubRoutines
DECLARE SUB Compress (BYVAL X%, BYVAL Y%, BYVAL XSize%, BYVAL YSize%, _
   BYVAL Lossy%, rsize&, csize&)
DECLARE SUB ShrinkArray (BYVAL Segment%, BYVAL Addr%, nobytes&, noblks%)

'--- BASIC File Output Routine
DECLARE SUB SaveABC (Filename$, XSize, YSize)

'--- Internal BASIC Block File Read/Write Routines
DECLARE SUB BlkGet ALIAS "B$GET3" (BYVAL FileNum%, BYVAL Segment%, _
   BYVAL Addr%, BYVAL Bytes%)
DECLARE SUB BlkPut ALIAS "B$PUT3" (BYVAL FileNum%, BYVAL Segment%, _
   BYVAL Addr%, BYVAL Bytes%)

[LISTING FOUR]


'*** ABC De-Compression Module Declarations

'--- External assembler "components" from Graphics WorkShop
DECLARE SUB GMove4VE (BYVAL FromCol%, BYVAL FromLine%, BYVAL Cols%, _
   BYVAL Lines%, BYVAL DestSegment%, BYVAL Direction%)
DECLARE SUB SetPaletteEGA (BYVAL PalReg%, BYVAL Value%)
DECLARE SUB SetPalTripleVGA (BYVAL PalReg%, BYVAL Red%, BYVAL Green%, _
   BYVAL Blue%)

'--- External assembler component by Doug Herr
'    Displays pixel block taken from a BASIC integer array
DECLARE SUB PutI (BYVAL segment%, BYVAL Addr%, BYVAL x%, BYVAL y%, _
   BYVAL wide%, BYVAL high%)

'--- Internal BASIC Block File Access Routines
DECLARE SUB BlkGet ALIAS "B$GET3" (BYVAL Filenum%, BYVAL Segment%, _
   BYVAL Addr%, BYVAL Bytes%)
DECLARE SUB BlkPut ALIAS "B$PUT3" (BYVAL Filenum%, BYVAL Segment%, _
   BYVAL Addr%, BYVAL Bytes%)

'--- BASIC Compression/Decompression Routines
DECLARE SUB DeCompress (FileIn$, BYVAL X%, BYVAL Y%, OK%)
DECLARE SUB ExpandArray (BYVAL inseg%, BYVAL inptr%, bytesz&, NoBlk%)

End Listings


March, 1994
UNDOCUMENTED CORNER


RINGO: VxDs on the Fly




Alex Shmidt


Alex works for a large financial company in New York, where he develops
multiplatform network software. He's been doing low-level Windows programming
since 1991. Alex can be reached on CompuServe at 73302,60.




Introduction




by Andrew Schulman


A key feature of Microsoft's forthcoming "Chicago" operating system is
"plug-and-play," a specification that, once fully implemented, will allow
completely dynamic configuration of a PC, without user intervention or
rebooting the system. While your Windows program is running, drive letters may
come and go, the screen resolution may change, and new device drivers may come
on line or suddenly become unavailable. This should make for interesting
behavior as applications learn to adjust themselves to the new regime. (GO
PLUGPLAY on CompuServe has more about plug-and-play.)
In Chicago, a key part of plug-and-play is the ability to dynamically load and
unload virtual device drivers (VxDs) on the fly. This is provided by a VxD
called VXDLDR.386, which has already been released as part of Windows for
Workgroups 3.11. It appears that only specially marked VxDs can be dynamically
loaded and unloaded.
In this month's "Undocumented Corner," Alex Shmidt shows another way to
achieve dynamic VxD loading and unloading, without using VXDLDR, and even
without a VxD file. Actually, Alex provides a general technique for calling
any 32-bit Ring 0 (privileged) code from a normal Ring 3 (under-privileged)
Windows program. In the May 1993 Microsoft Systems Journal, Matt Pietrek used
callgates to access 16-bit Ring 0 code. Alex extends Matt's technique to
32-bit Ring 0 code, and then shows how to take some of this code and link it
onto the VxD chain. Thus, the VxD code need not reside in a Linear Executable
(LE) .386 file. Alex keeps his VxD code in a normal Windows DLL.
This really works! For example, after starting Alex's sample GATEVIEW program,
his newly created RINGO VxD shows up in the output from my VXDLIST program
(see the December 1993 "Undocumented Corner"). This works not only in Windows
3.1 but also in a prerelease of Chicago. Of course, VxDs dynamically installed
after the system has started will not receive initialization messages such as
SYS_DEVICE_INIT. (VXDLDR solves this with new messages such as
SYS_DYNAMIC_DEVICE_INIT.)
Alex refers to this dynamic VxD facility as RINGO, in obvious reference to
Ring 0, and perhaps also in tribute to Ringo Starr. RINGO relies on several
pieces of undocumented functionality. In particular, its ability to add to the
VxD chain at run time depends on knowledge of the Virtual Machine Manager
(VMM) INT 20h dynamic linker. I also found myself relying on this in my
generic VxD (see MSJ, February 1993). But Alex has uncovered an odd
side-effect of the VMM dynamic linker that can be used to retrieve the root of
the VxD chain; this seems a lot cleaner than the technique I used in the
December 1993 DDJ. Clearly, knowledge of VMM's INT 20h dynamic linker is
useful.
Alex also uses an undocumented parameter to the _Allocate_GDT_Selector
service, and uses (though does not depend on) an undocumented Windows service
for getting a selector to the Local Descriptor Table (LDT). This backdoor,
accessible via INT 2Fh AX=168Ah, is described in Chapter 1 of Matt Pietrek's
Windows Internals (Addison-Wesley, 1993).
For future columns, I'd love to hear from any reader who might be able to
write something on Windows multimedia internals (MMSYSTEM.DLL), or describe
how "OS/2 for Windows" works. Contact me on CompuServe at 76320,302, or in the
new "Undocumented Corner" area on CompuServe (GO DDJ).
A major advantage of protected-mode operating systems is their robustness.
Running at the processor's highest execution-privilege level, the operating
system ensures its exclusive access to vital computing resources. The simple
fact that certain machine instructions are unavailable to applications makes
it difficult for them to fool around with vital system areas and violate the
system integrity.
Unfortunately, this user-supervisor separation also presents a major problem:
Because normal applications aren't allowed low-level access to the system, any
program that needs such access must be built with a special set of programming
tools, and can only be loaded in a special way, usually requiring a reboot of
the whole system.
Although often described as something less than a full operating system,
Microsoft Windows 3 really does contain a complete operating system. It's
implemented by the Virtual Machine Manager (VMM) and a set of Virtual Device
Drivers (VxDs), whose primary purpose is to preemptively multitask Virtual
Machines (VMs), visible to us as DOS boxes. The VMM and VxDs run 32-bit code
in the flat memory model at the most-privileged level of the 80x86 CPU, often
called Ring 0, which refers to the innermost circle of a concentric ring
diagram. The "virtualization" introduced by VxDs depends on their ability to
execute any CPU instruction.
On the user-mode side of the fence, we find native Windows and DOS programs.
They run at the least-privileged level, Ring 3 (Windows 3.0 programs ran at
Ring 1) and have no control over their privilege level. Still, many Windows
programs need to do some low-level work (for port I/O and the like) while
maintaining a high level of performance. This is why so many application
installations add a VxD to the SYSTEM.INI file. For example, Microsoft's
recent C compilers install several VxDs.
Without Microsoft's VXDLDR, there usually isn't a way to install and remove a
VxD while Windows is running. VxDs reside in linear executable (LE) files,
and their presence is under complete control of the LE loader, which only runs
when Windows bootstraps. When a typical installation ends, a message pops up
indicating that we need to restart Windows to have it load the new VxDs.
When creating a VxD, you're limited by the toolset distributed with the Device
Driver Kit (DDK), which contains a special version of assembler (MASM5), the
linker LINK386 for producing LE files, and some other tools. This problem has
lately been addressed in several packages for writing VxDs with 32-bit Watcom
C or Microsoft C. But these VxDs still can only be loaded at Windows startup.
This article shows how to overcome these limitations, using an installable VxD
called "RINGO." RINGO is actually a dual-faced program, running in both Ring 3
and Ring 0. RINGO is a Windows DLL, but provides a technique equally
applicable to the protected-mode DOS world. As you'll see, it uses some
undocumented VMM functionality. RINGO works in both the debug and retail
versions of Windows 3.1. Moreover, the functionality RINGO relies upon is so
central to Windows that it will likely remain in future versions. Indeed,
RINGO works fine with the August 1993 prerelease of Chicago.
To give you an idea what RINGO is capable of, I wrote a sample program,
GATEVIEW. As seen in Figure 1, GATEVIEW can display both CPU and
operating-system internal data, like the contents of the Page Directory, GDT
and LDT, the VMCB (see "Undocumented Corner," January and February 1994), the
VxD list, and the stream of VMM events.
RINGO resides in a regular Win16 DLL, has the standard USE16 code segment
_TEXT produced by the C compiler, with LibMain, WEP, and exported functions.
To the Windows loader it looks like any other DLL. What makes RINGO a VxD is
the USE32 segment _GATESEG, with which it's linked. The code in this segment
executes at Ring 0, communicates with the VMM, and provides to the Ring 3 part
a service function, SeeYouAtRing0 (see Listing Four, page 151). When
GATEVIEW.EXE calls the exports, RINGO steps down to Ring 0 and retrieves tons
of internal information for GATEVIEW, usually available only to a system-level
debugger. RINGO also inserts itself into the VxD chain and posts messages back
to GATEVIEW whenever it gets called asynchronously by the VMM.
Although the code for RINGO and GATEVIEW is too long to reprint here in its
entirety, excerpts from CALLGATE.H, RINGO.C, RINGO.INC, and CALLGATE.ASM are
shown in Listings One through Four. The complete source code is available
electronically (see "Availability," page 3).


Using Callgates


RINGO does not call Ring 0 directly, as this would invite a general-protection
(GP) fault. Instead, RINGO makes ring transitions via callgates.
Callgates are a native part of the CPU protection mechanism, performing ring
transitions (somewhat slowly) in just one machine instruction. They are one of
the several types of gates designed to allow less-privileged applications to
access more-privileged operating-system services. Callgates are special
entries in a descriptor table, and they have associated selectors. Unlike
segment descriptors, which use bases and limits, callgates are built around
16:16 or 16:32 far pointers to entry points at Rings 0, 1, or 2. When the
operating system wants to provide a strictly controlled interface to its
service at that entry point, it creates a callgate descriptor and makes its
selector available to applications. When an application targets the callgate
selector in the far call or jump instruction, the ring transition occurs. For
example, OS/2 1.x relied heavily upon callgates to provide most DOSCALL kernel
services.
It's worth mentioning that Windows ignores callgates and instead uses a
completely different (and, presumably, faster) ring-transition scheme that
forces the application code to generate an explicit or implicit fault, which
VMM traps and dispatches to the appropriate handler. For example, the VMM
Install_V86_Break_Point service plants a byte 63h in user code; this
corresponds to an illegal ARPL instruction. When a program running in V86 mode
executes the illegal instruction, a fault is generated that VMM interprets as
a signal to dispatch control to a Ring 0 handler.
Let's avoid duplicating the Intel manuals and take a look at how RINGO
constructs its callgates; see Figure 2.
Each Virtual Machine in Windows has its own LDT. This means that when a task
switch occurs and the current VM goes to sleep, all of its LDT selectors
become meaningless until the next time this VM is scheduled. Since RINGO is an
equal-opportunity employer, nice to all VMs, it creates a callgate to the
SeeYouAtRing0 service in the global descriptor table (GDT), available to
everybody.
When the Windows loader brings RINGO.DLL into memory, it knows nothing about
USE32 segments. Just as it always does, the loader allocates a memory block
for _GATESEG from the System VM global memory pool and allocates a USE16 code
selector that maps it. At this point, RINGO treats _GATESEG as just a memory
object. RINGO translates the 16:16 far address of the SeeYouAtRing0 function
within the object into the linear address and inserts it into the gate
descriptor's OFFSET field. This 0-based offset needs a selector, mapping the
entire 4-gigabyte address space. RINGO borrows the flat-model code selector
(28h) from the VMM and inserts it into the descriptor's SELECTOR field. This
allows RINGO to make near calls to the VMM, as is expected of a VxD.
Although callgates literally open doors to interlevel calls, the CPU still
verifies an application's access rights to the gate itself. That's why RINGO
sets the callgate descriptor's DPL and its selector's RPL to 3. It sets the
system bit to 0 to let the CPU know what kind of descriptor it's creating.
Since SeeYouAtRing0 is a 32-bit function, RINGO sets the type field to 12 (it
would be 4 for 16-bit code). SeeYouAtRing0 executes when the Ring 3 code in
RINGO.DLL calls a function of type GATEPROC, whose prototype in CALLGATE.H
(Listing One, page 150) accepts two WORDs and one DWORD as parameters,
resulting in two DWORDs total, and RINGO sets the DWORD parameter count field
to 2. Finally, RINGO permanently sets the descriptor's present bit, because
VxDs can't be swapped out of memory.
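For readers who want the bit layout, here is a sketch of the eight descriptor bytes RINGO assembles, following Intel's 386 call-gate format (Python is used purely for illustration; RINGO itself builds these DWORDs in C, and the function name is invented):

```python
import struct

def callgate_descriptor(offset, selector, dpl=3, params=2, bits32=True):
    # i386 call-gate layout: the 32-bit target offset is split around
    # the selector, parameter-count, and access bytes.
    gate_type = 0x0C if bits32 else 0x04       # 12 = 32-bit gate, 4 = 16-bit
    access = 0x80 | (dpl << 5) | gate_type     # P=1, S=0 (system descriptor)
    return struct.pack("<HHBBH",
                       offset & 0xFFFF,        # OFFSET 15..0
                       selector,               # SELECTOR (28h = VMM flat CS)
                       params & 0x1F,          # DWORD parameter count
                       access,                 # P | DPL | type
                       (offset >> 16) & 0xFFFF)  # OFFSET 31..16

# DPL 3, two DWORD parameters, VMM flat code selector 28h:
d = callgate_descriptor(0x8012ABCD, 0x28)
assert d == bytes([0xCD, 0xAB, 0x28, 0x00, 0x02, 0xEC, 0x12, 0x80])
```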
As usual, the callgate selector is an index to its descriptor in the GDT. When
the Ring 3 code in RINGO makes a far call, the CPU sees that the CS register
has been loaded with the callgate selector, and loads SS:ESP with the stack of
the SeeYouAtRing0 entry point's privilege level from the task state segment
(TSS), copies the number of DWORD parameters specified in the gate descriptor
from the caller stack, and transfers control to SeeYouAtRing0 at Ring 0. The
CPU switches stacks to ensure that the stack segment's DPL will be equal to
the CPL when the instructions manipulating the stack get executed from Ring 0.
Otherwise, a GP fault would occur. This is why there are Ring 0, 1, and 2
stacks specified in the TSS; the CPU is thus always ready for a ring
transition between any two levels. Figure 3 shows the stack layout.
When SeeYouAtRing0 is about to return to Ring 3, the CPU cannot remember that
it's getting back from a gate. To restore the stack properly, SeeYouAtRing0
uses a RETF <# of bytes> instruction. Interestingly enough, it clears the
stack frames of both rings, which means that callgate procedures are typical
Pascal calling convention functions (the called function pops the stack).
How does RINGO modify the GDT? Since both the Windows API and the DOS
Protected Mode Interface (DPMI) know only about LDT segment descriptors, RINGO
uses the VMM _Allocate_GDT_Selector service. If you just construct your
callgate DWORDs and call this API, though, you'll get back 0. Doesn't it like
system descriptors? But wait a second. How does VMM itself get nonsegment
descriptors, such as interrupt gates, for example? The answer is that the
undocumented flag 20000000h must be passed to _Allocate_GDT_Selector to make
the function do what we need. Sure enough, the DDK documentation for the
_Allocate_GDT_Selector flags parameter states: "Specifies the operation flags.
This parameter should be set to 0." Ignore this advice to get a GDT callgate.
Okay, that's a key piece, but we're still at the improper ring to call the
VMM. So how do we get to the proper ring? Personally, I prefer using a tiny
VxD, CALLGATE.386, whose only job is to create and destroy callgates on behalf
of applications.

However, this requires that CALLGATE.386 already be installed. Therefore, I
made RINGO a totally independent and self-loadable VxD. I modified the
technique presented in Matt Pietrek's article, "Run Privileged Code from Your
Windows-based Program Using callgates" (MSJ, May 1993).
To launch itself into the Ring 0 orbit, RINGO creates another callgate, this
time in the LDT. It uses Microsoft's undocumented "MS-DOS" extension to DPMI,
accessible via INT 2Fh AX=168Ah, to get a selector that maps the LDT. Just
like KRNL386.EXE in Windows, RINGO uses this to write directly to the LDT,
building the callgate for another Ring 0 flat-model function, RingoInit
(Listing Four), which initializes the VxD environment. When constructing the
callgate descriptor, I hardcoded selector 28h (see GetLdtRing0CallGate in
RINGO.C, Listing Two, page 150). Neither that selector value nor the LDT
callgate approach is needed with CALLGATE.386.
Note that Microsoft's DPMI extension sets the carry flag, indicating an error,
if called from outside the System VM. To use the RINGO approach with
protected-mode DOS programs, you'll have to use some other method for
allocating LDT callgates, such as CALLGATE.386. (Another method involves the
Intel SLDT and SGDT instructions, which are not privileged.)
If you compile RINGO.C with CALLGATE_386 switch defined, RINGO will get the
callgate for RingoInit from CALLGATE.386. Unlike RINGO, CALLGATE.386 is a
conventional VxD. It implements an INT 2Fh AX=168Ah DPMI-extension interface
to communicate with applications, and thus doesn't need a VxD ID. (Given the
current confusion surrounding VxD IDs, you might want to think about giving
your own VxDs DPMI extension interfaces rather than VxD protected-mode API
interfaces.)
So what happens when LibMain (see Listing Two) calls the far pointer with the
gate selector in it? Because of the callgate, the next instruction the CPU
executes is the PUSH EBP in RingoInit at Ring 0, somewhere at address
28h:80xxxxxx. Having stepped into Ring 0, RINGO is going to make extensive use
of VMM services, and needs to set up several things to be a good neighbor.
RingoInit starts by completing the stack frame partially set up by the CPU. Our
SS:ESP points to the Ring 0 stack loaded from the TSS. Then, after loading the
segment registers with the flat-model data selector, RingoInit relocates
_GATESEG in linear memory.
Conventional VxDs never worry about their flat-model environment: Together,
the compiler, linker, and LE loader take care of that. Without this luxury,
RINGO faces two problems:
First, _GATESEG still resides in a DISCARDABLE memory block allocated by the
Windows DLL loader, as specified in RINGO.DEF. Staying in there, RINGO would
crash very soon, because there is no demand-paging for segments referenced by
selectors obtained at run time. Moreover, RINGO installs callback functions to
be called in a context where page-swapping is out of the question. Eventually,
RINGO would get hit when _GATESEG was swapped out, resulting in a page fault.
If we assigned it a FIXED attribute, or Pagelocked it at run time, Windows
would likely move _GATESEG in the linear address space somewhere below 640K,
which is too expensive. Besides, the VMM (especially its debug version) and
the debugger would get confused, since they expect VxDs to always reside above
the 2-gigabyte line.
Second, the code and data references in RINGO aren't properly resolved yet;
they are still relative to the start of _GATESEG, as calculated by the
assembler.
The RelocateRingo function plays the part of the absent loader. First, it
calls VMM's _PageAllocate service for a group of PageFixed (meaning also
PageLocked) pages, large enough to accommodate _GATESEG. RelocateRingo simply
moves the entire _GATESEG segment into this memory, whose linear address is
above 80000000h, exactly where VxDs should live. RINGO uses this address to
resolve the code references. The movoffs macro in RINGO.INC (Listing Three,
page 150) deals with this issue. Next, RelocateRingo creates a GDT
data-selector alias for the _GATESEG run-time space to address its variables.
This eliminates the need for resolving data references, because the offsets
didn't change when _GATESEG was relocated. The GS: segment overrides
throughout the code explain how RINGO accesses the variables.
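The relocation step described above can be sketched in portable C. This is a hypothetical helper, not RINGO's actual code: RelocateRingo does the equivalent job with _PageAllocate plus the movoffs macro rather than a fixup table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the loader-less relocation step: copy the segment image to its
   run-time base, then turn each recorded segment-relative offset into a flat
   linear address by adding the new base. The fixup table is an invention of
   this sketch; RINGO patches its code references with the movoffs macro. */
static void relocate_image(uint8_t *dst, const uint8_t *src, size_t len,
                           uint32_t new_base,
                           const size_t *fixups, size_t nfix)
{
    memcpy(dst, src, len);                 /* move the image */
    for (size_t i = 0; i < nfix; i++) {
        uint32_t slot;
        memcpy(&slot, dst + fixups[i], 4); /* offset within _GATESEG */
        slot += new_base;                  /* now a flat linear address */
        memcpy(dst + fixups[i], &slot, 4);
    }
}
```

With a run-time base above 80000000h, an offset of 40h in the image becomes the linear address 800xxx40h, which is exactly the fixup movoffs performs one reference at a time.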


Adding to the VxD Chain


Let's get back to RingoInit. RINGO processes System_Control messages and needs
a Device Descriptor Block (DDB) attached to the VxD chain. Since Windows 3.1
provides no VMM service to return the DDB root, RINGO gets it using another
piece of undocumented VMM functionality.
The VMM contains an INT 20h dynamic linker for VMM and VxD function calls. To
access a VMM or VxD service, VxDs use the VMMCall or VxDCall macro, which
expands into 6 bytes: 0CDh 20h (INT 20h) followed by a DWORD composed of the
VxD ID and the service number. For example, _Allocate_GDT_Selector is service
76h provided by VMM (VxD ID 1), so its DWORD is 010076h. When someone first
issues a VMMCall or VxDCall, the interrupt gate assigned to INT 20h calls the
centralized handler within the VMM. This handler saves all the registers,
makes an indirect call to the dynalink handler, and the fun begins:
The dynalink handler looks back at the stack, finds our six bytes, extracts
the device ID and service number, and passes them to a helper function, which
traverses the DDB linked list and returns the requested service address. In
other words, this is just like a GetProcAddress for VMM and VxD calls. A side
effect of the helper function call is that it returns the device DDB pointer
in the ECX register. In the case of the VMM macro call, ECX contains a VMM DDB
pointer, which is the device-chain root we're looking for. The dynalink
handler patches the original six bytes of the macro call, making them look
like a 32-bit near call to the device service, and returns to the
interrupt dispatcher, which restores the registers, including EIP. Our VxD
executes again at the same address as the INT 20h, but this time using a
direct service call instead. Neatly done!
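The six-byte call site can be modeled with a little bit arithmetic. The macro names below are mine, not VMM's; only the encoding itself comes from the article.

```c
#include <stdint.h>

/* A VMMCall/VxDCall site is CD 20 (INT 20h) followed by a DWORD whose
   high word is the VxD ID and whose low word is the service number.
   These helpers model that encoding; the names are invented here. */
#define DYNALINK_DWORD(vxd_id, svc) (((uint32_t)(uint16_t)(vxd_id) << 16) \
                                     | (uint16_t)(svc))
#define DYNALINK_VXD_ID(dw)         ((uint16_t)((dw) >> 16))
#define DYNALINK_SVC(dw)            ((uint16_t)(dw))
```

Plugging in VMM's _Allocate_GDT_Selector (VxD ID 1, service 76h) reproduces the 010076h DWORD from the text; extracting the two halves is what the dynalink handler does when it walks the DDB list.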
The DynalinkTrick function in CALLGATE.ASM (see Listing Four) uses this
knowledge to get the VxD root. Based on the fact that the dynalink handler is
treated as one of the VMM faults, DynalinkTrick installs its own handler,
Ringo_Dynalink_Handler, using the Hook_VMM_Fault service. Effectively, this
operation simply swaps function pointers in the fault call table.
DynalinkTrick then makes a dummy call to the Get_VMM_Version service, giving
Ringo_Dynalink_Handler one chance to execute. The latter calls the original
dynalink handler and saves the value of ECX. Finally, DynalinkTrick restores
the original INT 20h fault and exits with the VMM DDB pointer in hand.
The VxD chain is a linked list, and RINGO inserts itself right after the VMM.
You can run the VXDLIST program from the December 1993 "Undocumented Corner"
to verify this. The function RingoControlProc installed as a Control Procedure
in the DDB responds to the VMM messages until we disconnect RINGO from the
device chain. As Figure 1 shows, GATEVIEW displays these messages.
Finally, RingoInit creates a callgate for SeeYouAtRing0 using the undocumented
_Allocate_GDT_Selector functionality described earlier, and returns it to
LibMain in the form of a ready-to-use far pointer.
To sum up, we wrote the flat-model code and integrated it with a Windows DLL
using ordinary tools. After that binary started, we set up the environment and
programmatically launched our VxD to the VMM sphere of influence. Finally, we
established the fast communication channel for user-mode programs to talk to
our VxD. When the DLL usage count decrements to 0, and the WEP entry point
gets called, RINGO discontinues its membership in the VxD club, deallocates
all its resources, including the callgates, and gracefully disappears. A small
taste of plug-and-play!


GATEVIEW


What kind of goodies can RINGO get for us? Since it's also a DLL, RINGO
exports five functions which are just shells around calls through the gate to
SeeYouAtRing0. GATEVIEW gets a lot of stuff from RINGO by calling these
exports. As Figure 1 shows, it dumps the contents of the Page Directory and
any Page Table, GDT, LDT, IDT, TSS, Control, and Debug registers, the list of
running VMs and their parameters, and the list of installed VxDs, including
RINGO itself.
Even if you don't press any GATEVIEW buttons, the lower list box will reflect
internal VMM events relevant to the VM multitasking. Every time one of these
events occurs, the VMM sends a message to all VxDs. Since RINGO is a
legitimate (though oddly created) VxD, it sees these messages and passes them
on to GATEVIEW by calling PostMessage in a nested execution block. (See the
full CALLGATE.ASM, available electronically.)
If you press GATEVIEW's DOS button, you won't be able to start a DOS box until
you press it again, because RINGO will return the carry flag set to the
Create_VM control message. If you press the button before starting Microsoft
Visual C++, it refuses to compile anything, saying "Error initializing build
VM." This should convince you that Visual C++ runs a hidden VM.
What can be done with the technique presented here? As one example, consider
hardware device-driver DLLs installing Ring 0 interrupt handlers. They would
eliminate the interrupt-latency problem faced by their Ring 3 counterparts,
which force developers to write VxDs in the first place. Another application
for installable VxDs would be a generic loader, doing run-time relocations and
fixups. With this, it would be fairly straightforward to create VxDs in C or
even C++. And whereas Microsoft's VXDLDR can only load special dynamic VxDs
located in LE files, the RINGO approach described here can load VxDs residing
in normal Windows DLLs. The only significant difference between RINGO and
conventional VxDs is that RINGO doesn't receive VMM initialization messages.
If your VxD doesn't process these messages, there would seem to be several
advantages to the RINGO approach. If you can think of other uses for
installable VxDs, I'd welcome your suggestions and comments.
 Figure 1: GATEVIEW displays information such as the page directory and the
stream of VMM events.
 Figure 2: RINGO callgate descriptor for the SeeYouAtRing0 function.
 Figure 3: Stack frame during a call to SeeYouAtRing0 through the callgate.
[LISTING ONE] (Text begins on page 133.)

/* CALLGATE.H */
typedef DWORD (FAR PASCAL *GATEPROC)(WORD svc, WORD cnt, DWORD extra);

/* SeeYouAtRing0 services */
#define Get386_Svc 0 //get system info
#define PhysToLin_Svc Get386_Svc + 1 //map phys to linear
#define Register_Hwnd_Svc PhysToLin_Svc + 1 //register HWND
#define Unregister_Hwnd_Svc Register_Hwnd_Svc + 1 //unregister HWND
#define StopVM_Svc Unregister_Hwnd_Svc + 1 //toggle DOS box exec
#define RemapGate_Svc StopVM_Svc + 1 //remap call gate

typedef struct { /* call gate procedure parameters */
 DWORD G_Dword; // Dword parameter
 WORD G_Word; // Word parameter
 WORD G_Svc; // service number
}GPARAM;

/* RingoInit functions */
#define EXITRINGO 0xFFFF
#define INITRINGO 0

[LISTING TWO]

/* RINGO.C -- excerpts */

//#define CALLGATE_386 //define CALLGATE_386 to get gates from CALLGATE.386

#include <windows.h>
#include "386.h"
#include "callgate.h"

#ifdef CALLGATE_386
GATEPROC GetFirstCallGateVxD (FARPROC entrypoint,BYTE paramcount);
void DestroyInitGateVxD (WORD callgateselector);
#endif

VOID WINAPI RingoInit(void);
VOID WINAPI SeeYouAtRing0(void);
VOID WINAPI MakeSureOurSegIsInMemory(void);
GATEPROC GetLdtRing0CallGate (FARPROC entrypoint,
 BYTE paramcount,WORD callgate);
GATEPROC GDT_Gate,LDT_Gate;

int FAR PASCAL LibMain ( HANDLE hInstance, WORD wDataSeg,
 WORD cbHeapSize, LPSTR lpszCmdLine )
{
 FARPROC ri = (FARPROC) RingoInit;

 if (!(GetWinFlags () & WF_ENHANCED)) /*VxDs exist in enhanced mode only*/
 return 0;

#ifdef CALLGATE_386 // get the GDT call gate from CALLGATE.386
 if (!(LDT_Gate = GetFirstCallGateVxD (ri, sizeof(GPARAM)/4)))
#else // get the LDT call gate with INT 2F, AX=168A
 if (!(LDT_Gate = GetLdtRing0CallGate (ri, sizeof(GPARAM)/4, 0)))
#endif
 return 0;

 /*** get the main call gate in GDT ***/
 GDT_Gate = (GATEPROC)LDT_Gate (INITRINGO, sizeof(GPARAM)/4,
 (DWORD)SeeYouAtRing0);
 if (cbHeapSize)
 UnlockData (0);
 return (1);
}

char vendor[] = "MS-DOS"; // Microsoft's signature

GATEPROC GetLdtRing0CallGate (FARPROC gproc, BYTE params,WORD gatesel)
{
#define VENDOR_SPECIFIC_API 0x168a
WORD ldt_map; // LDT selector, which maps LDT itself
WORD (far * entryp)(void); // entry point to get the above
LPCALLGATEDESCRPT CGateDescriptor; // build call gate descriptor with this
WORD RW_ldt_map; /* ldt map selector fixes segment read-only problem */
WORD CGateSelector; // to be a call gate selector
DWORD initgate_flat; // callgate procedure's linear address

 _asm {
 mov si, offset vendor
 mov ax, VENDOR_SPECIFIC_API
 int 2fh
 or al, al
 jnz no_vendor

 mov word ptr [entryp], di /* private entry point */
 mov word ptr [entryp+2], es
 mov ax, 100h /* magic number */
 }

 ldt_map = entryp(); /* returns LDT map selector */

 _asm jnc vendor_ok
no_vendor:
 return 0;

vendor_ok:
 // When run under SoftICE/W, the LDT alias comes back read-only; make a writable one
 if (!(RW_ldt_map = AllocSelector(SELECTOROF((void FAR *)&GDT_Gate))))
 return 0;
 SetSelectorBase(RW_ldt_map, GetSelectorBase(ldt_map));
 SetSelectorLimit(RW_ldt_map, GetSelectorLimit(ldt_map));
 if ((CGateSelector = gatesel) == 0) // we might already have one
 if (!(CGateSelector = AllocSelector(0))) // Get a selector for the gate
 {
 FreeSelector (RW_ldt_map);
 return 0;
 }

 // create a pointer to write into the LDT
 CGateDescriptor = MAKELP(RW_ldt_map,CGateSelector & SELECTOR_MASK);

 // build 32-bit ring 3-to-0 call gate
 #define MK_LIN(x) (GetSelectorBase(SELECTOROF(x)) + (DWORD)OFFSETOF(x))
 initgate_flat = MK_LIN(gproc);
 CGateDescriptor->Offset_0_15 = LOWORD (initgate_flat);
 CGateDescriptor->Offset_16_31 = HIWORD (initgate_flat);
 CGateDescriptor->Selector = 0x28; // ring0 flat code seg
 CGateDescriptor->DWord_Count = params & CALLGATE_DDCOUNT_MASK;
 CGateDescriptor->Access_Rights = GATE32_RING3; //pres,sys,dpl3,32CallGate
 FreeSelector (RW_ldt_map); // don't need you any more
 return ((GATEPROC)MAKELP(CGateSelector,0));
}

DWORD WINAPI _export MapPhysToLinear (DWORD physaddr, WORD mapsize)
{
 return (GDT_Gate)(PhysToLin_Svc,mapsize,physaddr); /* DPMI alternative */
}

[LISTING THREE]

;;; RINGO.INC -- excerpts

GPARAM struc ; parameters
 G_Dword dd ?
 G_Word dw ?
 G_Svc dw ?
GPARAM ends
CALLGATE_FRAME struc ; stack frame at the time of ring transition
 CG_pushbp dd ?
 CG_Old_EIP dd ? ; this is where we came from
 CG_Old_CS dd ? ; and will get back
 CG_Params db (type GPARAM) dup (?) ; call gate parameters
 CG_Old_ESP dd ? ; caller's

 CG_Old_SS dd ? ; stack
CALLGATE_FRAME ends

BuildGateStackFrame macro dataseg
 push ebp
 mov ebp,esp
 push gs
 push ds
 push es
 push fs
 push esi
 push edi
 ifidni <dataseg>,<_DATA>
 mov ax,ds
 mov gs,ax ; we'll access our data seg via gs
 endif
 mov ax,ss
 mov ds,ax ; ring 0 flat data selector
 mov es,ax
 mov fs,ax
 ifdifi <dataseg>,<_DATA>
 call GetRingoGdtDataSel
 endif
endm

ClearGateStackFrame macro cleanup
 pop edi
 pop esi
 pop fs
 pop es
 pop ds
 pop gs
 pop ebp
 ret cleanup
endm

movoffs macro reg,off32 ; run-time fixup
 mov reg, offset &off32
 add reg,gs:[ringo_flat]
endm

[LISTING FOUR]

;;; CALLGATE.ASM -- excerpts

.386p
 include vmm.inc
 include ringo.inc
 include 386.inc
public RingoInit,SeeYouAtRing0,MakeSureOurSegIsInMemory
_GATESEG segment dword use32 public 'CODE'
 assume cs:_GATESEG,gs:_DATA
RingoInit proc far
 BuildGateStackFrame _DATA
 cmp [ebp].CG_Params.G_Svc,EXITRINGOCALL
 jnz short @f
 call RingoExit ; deallocate everything we've got
 jmp short retini
@@: call RelocateRingo ; run-time relocation and fixups

 jc short retini
 call DynalinkTrick ; get the VxD chain root
 call InsertRingoDDB ; welcome to the VxD club
 call CreateRingoGDTGate ; GDT call gate to SeeYouAtRing0
retini: mov edx, eax ; prepare return values for the ring 3
 shr edx, 16
 ClearGateStackFrame <size CG_Params> ; clear both ring stack frames
RingoInit endp

SeeYouAtRing0 proc far ; The callgate service proc
 BuildGateStackFrame
 VMMCall Get_Cur_VM_Handle ; always helpful
 movzx eax, [ebp].CG_Params.G_Svc ; service dispatcher
 cmp eax,LASTSVC
 ja @f
 call gs:Gate_Service_Table[eax*4]
@@: mov edx, eax
 shr edx, 16
 ClearGateStackFrame <size CG_Params>
SeeYouAtRing0 endp

CreateRingoGDTGate proc
 movzx edx, word ptr [ebp].CG_Params.G_Dword ; offset16
 add edx,gs:[ringo_flat] ; fixup
 mov ax, cs ; VMM code selector
 mov cx, [ebp].CG_Params.G_Word ; parameter count
 and cx, CALLGATE_DDCOUNT_MASK ; make sure it's a reasonable number
 or cx, GATE32_RING3 ; call gate type
 call BuildCallGateDWords
 VMMCall _Allocate_GDT_Selector,<edx,eax,20000000h> ; undocumented flag
 ror eax,16
 ret
CreateRingoGDTGate endp

BeginProc DestroyGDTCallGate,public
 movzx eax,[ebp].CG_Params.G_Word
 VMMCall _Free_GDT_Selector,<eax,0>
 ret
EndProc DestroyGDTCallGate

BuildCallGateDWords proc
 movzx eax, ax
 shl eax, 16 ; selector
 mov ax, dx ; offset 0-15
 mov dx, cx ; offset 16-31 + type + count
 ret
BuildCallGateDWords endp

;****************************************************************************
; To get the VxD Base (VMM DDB ptr) we're using the undocumented fact that
; VMM's dynalink handler (considered "fault 20h" in DDK spec parlance)
; returns it in ecx. The idea is to hook VMM fault 20h, call any VMM service
; to let our fault handler receive control, call VMM's dynalink directly,
; store ecx in a static variable, and hook fault 20h again, this time
; with fault handlers reversed.
;****************************************************************************

BeginProc DynalinkTrick
 mov esi, gs:[OurDynalinkHandler]

twice: mov eax, 20h
 VMMCall Hook_VMM_Fault ; install our handler
 mov gs:[OLD_DYNALINK_HANDLER], esi
 VMMCall Get_VMM_Version ; need one call to get it executed
 cmp esi, gs:[OurDynalinkHandler]
 jnz twice
 mov eax, gs:[VXD_FIRST]
 ret
EndProc DynalinkTrick

Ringo_Dynalink_Handler proc
 call gs:[OLD_DYNALINK_HANDLER]
 mov gs:[VXD_FIRST], ecx ; DDB pointer
 ret
Ringo_Dynalink_Handler endp

PhysToLin proc ; physical to linear address mapping
 movzx ecx, [ebp].CG_Params.G_Word
 VMMCall _MapPhysToLinear,<[ebp].CG_Params.G_Dword,ecx,0>
 ret
PhysToLin endp

ringo_flat dd 0 ; run-time space base
OLD_DYNALINK_HANDLER dd 0
VXD_FIRST dd 0 ; VxD chain root
OurDynalinkHandler dd offset Ringo_Dynalink_Handler
Ringo_DDB VxD_Desc_Block <,,,1,0,,'Ringo   ',,offset RingoControlProc,,,,,,,>
Gate_Service_Table label dword
 dd offset Get386
 dd offset PhysToLin
 dd offset RegisterHWND
 dd offset UnregisterHWND
 dd offset StopVM
 dd offset RemapCallGate
_GATESEG ends
end
End Listings





March, 1994
PROGRAMMER'S BOOKSHELF


Legal Self-Help for Software Developers




Jonathan Erickson


Some of the thorniest problems programmers end up wrestling with don't involve
advanced algorithms, new APIs, or obfuscated code. Far too often, the nastiest
ones revolve around legal issues--copyrights, trade secrets, software patents,
employment agreements, distribution contracts, and the like. In fact, in these
days of intellectual-property confusion, the biggest difference between a
one-person code shop and a multimillion-dollar corporation seems to be the
number of lawyers on retainer. The bottom line is that software
developers--whether they're independent contractors writing software for
retail or in-house programmers creating proprietary code--need to have a basic
understanding of their legal rights and responsibilities.
In general, the programming and legal professions cross paths in two areas:
individual rights and software protection. For example, when you go to work
for a company as an employee or contractor, you take along a certain body of
knowledge--skills, algorithms, utilities, and so forth. If you write a program
using one of your homegrown tools or techniques, do you give up rights to it?
What's the best approach to protecting your software: trade secrets,
copyrights, patents, or onerous licensing agreements? Much to their attorneys'
delight, many programmers are using all of the above to protect individual
pieces of software because it's unclear exactly what is the best form of
intellectual-property protection.
When you get down to it, however, the law isn't rocket science, no matter what
lawyers would have you believe. While I wouldn't advocate Mrs. Bobbitt's
defending herself before an all-male jury, you can still educate yourself to
the point where legal-babble is at least as understandable as Visual Basic.
Software Development: A Legal Guide and The Software Developer's and
Marketer's Legal Companion, both written by lawyers experienced in software
development and intellectual-property issues, do just this for software
developers. Predictably, the two books cover much of the same territory, and
both provide electronic versions (on MS-DOS disks) of ready-to-use forms,
agreements, letters, contracts, checklists, and similar boilerplates.


Software Development: A Legal Guide


Of the two books, Stephen Fishman's Software Development: A Legal Guide
generally goes into more depth, particularly on volatile topics such as
software patents. (The 60-page chapter on patents was actually written by
patent attorney Leigh Hunt.) In addition to providing a historical and legal
background on software patents, the book also discusses how to determine if
your software is patentable, how to go about getting a patent, where to search
for prior art (the locations and phone numbers of dozens of nationwide
patent-depository libraries are listed), and how to understand the
patent-classification system (not as easy as you'd expect). The discussion of
software patents was refreshingly honest, considering that it was written by
an attorney--or maybe it's just that I agree with the author when he says,
"only the powerful (or very determined) can play the patent game." Finally,
the author discusses what you can do if you're accused of patent infringement.
To my mind, trade secrets have always been a better way than patents to
protect software. However, my understanding of trade-secret law has been
largely anecdotal. Fishman's detailed analysis of the topic and his discussion
of the steps you take to identify and protect secrets filled in the gaps for
me, particularly when he compares trade secrets to both copyrights and
patents.
Interestingly, Software Development also has a timely chapter on multimedia, a
subject fraught with legal land mines. While Fishman's focus is for the most
part on how you obtain permission to use copyrighted materials (text, photos,
video, audio, and so on), he also examines public domain and the fair-use
exception to copyrighted works.
As for individual rights, Fishman devotes specific chapters to employment
agreements (from the perspective of both employer and employee),
independent-contractor agreements, agreements to develop custom software, and
ownership of software created by employees and independent contractors.
Particularly useful when dealing with the Internal Revenue Service are the
guidelines Fishman provides for software companies that use independent
contractors.


The Software Developer's and Marketer's Legal Companion


Fishman's orientation in Software Development: A Legal Guide is largely
no-nonsense reference--just the facts, ma'am. Gene Landy's Legal Companion, on
the other hand, is more readable and loaded with anecdotes. (Landy's chapter
on trade secrets and confidentiality agreements starts off with the
well-documented travails of Gene Wang's jump from Borland to Symantec.) That's
not to say that because Legal Companion is more entertaining, it's less
valuable: While Landy's book overlaps with Fishman's, Legal Companion delves
into topics such as beta-test agreements and software liability that aren't
directly touched upon in Fishman's book.
It's significant that the Legal Companion is written for software developers
and marketers. To this end, Landy also examines distribution and dealer
agreements, shrink-wrap licenses and warranties, and end-user agreements. Of
special note is a chapter on international software distribution (including
export controls), in which Landy discusses how European and Japanese
intellectual-property laws differ from those in the U.S., as well as the ins
and outs of setting up distribution agreements with non-U.S. distributors.
Landy further provides guidelines for negotiating software-publishing
agreements, including license and royalty issues, sublicensing, sales
auditing, warranty and indemnification issues, and termination provisions.
This is followed by a chapter on distribution channels (including shareware),
pricing, warranties, franchise and antitrust laws, and related information.


Boilerplates


Between them, Software Development and Legal Companion also provide hardcopy
and electronic versions of nearly 40 documents, ranging from nondisclosures
and multimedia privacy releases to source-code escrow and shrink-wrap license
agreements. In most cases, the forms are explained or annotated so that you
can use them without having to consult a lawyer.
Copyright, patent, and other registration procedures are also detailed; Legal
Companion even has an official copyright-registration form bound into the
book.
Using just one of these boilerplate forms or letters will more than pay for
the cost of the book.


Conclusion


Having to worry about legal issues can be a distraction to the actual
programming process. However, in this day and age, writing software without
considering legal ramifications is akin to driving at night without
lights--you can't see where you're going, and you can't see what's coming your
way.
Neither Fishman's Software Development nor Landy's Legal Companion will solve
your problems once subpoenas are being served, but either of them may save you
from getting mired that deeply in the first place. If nothing else, the two
books make it possible for you to carry on a meaningful dialogue with a
smooth-talking latter-day Perry Mason when (not if) you have to engage legal
counsel.
Software Development: A Legal Guide
Stephen Fishman
Nolo Press, 1993, 300 pp., $44.95
ISBN 0-87337-209-3
The Software Developer's and Marketer's Legal Companion
Gene K. Landy
Addison-Wesley, 1993, 548 pp. $34.95
ISBN 0-201-62276-9






March, 1994
OF INTEREST
The Tool Interface Standards (TIS) Committee, a group of software companies
working to advance the interoperability and portability of development tools
for 32-bit Intel architecture operating environments, has announced version
1.1 of the TIS Portable Formats Specification.
The TIS spec is the first to standardize linkable, loadable, and debug formats.
The formats include the relocatable object module format (OMF), the executable
and linking format (ELF), and the debug information format (DWARF). The
committee has
also agreed upon Microsoft's PE and symbol and type information (STI) for the
Windows environment. Version 1.1 updates OMF and DWARF by clarifying
inconsistencies in the OMF standard and adding functionality to the DWARF
standard through support for C++, Fortran-90, Modula-2, and Pascal.
TIS includes the aforementioned Portable Formats Specification for defining
file formats portable across Windows, UNIX, OS/2, and other operating systems,
and the Formats Specification for Windows, which describes formats for 32-bit
implementations of Windows. TIS compliance enables you to mix-and-match tools
that previously conformed to proprietary interface specs. Companies backing
TIS include Borland, IBM, Microsoft, Intel, SCO, Novell, Microtec, Watcom,
Lotus, Metaware, Lahey, Autodesk, SSI, and Absoft.
Copies of the Tool Interface Standards specification are available on
CompuServe (GO IntelAccess) or the Intel Literature Center (order #241597).
Reader service no. 20.
service no. 20.
Intel Corp.
Literature Center
P.O. Box 7620
Mt. Prospect, IL 60056-7620
800-548-4725
BetterState from R-Active is a Windows-based visual programming tool for
developing state machines which include concurrency and hierarchy. The tool's
graphical environment lets you draw states, identify threads of concurrent
control, and annotate diagrams. These diagrams can be on one or more linked
"pages."
Once a BetterState diagram is complete, the tool generates C, C++, VHDL, or
Verilog HDL code. The software comes with templates for scheduling the design
deterministically, stochastically, or using specific scheduling algorithms
(round robin, for example).
BetterState retails for $1195.00. Reader service no. 21.
R-Active Concepts
20654 Gardenside Circle
Cupertino, CA 95014
408-252-2808
The Windows API is rapidly becoming a de facto standard for programmers
developing under both UNIX and Windows. Joining the ranks of Bristol
Technology's Wind/U cross-platform development tool is Mainsoft's MainWin
Cross-Development Kit which now supports VC++/MFC. (Wind/U is discussed in
this issue's article "Cross-Platform Development with Visual C++" by Chane
Cullens, while the underlying technology to MainWin is discussed in
"Binary-Data Portability" by Jos Luu.)
With the MainWin CDK, you can write a single code base using VC++/MFC on
Windows or Windows NT, and recompile it for Sun, IBM, HP, and SGI workstations
and for X Terminals that support Motif.
The MainWin Cross-Development Kit sells for $5000.00; discounts and licensing
agreements are available. Reader service no. 22.
Mainsoft Corp.
883 North Shoreline Blvd., Suite C-100
Mountain View, CA 94043
415-966-0600
A series of new app notes of interest to developers of testing, measurement,
and control applications have been prepared by National Instruments. The
notes, as you might expect, focus on National Instruments tools.
The first app note, "VXI Block Data Transfer and Register-based Device
Programming Using the GPIB-VXI," discusses the methods for transferring data
between National Instruments' GPIB controllers and VXIbus instruments linked
by GPIB-VXI interface kits (part #340568-01).
As its title suggests, "Measuring Temperature with Thermocouples" (part
#340524-01) examines the use of thermocouples to measure temperature and how
to avoid common problems. "Measuring Temperature with RTDs" (part #340557-01)
addresses how scientists and engineers are able to use the high-accuracy and
stability of resistance-temperature detectors (RTDs) to measure a wide range
of temperatures.
"Signal Conditioning Fundamentals for PC-based Data Acquisition Systems" is an
introduction to the fundamentals of using signal-conditioning hardware with
PC-based data acquisition systems (part #340565-01).
Other app notes covering topics ranging from DSP-based data acquisition to
fast Fourier transforms for Windows applications are available on request.
Reader service no. 23.
National Instruments
6504 Bridge Point Parkway
Austin, TX 78730-5039
512-794-0100
The Kolibri Custom Control Library is a DLL that allows integration of 3-D
Windows controls into C, C++, and Pascal programs. Kolibri consists of 12
control classes for animation, slider controls, swivel controls, list browse
boxes for large databases, flexibutton controls, motion controls, and so on.
The Kolibri controls integrate directly into the Borland Resource Workshop,
essentially becoming an additional set of Resource Workshop tools. (Note that
the library does not require the Borland compiler, however.)
The Kolibri for VBX library provides similar functionality for languages that
support VBXs--Borland C++ 4.0, Visual Basic, and the like.
Each library sells for $149.00, or $298.00 with source code. There are no
run-time royalties. Reader service no. 24.
European Software Connection
P.O. Box 1982
Lawrence, KS 66044
913-832-2070
The HTI Map Developer's Kit from Horizons Technology is a set of three
programming libraries that allow you to add detailed raster-mapping
capabilities to any Windows- or DOS-based application. The Map Display
Library, on which the Developer's Kit is based, supports functions such as
loading, displaying, zooming, and seamless scrolling through a database of
geo-referenced raster maps. It also contains functions to convert geographic
coordinates to screen coordinates and vice versa.
In addition to the Map Display Library, the Developer's Kit includes the Map
Overlay and GPS Support add-on libraries. These tools can be used in
conjunction with CD-ROM-based regional and metropolitan map sets, also
available from Horizons.
The Map Overlay Library allows you to create object-oriented databases using
objects such as points, lines, ellipses, polygons, and so forth. These objects
can be geo-referenced to the underlying map database and application-specific
data.
The Developer's Kit supports Microsoft C and Borland C++. The three-library
kit sells for $1739.00. You can also buy the libraries separately. Reader
service no. 25.
Horizons Technology
3990 Ruffin Rd.
San Diego, CA 92123-1826
800-828-3808
Visual Edge's Visual Action Toolset is a collection of tools that incorporates
Display PostScript and the OSF/Motif widget set for creating document
interfaces which allow a PostScript image to be used as an interface element.
Additionally, any standard interface element can be incorporated into a
PostScript image or document. The tool set uses a layered approach to
development that parallels X application development. The Visual Action Widget
Set, composed of rendering widgets, see-through controls, and a Visual
Manager, is analogous to Motif. In this layer, rendering widgets give the
ability to display a PostScript image and to control interface graphics,
see-through controls enable the developer to create widgets with transparent
backgrounds, and Visual Manager widgets control the display and position of
child widgets.
The Visual Action Toolset Framework, analogous to the Xt toolkit, provides the
functionality common to VA widgets. The framework can be used to subclass
existing VA widgets or to create new widget classes. A Display PostScript
Client Library, similar in nature to Xlib, provides a C language interface to
the Display PostScript system.
The Visual Action Toolset sells for $2500.00. It supports UNIX and Motif on
SunOS 4.1.3, AIX 3.2.4, and SGI IRIX 5.1.1. Reader service no. 26.
Visual Edge Software
3950 Cote Vertu, Suite 100
St-Laurent, PQ
Canada H4R 1V4
514-332-6430
Procase Corporation is shipping SmartSystem 3.0, a software tool that helps
programmers understand and fix large amounts of unfamiliar source code by
graphically displaying code structure, providing information about errors and
dependencies, and performing impact analysis. SmartSystem is distinguished by
its large capacity (able to handle C programs in excess of 1 million lines of
code) and robustness (able to handle code that is incomplete, does not
compile, or has errors). It can handle many dialects of C, including code
targeted for many UNIX platforms and embedded systems, providing both
graphical and textual views of code, using a variety of formatting styles.

Version 3.0 runs on Sparc/SunOS, Sparc/Solaris, HP9000/700, and IBM RS/6000
platforms. The major new functionality is the addition of a makefile analyzer
to facilitate getting started in building the code database. It also provides
additional code comprehension capabilities in the Call Graph display, by
showing dependencies involving global data and function pointers (in addition
to showing function and library calls, as supported in the previous version).
There are also mechanisms for uncovering undefined preprocessor macros. Reader
service no. 27.
Procase Corporation
2694 Orchard Parkway
San Jose, CA 95134
408-321-3951
Visual SlickEdit, an updated version of the venerable SlickEdit programmer's
editor from MicroEdge, is available for Windows and Windows NT. Visual
SlickEdit is a configurable editor that uses an object-oriented, C-style macro
language (source code for the macro language comes with the editor) and a
built-in dialog editor. The dialog editor lets you create event-driven dialog
boxes. Additional controls the tool provides include spin, gauge, drive list,
file and directory list, and the like. The editor also makes use of a
technology MicroEdge calls "clipboard inheritance," which provides inheritance
of code pasted to the clipboard.
Visual SlickEdit, which sells for $295.00, includes Brief and Emacs emulation
and provides special support for most compilers and languages. The editor is
available for Windows 3.1 and Windows NT on Intel, Mips, and Alpha machines.
Reader service no. 28.
MicroEdge
P.O. Box 18038
Raleigh, NC 27619-8038
919-790-1691
Intel and Microsoft have jointly announced an update to the Advanced Power
Management (APM) Interface specification, which is designed to increase
battery life for portable and other low-power computers based on
Intel platforms. Originally released in January 1992, APM allows the operating
system to communicate through ROM BIOS to instruct system resources to power
down during periods of nonuse. APM is already supported by the Plug and Play
BIOS specification. New features of APM 1.1 include new power-management
functions to enhance cross-platform compatibility, enhancements to the
suspend/pause scheme for improved communication between the operating system
and BIOS, and a BIOS-compatibility test so that BIOS developers can verify
compliance with the specification.
Copies of the APM 1.1 BIOS Interface specification are available through the
Intel Literature Center, order #241704. The document is also available on
CompuServe through Intel Access, the company's developer-support service.
Alternatively, the specification can be obtained through Microsoft's Hardware
Vendor Relations Group, or through the Plug and Play forum on CompuServe. The
APM 1.1 BIOS-compatibility test is available to OEMs and BIOS-developers
through Intel's General Support Group (800-628-8686). Reader service no. 29.
Intel Literature Center
800-548-4725
Microsoft Hardware Vendor Relations
206-882-8080
WidgetKit/CUA '91 is an add-on for Objectshare's WindowBuilder Pro/V, an
interface builder that works with Smalltalk/V for OS/2. WidgetKit/CUA '91
adds support for OS/2-style user-interface elements as defined by the Common
User Access (CUA) '91 specification. The tool includes various controls,
including a notebook, sliders, spin buttons, containers, and value sets.
Specialized editors are used for direct access to each of the control's
attributes. The notebook control uses caching to enhance performance for large
notebooks. WidgetKit/CUA '91 includes full Smalltalk/V OS/2 source code, and
there are no royalties or run-time fees for applications developed with
WindowBuilder Pro. The company plans to release Windows and Win32 versions of
the product early in 1994.
WidgetKit/CUA '91 sells for $295.00. Reader service no. 30.
Objectshare Systems
5 Town & Country Village, Suite 735
San Jose, CA 95128-2026
408-727-3742







































March, 1994
Swaine's World


I don't need this aggravation.


The Wall Street Journal has been running a series on Home Workers, people like
me who work out of a home office. Or "home/office," as some punctuate it. A
home office is a den or bedroom converted to business purposes. A home/office
is a home taken over by the paraphernalia of one's work. I have a home/office.
Anyway, the first story in the series didn't paint a very flattering picture
of us Home Workers. Many of us are torn by feelings of inadequacy, the Journal
reported, working late into the night to overcompensate. Eventually, we start
talking to our dogs. It didn't get any better: the second story in the series
was headlined, "Home Workers Are a Bunch of Slobs."
The Home-Worker-as-Slob motif also figured in a recent contest sponsored by
Home Office Computing magazine, which awarded a prize for the messiest home
office. Hah. Big joke.
It's not a joke.
It's a lifestyle choice.
Besides, as I told Zelda, my Labrador Retriever, it's not fair to single out
us Home Workers. There are plenty of slobs among--to choose an example
completely at random--computer magazine editors. I could tell you stories. Or
programmers. I've seen some of your offices. Or at least as much of them as is
visible under the mounds. What color is your carpet, hmmm?
But as somebody once said, "success is what we make of the mess we have made
of things." I'd give you the exact quote, but that would entail finding my
book of quotations in this mess.
Deciding to make some success out of mess, I cleaned up part of my
home/office. The result: my new basement/library/studio, an immaculate and
cozy nook lined with books and furnished with comfy chairs and footstools--and
a video camera and microphone. The ideal setting for conducting video
interviews.
I call it "Swaine's World."
I haven't actually done a video interview yet, but I imagine it would go
something like this:
Cast: Swaine's World! Swaine's World! Runtime! Execute!
Swaine: All right! Parity on, Corbett!
Corbett: Parity on, Swaine!
Swaine: All right. We're going to skip the gratuitous morph and go right to
the totally awesome stuff. Our first guest is David Gergen, newly appointed
Director of Spin Control for Microsoft System Software Division.
Corbett: He's our only guest, Swaine.
Swaine: Too true, Corb. All right. Give it up for Microsoft, folks, if you
haven't already.
Gergen: Thank you. You're too kind. This is too much. Thank you. No, really.
You're too kind.
Swaine: Hello-o? Nobody's clapping, Dave. Are you mental?
Corbett: Mr. Gergen, given that OpenDoc, unlike OLE, is an open protocol, is
reputedly easier to develop for than OLE, and is a superset of OLE, why should
a developer write for OLE?
Gergen: Because OLE is the standard, dorkface.
Corbett: Oh. I'm sorry.
Gergen: I'll let it go this time.
Swaine: So, Dave, have you heard this one? How many Microsoft programmers does
it take to screw in a light bulb?
Gergen: Stop right there, Swaine. Microsoft bought all rights to that joke
last year. If you tell it on the air you'll owe us $12,000 in royalties.
Swaine: Exsqueeze me? Overflow. Ground fault. Receiver has disconnected.
Beeeeeep. That's all the time we have, folks. Don't screw up and miss our next
show, when we'll interview user interface designer and former rock musician,
Nigel "But this knob goes up to eleven" Tufnel.
Michael Swaine
editor-at-large



























April, 1994
EDITORIAL


Ten Years After




Jonathan Erickson editor-in-chief


Until the Tonya Harding/Nancy Kerrigan brouhaha ushered in the notion of
tag-team figure skating, it had been at least ten years since anything really
interesting happened in ice skating. Even back in the 1984 Winter Olympics,
what held the attention of anyone other than hardcore figure-skating devotees
was whether or not U.S. figure skater Scott Hamilton's jeans could endure one
more Russian split, or if a triple axel would finally dislodge East German
Katarina Witt's Jimmy Johnson-like coiffure. In retrospect, it would have been
far more interesting had we been able to foresee the current plight of the '84
Winter Olympics host city--Sarajevo, Yugoslavia. And speaking of Russian
splits, not only has the past decade brought us the breakup of the U.S.S.R.,
but the demise of East Germany, Yugoslavia, and other such countries as well.
In the relatively calm world of computer programming, it's hard to believe
that ten years have passed since Bjarne Stroustrup introduced the C++
programming language. In the high-octane software-development arena, we've
come to believe that things move fast, even though it's taken a decade for C++
to gain little more than a toehold. Stroustrup ostensibly created C++ "to make
writing good programs easier and more pleasant for the individual programmer."
Still, it's taken the marketing might of companies such as AT&T, Microsoft,
Sun, IBM, and Borland to make the language a commercial success. The fact that
Smalltalk has been around more than twice as long as C++ with far less
success, or that there's so darn much Cobol code still out there, says
something about the software-development community's desire to absorb and
adopt new languages, not to mention the marketing efforts of C++ vendors.
As it enters its second decade, C++ finally seems entrenched. But even though
the language is being used to program everything from PCs to supercomputers,
the jury is still out on whether or not Stroustrup's goal of making
programmers more productive and software less complex has been--or can
be--achieved. The real lesson to be learned here is that software development
is a process which does not adapt to change easily. There's almost always too
much at stake for programmers to casually pick up one language while
discarding another--other than for educational or entertainment reasons.
Instead, software development continues to be a process of refinement, with
major changes occurring over a decade or more at a time. In short, the real
issues don't change--just the marketing hype.
A quick glance at Dr. Dobb's Journal ten years ago underscores this. As with
this month's issue, the April 1984 DDJ focused on cryptography, with C.E.
Burton's two-part article entitled "RSA: A Public-Key Cryptography System."
Without a doubt, cryptography is more important to a greater number of
computer programmers and users now than it was a decade ago. Back then, RSA
was still fairly new, and Burton's article was probably the first to bring RSA
to the microcomputer platform. Today, RSA is clearly the dominant approach to
cryptography. While better techniques may have been developed, it is unlikely
that in the near future an alternative will achieve the same degree of
commercial success as RSA.
1984 was also the year Judge Harold Greene became a household name, at least
in the living rooms and kitchens of AT&T executives and stockholders. It was
his consent decree, you'll recall, that led to the breakup of the most
powerful telecommunications monopoly in the world. Now, ten years after,
competition and innovation are finally beginning to catch fire with the Clinton
administration's proposals to eliminate barriers between individual
electronic-communication industries. However, proposed mergers between cable
operators, Baby Bells, entertainment giants, online services, and the like may
lead to electronic networks that dwarf the AT&T of yesteryear.
On the plus side, these mergers will fund the advanced information
infrastructures we'll be using in the coming years. On the downside, the
prospect looms that monopolistic megacorporations will be as unresponsive to
the public well-being as Ma Bell was in the old days. For instance,
Southwestern Bell, long the bellwether for the RBOCs when it comes to
controversial legislation, is currently pushing for a law that would prevent
the Missouri Public Service Commission from challenging phone company profits
or rates. A draft of the bill reportedly says that Southwestern Bell "shall
not under any circumstances be subject to any complaint or hearing as to the
reasonableness of its rates, charges, rentals or earnings." Interestingly,
this legislation was proposed on the eve of the telephone company's
announcement that it had just achieved its best-ever fourth-quarter earnings.
The company, which enjoys a monopoly, claims it needs the money to fund the
construction of the information superhighway.
Clearly, both the federal and state governments have a responsibility to
protect the public good against voracious proposals like that backed by
Southwestern Bell. Likewise, the government needs to guarantee that proposed
mergers won't result in a single company controlling both telephone lines and
cable throughout an individual region. Competition and innovation have served
us well through the past ten years. They can get us through the next decade,
too.










































April, 1994
LETTERS


Pairing C and C++




Dear DDJ,


In his article "Programming Language Guessing Games" (DDJ, October 1993), P.J.
Plauger expresses confusion over the popularity of C++, given its complexity,
and presents an exceedingly complex algorithm for pairing teams in a
round-robin tournament.
Yes, C++ is a complex language. Plauger is right in suggesting we choose and
stick with a subset of the language. C++ is like English. You can use it to
state something in a very complex fashion, or very simply. The latter is
usually the stronger statement. The round-robin tournament problem illustrates
this very well.
The physical-education community has developed a simple algorithm for pairing
teams. First, list the teams on pieces of paper. Leave one piece blank, if
necessary, to ensure an even number of teams. Lay out the papers in two rows;
that's day #1. Hold the upper-left piece in place, and rotate all the other
pieces of paper; you then have day #2. Repeat the process for each succeeding
day until you are back at your original positions. For six teams, the rotation
would be as in Figure 1.
The C++ program implementing this algorithm is equally straightforward. Note
the simple, elegant power of the for statement controlling the inner print
loop, which I show in Figure 2.
Jay Frederick Ransom
Oxnard, California


Keep It Simple




Dear DDJ,


I was glad to read Michael Swaine's "Programming Paradigms" (DDJ, November
1993), which gives Forth a plug, even in a lighthearted way. It's a far cry,
though, from the good old days of a decade ago when DDJ annually had an entire
issue devoted to Forth.
Part of the Forth Standards efforts are confounded because the creator of
Forth, Charles Moore, doesn't believe in standards for Forth. Moore considers
Forth to be a program-development environment that increases productivity by
speeding up the programming cycle. To keep it simple, small, and speedy
(KISSS), certain design decisions resulted in using postfix notation, threaded
code for compiling, a dictionary to hold functions, separate stacks for data
and return addresses, and a (usually emulated) stack-based processor. As a
result, the Forth language is an outgrowth of the Forth system, rather than a
construct in its own right.
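The execution model described above is simple enough to sketch. The toy below illustrates the idea of postfix words kept in a dictionary and operating on a shared data stack; it is our sketch, not any real Forth, and it only interprets (a real system also compiles threaded code and keeps a separate return stack).

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy postfix interpreter: "words" live in a dictionary, everything
// else is pushed on the data stack as a numeric literal.
struct MiniForth {
    std::vector<long> data;                                   // data stack
    std::map<std::string, std::function<void(MiniForth&)>> dict;

    MiniForth() {
        dict["+"]   = [](MiniForth &f) { long b = f.pop(); f.data.back() += b; };
        dict["*"]   = [](MiniForth &f) { long b = f.pop(); f.data.back() *= b; };
        dict["dup"] = [](MiniForth &f) { f.data.push_back(f.data.back()); };
    }
    long pop() { long v = data.back(); data.pop_back(); return v; }

    void run(const std::vector<std::string> &words) {
        for (const std::string &w : words) {
            auto it = dict.find(w);
            if (it != dict.end()) it->second(*this);          // execute word
            else data.push_back(std::stol(w));                // push literal
        }
    }
};
```

Extending the language is just adding entries to the dictionary, which is exactly the sense in which "the Forth language is an outgrowth of the Forth system."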
Moore works mostly with embedded systems, and varies the basic Forth to match
the hardware and program-design requirements. Forth then behaves like
assembler. A few primitive words (functions, opcodes) are used to extend Forth
to develop the data types and structures that particular program requires.
On the other hand, a large group of programmers want to use Forth in symbolic
programs: word processors, spreadsheets, graphics programs. They believe a
Forth language without a standard is Forth in chaos. Many come from a
traditional-language background--Fortran, Pascal, C, and the like. They want
Forth to look more like the language they are familiar with, so they push for
CASE statements, local variables, more stacks, string functions,
floating-point numbers, graphics functions, and so on, as part of the standard
Forth. It comes to this: Do you want Forth on a floppy or a CD-ROM?
Walter J. Rottenkolber
Mariposa, California


Windows Setup Follow Up




Dear DDJ,


I wish that you had published Walter Oney's article, "Examining the Windows
Setup Toolkit" (DDJ, February 1994) a few months earlier. Last November, I had
the pleasure of building a setup program for an in-house software package
using the setup toolkit from the Microsoft SDK. I heartily agree with Walter:
The setup toolkit is a credible toolkit (and free for SDK owners), but its
documentation leaves you wishing for more.
There is one point in Walter's article which I can simplify. He recommends
hand-modifying the .INF file produced by the DSKLAYT program. DSKLAYT is used
to lay out the files to best fit on the setup disks. This program requires
that you specify all of the files that will reside on the setup disks. This
includes not only your applications files but the files that control the setup
process. The .INF file, which is generated by the DSKLAYT program, controls
the copying of the files from the setup disks to the user's hard disk.
Normally, all of the setup files will also be listed in the .INF file and
copied to the user's hard disk. Walter suggests removing these files from the
.INF file by hand.
A better approach is to specify that the setup files be placed in a different
section of the .INF file. This is controlled by the DSKLAYT program by filling
in the "Put In Section" entry for each of the setup files. When this field is
left blank, the files are put into the default section named "Files." I
specify that all the setup files be placed into the section SetupFiles. When
your setup script runs, it calls the function AddSectionFilesToCopyList and
specifies the section name to add. For simple installation, only one call
specifies the section "Files."
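Schematically, the generated .INF then carries two sections instead of one. The file names below are invented, and real DSKLAYT output carries many more fields per entry than shown:

```ini
; schematic only -- real DSKLAYT output has many more fields per line
[Files]
    1, myapp.exe
    1, myapp.hlp

[SetupFiles]
    ; setup machinery: no AddSectionFilesToCopyList call names this
    ; section, so these files never reach the user's hard disk
    1, setup.exe
```

The setup script then calls AddSectionFilesToCopyList once for each section it actually wants copied.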
This is also the procedure to use when you want to allow the user to
selectively set up portions of your application. You group your files in
sections and allow the user to select which sections to set up. It is then a
simple matter to call AddSectionFilesToCopyList for each section that is to be
set up.
Gene Psoter
Atascadero, California


Random Thoughts on the Stock Market




Dear DDJ,



Tom Swan's "Algorithm Alley" column (DDJ, December 1993) correctly points out
that a group of numbers, which are alleged to be random, must satisfy a lot of
tests. But the highlighted example--the stock market--is not a good random
sequence. The market is somewhat unpredictable. But successive prices are very
strongly correlated. The distribution of the first differences (daily changes)
is very uneven with far too many very small changes. You'd quickly scrap a
random-number generator that created numbers like that. The other example,
lottery numbers, is fine.
Donald Kenney
CompuServe 72630,3560
Tom replies: Thanks for your letter, Donald. You're right that successive
stock-market prices are very strongly correlated, but so are the sequences
produced by common random-number generators. In fact, so-called "random
numbers" are completely predictable: to reproduce the same sequence, just
rerun the program using any value in the sequence as the starting seed! Stock market
prices are more random because they are truly unpredictable. If that were not
so, as I stated in the article, everyone would be wealthy.
I understand your point that stock-market prices themselves would not be
suitable as direct substitutes for common random-function output, but I never
said they were. If you use the Dow Jones Industrial Average to program a
lunar-lander simulator, the landing module may crash. (Let's hope the stock
market doesn't.) Seriously, I am not suggesting using stock prices as random
numbers; only that the behavior of the stock market is an example of true
randomness. Random-number generators are misnamed because their output is
predictable, and therefore, not actually random. A generator's output may
appear to satisfy some conditions of randomness, but only real-world events
are truly chaotic.
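Tom's point can be seen with even the classic minimal-standard generator: each output is a pure function of its predecessor, so any value drawn from the stream regenerates the rest of the stream when fed back in as a seed. (The Park-Miller constants below are standard; the sketch itself is ours.)

```cpp
#include <cstdint>

// Park-Miller "minimal standard" linear congruential generator:
// x' = 16807 * x mod (2^31 - 1). The next value depends only on the
// current one, so the whole sequence is predictable from any element.
std::uint32_t lcg_next(std::uint32_t x)
{
    return (std::uint32_t)((16807ULL * x) % 2147483647ULL);
}
```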


Putting HVC Back in Order




Dear DDJ,


In Maxwell T. Sanford II's letter (DDJ, November 1993) about my sidebar,
"Putting Colors in Order" (DDJ, July 1993), Maxwell was too restrictive in
assigning Cmax = min(V, 1-V). It is true that R, G, and B must be in the range
[0.0, 1.0], but the Chroma C can be as large as 2/3. This is true, for
example, when V=2/3 and H=0. My new paradigm for representing a color is a
color bubble, not a color cone. Given values for H and V, the max C is given
by the procedure GetMaxC in Figure 3.
Harry J. Smith
Saratoga, California


Processor Scenarios


Dear DDJ,


Even after many months, I found Nick Tredennick's article, "Computer Science
and the Microprocessor" (DDJ, June 1993) very interesting. His market model
and definitions make a lot of (common) sense. During the course of the
article, there is a recurrent theme of RISC vs. CISC and PC software vs.
workstation software, and it becomes apparent that Nick feels that the two
will never mix.
However, emerging software technology is making it inevitable that the
two worlds will mix. Without going into its relative merits, Microsoft's
Windows NT is one of the first to desegregate hardware systems. NT is already
on several microprocessors, and the list is likely to grow. This can only
serve to help the microprocessor market waters find their own level. It should
mean that the largest market share (CISC) will drop some, and the lowest
market share (RISC) will rise.
The ultimate scenario is that any microprocessor house can provide a
microkernel driver for their product and "Intel Inside" will have no more
meaning than "GE Inside" for toaster-heating elements (although I've heard GE
is considering using the Pentium in its toasters as heater elements). Then we
can all pick our own favorite processor for the job at hand. Aaron Goldberg,
in the September 27, 1993 PC Week, called this the "Esperanto of Tomorrow"
(and showed why a multiplatform operating system has value, whereas Esperanto
failed for lack of interest).
Jonathan Platt
Pipersville, Pennsylvania

Figure 1
Day 1: 1 2 3
 6 5 4
Day 2: 1 6 2
 5 4 3
Day 3: 1 5 6
 4 3 2
Day 4: 1 4 5
 3 2 6
Day 5: 1 3 4
 2 6 5
Day 6: Repeats Day 1

Figure 2
#include <iostream.h>
#include <string.h>

char teams [100] [50], // 100 teams ought to cover the field
 // long names not allowed
 temp [50];
int n_teams = 0,
 first_half,
 second_half,
 day,
 rotate;
char last_name;


int main ()
{
 // Read in team names
 do
 {
 cin >> teams [n_teams];
 last_name = teams [n_teams] [0];
 n_teams++;
 } while (last_name != '*'); // Stop at the * sentinel team
 n_teams -= n_teams % 2; // Drop '*' if the team count is even; else keep it as the bye

 // print out playing schedule
 for (day = 1; day < n_teams; day++)
 {
 cout << "\n\nDay " << day << "\n\n";
 for (first_half = 0, second_half = n_teams - 1;
 first_half < second_half;
 first_half++, second_half--) //What you can do with a for loop!
 cout << " "
 << teams [first_half]
 << " vs "
 << teams [second_half]
 << "\n";

 // Rotate teams for next day

 strcpy (temp, teams [1]);
 for (rotate = 2; rotate < n_teams; rotate++)
 strcpy (teams [rotate - 1], teams [rotate]);
 strcpy (teams [n_teams - 1], temp);
 }
 return 0;
}






Figure 3
procedure GetMaxC(H, V : Double;
 var C : Double);
 { Get Max C for given H and V }
const
 Pi = 3.14159265358979324;
 Pi2o3 = 2.0 * Pi / 3.0;
 DpR = 180.0 / Pi; { Degrees per
 Radian }
var
 HRad : Double;
 C1, C2, C3 : Double;
begin
 HRad:= H / DpR;
 C1:= Cos( HRad + Pi2o3);
 C2:= Cos( HRad);
 C3:= Cos( HRad - Pi2o3);
 if (C1 > 0.0) then
 C1:= V / C1
 else
 if (C1 < 0.0) then
 C1:= (V - 1.0) / C1
 else
 C1:= 1.0;
 if (C2 > 0.0) then
 C2:= V / C2
 else
 if (C2 < 0.0) then
 C2:= (V - 1.0) / C2
 else
 C2:= 1.0;
 if (C3 > 0.0) then
 C3:= V / C3
 else
 if (C3 < 0.0) then
 C3:= (V - 1.0) / C3
 else
 C3:= 1.0;
 C:= C1;
 if (C2 < C) then C:= C2;
 if (C3 < C) then C:= C3;
 if ((V <= 0.0) or (V >= 1.0))
 then C:= 0.0;
end; { GetMaxC }






































April, 1994
The Cambridge Algorithms Workshop


Bruce Schneier


Bruce, an independent software developer and author of Applied Cryptography:
Protocols, Algorithms, and Source Code in C (John Wiley & Sons, 1994),
presented a paper at the Cambridge workshop. Bruce can be contacted at
708-524-9461 or schneier@chinet.com.


In December 1993, the Cambridge Algorithms Workshop, hosted by the Cambridge
University Computer Laboratory, brought together leading figures in the field
of encryption who presented, examined, and challenged new encryption
algorithms designed to run quickly in software. The reasoning behind the focus
on encryption is that constructing algorithms which are both fast and secure
is the core problem of classical cryptology. However, recent developments,
such as differential attacks on block ciphers and correlation attacks on
stream ciphers, have forced cryptanalysts to rethink classic encryption
algorithms such as those in Table 1. At the same time, the need for fast,
efficient, and safe encryption at the application level has increased--DES
(even triple-DES) may be fast enough for e-mail, but it's too slow for
emerging high-bandwidth applications.
The goal of the conference, therefore, was to propose new algorithms capable
of encrypting data at a dozen or so clock cycles per byte for the emerging
class of high data-rate applications.
In this article, I'll cover the conference highlights and examine the current
state of encryption technology in general. For specifics on the workshop, the
proceedings will be published later this year by Springer-Verlag as part of
their Lecture Notes in Computer Science series. (Call 201-348-4033 for
availability.) Additionally, many of the topics and algorithms touched upon in
this article are discussed in my book, Applied Cryptography (John Wiley &
Sons, 1994).


The Algorithms


In all, ten complete algorithms were presented at the Cambridge Algorithms
Workshop. All were secret-key algorithms, not public-key algorithms like
Diffie-Hellman, RSA, and the U.S. Government's Digital Signature Standard
(DSS). (For more on
public keys, see "Untangling Public-Key Cryptography," DDJ, May 1992.)
Jim Massey presented SAFER, a 64-bit block cipher with a 64-bit key. It is
designed for implementation on the simple processors found on smart cards, and
uses only addition and multiplication mod 256, plus exponentiation and
logarithms in the
multiplicative group of GF(257). Although Massey originally developed SAFER
for Cylink Inc., the company has decided to place this algorithm in the public
domain in the hope that it will become the new standard for software
encryption. This algorithm will probably be important. In fact, former Soviet
cryptanalysts in Yerevan, Armenia have been--without success--trying to break
SAFER for over a year.
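The exponentiation and logarithm SAFER relies on can be tabulated once per run. The sketch below assumes the generator 45 that is usually cited for SAFER; treat it as an illustration of the table construction, not a statement of the algorithm itself.

```cpp
// Byte-wide exponentiation/logarithm tables over the multiplicative
// group of GF(257). 45 as the generator is an assumption here; the
// value 256 = 45^128 mod 257 does not fit in a byte, so it is stored
// as 0, making exp and log mutually inverse byte permutations.
unsigned char exp_tab[256];
unsigned char log_tab[256];

void build_tables()
{
    unsigned e = 1;
    for (int i = 0; i < 256; i++) {
        exp_tab[i] = (unsigned char)(e & 0xff);   // 256 becomes 0
        log_tab[exp_tab[i]] = (unsigned char)i;
        e = (e * 45) % 257;
    }
}
```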
Matt Robshaw of RSA Data Security presented a fast block cipher based on the
same principles as the MD5 one-way hash function (see my article, "One-Way
Hash Functions," DDJ, September 1991). Robshaw's algorithm, designed jointly
with Burt Kaliski, operates on large blocks--1024 bytes--but the principles
can be used to design ciphers with smaller block sizes. It is also likely to
be important, as RSA Data Security provides encryption technology to companies
such as Microsoft, Lotus, IBM, Novell, and Apple.
Phil Rogaway presented SEAL, a new stream cipher developed jointly with Don
Coppersmith, IBM's top cryptographer. This algorithm will be used by IBM for
software encryption of disk files in PC-based network software. The algorithm
is based on repeated lookup of large tables of pseudorandom numbers.
Hugo Krawczyk of IBM presented implementation and performance details of the
Shrinking Generator, which he also designed with Don Coppersmith and first
presented at Crypto '93. (See "The Shrinking Generator," Advances in
Cryptology: CRYPTO '93 Proceedings, Springer-Verlag, in preparation.) The
Shrinking Generator is a simple stream cipher which uses the output of a
linear-feedback shift register to decimate the output of another
linear-feedback shift register.
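The decimation idea is simple enough to sketch with toy registers. The 16-bit taps below come from the textbook maximal-length polynomial x^16+x^14+x^13+x^11+1 and are far too short for real use; a real Shrinking Generator uses long, distinct registers.

```cpp
#include <cstdint>
#include <vector>

// Toy sketch of the Shrinking Generator: selector LFSR S decimates
// LFSR A. Both are clocked together; A's output bit is kept only when
// S's output bit is 1.
struct Lfsr {
    std::uint16_t state;
    int step() {                         // Fibonacci LFSR, taps 0x002D
        int out = state & 1;
        int fb = 0;
        for (unsigned t = state & 0x002Du; t; t >>= 1)
            fb ^= (int)(t & 1);          // parity of the tapped bits
        state = (std::uint16_t)((state >> 1) | (fb << 15));
        return out;
    }
};

std::vector<int> shrink(Lfsr a, Lfsr s, int nbits)
{
    std::vector<int> out;
    while ((int)out.size() < nbits) {
        int abit = a.step();
        if (s.step())                    // selector 1: keep A's bit
            out.push_back(abit);         // selector 0: bit is discarded
    }
    return out;
}
```

Because roughly half the selector bits are 0, the generator consumes about two clocks of each register per output bit, which is the price paid for hiding A's linear structure.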
Markus Dichtl of Siemens presented a derivative of the Shrinking Generator
which combines the output of linear-congruential generators rather than
linear-feedback shift registers.
Ross Anderson of Cambridge University presented a modern rotor machine. Rotor
machines were all the rage in cryptography before algorithms moved to
computers. The German Enigma was a rotor machine, as were several U.S.
cryptography machines. They have fallen out of fashion recently, but this
algorithm is an attempt to show that they can still be secure in the face of
massive computing power. Anderson's proposal is for a rotor machine driven by
a linear-feedback shift register; it is simple to describe and to implement,
and yet seems to be very secure.
David Wheeler, also of Cambridge, proposed a bulk-encryption algorithm based
on experience designing algorithms for secure digital telephones. It is based
on iterating functions defined by large lookup tables of random permutations,
and can perform ultra-fast encryption of large amounts of data.
My own Blowfish algorithm is a 64-bit block algorithm with a variable-length
key. (See "Blowfish: A New Encryption Algorithm," on page 38 of this issue.)
William Wolfowicz (of the Italian telephone company's research lab) presented
a 64-bit block algorithm with a 128-bit key which was designed jointly with
Adina di Porto. The algorithm makes use of a novel permutation property
pointed out by the Russian number theorist Vinogradov.
Joan Daemen presented 3-WAY, a new block cipher he's been developing with
colleagues at Leuven, Belgium for two years. It is designed to be fast in both
hardware and software, and to resist both differential and linear
cryptanalysis attacks.
The fastest algorithm appears to be either Wheeler's or Rogaway-Coppersmith's.
Both use about 20 instructions (including four table lookups) to encrypt a
32-bit word: about five clock cycles per byte. On a SparcStation, they had
throughput of about 20 Mbytes/second; on a DEC Alpha, about 100 Mbytes/second.
However, many of the other algorithms are not far behind.
It is too early to tell if any of these algorithms are secure. The important
thing is that they are out there and that cryptanalysts are starting to
examine them. I expect at least two of them to fall before the end of the
year. (It would be unfair to divulge which I think are insecure.) The
techniques used to break them will be recycled into the design of new
encryption algorithms, which may then be broken using new techniques. And so
the cycle of research will continue.
Hopefully, in five or six years there will be a few algorithms that are still
considered secure. These may then be proposed as standards to replace DES and
then used to encrypt data far into the next century.


Designing Secure Algorithms


If nothing else, the Cambridge workshop proved that fast, efficient, and safe
encryption algorithms are as difficult and challenging to design as ever. The
rules of algorithm design are simple. An encryption algorithm should be secure
under the following conditions:
The cryptanalyst (that's the guy trying to break the algorithm) knows all the
details of the algorithm. He has some ciphertext, and his job is to deduce the
plaintext. (This is called a "ciphertext-only" attack.)
The cryptanalyst not only has the algorithm and some encrypted ciphertext, but
also the unencrypted plaintext. His job is to deduce the key. (This is called
a "known-plaintext" attack.)
The cryptanalyst not only has the algorithm, some ciphertext, and the
unencrypted plaintext, but he gets to choose what it is. If there is a
particular plaintext sequence that, if encrypted, will easily yield the key,
he gets to encrypt that sequence. (This is called a "chosen-plaintext"
attack.)
All of these attacks are feasible and have been mounted in the real world.
(For historical anecdotes, see The Codebreakers: The Story of Secret Writing,
by D. Kahn, Macmillan, 1967.) Often, noncryptographers insist that the details
of their algorithm should remain secret. From the point of view of security,
this is a dubious practice. Security should not depend on the secrecy of the
algorithm. If it did, it would be far too vulnerable to a "black bag" attack.
A hardware-encryption device can be stolen and reverse-engineered; a
software-encryption device can be disassembled. (Even the details of DES were
published by the government; see the National Bureau of Standards, Data
Encryption Standard, U.S. Department of Commerce, FIPS PUB 46, January 1977.)
Cryptographers have to assume that analysts will have everything but the key,
simply because it is prudent to do so.
The Skipjack algorithm remains classified and is implemented in the supposedly
tamper-resistant Clipper chip. When Intel came up with a new
reverse-engineering technique which they thought might beat the tamper
protection, the NSA promptly classified it. Even so, the algorithm is probably
resistant to analysis even if its details become known. (The primary reason
that NSA does not want to release the inner workings of Skipjack is probably
because they don't want their secret techniques used to design other
high-security algorithms.)
Cryptographers also have to assume that analysts have access to enormous
amounts of computing power--more computing time than can be optimistically
expected during the next 100 years or so. In the face of all these conditions,
the algorithm should still be unbreakable.


Key Length


Key length is a poor measure of the security of an algorithm, but it's a good
place to start. Algorithms with long keys are not necessarily secure, but
algorithms with short keys are definitely insecure.
For instance, earlier this year Michael Wiener presented a design for a
brute-force DES cracker (see "Efficient DES Key Search," Advances in
Cryptology: CRYPTO '93 Proceedings, Springer-Verlag, in preparation). He
didn't just do some theoretical calculations, but went through the entire
design process. He designed a custom cracking chip down to the gate level and
had his in-house fabrication department estimate fabrication costs. He
designed a controller board, had its cost estimated, and designed and priced
peripheral hardware (power supplies, racks, and a complete mechanical
housing). What Wiener determined was that with a $1 million machine, he could
break DES in three-and-a-half hours. This is cheap enough to hide in the
budget of several different government agencies. It's even cheap enough to be
considered by large corporations or organized-crime syndicates. (His employer,
Bell Northern Research, claims to have no interest in building a working
model.)
This work has important implications for algorithm design. There's nothing
special about DES; the analysis will be similar for all algorithms. 56 bits is
too small for a key; even 64 bits is too small. Even 80 key bits is
marginal--only enough for short-term security. Any algorithm proposed today
should have a key length of at least 128 bits; see Table 2.
These calculations are based on present-day computers. For future projections,
plan on computing power doubling every 18 months. Each of the above numbers
becomes an order of magnitude smaller every five years: A $1 billion machine
that takes 6.7 years to break a key today will take 0.67 years (8 months) with
1999 technology and 0.067 years (24 days) with 2004 technology. What is secure
now might not be in 50 years. In light of these calculations, Skipjack's
80-bit key seems woefully inadequate.
Key length is also critical for export. The U.S. Government does not permit
the export of algorithms with key lengths greater than 40 bits. (Yes, some
exportable algorithms appear to have longer key lengths, but the effective key
length is 40 bits or less.) Various computer-privacy advocates are trying to
change this.
Some of the algorithms presented at the Cambridge workshop had variable-length
keys. This is especially desirable because it allows implementors to define
their own level of security. If the algorithm must be exported, the key length
can be set at 40 bits. For low-grade security (information that only has to
remain secret for a few minutes), 64 bits will do. For long-term security, key
lengths of 128 or even 256 bits are available.
Variable-length keys were typically handled during a "key-expansion" phase of
the algorithm: an initial bit of computation required before the algorithm
could encrypt any data. During this computation, the key
typed in by the user would be expanded into a large set of subkeys used for
encryption. DES does this to some degree; the 56-bit key is expanded into an
array of subkeys totaling 768 bits. Some algorithms at the Cambridge workshop
took this to an extreme, expanding a key into subkey arrays totaling 1 Kbyte
of data or even more.
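The general shape of such a key-expansion phase can be sketched in a few lines of C. To be clear, this is an illustration of the idea only, not the schedule of any cipher named above; the mixing constants and rotation amount are arbitrary:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative key expansion: fold a variable-length user key into an
   array of 32-bit subkeys by repeated mixing.  Sketch only -- not the
   key schedule of DES, Blowfish, or any other real algorithm. */
void expand_key(const uint8_t *key, size_t keylen,
                uint32_t *subkeys, size_t nsubkeys) {
    uint32_t acc = 0x9E3779B9u;            /* arbitrary nonzero start */
    size_t j = 0;
    for (size_t i = 0; i < nsubkeys; i++) {
        for (int r = 0; r < 4; r++) {      /* absorb four key bytes */
            acc = (acc << 5 | acc >> 27) + key[j];
            acc ^= (uint32_t)key[j] << 24;
            j = (j + 1) % keylen;          /* cycle through the key */
        }
        subkeys[i] = acc;
    }
}
```

The point of the up-front work is that every subkey depends on every key byte, so a short key is stretched into a large internal state before the first block is encrypted.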



Differential and Linear Cryptanalysis


Of course, the trick to algorithm design is to make sure that a brute-force
attack is the most efficient way of getting the key, although for most
encryption algorithms, there are other ways. These methods, generally very
complex and mathematical, involve exploiting the structure of the algorithm.
Differential cryptanalysis and linear cryptanalysis are two new attacks that
have been successfully used against DES and other algorithms. Differential
cryptanalysis was invented by Biham and Shamir in 1991 (see Differential
Cryptanalysis of the Data Encryption Standard, by E. Biham and A. Shamir,
Springer-Verlag, 1993), while linear cryptanalysis was invented by Mitsuru
Matsui in 1993 (refer to "Linear Cryptanalysis Method for DES," Advances in
Cryptology: CRYPTO '93 Proceedings, Springer-Verlag, in preparation).
Differential cryptanalysis is a chosen-plaintext attack: It looks at
differences between pairs of plaintexts and corresponding pairs of
ciphertexts. These differences, along with information about the structure of
the underlying algorithm, give an analyst clues about the key. Collect enough
of these differences, and you can find the key more efficiently than you would
with brute force.
Don't get too excited, though. The best chosen-plaintext differential
cryptanalysis attack against DES has a complexity of 2^47. This is better than
the 2^56 required for brute force, but requires on the order of 10 terabytes
of chosen-plaintext data. Although interesting, it is still more theoretical
than practical. The best way to attack DES is still brute force.
Linear cryptanalysis is similar to differential cryptanalysis, but looks for
linear relationships between selected bits of the plaintext, ciphertext, and
key. Against DES, this attack has a complexity of 2^43. Even better, it is a
known-plaintext attack. However, it still requires much too much data to be
practical.
Now that these attacks are known, there are techniques for optimizing
encryption algorithms so that they are resistant to them. Most of the
algorithms presented at the workshop took these attacks into account during
the design process; see Table 3.


Cascading Multiple Algorithms


One way to increase the security of your system is to chain multiple
algorithms together. For example, first encrypt your file with DES and one
key, and then with IDEA (see "The IDEA Encryption Algorithm," by Bruce
Schneier, DDJ, December 1993) and another key. The result will be much
stronger than using either of the two algorithms individually.
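The mechanics of a cascade are simply function composition: encrypt with the first cipher and its key, then with the second and an independent key, and decrypt in reverse order. The sketch below uses two deliberately weak toy keystream ciphers as stand-ins (the real point is the structure, not the ciphers):

```c
#include <stdint.h>
#include <stddef.h>

/* A toy keystream cipher (stand-in for DES or IDEA): XORs the buffer
   with a stream from a linear-congruential generator.  Insecure --
   for structural illustration only. */
static void toy_cipher(uint8_t *buf, size_t n, uint32_t key, uint32_t mult) {
    uint32_t s = key;
    for (size_t i = 0; i < n; i++) {
        s = s * mult + 12345u;
        buf[i] ^= (uint8_t)(s >> 24);
    }
}

/* Cascade: cipher 1 under key1, then cipher 2 under key2.
   Decryption applies the stages in reverse order. */
void cascade_encrypt(uint8_t *buf, size_t n, uint32_t key1, uint32_t key2) {
    toy_cipher(buf, n, key1, 1103515245u);
    toy_cipher(buf, n, key2, 1664525u);
}
void cascade_decrypt(uint8_t *buf, size_t n, uint32_t key1, uint32_t key2) {
    toy_cipher(buf, n, key2, 1664525u);
    toy_cipher(buf, n, key1, 1103515245u);
}
```

Note that the two keys are independent; as discussed below, that independence is one of the conditions under which a cascade can be argued to help.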
Cascading multiple algorithms might also be the best way to negotiate security
with algorithms that some people don't trust. If Alice and Bob want to
communicate with each other and don't trust each other's algorithms, they can
use both--first her algorithm, and then his. This idea, suggested by Whitfield
Diffie, was discussed at the Cambridge workshop.
This sounds good, but there is a problem: Massey and Maurer proved that a
cascade of multiple algorithms is at least as strong as the first or, with
stream ciphers, at least as strong as the best (see "Cascade Ciphers: The
Importance of Being First" by Maurer and Massey, Journal of Cryptology, 1993).
The difficulty with proving anything more than this is that a bad guy might
provide you with a first algorithm which twisted your plaintext around so as
to provide a chosen-plaintext attack on the second algorithm which you
supplied.
This applies not just to encryption algorithms, but to any process designed by
someone else which you incorporate into your system. In fact, the widely used
CELP coder (which compresses digital speech to modem speeds) was designed by
the NSA, and for all anyone knows it could be acting as a cryptanalyst's
helper in some subtle way. It does seem, though, that a cascade of algorithms
is better than individual algorithms, provided that the second and subsequent
algorithms are secure against chosen-ciphertext attacks, and provided all the
algorithms' keys are independent.
The real benefit of cascading algorithms is in design diversity; it makes the
overall system less vulnerable to a cryptanalytic attack. Both triple-DES and
IDEA seem secure today, but there is always the possibility that some clever
mathematician might come up with a good attack against one of them tomorrow.
Using triple-DES, then a fast stream cipher such as Wheeler's algorithm, and
then IDEA, would be immune to new attacks against any one of the three
algorithms. Successful attacks against all three would be required to break
the cascaded system.


Conclusion


Encryption algorithms are like airplanes. It's easy to design one, but it's
hard to design one that flies. To make matters worse, it's hard to tell if any
one of them is any good. The only real way to test the security of an
algorithm is to let other programmers try breaking it. But even if the
algorithm survives years of intense analysis by many different people, you can
still only hope that it is really secure.

Table 1: Block-encryption algorithms. IDEA is used for message encryption in
Pretty Good Privacy; PGP. RC2 was developed by RSA Data Security and is used
in a variety of commercial software packages. Skipjack is the NSA-developed
algorithm in the Clipper chip. GOST is an algorithm developed in the Soviet
Union and only recently made public.
 Algorithm    Key Length  Block Length  Problems

 DES 56 bits 64 bits Key too small
 Triple-DES 112 bits 64 bits Slow
 Khufu 64 bits 64 bits Patented; key too small
 FEAL-32 64 bits 64 bits Patented; key too small
 LOKI-91 64 bits 64 bits Weaknesses; key too small
 REDOC II 160 bits 80 bits Patented
 REDOC III variable 64 bits Patented
 IDEA 128 bits 64 bits Patented
 RC2 variable 64 bits Proprietary
 Skipjack 80 bits 64 bits Secret algorithm
 GOST 256 bits 64 bits Not completely specified
 MMB 128 bits 128 bits Insecure


Table 2: Key length and security in 1994.
 Key Length  Time for a $1M Machine to Break  Time for a $1B Machine to Break

 40 bits 0.2 seconds 0.0002 seconds
 56 bits 3.5 hours 13 seconds
 64 bits 37 days 54 minutes
 80 bits 2000 years 6.7 years
 100 bits 7 billion years 7 million years
 128 bits 10^18 years 10^15 years
 192 bits 10^37 years 10^34 years
 256 bits 10^56 years 10^53 years


Table 3: Cryptanalysis of DES.
 Attack Type Complexity

 Brute-force Known-plaintext 2^55

 Differential Known-plaintext 2^55
 Differential Chosen-plaintext 2^47
 Linear Known-plaintext 2^43



April, 1994
Cryptography Without Exponentiation


Peter Smith


Peter has worked in the computer industry for 16 years, and served as deputy
editor of Asian Computer Monthly. He invented LUC in 1991 and founded LUCENT
to commercialize Lucas-function-based cryptography. He can be reached at 25
Lawrence Street, Herne Bay, Auckland, New Zealand.


While not a full-fledged public-key cryptosystem, the 1976 Diffie-Hellman
key-negotiation technique featured the first cryptographic use of modulus
exponentiation. Diffie and Hellman's method, which is used to establish a
secret key over an insecure channel, is still in use because the mathematical
problem on which it is based remains as difficult today as it was in 1976. In
this, Diffie and Hellman were also the first to base cryptosystems on problems
that mathematicians have been unable to solve.
In the January 1993 issue of DDJ, I presented an alternative to the RSA
encryption algorithm called LUC (see "LUC Public-Key Encryption," DDJ, January
1993). As that article suggests, many ciphers, including the
Hellman-Diffie-Merkle key-exchange system and the El Gamal digital signature,
can be reinforced by replacing the process of exponentiation with the process
of calculating Lucas functions. This article extends LUC with three new
cryptosystems: the Lucas-function El Gamal public-key encryption, the
Lucas-function El Gamal digital signature, and a Lucas-function-based
key-negotiation method called LUCDIF.


The Algorithms


The exponentiation ciphers here are all based on the mathematical problem
known as the Discrete Logarithm (DL). Basically, this problem reduces to
solving for x in the equation a^x = b mod c, where a, b, and c are integers and
their values are known. The cipher known as El Gamal and its variants were
introduced over the course of the 1980s and are based on the DL problem. One
of these, Schnorr's variant of the El Gamal digital signature, was chosen by
the National Institute of Standards and Technology as the basis of the Digital
Signature Standard.
As suggested in my previous article, ciphers based on the DL problem can be
implemented using Lucas functions instead of exponentiation. Such
implementations carry some complications in terms of storage and timing
overheads, but they can be shown to be asymptotically as
fast. More importantly, they are cryptographically stronger than their
exponentiation-based ancestors. It is an open question how much stronger the
Lucas-function ciphers are. The fastest known subexponential-time algorithms
for attacking the DL can't be used against them, making them vulnerable only
to exponential-time attacks.
The mathematical problem on which the Lucas-function ciphers are based is
analogous to the DL problem, except that here the problem is to solve for x in
the equation V_x(a,1) = b mod c. This problem has the advantage that the
subexponential algorithms do not appear to generalize to it, so breaking these
ciphers is much more expensive.


Key Negotiation


The Diffie-Hellman key-negotiation process allows two correspondents, Alice
and Bob, to establish a common cryptographic key between them, even if an
eavesdropper is listening in on their connection. They both agree on a prime p
and a primitive root (or generator) a. Using a secret number A, Alice
publishes her part of the key, as given by a^A mod p. Similarly, Bob publishes
his part of the key using his secret number B, using the formula a^B mod p. In
Alice's case, she takes Bob's key and forms (a^B)^A mod p, while Bob takes
Alice's published key, and forms (a^A)^B mod p. Since (a^B)^A mod p equals
a^(AB) mod p equals (a^A)^B mod p equals some value K, say, then both Alice and Bob now
have the same key. This method, the first successful, though partial,
implementation of public-key ideas, lets part of the key be made public.
The DL problem seems to guarantee that an eavesdropper, who has only public
knowledge and not the secret values A and B, cannot find K. If p, A, and B are
large enough (say, over 500 bits in length), there is only a small chance of
guessing the secret values. If the key were to be used as a DES key, Alice and
Bob could agree to take only the first 56 bits of K.
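The whole exchange fits in a few lines of C once modular exponentiation is in hand. The sketch below uses a toy 17-bit prime so the arithmetic fits in native integers; a real exchange uses multi-precision arithmetic and a prime of 500+ bits:

```c
#include <stdint.h>

/* Modular exponentiation by repeated squaring.  Safe here because the
   toy modulus fits in 32 bits, so a*a cannot overflow a uint64_t. */
uint64_t modpow(uint64_t a, uint64_t e, uint64_t m) {
    uint64_t r = 1;
    a %= m;
    while (e) {
        if (e & 1) r = r * a % m;
        a = a * a % m;
        e >>= 1;
    }
    return r;
}
```

With p = 65537 and generator a = 3, Alice publishes modpow(3, A, p), Bob publishes modpow(3, B, p), and each raises the other's value to his or her own secret; both arrive at the same K because (a^B)^A = (a^A)^B mod p.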
We have called our Lucas-function-based key-negotiation method LUCDIF,
combining LUCas and DIFfie. As with LUC, the known multiplicative attacks on
Diffie-Hellman do not carry over to LUCDIF, since it is not multiplicative.
LUCDIF is quite analogous to Diffie-Hellman. Choose the prime p in the same
way. A value l must be chosen so that the condition in Figure 1(a) is true.
Finding such a value is easy. Every value tried has a 50 percent chance of
satisfying the condition. Now the values given by V_A(l,1) mod p and V_B(l,1)
mod p are published by Alice and Bob, respectively. Bob takes Alice's number
and calculates V_B(V_A(l,1)) mod p. Similarly, Alice calculates V_A(V_B(l,1))
mod p.
Relation 1 in Figure 1(b) shows that these two values are the same and that
Alice and Bob have obtained the same key, K'. If p is a prime of over 512
bits, then this method of key negotiation is very secure. Once again, for a
DES key, Alice and Bob may decide to select only the first 56 bits of K'.
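The agreement works because, with Q=1, Relation 1 collapses to V_A(V_B(l,1),1) = V_AB(l,1) = V_B(V_A(l,1),1), so composition of Lucas functions commutes. A toy check, using the plain V_0=2, V_1=P, V_n = P*V_(n-1) - V_(n-2) recurrence (fine for small subscripts; a real implementation uses an O(log n) ladder), with secret exponents of my own choosing:

```c
#include <stdint.h>

/* Lucas function V_n(P,1) mod m by the linear recurrence
   V_0 = 2, V_1 = P, V_n = P*V_(n-1) - V_(n-2)  (all mod m).
   O(n) -- only suitable for small toy subscripts. */
uint64_t lucas_v(uint64_t n, uint64_t p_param, uint64_t m) {
    uint64_t v0 = 2 % m, v1 = p_param % m;
    if (n == 0) return v0;
    for (uint64_t i = 1; i < n; i++) {
        /* add m before subtracting to stay in unsigned range */
        uint64_t next = (v1 * (p_param % m) % m + m - v0) % m;
        v0 = v1;
        v1 = next;
    }
    return v1;
}
```

With the article's prime p = 908797 and l = 19, any pair of secrets A and B satisfies V_A(V_B(l,1)) = V_B(V_A(l,1)) mod p, which is exactly the shared key K'.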


El Gamal and LUCELG


The El Gamal cipher comes in two parts. There is a procedure for encrypting
and decrypting and a second procedure for signing and verifying a digital
signature. For encryption, assume Alice wants to send a message M to Bob using
his public key y, which is equal to a^x mod p (x is Bob's private key). Alice
first finds a secret number k, which is greater than zero and less than p, and
calculates L = y^k mod p. Two other values are then worked out: c1 = a^k mod
p, and c2 = L*M mod p. These two values, c1 and c2, make up the cryptogram which
Alice sends to Bob.
For decryption, Bob first calculates L using the fact that L = (a^k)^x = (c1)^x,
since only he knows the value of his secret key x. Having found L, Bob
calculates its multiplicative inverse (L^-1), and multiplies this by c2,
recovering M: M = c2*(L^-1) mod p.
The Lucas-function version of El Gamal public-key encryption and decryption
follows a path similar to that of El Gamal public-key encryption/decryption.
Bob's public key, in this case, is V_x(g,1) mod p. A secret value k is also
necessary here, and we first calculate G. When encrypting, G = V_k(y,1) mod p.
The two halves of the cryptogram are then computed: d1 = V_k(g,1) mod p, and
d2 = G*M mod p.
In the decryption case, Bob deciphers the cryptogram by solving for G:
G = V_x(d1,1) mod p. The multiplicative inverse of G can be calculated, modulo p,
using the extended Euclidean algorithm (see Knuth), and the message is
recovered by M = d2*(G^-1) mod p. Figure 2 provides an example.
Note that the LUCELG cryptogram is twice the size it would be in LUC. Both d1
and d2 almost always have the same number of digits as the modulus, so the
combined cryptogram will have a length of about twice that of p. This is also
the case with the exponentiation version.


Digital Signature


The El Gamal digital signature is more cumbersome to convert from
exponentiation to Lucas functions than is El Gamal public-key
encryption/decryption. However, observing that Lucas functions have formulas
for multiplying and adding subscripts--see Figure 1(b)--we can construct an El
Gamal-like cipher, since El Gamal's manipulation of exponents can be converted
to the manipulation of Lucas-function subscripts. The formula for the addition
of subscripts (Relation 2) involves the Lucas {Ui} "sister" series.
Subsequently, our Lucas-function alternative to El Gamal involves the doubling
of the public-key size (two Lucas function values, U and V, must be given), as
well as increasing the size of the signature, because two "r" (U and V) values
are necessary.
The variant of El Gamal chosen as the Digital Signature Standard can be
converted in a similar manner. In both cases, we produce ciphers apparently
based on a problem for which there is no known subexponential-time attack;
hence, they are stronger than their prototypes.
The calculation of the nth Lucas function can be done in O(logn) operations,
which is the same order as the computation of similar exponentials. Heuristics
to speed up modular exponentiation can be brought over to the calculation of
Lucas functions, if in more complicated form (witness the formula for adding
subscripts). These new ciphers can be assured of having performance
characteristics similar to those of their progenitors.


Conclusion


For the same level of security, these Lucas-function-based ciphers can be used
with a shorter modulus than the exponentiation ciphers. For a 512-bit modulus,
the reduction is about one fifth, down to 420 bits, for equivalent
cryptographic strength. This reduction increases in size as the modulus grows
longer. That only exponential-time attacks are possible on the Lucas-function
version of the DL problem ensures that attempts to solve it are far more
expensive than the subexponential-time attacks possible on the DL itself. We
have applied for patents on these algorithms.
Finally, LUC Encryption Technology Ltd. (LUCENT), has been incorporated to
license and support cryptographic systems based on Lucas functions. For more
information, contact Horace R. Moore, 101 E. Bonita, Sierra Madre, CA 91024.


References



Garey, M.R. and D.S. Johnson. Computers and Intractability: A Guide to the
Theory of NP-Completeness. San Francisco, CA: W.H. Freeman, 1979.
Diffie, W. and M.E. Hellman. "New Directions in Cryptography." IEEE
Transactions on Information Theory (November 1976).
El Gamal, T. "A Public-key Cryptosystem and a Signature Scheme Based on
Discrete Logarithms." IEEE Transactions on Information Theory (July 1985).
Knuth, D.E. The Art of Computer Programming: Volume II: Semi-Numerical
Algorithms, 2nd ed. Reading, MA: Addison-Wesley, 1981.
Smith, Peter. "LUC Public-Key Encryption." Dr. Dobb's Journal (January 1993).


Secure alternatives to RSA


Figure 1: (a) Choosing a value l; (b) Lucas-function relations which let us
transform exponentiation to Lucas-function calculation.
(a) V_(p+1)/t(l,1) ≠ 2 mod p, for all t>1 dividing (p+1)

(b) V_nm(P,Q) = V_n(V_m(P,Q), Q^m)        (Relation 1)
    2V_(n+m) = V_n*V_m + D*U_n*U_m        (Relation 2)



Figure 2: Example of LUCELG.
Let the prime p be 908797. Choose k=1949, g=19, x=2089, y=894501 and M=1111.
G = V_k(y,1) mod p = V_1949(894501,1) mod 908797 = 788038.
d1 = V_k(g,1) mod p = V_1949(19,1) mod 908797 = 307718.
d2 = G*M mod p = 788038*1111 mod 908797 = 338707.
The cryptogram is the pair (d1,d2) = (307718, 338707).
The receiver, who knows that the secret key is 2089, first calculates:
G = V_x(d1,1) mod p = V_2089(307718,1) mod 908797 = 788038. The inverse of this
is 518288.
M = d2*(G^-1) mod p = 338707*518288 mod 908797 = 1111, the original message.



April, 1994
SHA: The Secure Hash Algorithm


William Stallings


William is an independent consultant and president of Comp-Comm Consulting of
Brewster, MA. This article is based on material in his forthcoming book,
Network and Internetwork Security (Macmillan, due June 1994). He can be
reached at stallings@acm.org.


An essential element of most authentication and digital-signature schemes is a
hash algorithm. A hash function accepts a variable-size message M as input and
produces a fixed-size tag H(M), sometimes called a "message digest," as output
(see "One-Way Hash Functions," by Bruce Schneier, DDJ, September 1991).
Typically, a hash code is generated for a message, encrypted, and sent with
the message. The receiver computes a new hash code for the incoming message,
decrypts the hash code that accompanies the message, and compares them. If the
message has been altered in transit, there will be a mismatch.
The Secure Hash Algorithm (SHA) was developed by the National Institute of
Standards and Technology (NIST) and published as a federal
information-processing standard (FIPS PUB 180) in 1993. SHA is based on the
MD4 algorithm, developed by Ron Rivest of MIT, and its design closely models
MD4. SHA is used as part of the new Digital Signature Standard from NIST, but
it can be used in any security application that requires a hash code.


SHA Logic


SHA takes as input a message with a maximum length of less than 2^64 bits and
produces as output a 160-bit message digest. The input is processed in 512-bit
blocks. Figure 1 shows the overall processing of a message to produce a
digest. The processing consists of the following steps:
Step 1: Append padding bits. The message is padded so that its length is
congruent to 448 modulo 512. Padding is always added, even if the message is
already of the desired length. Thus, the number of padding bits is in the
range of 1 to 512. The padding consists of a single 1 bit followed by the
necessary number of 0 bits.
Step 2: Append length. A block of 64 bits is appended to the message. This
block is treated as an unsigned 64-bit integer and contains the length of the
original message (before the padding).
The outcome of the first two steps yields a message that is an integer
multiple of 512 bits in length. The figure represents the expanded message as
the sequence of 512-bit blocks Y_0, Y_1, ..., Y_(L-1), so that the total length
of the expanded message is L x 512 bits. Equivalently, the result is a multiple
of 16 32-bit words. Let M[0..N-1] denote the words of the resulting message,
with N being an integer multiple of 16. Thus N = L x 16.
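The padding arithmetic of Steps 1 and 2 can be checked with a few lines of C (the helper name is mine, not part of the standard):

```c
#include <stdint.h>

/* Number of padding bits SHA appends to a message of 'len' bits so
   that len + pad is congruent to 448 mod 512.  Padding is always
   added, so the result is in the range 1..512. */
uint64_t sha_pad_bits(uint64_t len) {
    uint64_t r = (448 + 512 - len % 512) % 512;
    return r ? r : 512;
}
```

The appended 64-bit length field then brings the total to an exact multiple of 512 bits.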
Step 3: Initialize MD buffer. A 160-bit buffer is used to hold intermediate
and final results of the hash function. The buffer can be represented as five
32-bit registers (A,B,C,D,E). These registers are initialized to the following
hexadecimal values (high-order octets first):
A=67452301
B=EFCDAB89
C=98BADCFE
D=10325476
E=C3D2E1F0
Step 4: Process message in 512-bit (16-word) blocks. The heart of the
algorithm is a module that consists of 80 steps of processing; this module is
labeled HSHA in Figure 1, and its logic is illustrated in Figure 2. The 80
steps have a similar structure.
Note that each round takes as input the current 512-bit block being processed
(Y_q) and the 160-bit buffer value ABCDE, and updates the contents of the
buffer. Each round also makes use of an additive constant K_t. Only four
distinct constants are used. The values, in hexadecimal, are shown in Figure
3.
Overall, for block Y_q, the algorithm takes Y_q and an intermediate digest
value MD_q as inputs. MD_q is placed into buffer ABCDE. The output of the 80th
step is added to MD_q to produce MD_(q+1). The addition is done independently
for each of the five words in the buffer with each of the corresponding words
in MD_q, using addition modulo 2^32.
Step 5: Output. After all L 512-bit blocks have been processed, the output
from the Lth stage is the 160-bit message digest.
Each round is of the form:
A,B,C,D,E <- [CLS_5(A)+f_t(B,C,D)+E+W_t+K_t], A, CLS_30(B), C, D
where A,B,C,D,E = the five words of the buffer; t = round, or step, number,
0 <= t <= 79; f_t = a primitive logical function; CLS_s = circular left shift
(rotation) of the 32-bit argument by s bits; W_t = a 32-bit word derived from
the current 512-bit input block; K_t = an additive constant, of which four
distinct values are used; and + = addition modulo 2^32.
Each primitive function takes three 32-bit words as input and produces a
32-bit word output. Each function performs a set of bitwise-logical
operations; that is, the nth bit of the output is a function of the nth bit of
the three inputs. The functions are shown in Table 1. As you can see, only
three different functions are used. For 0 <= t <= 19, the function is the
conditional function: if B then C else D. For 20 <= t <= 39 and 60 <= t <= 79,
the function produces a parity bit. For 40 <= t <= 59, the function is true if
two or three of the arguments are true.
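The three primitive functions are simple to express in C; the helper below is my own packaging of Table 1, not code from the standard:

```c
#include <stdint.h>

/* The SHA primitive function f_t for round t (0 <= t <= 79). */
uint32_t sha_f(int t, uint32_t b, uint32_t c, uint32_t d) {
    if (t <= 19)            return (b & c) | (~b & d);   /* conditional */
    if (t <= 39 || t >= 60) return b ^ c ^ d;            /* parity      */
    return (b & c) | (b & d) | (c & d);                  /* majority    */
}
```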
It remains to indicate how the 32-bit word values, W_t, are derived from the
512-bit message block. The first 16 values of W_t are taken directly from the
16 words of the current block. The remaining values are defined as:
W_t = W_(t-16) XOR W_(t-14) XOR W_(t-8) XOR W_(t-3)
Thus, in the first 16 rounds of processing, the input from the message block
consists of a single 32-bit word from that block. For the remaining 64 rounds,
the input consists of the XOR of a number of the words from the message block.
SHA can be summarized as:
MD_0 = IV
MD_(q+1) = SUM_32(MD_q, ABCDE_q)
MD = MD_L
where IV = the initial value of the ABCDE buffer, defined in Step 3;
ABCDE_q = the output of the last round of processing of the qth message block;
L = the number of blocks in the message (including padding and length fields);
SUM_32 = addition modulo 2^32 performed separately on each word of the pair of
inputs; and MD = the final message-digest value.


SHA Security


SHA has the property that every bit of the hash code is a function of every
bit in the input. The complex repetition of the basic function ft produces
well-mixed results; that is, it is unlikely that two messages chosen at
random, even if they exhibit similar regularities, will have the same hash
code. Unless there is some hidden weakness in SHA, which has not so far been
published, the difficulty of coming up with two messages having the same
message digest is on the order of 2^80 operations, while the difficulty of
finding a message with a given digest is on the order of 2^160 operations.


Putting message digests to work


 Figure 1: Overall processing of a message to produce a digest.
 Figure 2: The logic of the module HSHA; addition (+) is mod 2^32.


Figure 3: Hexadecimal values of the four constants.
 0 <= t <= 19   K_t = 5A827999
20 <= t <= 39   K_t = 6ED9EBA1
40 <= t <= 59   K_t = 8F1BBCDC
60 <= t <= 79   K_t = CA62C1D6


Table 1: SHA primitive functions.
  Round            f_t(B,C,D)

  ( 0 <= t <= 19)  (B AND C) OR ((NOT B) AND D)
  (20 <= t <= 39)  B XOR C XOR D
  (40 <= t <= 59)  (B AND C) OR (B AND D) OR (C AND D)
  (60 <= t <= 79)  B XOR C XOR D







April, 1994
The Blowfish Encryption Algorithm


Bruce Schneier


Bruce is the author of Applied Cryptography: Protocols, Algorithms, and Source
Code in C (John Wiley, 1994). This article is based on a paper he presented at
the Cambridge Algorithms Conference. Bruce can be contacted at schneier@
chinet.com.


Blowfish is a block-encryption algorithm designed to be fast (it encrypts data
on large 32-bit microprocessors at a rate of 26 clock cycles per byte),
compact (it can run in less than 5K of memory), simple (the only operations it
uses are addition, XOR, and table lookup on 32-bit operands), secure
(Blowfish's key length is variable and can be as long as 448 bits), and robust
(unlike DES, Blowfish's security is not diminished by simple programming
errors).
The Blowfish block-cipher algorithm, which encrypts data one 64-bit block at a
time, is divided into key-expansion and data-encryption parts. Key expansion
converts a key of at most 448 bits into several subkey arrays totaling 4168
bytes. Data encryption consists of a simple function iterated 16 times. Each
iteration, called a "round," consists of a key-dependent permutation and a
key- and data-dependent substitution.


Subkeys


Blowfish uses a large number of subkeys that must be precomputed before any
data encryption or decryption. The P-array consists of 18 32-bit subkeys, P1,
P2, ..., P18, and there are four 32-bit S-boxes with 256 entries each: S1,0,
S1,1, ..., S1,255; S2,0, S2,1, ..., S2,255; S3,0, S3,1, ..., S3,255; S4,0,
S4,1, ..., S4,255.


Encryption


Blowfish is a Feistel network consisting of 16 rounds; see Figure 1. The input
is a 64-bit data element, x. Divide x into two 32-bit halves: xL and xR. Then,
for i=1 to 16:
xL=xL XOR Pi
xR=F(xL) XOR xR
Swap xL and xR
After the sixteenth iteration, swap xL and xR to undo the last swap. Then
xR=xR XOR P17 and xL=xL XOR P18. Finally, recombine xL and xR to get the
ciphertext.
Function F looks like this: Divide xL into four eight-bit quarters: a, b, c,
and d. F(xL)=((S1,a + S2,b mod 2^32) XOR S3,c) + S4,d mod 2^32; see Figure 2.
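Restated with fixed-width types, the two mod-2^32 additions come free from unsigned overflow. A minimal sketch of F follows; in real use the S array is filled by the key schedule, and it is left zeroed here only so the fragment is self-contained:

```c
#include <stdint.h>

/* The four 256-entry S-boxes; normally initialized by the key
   schedule, zeroed here for illustration only. */
uint32_t S[4][256];

/* F(xL): split xL into bytes a,b,c,d (a is the most significant),
   then compute ((S1,a + S2,b) XOR S3,c) + S4,d. Both additions
   wrap mod 2^32 automatically with uint32_t arithmetic. */
uint32_t blowfish_F(uint32_t x)
{
    uint32_t a = (x >> 24) & 0xFF;
    uint32_t b = (x >> 16) & 0xFF;
    uint32_t c = (x >>  8) & 0xFF;
    uint32_t d =  x        & 0xFF;
    return ((S[0][a] + S[1][b]) ^ S[2][c]) + S[3][d];
}
```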
Decryption is exactly the same as encryption, except that P1, P2, ..., P18 are
used
in the reverse order.
Implementations of Blowfish that require the fastest speeds should unroll the
loop and ensure that all subkeys are stored in cache. For the purposes of
illustration, I've implemented Blowfish in C; Listing One (page 98) is
blowfish.h, and Listing Two (page 98) is blowfish.c. A required data file is
available electronically; see "Availability," page 3.


Generating the Subkeys


The subkeys are calculated using the Blowfish algorithm, as follows:
1. Initialize first the P-array and then the four S-boxes, in order, with a
fixed random string. This string consists of the hexadecimal digits of pi.
2. XOR P1 with the first 32 bits of the key, XOR P2 with the second 32 bits of
the key, and so on for all bits of the key (up to P18). Cycle through the key
bits repeatedly until the entire P-array has been XORed.
3. Encrypt the all-zero string with the Blowfish algorithm, using the subkeys
described in steps #1 and #2.
4. Replace P1 and P2 with the output of step #3.
5. Encrypt the all-zero string using the Blowfish algorithm with the modified
subkeys.
6. Replace P3 and P4 with the output of step #5.
7. Continue the process, replacing all elements of the P-array and then all
four S-boxes in order, with the output of the continuously changing Blowfish
algorithm.
In total, 521 iterations are required to generate all required subkeys.
Applications can store the subkeys rather than re-executing this derivation
process.
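Step 2 can be sketched on its own; xor_key_into_P is an illustrative helper, not from the listings, which assumes the P-array already holds the pi digits from step 1:

```c
#include <stdint.h>

/* Step 2 of the subkey derivation: XOR the key, cycled repeatedly,
   into the 18-entry P-array, 32 bits (four key bytes) at a time. */
void xor_key_into_P(uint32_t P[18], const unsigned char *key, int keybytes)
{
    int j = 0;
    for (int i = 0; i < 18; i++) {
        uint32_t data = 0;
        for (int k = 0; k < 4; k++) {
            data = (data << 8) | key[j];
            j = (j + 1) % keybytes;   /* cycle through the key bytes */
        }
        P[i] ^= data;
    }
}
```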


Design Decisions


The underlying philosophy behind Blowfish is that simplicity of design yields
an algorithm that is easier both to understand and to implement. Hopefully,
the use of a streamlined Feistel network (the same structure used in DES,
IDEA, and many other algorithms), a simple S-box substitution, and a simple
P-box substitution, will minimize design flaws.
For details about the design decisions affecting the security of Blowfish, see
"Requirements for a New Encryption Algorithm" (by B. Schneier and N. Ferguson)
and "Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish)"
(by B. Schneier), both to be included in Fast Software Encryption, to be
published by Springer-Verlag later this year as part of their Lecture Notes in
Computer Science series. The algorithm is designed to be very fast on 32-bit
microprocessors. Operations are all based on a 32-bit word and are
one-instruction XORs, ADDs, and MOVs. There are no branches (assuming you
unravel the main loop). The subkey arrays and the instructions can fit in the
on-chip caches of both the Pentium and the PowerPC. Furthermore, the algorithm
is designed to be resistant to poor implementation and programmer errors.
I'm considering several simplifications to the algorithm, including fewer and
smaller S-boxes, fewer rounds, and on-the-fly subkey calculation.


Conclusions



At this early stage, I don't recommend implementing Blowfish in security
systems. More analysis is needed. I conjecture that the most efficient way to
break Blowfish is through exhaustive search of the keyspace. I encourage all
cryptanalytic attacks, modifications, and improvements to the algorithm.
However, remember one of the basic rules of cryptography: The inventor of an
algorithm is the worst person to judge its security. I am publishing the
details of Blowfish so that others may have a chance to analyze it.
Blowfish is unpatented and will remain so in all countries. The algorithm is
hereby placed in the public domain and can be freely used by anyone.
 Figure 1: Blowfish is a Feistel network consisting of 16 rounds.
 Figure 2: Blowfish function F.


DDJ's Blowfish Cryptanalysis Contest


The only way to inspire confidence in a cryptographic algorithm is to let
people analyze it. It is in this spirit that DDJ is pleased to announce the
Blowfish Cryptanalysis Contest, our third reader contest in recent years.
We'd like you to cryptanalyze Bruce Schneier's Blowfish algorithm presented in
this issue. Give it your best shot. Break it, beat on it, cryptanalyze it. The
best attack received by April 1, 1995 wins the contest.
The contest rules are simple. It's open to any individual or organization.
Governments are encouraged to enter. Even the NSA can compete and win the
prize (their budget isn't what it used to be; they can probably use the
money). But since we will publish the results, classified entries will not be
permitted. To officially enter the contest, your entry must be accompanied by
a completed and signed entry form. These are available electronically (see
"Availability," page 3) or we'll be glad to mail or fax you a hardcopy.
We're not going to publish messages encrypted in Blowfish and some random key,
because we think that would be too difficult.
Partial results--those attacks that don't break the algorithm but instead
prove that it isn't as strong as we thought it was--are just as useful and can
be entered.
Your entry does not have to consist of code. Instead, your entry can be a
paper describing the attack. The attack does not have to completely break the
Blowfish algorithm; it can simply be more efficient than a brute-force attack.
The attack can be against either the complete algorithm or a simplified
version of the algorithm (fewer rounds, smaller block size, simpler S-boxes,
and the like).
We'll select a winner based on the following criteria:
Success of the attack. How much more efficient is the attack than brute force?
Type of attack. Is it ciphertext only, known plaintext, or chosen plaintext?
Type of algorithm. Is the attack against full Blowfish or a simplified version
of the algorithm?
Bruce Schneier, frequent DDJ contributor, author of Applied Cryptography, and
inventor of the Blowfish algorithm will referee the contest.
The contest results will be published in the September 1995 issue of Dr.
Dobb's Journal, in which we'll discuss and summarize the winning programs, the
weaknesses of the Blowfish algorithm, and any modifications of the algorithm.
We'll be providing a number of awards for the winners. The grand-prize winner
will receive a $750 honorarium. Honorariums of $250 to the second-place winner
and $100 to the third-place winner will also be awarded.
--editors
[LISTING ONE] (Text begins on page 38.)

/* Blowfish.h */

#define MAXKEYBYTES 56 /* 448 bits */

short opensubkeyfile(void);
unsigned long F(unsigned long x);
void Blowfish_encipher(unsigned long *xl, unsigned long *xr);
void Blowfish_decipher(unsigned long *xl, unsigned long *xr);
short InitializeBlowfish(char key[], short keybytes);

[LISTING TWO]

/* Blowfish.c */

#include <stdio.h>

#define N 16
#define noErr 0
#define DATAERROR -1
#define KEYBYTES 8
#define subkeyfilename "Blowfish.dat"


static unsigned long P[18];
static unsigned long S[4][256];
static FILE* SubkeyFile;

short opensubkeyfile(void) /* read only */
{
 short error;
 error = noErr;
 if((SubkeyFile = fopen(subkeyfilename,"rb")) == NULL) {
 error = DATAERROR;
 }
 return error;
}
unsigned long F(unsigned long x)
{
 unsigned short a;
 unsigned short b;
 unsigned short c;
 unsigned short d;
 unsigned long y;

 d = x & 0x00FF;
 x >>= 8;
 c = x & 0x00FF;
 x >>= 8;
 b = x & 0x00FF;
 x >>= 8;
 a = x & 0x00FF;

 /* Both additions are mod 2^32; with 32-bit unsigned longs the wrap is
 automatic, so no explicit reduction is needed */
 y = ((S[0][a] + S[1][b]) ^ S[2][c]) + S[3][d];
 return y;
}
void Blowfish_encipher(unsigned long *xl, unsigned long *xr)
{
 unsigned long Xl;
 unsigned long Xr;
 unsigned long temp;
 short i;

 Xl = *xl;
 Xr = *xr;
 for (i = 0; i < N; ++i) {
 Xl = Xl ^ P[i];
 Xr = F(Xl) ^ Xr;

 temp = Xl;
 Xl = Xr;
 Xr = temp;
 }
 temp = Xl;
 Xl = Xr;
 Xr = temp;

 Xr = Xr ^ P[N];
 Xl = Xl ^ P[N + 1];

 *xl = Xl;

 *xr = Xr;
}
void Blowfish_decipher(unsigned long *xl, unsigned long *xr)
{
 unsigned long Xl;
 unsigned long Xr;
 unsigned long temp;
 short i;

 Xl = *xl;
 Xr = *xr;

 for (i = N + 1; i > 1; --i) {
 Xl = Xl ^ P[i];
 Xr = F(Xl) ^ Xr;

 /* Exchange Xl and Xr */
 temp = Xl;
 Xl = Xr;
 Xr = temp;
 }
 /* Exchange Xl and Xr */
 temp = Xl;
 Xl = Xr;
 Xr = temp;

 Xr = Xr ^ P[1];
 Xl = Xl ^ P[0];

 *xl = Xl;
 *xr = Xr;
}
short InitializeBlowfish(char key[], short keybytes)
{
 short i;
 short j;
 short k;
 short error;
 short numread;
 unsigned long data;
 unsigned long datal;
 unsigned long datar;
 /* First, open the file containing the array initialization data */
 error = opensubkeyfile();
 if (error == noErr) {
 for (i = 0; i < N + 2; ++i) { /* read all 18 P-array entries */
 numread = fread(&data, 4, 1, SubkeyFile);
 printf("%d : %d : %.4s\n", numread, i, (char *)&data);
 if (numread != 1) {
 return DATAERROR;
 } else {
 P[i] = data;
 }
 }
 for (i = 0; i < 4; ++i) {
 for (j = 0; j < 256; ++j) {
 numread = fread(&data, 4, 1, SubkeyFile);
 printf("[%d, %d] : %.4s\n", i, j, (char *)&data);
 if (numread != 1) {

 return DATAERROR;
 } else {
 S[i][j] = data;
 }
 }
 }
 fclose(SubkeyFile);
 j = 0;
 for (i = 0; i < 18; ++i) {
 data = 0x00000000;
 for (k = 0; k < 4; ++k) {
 data = (data << 8) | key[j];
 j = j + 1;
 if (j >= keybytes) {
 j = 0;
 }
 }
 P[i] = P[i] ^ data;
 }
 datal = 0x00000000;
 datar = 0x00000000;
 for (i = 0; i < 18; i += 2) {
 Blowfish_encipher(&datal, &datar);

 P[i] = datal;
 P[i + 1] = datar;
 }
 for (j = 0; j < 4; ++j) {
 for (i = 0; i < 256; i += 2) {
 Blowfish_encipher(&datal, &datar);

 S[j][i] = datal;
 S[j][i + 1] = datar;
 }
 }
 } else {
 printf("Unable to open subkey initialization file : %d\n", error);
 }
 return error;
}
End Listings





















April, 1994
The Wavelet Packet Transform


Extending the wavelet transform




Mac A. Cody


Mac is an engineering specialist at E-Systems' Garland Division in Dallas,
Texas. He can be contacted at 214-205-6452 and on the Internet at
mcody@aol.com.


The wavelet transform enables analysis of data at multiple levels of
resolution (also known as "scale"). In addition, transient events in the data
are preserved by the analysis. When the wavelet transform (WT) is applied to a
signal in the time domain, the result is a two-dimensional, time-scale domain
analysis of the signal. The transform has proven useful for the compression
and analysis of signals and images.
The fast wavelet transform (FWT) is an efficient implementation of the
discrete wavelet transform (DWT). The DWT is the WT as applied to a regularly
sampled data sequence. The transform of the data exhibits discrete steps in
time on one axis, and discrete steps of resolution on another. The algorithm
and a C language implementation of the FWT were presented in my article, "The
Fast Wavelet Transform" (DDJ, April 1992).
As demonstrated in that article, the superiority of the DWT over the discrete
Fourier transform (DFT) is in the DWT's simultaneous localization of frequency
and time, something that DFTs can't do. As a trade-off, the frequency
divisions in the DWT are not in integral steps. Instead, the divisions are in
octave bands. Each level of the transform represents a frequency range half as
wide as that of the level above it and twice as wide as that of the level
below it; see Figure 1(a).
Conversely, the time scale on each level is twice that of the level below it
and half that of the level above it; see Figure 1(b). This characteristic of
the DWT poses problems when attempting to localize higher frequencies.
Discrimination of frequency is sacrificed for time localization at the higher
levels in the transform.
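For concreteness, the octave-band edges can be computed directly; octave_band below is an illustrative helper (not from the article), counting s = 1 as the widest, topmost detail band:

```c
/* Edges of the s-th octave detail band for sampling rate fs,
   with s = 1 the widest (topmost) band: [fs/2^(s+1), fs/2^s].
   Each band is half as wide as the one above it and twice as
   wide as the one below it. */
void octave_band(double fs, int s, double *lo, double *hi)
{
    *hi = fs / (double)(1L << s);
    *lo = *hi / 2.0;
}
```

With fs = 1000 Hz, for example, the first three detail bands come out as 250..500, 125..250, and 62.5..125 Hz.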
It turns out that the DWT is actually a subset of a far more versatile
transform, the wavelet packet transform (WPT). Developed by Dr. Ronald A.
Coifman of Yale University, the WPT generalizes the time-frequency analysis of
the wavelet transform. It yields a family of orthonormal transform bases of
which the wavelet transform basis is but one member.
In this article I'll develop the wavelet packet transform algorithm from its
roots in the wavelet transform algorithm. After that, I'll present C code to
implement the algorithm.


From a Humble Root a Tree Shall Grow


In the fast wavelet transform algorithm, the sampled data set is passed
through the scaling and wavelet filters (the convolution operation). They are,
respectively, low-pass and high-pass filters with complementary bandwidths,
also known as a quadrature mirror filter (QMF) pair. The outputs of both
filters are decimated (desampled) by a factor of two. The high-pass filtered
data set is the wavelet transform detail coefficients at that level of scale
of the transform. The low-pass filtered data set is the approximation
coefficients at that level of scale. Due to the decimation, both sets of
coefficients have half as many elements as the original data set.
The approximation coefficients can now be used as the sampled data input for
another pair of wavelet filters, identical to the first pair, generating
another set of detail and approximation coefficients at the next-lower level
of scale. This process can continue until the limit for the unit interval is
reached. For example, if it is desired that the transform have six levels (5
through 0), then the unit interval must be 64 (2^6) samples long. The data set
can be of any length as long as it has an integral number of unit intervals.
The resulting algorithm is the forward fast wavelet transform tree algorithm;
see Figure 2(a).
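The unit-interval requirement reduces to a simple divisibility test; this sketch uses an illustrative helper name:

```c
/* A data set supports an L-level transform only if its length is an
   integral number of unit intervals of 2^L samples (e.g., a six-level
   transform requires a multiple of 64 samples). */
int valid_wpt_length(long length, int levels)
{
    long unit = 1L << levels;
    return length > 0 && (length % unit) == 0;
}
```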
You can turn the tree algorithm on end, with the initial data input at the top
and the detail and approximation coefficients fanning out towards the bottom.
The fast wavelet transform algorithm can now be viewed as a partial graph of a
binary tree (the significance of this will be seen shortly); see Figure 2(b).
The flow of the algorithm moves down and to the left, forming new levels of
the transform from the approximation coefficients at higher levels. The detail
"branches" are not used for further calculations.
Observe that the wavelet transform operation can be stopped at any level while
working down the tree. The resulting "partial" transform is still a valid
orthonormal transform. For example, if the unit interval for a data set were
32 points (2^5), the corresponding transform would have five levels (4 through
0). If the transform operation were stopped at level 2, the transform would
have only three levels, but the approximation and detail coefficients of the
transform would correspond exactly to a wavelet transform with a unit interval
of 8 (2^3) samples; see Figure 2(b).
The implication of this observation is that the QMF pair is an orthonormal
transform kernel, just as the butterfly operation is the kernel of the FFT. As
long as the filters are designed to be orthonormal wavelet filters and the
original data set meets the unit-interval requirement described above,
repeated applications of the kernel will always yield an orthonormal
transform.
Now, the set of detail and approximation coefficients at each level of the
transform forms a pair of subspaces of the approximation coefficients of the
next-higher level of scale and, ultimately, of the original data set. The
subspaces created by the wavelet transform roughly correspond to the frequency
subbands shown in Figure 1(a). These subspaces form a disjoint cover of the
frequency space of the original data set. In other words, the subspaces have
no elements in common, and the union of the frequency subbands spans the
frequency range of the original data set.
What Coifman proposed is that any set of subspaces which are a disjoint cover
of the original data set is an orthonormal basis. The wavelet transform basis
is then but one of a family of orthonormal bases with different subband
intervals. As with the wavelet transform basis, each disjoint cover roughly
corresponds to a covering of the frequency space of the original signal.
Coifman dubbed this family a "wavelet packet library." The various orthonormal
bases are formed by arbitrary applications of the orthonormal transform kernel
upon the detail coefficients as well as the approximation coefficients of
higher transform levels.
The application of the transform kernel to both the detail and approximation
coefficients results in an expansion of the structure of the fast wavelet
transform tree algorithm. The tree algorithm for the wavelet packet transform
can be represented as a full binary tree; see Figure 3. As read from left to
right, the a and d symbols at each node indicate the order of orthonormal
transform kernel filter operations performed which yield each particular
subspace of the original data set. Each node in the transform tree is also
representative of a particular wavelet packet. The transform coefficients
computed at each node are a correlation of the original data set and a
waveform function representing the wavelet packet.
For example, the sequence aaad in Figure 3 represents four operations of the
orthonormal transform kernel representing one of 48 possible wavelet packets.
(The combination of all possible translations in time and dilation in scale
for the wavelet packets is J*2^J; in this instance J equals 4.) The first three
represent low-pass filter/decimation operations performed by the transform
kernel. The fourth represents a high-pass filter/decimation operation
performed by the transform kernel. This subspace should be recognizable as
exactly the level 0 detail coefficients of the wavelet transform. The
operations of the orthonormal transform kernel correspond to the wavelet
function of the wavelet transform. Likewise, the wavelet packet represented by
aaaa is the scaling function of the wavelet transform.


Packets, Graphs, and Bases


The wavelet transform basis is actually a subset of a family of bases formed
by the wavelet packet transform. The heavy lines in Figure 3 indicate the
graph forming the wavelet basis. Note that the wavelet basis consists of the
subspaces d, ad, aad, aaad, and aaaa. The sequences a, aa, and aaa are
intermediate steps leading to the generation of the subspaces of the wavelet
basis at the lower levels. Since the orthonormal transform kernel can be
arbitrarily applied to either approximation or detail branches on the tree,
J*2^J graphs representing different orthonormal bases can be created; see
Figure 4.
The variety of orthonormal bases which can be formed by the WPT algorithm,
coupled with the infinite number of wavelet and scaling functions which can be
created, yields a very flexible analysis tool. The flexibility of WPT versus
the FWT can be compared to that of having a complete set of sockets for a
ratchet rather than a single socket to attach to it. The ratchet (algorithm)
works the same regardless of the socket (basis) that is chosen. The
flexibility of the tool is in choosing the appropriate socket (basis) for the
particular nut (problem). The choice of wavelet and scaling functions is then
analogous to selecting from English, metric, or Torx socket sets for use with
the ratchet. The WPT allows tailoring of the wavelet analysis to selectively
localize spectral bands in the input data as well as to correlate the signal
to the wavelet. Not only can the best wavelet be chosen to analyze a
particular signal but the best orthonormal basis can as well. In
signal-processing terminology, the various bases of the wavelet packet
transform can be used as arbitrary adaptive tree-structured filter banks.


Piece-wise Convolutions and Traversing the Tree


The implementation of the WPT is itself a generalization of the FWT routine
presented in my previous article. As with the FWT, the kernel operations are
the decimating and interpolating convolutions, as presented in Listing One
(page 101). The convolutions performed are actually piece-wise convolutions,
due to the discrete nature of the data. The routine DualConvDec2 replaces the
routines ConvolveDec2 and Dotp in the FWT code. The routine DualConvInt2Sum
replaces ConvolveInt2, DotpOdd, and DotpEven in the inverse FWT code.
Both convolution routines are designed to operate upon aperiodic data of
finite length. The data does not represent an infinitely repeating pattern and
is assumed to be surrounded by zero-valued data; see Figure 5. To support this
data model, each data array is appended with additional data storage equal to
the length of the wavelet filter, minus one (the shaded elements). The extra
data is filled with the terminating convolution values as the wavelet filters
"slide off" the end of the data set. Extending the convolution data by this
additional amount during the decomposition process of the forward transform
ensures perfect reconstruction of the original input data by the inverse
transform. DualConvDec2 also uses partial dot products at both ends of the
convolution to simulate the implied zero-valued data (the dotted lines)
outside of the data array. The dotted lines on the filter elements indicate
coefficients not used in the partial dot product calculations. DualConvInt2Sum
does not need to do this, since all information necessary for reconstruction
is contained within the extended data arrays. Note that DualConvInt2Sum
performs only the odd-valued dot product at the beginning of the convolution
for the initial, reconstruction data point.
The WPT data is stored in the structure WPTstruct, defined in Figure 6(a). The
structure contains storage for the number of levels in the transform, the
length of the original, untransformed data array, and a pointer to a
two-dimensional matrix of data arrays. The size of the matrix is dependent
upon three factors. These are the number of levels in the transform, the
length of the original data array, and the length of the transform filters.
The length of the data array is itself affected by the length of the transform
filters; see Figure 6(b).
Figure 6(c) shows the matrix structure as the wavelet packet binary tree. The
data-array pointers are allocated memory from the heap as necessary to form
the disjoint cover for the chosen orthonormal basis. Those array pointers not
required for the disjoint cover are set to zero. The example shown in the
figure represents a three-level wavelet basis for an input of 40 data points
with transform filters containing six coefficients.
The routine DualConvDec2 is used by AWPT, the forward wavelet packet transform
routine; see Listing Two (page 101). AWPT accepts pointers to the input data
array, the WPT data structure, and the transform filter arrays, and the length
of the filters. The transform routine works down the levels of the binary tree
performing convolutions on the data arrays. Each level of the binary tree is
traversed by taking adjacent pairs of array pointers as the destination nodes
for the low-pass and high-pass convolutions. If the array pointer for the
low-pass convolution is zero, the convolutions are not performed since the
destinations are not part of the current disjoint cover.
At the highest level (where i equals 0) the input data array is the source for
the convolutions. On each subsequent level, the sources are the arrays on the
previous level. The appropriate array in the binary tree is determined by
dividing the current j index by 2. After each convolution operation, the data
length is divided by two, in order to keep track of the effect of the
decimation operation during the convolutions.
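The index arithmetic can be summarized in two one-liners. These helpers are illustrative, assuming the convention that even-indexed nodes receive the low-pass ("a") output of their parent and odd-indexed nodes the high-pass ("d") output:

```c
/* In the WPT binary tree, the source array for node j on level i is
   node j/2 on level i-1 (integer division). */
int wpt_parent(int j)
{
    return j / 2;
}

/* Which filter output feeds node j: 'a' (approximation, low-pass)
   for even j, 'd' (detail, high-pass) for odd j. */
char wpt_branch(int j)
{
    return (j % 2 == 0) ? 'a' : 'd';
}
```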
DualConvInt2Sum is used by IAWPT, the inverse wavelet packet transform
routine. IAWPT accepts pointers to the source WPT data structure, the output
data array, the transform filter arrays, and the length of the filters. The
inverse transform routine works up the levels of the binary tree, performing
convolutions on the data arrays and reconstructing the higher-resolution data
on each level. Each level of the binary tree is traversed in the same fashion
as was AWPT. The destination arrays are on the next-higher level of the
matrix, and they are selected by dividing the j index by 2. At the highest
level (i equals 0), the destination array is the output data array. After each
convolution operation, the data length is doubled to keep track of the effect
of the interpolation operation during the convolutions.
The code listings presented here are written in ANSI C and have been tested
with Borland Turbo C 2.0. They should compile on any compiler that is
compliant with ANSI C. I've also written a wavelet packet transform
demonstration program which is available electronically; see "Availability" on
page 3. The electronic version includes the demo program, sample data files,
support drivers, and documentation.


Conclusion



The wavelet packet transform generalizes the discrete wavelet transform and
provides a more flexible tool for the time-scale analysis of data. All of the
advantages of the fast wavelet transform are retained since the wavelet basis
is in the repertoire of bases available with the wavelet packet transform.
Given this, the wavelet packet transform may eventually become a standard tool
in signal processing.


References


Cody, Mac A. "The Fast Wavelet Transform." Dr. Dobb's Journal (April 1992).
Coifman, Ronald R., Yves Meyer, and Victor Wickerhauser. Wavelet Analysis and
Signal Processing. New Haven, CT: Yale University, 1991, preprint.
 Figure 1: (a) The Discrete Wavelet Transform (DWT) divides the spectrum of
the sampled data into octave bands; (b) the resolution at each level of the
DWT is half that of the level above it and twice that of the level below. At
lower levels, time resolution is sacrificed for frequency localization.
 Figure 2: The tree or pyramid algorithm of the forward fast wavelet transform
(a) can be viewed as a partial graph of a binary tree (b). For a particular
unit interval (2^J samples), a maximum of J levels of transform data can be
formed.
 Figure 3: The wavelet packet transform viewed as a complete binary graph.
Each "a" and "d" in each sequence represents the filtering operations
performed to yield the particular subspace of the original signal. The bold
lines represent the disjoint cover known as the wavelet basis.
 Figure 4: Different disjoint covers formed from the WPT binary tree (Figure
3) yield different wavelet packet bases. (a) A subband basis; (b) an
orthonormal basis subset; (c) a basis which is the opposite of the wavelet
basis (better frequency localization at higher frequencies).
 Figure 5: (a) The convolution operation in DualConvDec2 employs partial dot
products at both ends of the data array to simulate zero-valued data
surrounding it; (b) the convolution operation in DualConvInt2Sum performs dot
products with alternating odd and even filter components to simulate
interpolation. The additional convolution data generated by DualConvDec2 is
used to accomplish perfect reconstruction. Marked data elements indicate
starting points of dot-product calculations.
 Figure 6: (a) The data structure for the WPT. The REAL_TYPE declaration can
be defined either as float or double; (b) the storage structure for the
original signal consists of the data, plus appended storage equal to the
filter length, minus 1; (c) the data component of WPTstruct is a
two-dimensional matrix of data arrays forming the wavelet packet binary tree.
Data-array pointers set to 0 indicate parts of the tree that aren't part of
the disjoint cover (the wavelet basis).
[LISTING ONE] (Text begins on page 44.)

/* CONVOLVE.C */
/* Copyright (C) 1993 Mac A. Cody - All rights reserved */

#include "convolve.h"

/* DualConvDec2 - Convolution of data array with decomposition filters
 followed by decimation by two.
 Input(s): REAL_TYPE *src - Pointer to source data sample array.
 REAL_TYPE *htilda - Pointer to lowpass filter array.
 REAL_TYPE *gtilda - Pointer to highpass filter array.
 short srclen - length of source data array.
 short filtlen - length of filter arrays.
 Output(s): REAL_TYPE *adst - Pointer to approximation data sample array.
 REAL_TYPE *ddst - Pointer to detail data sample array.
*/
void DualConvDec2(REAL_TYPE *src, REAL_TYPE *adst, REAL_TYPE *ddst,
 REAL_TYPE *htilda, REAL_TYPE *gtilda, short srclen, short filtlen)
{
 short i, j, brklen, endlen;
 REAL_TYPE adot_p, ddot_p;
 REAL_TYPE *head_src, *lp_fltr, *hp_fltr;
 brklen = 1; /* initial break in dot product is after first element */
 /* perform truncated dot products until filter no longer hangs off end of
 array; decimation by two handled by two-element shift; break count
 increases by two on every iteration */
 for(j = 0; j < filtlen; j += 2, brklen += 2)
 {
 head_src = src + j; /* point to appropriate initial element at head */
 lp_fltr = htilda; /* set up pointer to lowpass filter */
 hp_fltr = gtilda; /* set up pointer to highpass filter */
 adot_p = *head_src * *lp_fltr++; /* initial lowpass product of head */
 ddot_p = *head_src-- * *hp_fltr++; /* initial highpass product of head */
 for(i = 1; i < brklen; i++) /* perform remaining products of head */
 {
 adot_p += *head_src * *lp_fltr++;
 ddot_p += *head_src-- * *hp_fltr++;
 }
 *adst++ = adot_p; /* save the completed lowpass dot product */
 *ddst++ = ddot_p; /* save the completed highpass dot product */
 }
 endlen = srclen + filtlen - 2; /* find total length of array */

 /* perform convolution to the physical end of the array
 with a simple dot product loop */
 for(; j <= endlen; j += 2)
 {
 head_src = src + j; /* point to appropriate initial element at head */
 lp_fltr = htilda; /* set up pointer to lowpass filter */
 hp_fltr = gtilda; /* set up pointer to highpass filter */
 adot_p = *head_src * *lp_fltr++; /* initial lowpass product */
 ddot_p = *head_src-- * *hp_fltr++; /* initial highpass product */
 for(i = 1; i < filtlen; i++) /* perform remaining products */
 {
 adot_p += *head_src * *lp_fltr++;
 ddot_p += *head_src-- * *hp_fltr++;
 }
 *adst++ = adot_p; /* save the completed lowpass dot product */
 *ddst++ = ddot_p; /* save the completed highpass dot product */
 }
 /* perform convolution off the physical end of the array
 with a partial dot product loop, like at the beginning */
 for(brklen = filtlen - 2, j = 2; brklen > 0; brklen -= 2, j += 2)
 {
 head_src = src + endlen; /* point to last element in array */
 lp_fltr = htilda + j; /* set up pointer to lowpass filter offset */
 hp_fltr = gtilda + j; /* set up pointer to highpass filter offset*/
 adot_p = *head_src * *lp_fltr++; /* initial lowpass product */
 ddot_p = *head_src-- * *hp_fltr++; /* initial highpass product */
 for(i = 1; i < brklen; i++) /* perform remaining products */
 {
 adot_p += *head_src * *lp_fltr++;
 ddot_p += *head_src-- * *hp_fltr++;
 }
 *adst++ = adot_p; /* save the completed lowpass dot product */
 *ddst++ = ddot_p; /* save the completed highpass dot product */
 }
} /* End of DualConvDec2 */
/* DualConvInt2Sum - Convolution of data array with reconstruction
 filters with interpolation by two and sum together.
 Input(s): REAL_TYPE *asrc - Pointer to approximation data sample array.
 REAL_TYPE *dsrc - Pointer to detail data sample array.
 REAL_TYPE *h - Pointer to lowpass filter array.
 REAL_TYPE *g - Pointer to highpass filter array.
 short srclen - length of source data array.
 short filtlen - length of filter arrays.
 Output(s): REAL_TYPE *dst - Pointer to output data sample array.
*/
void DualConvInt2Sum(REAL_TYPE *asrc, REAL_TYPE *dsrc, REAL_TYPE *dst,
 REAL_TYPE *h, REAL_TYPE *g, short srclen, short filtlen)
{
 short i, j, endlen;
 REAL_TYPE dot_pe, dot_po;
 REAL_TYPE *head_asrc, *head_dsrc, *lp_fltr, *hp_fltr;
 endlen = srclen + filtlen - 2; /* find total length of array */
 filtlen /= 2; /* adjust filter length value for interpolation */
 j = filtlen - 1; /* start with filter covering end of array */
 head_asrc = asrc + j; /* point to initial element at head */
 head_dsrc = dsrc + j; /* point to initial element at head */
 lp_fltr = h + 1; /* set up pointer to lowpass filter */
 hp_fltr = g + 1; /* set up pointer to highpass filter */
 /* initial lowpass and highpass odd product */

 dot_po = *head_asrc-- * *lp_fltr + *head_dsrc-- * *hp_fltr;
 lp_fltr += 2; hp_fltr += 2; /* skip over even filter elements */
 for(i = 1; i < filtlen; i++) /* perform remaining products */
 {
 dot_po += *head_asrc-- * *lp_fltr + *head_dsrc-- * *hp_fltr;
 lp_fltr += 2; hp_fltr += 2; /* skip over even filter elements */
 }
 /* save the completed lowpass and highpass odd dot product */
 *dst++ = dot_po;
 /* perform initial convolution with a simple dot product loop */
 for(j++; j <= endlen; j++)
 {
 head_asrc = asrc + j; /* point to appropriate initial element at head */
 head_dsrc = dsrc + j; /* point to appropriate initial element at head */
 lp_fltr = h; /* set up pointer to lowpass filter */
 hp_fltr = g; /* set up pointer to highpass filter */
 /* initial lowpass and highpass even product */
 dot_pe = *head_asrc * *lp_fltr++ + *head_dsrc * *hp_fltr++;
 /* initial lowpass and highpass odd product */
 dot_po = *head_asrc-- * *lp_fltr++ + *head_dsrc-- * *hp_fltr++;
 for(i = 1; i < filtlen; i++) /* perform remaining products */
 {
 dot_pe += *head_asrc * *lp_fltr++ + *head_dsrc * *hp_fltr++;
 dot_po += *head_asrc-- * *lp_fltr++ + *head_dsrc-- * *hp_fltr++;
 }
 /* save the completed lowpass and highpass even dot product */
 *dst++ = dot_pe;
 /* save the completed lowpass and highpass odd dot product */
 *dst++ = dot_po;
 }
} /* End of DualConvInt2Sum */


[LISTING TWO]

/* AWPT.C */
/* Copyright (C) 1993 Mac A. Cody - All rights reserved */

#include "wp_types.h"
#include "awpt.h"
#include "convolve.h"

/* AWPT - Aperiodic Wavelet Packet Transform: Data is assumed to be
 non-periodic, so convolutions do not wrap around arrays.
 Convolution data past end of data is generated and retained
 to allow perfect reconstruction of original input.
 Input(s): REAL_TYPE *indata - Pointer to input data sample array.
 REAL_TYPE *htilda - Pointer to lowpass filter array.
 REAL_TYPE *gtilda - Pointer to highpass filter array.
 short filtlen - Length of filter arrays.
 Output(s): WPTstruct *out - Pointer to transform data structure.
 Note: Structure pointed to by 'out' contains:
 out->levels - Number of levels in transform (short).
 out->length - Length of input data sample array (short).
 out->data - Pointer to pointer of arrays of data (REAL_TYPE ***).
*/
void AWPT(REAL_TYPE *indata, WPTstruct *out,
 REAL_TYPE *htilda, REAL_TYPE *gtilda, short filtlen)
{

 short i, j, levels, nodes, datalen;
 REAL_TYPE *src;
 levels = out->levels;
 datalen = out->length; /* start with length of input array */
 /* loop for all levels, halving the data length on each lower level */
 for (i = 0, nodes = 2; i < levels; i++, nodes <<= 1)
 {
 for(j = 0; j < nodes; j += 2)
 {
 if(out->data[i][j] == 0) continue;
 if(i == 0) /* ... source for highest level is input data */
 src = indata;
 else /* ... source is corresponding node of higher level */
 src = out->data[i-1][j >> 1];
 DualConvDec2(src, out->data[i][j], out->data[i][j+1],
 htilda, gtilda, datalen, filtlen);
 }
 datalen /= 2; /* input length for next level is half this level */
 }
} /* End of AWPT */
/* IAWPT - Inverse Aperiodic Wavelet Packet Transform: Data is assumed to be
 non-periodic, so convolutions do not wrap around arrays.
 Convolution data past end of data is used to generate perfect
 reconstruction of original input.
 Input(s): WPTstruct *in - Pointer to transform data structure.
 REAL_TYPE *htilda - Pointer to lowpass filter array.
 REAL_TYPE *gtilda - Pointer to highpass filter array.
 short filtlen - Length of filter arrays.
 Note: Structure pointed to by 'in' contains:
 in->levels - Number of levels in transform (short).
 in->length - Length of output data sample array (short).
 in->data - Pointer to pointer of arrays of data (REAL_TYPE ***).
 Output(s): REAL_TYPE *indata - Pointer to input data sample array.
*/
void IAWPT(WPTstruct *in, REAL_TYPE *outdata,
 REAL_TYPE *htilda, REAL_TYPE *gtilda, short filtlen)
{
 short i, j, levels, nodes, datalen;
 REAL_TYPE *dst;
 levels = in->levels;
 /* start with length of lowest level input array */
 datalen = in->length >> levels;
 /* loop for all levels, doubling the data length on each higher level;
 destination of all but the highest branch of the reconstruction
 is the next higher node */
 for (i = levels - 1, nodes = 1 << levels; i >= 0; i--, nodes >>= 1)
 {
 for(j = 0; j < nodes; j += 2)
 {
 if(in->data[i][j] == 0) continue;
 if(i == 0) /* ... destination for highest level is input data */
 dst = outdata;
 else /* ... destination is corresponding node of higher level */
 dst = in->data[i - 1][j >> 1];
 DualConvInt2Sum(in->data[i][j], in->data[i][j+1], dst,
 htilda, gtilda, datalen, filtlen);
 }
 datalen *= 2; /* input length for next level is twice this level */
 }

}/* End of IAWPT */
End Listings






April, 1994
Fuzzy Logic in C: An Update


John A.R. Tucker, Phillip E. Fraley, and Lawrence P. Swanson


John teaches computer courses at Albright College and Reading Area Community
College; Phillip is working on several projects, including proton models,
large color images, and neural networks; Lawrence currently works as a test
engineer. They can be reached through the DDJ offices.


Early last year, we were looking for a software implementation of fuzzy logic.
Greg Viot's article "Fuzzy Logic in C" (DDJ, February 1993) was a step towards
what we needed, but it didn't include the necessary initialization, parsing,
and output functions. Consequently, we filled in the gaps by writing functions
that, together with Greg's code, make a working fuzzy-logic program you can
use. Listing One (page 101) is the complete source code for the updated
version (which includes Greg's original code and our additions). The
enhancements, which we'll focus on in this article, are shaded as well as
identified in the comments. For background on fuzzy logic in general and
Greg's techniques in particular, refer to his original article.


Rule Files and Structures


We saw right away that the parsing of the rules file would be a problem to
generalize for all possible combinations of antecedents and consequences, so
we elected to simplify the problem by allowing only two antecedents and one
consequence.
The generalized case would have resulted in a loss of clarity. We didn't try
to be clever about our functions; in fact, they are quite direct (three
segments are repetitive), although the extensive use of linked lists and
pointers to structures related to the rules in initialize_system() is quite
involved. Nor did we optimize or generalize the code, which makes it possible for you to
modify the code to accept other input files by copying existing code segments
and making minor adjustments. Finally, we took full advantage of understanding
the input data structures for the specific example of the inverted-pendulum
problem Greg described.
To allow for easy alteration of the fuzzy sets or rule definitions, we used
three ASCII files with fixed names and formats as the input files that
describe the fuzzy sets (angle, velocity, and force). Similarly, an ASCII file
is used to describe the rules file. These four files are to be located in a
common directory from which the program is run.
In the three files describing the fuzzy sets (in1, in2, and out1), you can use
any name ten characters or less in length on the first line as a name for the
input fuzzy set. The first column of the subsequent lines is for the name of
the membership element of that fuzzy set, again limited to ten characters. The
next four columns describe the corner points of the membership (if the third
and fourth columns are the same, the shape is a triangle). White space
(spaces or tabs) separates the columns. You may have as many rows of membership
elements as you please, but five, seven, or nine seem to be the best choices.
Take care not to include any blank rows.
The first file, in1 (angle), looks like Figure 1. The files in2 (velocity) and
out1 (force) are similar. In initialize_system(), we have three nearly
identical code fragments. You can block copy them and make the few changes
required. The cycle is as follows: Open the file, set a pointer and allocate
memory, read the fuzzy set's name, read a line of data from the file, set a
pointer to the next structure, assign values to the structure elements, and
lastly, close the file. The differences in these three segments are in lines 1
and 2 (the filenames are different), lines 5 and 6 (the pointers point to
differing places), and lines 27 and 33, where the filenames in the error
messages are different.
We included an error trap to detect if either slope1 or slope2 is less than or
equal to 0, a condition not allowed in the original program. If such an error
is encountered, the program exits with appropriate information. The setting of
the pointers for the rules file is more complex. In the original article, Greg
suggested a file that looked like Figure 2(a). Although we liked the form of
this file, it was complex to parse so we stripped the file to its essential
elements: the name of the fuzzy set elements and the order in which they
appear in each rule; see Figure 2(b). We used an awk and sed pipeline to strip
the "rule" file and create a more suitable form for parsing in a file named
"rules." (The command line awk '{print $6, $10, $14}' rule | sed 's/)//g' > rules
does this elegantly. You can create the rules file directly, as with the input
files, and eliminate the clearer representation of the rule file entirely if
you do not have these tools.) You can have more or less than the 15 rules in
Greg's article. Add or delete them as you please, one rule per row. Be certain
that membership-element names are exact matches in all the files, including
the rules file. In particular, note that upper and lower case are not
equivalent.
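Applied to the first two rules of Figure 2(a), the pipeline above yields the first two rows of Figure 2(b). A small demonstration, with a here-document standing in for the rule file on disk:

```shell
# Create a two-rule version of the verbose "rule" file.
cat > rule <<'EOF'
rule 1: IF (angle is NL) AND (velocity is ZE) THEN (force is PL)
rule 2: IF (angle is ZE) AND (velocity is NL) THEN (force is PL)
EOF

# Fields 6, 10, and 14 are the three membership-element names, each
# trailed by a ')' that sed strips off.
awk '{print $6, $10, $14}' rule | sed 's/)//g' > rules

cat rules
```

This prints `NL ZE PL` and `ZE NL PL`, one rule per line, ready for the three-string fscanf() in initialize_system().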
As initialize_system() is written, you're limited to two inputs and one
output. You will need to make changes to the arguments for fscanf() and define
new buffers to accommodate any other combination.
The insight to the rules structures initialization is that structures of
rule_type and rule_element_type form the acceptable rules at the time of
initialization. That is, appropriate fuzzy inputs (antecedents) are associated
(linked) with a fuzzy output (consequence) as defined in the rules at the time
the rules file is read. Values in the mf_type structure are pointed to by the
pointer stored in the rule_element_type *value. Later, if a 0 is pointed to by
any of the if_side value pointers, the function defuzzification() will equate
to 0, and subsequent calculation of the sum_of_products and sum_of_areas will
not be affected. See Figure 3 for the complete relationship of all the data
structures and their pointers.


Using the Updated Program


To illustrate how you can use the updated fuzzy-logic program, we'll refer you
to the rule in Figure 4, where we begin by opening the rules file, allocating
memory for a structure rule_type and setting a pointer to it, and scanning the
rules one line at a time. As each line (rule) is read, we "know" that the
first field in the line is the angle (structure io_type, pointed to by
*membership_functions), so we begin searching its fuzzy-set members
(structures mf_type), doing a string match on the membership element name, NL.
When the match is found, memory for a rule_element_type structure is
allocated, the address of the value element of the matching mf_type structure
is stored in it, and the structure is attached through the rule_type *if_side
pointer. A pointer to
the second field (in this case, velocity) is also established as a pointer
(element *next) in rule_element_type.
The second field of the rule (velocity) is then used to search for a string
match on its membership-element name, ZE, in the second io_structure
*membership_function, and a pointer to the address where its value is located
is stored in the next rule_element_type *value. Finally, the last element of
the rule, the consequence (force, in our example), is treated in the same
manner: The address of the match is stored in the rule_element_type *value
pointed to by the rule_type *then side when the appropriate membership element
name, PL, is matched.
These steps are repeated for every rule in the rules file; refer again to the
first three rules in Figure 3.
To complete the alterations, other changes included placing the two anchor
pointers System_Output and System_Inputs as global pointers along with the
existing Rule_Base, adding macro definitions for max and min for
cross-compiling onto MS-DOS platforms, and adding the #include for the
function strcmp(); see Example 1. We also included the necessary function to
accept two inputs from the command line as arguments for the initial condition
get_system_inputs() and a function put_system_outputs() to examine the exit
status of a single inference pass on the input data.
After using the code with various inputs, we needed to add error traps because
we were getting core dumps with certain input. These were caused by division
by zero when there were no rules in the set to cover the condition.
Consequently, we added the code in Example 2(a) to the original function
defuzzification(). We also added Example 2(b) to rule_evaluation().
To further illustrate how you can use the program, assume the scaled angle of
60 and a scaled velocity of 125 as in Figure 5. The line force: Value=134
reflects the defuzzified and scaled single-valued output for the two inputs.
It would be instructive to interface this program to a graphics output device
where a loop could be created and the inverted pendulum balanced.
Alternatively, a batch file or shell script could feed new inputs and use the
output to generate the two new inputs, storing intermediate data in a file.
Or, you might try graphing the trapezoidal output areas made on each
iteration.


Completing a fuzzy-based inference engine


Figure 1: The in1 file; values can be altered as desired.
 Angle
NL 0 31 31 63
NM 31 63 63 95
NS 63 95 95 127
ZE 95 127 127 159
PS 127 159 159 191
PM 159 191 191 223
PL 191 223 223 255



Figure 2: (a) Original rules file; (b) modified rules file.
(a) rule 1: IF (angle is NL) AND (velocity is ZE) THEN (force is PL)
    rule 2: IF (angle is ZE) AND (velocity is NL) THEN (force is PL)
    rule 3: IF (angle is NM) AND (velocity is ZE) THEN (force is PM)
    ...
    rule 15: IF (angle is PL) AND (velocity is ZE) THEN (force is NL)

(b) NL ZE PL
    ZE NL PL
    NM ZE PM
    ...
    PL ZE NL


 Figure 3: Relationship of data structures and their pointers.

Figure 4: Sample rule used to develop Figure 3.
rule 1: IF (angle is NL) AND (velocity is ZE)
THEN (force is PL)

Figure 5: Output generated with a scaled angle of 60 and scaled velocity of
125.
fuzz 60 125
angle: Value=60
 NL: Value 21 Left 0 Right 63
 NM: Value 203 Left 31 Right 95
 NS: Value 0 Left 63 Right 127
 ZE: Value 0 Left 95 Right 159
 PS: Value 0 Left 127 Right 191
 PM: Value 0 Left 159 Right 223
 PL: Value 0 Left 191 Right 255
velocity: Value=125
 NL: Value 0 Left 0 Right 64
 NM: Value 0 Left 31 Right 95
 NS: Value 14 Left 63 Right 127
 ZE: Value 210 Left 95 Right 159
 PS: Value 0 Left 127 Right 191
 PM: Value 0 Left 159 Right 223
 PL: Value 0 Left 191 Right 255
force: Value=134
 NL: Value 0 Left 0 Right 63
 NM: Value 203 Left 31 Right 95
 NS: Value 0 Left 63 Right 127
 ZE: Value 0 Left 95 Right 159
 PS: Value 0 Left 127 Right 191
 PM: Value 203 Left 159 Right 223
 PL: Value 21 Left 191 Right 255
 Rule #1: 21 210 21
 Rule #2: 0 0 21
 Rule #3: 203 210 203
 Rule #4: 0 0 203
 Rule #5: 0 210 0
 Rule #6: 0 14 0
 Rule #7: 0 0 0
 Rule #8: 0 210 0
 Rule #9: 0 0 0
 Rule #10: 0 210 0
 Rule #11: 0 14 0
 Rule #12: 0 0 203
 Rule #13: 203 210 203
 Rule #14: 0 0 0
 Rule #15: 0 210 0



Example 1: Adding the #include, global pointers, and macros.
#include <string.h>
#define max(a,b) (a<b ? b : a)
#define min(a,b) (a>b ? b : a)
struct io_type *System_Inputs;
struct io_type *System_Output;



Example 2: (a) Code added to the original function defuzzification(); (b) code
added to rule_evaluation().
(a) if(sum_of_areas==0)
    { printf("Sum of Areas = 0, will cause div error\n");
      printf("Sum of Products= %d\n",sum_of_products);
      so->value=0;
      return;
    }

(b) int nomatch=0;
    for(tp=rule->then_side;tp!=NULL;tp=tp->next)
    { *(tp->value)=max(strength,*(tp->value));
      if(strength>0)nomatch=1;
    }
    if(nomatch==0)printf("NO MATCHING RULES FOUND!\n");

[LISTING ONE] (Text begins on page 56.)

/* Update to Greg Viot's fuzzy system -- DDJ, February 1993, page 94 */
/* By J. Tucker, P. Fraley, and L. Swanson, April 1993 */
#include <stdio.h>
#include <stdlib.h> /* NEW: atoi(), calloc(), exit() */
#include <string.h> /* NEW */
# define max(a,b) (a<b ? b : a) /* NEW */
# define min(a,b) (a>b ? b : a) /* NEW */
struct io_type *System_Inputs; /* anchor inputs NEW */
struct io_type *System_Output; /* anchor output NEW */
#define MAXNAME 10
#define UPPER_LIMIT 255
struct io_type{
 char name[MAXNAME];
 int value;
 struct mf_type *membership_functions;
 struct io_type *next;
 };
struct mf_type{
 char name[MAXNAME];
 int value;
 int point1;
 int point2;
 float slope1;
 float slope2;
 struct mf_type *next;
 };
struct rule_type{
 struct rule_element_type *if_side;
 struct rule_element_type *then_side;
 struct rule_type *next;
 };
struct rule_element_type{
 int *value;
 struct rule_element_type *next;
 };
struct rule_type *Rule_Base;

main(argc,argv) /* NEW */
int argc; /* NEW */
char *argv[]; /* NEW */
{ int input1, input2; /* NEW */
 if(argc!=3) /* NEW */
 { printf("Error - Must supply 2 numeric inputs.\n"); /* NEW */
 printf(" Inputs scaled to range 0-255.\n"); /* NEW */
 printf("Usage: %s angle velocity\n",argv[0]); /* NEW */
 exit(0); /* NEW */
 } /* NEW */
 input1=atoi(argv[1]); /* NEW */
 input2=atoi(argv[2]); /* NEW */
 initialize_system(); /* Read input files, NEW */
 get_system_inputs(input1,input2); /* Get & put argv NEW */
 fuzzification();
 rule_evaluation();

 defuzzification();
 put_system_outputs(); /* print all data, NEW */
} /* END MAIN */
fuzzification()
{ struct io_type *si;
 struct mf_type *mf;
 for(si=System_Inputs;si!=NULL;si=si->next)
 for(mf=si->membership_functions;mf!=NULL;mf=mf->next)
 compute_degree_of_membership(mf,si->value);
} /* END FUZZIFICATION */
rule_evaluation()
{ struct rule_type *rule;
 struct rule_element_type *ip; /* if ptr */
 struct rule_element_type *tp; /* then ptr */
 int strength;
 int nomatch=0; /* NEW, test some rules */
 for(rule=Rule_Base;rule!=NULL;rule=rule->next)
 { strength=UPPER_LIMIT;
 for(ip=rule->if_side;ip!=NULL;ip=ip->next)
 strength=min(strength,*(ip->value));
 for(tp=rule->then_side;tp!=NULL;tp=tp->next)
 { *(tp->value)=max(strength,*(tp->value)); /* NEW */
 if(strength>0)nomatch=1; /* NEW */
 } /* NEW */
 }
 if(nomatch==0)printf("NO MATCHING RULES FOUND!\n"); /* NEW */
} /* END RULE EVALUATION */
defuzzification()
{ struct io_type *so;
 struct mf_type *mf;
 int sum_of_products;
 int sum_of_areas;
 int area, centroid;
 for(so=System_Output;so!=NULL;so=so->next)
 { sum_of_products=0;
 sum_of_areas=0;
 for(mf=so->membership_functions;mf!=NULL;mf=mf->next)
 { area=compute_area_of_trapezoid(mf);
 centroid=mf->point1+(mf->point2-mf->point1)/2;
 sum_of_products+=area*centroid;
 sum_of_areas+=area;
 }
 if(sum_of_areas==0) /* NEW */
 { printf("Sum of Areas = 0, will cause div error\n"); /* NEW */
 printf("Sum of Products= %d\n",sum_of_products); /* NEW */
 so->value=0; /* NEW */
 return; /* NEW */
 } /* NEW */
 so->value=sum_of_products/sum_of_areas;
 }
} /* END DEFUZZIFICATION */
compute_degree_of_membership(mf,input)
struct mf_type *mf;
int input;
{ int delta_1, delta_2;
 delta_1=input - mf->point1;
 delta_2=mf->point2 - input;
 if((delta_1<=0)||(delta_2<=0))mf->value=0;
 else

 { mf->value=min((mf->slope1*delta_1),(mf->slope2*delta_2));
 mf->value=min(mf->value,UPPER_LIMIT);
 }
} /* END DEGREE OF MEMBERSHIP */
compute_area_of_trapezoid(mf)
struct mf_type *mf;
{ float run_1,run_2,area,top;
 float base;
 base=mf->point2 - mf->point1;
 run_1=mf->value / mf->slope1;
 run_2=mf->value / mf->slope2;
 top=base - run_1 - run_2;
 area=mf->value*(base+top)/2;
 return(area);
} /* END AREA OF TRAPEZOID */
initialize_system() /* NEW FUNCTION INITIALIZE */
{ int a, b, c, d, x;
 char buff[10],buff1[4],buff2[4];
 static char filename1[]="in1"; /* "angles" filename */
 static char filename2[]="in2"; /* "velocities" filename */
 static char filename3[]="out1"; /* "forces" filename */
 FILE *fp;
 struct io_type *outptr;
 struct mf_type *top_mf;
 struct mf_type *mfptr;
 struct io_type *ioptr;
 struct rule_type *ruleptr;
 struct rule_element_type *ifptr;
 struct rule_element_type *thenptr;
 ioptr=NULL;
 ruleptr=NULL;
 ifptr=NULL;
 thenptr=NULL;
/* READ THE FIRST FUZZY SET (ANTECEDENT); INITIALIZE STRUCTURES */
 if((fp=fopen(filename1,"r"))==NULL) /* open "angles" file */
 { printf("ERROR- Unable to open data file named %s.\n",filename1);
 exit(0);
 }
 ioptr=(struct io_type *)calloc(1,sizeof(struct io_type));
 System_Inputs=ioptr; /* Anchor to top of inputs */
 x=fscanf(fp,"%s",buff); /* from 1st line, get set's name */
 sprintf(ioptr->name,"%s",buff); /* into struct io_type.name */
 mfptr=NULL;
 while((x=fscanf(fp,"%s %d %d %d %d",buff,&a,&b,&c,&d))!=EOF)/* get line */
 { if(mfptr==NULL) /* first time thru only */
 { mfptr=(struct mf_type *)calloc(1,sizeof(struct mf_type));
 top_mf=mfptr;
 ioptr->membership_functions=mfptr;
 }
 else
 { for(mfptr=top_mf;mfptr->next;mfptr=mfptr->next); /* spin to last */
 mfptr->next=(struct mf_type *)calloc(1,sizeof(struct mf_type));
 mfptr=mfptr->next;
 }
 sprintf(mfptr->name,"%s",buff); /* membership name, NL, ZE, etc */
 mfptr->point1=a; /* left x axis value */
 mfptr->point2=d; /* right x axis value */
 if(b-a>0) mfptr->slope1=UPPER_LIMIT/(b-a); /* left slope */
 else

 { printf("Error in input file %s, membership element %s.\n",
 filename1,buff);
 exit(1);
 }
 if(d-c>0) mfptr->slope2=UPPER_LIMIT/(d-c); /* right slope */
 else
 { printf("Error in input file %s, membership element %s.\n",
 filename1,buff);
 exit(1);
 }
 }
 fclose(fp); /* close "angles" file */
/* READ THE SECOND FUZZY SET (ANTECEDENT); INITIALIZE STRUCTURES */
 if((fp=fopen(filename2,"r"))==NULL) /* open "velocity" file */
 { printf("ERROR- Unable to open data file named %s.\n",filename2);
 exit(0);
 }
 ioptr->next=(struct io_type *)calloc(1,sizeof(struct io_type));
 ioptr=ioptr->next;
 x=fscanf(fp,"%s",buff); /* from 1st line, get set's name */
 sprintf(ioptr->name,"%s",buff); /* into struct io_type.name */
 mfptr=NULL;
 while((x=fscanf(fp,"%s %d %d %d %d",buff,&a,&b,&c,&d))!=EOF)/* get line */
 { if(mfptr==NULL) /* first time thru only */
 { mfptr=(struct mf_type *)calloc(1,sizeof(struct mf_type));
 top_mf=mfptr;
 ioptr->membership_functions=mfptr;
 }
 else
 { for(mfptr=top_mf;mfptr->next;mfptr=mfptr->next); /* spin to last */
 mfptr->next=(struct mf_type *)calloc(1,sizeof(struct mf_type));
 mfptr=mfptr->next;
 }
 sprintf(mfptr->name,"%s",buff); /* membership name, NL, ZE, etc */
 mfptr->point1=a; /* left x axis value */
 mfptr->point2=d; /* right x axis value */
 if(b-a>0) mfptr->slope1=UPPER_LIMIT/(b-a); /* left slope */
 else
 { printf("Error in input file %s, membership element %s.\n",
 filename2,buff);
 exit(1);
 }
 if(d-c>0) mfptr->slope2=UPPER_LIMIT/(d-c); /* right slope */
 else
 { printf("Error in input file %s, membership element %s.\n",
 filename2,buff);
 exit(1);
 }
 }
 fclose(fp); /* close "velocity" file */
/* READ THE THIRD FUZZY SET (CONSEQUENCE); INITIALIZE STRUCTURES */
 if((fp=fopen(filename3,"r"))==NULL) /* open "force" file */
 { printf("ERROR- Unable to open data file named %s.\n",filename3);
 exit(0);
 }
 ioptr=(struct io_type *)calloc(1,sizeof(struct io_type));
 System_Output=ioptr; /* Anchor output structure */
 x=fscanf(fp,"%s",buff); /* from 1st line, get set's name */
 sprintf(ioptr->name,"%s",buff); /* into struct io_type.name */

 mfptr=NULL;
 while((x=fscanf(fp,"%s %d %d %d %d",buff,&a,&b,&c,&d))!=EOF)/* get line */
 { if(mfptr==NULL) /* first time thru */
 { mfptr=(struct mf_type *)calloc(1,sizeof(struct mf_type));
 top_mf=mfptr;
 ioptr->membership_functions=mfptr;
 }
 else
 { for(mfptr=top_mf;mfptr->next;mfptr=mfptr->next);
 mfptr->next=(struct mf_type *)calloc(1,sizeof(struct mf_type));
 mfptr=mfptr->next;
 }
 sprintf(mfptr->name,"%s",buff); /* membership name, NL, ZE, etc */
 mfptr->point1=a; /* left x axis value */
 mfptr->point2=d; /* right x axis value */
 if(b-a>0) mfptr->slope1=UPPER_LIMIT/(b-a); /* left slope */
 else
 { printf("Error in input file %s, membership element %s.\n",
 filename3,buff);
 exit(1);
 }
 if(d-c>0) mfptr->slope2=UPPER_LIMIT/(d-c); /* right slope */
 else
 { printf("Error in input file %s, membership element %s.\n",
 filename3,buff);
 exit(1);
 }
 }
 fclose(fp); /* close "force" file */
/* READ RULES FILE; INITIALIZE STRUCTURES */
 ioptr=NULL;
 outptr=NULL;
 if((fp=fopen("rules","r"))==NULL) /* open rules file */
 { printf("ERROR- Unable to open data file named %s.\n","rules");
 exit(0);
 }
 ruleptr=(struct rule_type *)calloc(1,sizeof(struct rule_type));
 if(ioptr==NULL)Rule_Base=ruleptr; /* first time thru, anchor */
 while((x=fscanf(fp,"%s %s %s",buff,buff1,buff2))!=EOF) /* get a line */
 { ioptr=System_Inputs; /* points to angle */
 for(mfptr=ioptr->membership_functions;mfptr!=NULL;mfptr=mfptr->next)
 { if((strcmp(mfptr->name,buff))==0)
 { ifptr=(struct rule_element_type *)
 calloc(1,sizeof(struct rule_element_type));
 ruleptr->if_side=ifptr; /* points to angle */
 ifptr->value=&mfptr->value; /* needs address here */
 ifptr->next=(struct rule_element_type *)
 calloc(1,sizeof(struct rule_element_type));
 ifptr=ifptr->next;
 break; /* match found */
 }
 }
 ioptr=ioptr->next; /* points to velocity */
 for(mfptr=ioptr->membership_functions;mfptr!=NULL;mfptr=mfptr->next)
 { if((strcmp(mfptr->name,buff1))==0)
 { ifptr->value=&mfptr->value; /* needs address here */
 break; /* match found */
 }
 }

 if(outptr==NULL)outptr=System_Output;/* point then stuff to output */
 for(mfptr=outptr->membership_functions;mfptr!=NULL;mfptr=mfptr->next)
 { if((strcmp(mfptr->name,buff2))==0)
 { thenptr=(struct rule_element_type *)
 calloc(1,sizeof(struct rule_element_type));
 ruleptr->then_side=thenptr;
 thenptr->value=&mfptr->value; /* needs address here */
 break; /* match found */
 }
 }
 ruleptr->next=(struct rule_type *)calloc(1,sizeof(struct rule_type));
 ruleptr=ruleptr->next;
 } /* END WHILE READING RULES FILE */
 fclose(fp); /* close "rules" file */
} /* END INITIALIZE */
put_system_outputs() /* NEW */
{ struct io_type *ioptr;
 struct mf_type *mfptr;
 struct rule_type *ruleptr;
 struct rule_element_type *ifptr;
 struct rule_element_type *thenptr;
 int cnt=1;
 for(ioptr=System_Inputs;ioptr!=NULL;ioptr=ioptr->next)
 { printf("%s: Value= %d\n",ioptr->name,ioptr->value);
 for(mfptr=ioptr->membership_functions;mfptr!=NULL;mfptr=mfptr->next)
 { printf(" %s: Value %d Left %d Right %d\n",
 mfptr->name,mfptr->value,mfptr->point1,mfptr->point2);
 }
 printf("\n");
 }
 for(ioptr=System_Output;ioptr!=NULL;ioptr=ioptr->next)
 { printf("%s: Value= %d\n",ioptr->name,ioptr->value);
 for(mfptr=ioptr->membership_functions;mfptr!=NULL;mfptr=mfptr->next)
 { printf(" %s: Value %d Left %d Right %d\n",
 mfptr->name,mfptr->value,mfptr->point1,mfptr->point2);
 }
 }
/* print values pointed to by rule_type (if & then) */
 printf("\n");
 for(ruleptr=Rule_Base;ruleptr->next!=NULL;ruleptr=ruleptr->next)
 { printf("Rule #%d:",cnt++);
 for(ifptr=ruleptr->if_side;ifptr!=NULL;ifptr=ifptr->next)
 printf(" %d",*(ifptr->value));
 for(thenptr=ruleptr->then_side;thenptr!=NULL;thenptr=thenptr->next)
 printf(" %d\n",*(thenptr->value));
 }
 printf("\n");
} /* END PUT SYSTEM OUTPUTS */
get_system_inputs(input1,input2) /* NEW */
int input1, input2;
{ struct io_type *ioptr;
 ioptr=System_Inputs;
 ioptr->value=input1;
 ioptr=ioptr->next;
 ioptr->value=input2;
} /* END GET SYSTEM INPUTS */
End Listing





April, 1994
Digital I/O with the PC


Brian Hook and Dennis Shuman




Putting the parallel port to work


Brian is a programmer at the USDA developing data-acquisition and analysis
software. He can be reached on the Internet at bwh@cis.ufl.edu or on
CompuServe at 72144,3662. Dennis, a research scientist at the USDA, develops
electronic/acoustic systems to detect insect pests in agricultural
commodities. He can be reached at USDA, ARS, 1700 SW 23rd Dr., Gainesville, FL
32608.


Data acquisition and analysis are often performed with dedicated, proprietary,
and expensive laboratory instruments. However, the PC's open architecture
makes it a cost-effective alternative for many data-acquisition and analysis
projects. One particular project we developed at the Acoustic/Electronic
Insect Detection Laboratory at the United States Department of Agriculture,
Agricultural Research Service facility in Gainesville, Florida required just
such a system. The system is an integrated hardware/software setup that allows
for digital input and output in a low-end PC configuration. This article
describes what we learned while implementing digital I/O via the PC's parallel
port.
The system, known as "EGPIC" (Electronic Grain Probe Insect Counter), checks
for insects in stored-grain bins and elevators. It does this by electronically
sensing insects that crawl into specially designed probes placed at a number
of locations in the grain mass. The hardware side of the EGPIC is responsible
for detecting an insect and generating the appropriate signal for some
digital-input computer interface. The software is responsible for reading,
analyzing, displaying, and storing the collected data. We selected the PC as
the host system for the software because of its wide, low-cost availability
and profusion of development tools.
Given these design criteria and the cost and compatibility constraints, the
final specification sheet for our digital input and output (DIO) interface was
as follows:
Simple interface to a PC.
Multiple digital input lines.
Interrupt-on-input capability.
At least one digital output, preferably more than one.
Compatibility and availability across a wide range of platforms, including
ISA, EISA, and MCA buses.
Relatively low cost.


Digital I/O Options


The PC architecture has a wide variety of input and output techniques
available to it, from specialized DIO boards to the relatively crude game
port. Each has some advantages and disadvantages for this type of system.
Specialized DIO boards are available for the PC, typically as 8- or 16-bit ISA
boards with 48 I/O lines, configurable to generate interrupts on one of
several different IRQs. While nearly ideal feature-wise, these boards are
relatively expensive--from $40 or $50 to more than $1000. Even with the
inexpensive boards, this cost becomes significant in high volumes. In
applications using from 9 to 96 probes, EGPIC has been configured for use with
these DIO boards. However, when only one to eight probes are needed, the DIO
board is unnecessary. Also, since these boards require a free bus slot, some
systems, such as laptops (a likely target platform), would be excluded from
using EGPIC.
The PC's standard RS-232 serial port is suitable for this type of application,
but the external EGPIC hardware would require an extra translation layer to
generate RS-232-compatible bit streams from the eight digital inputs. This
method is suitable for very large systems requiring hundreds or thousands of
probes, but is unnecessarily complex for smaller-scale systems.
Among other deficiencies, neither the keyboard interface nor the game port
offers output capability, ruling them out as suitable I/O interfaces for
EGPIC.
This leaves the PC's printer parallel port. Like a dedicated DIO card, it
offers digital output lines and interrupt-on-input capability (using either
IRQ 5 or 7). Unlike DIO cards, it's available for all PC platforms and is
relatively inexpensive. The parallel port's only shortcomings are that not all
implementations have input capability and that the port may already be in use
by another device, most likely a printer. However, many systems have two
parallel ports; if not, a second parallel port is an inexpensive addition. As
for the lack of input capability, with some software tricks a 100 percent
compatible PC parallel port can, in fact, be used for up to 8 bits of digital
input.


Programming the Parallel Port


To illustrate the programming techniques discussed in this article, I've
written the Parallel Port Digital Input Output (PPDIO) package, a rudimentary
set of C functions that allow for reading and writing to the parallel port and
installing an interrupt-service routine (ISR) to handle incoming data on the
parallel port. Listing One (page 103) is PPDIO.H; Listing Two (page 103) is
PPDIO.C.
The parallel port is programmed via three separate I/O registers: the
output-only data register, the input-only status register, and the
input/output control register. The addresses of these register ports differ
from machine to machine, but the base address is usually 0x378, 0x278, or
0x3BC. The base address for a particular LPT port is stored in the BIOS data
area. The PPDIO_GetLptAddress() routine shows how to retrieve this
information.
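The three register ports are simple fixed offsets from the LPT base address. Here is a minimal sketch of that arithmetic (the same computation PPDIO_SetBaseAddress() performs; the struct and function names are illustrative, not part of the PPDIO package):

```c
/* Hypothetical helper: given an LPT base address such as 0x378,
   0x278, or 0x3BC, compute the three register port addresses. */
struct lpt_regs {
    unsigned data;     /* base + 0: data register    */
    unsigned status;   /* base + 1: status register  */
    unsigned control;  /* base + 2: control register */
};

static struct lpt_regs lpt_registers(unsigned base)
{
    struct lpt_regs r;
    r.data    = base;
    r.status  = base + 1;
    r.control = base + 2;
    return r;
}
```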
The data register (see Figure 1), located at the parallel port's base address,
takes a standard bit mask that indicates which pins should be sent high and
low. Sending information out the parallel port is accomplished with a simple
OUT instruction. PPDIO_SendByte() handles this. The parallel port transmits
this byte until told to transmit a different one, making digital output a
trivial task. Note that while we can theoretically read the data register with
an IN instruction, the byte read won't be incoming data--it will be the most
recent data transmitted.
The data register can't be used for input, so we must use both the status and
control registers; see Figure 2 and Figure 3 for their respective layouts.
Reading the status register is very straightforward, but keep in mind that the
logic of pin 11 is inverted.
The control register is nominally an output-only register, but by taking
advantage of the four output lines driven with open-collector drivers, we can
force the control register into giving us input. If we produce a high TTL
logic level at the control register's corresponding pins, we can drive the
pins low via incoming signals. Thus, by setting the appropriate bits of the
control register, the pins can be used as input. This is handled transparently
when PPDIO_InstallISR() is called.
Reading the control and status registers is accomplished by an IN instruction
at the port's base address and relevant offset. The routines
PPDIO_ReadStatusRaw() and PPDIO_ReadControlRaw() illustrate how to accomplish
this. Because several of the input lines have negative active logic placed
upon them by the parallel port, helper functions that translate negative logic
would be useful. The routines PPDIO_ReadStatusCooked() and
PPDIO_ReadControlCooked() provide this functionality, along with converting
reserved and unused bits to 0.
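These cooked reads supply the raw material for full 8-bit input. The following sketch shows how eight input bits might be assembled from the five status-register pins and three of the open-collector control-register pins; the particular bit assignment is an illustrative assumption, not part of the PPDIO package:

```c
/* Status-register pin masks (from PPDIO.H) */
#define PIN_15 0x08
#define PIN_13 0x10
#define PIN_12 0x20
#define PIN_10 0x40
#define PIN_11 0x80

/* Control-register pin masks (from PPDIO.H) */
#define PIN_1  0x01
#define PIN_14 0x02
#define PIN_17 0x08

/* Combine the cooked status and control values (inverted logic and
   reserved bits already normalized) into one 8-bit input byte. */
static unsigned char assemble_inputs(unsigned char cooked_status,
                                     unsigned char cooked_control)
{
    unsigned char in = 0;
    if (cooked_status  & PIN_15) in |= 0x01;
    if (cooked_status  & PIN_13) in |= 0x02;
    if (cooked_status  & PIN_12) in |= 0x04;
    if (cooked_status  & PIN_10) in |= 0x08;
    if (cooked_status  & PIN_11) in |= 0x10;
    if (cooked_control & PIN_1 ) in |= 0x20;
    if (cooked_control & PIN_14) in |= 0x40;
    if (cooked_control & PIN_17) in |= 0x80;
    return in;
}
```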


Interrupt-driven Communications


Now that input and output have been addressed, all that's left is making the
communications interrupt-driven. The parallel port's input lines could be
polled; however, this would be cumbersome, time consuming, and error-prone.
Having an interrupt generated whenever a digital line is sent high is a far
more elegant means of input detection.
Assuming the target system supports it, interrupt-driven input on the parallel
port is actually not very complicated to achieve. First, the control register
must have its interrupt-enable bit set. Next, an ISR must be installed in the
DOS interrupt vector table for the appropriate IRQ. Finally, the port's IRQ
must be unmasked from the 8259 Programmable Interrupt Controller's
interrupt-enable register. All of this is demonstrated in PPDIO_InstallISR().
While programming in an interrupt-driven manner is theoretically simple,
hardware support can be shaky. The printer port has traditionally utilized IRQ
7, but IRQ 5 isn't uncommon either. Even worse, some machines either have the
parallel-port interrupt disabled altogether or have an alternate device (such
as a network or sound card) using its IRQ. To compound matters, there's no
easy way to detect which IRQ a given base address or LPT port corresponds
to--this must either be known by the user or determined empirically by the
program.
To solve this problem, EGPIC uses a simple call-and-acknowledgment method of
IRQ determination. This involves installing ISRs at IRQs 5 and 7, "calling"
the EGPIC hardware (which acknowledges the call by generating an interrupt),
then seeing which ISR is called. If no ISR is called, either another IRQ is in
use (doubtful, since the PC industry has fairly well standardized on IRQs 5
and 7 for the parallel port), or no IRQs are being used for the parallel port.
It's simpler to have the user input which IRQ to use, but this demands a
higher level of user knowledge than the application may reasonably assume. An
interesting secondary use of this call-and-response procedure is for hardware
testing--if a probe is known to be installed and fails to generate a response
when requested, then that probe must be malfunctioning.
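The call-and-acknowledgment logic can be sketched as follows. The flag-setting handlers and fire_hardware_ack() stand in for real ISRs and the EGPIC hardware; all names here are hypothetical, and in this simulation the "hardware" is wired to IRQ 7:

```c
/* Simulated ISRs for IRQs 5 and 7: each just sets a flag. */
static volatile int irq5_called = 0;
static volatile int irq7_called = 0;

static void isr_irq5(void) { irq5_called = 1; }
static void isr_irq7(void) { irq7_called = 1; }

/* Stand-in for the EGPIC hardware acknowledging a call. */
static void fire_hardware_ack(void) { isr_irq7(); }

static int detect_irq(void)
{
    irq5_called = irq7_called = 0;  /* "install" both ISRs       */
    fire_hardware_ack();            /* call the hardware         */
    if (irq5_called) return 5;      /* see which ISR answered    */
    if (irq7_called) return 7;
    return -1;                      /* no parallel-port IRQ live */
}
```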
Interrupts are generated via pin 10, normally a printer's ACK line. A high
logic level sent to pin 10 results in an interrupt being generated, assuming
that all other relevant setup has been done.
During the development of EGPIC we found that both cable length (from the
probe to the computer) and interrupt latency played a role in determining
whether a signal actually existed at the inputs when the ISR was called. With
long cable lengths and a fast computer, it was possible for some inputs not to
be updated by the time the ISR was executed. Conversely, with a slow computer
it was possible for the signal to have come and gone (depending on the length
of the generated input) by the time the ISR was called. These timing problems
can be compensated for in hardware, but not knowing of their existence can
lead to some rather irksome bugs.



Potential Problems


Two particularly bothersome problems came to light while developing the EGPIC
system.
First, if the interrupt lines were allowed to float while the system was
collecting data, the PC would likely lock up. This is because a floating line
will often fluctuate between TTL TRUE and FALSE, causing thousands of
interrupts to be generated every second, freezing up the system. Something as
innocuous as accidentally pulling a cable loose or turning off the input
hardware may cause a system lockup.
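One software hedge against this failure mode (our assumption, not a feature of the published EGPIC code) is to count interrupts per timer tick and treat an implausibly high rate as a storm, masking the line rather than letting it freeze the machine:

```c
#define STORM_THRESHOLD 100  /* interrupts per tick deemed a storm */

static int int_count = 0;
static int line_disabled = 0;

/* Called from the ISR: if the rate exceeds the threshold, flag the
   line as disabled (a real ISR would mask the IRQ at the 8259 here). */
static void on_interrupt(void)
{
    if (++int_count > STORM_THRESHOLD)
        line_disabled = 1;
}

/* Called from a timer-tick handler: reset the rate window. */
static void on_timer_tick(void)
{
    int_count = 0;
}
```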
The second problem is that not all PC parallel ports are identical. Some
parallel ports deviate considerably from the original PC's design, rendering
this type of specialized input and output--which assumes 100 percent hardware
compatibility--impossible. Unfortunately, only trial and error will determine
which systems are nonstandard.


Conclusion


Listing Three (page 103) is DIO.C, a program that implements simple
interrupt-driven DIO with the parallel port. By itself, DIO.C is fairly
useless--consider it more of a basic framework to draw upon than a real
program. Any program that would use these routines would require some amount
of custom hardware and software design.
At first glance, the antiquated design of the PC parallel port seems highly
limited, but it can quite easily be configured as an inexpensive digital
input/output interface. The hardware required is minimal: a one-shot circuit
per channel and a single interrupt line for the logical OR of all the
channels. The software, as demonstrated in this article, is quite simple and
easily customized for specific applications. In our case, the parallel port
satisfied all of our requirements superbly, enabling the EGPIC project to be
completed in a timely, cost-effective manner and distributed across a wide
range of systems.


Acknowledgments


We are grateful to Hok Chia and Sergey Kruss (University of Florida) for
electronic technical assistance.


References


Eggebrecht, Lew. Interfacing to the IBM Personal Computer, Second Edition.
Carmel, IN: SAMS, 1992.

Figure 1: Data register (base address+0).
 Bit Pin/Function Logic
 0 2 1=TRUE
 1 3 1=TRUE
 2 4 1=TRUE
 3 5 1=TRUE
 4 6 1=TRUE
 5 7 1=TRUE
 6 8 1=TRUE
 7 9 1=TRUE


Figure 2: Status register (base address+1).
 Bit Pin/Function Logic
 0 (reserved) --
 1 (reserved) --
 2 (reserved) --
 3 15 1=TRUE
 4 13 1=TRUE
 5 12 1=TRUE
 6 10 1=generate interrupt
 7 11 0=TRUE

Figure 3: Control register (base address+2).
 Bit Pin/Function Logic
 0 1 0=TRUE
 1 14 0=TRUE
 2 16 1=TRUE
 3 17 0=TRUE
 4 IRQ enable 1=enabled
 5 (reserved) --
 6 (reserved) --
 7 (reserved) --
[LISTING ONE]
//-----------------------------------------------------------------
// PPDIO Parallel Port Digital IO routines
// Version 1.0 Copyright 1993 by Brian Hook. All Rights Reserved.
// File: PPDIO.H -- header file for the PPDIO library
// Compile with Borland C++ 3.1 -- porting to another compiler
// should be extremely trivial.
//-----------------------------------------------------------------
#ifndef __PPDIO_H
#define __PPDIO_H

//--- Pin definitions for control register ------------------------
#define PIN_1 0x01
#define PIN_14 0x02
#define PIN_16 0x04
#define PIN_17 0x08

//--- Pin definitions for status register -------------------------
#define PIN_15 0x08
#define PIN_13 0x10
#define PIN_12 0x20
#define PIN_10 0x40
#define PIN_11 0x80

//--- Interrupt enable bit definition -----------------------------
#define PTR_ENABLE_INT_BIT 0x10

//--- Function prototypes -----------------------------------------
unsigned PPDIO_GetLptAddress( int lpt_port );
void PPDIO_InstallISR( void interrupt (*fnc)(), int irq );
unsigned char PPDIO_ReadControlRaw( void );
unsigned char PPDIO_ReadStatusRaw( void );
unsigned char PPDIO_ReadControlCooked( void );
unsigned char PPDIO_ReadStatusCooked( void );
void PPDIO_RemoveISR( void );
void PPDIO_SendByte( unsigned char data );
void PPDIO_SetBaseAddress( unsigned base_address );
void PPDIO_SetLptPort( int lpt_port );

#endif

[LISTING TWO]

//-----------------------------------------------------------------
// PPDIO Parallel Port Digital IO routines
// Version 1.0 Copyright 1993 by Brian Hook. All Rights Reserved.
// File: PPDIO.C -- code and variables for the PPDIO library
// Compile with Borland C++ 3.1 -- porting to another compiler
// should be extremely trivial.
//-----------------------------------------------------------------
#include <dos.h>
#include "ppdio.h"

static unsigned ppdio_data_register;
static unsigned ppdio_control_register;
static unsigned ppdio_status_register;
static unsigned ppdio_interrupt_no;

static unsigned ppdio_irq;

static unsigned char ppdio_old_control_value;
static unsigned char ppdio_old_8259_mask;
static void interrupt (*ppdio_old_intvec)();

unsigned PPDIO_GetLptAddress( int lpt_no )
{
unsigned far *pp = ( unsigned far * ) MK_FP( 0x40, 8 );
 //--- Assumes values of 1, 2, or 3 -----------------------------
 return ( pp[lpt_no-1] );
}
void PPDIO_InstallISR( void interrupt (*fnc)(), int irq_no )
{
static char mask[] = { 0xfb, 0xf7, 0xef, 0xdf, 0xbf, 0x7f };  /* IRQ 2-7 */
unsigned char temp;

 //--- Interrupt number = IRQ no + 8 ----------------------------
 ppdio_interrupt_no = irq_no + 8;

 //--- Save original interrupt vector ---------------------------
 ppdio_old_intvec = getvect( ppdio_interrupt_no );

 //--- Install new ISR ------------------------------------------
 setvect( ppdio_interrupt_no, fnc );

 //--- Enable interrupts by setting the PTR_ENABLE_INT_BIT in
 //--- the control register. Also, OR it by 0x04 to send pin
 //--- 16 high then write out a 0 to pins 1, 14, and 17 so
 //--- that we can use the control register for input.
 ppdio_old_control_value = inportb( ppdio_control_register );
 temp = ppdio_old_control_value | PTR_ENABLE_INT_BIT | PIN_16;
 temp &= ~( PIN_17 | PIN_14 | PIN_1 );
 outportb( ppdio_control_register, temp );

 //--- Unmask our IRQ in the interrupt controller ---------------
 ppdio_old_8259_mask = inportb( 0x21 );
 temp = ppdio_old_8259_mask & mask[ppdio_interrupt_no-10];
 outportb( 0x21, temp );

 //--- Clear pending interrupts ---------------------------------
 outportb( 0x20, 0x20 );
}

unsigned char PPDIO_ReadControlCooked( void )
{
unsigned char raw_control;
unsigned char cooked_control = 0;
 raw_control = PPDIO_ReadControlRaw();

 //--- Return a control register mask that compensates for the inverse logic
 //--- of pins 1, 14, and 17, and with 0s where bits are reserved or unused.
 if ( !( raw_control & PIN_1 ) )
 cooked_control |= PIN_1;
 if ( !( raw_control & PIN_14 ) )
 cooked_control |= PIN_14;
 if ( raw_control & PIN_16 )
 cooked_control |= PIN_16;
 if ( !( raw_control & PIN_17 ) )
 cooked_control |= PIN_17;
 return ( cooked_control );
}
unsigned char PPDIO_ReadControlRaw( void )
{
 return ( inportb( ppdio_control_register ) );
}
unsigned char PPDIO_ReadStatusCooked( void )
{
unsigned char raw_status;
unsigned char cooked_status = 0;

 raw_status = PPDIO_ReadStatusRaw();
 //--- Return a status register mask that compensates for the
 //--- inverse logic of pin 11, and with 0s for any reserved or unused bits.
 if ( raw_status & PIN_15 )
 cooked_status |= PIN_15;
 if ( raw_status & PIN_13 )
 cooked_status |= PIN_13;
 if ( raw_status & PIN_12 )
 cooked_status |= PIN_12;
 if ( !( raw_status & PIN_11 ) )
 cooked_status |= PIN_11;
 return ( cooked_status );
}
unsigned char PPDIO_ReadStatusRaw( void )
{
 return ( inportb( ppdio_status_register ) );
}
void PPDIO_RemoveISR( void )
{
 //--- Restore the interrupt controller's previous state --------
 outportb( 0x21, ppdio_old_8259_mask );

 //--- Restore the original interrupt vector --------------------
 setvect( ppdio_interrupt_no, ppdio_old_intvec );

 //--- Restore the printer control register ---------------------
 outportb( ppdio_control_register, ppdio_old_control_value );
}
void PPDIO_SendByte( unsigned char data )
{
 outportb( ppdio_data_register, data );
}
void PPDIO_SetBaseAddress( unsigned base_address )
{
 ppdio_data_register = base_address;
 ppdio_status_register = base_address + 1;
 ppdio_control_register = base_address + 2;
}
void PPDIO_SetLptPort( int lpt_port )
{
 PPDIO_SetBaseAddress( PPDIO_GetLptAddress( lpt_port ) );
}

[LISTING THREE]

//-----------------------------------------------------------------
// PPDIO Parallel Port Digital IO routines

// Version 1.0 Copyright 1993 by Brian Hook. All Rights Reserved.
// File: DIO.C -- an example of how you could use the PPDIO
// routines. This could serve as a framework upon which you
// could build real applications.
// Compile with Borland C++ 3.1 -- porting to another compiler
// should be extremely trivial.
//-----------------------------------------------------------------
#include <conio.h>
#include "ppdio.h"

volatile int isr_called = 0;
void huge interrupt MyISR( void )
{
 isr_called = 1;

 //--- Normally you would read the input pins here and do something important
 //--- Signal end of interrupt to the interrupt controller ------
 outportb( 0x20, 0x20 );
}
void main( void )
{
 //--- Use LPT1 -------------------------------------------------
 PPDIO_SetLptPort( 1 );

 //--- Install our ISR on IRQ 5 ---------------------------------
 PPDIO_InstallISR( MyISR, 5 );

 //--- Run until either a key is pressed or interrupt is generated on IRQ 5
 while ( !kbhit() && !isr_called ) {
 }
 PPDIO_RemoveISR();
}
End Listings





























April, 1994
EchoNets, E-memes, and Extended Realities


Scott B. Guthery


Scott is a scientific advisor at the Schlumberger Laboratory for Computer
Science. He can be reached via Internet at guthery@austin.slcs.slb.com.


The walkie-talkies of computing are personal digital assistants, or PDAs.
Apple's Newton, AT&T's EO, Casio's Zoomer, and IBM's Simon all open the
possibility of direct, computer-to-computer connections using wireless
personal-communication systems. Network vendors will contend that central
switches are necessary for communication among a large number of people, and
while central switches do add features to network communication, switchless
network communication that relays messages from one node to another is also
possible. This article will explore possibilities for switchless networks
which I will call "EchoNets."


EchoNets


Suppose that every minute or so your PDA broadcasts a message such as, "Curly
here. Anybody out there?" Then, suppose I happen to walk by with my PDA turned
on. It answers, "Yeah, Moe here. What's up, Curly?" Yours replies, "I've got
15 messages for you," and sends my PDA the messages. "Thanks," mine says, "and
here are eight for you." "See ya, Curly," yours says. "Ciao, Moe," says mine,
and they go their separate ways.
This is a basic EchoNet message exchange, a form of which is used in existing
computer networks, including Usenet, FidoNet, and Relaynet. In fact, the news
groups in FidoNet [Bush 93] are actually called "echoes," and mail is called
"echomail." The difference between the use of the EchoNet-style protocols by
PDAs and their use in existing networks is that the nodes in existing networks
exchange messages with nodes that are known and relatively fixed over time. In
a PDA EchoNet, a node is constantly polling for and talking to strangers.
Obviously, if I'd queued up a message to you in my PDA or if you'd entered a
message to me in yours, we would have communicated without a switch. This is
plain-vanilla, peer-to-peer communication. But suppose my sister had written a
message to you last night on her PDA. Sometime during the night, her PDA
engaged in a message exchange with mine, and I unwittingly carried her message
with me when I left for work this morning. When I walked by your PDA, I
delivered her message to you. In a sense, my PDA was the network backbone that
carried my sister's message to you.
EchoNets are by no means a recent discovery. One of the earliest networks in
the Internet, the DARPA Packet Radio Network, used a "flooding" protocol,
which is a form of EchoNet. And the gateways and bridges in modern switched
networks are really nothing more than EchoNet nodes with fixed neighbors. So
even switched networks may have EchoNet subnets.
However, we've become so accustomed to the features provided by switches that
we think communication systems require them. While it will be useful for PDAs
to be able to connect to pay-per-message switched networks such as cellular,
telephone, satellite, and cable systems, it's important to realize that you
are buying the added features of the switch and billing, as much as the raw
network capability itself. It's also useful to realize that you can
communicate without them.


Receiver Addressing vs. Sender Addressing


Of course, one downside of EchoNet messaging is that the mail may not go
through. And even if it eventually does go through, you have no idea about nor
any control over how long it will take to deliver your message. It's like
Usenet or FidoNet e-mail, only worse--much worse.
To get an EchoNet mail message from me to you, we need a chain of PDAs: First,
I have to pass near A, then some time later, A has to exchange messages with
B, and so forth, until you finally cross paths with Z. While it's helpful that
the whole chain of PDAs need never exist in totality at any one point in time,
you and I are clearly at the mercy of many chance events and random
happenings. What this means is that you can probably get EchoNet mail reliably
to people in the crowd you hang out with, but it's unlikely that you can get
EchoNet mail to your friend in Tuva. In fact, it could be argued that EchoNets
aren't good at all for person-to-person e-mail when you know exactly who you
want to send the message to and what their address is.
Where EchoNets beat traditional e-mail (and telephone and surface mail, for
that matter) is when you don't know who you want to send the message to, or
when you don't know how to get in touch with them. In this case, you want to
broadcast your message in the hopes that the person or people you're looking
for will receive it. Thus, it is the receiver or receivers of the message,
rather than the sender, who determines to whom the message is addressed.
Just like a piece of e-mail, an EchoNet message comes with a number of header
fields (see Figure 1) which describe it. Header fields help you scan the
incoming EchoNet messages quickly to find the ones that are meant for you.
You'll probably want to activate some sort of automatic text filter to sift
through all the incoming messages and set aside ones that fit you and your
profile of interests.
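Such a filter could be as simple as substring matching against a keyword profile. A hypothetical sketch (the field semantics and matching rule are assumptions, since no PDA EchoNet implementation exists):

```c
#include <string.h>

/* Keep an incoming message if its subject header contains any keyword
   from the user's interest profile. */
static int matches_profile(const char *subject,
                           const char *keywords[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (strstr(subject, keywords[i]) != NULL)
            return 1;   /* message fits the profile */
    return 0;           /* relay it, but don't set it aside */
}
```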


E-memes


It's important to remember that, unlike with Usenet newsgroup messages, you're
both a potential recipient and a relay point for every EchoNet message you
receive. Just because you find a message you think is addressed to you doesn't
mean you shouldn't pass it on. Other people may be interested in the message,
and you're part of the chain that will get it to them.
By default, you should also relay messages that aren't addressed to you along
with ones that are. On the other hand, you're free to look over your message
traffic and delete any messages you don't want your PDA to pass along. In an
analogy to the memes (or thoughts) of human communication, EchoNet messages
are kind of like e-memes. E-memes that people like--that they want to tell
other people about--are passed on. E-memes that people don't like die off
quickly, either by being read and deleted, or by being killed by automatic
purge rules.
Besides passing e-memes, you can also annotate or elaborate on them. In this
case, your observation becomes linked to the e-meme it comments on, and when
the e-meme is sent to another PDA, these links are preserved so that you
really can read and add to threads of thought.


Anonymity and Privacy


Have you noticed that Internet and FidoNet messages often arrive signed by
everybody who handled them along the way? There are some interesting personal
privacy implications if you extend this networking custom to an EchoNet. For
example, if I get a message that has a path from Bob to Jim to Sally to Pete,
then I could try to deduce that Jim was in the vicinity of Sally at some time.
Due to name spoofing and path hacking (not to mention people borrowing each
other's PDAs), this isn't ironclad evidence, but it does reveal some
information about the whereabouts of both your PDA and, by association,
yourself.
Fortunately, EchoNet can forgo this cyberspace territorial-marking custom.
There is nothing in the EchoNet relay algorithm that requires knowledge of how
the message got to your PDA or the fact that it ever went through it. You can
participate in EchoNets completely anonymously, or under a pen name. EchoNet
pen names are like the handles of CB radio and the nicknames of Internet Relay
Chat, with the advantage that you don't ever have to use an FCC-approved call
sign or NIC-approved Internet address.
Privacy on an EchoNet is a little more problematic. If I can't understand what
the message says, it's hard for me to figure out if it's for me or not. If
it's encrypted and I don't have the key, then I'm pretty sure it isn't.
Furthermore, if I can't read the message, then I can't determine if I want to
pass it on or not and probably won't. Therefore, since encrypted e-memes will
probably die out faster than unencrypted ones, I'd expect to see a resurgence
of the clear text forms of encoding. This leads you to wonder which properties
of an e-meme make it travel the farthest.


An EchoNet Application: Measurements and Surveys


Psychologists who study cliques have devised fascinating ways of measuring the
who-knows-whom connectivity between people. In one experiment, you're given a
booklet describing a target person. This description does not include his or
her name or whereabouts. You're asked to enter your name and address in the
booklet, then pass it to somebody you know who stands a chance of getting the
booklet to the target. The person to whom you give the booklet repeats the
process, and sooner or later the booklet ends up in the hands of the person it
describes. By counting the names in the booklet when it arrives, the
psychologist obtains an upper-bound estimate of the who-knows-whom distance
between you and the target. These experiments are called "studies of the
small-world problem."
Suppose the people receiving the booklet had also been requested to enter some
other information about themselves, rather than just entering their names and
addresses. Then the booklets would accumulate a survey of all the people who
handled them. Now suppose that the people are PDAs and the booklets are e-meme
threads. What you have is a low-cost way of taking measurements and surveys.
For measurements, specially equipped PDAs can operate completely autonomously.
At regular time intervals, the measurement's sensor is read, and a time- and
location-stamped sensor value is queued as an outgoing e-meme. Over time, some
of these readings find their way back to the person interested in them. While
the coverage both in time and space is unpredictable, expenses are kept to a
minimum and the flow is continuous.
An e-meme survey is more in the spirit of EchoNet. In this case, your PDA
receives a questionnaire as the head of a thread of responses. The thread head
asks you to append the thread with your response to the questionnaire. You're
free to throw the whole thing away or look at the responses of other people
before you add your own. As with PDA measurements, we don't exactly have a
controlled experiment, but then, we don't have to bear the cost of conducting
one, either.
A downside of e-meme surveys can be getting the raw data back to the person
conducting the survey. If the PDAs are moving around, you cover a wide area
but may only get back a portion of the measurements you took. On the other
hand, if the PDAs are relatively immobile, you'll cover less area but stand a
better chance of collecting more data. In this case, you can take advantage of
the fact that the PDAs are immobile and upgrade the basic EchoNet protocol to
include a notion of routing.
What if, instead of sending around passive text fragments, there were some way
of sending executable code fragments? I think this is what people have in mind
when they talk about Knowbots and General Magic Telescript agents. If there
were some way of telling friendly viruses from evil ones, then a code fragment
that hopped from PDA to PDA, gathering up data and then heading home at the
end of the day, would be a terrific way to cover a lot of territory quickly.
If nothing else, the quitting-time algorithm will be fun to design.



EchoNet Routing


What if our PDAs aren't roaming around but are sitting still, in a classroom,
for example, or at a concert or in an office? Here, rather than trusting to
random passoffs to get messages through, the PDAs can run a routing algorithm
so that each PDA knows exactly which PDAs are out there and which PDAs a
message has to go through to get to a particular PDA.
The simplest routing algorithm begins with each PDA figuring out which PDAs it
is directly connected to. It does this using the usual message-exchange protocol,
but rather than exchanging messages, it exchanges connectivity information.
"Curly here. Anybody out there?" one says, and gets back a bunch of messages:
"Yeah, Sleepy here. What's up?" "Yeah, Sneezy here. What's up?" "Yeah, Doc
here. What's up?" "Yeah, Bashful here. What's up?" Now Curly knows he's
directly connected to Sleepy, Sneezy, Doc, and Bashful, and each of these
knows they are directly connected to Curly.
With this information, Curly can address messages directly to particular PDAs
rather than broadcasting them to everybody and having to deal with all their
responses. "Curly here. Who are you connected to, Sleepy?" Sleepy responds to
Curly, "Sleepy here. I'm connected to Doc and Snow White, Curly." "Ahah! A new
player," thinks Curly. What Curly has discovered is that there is a Snow White
out there that he can get a message to by way of Sleepy. "Curly here. Give
this to Snow White, Sleepy: 'Yo, SW, what's cookin'?'"
What you have here is explicit routing. Rather than just broadcasting a
message into the ether, Curly sent it directly to Sleepy along with explicit
instructions to pass it directly to Snow White. We've also added the
peer-to-peer "Who are you connected to?" message. By coupling this message
with direct addressing, any PDA can discover the entire known universe and its
connectivity. Knowing this, your PDA can present you with a list of all the
PDAs with which you can communicate on EchoNet and can send any message
directly to the one you pick. In a sense, your PDA is functioning as a router
or a switch as well as a source, sink, and passive relay point. You're
realizing some of the advantages of a switched network without building a
central switch through which all traffic must flow.
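Once a PDA has assembled this connectivity information, finding the next hop is an ordinary shortest-path problem. Here is a toy sketch using breadth-first search over an invented adjacency matrix (indices 0=Curly, 1=Sleepy, 2=Sneezy, 3=Doc, 4=Bashful, 5=Snow White; Snow White is reachable only through Sleepy):

```c
#define N 6
static const int adj[N][N] = {
    { 0, 1, 1, 1, 1, 0 },   /* Curly knows Sleepy..Bashful      */
    { 1, 0, 0, 1, 0, 1 },   /* Sleepy knows Curly, Doc, S.White */
    { 1, 0, 0, 0, 0, 0 },
    { 1, 1, 0, 0, 0, 0 },
    { 1, 0, 0, 0, 0, 0 },
    { 0, 1, 0, 0, 0, 0 }
};

/* BFS from src; return the first hop on a shortest path to dst,
   or -1 if dst is unreachable. */
static int first_hop(int src, int dst)
{
    int prev[N], queue[N], head = 0, tail = 0, i, v;
    for (i = 0; i < N; i++) prev[i] = -1;
    prev[src] = src;
    queue[tail++] = src;
    while (head < tail) {
        v = queue[head++];
        if (v == dst) break;
        for (i = 0; i < N; i++)
            if (adj[v][i] && prev[i] == -1) {
                prev[i] = v;
                queue[tail++] = i;
            }
    }
    if (prev[dst] == -1) return -1;
    for (v = dst; prev[v] != src; v = prev[v])
        ;                    /* walk back to the node next to src */
    return v;
}
```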
There are hundreds of network-discovery and network-routing algorithms and
protocols, many of which can be used in both EchoNets and switched nets. In
fact, an EchoNet can be thought of as just a network in which every node is
also a router.
Recently, the Internet technical community has become interested in supporting
Internet connectivity to mobile hosts (see the accompanying text box entitled
"Mobile Internetworking") and has published the "Internet Packet Transmission
Protocol," which discusses a possible routing algorithm for this situation.


EchoNets as Distributed Systems


This primitive network-discovery and routing algorithm is an example of a
large class of distributed-system algorithms that has received attention over
the last 15 years. Dijkstra's classic paper [Dijkstra 80] and Chang's
independent discovery [Chang 82] have set the tone and direction for much of
this work. Chang's paper is a more readable introduction to distributed
algorithms even though Dijkstra's paper has priority. Yang and Marsland's
recent note [Yang 93] is an excellent annotated bibliography on two important
problems in this field.
Dijkstra and Chang showed that it is possible to design practical algorithms
for EchoNets which enable any node in the network to discover information
about the whole network or about any particular node in the network. In fact,
Chang called these algorithms "echo algorithms" because a broadcast question
produces an echoed response. "Practical" here means that the answer is
obtained in a deterministic and computable amount of time and that the EchoNet
message traffic generated by the broadcast request eventually dies out.
The Dijkstra/Chang echo algorithm proceeds as follows: The initial,
inquisitive node sends its question to each of the nodes to which it is
connected. Upon first receiving the question, each node relays the question to
all the nodes to which it is connected except for the node that sent it the
question. If the receiving node has no other nodes to which it can send the
question, then it sends its accumulated answer back to the node that sent it
the question. Finally, when a node receives answers from all the nodes to
which it sent the question, it in turn sends the accumulated answers to the
node that first sent it the question.
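As a concreteness check, here is a minimal single-process sketch of the echo algorithm just described. The node names and topology are hypothetical, and a recursive call stands in for the actual message exchange; a shared `visited` set plays the role of each node remembering whether it has already seen the question.

```python
def echo(graph, node, sender=None, visited=None):
    """Ask `node` the question; it relays to every neighbor except the
    one that sent it the question, then echoes back the accumulated
    answers (here, each node's answer is simply its own name)."""
    if visited is None:
        visited = set()
    visited.add(node)
    answers = [node]
    for neighbor in graph[node]:
        if neighbor != sender and neighbor not in visited:
            answers.extend(echo(graph, neighbor, node, visited))
    return answers

# A small hypothetical EchoNet: A-B, A-C, B-D, C-D, D-E.
net = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
       "D": ["B", "C", "E"], "E": ["D"]}
print(sorted(echo(net, "A")))  # -> ['A', 'B', 'C', 'D', 'E']
```

The broadcast terminates because each node answers exactly once, no matter how many redundant paths the mesh contains.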
In his paper, Chang gives a number of applications of this basic echo
algorithm along with some performance calculations and special-case
improvements. One of the applications, the Single-Source Sort, is particularly
applicable to our PDA-to-PDA communication situation. The idea is that a new
node is joining an existing EchoNet and wants to pick a unique identity. The
new node sends out the question, "What is your name?" Each node's echo is its
own name, appended to the list of names it has received from the nodes it has
contacted. What arrives back at the new node is a list of the names of all the
nodes in the EchoNet. All the new kid on the block has to do now is pick a
name that isn't on the list.
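In code, the new kid's final step might look like this hedged sketch; the name list simply stands in for whatever the echo brought back, and the `PDA-` prefix is made up.

```python
def pick_unique_name(names_in_use, prefix="PDA-"):
    """Pick the first name of the form PDA-<n> that isn't on the
    list echoed back by the existing EchoNet."""
    n = 0
    while f"{prefix}{n}" in names_in_use:
        n += 1
    return f"{prefix}{n}"

print(pick_unique_name({"PDA-0", "PDA-1", "PDA-3"}))  # -> PDA-2
```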


Global State and Cooperative Behavior


Dijkstra/Chang-style echo algorithms are fine for determining static
properties of an EchoNet (such as the list of the names of all the nodes in
the network); but what about dynamic properties? Suppose all the nodes in an
EchoNet wanted to cooperate in accomplishing a task of some sort. How would
they keep track of the current state and progress of their combined effort, or
stay coordinated?
One way would be to have everybody synchronize their actions to a global
clock, then treat the dynamic state as simply a series of static states
separated by network-wide time synchronization points. From an individual
node's point of view, the drill might look something like this (see, for
example, [Flammer 92]):
1. Do something useful.
2. Wait until the global-synchronization point.
3. Exchange what you've done with everybody else and find out what everybody
else has done using an echo algorithm.
4. Figure out what to do next.
5. Go to step #1.
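The five-step drill can be sketched as a lock-step simulation. The "useful work" here (each node bumping a counter) is hypothetical, and a simple sum plays the part of the echo-style exchange at the synchronization point.

```python
def run_rounds(nodes, rounds):
    """nodes maps a node name to its local counter. Each round: every
    node does local work (step 1), then at the synchronization point
    (steps 2-3) the exchange gives every node the global total, which
    it can use to decide what to do next (steps 4-5)."""
    views = {}
    for _ in range(rounds):
        for name in nodes:               # 1. do something useful
            nodes[name] += 1
        total = sum(nodes.values())      # 2-3. sync point and exchange
        for name in nodes:
            views[name] = total          # 4. every node shares one view
    return views

print(run_rounds({"A": 0, "B": 0, "C": 0}, rounds=2))  # each node sees 6
```

Every node blocks at the synchronization point each round, which is exactly the overhead the text complains about.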
While there are a number of global-clock and virtual-time algorithms that can
be used for the global-synchronization point [Yang 93], you get the feeling
there's an excessive amount of overhead in this approach--there must be a less
West Point and more Mill Valley way of achieving cooperation.
Chandy and Lamport [Chandy 85] describe an algorithm whereby nodes in an
EchoNet can determine the global state of a cooperative effort without a
global clock. They called their algorithm a "distributed snapshot" by analogy
to a group of photographers (the nodes) who take several pictures (the local
states) and piece together the results (by exchanging messages) to form a
"meaningful" panoramic picture (the global state) that is larger than a
picture that any one photographer's camera could handle. In this context,
"meaningful" means that the composite picture is sufficient for coordinating
the nodes and getting the cooperative effort accomplished.
The Chandy/Lamport algorithm provides a method for "strobing" the recording of
a node's local state by a mechanism other than a global alarm clock. The
method is based on the sending of a special "Record your state!" message
around the network. After the message has been received and obeyed by all
nodes, the recorded local states are collected to form a description of the
global state; this global state description is distributed to all nodes using
an echo algorithm.
Since the "Record your state!" message reaches nodes at different times, you
have to record not only the state of each of the nodes but the state of what
causes nodes to change state; videlicet, the in-transit message traffic
between nodes. The resulting global state is rather like a little film clip
that we can run forward and backward to see what the network was up to during
a tiny interval of time. The Chandy/Lamport algorithm is a careful
specification of how the local states of the nodes and the communication
channels are to be recorded so that this movie is a useful representation of
the net's state.
The Chandy/Lamport distributed snapshot algorithm works like this:
Initiation Rule: Record your state, then send the "Record your state!" message
to each node to which you're connected before you send any further messages.
Unrecorded State Rule: If you receive a "Record your state!" message and you
have not already recorded your state, then: 1. record your state; 2. record
the fact that the state of the channel between you and the node that sent you
the message is "Empty"; 3. start recording all subsequent messages you receive
from other nodes; and 4. send the "Record your state!" message to each node to
which you are connected.
Recorded State Rule: If you receive a "Record your state!" message and you
have already recorded your state, then record the fact that the state of the
channel between you and the node that sent you the message is the sequence of
messages you got from this node between the time you recorded your state and
the current "Record your state!" message.
The algorithm gives receivers the obligation of recording the in-flight
messages. The recording starts at a node when the first "Record your state!"
message appears on any channel and stops channel-by-channel as "Record your
state!" appears on each channel.
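Here is a toy, single-process walk-through of the three rules, with two nodes P and Q joined by FIFO channels. The node states, the message contents, and the word MARKER (standing in for "Record your state!") are all hypothetical.

```python
from collections import deque

class Node:
    def __init__(self, name, state):
        self.name, self.state = name, state
        self.recorded_state = None   # local snapshot, once taken
        self.channel_state = {}      # sender -> recorded in-flight messages
        self.closed = set()          # senders whose marker has arrived

    def receive(self, sender, msg, send):
        if msg == "MARKER":
            self.closed.add(sender)
            if self.recorded_state is None:       # Unrecorded State Rule
                self.recorded_state = self.state
                self.channel_state[sender] = []   # that channel is "Empty"
                send("MARKER")                    # relay to the other node
            # Recorded State Rule: channel_state[sender] already holds
            # everything seen since our snapshot; it is now final.
        else:
            self.state += msg                     # ordinary traffic
            if self.recorded_state is not None and sender not in self.closed:
                self.channel_state.setdefault(sender, []).append(msg)

P, Q = Node("P", 10), Node("Q", 20)
chan = {("P", "Q"): deque(), ("Q", "P"): deque()}

# Initiation Rule: P records its state, then sends the marker.
P.recorded_state = P.state
chan[("P", "Q")].append("MARKER")
chan[("Q", "P")].append(5)   # an ordinary message already in flight to P

while any(chan.values()):    # deliver messages, FIFO per channel
    for (src, dst), q in list(chan.items()):
        if q:
            node = P if dst == "P" else Q
            node.receive(src, q.popleft(),
                         lambda m, d=dst, s=src: chan[(d, s)].append(m))

print(P.recorded_state, P.channel_state)  # 10 {'Q': [5]}
print(Q.recorded_state, Q.channel_state)  # 20 {'P': []}
```

The "film clip" the snapshot records is a consistent cut: P at state 10, Q at state 20, and the message 5 captured in flight on the Q-to-P channel, even though P's live state had already moved on to 15 by the time the snapshot was assembled.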


Extended Realities


What the Dijkstra/Chang and Chandy/Lamport algorithms give us is a way for
many mobile computers to act cooperatively. While each individual PDA is
keenly aware of its own surroundings, it can also count on the "eyes and ears"
of the other PDAs in its EchoNet to act as lookouts in regions beyond its own
ken. I think of PDAs knitted together by these algorithms as being similar to
the compound eye of an insect or a very large array radio antenna. In a sense,
the reality of each PDA has been extended to the area covered by the entire
EchoNet of which it is a member.
It's interesting that this relatively complex form of network behavior has
been achieved without a central switch. Switchless networks like EchoNets
certainly have their drawbacks, such as indeterminate message delivery. The
advantages, however, include robustness due to absence of a single point of
failure and the ease with which nodes can enter and leave the communication
mesh. In the era of mobile wireless computing, we may find situations where it
just doesn't make sense to send the message downtown and back if it only has
to get to someone standing next to me. We may also find useful forms of
network communication that don't send us a bill at the end of the month.


Bibliography


Acharya, Arup, and B.R. Badrinath. "Delivering Multicast Messages in Networks
with Mobile Hosts." Proceedings of the 13th International Conference on
Distributed Computing Systems. May 25--28, 1993, Pittsburgh, PA.
Bush, Randy. "FidoNet: Technology, Tools, and History." Communications of the
ACM (August 1993).
Chandy, K. Mani, and Leslie Lamport. "Distributed Snapshots: Determining
Global States of Distributed Systems." ACM Transactions on Computer Systems,
(February 1985).
Chang, Ernest J.H. "Echo Algorithms: Depth Parallel Operations on General
Graphs." IEEE Transactions on Software Engineering (July 1982).
Chow, Ching-Hua. "On Multicast Path Finding Algorithms." Proceedings of IEEE
Infocom '91.
Dijkstra, E.W., and C.S. Scholten. "Termination detection for diffusing
computations." Inf. Proc. Lett. (August 1980).
Flammer, George H. "Method for Synchronizing a Wide Area Network without
Global Synchronization." U.S. Patent 5,130,987.
Ioannidis, John, and Gerald Q. Maguire, Jr. "The Design and Implementation of
a Mobile Internetworking Architecture." Proceedings of the 1993 Winter USENIX
Meeting, January 25--29, 1993, San Diego, CA.
Ioannidis, John. Protocols for Mobile Internetworking. Ph.D. Thesis, Columbia
University, 1993.
Trask, Jeremy R., and Anthony Wiener. "Data Communication System." U.S. Patent
4,937,569, June 26, 1990.
Uehara, Keisuke, et al. "Enhancement of VIP and Its Evaluation." Proceedings
of INET '93, August 17--20, San Francisco, CA.
Wada, Hiromi, Tatsuya Ohnishi, and Brian Marsh. "Packet Forwarding for Mobile
Hosts." Internet Draft, July 1993.
Yang, Zhonghua, and T. Anthony Marsland. "Annotated Bibliography on Global
States and Times in Distributed Systems." ACM Operating Systems Review (July
1993).

Figure 1: An EchoNet header field.
To: EchoNet Implementers Everywhere
From: Earl of Echoes in Austin
Subject: Improved Short-Hop Protocol
Keywords: EchoNet, Transfer Rules
Send-Date: September 6, 1993
Route: Bubba in Temple, Jenny Jet in Dallas



Mobile Internetworking


A number of similar schemes (Ioannidis, Uehara, and Wada, for example) have
been proposed for extending TCP/IP, and hence, the Internet, to mobile
computers. The situation is a little more complicated because, as originally
conceived, TCP/IP addresses combine two distinctly different pieces of
information: unique name and current location (kind of like "Minnesota Fats"
or "Boston Blackie"). When host computers were immobile, this didn't matter;
if they did move, we changed their names. Clearly this solution won't work for
a computer tooling down Route 66.
A design criterion for all of the mobile-internetworking proposals is to
minimize the impact of supporting mobile hosts on the existing network as much
as possible. Thus, not only should everything that works today continue
working, but a stationary host should be able to communicate with a mobile
host just as if it were another stationary host. Basically, this means that
the mobile host's address doesn't change, at least from the point of view of
other hosts communicating with it.
The Internet Packet Transmission Protocol (IPTP) proposed in a July 1993
Internet Draft (Wada) endows a mobile host with two addresses: a home address
(the unique name, "Blackie") and an away address (the current location,
"Boston"). The home address doesn't change and is the permanent name of the
host known to the world. The away address does change and is the address at
which the mobile host can currently be reached. The only change to the network
is the addition of a piece of software called a Packet Forwarding Server to
the mobile host's home network, which keeps track of where the mobile host is
and forwards messages to it.
As the mobile host moves around, it acquires an away address from each local
network whose territory it enters, and it sends this away address back to its
home network's Packet Forwarding Server. In this way the home network always
knows where the mobile host is and how to reach it. Messages to a mobile host
are always sent to its home address; when they're received, the Packet
Forwarding Server readdresses them to the mobile host's away address. Messages
from a mobile host go directly to whom they are addressed and need not detour
through the Packet Forwarding Server on the mobile host's home network. The
return address on these messages is the mobile host's home address, not its
temporary away address.
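A sketch of the Packet Forwarding Server's bookkeeping, with packets as plain dictionaries and the addresses made up; the real protocol, of course, operates on IP packets rather than strings.

```python
class PacketForwardingServer:
    """Lives on the mobile host's home network: tracks the away address
    and readdresses packets sent to the home address."""
    def __init__(self):
        self.away = {}                    # home address -> current away address

    def register(self, home_addr, away_addr):
        self.away[home_addr] = away_addr  # mobile host reports its location

    def forward(self, packet):
        dest = packet["to"]
        if dest in self.away:             # readdress toward the away address
            return {**packet, "to": self.away[dest], "home": dest}
        return packet                     # host is at home: deliver as-is

pfs = PacketForwardingServer()
pfs.register("blackie.home.net", "visitor7.boston.net")  # "Boston Blackie"
pkt = pfs.forward({"to": "blackie.home.net",
                   "from": "fats.mn.net", "data": "rack 'em"})
print(pkt["to"])   # -> visitor7.boston.net
```

Note that replies from the mobile host go straight back to the sender; only inbound traffic detours through the server, as described above.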
As hosts and routers on the Internet are willing to become more
mobile-host-aware, there are a number of efficiencies that can be introduced
into this minimum-impact protocol, and many of these are described in the
referenced papers. For example, a sender might indicate that it is willing to
track the mobile host as well so the mobile host could set its return address
to its away address rather than its home address.
--S.B.G.



April, 1994
Help for Windows Help Authors


Windows help authoring tools provide quick relief




Al Stevens


Al is a DDJ contributing editor. He can be reached through the DDJ offices or
on CompuServe at 71101,1262.


To be taken seriously, a Windows application must provide online help. Users
have come to expect it, and developers have little choice but to provide it.
However, like staff meetings, program documentation, and user's guides, it's a
task that programmers approach as willingly as they would a root canal. But
like it or not, most Windows developers must eventually build a help database.
Fortunately, Windows includes WINHELP.EXE, an application that displays
online, context-sensitive help in a standard format. You design a help
database and build the hooks into the application. WINHELP does the rest.
Building a help database, however, is no easy task. If you're lucky, your boss
hires a professional tech writer to do most of it. There is more to the job
than writing the words and composing the pictures. You use a number of
unrelated tools to convert the help words and graphics into a database format
that WINHELP recognizes. You can work with these tools in their native
autonomous environments or use a Help authoring tool to integrate them into a
project-oriented toolset. This article describes the components of a Windows
help database, addresses the manual procedures for building one, and discusses
three Help authoring tools that ease the process. One tool, the Windows Help
Author, comes as an unsupported application on the Microsoft Developer Network
(MSDN) CD. The other two, Windows Help Magician from Software Interphase and
RoboHelp from Blue Sky Software, are commercial tools from third-party
vendors.


About WINHELP


WINHELP is an independent Windows application that comes with Windows.
Developers use it to provide online help similar in look and feel to that of
other Windows applications. WINHELP displays help text and graphics from
databases that conform to a prescribed format. The format supports hypertext
links, keyword searches, graphical displays and controls, and navigational
controls. Application programs associate their run-time contexts with
specific topics in the Help database. The developer composes the help
database, assigns run-time context identifiers to the help topics, and puts
the associated context-sensitive hooks in the applications code.
A help database can have text, graphics, motion video, and sound. The text can
include highlighted phrases that you click on to pop up informational windows
or jump to other topics in the text. Graphical elements such as tool-bar
buttons, icons, and screen shots can be displayed and clicked on. The help
document can talk, display pictures, play music, and show movie clips. There
is an automatic table of contents and a keyword search feature. There are
navigation functions that jump forward and backward through the topics. The
user can place and retrieve bookmarks in the text. All of these features are
implemented by WINHELP based on a database that the developer builds.
Although WINHELP typically provides online help to applications, its
hypertext, multimedia, and navigational features make it useful for presenting
other kinds of information. You can run WINHELP either from within an
application, or as a stand-alone program and command it to display text and
graphics from any conforming database. WINHELP is commonly used for online
reference and users' documentation for compilers and other applications. When
you see a Program Manager group with one or more prominent yellow
question-mark icons, there is a good chance that they each run WINHELP to
display a different documentation database. There are even "readme" files
implemented as help databases.


Help-Project Tools and Components


Building a help database involves several tools. You need a word processor
that reads and writes the so-called "rich-text format" (RTF), which is an
abominable concoction of embedded ASCII tokens that define how a document
should appear. The RTF editor is used to compose the help database.
Theoretically, you could use an ASCII text editor to build an RTF document,
but it's not advisable. You're better off using a word processor that works
with the format and displays comprehensible text on the screen.
A second text editor, such as Notepad, that works with ASCII text maintains
the help project file, which describes various components in the database. If
your database includes graphics, you need a program to produce bitmap files.
If the user is to make selections by clicking on parts of the graphics, you
need the Hot-Shot Editor. The Help Compiler reads the project, text, and
bitmap files and compiles the help database into the format that WINHELP
expects. (The Help Compiler and Hot-Shot Editor are bundled with most Windows
software-development environments and are included in the SDK.) Finally, you
need your own software-development environment to put help context identifiers
into the application.
Among the project components you'll use when building a help database are:
Help Project File. A help database uses an ASCII text project file with the
.HPJ filename extension. The project file contains options for the Help
Compiler, including the name of the .RTF text files, and the title and size of
the help window. It also associates string topic identifiers that you place in
the help text with integer context identifiers in the programs.
Help Text. The help text contains the narrative text for the help database,
tokens that specify the filenames and position for graphics, topic names and
identifiers, and the linkages for hypertext references and keyword
associations.
The RTF word processor must be able to encode double-underlined and hidden
text and insert footnotes into the document, all by using RTF protocols.
Footnotes, which are tagged with $, #, and K characters and appear just ahead
of the title for each topic, provide mnemonic identifiers for the topic,
titles for the table of contents, and keywords for the search. Underlined text
identifies hypertext phrases. Hidden text immediately follows the hypertext
phrases and specifies the link's mnemonic identifier. Word for Windows 2.0 and
Ami Professional both have these capabilities.
Multimedia. A help database can include graphics, sound bites, and movie
clips. You include these elements by putting tokens in the text that identify
what they are and their filenames. For example, to insert a bitmap you put the
following token into the text at the character position where you want the
upper-left corner of the bitmap to show:
{bmc dolly.bmp}
The bmc keyword specifies a bitmap. dolly.bmp is the name of the file in this
example that contains the bitmap. The Help Compiler uses the token to build
the graphic rendering into the database.
If the user clicks on a graphic to jump to another topic, you underline the
token and add a hidden topic identifier to link to the topic. If parts of the
same graphic point to different jump links, you use the Hot-Shot Editor to
identify the coordinates of each jump area. For example, I put a picture of
the application's tool bar in the help database and used each tool button to
jump to the topic that describes its function. A weakness of the help
development system is that graphics in the database slow the Help Compiler
down to a crawl when you use the compression options to build the database. A
big help file with many large pictures can take hours to compile.
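A minimal project file along the lines described above might look like this sketch; the title, RTF filename, and mnemonics are hypothetical, while the section and option names follow the Help Compiler's conventions.

```ini
[OPTIONS]
TITLE=MyApp Help
COMPRESS=TRUE

[FILES]
MYAPP.RTF

[MAP]
EDIT_MENU 3
FILE_MENU 4
```

The [MAP] section is the glue: EDIT_MENU is the string identifier footnoted in the RTF text, and 3 is the integer the program passes to WINHELP.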


Context-Sensitive Help


Most help databases support context-sensitive help. If they do not, the user
must start at a table of contents or do a keyword search to find a particular
help topic. By associating help topics in the database with menu selections,
dialog boxes, controls, and other application-specific contexts, you give the
user the ability to go directly to the help topic that discusses the currently
selected application context. For example, suppose that you assign context
identifier 3 to the EDIT_MENU identifier. These are arbitrary values that you
decide to use. In your program, you associate the integer 3 with the Edit menu
label on the menu bar. In the help text, you associate the EDIT_MENU
identifier with the topic in the text that explains the menu. When the user
selects that menu and asks for help, WINHELP displays the associated
topic.
To integrate a help database with an application, you modify the source code
to specify the database name and to associate context identifiers with
different parts of the application. How you do this depends on the programming
language. Visual Basic's Options/Project menu opens the Project Options dialog
box where you add the name of the help database. The Menu Design Window
includes a HelpContextID field for the numeric value associated with the
associated help topic. The Properties windows for the application's controls
include similar HelpContextID fields.
A C program that uses the Windows SDK API intercepts the WM_KEYDOWN message
and watches for the F1 key, either from the application window's processing
module or by using the SetWindowsHookEx function to install a filter function
that intercepts messages to dialog boxes. Once the key is pressed, the program
calls the SDK's WinHelp function passing the name of the help database and the
context identifier number.
Microsoft Foundation Classes programmers associate help contexts with controls
using the MAKEHM tool, which constructs topic mnemonics from the source-code
control identifiers and assigns context identifiers to them. The message map
associates the Windows ID_HELP message to CWinApp::OnHelp. AppWizard builds
this framework automatically when you elect to include context-sensitive help
in your application.


Building a Database: The Manual Approach


The Windows Program Manager has features that help organize the tools into
something approaching an integrated environment. Recently I worked on a help database
for a Visual Basic application. I wanted VB and the help-authoring tools
available at the same time so that I could put context identifiers in the
program while I wrote topics in the help database.
I set up a Program Manager group for the project. An icon runs Visual Basic
with the application's makefile on the command line. That takes care of the
software-development side of the project. Another icon runs Notepad to edit
the help project file. A Word for Windows icon starts Word to edit the RTF
text file. An MS-DOS prompt icon starts a DOS batch file that runs the Help
Compiler. I used Paintbrush to build and change bitmaps; it has an icon in the
Program Manager group. Finally, an icon runs WINHELP itself to view the help
document during each stage of its development. These tools sit together as
Program Items in a Program Group with the startup subdirectories and
command-line document files built into their properties. Thus, not only do I
avoid rummaging through all of the groups to find and run them, but they start
up with the help files loaded and ready to modify.
The manual approach works, but it is not perfect. Getting into and out of Word
involves telling Word each time to convert the RTF format. The help document
in Word does not resemble the display that WINHELP uses. You get neither a
visual tool nor WYSIWYG. To see the real thing, you must run the Help Compiler
and compile the whole database, which can take a long time. Inserting the
correct footnote tokens with the correct footnote values is a tedious and
error-prone process. Remember that you are using the features of a word
processor to create links and chains in a textual database, a text editor to
associate the link identifiers with numbers in the project file, and a
software-development environment to put the numbers in the source code. There
are no built-in integrity checks. Nothing ensures that you properly coordinate
the contexts and topic identifiers among the three files. Some, but not all,
of these problems go away when you use a help-authoring tool.
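Nothing stops you from mechanizing part of that coordination yourself. A hypothetical checker, given the project file's topic-to-number map, the mnemonics footnoted in the RTF text, and the context numbers used in the source code, reports anything left dangling:

```python
def check_help_links(map_entries, rtf_topics, source_ids):
    """map_entries: mnemonic -> context number from the project file;
    rtf_topics: mnemonics footnoted in the help text;
    source_ids: context numbers referenced by the application."""
    problems = []
    for mnemonic in map_entries:
        if mnemonic not in rtf_topics:
            problems.append(f"no topic for {mnemonic}")
    for ident in sorted(source_ids):
        if ident not in map_entries.values():
            problems.append(f"context id {ident} not in project map")
    return problems

print(check_help_links({"EDIT_MENU": 3, "FILE_MENU": 4},
                       rtf_topics={"EDIT_MENU"},
                       source_ids={3, 4, 9}))
# -> ['no topic for FILE_MENU', 'context id 9 not in project map']
```

Parsing the three files to produce those inputs is left as an exercise; the point is that the cross-checks are simple once the identifiers are extracted.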



Microsoft Windows Help Author


The Microsoft Developer Network CD contains an "Unsupported Tools and
Utilities" section that includes a program named "Help Author," which is easy
to use and well documented. The MSDN CD-ROM includes as a bonus an extensive
Help Authoring Guide that covers the creative side of the job.
Help Author has two parts: an application named "Help Project Editor" and a
Word for Windows template. The Help Project Editor uses dialog boxes to
collect the information for the ASCII project file. You don't have to deal
with that file again. It also automates the interface with Word, launching it
with the template installed and the RTF file loaded. A help database can
contain more than one RTF file, and Help Project Editor keeps them in a list.
You can also launch the Help Compiler and WINHELP to view the currently
compiled database.
The Word template adds three tool buttons to the Word tool bar. The first one
opens a dialog box that lets you change the footnote values in the current
topic. You can add, change, and delete the topic's title, context mnemonic
string, keywords, browse sequence, and so on, all without having to deal with
Word's footnote commands. The second tool button opens a dialog box that lets
you insert jump and pop-up links into the database, automatically applying the
underlined and hidden text attributes. If you select a phrase that is already
in the text, the operation uses it. Otherwise it inserts whatever you put in
the dialog box as the link phrase.
The third tool button compiles only the current topic into a temporary help
database and calls WINHELP to display it. You preview a topic--text, graphics,
sound, and movies--exactly as the user sees it and without recompiling the
entire database. What you see in Word and what WINHELP displays are usually
quite different. You need to view your progress in small increments, and this
feature supports that need. It is Help Author's strongest advantage over the
other tools, and anyone who develops a sizable help database wants this
capability. The other tools do not have it.
Help Author does not integrate graphics and multimedia tools. You still have
to launch them yourself and write their files into the proper subdirectory so
that the Help Compiler finds them.
An amusing side to Help Author is that its own help database has context
errors. Most of the Help buttons on its dialog boxes link to help topics that
do not exist, although there are topics in the database to cover the functions
of the dialog boxes. Nonetheless, Help Author smooths several of the wrinkles
out of the manual procedure, automates most of the tedium of using Word to
build the database, and is well worth trying. As a Windows developer, you
should have the MSDN CD-ROM, anyway. Help Author is a bonus.


Windows Help Magician


Windows Help Magician from Software Interphase has some good features, some
bugs, and some annoying quirks. Among its quirks is that the setup includes a
package called "Bitmap Magician." Its purpose is to let you build a pseudofont
by converting the characters in an existing font into bitmaps that you can
include in the help database. The Help Compiler uses only a few fonts and doesn't
accept all of the characters in the fonts it does allow. For example, you can
put the trademark character in the text, but the Help Compiler deletes it from
the help database. Bitmap Magician solves that problem.
When you run it, Bitmap Magician asks you to select a font. When you do, it
says that there was an "overflow" with no explanation of what that means.
Next, you learn that you are looking at only a demo version. The dialog
advises you how to order the real thing. When you acknowledge that piece of
good news, the program changes the mouse cursor to an hourglass and leaves it
that way. Most Windows users would think that the system is hung up. Not
really. You can use the hourglass cursor as if it were an arrow. Close the
program, delete its icons from the Program Manager group and proceed to the
Help Magician itself.
The second annoyance is the overall appearance of Help Magician's windows. The
design is an example of a designer gone wild with enthusiasm over 3-D
sculptured controls but without the design skills to know how to use them.
Everything in the application window and all of the menus and dialog boxes are
broadly sculptured. I understand that this is a matter of taste, but I have
never seen anything quite like this. There is no menu bar, only a big, fat,
sculptured tool bar. When you punch it, it pulls down a menu, also sculptured.
The menus use tool buttons. There are the usual File, Edit, Options, Help, and
other menus, but they are all represented by ugly tool buttons. The real tool
bar is sculpted at the bottom of the window. The overall appearance detracts
from the program's functionality.
Help Magician works with its own database format while you compose the help
information. Then it converts to RTF format to run the Help Compiler. It
launches the word processor of your choice but does not provide templates.
Help Magician has its own editor. That's a good idea, but it's not
particularly well implemented. Text that you select for titles and links is
surrounded by vertical bars. The vertical bars come in pairs and have to be
balanced. You cannot distinguish a starting vertical bar from its terminating
vertical bar in a pair. You cannot distinguish two sets of different pairs of
vertical bars. A help topic with centered or justified text, a title, and some
jump links displays with a mélange of pairs of vertical bars. It's hard to
read.
I went through the tutorial process and then tried to add a MIDI sound bite to
the tutorial's help document. Somehow I messed up the database. Somehow the
MIDI insertion upset the balance of the vertical bars in the line of text. I
could not build the RTF file or delete the line. Help Magician stubbornly
issued error messages no matter what I tried. Finally, I deleted the entire
topic, resulting in lost work.
Next I moved to my own project and imported the RTF file that I built using
the manual procedure and Help Author. Without making any changes, I tried to
rebuild the Help Magician database into a new RTF file. Help Magician reported
another unbalanced marker, this time telling me that I could delete it with a
Ctrl+bracket key combination. It didn't tell me that before. I don't know
where the unbalanced marker came from. I looked at the original RTF file, and
everything looked okay. I deleted the unbalanced marker and saved the RTF
file.
Help Magician uses a single-font edit box, and your view of the help text is
completely unlike what it is going to look like in a help window, far more so
than if you are using Word. Centered text is not centered, and margins are not
shown. Those distracting vertical bars are everywhere. To move from topic to
topic, you have to change the page number in an edit box at the bottom of the
screen and press the Enter key.
There is no preview mode. You can test the database, which displays the help
in the same single-font, vertical-bar format and lets you exercise the jumps
and popups. However, to see the real thing, you must compile the entire
database and run WINHELP.
In one place, the RTF import mangled a graphic token. You could see where some
of the RTF protocols were exposed as if they were text. The result was that
the Help Compiler could not find the bitmap file. I was able to fix the token
in the editor, but the graphic lost its text-centered attribute. I found no
way in Help Magician to center or otherwise justify text or tokens. Similarly,
there seems to be no way to set margins other than to launch Word, do it from
there, and import the RTF file again. Not a good idea, given the import
mangling. There were several other places where the import mangled the RTF
file. I had to fix them in the Help Magician editor, and, once again, had no
way to set the margins or control the justification.
I launched Word from Help Magician to look at the saved RTF file. It was
different now. All of the link phrases and their context identifiers were
displayed with a strike-through font and Help Magician had added a bunch of
its own footnotes. Even though I had a copy of Word running, Help Magician
launched a new copy. (Help Author's launch was smart enough to use the copy of
Word that was already running.) The strike-throughs and new footnotes didn't
seem to have hurt anything.
Help Magician launches the Help Compiler, WINHELP, Word, HotShot Editor,
Paintbrush, and the Sound Recorder. It installs Microsoft Video playback
software and shows you how to add audio, video, animation, and MIDI to a help
database. Before you use Help Magician on a real project, however, spend some
time with its tutorial and get a feel for how it works and where the bugs are.
You might like it, and you might not.


RoboHelp


RoboHelp from Blue Sky runs on top of the word processor. The current release
supports Word for Windows only, although Blue Sky plans support for other
packages. It won't be an easy port because the main part of RoboHelp is
implemented as a Word document template with macros written in the WordBasic
programming language. There are some other executable utilities, including one
that launches a RoboHelp project, but you can just as easily launch it
yourself from Word simply by opening a document that includes the RoboHelp
template. This implementation is a dramatic example of what a programmer can
do with WordBasic.
The RoboHelp template modifies Word's menus and tool bar and adds a floating
tool bar that stays on top and to the right of the document while you are
editing. The commands open dialog boxes to establish and change the
characteristics of the help project. You add topics, jumps, popups, graphics,
search keywords, topic titles, and context mnemonics by pressing tool buttons
and filling in the dialogs. RoboHelp manages the document and the help project
file and does not use its own database format; it uses the Word document
format. One tool button writes the RTF file from the Word document. Another
runs the Help Compiler to build the help database.
You can preview a topic by pressing a button, but the preview is not much
better than what Word is already showing you. For example, it doesn't show
graphics. In fact, RoboHelp's topic preview is not as good as viewing the
topic in Word. The preview justifies all of the text in the left margin
regardless of how you have the margins and paragraphs set up. It shows the
graphics-insert tokens just as they appear in the document. It's a feature
they could have left out.
There are other things that I would change. When you open a RoboHelp document,
its Word template defaults the Edit/Find command to search for hidden text,
presumably so you can find the jump and pop-up links. Most of my searches are
for text in the document, which is not hidden, so I have to change the Find
property every time. The "H" tool button that the template added changes
selected text to unhidden, an odd choice for this button. I wouldn't think
you'd need this one very often. A better choice would be to toggle hidden text
into and out of view. Some of the time you want to see the links; other times
you want the text to line up more like it does when WINHELP displays it.
Like Help Magician and unlike Help Author, RoboHelp does not use an existing
running copy of Word; it launches its own. Because RoboHelp is a Word
template, you can see its source code by opening the macros. Furthermore, if
you don't like the behavior that I just described or anything else, you can
modify it by changing the WordBasic code. You could even add your own
adaptation of Help Author's indispensable topic-preview feature.
RoboHelp builds context-identifier files you can include in C++ and Visual
Basic programs. It includes a VBX control that adds a help button to a dialog
box and prompts you for the associated context identifier. A Screen Capture
utility manipulates pictures from the Clipboard. It runs in the background
waiting for you to put some graphics in the clipboard. When you do, it pops up
with an image-processing tool that lets you modify what you captured into a
bitmap for your help database. There is also an icon-composition tool included
with RoboHelp.


Summary


It's not hard to pick a favorite from these alternatives. The manual approach
worked, but was tedious. I prefer it over Help Magician, however, which seems
to be not quite ready for prime time. Help Author is an order of magnitude
better than the manual setup, and RoboHelp is far and away the best tool for
the job that I've seen.
When I started this project, I went looking for "Visual Help." Although I
didn't find exactly that, I am satisfied that there are tools that make the
job easier. I do think, however, that a need for such a product exists. It
would have all of the best features of the three packages discussed here. In
addition, its editor would emulate the WINHELP display--that's the "visual"
part. The tool would use RTF as its native database format and could emulate
the jumps, popups, and multimedia features of WINHELP. Unlike Word, it would
display graphics without taking all day. Such a program would eliminate Help
Compiler until the end of the help document development project. This would be
a valuable product. If you build it, they will come.
For More Information
Microsoft Developer Network CD
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052-6399
800-759-5474

The Windows Help Magician
Software Interphase Inc.
82 Cucumber Hill Rd., #113
Foster, RI 02825
401-397-2340

RoboHelp
Blue Sky Software
7486 La Jolla Blvd., Suite 3
La Jolla, CA 92037-9582
800-677-4946





April, 1994
Algorithms for Directed Graphs


A unique approach using genetic algorithms




Salvatore R. Mangano


Sal is president of Man Machine Interfaces. He can be reached at 555 Broad
Hollow Road, Suite 230, Melville, NY 11747 or on CompuServe at 72053,2032.


Directed graphs underlie any tool that displays a tree diagram,
class-relationship diagram, or entity-relationship diagram. As such, you might
expect a CASE tool to provide an optimized directed-graph drawer. However,
most CASE tools I'm familiar with punt when addressing this problem. Although
an algorithm for drawing a directed graph like that shown in Figure 1 is
straightforward, a general-purpose graph drawer that draws graphs in an
aesthetically pleasing format is difficult to create and computationally
expensive. So, CASE tools usually use a few simple rules to get an initial
layout and then allow the user to clean things up by dragging objects around.
Putting the burden of "pretty drawing" on the user wastes time better spent on
the design.
This article looks at a novel solution to this problem using the emerging
technology of genetic algorithms (GAs). Specifically, I'll use EOS, my
company's C++ GA application framework, and Microsoft's Visual C++ to develop
a module for optimizing the aesthetic layout of directed graphs. I'll create a
Windows-hosted test application that exercises this module on randomly created
graphs. The intent of this article is not to produce a commercial-grade graph
drawer, but rather to demonstrate the use of GA technology on a nontrivial and
unique problem. Since most programmers' first exposure to GAs is usually on a
function-optimization problem, this article provides some insights on the
advanced use of GA techniques.


The Technique


A GA is an algorithm that works with a population of potential solutions to a
problem. Through means analogous to biological natural selection, it evolves
better and better solutions to the problem. To accomplish this, the user of a
GA must first find a way to encode the problem into a string of bits. The bits
are analogous to genes and the strings to chromosomes. The encoding of a
solution as bit strings is often called the "genotype" and the decoded
solution, the "phenotype." The genotype is mapped to the phenotype by the
decoding function. The next step after the encoding is the measure of fitness.
Fitness is one of the core elements that appear in every variation of a GA.
Calculation of fitness involves mapping a solution onto a positive number such
that greater numbers correspond to better solutions. This mapping is
accomplished by the fitness or objective function.
The second core feature of every genetic algorithm is a population of
individuals. At any time during the execution of a genetic algorithm, there
exists a population of candidate solutions to the problem (individuals
consisting of a genotype and a phenotype). The initial population is usually
generated randomly. The process of transforming this initial, mediocre
population into a population containing near-optimal solutions is the heart of
the GA. It proceeds by iterations of the following genetic operators:
selection, reproduction, crossover, and mutation.
Selection is the process by which candidate individuals from the current
generation are chosen to produce the next generation. Selection is a
survival-of-the-fittest strategy. After two individuals are selected, a
weighted coin is flipped to determine if the individuals will mate to produce
two new offspring or simply be placed in the next generation as is. Mating is
accomplished by the crossover operation. The probability of mating is called
the "crossover probability" (pc). The simplest form of crossover, called
"single point," is shown in Figure 2. As each bit is copied from parent to
child, it is subject to mutation based on the mutation probability (pm). Pm is
usually very small, relative to pc. Iterations of selection, reproduction, and
mating are repeated until a new population is created. At this point, the
fitness values of the new population are recalculated, and the process repeats
until either some acceptable solution is found or an upper time limit has been
reached. The GA described above can be summarized by the procedure shown in
Figure 3.
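Stripped of any framework machinery, the two genetic operators just described can be sketched in a few lines of C++. This is a minimal illustration, not EOS code; the function names and the rand()-based coin flip are mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

//Single-point crossover: the child takes the head of parentA up to the cut
//point and the tail of parentB after it. (Figure 2 shows the symmetric case
//producing two children; this sketch produces one.)
std::vector<int> Crossover(const std::vector<int> &parentA,
                           const std::vector<int> &parentB, int cut)
{
    std::vector<int> child(parentA.begin(), parentA.begin() + cut);
    child.insert(child.end(), parentB.begin() + cut, parentB.end());
    return child;
}

//Each bit flips with probability pm, the mutation probability, which is
//typically much smaller than the crossover probability pc.
void Mutate(std::vector<int> &bits, double pm)
{
    for (std::size_t i = 0; i < bits.size(); i++)
        if ((double) std::rand() / RAND_MAX < pm)
            bits[i] = 1 - bits[i];
}
```

In a full GA these two operators run inside the selection loop of Figure 3, once per mating pair.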


The Tools


A GA framework is useful due to the large number of GA variants that can be
produced by altering one or many of the steps in the basic algorithm. GA
researchers have invented several variations on selection, reproduction,
crossover, and mutation. Each variation can be mixed and matched to produce a
unique GA variant. Object orientation turns out to be an ideal technique for
expressing these variations. Through an adept combination of inheritance and
composition, all the GA variants can be expressed. This ultimately allows you
to code a GA using the basic technique and then try variations by
instantiating different classes.
Although EOS consists of over 80 classes, I'll restrict this discussion to a
small relevant subset--TBasicGA, TPopulation, TEnvironment, TBinaryIndividual,
TGenotype, TPhenotype, TBinaryCrossoverFunctor, and TBinaryMutationFunctor.
These bases consist of many derived classes that implement variants of the
basic GA. Other classes exist to implement special-purpose features. Each of
the classes listed encapsulates a different behavior of the overall GA.
TBasicGA is the genetic-algorithm interface class. TPopulation is a collection
class that holds instances of TIndividual. TPopulation encapsulates the
selection operation. TEnvironment encapsulates the GA's parameters--pc, pm,
the random-number generator, and other statistical information.
TBinaryIndividual is an interface class that unifies instances of TGenotype
and TPhenotype into a single object. TBinaryGenotype encapsulates the binary
genetic coding of the problem as strings, and it provides an interface to the
crossover and mutation classes. TPhenotype encapsulates the decoding function
and the fitness function. It is the main class from which you derive to build
a GA-based application. TBinaryCrossoverFunctor and TBinaryMutationFunctor are
classes that encapsulate the operations of crossover and mutation. These
classes are called functors because they are functional objects. Functors are
used so various flavors of crossover and mutation can be plugged in or out of
a genotype without recoding any of the genotype's methods. Figure 4 shows the
relationship between these classes.
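The functor idea can be sketched independently of EOS: mutation becomes a function object the genotype is handed, so swapping mutation flavors never touches the genotype's own methods. The class names below are illustrative, not EOS's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

//Abstract mutation functor: any flavor of mutation is an object that can be
//called like a function on a bit string.
struct MutationFunctor {
    virtual void operator()(std::vector<int> &bits) = 0;
    virtual ~MutationFunctor() {}
};

//One flavor: flip every bit (deterministic, purely for demonstration).
struct FlipAll : MutationFunctor {
    void operator()(std::vector<int> &bits)
    {
        for (std::size_t i = 0; i < bits.size(); i++)
            bits[i] = 1 - bits[i];
    }
};

//The genotype simply invokes whatever functor it was configured with.
void ApplyMutation(std::vector<int> &bits, MutationFunctor &m)
{
    m(bits);
}
```

Plugging in a different derived functor changes the GA variant without recompiling the genotype code, which is exactly the mix-and-match property described above.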


The GA Module


To derive a genetic encoding, I'll formalize the problem we are attempting to
solve. We are given some arbitrary directed graph, as well as a grid where
each cell represents a potential home for the graphical depiction of a node in
the graph. The goal is to find an assignment of nodes to cells such that when
the arcs are drawn, we get an aesthetically pleasing picture. Stating the
problem in this way makes some crucial assumptions that may not be true in a
real situation. First, I assume that the nodes, when drawn, are of equal size.
Second, I assume that once I have assigned nodes to cells, the arcs can easily
be drawn to complete the best possible drawing. (In other words, we need not
optimize the drawing of arcs.) Third, I assume that the nodes are equally
spaced in a grid and not arbitrarily placed on the output screen. I make these
assumptions to simplify the example and the code. A more general solution is
certainly possible using GAs.


Genetic Encoding


If each node in the graph is assigned a sequential number, then the problem
can be viewed as a mapping of each node number onto an (x, y) coordinate in
the grid. The mapping that keeps connected nodes close together and produces
the fewest arc crossings will yield more aesthetically pleasing drawings.
Other domain-dependent criteria may come into play when determining better
drawings, but I ignore this possibility to expedite the solution.
Given the above formalism, the encoding treats the bit string as a series of
(x, y) pairs. The first pair assigns node0 to grid cell (x0, y0). The second
assigns node1 to (x1, y1), and so on. This encoding allows collisions (two or
more nodes are assigned to a single cell), so I need a collision-resolution
procedure. A problem like this often arises in GAs, and there are several
approaches to handling it. Some programmers assign very low fitness values to
illegal genotypes. Others attempt to repair illegal genotypes before decoding
them. Still others create special-purpose crossover and mutation operators
that do not allow illegal genotypes to arise in the first place. In this case,
I'll resolve a collision by searching for the closest empty cell to the one
assigned, according to a fixed procedure. This is similar to a repair
technique, but we are repairing the phenotype instead of the genotype.
Given that I have a graph with N nodes and a grid that is X cells wide and Y
cells high, I can calculate the required length of the bit string using the
equation in Figure 5. If X and Y are not powers of 2, then it is possible that
(x, y) pairs can be encoded such that either x or y is greater than X or Y. I
resolve this problem by always decoding x modulo X and y modulo Y. The
decoding is implemented by a TPhenotype derived class called
CGraphDrawingPheno. This class contains a two-dimensional matrix that will
represent the grid. The genotype will be decoded so that each entry in the
matrix will receive the node number of the node assigned to that grid
position. Empty positions will be assigned a value of 0. The decoding of the
genotype is implemented by the CGraphDrawingPheno::Decode() member function;
see Listings One and Two, page 106. This function uses a reference to a graph
driver class to determine the number of bits in each component in the
encoding. It copies these bits to buffers to be converted to integers by the
utility function AllelesToInt(). "Allele," a term borrowed from biology, refers to
the value expressed by a gene. Once the node, its row, and column in the grid
are decoded, the member function GetNearestEmptyCell() is called to resolve
the possibility of a node already existing at the desired location.
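As a rough illustration of the length calculation and the modulo decoding, here is my own sketch of what Figure 5 expresses, assuming the natural encoding of one (x, y) pair per node with ceil(log2) bits per coordinate; the exact formula used by CalcChromosomeLength() may differ in detail.

```cpp
#include <cassert>

//Bits needed to encode a value in [0, n): the smallest b with 2^b >= n.
int BitsFor(int n)
{
    int bits = 0;
    while ((1 << bits) < n)
        bits++;
    return bits;
}

//One (x, y) pair per node, so the chromosome holds
//numNodes * (bits for X + bits for Y) genes.
int ChromosomeLength(int numNodes, int gridWidth, int gridHeight)
{
    return numNodes * (BitsFor(gridWidth) + BitsFor(gridHeight));
}

//When a grid dimension is not a power of 2, a decoded coordinate can exceed
//the grid, so decode modulo the dimension, as described in the text.
int DecodeCoord(int coded, int dimension)
{
    return coded % dimension;
}
```

For example, 10 nodes on a 5x4 grid need 3 bits per column and 2 bits per row, or 50 bits in all.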


The Fitness Function


Now that I have a way to encode the placement of a graph's nodes on a grid, I
need a technique for evaluating each placement's fitness. There are many ways
to do this, depending on what you consider to be an aesthetically pleasing
layout. When deriving this fitness function, I let intuition guide me in the
initial derivation and then experiment to tweak the function so it works well
for a variety of graphs. The function I ultimately arrived at can be seen in
the CGraphDrawingPheno class's CalcFitness() member function; see Listing Two.
The idea behind this function is to reward genotypes that decode into drawings
where nodes connected by an arc are adjacent or close and to penalize when
nodes are adjacent but not connected. This is done on a node-by-node basis so
the resulting fitness value is a measure of how well nodes of the entire graph
were assigned to grid locations. Notice that I completely ignore arc drawing
for simplicity. The remaining members of CGraphDrawingPheno implement
construction, destruction and copying. I also include some private-utility
functions that encapsulate the testing for adjacency and the calculation of
distance between grid cells. These can be used in experimenting with
variations of the fitness function.
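Reduced to a single pair of nodes, the rule in CalcFitness() looks like the sketch below. The distance-squared cutoff of 4 mirrors Listing Two; the zero-distance guard is mine, added because collision resolution prevents two nodes from sharing a cell.

```cpp
#include <cassert>

//Score one pair of nodes: penalize connected nodes that are far apart by
//the squared distance, penalize unconnected nodes that crowd each other,
//and stay neutral otherwise. The full fitness sums this over all pairs.
double PairScore(bool connected, double distance)
{
    double d2 = distance * distance;
    if (connected && d2 > 4.0)
        return -d2;        //connected but far: penalty grows with distance
    if (!connected && d2 <= 4.0 && d2 > 0.0)
        return -4.0 / d2;  //unconnected but adjacent: small penalty
    return 0.0;
}
```

In Listing Two these per-pair penalties are subtracted from a large base fitness so the total never goes negative.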
I include three other classes in this module: CGAGraphDriver, CGraphDrawerGA,
and CWordMatrix. CGAGraphDriver is an interface class that collects
information (such as number of nodes in the graph and the size of the grid)
before the GA and its associated objects are initialized; see Listings Three
and Four (page 107). A very important function in this class is
CalcChromosomeLength(), which, based on the number of nodes and the size of
the grid, determines the number of bits necessary in a chromosome to encode
the problem. Also included in this class are functions for drawing the
optimized and unoptimized views of the graph. These use Windows-specific
functions, but the logic can be easily ported to other graphics systems.
CGraphDrawerGA is derived from the EOS class TBasicGA. It overrides the
population-creation function and several reporting functions useful for
testing the GA's performance before it is embedded into a larger application.
Listings Five and Six (page 147) show the class declaration and implementation
of CGraphDrawerGA. The important function here is CreatePopulation(). This
function determines the characteristics of the genotype, phenotype, and the
population. I use a two-point crossover-operator genotype (instead of single
point) because this tends to work better with longer chromosomes. I also use a
population technique known as Elitism, which ensures that a certain number (in
our case, two) of the best individuals from the previous generation make it to
the next generation. This improves performance on some types of problems.
The utility class CWordMatrix implements a 2-D matrix of WORDS. Its
implementation makes use of the MFC's CObArray and CWordArray to create an
array of CWordArrays.
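For readers without MFC, the shape of CWordMatrix can be sketched with standard containers. This is an illustration of the interface, not the actual implementation.

```cpp
#include <cassert>
#include <vector>

typedef unsigned short WORD; //16-bit word, as in the Windows headers

//A 2-D matrix of WORDs with the same GetAt/SetAt accessors the listings
//use. The original builds an MFC CObArray of CWordArray rows; a vector of
//vectors gives the equivalent structure portably.
class WordMatrix {
public:
    WordMatrix(int rows, int cols, WORD fill = 0)
        : m_rows(rows), m_cols(cols),
          m_data(rows, std::vector<WORD>(cols, fill)) {}
    WORD GetAt(int row, int col) const { return m_data[row][col]; }
    void SetAt(int row, int col, WORD v) { m_data[row][col] = v; }
    int Rows() const { return m_rows; }
    int Cols() const { return m_cols; }
private:
    int m_rows, m_cols;
    std::vector<std::vector<WORD> > m_data;
};
```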


The Test Program



To test the graph-optimizing GA, I created a simple Windows-hosted application
using Visual C++. I used AppStudio, AppWizard, and ClassWizard to automate the
creation of this program. The program uses two dialog boxes. The first dialog
box allows me to specify the graph logically in terms of the number of nodes
and their connections. The second allows me to specify the bounds of the grid
and trigger the GA optimization of the graph on the grid. I also include
options for displaying the optimized and unoptimized views of the graph. The
optimized view is the best solution found by the GA; the unoptimized view is
an arbitrary drawing of the graph from first node to the last. Due to the
length of the test program, the complete listings for this project are
available electronically; see "Availability," page 3.


Conclusion


Experiments that I have conducted using the GA-based graph-drawing module
demonstrate that GAs present a viable solution to the problem. By improving
the fitness function and by possible use of custom genetic operators, I
believe that this technique will also work in commercial CASE tools.


References


Coplien, James O. Advanced C++: Programming Styles and Idioms. Reading, MA:
Addison-Wesley, 1992.
Goldberg, David E. Genetic Algorithms in Search, Optimization, and Machine
Learning. Reading, MA: Addison-Wesley, 1989.
Michalewicz, Zbigniew. Genetic Algorithms + Data Structures = Evolution
Programs. New York, NY: Springer-Verlag, 1992.
Figure 1: Typical directed graph.
Figure 2: Single-point crossover.
Figure 3: Pseudocode detailing the GA process.
Initialize a random population and measure its fitness.
WHILE (stopping criterion has not been reached)
BEGIN
 WHILE (next generation is not full)
 BEGIN
 Select 2 parents randomly based on fitness.
 IF (Flip(pc)) THEN
 Cross parents (mutating with probability pm)
 and place children in next generation.
 ELSE
 Place parents into next generation untouched.
 END
END

Solution with highest fitness is the answer.

Figure 4: Relationship between classes.
Figure 5: Equation to calculate the required length of a bit string.
[LISTING ONE] (Text begins on page 92.)

//File: GRPHPHEN.H
#ifndef __GRPHPHEN_H
#define __GRPHPHEN_H

//Header for EOS class representing a phenotype.
//You need EOS v1.1 to compile this code
#ifndef __PHENO_H
#include "pheno.h"
#endif //__PHENO_H

class CGraphDrawingPheno : public TPhenotype
{
 public:
 CGraphDrawingPheno(CGAGraphDriver &driver,int width, int height) ;
 ~CGraphDrawingPheno() ;
 double CalcFitness() ;
 void Decode(PTGenotype geno) ;
 PTPhenotype Copy() ;
 void GetPhenoInfo(void *pInfoStruct) ;

 void GetNearestEmptyCell(const int row, const int col, int &actualRow,
 int &actualCol) ;
 BOOL Adjacent(WORD node1, WORD node2) ;
 BOOL Diagonal(WORD node1, WORD node2) ;
 BOOL FindNode(const WORD node, int &row, int &col) ;
 double Distance(WORD node1, WORD node2) ;
 double RectDistance(WORD node1, WORD node2) ;
private:
 int m_Width ;
 int m_Height ;
 CWordMatrix *m_pGrid ; //grid where each entry is a node
 // number or EMPTY_CELL
 CGAGraphDriver &m_Driver ; //interface to the graph driver class
 int * m_GridIndex[2] ; //index into grid to quickly locate nodes
};

[LISTING TWO]

//File: GRPHPHEN.CPP
#include "stdafx.h"
//eos headers
#include "eos.h"
#include "eosutil.h"
#include "geno.h"

//graph GA headers
#include "grphphen.h"
#include "wmatrix.h"
#include "gdriver.h"
#include "grphutil.h"

const int HIGHEST_REWARD = 10 ;
const int MEDIUM_REWARD = 5 ;
const int SMALLEST_REWARD = 1 ;
const int HIGHEST_PENALTY = 10 ;
const int MEDIUM_PENALTY = 5 ;
const int SMALLEST_PENALTY = 1 ;

CGraphDrawingPheno::CGraphDrawingPheno(CGAGraphDriver &driver, int width,
 int height)
 : m_Driver(driver)
{
 m_Width = width ;
 m_Height = height ;
 m_pGrid = new CWordMatrix(height,width,EMPTY_CELL) ;
 m_GridIndex[0] = new int [m_Driver.GetNumNodes()];
 m_GridIndex[1] = new int [m_Driver.GetNumNodes()];
}
CGraphDrawingPheno::~CGraphDrawingPheno()
{
 delete m_pGrid ;
 delete [] m_GridIndex[0] ;
 delete [] m_GridIndex[1] ;
}
double CGraphDrawingPheno::CalcFitness()
{
 WORD numNodes = (WORD) m_Driver.GetNumNodes() ;
 long maxDist = (m_Width + m_Height) ;
 maxDist*=maxDist;

 //set base fitness so even the worst case phenotype
 // will not bring fitness below 0
 int connectivity = m_Driver.GetConnectivity() ;
 double base_fitness = numNodes*(numNodes-1) * maxDist ;
 //* connectivity;
 double fitness = base_fitness ;
 for (WORD node1=0;node1<numNodes;node1++) {
 int node1Connections=Max(m_Driver.GetNumConnections(node1),1);
 for (WORD node2=0;node2<numNodes;node2++) {
 if (node1 == node2)
 continue ;
 BOOL bConnected = m_Driver.Connected(node1,node2) ;
 int node2Connections =
 Max(m_Driver.GetNumConnections(node2),1);
 double distance = Distance(node1,node2) ;
 distance*=distance;
 if (bConnected && distance > 4) {
 fitness -= distance ;
 //(node1Connections+node2Connections) ;
 continue ;
 }
 if (!bConnected && distance <= 4) {
 fitness -= 4/distance ;
 //(node1Connections+node2Connections) ;
 continue ;
 }
 }
 }
 ASSERT(fitness >= 0);
 return fitness ;
}
void CGraphDrawingPheno::Decode(PTGenotype pGeno)
{
 WORD numNodes = (WORD) m_Driver.GetNumNodes() ;
 int rowAlleleLen = m_Driver.CalcRowAlleleLength() ;
 int colAlleleLen = m_Driver.CalcColAlleleLength() ;
 int offset = 0 ;
 for (WORD node=0;node<numNodes;node++) {
 char rowAllele[16], colAllele[16] ;
 //we know that these are no bigger than sizeof(WORD)
 for(int bit=0;bit<rowAlleleLen;bit++)
 rowAllele[bit] =
 pGeno->GetExpressedGeneValue(offset++,0) ;
 for(int bit=0;bit<colAlleleLen;bit++)
 colAllele[bit] =
 pGeno->GetExpressedGeneValue(offset++,0) ;
 int codedRow = AllelesToInt(rowAllele,0, rowAlleleLen-1) ;
 int codedCol = AllelesToInt(colAllele,0, colAlleleLen-1) ;
 int actualRow, actualCol ;
 GetNearestEmptyCell(codedRow,codedCol,actualRow,actualCol) ;
 m_pGrid->SetAt(actualRow, actualCol, node) ;
 m_GridIndex[0][node] = actualRow ;
 m_GridIndex[1][node] = actualCol ;
 }
}
PTPhenotype CGraphDrawingPheno::Copy()
{
 CGraphDrawingPheno * pPheno =
 new CGraphDrawingPheno(m_Driver,m_Width,m_Height) ;

 return pPheno ;
 //don't copy values because these are derived by the genotype via Decode
}
void CGraphDrawingPheno::GetPhenoInfo(void *pInfoStruct)
{
 *((CWordMatrix **)pInfoStruct) = m_pGrid ;
}
//Algorithm resolves collisions by searching around the neighborhood of
// (row,col) in the grid for an empty cell. The row and col of the empty cell
// is returned in actualRow and actualCol.
void CGraphDrawingPheno::GetNearestEmptyCell(const int row, const int col,
 int &actualRow,int &actualCol)
{
 //ensure we are in range!
 actualRow = row % m_Height ;
 actualCol = col % m_Width ;
 //if we find an empty cell then no search necessary
 if (m_pGrid->GetAt(actualRow,actualCol) == EMPTY_CELL)
 return ;
 else { //search for "nearest" empty cell
 int maxDist=Max(m_Height,m_Width) ;
 int actualRow2 = actualRow ; //save actuals
 int actualCol2 = actualCol ;
 //start at a distance of 1 and search outward
 for (int dist=1;dist<maxDist;dist++) {
 //First check "sides"
 for(int i=-dist; i<=dist;i++) {
 for(int j=-dist;j<=dist;j++) {
 if (i!=j && (j==dist || j==-dist ||
 i==dist || i==-dist)) {
 actualCol = actualCol2+j ;
 actualRow = actualRow2+i ;
 if(actualCol >= 0 && actualCol
 < m_Width &&
 actualRow >= 0 && actualRow
 < m_Height &&
 m_pGrid->GetAt(actualRow,actualCol) == EMPTY_CELL)
 return ;
 } //if
 } // for j
 } //for i
 //Now check 4 corner cells
 actualCol = actualCol2+dist ;
 actualRow = actualRow2+dist ;
 if(actualCol < m_Width &&
 actualRow < m_Height &&
 m_pGrid->GetAt(actualRow,actualCol) ==
 EMPTY_CELL)
 return ;
 actualCol = actualCol2-dist ;
 actualRow = actualRow2+dist ;
 if(actualCol >= 0 &&
 actualRow < m_Height &&
 m_pGrid->GetAt(actualRow,actualCol) ==
 EMPTY_CELL)
 return ;
 actualCol = actualCol2+dist ;
 actualRow = actualRow2-dist ;
 if(actualCol < m_Width &&

 actualRow >= 0 &&
 m_pGrid->GetAt(actualRow,actualCol) ==
 EMPTY_CELL)
 return ;
 actualCol = actualCol2-dist ;
 actualRow = actualRow2-dist ;
 if(actualCol >= 0 &&
 actualRow >= 0 &&
 m_pGrid->GetAt(actualRow,actualCol) ==
 EMPTY_CELL)
 return ;
 } //for dist
 } //else
 return ;
}
//Return TRUE if node1 is adjacent to node2 on the grid
BOOL CGraphDrawingPheno::Adjacent(WORD node1, WORD node2)
{
 int row1, col1 ;
 if (!FindNode(node1,row1,col1))
 return FALSE ;
 int row2, col2 ;
 //look up
 row2=row1-1 ;
 if (row2 >= 0 && m_pGrid->GetAt(row2,col1) == node2)
 return TRUE ;
 //look down
 row2=row1+1 ;
 if (row2 < m_Height && m_pGrid->GetAt(row2,col1) == node2)
 return TRUE ;
 //look left
 col2=col1-1 ;
 if (col2 >= 0 && m_pGrid->GetAt(row1,col2) == node2)
 return TRUE ;
 //look right
 col2=col1+1 ;
 if (col2 < m_Width && m_pGrid->GetAt(row1,col2) == node2)
 return TRUE ;
 return FALSE ;
}
//Return TRUE if node1 is diagonal to node2 on the grid
BOOL CGraphDrawingPheno::Diagonal(WORD node1, WORD node2)
{
 int row1, col1 ;
 if (!FindNode(node1,row1,col1))
 return FALSE ;
 int row2, col2 ;
 //look upper left
 row2=row1-1 ;
 col2=col1-1 ;
 if (row2 >= 0 && col2 >= 0 && m_pGrid->GetAt(row2,col2) == node2)
 return TRUE ;
 //look lower left
 row2=row1+1 ;
 col2=col1-1 ;
 if (row2 < m_Height && col2 >= 0 && m_pGrid->GetAt(row2,col2) == node2)
 return TRUE ;
 //look lower right
 row2=row1+1 ;
 col2=col1+1 ;
 if (row2 < m_Height && col2 < m_Width && m_pGrid->GetAt(row2,col2) ==
 node2)
 return TRUE ;
 //look upper right
 row2=row1-1 ;
 col2=col1+1 ;
 if (row2 >= 0 && col2 < m_Width && m_pGrid->GetAt(row2,col2) == node2)
 return TRUE ;
 return FALSE ;
 return FALSE ;
}
//Return the Euclidean distance between nodes on the grid
double CGraphDrawingPheno::Distance(WORD node1, WORD node2)
{
 int row1, col1, row2, col2 ;
 if (FindNode(node1,row1,col1) && FindNode(node2,row2,col2)) {
 double diffRow = row1 - row2 ;
 double diffCol = col1 - col2 ;
 return sqrt(diffRow*diffRow + diffCol*diffCol) ;
 }
 else
 return sqrt(m_Height*m_Height + m_Width*m_Width) ;
}
//Return the rectilinear distance between nodes on the grid
double CGraphDrawingPheno::RectDistance(WORD node1, WORD node2)
{
 int row1, col1, row2, col2 ;
 if (FindNode(node1,row1,col1) && FindNode(node2,row2,col2)) {
 double diffRow = row1 - row2 ;
 double diffCol = col1 - col2 ;
 return Abs(diffRow) + Abs(diffCol) ;
 }
 else
 return m_Height + m_Width ; //really an error ?!?
}
//Use an index to quickly locate a node on the grid
BOOL CGraphDrawingPheno::FindNode(const WORD node, int &row, int &col)
{
 if (node >= m_Driver.GetNumNodes())
 return FALSE ;
 row = m_GridIndex[0][node] ;
 col = m_GridIndex[1][node] ;
 return TRUE ;
}

[LISTING THREE]

//File: GDRIVER.H
#ifndef __GDRIVER_H__
#define __GDRIVER_H__
//flag an empty cell in the grid
const WORD EMPTY_CELL = 0xFFFF ;

class CGAGraphDriver
{
 //Interface
public:
 CGAGraphDriver(int numNodes, int width, int height) ;
 ~CGAGraphDriver() ;

 void SetGraph(CWordMatrix &graph) ;
 void Optimize(int numGenerations) ;
 void DrawOptimized(CDC &dc) ;
 void DrawUnOptimized(CDC &dc) ;
 //Query members (const)
 //Calc the length of a chromosome
 //needed based on the graph and grid
 UINT CalcChromosomeLength() const ;
 UINT CalcRowAlleleLength() const ;
 UINT CalcColAlleleLength() const ;
 int GetWidth() const ;
 int GetHeight() const ;
 int GetNumNodes() const ;
 BOOL Connected(WORD node1, WORD node2) const;
 int GetNumConnections(WORD node) const ;
 int GetConnectivity() ;
 void Stop() ;
 PTIndividual m_pBest ;
 PTIndividual m_pWorst ;
 BOOL m_Stop ;
 //Implementation
private:
 //Draw the graph in this grid
 void Draw(CDC &dc, CWordMatrix &Grid) ;
 //num nodes in the graph
 int m_NumGraphNodes ;
 //width of grid to draw on (in cells)
 int m_GridWidth ;
 //height of grid to draw on (in cells)
 int m_GridHeight ;
 //connection table representation of a graph
 CWordMatrix *m_pGraph ;
 //GA that will find the "optimal" drawing
 //of the graph on the grid
 TBasicGA *m_pTheGA ;
} ;

[LISTING FOUR]

//File: GDRIVER.CPP
//Used as an interface class to the GA.
//Stores the representation of the graph as
//a connection grid.

//required headers
#include "stdafx.h"

//Headers needed for EOS programs
//You need EOS v1.1 to compile this code
#include "eos.h"
#include "eosutil.h"
#include "geno.h"
#include "individ.h"
#include "gaenviro.h"

//headers specific to graph GA
#include "wmatrix.h"
#include "gdriver.h"
#include "grphutil.h"

#include "graphga.h"

//GA parameters used, these need not be
//hard coded in advanced implementations
const int POP_SIZE = 20 ;
const double PX = 0.7 ;
const double PM = 0.03 ;
const double RAND_SEED=0.76451 ;

//DRAWING parameters used, these need not be
//hard coded in advanced implementations
const int CELL_WIDTH = 30 ;
const int CELL_HEIGHT = 30 ;
const int CELL_SPACE = 30 ;

//Driver constructor initializes a graph with numNodes and a
//grid that the graph will be optimized to draw on (width x height)
CGAGraphDriver::CGAGraphDriver(int numNodes, int width, int height)
{
 m_NumGraphNodes = numNodes;
 m_GridWidth = width ;
 m_GridHeight = height ;
 //graph represented as boolean connection matrix
 m_pGraph = new CWordMatrix(m_NumGraphNodes,m_NumGraphNodes) ;
 //The Graph GA object
 m_pTheGA = new CGraphDrawerGA(*this) ;
 m_pBest = NULL ;
 m_pWorst = NULL ;
 m_Stop = FALSE ;
}
//Clean up in the destructor
CGAGraphDriver::~CGAGraphDriver()
{
 delete m_pGraph ;
 delete m_pTheGA ;
}
//set the connections from graph into the member m_pGraph
void CGAGraphDriver::SetGraph(CWordMatrix &graph)
{
 for (int row = 0 ; row < m_NumGraphNodes; row++)
 for (int col = 0 ; col < m_NumGraphNodes; col++)
 m_pGraph->SetAt(row,col,graph[row][col]) ;
}
// Optimize the drawing of the graph by first initializing the GA's population
// and environment. Then execute the GA for numGenerations generations
void CGAGraphDriver::Optimize(int numGenerations)
{
 m_pTheGA->CreatePopulation(POP_SIZE) ;
 m_pTheGA->CreateEnvironment(PX,PM,RAND_SEED) ;
 m_pTheGA->Evolve(numGenerations) ;
}
//Draw the optimized graph on the Windows DC
void CGAGraphDriver::DrawOptimized(CDC &dc)
{
 CWordMatrix *pGrid ;
 m_pBest->GetPhenoInfo(&pGrid) ;
 Draw(dc,*pGrid) ;
}
//Draw the un-optimized graph on the Windows DC
void CGAGraphDriver::DrawUnOptimized(CDC &dc)
{
 CWordMatrix *pGrid ;
 m_pWorst->GetPhenoInfo(&pGrid) ;
 Draw(dc,*pGrid) ;
}
void CGAGraphDriver::Draw(CDC &dc, CWordMatrix &Grid)
{
 CPen *pPen = (CPen *) dc.SelectStockObject(BLACK_PEN) ;
 for (int row = 0 ; row < m_GridHeight; row++)
 for (int col = 0 ; col < m_GridWidth; col++) {
 if (Grid[row][col] != EMPTY_CELL) {
 int x1 = col * (CELL_WIDTH + CELL_SPACE) + CELL_SPACE ;
 int x2 = x1 + CELL_WIDTH ;
 int y1 = row * (CELL_HEIGHT + CELL_SPACE) + CELL_SPACE ;
 int y2 = y1 + CELL_HEIGHT ;
 dc.Ellipse(x1,y1,x2,y2) ;
 char buffer[12] ;
 sprintf(buffer,"%d",Grid[row][col]) ;
 dc.TextOut(x1+CELL_WIDTH/4,y1+CELL_HEIGHT/4,buffer,
 strlen(buffer)) ;
 }
 }
 //draw arcs
 for (int node1 = 0 ; node1 < m_NumGraphNodes; node1++)
 for (int node2 = 0 ; node2 < m_NumGraphNodes; node2++)
 if (m_pGraph->GetAt(node1,node2)) {
 int row1, col1 ;
 Grid.Find(node1, row1, col1) ;
 int row2, col2 ;
 Grid.Find(node2, row2, col2) ;
 int x1 = col1 * (CELL_WIDTH + CELL_SPACE) + CELL_SPACE ;
 int x2 = col2 * (CELL_WIDTH + CELL_SPACE) + CELL_SPACE ;
 int y1 = row1 * (CELL_HEIGHT + CELL_SPACE) + CELL_SPACE ;
 int y2 = row2 * (CELL_HEIGHT + CELL_SPACE) + CELL_SPACE ;
 if (x1 < x2)
 x1 += CELL_WIDTH ;
 else
 if (x2 < x1)
 x2 += CELL_WIDTH ;
 else
 if (x1 == x2) {
 if (Abs(row1 - row2) > 1) { //route around!
 y1 += CELL_HEIGHT/2 ;
 y2 += CELL_HEIGHT/2 ;
 int x3 = x1 - CELL_WIDTH/2 ;
 dc.MoveTo(x1,y1) ;
 dc.LineTo(x3,y1) ;
 dc.LineTo(x3,y2) ;
 dc.LineTo(x2,y2) ;
 continue ;
 }
 x1 += CELL_WIDTH/2 ;
 x2 += CELL_WIDTH/2 ;
 }
 if (y1 < y2)
 y1 += CELL_HEIGHT ;
 else
 if (y2 < y1)
 y2 += CELL_HEIGHT ;
 else
 if (y1 == y2) {
 if (Abs(col1 - col2) > 1) { //route around!
 if (x1 < x2) {
 x1 -= CELL_WIDTH/2 ;
 x2 += CELL_WIDTH/2 ;
 }
 else {
 x1 += CELL_WIDTH/2 ;
 x2 -= CELL_WIDTH/2 ;
 }
 int y3 = y1 - CELL_HEIGHT/2 ;
 dc.MoveTo(x1,y1) ;
 dc.LineTo(x1,y3) ;
 dc.LineTo(x2,y3) ;
 dc.LineTo(x2,y2) ;
 continue ;
 }
 y1 += CELL_HEIGHT/2 ;
 y2 += CELL_HEIGHT/2 ;
 }

 dc.MoveTo(x1,y1) ;
 dc.LineTo(x2,y2) ;
 }

 dc.SelectObject(pPen) ;
}
//Calculate the length of the chromosome needed to encode
//a drawing of the graph in a grid
UINT CGAGraphDriver::CalcChromosomeLength() const
{
 return m_NumGraphNodes*(GetNumBitsToEncode(m_GridHeight) +
 GetNumBitsToEncode(m_GridWidth));
}
UINT CGAGraphDriver::CalcRowAlleleLength() const
{
 return (UINT) GetNumBitsToEncode(m_GridWidth) ;
}

UINT CGAGraphDriver::CalcColAlleleLength() const
{
 return (UINT) GetNumBitsToEncode(m_GridHeight) ;
}
//Return TRUE if node1 is connected to node2
BOOL CGAGraphDriver::Connected(WORD node1, WORD node2) const
{
 return m_pGraph->GetAt(node1,node2) ;
}
//Returns the number of connections leaving a node
int CGAGraphDriver::GetNumConnections(WORD node) const
{
 int count = 0 ;
 for (WORD i=0;i<m_NumGraphNodes;i++)
 if (i != node && m_pGraph->GetAt(node,i))
 count++ ;
 return count ;
}

//Returns the total number of connections in the graph
int CGAGraphDriver::GetConnectivity()
{
 int count = 0 ;
 for (WORD node1=0;node1<m_NumGraphNodes;node1++)
 for (WORD node2=0;node2<m_NumGraphNodes;node2++)
 if (node1 != node2 && m_pGraph->GetAt(node1,node2))
 count ++ ;
 return count ;
}
void CGAGraphDriver::Stop()
{
 m_Stop = TRUE ;
}

[LISTING FIVE]
//File: GRAPHGA.H
#ifndef __GRAPHGA_H__
#define __GRAPHGA_H__

//Headers needed for EOS programs
//You need EOS v1.1 to compile this code
#ifndef __BASICGA_H
#include "basicga.h"
#endif

class CGraphDrawerGA : public TBasicGA
{
public:
 CGraphDrawerGA(CGAGraphDriver &driver) ;
 void CreatePopulation(long size, PTIndividual prototype = NULL) ;
 void ExitReport() ;
private:
 BOOL Stop() ;
 void InterGeneration(ulong, PTIndividual, PTIndividual, PTIndividual,
 PTIndividual) ;
 CGAGraphDriver & m_Driver ;
};
#endif

[LISTING SIX]

//File: GRAPHGA.CPP

#include "stdafx.h"

//Headers needed for EOS programs
//You need EOS v1.1 to compile this code
#include "eos.h"
#include "geno.h"
#include "basicga.h"
#include "nptxgeno.h"
#include "genrepop.h"
#include "gaenviro.h"

//headers specific to graph GA
#include "gdriver.h"
#include "graphga.h"
#include "graphind.h"

#include "grphphen.h"
#include "wmatrix.h"

CGraphDrawerGA::CGraphDrawerGA(CGAGraphDriver &driver)
 : m_Driver(driver)
{
}
//Create the population of individuals
//We use 2 Point Crossover and Elitism
void CGraphDrawerGA::CreatePopulation(long size, PTIndividual prototype)
{
 //Create a genotype with 1 chromosome and 2 point crossover
 //The graph driver is queried to determine the chromosome length
 PTNPtCrossGenotype pGeno =
 new TNPtCrossGenotype(m_Driver.CalcChromosomeLength(),1,2) ;
 CGraphDrawingPheno * pPheno =
 new CGraphDrawingPheno(m_Driver,m_Driver.GetWidth(),
 m_Driver.GetHeight()) ;
 CGraphDrawingInd indiv(pGeno,pPheno);
 m_pPopulation = new TGenReplacePopulation(size,&indiv) ;
 m_pPopulation->SetElitism(2) ;
}
//When the GA is done set the best and worst individuals in the driver
void CGraphDrawerGA::ExitReport()
{
 m_Driver.m_pBest = m_pEnvironment->GlobalFittestIndivid ;
 m_Driver.m_pWorst = m_pEnvironment->GlobalWorstIndivid ;
}
//allow for windows processing!
void CGraphDrawerGA::InterGeneration(ulong, PTIndividual, PTIndividual,
 PTIndividual, PTIndividual)
{
 MSG msg ;
 //while there are msgs for status window
 while (PeekMessage(&msg,AfxGetApp()->m_pMainWnd->
 m_hWnd,0,0,PM_REMOVE)) {
 TranslateMessage(&msg) ;
 DispatchMessage(&msg) ;
 }
 SetCursor(LoadCursor(NULL, IDC_WAIT));
}
//GA calls this function to determine if it should stop
BOOL CGraphDrawerGA::Stop()
{
 return m_Driver.m_Stop ;
}
End Listings
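The listings call GetNumBitsToEncode, declared in GRPHUTIL.H, which isn't reproduced here. A plausible stand-in, assuming it returns the number of bits needed to represent any coordinate in the range 0..n-1, might look like this:

```cpp
// Hypothetical stand-in for GRPHUTIL's GetNumBitsToEncode:
// the number of bits needed to hold any value in 0..n-1.
unsigned GetNumBitsToEncode(unsigned n)
{
    unsigned bits = 0;
    // Count how many shifts it takes to clear the largest value, n-1.
    for (unsigned v = (n > 0 ? n - 1 : 0); v > 0; v >>= 1)
        ++bits;
    return bits > 0 ? bits : 1;   // always allocate at least one bit
}
```

With a 16x16 grid, for example, this yields 4 bits per row allele and 4 per column allele, so CalcChromosomeLength gives 8 bits per graph node, consistent with the formula in CalcChromosomeLength above.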















April, 1994
PROGRAMMING PARADIGMS


A Little RISC Lands Apple in the Soup




Michael Swaine


In the early 1980s, the British microcomputer market was dominated by British
companies, primarily Sinclair and Acorn.
It was an unlikely scenario.
The microcomputer revolution was by this time becoming institutionalized.
What, only a few years before, had been a marginal market of electronics
hobbyists selling to other electronics hobbyists had become a
venture-capital-attracting international industry. IBM had come in and
legitimized the industry, was the commonly heard--and true, even if
incomplete--explanation.
All the early shots in this revolution had been fired in the United States,
and all the big companies--no surprise--were U.S. companies, some of which had
established manufacturing facilities in Europe. The European market, taken as
a whole, was only a fraction of the U.S. market. The British market was a
fraction of that fraction, and, unlike some European countries, Britain didn't
have high tariffs to keep out American computers. By all logic, American
computer companies should have been able to walk all over the homegrown
brands.
But that's not what happened. British computer companies were bucking the odds
and winning. What was going on?


Who Were These Guys?


One of the things that stands out when you look at the British microcomputer
scene in those days is the Cambridge connection. Sinclair and Acorn had
Cambridge University connections in common, and Acorn in particular maintained
close ties with the university, drawing on it for personnel, ideas, and
support. Cambridge may have been one strength of these companies.
But Sinclair and Acorn differed in many ways. For one thing, Clive Sinclair
went for the high-concept products: The World's Cheapest Computer, The First
Practical Electric Car. The Acorn crew were less flamboyant. They just built a
computer.
The Sinclair computer was one of the first users of the Zilog Z80, arguably
the first microprocessor created specifically to be the CPU of a personal
computer. Arguably. The Acorn used a chip originally intended for controller
use: the Rockwell 6502. The Acorn developers got to be experts in the 6502,
just as Apple cofounder Steve Wozniak did.
Clive Sinclair, like Nolan Bushnell in the United States, founded several
companies, explored diverse industries, and had flashes of high visibility;
Sinclair, though, has been off American radar for years. The Acorn team
prospered with less abrupt ups and downs and has significant visibility today.
It was the BBC deal that made their fortune.
The British Broadcasting Corporation had decided to launch a computer-education
television show that would run throughout the UK, and it wanted a BBC
microcomputer to sell to viewers of the show. It was a savvy plan, and when
Acorn got the BBC contract, both Acorn and the BBC thought that they could
sell over ten thousand computers despite the small size of the nascent British
market.
To date, Acorn has sold nearly two million BBC Micro-compatibles, and the
company has grown from a typical microcomputer company of the early '80s with
a staff of a couple dozen to a multimillion-pound company with hundreds of
employees.
When it came time, in the mid-1980s, to admit that the 6502 had had its day,
the Acorn guys did something telling. Rather than accept the conventional
wisdom about the "right" microprocessor for the next generation of computers,
they fell back on their expertise, or perhaps just their old habits. They
designed their own.
What they came up with was the kind of chip you might expect old 6502 hackers
to design: a small instruction set, low power consumption, small die size,
potentially low cost. It may have been of only academic interest to them that
these are now the characteristics of low-end RISC chips. They weren't trying
to develop the first commercial RISC processor. They just wanted a better
6502.
What they came up with was the Acorn RISC Machine, or ARM. The first ARM chip
was shown fully functional in April of 1985. It operated reliably at 8 MHz,
although it was designed to operate with a 4-MHz clock. It was a 3-micron
device of about 25,000 transistors. Initially, the ARM1 was offered as a
coprocessor in the BBC computer. The second-generation ARM2 was used by Radius in one of its
first graphics accelerator cards for the Macintosh. The ARM2 also saw service
in the movies, being used in the robotic controller from MicroRobotics of
Cambridge, England, that controlled the robot turtles in the movie Teenage
Mutant Ninja Turtles.


Meanwhile, Back in the Colonies...


Apple formed its Advanced Technology Group (ATG) in 1986. At that time Acorn,
facing competitive pressures from clones, had just been acquired by Olivetti
and was soon to release its first ARM-based computer, the Archimedes, to a
lukewarm response. Apple's ATG was chartered to explore new technologies that
could be of use to Apple in the '90s. One technology that ATG evaluated and
took note of for possible inclusion in Apple products was Acorn's ARM
processor, but nothing was done with the ARM at the time.
Somewhat later, a skunkworks within ATG called the Advanced Products Group
(APG) took on the mission of developing a new system architecture that they
were calling Newton. The trip to Newton had a lot of side trips and blind
alleys. It was apparently Michael Chao's Knowledge Navigator pitch to John
Sculley that tipped the balance from a tablet form factor to the hand-held
device that Apple eventually released.
One of the other alleys explored involved the microprocessor. For some time
the AT&T Hobbit chip was considered. What they were looking for was a
processor with characteristics that sounded like those of a microcontroller
rather than a computer CPU: small die size, low cost, low power consumption,
instruction set efficiency, ease of embedding in ASIC designs. In 1990, RISC
looked promising, and ARM looked particularly good.
To ensure that future ARM processors would fit Apple's evolving needs, Apple
made a deal. It was an early example of the joint ventures that Apple
continues to pursue today. Apple UK joined forces with Acorn and VLSI
Technology, with whom Acorn had worked in producing the first ARM chips, to
form ARM Ltd.
ARM Ltd.'s ARM 610 became the processor for the first Newton devices, the
Apple MessagePad and Sharp ExpertPad. (ARM6 devices like the ARM 610 really
represent the fourth generation of ARM devices; apparently the numbering
skipped 4 and 5.)
ARM was on a roll. In 1992, 3DO announced that the ARM60 would be used in its
Interactive Multiplayer. ARM6 devices are also seeing use in controller
applications, such as fuzzy-logic controllers.
The ARM6 family embodies full 32-bit addressing and support for both
big-endian and little-endian byte ordering, a requirement imposed by Apple. The
ARM610 includes a 4-Kbyte cache, a write buffer, and an MMU, all in a package
smaller than a 386. The MMU implements memory domains and permissions designed
to provide hardware support for modern operating-system memory-management
strategies like multilevel memory protection, memory paging, demand-paged
virtual memory, and object-oriented memory with background garbage collection.
The last of these turns out to be crucial to the Newton model for object
storage.
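Since the ARM6 can run with either byte ordering, software that exchanges binary data with it has to care which order the host uses. As an aside, a portable C++ program can probe its own byte order like this (a generic sketch, nothing ARM-specific):

```cpp
#include <cstdint>

// Probe the host's byte order by examining the first byte in memory
// of a known 32-bit pattern.
bool IsLittleEndian()
{
    std::uint32_t probe = 0x01020304u;
    return *reinterpret_cast<const unsigned char *>(&probe) == 0x04;
}

bool IsBigEndian()
{
    std::uint32_t probe = 0x01020304u;
    return *reinterpret_cast<const unsigned char *>(&probe) == 0x01;
}
```

On a little-endian host the least-significant byte (0x04) comes first in memory; on a big-endian host the most-significant byte (0x01) does.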
The rest of this column looks at some of the characteristics of that model.


A Little Selfishness


Newton's model of object-oriented technology is reported to be related to
SELF, an object-oriented dynamic language developed by Randall Smith and David
Ungar at Stanford University about the time the ARM1 chip was seeing first silicon.
NewtonScript is not SELF, though, or Dylan, or any other language. It has some
unique characteristics.
One characteristic that NewtonScript does share with SELF is the "everything
is an object" approach. The SELF model is unusual among object-oriented
languages in that it isn't built around classes. The slogan "everything is an
object" means that objects inherit directly from other "prototype" objects, as
distinct from the more familiar class-based inheritance.
Newton's object-oriented language, NewtonScript, diverges from SELF in many
ways, but has much the same spirit. It has prototype inheritance, as well as
"parent" inheritance. But not everything is an object to NewtonScript. Chunks
of data that can fit into 32 bits (integers, characters, Boolean values) are
addressed via immediate reference, while everything else is a pointer
reference. All these pointer-referenced data are stored in the heap as, yes,
objects. Some object-data types are: symbols, reals, arrays, strings, and
frames. The most important type of object in the Newton object-storage model
is the frame.
A frame is a data structure containing named references to objects of
arbitrary data type. It's much like a struct or record in other languages. A
frame can also contain functions.
Example 1 is a typical NewtonScript frame. Frames in NewtonScript are
delimited by braces ({}). The named data items within a frame are called
"slots." Each slot is specified by its name, a colon, and its value. The slots
are separated from one another by commas. Example 1 shows a _proto slot (more
about this shortly), an integer constant slot, a Boolean constant slot, a
string constant slot, a function slot (this is how methods are implemented in
NewtonScript), and a slot that is itself a frame.
The _proto slot indicates one of the modes of inheritance, prototype
inheritance. To establish that frame 2 inherits in this way from frame 1, you
give frame 2 a _proto slot and give that slot a reference to frame 1 as its
value. Frame 1 is then frame 2's prototype. Frame 2 can use (inherit) slots of
frame 1, can override them with its own slot declarations, and can have
additional slots that frame 1 doesn't have. Since functions can appear in
frame slots, functions can also be overridden and inherited in this same way.
By the way, to send that method exampleFunction as a message to the frame
exampleFrame, the syntax is exampleFrame : exampleFunction.
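The lookup rule behind _proto inheritance can be mimicked in ordinary C++. This is a hypothetical sketch of the idea, not Newton's implementation; the Frame struct and GetSlot name are invented for illustration:

```cpp
#include <map>
#include <string>

// A frame is a slot table plus a pointer to its prototype; slot lookup
// walks the _proto chain until a match is found.
struct Frame
{
    std::map<std::string, int> slots;   // slot name -> value (ints only, for brevity)
    const Frame *proto = nullptr;       // the _proto link, or nullptr

    bool GetSlot(const std::string &name, int &value) const
    {
        for (const Frame *f = this; f != nullptr; f = f->proto) {
            auto it = f->slots.find(name);
            if (it != f->slots.end()) {
                value = it->second;
                return true;
            }
        }
        return false;   // slot not found anywhere on the chain
    }
};
```

A frame that declares its own slot of a given name shadows the prototype's version, which is exactly how overriding works in the prototype model described above.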
A couple of points will indicate how you work with this kind of inheritance:
Inheritance is by reference, and prototypes can be in ROM, which can't be
written to. The implication is that if there is any chance that a slot you
intend to modify resides in a ROM prototype, you should declare that slot in
frame 2, even though it is declared in frame 1 and would otherwise be
inherited from it.
In fact, the whole Newton user interface essentially resides in prototypes in
ROM, and you can use them as the prototypes for components of your
applications. Simple Newton applications can be developed without any actual
coding by using visual programming tools in the Newton Toolkit (NTK). These
tools mainly facilitate this process of using ROM prototypes as the prototypes
for components of your application. More complex applications will require
some actual coding, of course, and it should be noted that only the
user-interface elements can be used in this way. The rest of your app has to
be built the hard way.



Look for the Union Label


To understand how Newton stores object data, you need to know about stores,
soups, and entries.
Newton objects can, at least for the current devices, reside in one of two
places: in memory (ROM or RAM) or on a PCMCIA card. The memory and the card
are called "stores." Other stores may be available on future Newton devices.
Stores contain collections of data called "soups." All the data in a store are
in soups, and a store can hold many soups. If a store is like a volume, a soup
is like a database on the volume.
Soups are made up of "entries." An entry is a frame. If a soup is like a
database, an entry is like a record.
This model--physical stores containing soups made up of entries, and entries
that are struct-like frames of object data--shows that Newton objects
basically reside on Newton's physical storage devices, but it creates a false
impression.
That's because it isn't the simple soups that matter most in Newton software
development, but cross-store collections called "union soups." Union soups
seamlessly merge data from soups of different stores. If programmers use union
soups rather than soups, then users can always decide where they want their
data stored. In a machine with less than 200K of user-available RAM, you can
be sure that's an issue. The moral for Newton developers: Use union soups.
Naturally, there's an exception to this rule. Preferences are stored in the
System soup in ROM only. Every application adds at least one entry to this
soup, which is not a union soup.
All existing soups (the "names" soup used by the bundled Names application,
for example) are available to your application, and you are encouraged to use
them. You can add your own data to these existing soups by adding a slot. To
avoid conflicts, Apple encourages you to add just one slot, using your
appSymbol as the slot's name.
Note the distinction: Adding an entry to a soup is like adding a record to a
database. Adding a slot is like adding a field.
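The record/field analogy can be made concrete with a toy model. This is a hypothetical C++ sketch for illustration only, not a picture of Newton's actual storage:

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model: an entry is a frame of named slots; a soup is a list of entries.
typedef std::map<std::string, std::string> Entry;
typedef std::vector<Entry> Soup;

// Adding an entry is like adding a record to a database.
void AddEntry(Soup &soup, const Entry &entry)
{
    soup.push_back(entry);
}

// Adding a slot to an existing entry is like adding a field.
void AddSlot(Entry &entry, const std::string &slotName, const std::string &value)
{
    entry[slotName] = value;
}
```

Note that AddSlot touches one entry, while a real field addition in a database would affect every record; in the soup model, entries are free to differ in the slots they carry.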


Soup Management


Besides automatic garbage collection, Newton provides a lot of built-in data
management. Soups automatically maintain indexes of their entries. You specify
these indexes when you create a soup, but indexes can be added and removed
dynamically. Currently, the only kind of index supported is "slot," but future
versions of NewtonScript may support others. Using a slot index means that the
index key is the value of a particular slot that appears in each entry.
The function theStore : createSoup ( soupNameString, indexArray ) creates a
soup of the specified name in the store named theStore. IndexArray is a frame
describing the initial index(es) you are creating for the store. You don't
have to create any, since indexes can be added later. Soups can contain any
mishmash of entries, but unless all entries have at least one slot in common,
it won't be possible to specify an index that lets you search the whole soup.
Some points on managing soup entries: When you add an entry to a soup, you
actually add the transitive closure of the entry. Altering an entry doesn't
update the store; you need to call EntryChange. The Newton operating system
calls EntryChange every so often when idle, but applications will typically
have to know when to call EntryChange themselves. The only way to get at the
entries in a soup is via a "query." A query can use an index, or some other
kind of search, like searching all string slots in all entries for a specified
search string. A query returns a set of entries, and these entries are then
accessed through an object called a "cursor."
A cursor is a pointer to one of the entries in this returned set. The cursor
is advanced to the next entry in the set or otherwise repositioned by sending
it messages.
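The query-then-cursor pattern the paragraphs above describe maps naturally onto an iterator. Here is a hedged C++ analogy; the Entry/Soup typedefs and the Query and Cursor names are mine, not the Newton API:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> Entry;
typedef std::vector<Entry> Soup;

// A "cursor" over a query's result set: it points at one entry at a
// time and is repositioned by sending it messages (plain calls here).
struct Cursor
{
    std::vector<const Entry *> hits;
    std::size_t pos = 0;

    const Entry *Current() const { return pos < hits.size() ? hits[pos] : nullptr; }
    void Next() { if (pos < hits.size()) ++pos; }
};

// A "query" keyed on one slot: collect every entry whose slot value
// matches, the way a slot index would be consulted.
Cursor Query(const Soup &soup, const std::string &slot, const std::string &key)
{
    Cursor c;
    for (std::size_t i = 0; i < soup.size(); ++i) {
        Entry::const_iterator it = soup[i].find(slot);
        if (it != soup[i].end() && it->second == key)
            c.hits.push_back(&soup[i]);
    }
    return c;
}
```

Current() returning a null pointer plays the role of the cursor running off the end of the result set.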
The Newton approach to handling persistent-object data has some distinctive
and, I think, interesting characteristics. I suspect I'll have more to say
about it in future columns.

Example 1: A NewtonScript frame.
exampleFrame := {
 _proto: protoFrame,
 index: 1,
 active: TRUE,
 name: "Name of Frame",
 exampleFunction:
 func(param)
 begin
 return param * 10;
 end,
 otherFrame:
 { owner: "Mike Swaine",
 ownerAddress: "72511,172" }
 } ;























April, 1994
C PROGRAMMING


Borland Nonsense: Ready, Aim, Shoot!




Al Stevens


The marksman: Borland. The target: Borland's foot. The weapon: Borland C++
4.0's No-Nonsense License Statement.
Read the saga of how a company, known far and wide as the software developer's
friend, dropped its guard, let its lawyers rewrite its no-nonsense license
statement, and plugged itself squarely in the pedal extremity.
Our story begins with the patent insanity. Unbeknownst to us, Borland holds a
patent on their VROOMM overlay technology, and they have several other
software patents pending. Those patents, when granted, will cover algorithms
that are implemented within their libraries, DLLs, database engines, and other
redistributable modules. In theory, when you build a program with their
compiler, the executable code will contain algorithms covered by a Borland
patent. Setting aside the question of the validity of software patents in
general, the result is that you are distributing a program made with patented
components. By law, you need a license from Borland to distribute those
components.
Licenses can be obtained in many ways. You can pay a one-time fee for an
unlimited license. You can pay a per-copy royalty. The holder can give you a
royalty-free license. You can exchange patent licenses. Or you can be denied
the license. If the patent holder does not want any competition, or does not
want you in business for some reason, they can refuse to grant you a license.
You would need to find another way to write your program.
Traditionally, Borland and other compiler vendors include this grant in the
license conditions with which you tacitly agree when you break the seal and
use the product.


The Borland Dilemma


Prior to version 4.0, Borland's C++ no-nonsense license statement made no
mention of patents. It granted to each registered user a license to distribute
compiled programs without additional fees being charged. But someone at
Borland saw something wrong with that. They reasoned that a major competitor
could use Borland technology to build competing tools and applications.
As the self-professed dominant vendor of tools and applications, Borland found
itself facing an internal conflict of agendas. The languages division wants to
provide software developers with the best software development technology. The
applications folks want to maintain dominance in a marketplace where
competitors can use those superior Borland tools.
As one Borland spokesman put it, Microsoft could buy one copy of Turbo C++ for
$99.00 and receive unlimited use of the patented VROOMM technology in
applications that would then compete with Borland applications. Borland wanted
to keep the competition from using its patented technology against it and
continue at the same time to be responsive to the needs of its language
customers. It was the old cliché about having your cake and eating it, too,
which is what Borland tried to do. But what it came up with was met by an
overwhelming firestorm of user reaction.
What lit the fire? Well, in times past, you could distribute as many copies of
programs as you wanted. Under the terms of the new no-nonsense license
statement, you could distribute only up to 10,000 copies per year of your
Borland-compiled application. To distribute more copies than that, you would
have to get Borland's permission. The reasoning behind this peculiar
condition, as spokespeople explained later, is that only large competitors are
likely to be selling more than 10,000 copies per year.


D-Flat Gets a License


I wanted to learn more, so I set out to get a royalty-free license to
distribute more than 10,000 copies of D-Flat. I called Borland and asked for
their OEM licensing department, which is what the no-nonsense license
statement says I should do. The operator connected me with Karen Rogers. When
I asked if this was the OEM no-nonsense licensing department, she hesitated,
laughed, and asked what my call was about. Karen is in Corporate Affairs. I
told her what I needed, and she transferred me to John Smart, Borland's patent
lawyer. I told him what I wanted, and he said no problem. When he got my name,
company, and the name of D-Flat, he recognized it, knew I was from the press,
and we had a congenial conversation about the situation.
Getting the license was easy. I have it now and may distribute D-Flat without
restriction. But the disturbing part is that to get this license, users,
potential Borland competitors or not, had to tell Borland about the product.
Open the books, so to speak.


The Shift Hits the Kahn


Programmers around the world read the 10,000-copy restriction and went
ballistic. There are many venues for software distribution where the developer
cannot account for numbers. One is shareware. Another is the distribution of
royalty-free redistributables that you develop for other programmers to use to
develop programs which they distribute. Such as D-Flat. Gets hairy, doesn't
it? But for whatever reason, no one gives up freedoms without a fight,
particularly when they are taken away in the small print. Programmers felt
betrayed and said so, loudly and with some emotion. The Borland forums on
CompuServe burned with their complaints. Many vowed to return the package for
a refund. Most demanded an explanation.


Borland Responded


Borland reacted to the outcry by posting a Q&A dialog on CompuServe that was
supposed to clear up the matter. They announced their intention to revise the
no-nonsense license statement to remove some of the restrictions, but the
wording of the Q&A was vaguer than the original no-nonsense license statement,
and it was not clear how they would deal with the problem short of removing
all of the restrictions.
The 10,000-copy restriction was silly at best. Borland's VROOMM patent is the
only one it has, although others are pending. One wonders who Borland was
trying to contain. The Q&A document stated publicly that the restrictions are
directed only at certain large, litigious competitors and that others had
nothing to worry about. It said, "If you are not a litigious competitor, then
the restriction doesn't apply to you." How does Borland know who is going to
sue them? It seems to be saying, "If you sue me, I'm taking my license back."
Smart narrowed the number of litigious competitors to two and would not name
them but said that one of them was suing Borland now. That would be Lotus.
Listen up, Philippe. Overlays ain't that hard to figure out. Lotus can hire
some fast and loose programmers and do their own overlay manager quicker than
you can snap-roll your Waco. It isn't worth all this bad public relations just
to force them to do that.
Speculation follows. Could be someone at Borland heard that Lotus wrote
everything in Turbo C and is heavily committed to some compiler
implementation-dependent stuff. That would be the coup de grâce. Rig your
no-nonsense license statement so that a big competitor, one who just happens
to be suing your eyes out, cannot upgrade to the next version of their
principal development tool. This is the only scenario that I can come up with
that even remotely explains Borland's changes in attitude about patents.
Nonetheless, I wonder about the ethics and legality of a license that is
publicly waived for everyone except certain competitors. Borland told us that
there are only two targets, and it gave us enough information to guess who the
large, litigious competitors are. End of speculation.


Other Restrictions: What You Can Compile


The 10,000-copy limit and the patent threat are only the first half of the
story. Most programmers did not notice that Borland C++ 3.1's no-nonsense
license statement contains language that restricts what kind of programs you
can compile and distribute. You are restricted from developing:
"...a compiler, development tool, environment product or library which includes
any of the libraries, DLLs or source code included in this package...[or]...a
product that is generally competitive with or a substitute for any Borland
Language product."
How many of you 3.1 users knew that? You didn't read your no-nonsense license
statement, did you? See what it says? You can't develop a programmer's editor
because it would compete with Brief. You can't develop a compiler, an IDE, a
profiler, a user-interface class library (such as D-Flat++), a resource
compiler, and so on.
The patent stuff, which caught everyone's eye, drew attention to these other
restrictions. Most of the programmers spoke out as if the noncompete
conditions were new to version 4.0. They were not. Nonetheless, users were mad
about the noncompete stuff too.



The Paradox Paradox


What was the intent? According to Smart, Borland did not want to restrict you
in any of the ways that I just described, even though the language in the new
no-nonsense license statement said otherwise. It merely wanted to prevent
anyone from buying the Paradox engine, putting a user-interface shell around
it, and selling a product that competes with Paradox. There's the real
paradox, folks. The languages department wants to provide developers with a
comprehensive database engine, but the applications department does not want
those developers to use it in ways that the company does not approve of.


More Nonsense


Borland's shot in the foot was a double-barreled blast. The second barrel on
their no-nonsense license statement contained the condition that the program
you develop "...may not be an operating system."
Wow. I didn't know that Borland was planning to release an operating system.
My earlier speculation would not apply to this one. The other large litigious
competitor doesn't use Borland's compiler to compile their operating system.
That can't be why Borland put this restriction in. My usually reliable sources
weren't telling, either, other than to say that one of the lawyers added the
language. Makes you wonder. Doesn't anybody outside of the legal department
read this stuff before it goes in the big blue and white box?
This operating-system restriction had wide-ranging implications. For one
thing, it ruled out UNIX ports. But worse, it hit embedded-system developers
squarely between the eyes. An embedded system does not usually use MS-DOS,
DR-DOS, or any other general-purpose operating system. The embedded program
will be self-contained, which means that it includes an operating system. You
couldn't write one of them according to the new terms.
Of course, another outcry was heard 'round the world. Borland reacted quickly
by saying in its CompuServe Q&A, "The restriction against creation of an OS is
deemed unnecessary and will be dropped."
Why was the restriction necessary one day and not the next? One theory
involves Borland's agreement with Microsoft. Borland has a license to
distribute certain Windows development materials that are covered by
Microsoft's copyright of the Windows API. Without those materials, Borland's
users would need to purchase the SDK to develop Windows programs. The theory
speculates that Microsoft granted that license on the condition that Borland
would somehow prohibit its users from developing operating systems that
compete with Microsoft. I do not believe this theory. Microsoft does not have
similar restrictions on your use of its own software development products. I
believe that Microsoft grants those licenses because its best interests are
served when you develop Windows programs regardless of the compiler that you
use. No, the story that Smart told me makes more sense. One lawyer, who
doesn't know what an operating system is, put the language in, and no one else
was smart enough to cross it out.
So, once again, why did the urgency of this operating-system condition
disappear so fast? Because it isn't important, and you, their customers,
howled, that's why.


The Healing of the Wound


Borland lost a large measure of credibility during this episode. It stuck a
toe in to test the patent waters and got it shot off. It tried for some reason
to limit the development of operating systems and got the door slammed shut in
its face.
To regain some lost esteem, Borland went into damage-control crisis mode. The
spin doctors rewrote the no-nonsense license statement to remove the operating
system and 10,000-copy restrictions and to water down the non-compete clause
to reflect their true, original, noble intentions, which are more palatable.
The only restriction is as follows:
Your programs may not be merely a set or subset of any of the libraries, code,
Redistributables or other files included in this package.
That restriction seems reasonable and reflects Borland's responsiveness to the
concerns of its customers. More importantly, it demonstrates the power of the
user's voice when a vendor tries to impose unreasonable restrictions on its
customers. We, the programmers, won this one by the sheer force of our
numbers. I hope that all vendors are watching and that they will have the good
sense to let some users look at what the lawyers write before they commit to
it.
We hope that Borland learned that lesson. In its zeal to counter its
enemies, it forgot who its friends were. It showed us a different face, one
that we had not seen before, a mean-spirited one that holds and can enforce
software patents if it wants to. We want to believe that the old face has
returned and that it is the true one.
Borland is not a litigious company. It has never sued anyone. Borland thought
that reputation would hold it in good stead in the face of public reaction to
its actions. But when I asked about the future, when the empty suits change
occupants, when I asked about how we could be sure that some future regime
would continue to overlook those fascist and burdensome no-nonsense license
restrictions, Borland could not answer. In the face of overwhelming public
disapproval of its actions and the hidden agenda that those actions seemed
to reveal, Borland did what it had to do. It took it all back. The version 4.0
no-nonsense license statement is, if anything, more liberal and more absent of
nonsense than that of version 3.1.


What We Learned


This episode teaches us something else, too. Read the licensing conditions on
whatever software-development tool you use to develop a program that you plan
to distribute. Virtually all C++ compiler products have some restrictions.
They require that you put a valid copyright notice on your software and do not
remove any copyright notices that they include on the redistributable
components. You indemnify the vendor from any liability if your programs do
not work. You may redistribute the redistributables only as a part of an
operating program and not as redistributables themselves. You must be a
registered user of their product to distribute programs compiled with their
product.
Only Borland and Microsoft have restrictions about what the programs
themselves may do. Borland does not want you to distribute sets and subsets of
its redistributables, whatever that means. Microsoft does not want you
distributing programs that use its libraries, MFC, and VBX redistributables in
programs that programmers use to build programs that use VBXs. Interestingly,
although Symantec C++ Professional licenses the MFC libraries from Microsoft
for just such a purpose, it does not have a similar license restriction about
what you can do with them. As you can see, it gets complicated.


Patent-Leather Agenda


Several years ago, every issue of every automobile magazine was sure to have
at least one editorial where the author whined about the national 55-mph speed
limit. Until the law was repealed, those magazines acted as the self-appointed
guardians of our right to drive fast. Similarly, every gun magazine today can
be depended upon to wedge its editorial agenda against enemies of the people
such as Janet Reno and James Brady, who would abridge our Second Amendment
right to own and bear semi-automatic assault weapons and handguns (in a
well-formed militia, of course).
We in the programming-trade press are coming to sound very much like those
other self-interest watchdog publications. We are beating this issue of
software patents to death. What is the point? Well, in the first place, the
issue is a technical one that is being administered with nontechnical
criteria, and we, the trade press, are the only public forum that has or will
tell the truth. The arguments for software patents are based in power, money,
and politics. When you apply nontechnical, interim solutions to a technical
problem, you almost always arrive at a final solution that does not work, if
only because the technical parts of the problem are unsolved. Unfortunately,
we are singing to the choir. You, our readers, already understand.
Every programmer understands that "software patent" is an oxymoron. The
lawyers who prepare and file the patent documents do not understand. Neither
do the Patent Office bureaucrats who grant the patents. Some of the people who
hold the patents understand, but they are motivated by things other than
technical purity, such as the promise of gain.
One of our smartest programmers is Bill Gates. His plans for Microsoft include
100 new software patents per year. He knows better, knows that the system does
not know better, and plans to use that advantage to expand his power,
influence, and wealth. Why am I surprised? Isn't this supposed to be the
greed-is-good generation?
Software patents are everywhere. Most of the software-tool vendors who bring
their demos to the DDJ conference room proudly announce that they have filed
patents on parts of their products. I don't think they read our editorials. I
recently attended a briefing of a new version of a well-known database
management system. The vendor has a patent pending on his particular use of
the B-tree algorithm in the indexes that support interfile relationships. I
had to laugh, because years ago I used an identical technique in government
software systems. It was obvious then. It is obvious now. The patent will
probably be granted.
And yet, we keep thumping the drum. If we educate you about the dangers of
software patents and their potential to compromise your livelihood, then we
have done some good. At least you will be prepared. If enough of us kick up
enough of a fuss, maybe our legislators will get the hint and do something
positive about the problem. Maybe we can get the attention of those who need
to understand their business and ours a little better. The effort might be in
vain, however. Even if we educate the lawyers, they will pretend to continue
to operate in a cyberfog. Technical ignorance supports their agenda, which is
collecting fees for knowing the law. Educating Patent Office bureaucrats is
probably a waste of time, too. As soon as one of them understood software well
enough to do the job, they would quit and find work as a programmer. Who
wouldn't? And finally, trying to educate wannabe wealthy patent holders and
fee collectors is guaranteed to be folly. They have already learned all that
they need to know.


Goodbye, Sonny


I want to tell you about an unsung hero in our industry, someone who will
never be the subject of a book, who will never receive a prestigious award,
and about whom you will never hear, except today in this column.
Almost six years ago, in my first "C Programming" column, I told how my
brother Fred got me started with C. He was a microcomputer pioneer with an
engineering degree and a love of programming. He had every issue of Dr. Dobb's
and was among the first of the home-brew computer makers. He was not one of
the famous hackers, but he knew more about it than most. He kept a low profile
and kept in touch with everything that was happening. You did not know about
him, but he knew about all of you.
It was 1971. I was pounding out Cobol accounting programs when Fred dropped
by. He brought a small aluminum hobby box with a front panel sporting four
LEDs, four toggle switches, and some push buttons. It was a home-built
computer, about the size of a cigar box, running an Intel 4004 microprocessor.
I had never seen such a thing. We spent all afternoon loading programs and
data into the small memory with switches and buttons and reading the output as
binary values in the lights. The 4004 was meant to be used in calculators, but
Fred was using it for some kind of black-box application in what we would call
today an "embedded system." We got excited about that little box with four
data lines and 256 bytes of memory. Someday, we thought, everyone would want
one.
Fred grew with and ahead of the technology, always among the first to try new
things. He built one of the first Altairs. It's still in his basement lab,
still running. He recruited me to write programs for his projects and showed
me how to squeeze code into tight spaces, citing Stevens's first law of
programming, which said that any program can be reduced by one byte, and
Stevens's second law, which said that sometimes Stevens's first law had to be
applied recursively. Together we built many diverse embedded systems: a
telephone call accounting system, a point-of-sale monitoring device, a
power-company remote-station monitoring system, a laboratory etching device.
We integrated microprocessors with PBXs, VCRs, TV cameras, cash registers,
voice synthesizers, pagers, stepper motors, plating chambers, motion
detectors. Fred designed and built the hardware, and I wrote the programs. We
worked side by side, days and nights, and every project was a learning
experience. Those old 8080 machines served as prototypes for the products and
primitive development systems for the firmware. We typed the source code into
memory with a TeleType terminal, programmed EPROMS from paper tape, and erased
them under UV light. We wire-wrapped and soldered and patched and programmed
and hand-assembled our way through dozens of one-of-a-kind machines, each one
a wonder to behold and every one finished and performing its mission, some of
them still in service today.
Twelve years ago, Fred's diabetes took him out of the action. With the passage
of time he lost most of his vision, his kidneys, his legs, and a hand to the
ravages of the disease. Not able to see well enough to design and debug
hardware again, he returned to software, learning UNIX, Forth, C, and assembly
language. His reading and typing were slowed sometimes to a crawl, but he
never gave up, always maintained a hearty sense of humor, and never lost his
enthusiasm for the work. Even when he could barely lift himself out of bed, he
talked about ideas for the next project and held onto the belief that he'd
lick the odds and see it through one more time. With his right hand gone and
unable to see, he was still at it, figuring out how to integrate a joystick
keyboard-emulator program with a voice synthesizer so that he could get back
to programming as soon as he got well.
On his fifty-sixth birthday, eight days before Christmas, his frail body gave
way to a last heart attack, and Fred died, and his monumental spirit,
intelligence, and courage were gone.
This is a lonely time for me. Everything that I know and all that I have done
that is good can be traced in one way or another to things that my big brother
Fred gave me. By his teaching, his example, and his encouragement, he was my
mentor, my friend, and my biggest fan. But the loss is not mine alone. He left
a family that he loved unconditionally and many loyal and devoted friends. We
will all miss him.




April, 1994
ALGORITHM ALLEY


Searching for a Search Engine




Tom Swan


Selecting the right tool for the job is always important, whether you are a
carpenter, mechanic, or programmer. Too often, however, programmers choose
algorithms for the wrong reasons--selecting a Quick sort because they believe
it's always the fastest (not true) or using a binary search because they heard
it always makes the fewest comparisons when finding elements in a sorted array
(also not true). Never choose an algorithm because of its popularity.
Depending on your application's requirements, a less-well-known method may be
faster or more efficient.
On the other hand, it's human nature to be taken in by claims of superiority,
as I discovered while searching for a tool of another variety--I'm talking
oil-filter wrenches, now, not algorithms. You see, I need to regularly change
the oil and filter in the diesel engine on board my home and sailboat, but it
took three tries to find a wrench that would properly unscrew the filter can.
The first tool I purchased, the most popular design, came with a band of steel
attached to a vice grip that dented the filter case with only minimal
pressure. The next sported a plastic strap and the written promise that "one
size fits all." Imagine the raw holding power of plastic on a greasy canister,
and it's not hard to understand why this filter wrench of the future could
never work as advertised. (Products like these make me question whether tool
manufacturers ever try their own wares. I often wonder the same about software
vendors.) Finally, while poking around in a mechanic's tool chest, I found a
homemade pipe, fitted for a socket wrench, with a rough leather strap that
grabbed the filter the first time. Later, I bought one from the mechanic. This
just goes to show that you should never choose tools based on their popularity
or advertising claims. It's often the unlikely junk in the bottom of the
drawer that works best.


Fast Failures


The same is true of algorithms. For instance, in dusting off an old program
that I use to prepare Pascal listings for publication, I wondered whether a
binary search was the best way to look up entries in a sorted list of
keywords--the critical code in this application that parses Pascal programs
and converts keywords to lower case, optionally delimited for boldfacing in a
word processor. I knew that a binary search makes at most about log2(N)+1
comparisons, where N is the number of words in the array. Finding an entry in
a list of 100 keywords, then, requires a maximum of seven comparisons, which I
wrongly assumed to be the best result I could expect.
To improve the program's speed, I considered using a hash function or a binary
tree to search for keywords, but then I realized that, once again, I had been
searching for a search engine for the wrong reasons. Most strings in a program
listing are not keywords, so my program's speed was more dependent on how fast
a word was not found than it was on the speed of a successful search. In other
words, I needed a method that failed faster than the competition. Once I came
to that realization, I found a way to boost my program's run-time speed by 20
percent. The algorithm that I chose, called a trie search--after "information
reTRIEval"--is no faster on average than a binary search or a hash lookup, but
it requires a maximum of N comparisons--where N is the number of words
beginning
with the same letter--to determine that a word is not in the table. In
practice, most failed searches take only one or two such comparisons--many
take none--far better than required by a binary search, which tends to make
the maximum number of comparisons for unrecognized words. By selecting the
right tool for the job, taking into consideration the fact that most searches
could be expected to fail, I increased my program's speed by using a
less-popular, but better-suited, search algorithm.


Trie-Search Algorithm


A classic trie-search algorithm relies on a table arranged as illustrated in
Figure 1. The figure shows only a portion of a complete table, indexed in the
first column from A to Z. You could also index the table using other character
sets--a standard ASCII trie table, for example, might have 127 rows. Each
element in the index contains the number of another array that stores the
table's words. The table entries might directly store data, or they could
contain pointers--the exact format of the table depends on your program's
requirements and the programming language you are using. A zero or null entry
in a column indicates there are no words beginning with that letter. There are
no Pascal keywords beginning with H, Y, or Z, so those entries are set to
zero. (I'm using Borland Pascal's keywords here.)
As you can tell from Figure 1, a program can use a trie-search table to
quickly determine whether a search argument is not a key word. In fact, no
string comparisons at all are required for entries beginning with H, Y, or Z.
Only one comparison is needed to find words beginning with B. To achieve the
same results using a binary search requires up to six comparisons for negative
searches of Borland Pascal's 57 keywords. In other applications with larger
tables, you could extend the algorithm to use two or more tables indexed on a
word's successive letters. The trick is to minimize the number of full
comparisons required to find words in the table or to determine their absence.
Once you've structured the table, the rest is easy.
One problem, however, is evident from Figure 1. Many table slots are empty,
wasting space. To minimize memory use, you can instead construct the table as
a sparse matrix, as illustrated in Figure 2. Now the first column becomes an
array of pointers, each of which addresses a list of words beginning with the
same letter. (The table could be compressed somewhat by deleting the first
letter of each word.) Entries with no words are null. As in the classic table,
you could extend the sparse matrix by building other indexes for subsequent
letters in each word. Carrying that idea to the extreme reduces the trie table
to a digital list--that is, a binary tree of letters, with paths forming the
table's words. Small tables such as the one shown here, however, work just as
well with a single-level index.
Example 1 is pseudocode for Algorithm #18, Trie Search. The algorithm simply
looks up an input argument's first letter in the index, then searches the
linked list for a match. Only a single exact-match string comparison is needed
inside the inner loop--the key ingredient of this method's speed. A binary
search requires alphabetical less-than or greater-than comparisons, further
slowing searches for arguments not found.


Pascal Parser


Listings One, Two, and Three show the source code for my Pascal Parser,
IDENT.PAS. SEARCH.PAS, the Pascal unit in Listing One (page 143), implements
the trie-search algorithm. Keyword lists are composed of linked records of
type ResWordRec. The global Index array corresponds to the Index column in
Figure 2. Procedures AddList and AddWord build the trie-search tables--you can
use these procedures to construct a trie-search engine for any list of words,
but the words must be inserted in alphabetical order (see function
Initialize). Function IsReserved determines whether a given word, passed as
argument Ident, is a member of the table.
The other two listings, COMMON.PAS (Listing Two, page 143) and IDENT.PAS
(Listing Three, page 143), use the trie-search engine to parse a Pascal
listing. The program converts to lower case all keywords in a Pascal source
file, and also optionally capitalizes all non-keywords (specify option -c).
Use option -b to add <* and *> delimiters to keywords. The word begin, for
example, is translated to <*begin*>. Use the -b option only on a copy of a
source file--after conversion, the file will no longer compile. (You can
restore the original text by deleting all instances of <* and *>.) I use
WINWORD.MAC (Listing Four, page 145) in Word for Windows to convert delimited
words to boldface after inserting a listing into a document. You could
probably whip up a similar macro for other word processors.


Your Turn


Next month, more algorithms. Meanwhile, send your favorite algorithms and
tools to me in care of DDJ--software tools, that is.
Figure 1: Classic trie-search table.
[1] [2] [3] [4] [5] [6]
[a] 2 and begin far goto xor
[b] 3 array 0 file 0 0
:. -- asm -- for -- --
[f] 4 0 -- function -- --
[g] 5 -- -- 0 -- --
[h] 0 -- -- -- -- --
:. -- -- -- -- -- --
[x] 6 -- -- -- -- --
[y] 0 -- -- -- -- --
[z] 0 -- -- -- -- --
 Figure 2: Trie table converted to a sparse matrix.

Example 1: Pseudocode for Algorithm #18 (trie search).

input
 Arg: String;
var
 P: Pointer;
begin
 P := Index[Arg[1]];
 while (P <> nil) do
 begin
  if P^.Word = Arg then
   return True;
  P := P^.Next;
 end;
 return False;
end;

[LISTING ONE] (Text begins on page 121.)

(* ----------------------------------------------------------- *(
** search.pas -- Search engine for IDENT program **
** Trie search algorithm **
** Copyright (c) 1994 by Tom Swan. All rights reserved. **
)* ----------------------------------------------------------- *)

unit Search;
INTERFACE
uses Common;

{ Return true if Ident is a Turbo Pascal reserved word }
function IsReserved(Ident: IdentStr): Boolean;
IMPLEMENTATION
type
 ResWord = String[14];
 PResWordRec = ^ResWordRec;
 ResWordRec = record
 Word: ResWord; { Reserved word string }
 Next: PResWordRec; { List link field }
 end;

var
 Index: array['a' .. 'z'] of PResWordRec;
{ Add word W to list at P }
procedure AddList(var P: PResWordRec; var W: ResWord);
begin
 if (P <> nil) then
 AddList(P^.Next, W)
 else begin
 P := new(PResWordRec);
 if (P = nil) then
 begin
 Writeln('Out of memory');
 Halt;
 end;
 P^.Word := W;
 P^.Next := nil
 end
end;

{ Add word W to global Index }
procedure AddWord(W: ResWord);

begin
 if Length(W) = 0 then exit;
 AddList(Index[W[1]], W)
end;

{ Initialize search engine variables }
procedure Initialize;
var
 C: Char; { Index[] array index }
begin
 for C := 'a' to 'z' do
 Index[C] := nil;
 AddWord('and');
 AddWord('array');
 AddWord('asm');
 AddWord('begin');
 AddWord('case');
 AddWord('const');
 AddWord('constructor');
 AddWord('destructor');
 AddWord('div');
 AddWord('do');
 AddWord('downto');
 AddWord('else');
 AddWord('end');
 AddWord('export');
 AddWord('exports');
 AddWord('far');
 AddWord('file');
 AddWord('for');
 AddWord('function');
 AddWord('goto');
 AddWord('if');
 AddWord('implementation');
 AddWord('in');
 AddWord('inherited');
 AddWord('inline');
 AddWord('interface');
 AddWord('label');
 AddWord('library');
 AddWord('mod');
 AddWord('near');
 AddWord('nil');
 AddWord('not');
 AddWord('object');
 AddWord('of');
 AddWord('or');
 AddWord('packed');
 AddWord('private');
 AddWord('procedure');
 AddWord('program');
 AddWord('public');
 AddWord('record');
 AddWord('repeat');
 AddWord('set');
 AddWord('shl');
 AddWord('shr');
 AddWord('string');
 AddWord('then');
 AddWord('to');
 AddWord('type');
 AddWord('unit');
 AddWord('until');
 AddWord('uses');
 AddWord('var');
 AddWord('virtual');
 AddWord('while');
 AddWord('with');
 AddWord('xor');
end;

{ Trie search algorithm }
function IsReserved(Ident: IdentStr): Boolean;
var
 P: PResWordRec;
begin
 IsReserved := false;
 if Length(Ident) = 0 then exit;
 DownCase(Ident);
 P := Index[Ident[1]];
 while(P <> nil) do
 begin
 if P^.Word = Ident then
 begin
 IsReserved := true;
 exit
 end;
 P := P^.Next
 end
end;

begin
 Initialize;
end.

[LISTING TWO]

(* ----------------------------------------------------------- *(
** common.pas -- Various constants, types, and subroutines **
** Copyright (c) 1994 by Tom Swan. All rights reserved. **
)* ----------------------------------------------------------- *)
unit Common;
INTERFACE
const
 identStrLen = 64;
 digitSet = ['0' .. '9'];
 upperSet = ['A' .. 'Z'];
 lowerSet = ['a' .. 'z'];
 alphaSet = upperSet + lowerSet;
 identSet = alphaSet + digitSet + ['_'];
type
 IdentStr = String[identStrLen];
{ Return lowercase equivalent of Ch }
function DnCase(Ch: Char): Char;
{ Convert all letters in identifier to lowercase }
procedure DownCase(var Ident: IdentStr);
IMPLEMENTATION
{ Return lowercase equivalent of Ch }

function DnCase(Ch: Char): Char;
begin
 if Ch in upperSet
 then Ch := Chr(Ord(Ch) + 32);
 DnCase := Ch
end;

{ Convert all letters in identifier to lowercase }
procedure DownCase(var Ident: IdentStr);
var
 I: Integer;

begin
 if Length(Ident) > 0 then
 for I := 1 to Length(Ident) do
 Ident[I] := DnCase(Ident[I])
end;

begin
end.

[LISTING THREE]

(* ------------------------------------------------------------*(
** ident.pas -- Convert key word identifiers in .PAS files. **
** Converts key words in Pascal listings to lowercase, and **
** marks them for bold facing. Words are marked using the **
** symbols <* and *>. For example, <*begin*> is interpreted as **
** a bold faced "begin" key word. A word-processor macro could **
** search for all <* and *> symbols in the resulting file and **
** replace these with bold face on and off commands. **
** Copyright (c) 1994 by Tom Swan. All rights reserved. **
)* ------------------------------------------------------------*)

{$X+} { Enable "extended" syntax }
program Ident;
uses Dos, Common, Search;
const
 bakExt = '.BAK'; { Backup file extension }
 tempExt = '.$$$'; { Temporary file extension }
type
 PString = ^String;
 PListRec = ^TListRec;
 TListRec = record
 Path: PString;
 Next: PListRec
 end;
 TState = (
 Reading, Chkcomment, Comment1, Comment2, Stopcomment,
 Stringing, Converting
 );
var
 FileSpec: ComStr; { Files entered on command line }
 Root: PListRec; { File name list root pointer }
 DelimitWords: Boolean; { True to add <* and *> to reserved words }
 CapIdentifiers: Boolean; { True to capitalize non-keywords }
{ Return copy of a string }
function NewStr(S: String): PString;
var

 P: PString;
begin
 GetMem(P, Length(S) + 1);
 if (P <> nil) then
 PString(P)^ := S;
 NewStr := P
end;
{ Return true if InF is successfully converted to OutF }
function ConvertIdents(var InF, OutF: Text): Boolean;
var
 Ch, PushedCh: Char;
 State: TState;
 Identifier : IdentStr;
 function GetCh(var C: Char): Char;
 begin
 if PushedCh <> #0 then
 begin
 C := PushedCh;
 PushedCh := #0
 end else
 Read(InF, C);
 if (C = #13) or (C = #10) then
 begin
 if (C = #13) then
 Writeln(OutF); { Start new line }
 C := #0 { Ignore new line characters }
 end;
 GetCh := C
 end;
 procedure UngetCh(Ch: Char);
 begin
 PushedCh := Ch
 end;
 procedure PutCh(Ch: Char);
 begin
 if Ch <> #0 then
 Write(OutF, Ch)
 end;

begin
 PushedCh := #0; { No pushed character }
 State := Reading;
 while not eof(InF) do
 begin
 GetCh(Ch);
 case State of
 Reading:
 begin
 case Ch of
 '(' : State := Chkcomment;
 '{' : State := Comment1;
 '''' : State := Stringing;
 end;
 if Ch in alphaSet then
 begin
 UngetCh(Ch);
 State := Converting
 end else
 PutCh(Ch)

 end;
 Chkcomment:
 if Ch = '*' then
 begin
 PutCh(Ch);
 State := Comment2
 end else begin
 UngetCh(Ch);
 State := Reading
 end;
 Comment1:
 begin
 PutCh(Ch);
 if Ch = '}' then
 State := Reading
 end;

 Comment2:
 begin
 PutCh(Ch);
 if Ch = '*' then
 State := Stopcomment
 end;
 Stopcomment:
 begin
 PutCh(Ch);
 if Ch = ')' then
 State := Reading
 else
 State := Comment2;
 end;

 Stringing:
 begin
 PutCh(Ch);
 if Ch = '''' then
 State := Reading;
 end;

 Converting:
 begin
 Identifier := '';
 while Ch in identSet do
 begin
 Identifier := Identifier + Ch;
 Read(InF, Ch) { Note: Don't call GetCh here! }
 end;
 if IsReserved(Identifier) then
 begin
 DownCase(Identifier);
 if DelimitWords then
 Identifier := '<*' + Identifier + '*>'
 end else
 if CapIdentifiers and (Length(Identifier) > 0) then
 Identifier[1] := UpCase(Identifier[1]);
 Write(OutF, Identifier);
 UngetCh(Ch);
 State := Reading
 end

 end
 end;
 if PushedCh <> #0 then { Write possible pushed last char that }
 PutCh(Ch); { sets eof() to true. }
 ConvertIdents := true
end;

{ Convert one file specified in Path string }
procedure ConvertOneFile(Path: PathStr);
var
 Result: Integer;
 BakF, InF, OutF: Text;
 TempName, BakName: PathStr;
 Name: NameStr;
 Dir: DirStr;
 Ext: ExtStr;
begin
 Write(Path);
 Assign(InF, Path);
 {$i-} Reset(InF); {$i+}
 if IoResult <> 0 then
 Writeln(' **Error opening file')
 else begin
 FSplit(Path, Dir, Name, Ext);
 TempName := Dir + Name + tempExt;
 BakName := Dir + Name + bakExt;
 Assign(OutF, TempName);
 {$i-} Rewrite(OutF); {$i+}
 if IoResult <> 0 then
 Writeln(' **Error creating output file')
 else begin
 if ConvertIdents(InF, OutF) then
 begin
 Close(InF);
 Close(OutF);
 Assign(BakF, BakName);
 {$i-}
 Erase(BakF);
 Result := IoResult; { Throw out IoResult }
 Rename(InF, BakName);
 Rename(OutF, Path);
 {$i+}
 if IoResult <> 0 then
 Writeln(' **Error renaming files')
 else
 Writeln(' done')
 end else
 Writeln(' **Error processing files')
 end
 end
end;

{ Convert files on global list at Root pointer }
procedure ConvertFiles(List: PListRec);
begin
 if List = nil then
 Writeln('No files specified')
 else
 while List <> nil do

 begin
 ConvertOneFile(List^.Path^);
 List := List^.Next
 end
end;

{ Add file path to list }
procedure ListFile(var List: PListRec; Path: PathStr);
var
 P: PListRec;
begin
 New(P);
 P^.Next := List;
 P^.Path := NewStr(Path);
 if P^.Path = nil then
 Dispose(P)
 else
 List := P
end;

{ Create list of file names from FileSpec string }
procedure ListFiles(var List: PListRec);
var
 Sr: SearchRec; { Directory search record }
 L: Integer; { Length of Dir string }
 OldDir: DirStr; { Old directory upon entry to procedure }
 Path: PathStr; { Expanded file specification with path info }
 Dir: DirStr; { Directory component of Path }
 Name: NameStr; { File name component of Path }
 Ext: ExtStr; { File extension component of Path }
begin
 GetDir(0, OldDir); { Save current path }
 Path := FExpand(FileSpec); { Add path info to file spec }
 FSplit(Path, Dir, Name, Ext); { Separate Path components }
 L := Length(Dir); { Prepare to change directories }
 if L > 0 then
 begin
 if (Dir[L] = '\') and (L > 1) and (Dir[L - 1] <> ':') then
 Delete(Dir, L, 1); { Ensure that ChDir will work }
 ChDir(Dir) { Change to location of file(s) }
 end;
 FindFirst(Path, 0, Sr); { Start file name search }
 while DosError = 0 do { Continue while files found }
 begin
 Path := FExpand(Sr.Name); { Expand to full path name }
 ListFile(List, Path); { Add path to list }
 FindNext(Sr) { Search for the next file }
 end;
 ChDir(OldDir)
end;

{ Display instructions }
procedure Instruct;
begin
 Writeln('Use -b option to surround reserved words with');
 Writeln('<* and *> for bold-facing in a word processor.');
 Writeln('Use -c option to capitalize non-keyword identifiers.');
 Writeln;
 Writeln('WARNING: After conversion with -b, the listing will');
 Writeln('not compile. Use -b ONLY on a copy of original files.');
 Writeln;
 Writeln('ex. IDENT single.pas');
 Writeln('    IDENT -b one.pas two.pas');
 Writeln('    IDENT wild??.pas -b *.pas')
end;

{ Main program initializations }
procedure Initialize;
begin
 Writeln;
 Writeln('IDENT -- (C) 1994 by Tom Swan');
 Writeln('Converts Pascal reserved words to lowercase.');
 Writeln;
 Root := nil; { File name list is empty }
 DelimitWords := false; { Normally do not add <* and *> to words }
 CapIdentifiers := false { Normally do not capitalize other idents }
end;
{ Main program block }
var
 I: Integer;
begin
 Initialize;
 if ParamCount = 0 then
 Instruct
 else for I := 1 to ParamCount do
 begin
 FileSpec := ParamStr(I);
 if (FileSpec = '-b') or (FileSpec = '-B') then
 DelimitWords := true
 else if (FileSpec = '-c') or (FileSpec = '-C') then
 CapIdentifiers := true
 else begin
 ListFiles(Root);
 ConvertFiles(Root)
 end
 end
end.

[LISTING FOUR]

Sub MAIN
StartOfDocument
EditFind .Find = "<*", .WholeWord = 0, .MatchCase = 0, .Direction = 1, \
While EditFindFound()
 EditClear
 EditFind .Find = "*>", .WholeWord = 0, .MatchCase = 0, .Direction = 1, \
 If Not EditFindFound() Then
 Stop
 End If
 EditClear
 WordLeft 1, 1
 Bold 1
 EditFind .Find = "<*", .WholeWord = 0, .MatchCase = 0, .Direction = 1, \
Wend
End Sub
End Listings





April, 1994
UNDOCUMENTED CORNER


Think Globally, Act Locally: Inside the Windows Instance Data Manager




Klaus Müller


Klaus studies information technology at the Dresden University of Technology's
Fraunhofer Institut for Microelectronic Circuits IMS-2. He is currently
developing a heterogeneous multiprocessor system for parallel image processing
based on the TMS320C40 processor. Contact Klaus on CompuServe at 100117,2526.




Introduction




by Andrew Schulman


Many DOS programmers still "don't do Windows," seeing it as irrelevant to DOS
programming. But this is unrealistic, because Windows Enhanced mode affects
even software loaded before Windows is loaded, including DOS memory-resident
programs (TSRs), device drivers, and even DOS itself. In short, many DOS
programs have no choice but to become Windows-aware.
Windows awareness is especially important for "instance data." For example,
load a TSR like Chris Dunford's CED command-line editor and then start Windows
Enhanced mode. Open two DOS boxes. Type a command in one DOS box, switch to
the other one, and then press the up-arrow key. The command typed in the first
DOS box appears in the second, as "state" leaks across! This unintentional
interprocess communication might appear to some programmers as a feature, but
it is more likely to strike users as a bug. A DOS programmer who blows off
this problem with a proud "I don't do Windows" had better be sure that every
one of his users feels the same way.
Now put the statement LOCALTSRS=CED in the [NonWindowsApp] section of
SYSTEM.INI (if it's not already there), and restart Windows. This time,
commands typed in one DOS box are recalled only in that DOS box; they don't
leak across into the other one. As its name implies, LOCALTSRS= has somehow
made CED's state "local" to each DOS box. Exactly how this works is the
subject of this month's "Undocumented Corner."
Rather than use CED, you can switch to the DOSKEY utility that Microsoft
includes in DOS 5 and 6. This command-line editor exhibits the same correct
behavior as LOCALTSRS=CED, except that no LOCALTSRS=DOSKEY statement is
necessary. Clearly, DOSKEY is doing something CED isn't.
Unlike CED, DOSKEY intercepts INT 2Fh and looks for calls to AX=1605h and
AX=4B05h. If it receives a call to either of these functions, DOSKEY declares
the address and size of its command-line history buffer as "instance
data"--that is, as data that must be local (rather than shared) in each DOS
box. Note that "instance data" in this context has nothing to do with multiple
"instances" of Windows applications (though there are some analogies).
INT 2Fh AX=1605h is documented in the Windows Device Driver Kit (DDK). This is
a crucial interface with which DOS programmers must be familiar. It may seem
perverse that an API necessary for DOS programmers is located in the Windows
DDK, but INT 2Fh AX=4B05h, documented as "Identify Instance Data" in the
MS-DOS Programmer's Reference, is identical (at least as it relates to
instance data). In fact, DOSKEY uses the same piece of code to handle both
calls. Unfortunately, the MS-DOS Programmer's Reference indicates that INT 2Fh
AX=4B05h is related to the relatively unused DOS task switcher and says
nothing about the need for DOS programs to instance data for compatibility
with the far more prevalent Windows Enhanced mode.
But what does INT 2Fh AX=1605h actually do? And how does it relate to other
means of instancing data, such as the LOCALTSRS= statement (or its LOCAL=
equivalent for DOS device drivers)? Eventually these methods of declaring
instance data to Windows, plus several others, lead to the _AddInstanceItem
function provided by the Windows Virtual Machine Manager (VMM). This call is
documented in the Windows DDK and in the book Writing Windows Virtual Device
Drivers, by David Thielen and Bryan Woodruff (Addison-Wesley, 1994).
Okay, so what does _AddInstanceItem do? What is instance data really, and how
does VMM implement it? The Microsoft KnowledgeBase includes a surprisingly
good explanation, "Instanced Data Management in Enhanced Mode Windows"
(Q90796). However, this states that the internal "instance buffers are not
accessible to VxDs or TSRs; they are local data structures to be accessed by
the VMM only."
In this month's "Undocumented Corner," Klaus Müller shows how to access the
internal instance-data structures, using a virtual device driver (VxD) loaded
early in the Windows boot process, right after VMM. By using the documented
Hook_Device_Service call to intercept the _AddInstanceItem function, Klaus's
VxD builds up a picture of the instance-description buffer.
To further describe the Windows instance-data manager, Klaus uses another
interesting method: examining the error-message strings that appear in the
widely available debug version of WIN386.EXE. Many of these error messages
refer to internal VMM functions whose names we otherwise would not know; see
Figure 1.
Once the internal instance-data structures are located, the results must be
interpreted. Let's say that (as in Figure 2) the virtual keyboard device (VKD)
instances 28h bytes at address 415h. So what? Well, these 28h bytes include
the BIOS keyboard buffer. VKD instances this buffer so that each DOS
box--actually, each virtual machine (VM), including the System VM in which
Windows applications run--has its own local BIOS keyboard buffer. Keys typed
in a DOS box don't leak into the user's copy of Excel or Word for Windows.
This has a downside, too: It's difficult (though not impossible) for Windows
applications and full-screen DOS boxes to deliberately "push" keystrokes into
each other.
In previous "Undocumented Corner" columns (January and February 1994), Kelly
Zytaruk examined the Windows virtual machine control block (VMCB) and noted
that offsets 0BCh and 0CCh in the VMCB refer to instance data. Klaus fleshes
out this point.
I expected that all of Klaus's results would have to be thrown out for
Microsoft's forthcoming Chicago operating system (Windows 4). However, Klaus
reports that the instancing mechanism has not fundamentally changed and that
his programs for locating the instance-data structures work in Chicago, too.
Of course, Chicago is still in prerelease, and anything can happen between now
and when it ships. Klaus does note that VMM in Chicago provides a
_GetInstanceInfo call, which reports whether a given region is instanced or
not, though whether this is more useful than the existing _TestGlobalV86Mem is
unclear. Future articles from Klaus will show how to locate device CB areas
and asynchronously access instance data without causing a page fault.
In addition to the usual places to download DDJ code (see page 3), you can get
programs and source code mentioned in this article from the new Undocumented
Corner area in the DDJ Forum on CompuServe (GO DDJ). If you have any comments
or suggestions for future articles, please post messages to me there, as well;
my CompuServe ID is 76320,302.
There are programmers who see Windows not as a graphical user interface, but
as an operating system with preemptive multitasking of virtual machines (VMs).
These developers have to worry about things like hardware interrupt handlers
that operate in the context of a specific VM.
Asynchronous access of data in a specific VM is possible via the documented
CB_High_Linear field in the largely undocumented VM control block (VMCB)
structure (see DDJ, January/February 1994). The current VM is mapped in the
first megabyte of the linear-address range. You can access any VM's address
space (current or not) by adding CB_High_Linear to its linear address.
So far, so good. But sometimes when you want to access data in a VM, you get a
page fault. This page fault is transparent (applications don't see it), but it
can lower performance and lead to serious problems inside an interrupt
handler.
What's wrong here? It turns out that the page faults are an integral part of
instance-data management in Windows Enhanced mode.


Local vs. Global Data


Instanced data is born as global data before Windows starts, but once Windows
starts, it becomes local for each VM.
In general, it would be good if all data in a VM were local. In a genuine
protected multitasking environment, global data is very dangerous. Consider an
ugly TSR that crashes a DOS system. In a truly protected environment, the
multitasking kernel nukes only the crashed VM.
Windows 3.x Enhanced mode does not support this level of safety. Since Windows
is based on DOS, Microsoft compromised. All memory allocated before Windows
starts (including DOS TSRs, device drivers, and DOS itself) will be mirrored
in each VM via the 386 paging mechanism. Thus, this memory is global, shared
by all VMs.
There are several reasons to allow the presence of global data. For one thing,
Windows rests on top of DOS, and some DOS data must be global. Consider the
example of the DOS system-file tables (SFTs) when SHARE.EXE is loaded. The
SFTs present before Windows started need to be visible to all VMs.
Global data also saves memory. Remember the days before 386 memory managers? A
well-equipped system with network drivers, mouse drivers, disk-caching
programs, and other resident software could consume 400 Kbytes of conventional
memory. If you created three VMs, you quickly wasted 1 Mbyte of memory.
Wasted? What about the paging mechanism of 386-mode Windows with its ability
to extend physical memory with disk memory? Unfortunately, TSR memory must
often reside permanently in physical memory. Many TSRs maintain a hardware
interrupt; paging out memory which contains hardware interrupt handlers would
cause unpredictable results. The interrupt-handler code would be executed for
each VM separately, so the best result from paging global TSRs would be a
remarkable performance decrease. Consequently, memory belonging to DOS TSRs,
device drivers, and DOS itself is, by default, global to all VMs, and is not
pageable. If a TSR changes some data in its memory area, the change takes
effect in all VMs. If the TSR crashes a VM because of a memory-related error,
all VMs will be crashed, including the System VM with all the Windows apps.
Although the executable code of the TSRs can be the same for all VMs, the data
must be private for each VM. Such privacy is called "instancing." While this
should not be confused with instance data in Windows applications, it is
analogous.
Any portion of software loaded before Windows can be instanced using one of
several documented techniques, including INT 2Fh AX=1605h and the LOCALTSRS=
and LOCAL= statements in SYSTEM.INI. Some obvious candidates for instancing
are the interrupt-vector table and parts of the BIOS data area. The keyboard
buffer has to be instanced, as do the history buffers belonging to any
command-line editors loaded before Windows. (Why doesn't the history buffer
for a command-line editor loaded inside a Windows DOS box have to be
instanced? Answer: Because the memory is already local.)



Instance Data and Paging


The memory management of Windows Enhanced mode is based on the 80386 paging
mechanism. The smallest unit of memory is one 4K page. The following types of
memory are to be found in the VM's address range (the page types are
documented in the DDK):
Global data. Nonpageable, system-wide data shared by all VMs. Changing the
data in one VM changes the data in all VMs. By default, the allocated DOS
memory at Windows startup is PG_SYS. The memory is "mapped" into each VM. Page
type: PG_SYS.
Local data. Pageable, local data specific for each VM. The free DOS memory at
Windows startup is local. A DOS program started in a DOS box cannot crash
other VMs because of an error that belongs to its PG_VM memory. Page type:
PG_VM.
Instance data. A mixture of instanced and global data. Like PG_SYS pages, they
are nonpageable. The difference from global (PG_SYS) memory is that some of
the data is marked as local and handled in a specific manner. Page type:
PG_INSTANCE.
Though PG_INSTANCE pages are not paged out to disk, PG_INSTANCE can be marked
"not-present"; this is key to the instance-data mechanism. Only one VM at one
time has a PG_INSTANCE page "present." The corresponding pages in the other
VMs are marked not-present. If the pages were all swapped out during a task
switch, the global data would become local; the pages would not be updated
when another VM changed the global data. Conversely, if the pages were still
present in the other VMs, writing data to the instanced part of the pages
would make them global because all corresponding PG_INSTANCE pages have the
same physical base.
Windows saves the instanced parts of PG_INSTANCE pages in a special buffer.
Because the paging mechanism has 4K granularity, a physical copy is required
for any instance data item smaller than 4K. An instanced data item can be as
small as a single byte. Many individual instance items can thus decrease
performance.


The Instance-Data Manager


The instance-data manager (IDM) manages the instance mechanism. It is part of
the Windows Virtual Machine Manager (VMM) and exports the documented
_AddInstanceItem service. While processing _InstanceInitComplete, the IDM
allocates memory via the VMM _PageAllocate service for the following buffers:
For each instance item, the instance-description buffer contains its linear
address, its length, and the location of its data within the instance buffers.
VxDs declare instance data via the documented _AddInstanceItem service and
InstDataStruc structure. The code in VMM for _AddInstanceItem chains these
structures into a sorted linked list. At _InstanceInitComplete, the IDM walks
this sorted list, discards any duplicate or overlapping instance-data requests
(for example, if one VxD instances 2 bytes at 415h and another instances 20h
bytes at 400h), and uses the result to build the instance-description buffer
as a sorted array of InstanceMapStrucs.
The instance-snap buffer is used to save the instance data present at Windows
startup time. When Windows exits back to DOS, the instance data is restored.
The VM1 instance buffer initially contains a copy of the data in the
instance-snap buffer.
When a new VM is created, it also gets a VM instance buffer, which contains
all instanced data for the VM when the PG_INSTANCE pages are set not-present.
The offset and handle of the VM instance buffer can be found at offsets 0BCh
and 0C4h in the VMCB. In Chicago, the VM instance buffers (including the VM1
buffer) are part of the VMCB and are allocated via the
_Allocate_Device_CB_Area service. (In a future article, I'll describe a VxD
that hooks this call to help enumerate these CB areas.)
_InstanceInitComplete is an internal VMM function. While working with the
debug version of WIN386.EXE, I found some very informative error messages that
include references to this internal function; see Figure 1. A great deal can
be learned about the internals of the IDM simply by examining these error
messages and pondering what the IDM must require for normal, error-free
operation. To verify the interpretation of the error messages in Figure 1, it
is useful to inspect the code of the IDM. Since the error messages are issued
via Out_Debug_String with the string offset in ESI, it is easy to locate the
desired code that would issue this message if something went wrong. From
there, it is easy to work backwards to the code's normal operation.
To illustrate the handling of the PG_INSTANCE pages, assume that they are
currently owned by the System VM (SYSVM, or VM1). When a second VM is created,
the VMM begins to schedule the VMs according to their priority. If the VMM
schedules from SYSVM to VM2, the PG_INSTANCE pages in VM1 are still present,
and the corresponding pages in VM2 are not. If any code in VM2 accesses the
PG_INSTANCE pages (via the linear address), a page fault occurs. The IDM
copies the SYSVM's instanced data to SYSVM's instance buffer and sets the
faulting PG_INSTANCE page of the SYSVM as not present. Then the IDM sets the
corresponding PG_INSTANCE of VM 2 present and copies the instanced data from
the instance buffer of VM2 to the page.
This instancing mechanism is not complicated. But Microsoft has added an
important twist for performance reasons: The physical copy of the instance
data to the instance buffers occurs only if the pages are written (but not
read) by a VM other than the one which owns the PG_INSTANCE pages. Because
they are set not-present for the VM, a page fault occurs and the copy starts.
Now we can understand why Microsoft has stated that "fragmented and large
instance areas decrease the performance of the swapping mechanism."
In summary, for write access VMM will: 1. copy instance data from the
PG_INSTANCE page physical base to the instance buffer of the former owner; and
2. copy instance data from the instance buffer of the new owner to the
PG_INSTANCE page physical base. For read access, VMM will only copy instance
data from the instance buffer of the new owner to the PG_INSTANCE page
physical base.


Spying on Instance Data


For a deeper understanding of the instancing, it is useful to write a program
that displays the address and size of all instance data and the name of the
program which asked for the data to be instanced. LISTINST.386 (see Listing
One, page 146) is a VxD I wrote that initializes shortly after the VMM, before
other VxDs gain control, and hooks the VMM _AddInstanceItem service at
Sys_Critical_Init time. INSTWALK is a DOS program that displays the
information saved by LISTINST.386.
Two other required VxDs are VXDQUERY.386 and (to examine instance data in
Chicago) LISTCALL.386. All three VxDs are loaded automatically if you run the
DOS program VXDLOAD.EXE just before starting Windows. (VXDLOAD uses the
versatile INT 2Fh AX=1605h interface to tell Windows to load the VxDs.) While
there isn't room to show all of the source code, the programs are available
electronically (see "Availability," page 3).
For each call to _AddInstanceItem, LISTINST stores the address of the caller
and the offset of the passed-in InstDataStruc. (This structure is documented
in VMM.INC and INT2FAPI.INC, included with the DDK and with VxD-Lite.)
LISTINST has a V86 API which allows DOS programs running in a VM to get the
results of the _AddInstanceItem hook. INSTWALK uses this V86 API and prints
out all instanced data.
Sample output is shown in Figure 2. To clarify what this output means, Figure
3 shows a hex dump (using the PROTDUMP utility discussed in the "Undocumented
Corner," January and February 1994) of one of the instance data items declared
by DOSKEY; clearly, this instance data is the DOSKEY-command history buffer.
By specifying a VM number on the command line, PROTDUMP can view this buffer
in each VM; see #1 and #2 in Figure 3. The ownership and purpose of the buffer
is identical for all VMs, but the data (here, the command history) differs in
each VM. This is precisely what instance data means.
The INSTWALK utility has a --p switch to dump out the instance page-ownership
array (an internal IDM structure which lists all PG_INSTANCE pages and their
current owners) and the instance-description buffer in raw form. Because this
structure is not accessible, I built my own array. Remember, only one VM at a
time can own a PG_INSTANCE page. So you have only to walk down the VM's page
tables to determine the VM in which the PG_INSTANCE page is set present; see
LISTINST.386 for details. This is similar to output from the .mi command
available in debuggers such as WDEB386 and Soft-ICE/Windows when the debug
WIN386.EXE is installed. INSTWALK --p also dumps out the offsets in the
instance buffers, which is important if you want to access the data in the
instance buffers.
LISTINST.386 gets the names of the callers to _AddInstanceItem via a service
provided by VXDQUERY.386, which provides a complete map of the VMM and VxD
address space. VXDQUERY can name all known VxD services by name and address,
start and end of VxD objects and, particularly important for this article, all
VxD Control procedures. Even debuggers such as WDEB386 don't provide as
complete a map as VXDQUERY. Since VXDQUERY is a commercial program of mine,
source code is not provided; however, I'm making a special version available
electronically for DDJ readers; see page 3.
To use VXDQUERY, another VxD simply loads an address into the EDI register and
calls VxdQuery_Address_To_VxD_Name. The service returns the name of the VxD
plus the closest known procedure that precedes the specified address. If you
want to spy on the usage of a VxD call, as I did with _AddInstanceItem, write
a VxD that uses the documented VMM function Hook_Device_Service to intercept
the service at Sys_Critical_Init time, and collect the address of the callers
plus any related parameters. You may need to initialize right after VXDQUERY.
During Init_Complete or later, you can call VXDQUERY to translate your
addresses into VxD names. Finally, your VxD should export a V86 or PM API to
make your results public to the non-32-bit world (that is, to a program
similar to INSTWALK).
Figure 2 (from INSTWALK) shows all instanced data with their linear addresses.
At Sys_Critical_Init time, for example, VKD instances 28h bytes at 415h, one
byte at 471h, 4 bytes at 480h, and 0Bh bytes at 496h. To make any sense of
these numbers, you need to consult a reference that describes standard PC
absolute-memory locations. A good source is the file MEMORY.LST included with
Ralf Brown's "Interrupt List" (see IBMPRO library 5 on CompuServe); another
source is Undocumented PC by Frank van Gilluwe (Addison-Wesley, 1994). In the
VKD example, the 28h bytes at 415h include the keyboard buffer and head and
tail pointer, 471h is the Ctrl-Break flag, 480h points to the keyboard buffer,
and 496h includes keyboard-status bytes. It makes sense that VKD wants to
instance these portions of the BIOS data area.
Of particular interest are the final calls to _AddInstanceItem in Figure 2.
VMM made them from an internal routine called Create_Int_2F_Inst_Table; Figure
1 explains where this name comes from. This is the VMM code that processes the
Win386_Startup_Info_Struc chain passed back from INT 2Fh AX=1605h; this
documented structure includes an SIS_Instance_Data_Ptr field listing items
that software loaded before Windows (such as DOSKEY) wants instanced.
VXDLOAD.EXE determines the names of TSRs which supply a Win386_SIS.
LISTINST.386 gets a pointer to the list of all SISs on entry in the
Sys_Critical_Init procedure in EDX, because VXDLOAD fills the Win386_SIS for
LISTINST.386 with a pointer to its reference data.
VMM first lets all virtual devices instance their data and then calls
Create_Int_2F_Inst_Table to convert the instance structures provided by DOS
TSRs via their Win386_SIS to the VMM (IDM) instance structures. The result is
a doubly linked list of instance-data structures. You can walk the chain
during Init_Complete with the InstLinkF and InstLinkB pointers. After the
linked list is complete, the instance-description buffer will be initialized
with the data from the linked list. Finally, the so-called "snapshot" is taken
in order to save the startup-time values of the instanced data in the instance
snap buffer. If a new VM is created, the IDM creates a VMx instance buffer
(where x is the VM ID) and copies the contents of the instance snapshot buffer
into it.
There is one problem with this technique of hooking _AddInstanceItem to build
up a picture of the instance description buffer. As noted, LISTINST.386 hooks
_AddInstanceItem as early as possible: at Sys_Critical_Init time, right after
VMM, and before other VxDs. Unfortunately, this is too late to intercept the
_AddInstanceItem calls that VMM makes to support the LOCALTSRS= statement.
However, LISTINST is still able to find these instance items by following the
doubly linked list of InstDataStrucs. VMM will instance the entire program,
excepting its environment segment.
It would also be useful to determine the ownership of the PG_INSTANCE pages.
Again, only one VM at a time can own a PG_INSTANCE page. LISTINST.386 provides
the array to INSTWALK. Type INSTWALK --p to dump the instance page-ownership
array and the instance-description buffer. The output is more interesting if,
immediately after INSTWALK starts, you switch to another session or compile
something in the background. In this case, the instance pages are owned by
different VMs.


Not Just Spying


What can you do with this information? In addition to a better understanding
of how instance data works and what sorts of data must be instanced, the
information presented here might lead to some interesting techniques for
accessing instance data without causing page faults. This will be taken up in
a future article.
Figure 1: Some instance-related error messages from the debug WIN386.EXE.
_InstanceInitComplete no instance list
Tells us there must be an instance list. This is obviously a chain of linked
InstDataStrucs, as documented in VMM.INC.
_InstanceInitComplete entries not sorted #esi > #edi
_AddInstanceItem sorts the entries and chains them together via the InstLinkF
and InstLinkB fields of the InstDataStruc.
Computed Inst_VM_Buf_Size of 0 _InstanceInitComplete
_InstanceInitComplete must compute a nonzero Inst_VM_Buf_Size.
Allocation failure VM1 Inst _InstanceInitComplete
Allocation failure Inst snap _InstanceInitComplete
Allocation failure Inst Descrip _InstanceInitComplete
_InstanceInitComplete allocates space for the VM1 Inst buffer, the Inst snap
buffer, and the Inst descrip buffer.
Fail grow Instance desc buff AllocateInstanceMapStruc

IDM function AllocateInstanceMapStruc tests the size of the instance
description buffer and, if needed, grows the size.
Swap_Instance_Page, 0 in Inst_Page_Owner for page #ecx
Inst_Page_Owner is the VM in which the PG_INSTANCE page is marked present.
Swap_Instance_PageFS ERROR INSTANCE PAGE > 10Fh
ERROR: Re-entered instance copy procedure
The IDM function Swap_Instance_PageFS copies the instance data in the instance
buffer of the VM and changes the owner of the PG_INSTANCE page.
ERROR: Instance fault on suspended VM #EBX
A suspended VM cannot own an instanced page.
Instance fault on page with 0 IMT_Inst_Map_Size
The Inst_Page_Owner changes after a page fault on an instance page is
detected. The IDM determines the new owner VM and saves the instanced data of
the old owner via Swap_Instance_Page.
_AddInstanceItem failed InstDataStruc @#EDI Create_Int2F_Inst_Table
The internal routine that processes the InstDataStruc chain from INT 2Fh
AX=1605h is called Create_Int2F_Inst_Table.

Figure 2: Sample output from INSTWALK.
C:\DDJ\INST>instwalk
V86 API from ListInst.386: Result of _AddInstanceItem Hook.
Summary of allocation calls to the Instance Data Manager.

Name of VxD InstLinAddr InstSize

VKD : Sys_Critical_Init_Proc 0x00000415 0x00000028
VKD : Sys_Critical_Init_Proc 0x00000471 0x00000001
VKD : Sys_Critical_Init_Proc 0x00000480 0x00000004
VKD : Sys_Critical_Init_Proc 0x00000496 0x0000000B
VMM: _Allocate_Global_V86_Data_Area 0x0002807C 0x00000374
V86MMGR : Unknown_Service 0x00028435 0x00000006
VTD : Device_Init_Proc 0x000284C0 0x00000008
VDD : Device_Init_Proc 0x00000449 0x0000001E
VDD : Device_Init_Proc 0x00000484 0x00000007
VDD : Device_Init_Proc 0x000004A8 0x00000004
VDD : Device_Init_Proc 0x00000410 0x00000002
VCD : Device_Init_Proc 0x00000400 0x00000008
VCD : Device_Init_Proc 0x0000047C 0x00000004
DOSMGR : Device_Init_Proc 0x00000413 0x00000002
DOSMGR :IGROUP 0x00000504 0x00000001
DOSMGR :IGROUP 0x00000000 0x00000400
DOSMGR :IGROUP 0x00001550 0x0000001A
DOSMGR :IGROUP 0x0000156A 0x00000772
VMM: _Allocate_Global_V86_Data_Area 0x000286B4 0x00000256
DOSMGR :IGROUP 0x00003CE0 0x00000004
DOSMGR :IGROUP 0x00001342 0x00000002
DOSMGR :IGROUP 0x00004830 0x000008F0
DOSMGR : Device_Init_Proc 0x00027FD0 0x00000010
DOSMGR: Instance_Device 0x00001278 0x00000004
DOSMGR: Instance_Device 0x0001EEC2 0x00000004
DOSMGR: Instance_Device 0x000283F0 0x00000004
VMM:Create_Int_2F_Inst_Tbl:DOSKEY 0x00017D30 0x00000288
VMM:Create_Int_2F_Inst_Tbl:DOSKEY 0x00018C53 0x00000200
VMM:Create_Int_2F_Inst_Tbl:MOUSE 0x00012158 0x00000889
VMM:Create_Int_2F_Inst_Tbl:MOUSE 0x00012AD7 0x0000010A
VMM:Create_Int_2F_Inst_Tbl:VLM 0x0000DD60 0x0000001C
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x00000500 0x00000002
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x0000050E 0x00000014
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x0000070C 0x00000001
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x00005140 0x00000002
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x000053B0 0x000004C8
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x00001252 0x00000002
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x00001262 0x00000004
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x00001429 0x00000106

VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x00001530 0x00000001
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x000021F0 0x00000022
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x000012B9 0x00000001
VMM:Create_Int_2F_Inst_Tbl:SYS DRV 0x000012BC 0x00000002

Figure 3: Examining DOSKEY's instance data.
C:\DDJ\INST>instwalk | grep DOSKEY
VMM:Create_Int_2F_Inst_Tbl:DOSKEY 0x00017D30 0x00000288
VMM:Create_Int_2F_Inst_Tbl:DOSKEY 0x00018C53 0x00000200

C:\DDJ\INST>\ddj\protdump\protdump #2 18c53 200
00018C53 45 00 63 3A 5C 65 70 73 5C 65 70 73 69 6C 6F 6E E.c:\eps\epsilon
00018C63 2E 65 78 65 20 24 2A 00 69 6E 73 74 77 61 6C 6B .exe $*.instwalk
00018C73 20 3E 20 69 6E 73 74 77 61 6C 6B 2E 6C 6F 67 00 > instwalk.log.
00018C83 67 72 65 70 20 44 4F 53 4B 45 59 20 69 6E 73 74 grep DOSKEY inst
00018C93 77 61 6C 6B 2E 6C 6F 67 00 5C 64 64 6A 5C 70 72 walk.log.\ddj\pr
00018CA3 6F 74 64 75 6D 70 5C 70 72 6F 74 64 75 6D 70 20 otdump\protdump
00018CB3 31 38 63 35 33 20 32 30 30 00 5C 64 64 6A 5C 70 18c53 200.\ddj\p
 ... etc. ...

C:\DDJ\INST>\ddj\protdump\protdump #1 18c53 200
83018C53 45 00 63 3A 5C 65 70 73 5C 65 70 73 69 6C 6F 6E E.c:\eps\epsilon
83018C63 2E 65 78 65 20 24 2A 00 63 64 20 74 61 70 63 69 .exe $*.cd tapci
83018C73 73 00 74 61 70 63 69 73 00 63 64 5C 70 72 6F 63 s.tapcis.cd\proc
83018C83 6F 6D 6D 00 70 63 70 6C 75 73 00 63 64 5C 74 61 omm.pcplus.cd\ta
83018C93 70 63 69 73 00 74 61 70 63 69 73 00 67 72 65 70 pcis.tapcis.grep
83018CA3 20 4E 46 5F 20 5C 62 6F 72 6C 61 6E 64 63 5C 69 NF_ \borlandc\i
83018CB3 6E 63 6C 75 64 65 5C 74 6F 6F 6C 68 65 6C 70 2E nclude\toolhelp.
83018CC3 68 00 65 78 69 74 00 63 64 5C 69 6E 73 74 00 63 h.exit.cd\inst.c
 ... etc. ...

[LISTING ONE] (Text begins on page 125.)

;;; _AddInstanceItem hook from LISTINST.386

;;; from DDK VMM.INC
InstDataStruc struc
InstLinkF dd 0 ; linked list forward ptr
InstLinkB dd 0 ; linked list back ptr
InstLinAddr dd ? ; Linear address of start of block
InstSize dd ? ; Size of block in bytes
InstType dd ? ; INDOS_Field or ALWAYS_Field -- ignored?
InstDataStruc ends

;;; from LISTINST.INC -- my InstData struct includes caller address
KM_InstData struc
AddInst_Caller dd ?
InstDataStruc { } ; from VMM.INC
KM_InstData ends

;;; from LISTINST.ASM
oldservice dd 0 ; return value from Hook_Device_Service
Inst_Struc_Ptr dd 0 ; InstLinkF from InstDataStruc
calladr dd 0 ; address of _AddInstanceItem caller
Data_Buf_Addr dd 0 ; created with _PageAllocate PG_SYS
Data_Buf_Size dd 0
Data_Buf_Handle dd 0
Inst_Data_Count dd 0 ; number of instance items seen so far


;;; from LISTINST.ASM Sys_Critical_Init handler

;Instance the first byte of the 1st MB in order to catch all calls to
;_AddInstanceItem before ListInst_Sys_Critical_Init. The _AddInstanceItem
;service chains the InstDataStrucs together into a sorted doubly linked list
;via InstLinkF and InstLinkB.
;If the LinkF field is -1, no other calls were made.
;If LinkF <> -1, then it represents a call to _AddInstanceItem caused by a
;SYSTEM.INI entry "LOCALTSRS=tsr_name". The VMM instances the whole TSR;
;the first 16 bytes represent the MCB of the PSP, so we can determine the name
;of the fully instanced TSR.
 ;;; ...
 mov KM_Instance.InstLinAddr,0
 mov KM_Instance.InstSize,1
 mov KM_Instance.InstType,ALWAYS_FIELD
 mov esi,offset32 KM_Instance
 VMMcall _AddInstanceItem <esi,0>
 cmp KM_Instance.InstLinkF,-1 ;any LOCALTSRS ?
 je nolocal
 mov esi,KM_Instance.InstLinkF ;yes, get it
loclp: mov Inst_Struc_Ptr,esi
 mov calladr,'LTSR'
 call addinst ;add instance item to our list
 mov esi,[esi.InstDataStruc.InstLinkF] ;get next InstDataStruc
 cmp esi,-1 ;no more strucs?
 jne loclp

nolocal:mov eax,_AddInstanceItem
 mov esi,offset32 myhook
 VMMcall Hook_Device_Service
 mov [oldservice],esi
 ;;; ...
BeginProc Hooked_AddInstanceItem
; The AddInstanceItem Hook stores the callers address
; and the instance data pointer in the Inst_Data_Buf buffer.
myhook:
 push ebp
 mov ebp,esp
 push [ebp+0ch] ; Flags
 push [ebp+8] ; Instance Structure Pointer
 push [ebp+8]
 pop Inst_Struc_Ptr
 push [ebp+4] ; get caller's return address!
 pop calladr
 call [oldservice] ; call original _AddInstanceItem
 add esp,8
 pop ebp
 cmp eax,0 ; error in _AddInstanceItem ?
 je exit
 call addinst ; add Instance Item to our list
exit: ret
;*****************************************************************************
;addinst - adds an instance item to our list.
;INPUT: calladr - address of caller of _AddInstanceItem
; Inst_Struc_Ptr - address of InstDataStruc
;OUTPUT: hook_err = -1 - error growing Data_Buf
; hook_err = 0 - all O.K.
;*****************************************************************************
addinst:push edi

 push esi
 push ecx
 push eax
hook2: mov edi,Data_Buf_Addr
 mov ecx,Inst_Data_Count
 imul ecx,sizeof KM_InstData
 add edi,ecx
 push edi
 add edi,sizeof KM_InstData
 mov ecx,Data_Buf_Addr
 add ecx,Data_Buf_Size
 cmp edi,ecx
 pop edi
 jl hook1
 mov ecx,Data_Buf_Size
 add ecx,1000h
 shr ecx,0ch
 mov edx,Data_Buf_Handle
 VMMcall _PageReAllocate <EDX, ECX, PAGEZEROINIT>
 cmp eax,0
 je hookerr
 add Data_Buf_Size,1000h
 mov Data_Buf_Handle,eax
 mov Data_Buf_Addr,edx
 jmp hook2
hook1: mov eax,calladr ;save caller's address in buffer
 stosd
 mov esi,Inst_Struc_Ptr
 mov ecx,(sizeof InstDataStruc)/4
 rep movsd ;save instance data struc
 inc Inst_Data_Count
hookret:pop eax ;next offset pair
 pop ecx
 pop esi
 pop edi
 ret
hookerr:mov hook_err,-1
 jmp hookret

EndProc Hooked_AddInstanceItem
End Listing





















April, 1994
PROGRAMMER'S BOOKSHELF


A Clear Look Through Bleary Eyes at Two Books on Algorithms




Tom Ochs


Tom is a consultant specializing in the integration of modern
software-development methods into technical organizations. He has over 15
years experience as a research scientist, has written a commercial numerical
package, and is a registered mechanical engineer living in Albany, Oregon. Tom
can be contacted on CompuServe at 70511,652.


Algorithm: A set of well-defined rules for the solution of a problem in a
finite number of steps.
Books on algorithms are important tools for the professional software
developer. However, as can be seen from the definition, the breadth of issues
covered under the heading of algorithms can make the selection of the proper
book difficult. Topics can cover numerical applications, business
applications, data structures, searching, sorting, optimization, and many
others. Some generic characteristics seen in algorithm-related books (ARBs)
include: How algorithms are designed, why a particular algorithm is chosen,
what measures are used to assess algorithm effectiveness, construction
considerations, instances of the algorithms, application examples, test cases,
reliability issues, comparisons with other algorithms, and data-structure
dependence. ARBs also have an associated level of difficulty that can range
from introductory through intermediate and advanced, to specialized-advanced
(where you, the author, and three others in the world are interested in the
topic). The tone can vary from practical to academic, and the presentation,
from well written to just plain poor quality.
Clearly, the selection of a book on algorithms is situational, depending on
your needs of the moment. A lot of books are collecting dust on my bookshelves
because their characteristics don't meet my current needs. We should take the
opportunity to use our analytical skills to determine our needs and then
compare those needs to the characteristics of the available books. This can
help make our investments in time and money work for us. To supplement your
needs assessment, here is my analysis of two relatively new books. Like movie
critics who rate films from one to four stars, I will use a p rating, with p
being a book that has little to offer, pp being a book with marginal impact,
ppp having significant contribution, and pppp being a book that must reside on
the serious developer's shelf.


Programming Classics


Any book that purports to be "detailing the best algorithms ever devised for a
wide range of practical problems..." has a huge challenge ahead of it just to
live up to the propaganda on the jacket. Unfortunately, Programming Classics:
Implementing the World's Best Algorithms, by Ian Oliver, falls far short of
the hype. Even though it does cover a wide selection of applications, the
coverage is spotty, sometimes shallow, and generally incomplete. In trying to
limit the complexity of the presentation, Oliver has also limited its
usefulness. On numerous occasions he resorts to hand waving such as: "...is
beyond the scope of this book...," "We will not analyze in detail...," "Do not use
this algorithm unless you know what you are doing," and "Given the
mathematical sophistication needed for dealing with eigenvalues, no discussion
of the reasons why the algorithm works will be given." Oliver has mistakenly
tried to keep the presentation at an introductory level while introducing
intermediate-level algorithms and concepts.
The lack of detailed discussion on the theory of operation of many of these
algorithms leaves you to accept Oliver's choice for the implementation based
on faith alone. If you have to modify, debug, or optimize the functions, the
presentation in this book is generally inadequate. The inconsistency in the
amount of detail is illustrated by the adequate coverage of sorting methods,
including performance comparisons and application-specific suggestions, while
the section on arithmetic is devoid of explanation.
In the poorly explained section on arithmetic, rational methods are introduced
and a warning is given:
"...the methods will fail when integer overflow occurs. For certain practical
applications it will be necessary to implement the algorithms in multiple
precision arithmetic. The algorithms for multiple precision calculations are
beyond the scope of this book."
What Oliver doesn't say is that the methods generally fail after only a few
operations due to overflow, and the use of greatest-common-divisor (GCD)
reduction is only temporarily effective at preventing the overflow. His
presentation also skirts the fact that this implementation only works for toy
problems if multiple-precision arithmetic is not used.
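The overflow problem Oliver glosses over is easy to demonstrate. Here is a minimal sketch (in Python rather than Oliver's generic language; the 32-bit limit and the sample rational are my own choices for illustration) that simulates fixed-width integer arithmetic and counts how quickly repeated rational multiplication overflows even with GCD reduction:

```python
from math import gcd

LIMIT = 2**31 - 1  # simulate 32-bit signed integer arithmetic

def mul(a, b):
    """Multiply two rationals (num, den), applying GCD reduction."""
    num, den = a[0] * b[0], a[1] * b[1]
    if abs(num) > LIMIT or abs(den) > LIMIT:
        raise OverflowError("intermediate product exceeds 32 bits")
    g = gcd(num, den)
    return (num // g, den // g)

def count_until_overflow(x):
    """Repeatedly multiply an accumulator by x; count products before overflow."""
    acc, steps = (1, 1), 0
    try:
        while True:
            acc = mul(acc, x)
            steps += 1
    except OverflowError:
        return steps

# 355/113 is already in lowest terms, so GCD reduction never helps here
print(count_until_overflow((355, 113)))  # prints 3
```

Three multiplications before failure is exactly the "toy problem" limitation the review describes; only multiple-precision arithmetic removes it.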
The author uses his own generic language, reminiscent of Ada, to define his
"code" examples. His intent was to produce a broadly targeted representation
that was language independent. Instead, it will be difficult to translate some
of the code to older languages such as Fortran, Cobol, or C. The example code
exhibits problems with initialization, typing, character/byte access,
parameter passing, memory usage, and other implementation issues. Since these
issues are addressed in a generic way, it is almost assured that few real
languages will come close to mapping transparently to his representations,
forcing the users to modify their implementations without a clear
understanding of the algorithm-design issues. If Oliver was serious about
producing reliable code for readers to use directly, he should have chosen
specific target languages so the syntax questions could have been dealt with
in his implementations.
I rate Programming Classics p, for poor execution of a fundamentally good
concept, useful only as the first place to look to find references to more
complete explanations of the problems to be solved. While this book could be
useful to experienced designers looking for a reference that gives terse
overviews and points to other sources for details, it will be a hazard for the
inexperienced designer looking for a quick method to solve a poorly understood
problem.


Algorithms from P to NP


Algorithms from P to NP is a careful, academic text designed for graduate
students, upper-level undergraduate students, and computing professionals
prepared to use rigorous mathematical analysis in problem solving. If you
aren't comfortable with set notation, discrete mathematics, data structures,
calculus, and algebraic expression of problems--pick another book. Algorithms
from P to NP, Volume I, Design and Efficiency, by Moret and Shapiro, is
clearly designed as an advanced textbook to be used in a classroom setting
with an instructor and does an excellent job in that context. It also serves
as a good refresher and reference for those who have been through similar
advanced courses. I particularly liked the presentation and felt that Moret
and Shapiro did a good job of leading the student through the solution
process; however, it is industrial-strength analysis and not for the
faint-of-heart. But it is worth the effort. The authors expect you to
recognize standard algebraic notation, but introduce specific concepts with
which you might not be familiar--a spanning tree, O-notation, generating
functions, and directed graphs. The exercises range from simple examples to
thesis-level assignments.
Algorithms from P to NP concentrates on combinatorial optimization problems
and takes a thorough, depth-over-breadth approach. Moret and Shapiro start
with several traditional problems such as the knapsack problem (filling a
knapsack with the optimal mix of things for a camping trip) and the traveling
salesperson problem (traveling through a series of cities while optimizing
time or distance). These problems are revisited throughout the book in generic
instances as the problem-solving approach is modified and expanded to
encompass extensions of the problems. You're given more tools to deal with
increasingly difficult examples of the problems as the book progresses.
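For readers who haven't met it, the basic 0/1 knapsack problem yields to a compact dynamic-programming solution. A minimal sketch (in Python rather than the book's Pascal; the item weights and values are invented for illustration):

```python
def knapsack(weights, values, capacity):
    """0/1 knapsack: best[w] holds the best value achievable at weight w."""
    best = [0] * (capacity + 1)
    for wt, val in zip(weights, values):
        # iterate weights downward so each item is used at most once
        for w in range(capacity, wt - 1, -1):
            best[w] = max(best[w], best[w - wt] + val)
    return best[capacity]

# pack a 10-unit knapsack from items given as (weight, value) pairs
print(knapsack([5, 4, 6, 3], [10, 40, 30, 50], 10))  # prints 90
```

The table-filling approach runs in time proportional to the number of items times the capacity, which is why the knapsack problem serves the authors so well as a recurring example.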
The reference to a "stack of punch cards," which most graduate students have
never seen, dates the origin of some of the examples while demonstrating the
timelessness of the problems. Throughout the book, there is just enough nerd
humor (my favorite kind) to liven up a graduate course in algorithms. The
basic approach of the book is one that I am comfortable with: "The study of
algorithms cannot be dissociated from the study of problems." Their approach
is to start with problem solving and then show how the solutions map naturally
into an algorithm for the effective solution of the problems. They spend time
reviewing methods for assessing algorithm run time, but they deal only
peripherally with the concept of reliability. This limited discussion of
reliability is probably related to the focus on combinatorial problems, as
opposed to numerical issues. Algorithms from P to NP discusses not only the
theoretical, asymptotic behavior of the algorithms, but also the application
and implementation issues that impact performance. The language used for
example code is Pascal, and the code examples have been used and tested in
classroom situations.
The name of the book reflects the concentration in this volume on problems
that have solutions which, in the worst case, require "polynomial time" (O(N^k),
where N is the number of items and k is some constant) for their completion.
These are represented as P-problems. The second volume deals with NP-complete
problems (problems for which no solution has been found that can be completed
in polynomial time). NP-complete problems are an ongoing subject of research,
and are generally solved by approximation methods.
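The practical force of that distinction is easy to see by tabulating the two growth rates. A throwaway sketch (Python, purely illustrative; the book's own examples are in Pascal):

```python
# Polynomial step counts (here N^3) stay manageable as N grows,
# while exponential ones (2^N) explode -- the gap behind P vs. NP.
for n in (10, 20, 40, 60):
    print(f"N={n:3}  N^3={n**3:<10,}  2^N={2**n:,}")
```

At N = 60, the cubic algorithm needs 216,000 steps while the exponential one needs over 10^18, which is why NP-complete problems are attacked with approximation methods instead.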
I rate this book ppp, for concise, clear explanations of problem-solving
issues. A serious textbook for serious study of combinatorial issues. Don't
pick this book up for light reading!
(For reviews of 14 ARBs dealing with numerical issues, refer to my "Building
Blocks" column in the former Computer Language magazine, November, 1992.)
Programming Classics: Implementing the World's Best Algorithms
Ian Oliver
Prentice Hall, 1993, 386 pp. $38.00
ISBN 0-13-100413-1
Algorithms from P to NP, Volume I: Design and Efficiency
B.M.E. Moret and H.D. Shapiro
Benjamin/Cummings Publishing, 1991, 576 pp. $41.95
ISBN 0-8053-8008-6












April, 1994
OF INTEREST
To kickstart PowerPC application development, Apple's APDA group has announced
a number of Macintosh-based programmer tools for yet-to-come PowerPC-based
Apple computers. The "Macintosh on RISC SDK" includes tools for creating new
applications or porting existing Macintosh applications for future Apple
PowerPC-based PCs. At the same time, Apple introduced the
"Macintosh-with-PowerPC Starter Kit" and a comprehensive, self-paced training
course entitled Programmer's Introduction to RISC and PowerPC. Additionally,
Apple is offering Metroworks' native PowerPC development environment,
CodeWarrior. Apple PowerPC-based computers are expected to become available in
the first half of 1994.
The Macintosh on RISC SDK is an MPW-based cross-development environment that
runs on a 680x0 Macintosh, generating native code for
Macintosh-with-PowerPC-based systems. When these Macs become available, you
can finish the port by testing and debugging your native Mac-with-PowerPC
applications. The Macintosh on RISC SDK includes a C/C++ compiler that
generates optimized code, PowerPC assembler, two-machine PowerPC debugger,
universal system header files for both 680x0 and PowerPC processor-based
platforms, MacApp 3.1 (Apple's object-oriented application framework), Apple
Installer 4.0 (which is capable of installing either 680x0 or PowerPC
environments from a common set of files), MPW Development System 3.3, a
PowerPC linker, build tools and scripts, and sample applications for
Mac-with-PowerPC.
The Macintosh-with-PowerPC Starter Kit includes detailed technical
documentation about both the PowerPC microprocessor and System 7 for Macintosh
with PowerPC. Among other information, this kit includes Motorola's PowerPC
601 RISC Microprocessor User's Manual, Inside Macintosh: PowerPC System
Software.
CodeWarrior is a native development environment for the PowerPC-based and
680x0-based Macintosh that lets you create applications for both platforms
using the same source-code base. CodeWarrior comes in three versions: Gold,
Silver, and Bronze. Gold, the most comprehensive, includes development
releases of C/C++ for the 680x0 Mac and Mac-with-PowerPC, a development
release of Pascal for the 680x0 Mac, and C/C++ cross-compilers. Silver
supports native PowerPC development only, and will be released when Apple
ships Mac-with-PowerPC systems. Bronze supports 680x0 development only.
The Macintosh on RISC SDK, available in prerelease with an automatic upgrade,
sells through APDA for $399.00, CodeWarrior Gold (also prerelease) for
$399.00, the PowerPC Starter Kit for $39.95, and the Programmer's Introduction
for $150.00. Alternatively, the tool sets are bundled, selling for $849.00.
Reader service no. 20.
APDA
Apple Computer
P.O. Box 319
Buffalo, NY 14207-0319
800-282-2732
A library of encryption tools implemented as linkable object modules and
Windows DLLs has been released by AT&T. The library includes RSA, DES, El
Gamal public-key, Secure Hash, MD5, and Diffie-Hellman encryption technology.
The code modules are packaged as SecretAgent (DES, El Gamal, and DSA digital
signature), SecretAgent II (DES, RSA, and MD5), Surety (DSA), and SecureZmodem
(DES using Zmodem protocol).
Prices for code packages containing DSA are $750.00 for DOS/Windows, $1000.00
for Macintosh, and $1250.00 for UNIX. Packages that include RSA sell for
$300.00 for DOS/Windows, $400.00 for Macintosh, and $500.00 for UNIX.
The license allows programmers to load the code into two workstations for
development purposes. Royalties are required for software distributed to end
users. Reader service no. 21.
AT&T Secure Communications Systems
800-203-5563
Visual Xbase, a visual application-development tool for Xbase and C
programmers, has been released by Rytech International. The tool includes a
3-D screen designer with an integrated, intelligent data dictionary,
workbench, and a flexible, self-optimizing multidialect code generator.
Visual Xbase supports FoxPro 2.5 (DOS and Windows), Clipper 5.2, dBASE IV 2.0,
and X2C, an add-on that converts Xbase code to C. In each case, the tool
generates code optimized to the selected language. It supports query by
example, incremental table searches, filter by example, data-integrity
searches, calculated fields, key checking, memory variables, cascaded
deletions, and more.
Visual Xbase sells for $495.95. There are no run-time license fees. Reader
service no. 22.
Rytech International Inc.
2 Stamford Landing, #100
Stamford, CT 06902
203-357-7812
Mathematica toolkits for electrical engineering, signal processing, control
engineering, statistics, finance, and similar disciplines are on the way. The
first in the series is the Electrical Engineering Pack, which covers topics
ranging from elementary to advanced and includes examples in circuit analysis,
transmission lines, and antenna design. The EE Pack also covers Bode, Nyquist
and root-locus plots, Smith Charts, and antenna-field patterns. New functions,
which extend the original Mathematica product, are specifically aimed at the
EE problem domain. Full source code for the examples is also included.
The Electrical Engineering Pack is available for the Macintosh, Windows, and X
Window System and is priced at $195.00. Reader service no. 23.
Wolfram Research
100 Trade Center Drive
Champaign, IL 61820
217-398-0747
Applied Cryptography: Protocols, Algorithms, and Source Code in C, by Bruce
Schneier, has been published by John Wiley & Sons. In its coverage of
cryptographic protocols, techniques, and algorithms, Applied Cryptography is
perhaps the most complete book of its kind. For instance, Schneier (who is a
frequent DDJ contributor) unravels virtually all block algorithms (including
the NSA-backed Skipjack), public-key algorithms (from RSA to cellular
automata), one-way hash functions, random-sequence generators, and special
algorithms for protocols. In addition, the book provides over 100 pages of
published source code for many of these algorithms. Likewise, Schneier's
35-page reference and bibliography section (listing over 900 sources) is of
particular value for research.
The 618-page book sells for $44.95 (ISBN 0-471-59756-2). Reader service no.
24.
John Wiley & Sons Inc.
605 Third Ave.
New York, NY 10158
212-850-6000
RenderWare, an interactive 3-D graphics API for Windows released by Criterion
Software, supposedly increases Windows 3-D graphics performance, without the
need for special 3-D graphics accelerators. Based on 3-D graphics software
technology from Canon (Criterion's parent company), RenderWare reportedly
enables mid-range workstation performance on a 486/50 PC.
RenderWare provides a device-independent 3-D graphics API, an object-based
interface consisting of a small number of object types, and functions such as
advanced shading and texturing. Typical applications for RenderWare include
multimedia, visual simulation, scientific visualization, CAD, virtual reality,
presentation graphics, and entertainment/games.
In addition to Windows, the RenderWare API is available on other platforms
such as Macintosh, UNIX (X11), and OS/2. The RenderWare SDK, which is priced
from $10,000.00, includes a development library, debugging library,
documentation, examples, and demos. Reader service no. 25.
Criterion Software Ltd.
17-20 Frederick Sanger Road
Guildford, Surrey
United Kingdom, GU2 5YD
+44-483-574-325
Undocumented DOS, second edition, by Andrew Schulman, Ralf Brown, David Maxey,
Raymond Michels, and Jim Kyle has been released by Addison-Wesley. The book,
spearheaded by Schulman, who edits DDJ's "Undocumented Corner" column, has
been updated to include coverage of MS-DOS 6, Novell DOS, Windows 3.1, the
forthcoming "Chicago" operating system (DOS 7 and Windows 4), and more.
Like its predecessor, Undocumented DOS, second edition belongs on every PC
programmer's bookshelf. The book, with disk, retails for $44.95. (ISBN
0-201-63287). Reader service no. 26.
Addison-Wesley Publishing Co.
1 Jacob Way
Reading, MA 01867
617-944-3700
A set of tools that provide programmers with an API for fax-related
applications has been developed by Sofnet. These tools, called the "FaxWorks
API," facilitate the development of apps that integrate fax, OCR, scanning,
voice, and image viewing. The API is proprietary, however, in that it was
developed to support Sofnet fax-related software--FaxWorks Pro LAN, FaxWorks
OS/2, FaxWorks ProServer, and so on.
The FaxWorks API forms a protocol layer in which other apps can exchange
information. FaxWorks acts as a server when applications ask it to perform
tasks such as faxing, scanning, or OCR. It acts as a client, however, when it
asks applications for data such as a list of names and fax numbers from a
phone book. The FaxWorks API and documentation are available free on
CompuServe (GO SOFNET). Reader service no. 27.
Sofnet Inc.
1110 Northchase Parkway, Suite 150
Marietta, GA 30067
404-984-8088
As Ken North pointed out in his article, "Database Development and Visual
Basic 3.0" (DDJ, March 1993), languages embedded in application programs are
becoming more and more common. In the case of Microsoft, Visual Basic for
Applications (formerly Object Basic) is a programming tool currently available
only for Excel 5.0, but with more application support presumably on the way.
WordPerfect has countered with WordPerfect 6.0 for Windows SDK and WordPerfect
File Format SDK.

The WordPerfect 6.0 for Windows SDK features WordPerfect's new Writing Tools
API, a macro language, and Shared Code 2.0 for Windows--a shared library of
routines used by all WordPerfect for Windows software. The File Format SDK,
which is available on a nondisclosure basis, contains documentation defining
the WordPerfect 6.0 format and the WordPerfect Graphic File Format.
In related news, Softbridge has announced that SBL 3.0, an implementation of
the Basic language for embedding into application software, now supports OLE
2.0 automation. This means that SBL, which has a syntax compatible with Visual
Basic, can operate within and across applications.
The WordPerfect 6.0 for Windows SDK and WordPerfect File Format SDK sell for
$149.00 each. Reader service no. 28.
Softbridge provides various licensing arrangements for SBL 3.0. Reader service
no. 29.
WordPerfect Corp.
1555 N. Technology Way
Orem, UT 84057-2399
800-451-5151
Softbridge Inc.
125 Cambridge Park Drive
Cambridge, MA 02140
617-576-2257
According to Al Stevens in this month's "Examining Room," good help can be
hard to find when it comes to creating Windows help systems. Addressing this
problem is MasterHelp, a new tool from Performance Software. MasterHelp takes
text formatted for printing with Word for Windows and automatically creates a
Windows help file, including hypertext jumps. The program also automatically
creates Microsoft's Multimedia Viewer files for interactive tutorials and the
like.
Among features created by MasterHelp are: a table of contents in a secondary
window; a pop-up window which provides an overview of the entire document; a
pop-up window that lets you know where you are in the document; and a pop-up
window that shows hypertext-related topics. MasterHelp retails for $495.00.
Reader service no. 30.
Performance Software Inc.
575 Southlake Blvd.
Richmond, VA 23236
804-794-1012
Manageware for NetWare, a multiplatform tool for managing NetWare-based
networks from Hitecsoft, is designed specifically for creating NetWare
Loadable Modules (NLMs) and server-based applications. Manageware is based on
a specification called the "Network Management Language" (NML) developed by
Hitecsoft. Similar to a fourth-generation language, NML provides a more
flexible means of accessing network internals than traditional languages. NML
is operating-system independent and has built-in network extensions (such as
client/server, distributed processing, smart-object architecture, and others).
Manageware-based NLMs run under all supported platforms without source-code
modification. This means that you can develop and test network-management
programs under DOS, then run them as NLMs on the server.
Manageware Version 1.0 is an interpreter (and compiler for the developer's
edition) that features a flexible preprocessor, virtual memory management,
automatic variable declaration, external function calls, and user-definable
functions with local variable declaration and dynamic parameter passing. The
tools provide full access to NetWare internals such as binderies, connections,
directories, queues, IPX, SPX, and so on.
Manageware for NetWare sells for $895.00. Reader service no. 31.
Hitecsoft Corp.
3370 N. Hayden Road, Suite 123-175
Scottsdale, AZ 85251-6632
602-970-1025




































April, 1994
SWAINE'S FLAMES


Pentium vs. PowerPC




Michael Swaine editor-at-large


It's Pentium vs. PowerPC. That's the simplistic view of what's going on in the
area of personal-computer CPUs. DEC and MIPS Technology may see things
differently, and this magazine is rarely simplistic about such matters, but
this page is where we dumb down to the level of the rest of the computer
press. Or even lower. And what's lower than a Lettermanesque Top Ten List?
Generous to a fault, we give you two.


Top Ten Reasons why Pentium will Prevail


10. Anybody out there using the Dvorak keyboard? You do know that it's been
shown to be superior in every way to the ubiquitous Qwerty keyboard, don't
you?
 9. On the PowerPC you'll have to run Windows and DOS apps under emulation. On
Apple's own PowerPC machines, which it is calling Macintoshes, you'll have to
run Macintosh apps under emulation. Emulation is slow. Emulation is an
unnecessary layer of complexity. Emulation is evil.
 8. The subliminal message. It probably wasn't a good marketing decision to
call the technology behind the PowerPC "RISC."
 7. Intel stock keeps going up.
 6. In the short run, the ability to run DOS and Windows apps faster than a
486 machine is what will justify buying a new machine. In the short run, Intel
wins.
 5. In the long run, bet on the company with the deepest pockets. In the long
run, Intel wins.
 4. Everybody roots for the underdog. And puts their money on the favorite.
 3. Compatibility. Compatibility. Compatibility.
 2. Did I mention? Computer buyers value compatibility.
 1. The Austin factor. Can we be absolutely sure that those Motorola guys
won't withdraw the PowerPC from the market the first time New York Times
columnist William Safire criticizes it? There's something in the water down
there.


Top Ten Reasons why PowerPC will Prevail


10. It's got significantly faster floating-point performance than Pentium.
 9. It's cheaper. By half.
 8. Apple and IBM are solidly behind it.
 7. Apple and IBM stocks are recovering.
 6. If anyone is looking for a bridge from Intel to RISC, and a lot of people
are, it's here. The first generation of PowerPC machines will run existing
apps under emulation at speeds comparable to existing mid-range to high-end
PCs. Native apps will be considerably faster. Early indications are that the
emulations will be very solid.
 5. Precedent. IBM's RS/6000 workstations haven't done too badly, and PowerPC
is the migration of the RS/6000 processor technology to the personal-computer
market.
 4. The portable edge. The portables market is critical, and by releasing a
Pentium chip that won't work in portables, Intel has given PowerPC a huge head
start in portables.
 3. Price. Price. Price.
 2. Well, do computer buyers value compatibility? I mean, even if it costs
them something? When have they ever had to pay for it? How much are they
willing to pay for it?
 1. The Clinton clincher. The future belongs to those willing to embrace
change.
One reason that did not make the second list: It's Intel's turn to be the Evil
Empire. No, IBM had the '80s and Microsoft gets the entire decade of the '90s.
No honeymoon for Bill.

















May, 1994
EDITORIAL


Less Talk, More InfoAction


Life can be tough in the Canadian Maritimes. The summers are short, winters
bitter, and economic conditions perpetually harsh. When the federal government
says you can't fish or log anymore, and McDonald's stops buying your potatoes
for french fries, Maritime job opportunities get up and go. Still, Maritimers
hang onto their can-do attitude and have proven they aren't afraid of working
hard and taking risks.
To a Maritimer's way of thinking, education is the key to breaking out of
seemingly endless economic doldrums. An educated and highly skilled work
force, so the story goes, will attract new industries, thereby providing more
jobs and a higher standard of living. Education, in fact, was what took me to
the Maritimes nearly 20 years ago, when I went to Prince Edward Island--the
small island province across from Nova Scotia and New Brunswick--to teach
school. Except for the clean-smelling locker rooms, the schoolhouse itself
wasn't much different from those I'd grown up in and taught at. But for the
students--many of whom attended one-room country schools the previous
year--stepping into that new building was stepping into the future.
Buildings, however, are expensive to construct and maintain, not to mention
hard to get to when the snow's howling off the Gulf of St. Lawrence.
Consequently, the province of New Brunswick has launched TeleEducation NB, a
project aimed at delivering education via information technology. But
TeleEducation, also known as the "New Brunswick Distance Education Network,"
is more than the high-tech information-highway smoke we've grown used to
inhaling. TeleEducation is up and running and delivering province-wide
educational services ranging from high school- and college-level classes to
in-house corporate and extension training.
New Brunswick citizens have at their fingertips classes on astronomy out of
Mount Allison University and health care from the University of New Brunswick.
Farmers are taking animal-husbandry courses at nearby agricultural extension
offices, police officers are studying law at the local hoosegow, and employees
of McCain Foods are participating in training sessions from their desks.
Within its first six months, TeleEducation was delivering coursework to nearly
2000 students. The ultimate goal, says project director Rory McGreal, is for
60 percent of New Brunswick's population to eventually participate in some
form of distance education.
There's nothing particularly exotic about New Brunswick's network. A standard
TeleEducation site is a 486 PC with a 240-Mbyte hard disk, 8 Mbytes of RAM, a
low-cost digitizing tablet, and a 14,400-baud modem. On the software side,
each Windows 3.1-hosted PC runs Smart Technologies' Smart 2000 conferencing
system, which enables students to communicate with one another in real time,
sharing graphics, text, and images. In a typical scenario, the student dials
the TeleEducation modem bridge (a 486/66 PC with multiple modems attached) and
begins communicating in peer-to-peer fashion with teachers and fellow students
over the province-wide digital fiber-optic network. This communication
consists of receiving, annotating, and sending electronic worksheets, or even
launching applications. In the near future, the project will move to TCP/IP
and communicate over the Internet, bringing every school in the province
online. Eventually, the network will provide one-stop shopping for government
and private-sector services--driver's licenses, health-care information, water
and electric bills, and the like--via kiosks or home PCs.
But projects like TeleEducation need more than hardware and software to work.
They require vision, leadership, clearly stated goals, and purpose--all
present in the New Brunswick project.
The fundamental principles that define TeleEducation are that it be open,
available, and affordable to all citizens. As stated in its strategic plan,
"the network forms part of the province's strategic agenda for supporting
local entrepreneurship in the knowledge industries and achieving economic
independence by raising the general level of education of the province. In
this manner, the government hopes to promote the spirit of self-reliance and
entrepreneurship."
To turn this vision into reality, New Brunswick has created the Ministry for
the Electronic Information Highway. Reporting to the Minister of Economic
Development, this new department is charged with promoting the development of
the information highway by making government a model user, while at the same
time encouraging private-sector involvement.
Clearly, U.S. high-tech interests could learn from New Brunswick. Most of what
we've heard are vague generalities and promises; most of what we've seen are
demonstrations of what someday might be. From radio and TV commercials that
redefine the information highway to their particular advantage to Rush
Limbaugh labeling it as a liberal plot, everyone is talking about the
information highway, but no one is doing much about it--except for the likes
of New Brunswick. The technology is here now. What's missing is the vision,
leadership, and commitment to use that technology to create a better society.
Until we find these, the information superhighway will likely remain nothing
more than a back alley.
Jonathan Erickson
editor-in-chief












































May, 1994
LETTERS


More on ASPI




Dear DDJ,


Brian Sawert's article "The Advanced SCSI Programming Interface" (DDJ, March
1994) was both readable and useful--especially the C examples for using ASPI.
I'd like to add some information about ASPI host managers and how they fit
into the scheme of things, at least under DOS.
Adaptec has an SDK manual with source-code diskette, available to companies or
individuals who have joined Adaptec's ACAP (Adaptec Compatibility Advantage
Program). Contact Kristi Rinehart at Adaptec (408-945-8600) for more details.
The materials include a MASM program showing how to use ASPI under DOS, a C
program and headers showing how to use ASPI under Windows through Adaptec's
WINASPI.DLL, and comparable information for using ASPI under OS/2 and Novell
NetWare.
As far as I know, only Adaptec has a Windows DLL, WINASPI.DLL, for using ASPI
with the ASPI host managers that are paired with the various Adaptec SCSI host
adapters. This approach avoids using DPMI and also depends on the availability
of an undocumented extension to ASPI, as described in released Adaptec
materials. Only WINASPI.DLL--not your Windows app--needs to use the
undocumented ASPI stuff. I'm pretty certain that WINASPI.DLL also requires
that the ASPI host manager be capable of executing either in real or protected
mode, which would make ASPI run a lot faster than via DPMI with its necessary
switching from protected to real mode and back again.
Although not necessarily a complete list, the following vendors do supply ASPI
host managers, a prerequisite for any ASPI programming, with their SCSI host
adapters: Adaptec, Advanced Integration Research (AIR), Alpha Research, Always
Technology, American Megatrends (AMI), BusLogic, Distributed Processing
Technology, DTC, Future Domain, LinkSys, Trantor (now a fully owned Adaptec
subsidiary), and UltraStor. In addition, Corel includes the IBMASPI.SYS ASPI
host manager for DOS in Version 2.0 of CorelSCSI. This device driver is for
the SCSI host adapters that are primarily in IBM's PS/2 Micro Channel
computers. In summary, ASPI has become fairly ubiquitous in the SCSI world,
since the above list probably covers more than 99 percent of all SCSI host
adapters in the personal computing universe.
The ASPI specification provides for true overlapping of SCSI I/O with
CPU-intensive processes, even under monotasking DOS, but requires a
bus-mastering SCSI host adapter to really do it. The documented ASPI mechanism
for doing so is called "posting." You put a nonzero segment:offset of a
callback address in the SRB, and your program gets control there when the I/O
is completed or posted. (A zero callback address causes the ASPI host manager
to complete the SCSI I/O before returning to your program.) A word of caution:
An ASPI manager for a non-bus-mastering device (such as parallel-port SCSI or
the PIO Adaptec 1522) executes the SRB to completion, then gives your program
control at the callback address. In other words, the sequencing between
callback and return to your program may vary among ASPI host managers.
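The posting flow Ben describes can be sketched in C. Everything in this sketch (the AspiSrb struct, its field names, and fake_aspi_exec) is illustrative stand-in code, not Adaptec's actual ASPI declarations; it models only the control flow: a nonzero callback address gets control at completion, while a zero callback means the host manager completes the I/O before returning.

```c
#include <stddef.h>

/* Illustrative stand-in code, NOT Adaptec's real ASPI declarations:
   it models only the posting mechanism described above. */

struct AspiSrb;                                  /* SCSI Request Block */
typedef void (*AspiPostProc)(struct AspiSrb *);  /* callback type */

struct AspiSrb {
    int          status;       /* 0 = pending, 1 = complete */
    AspiPostProc post_proc;    /* NULL (zero address) = no posting */
};

/* Stand-in for the host manager's "execute SCSI command" entry. */
void fake_aspi_exec(struct AspiSrb *srb)
{
    /* ...the actual SCSI I/O would happen here... */
    srb->status = 1;            /* I/O complete */
    if (srb->post_proc)         /* posting: give the caller control */
        srb->post_proc(srb);
    /* with a zero callback, the manager completes the I/O before
       returning, exactly as the letter cautions */
}

static int posted;
static void on_complete(struct AspiSrb *srb) { (void)srb; posted = 1; }

int run_posted_demo(void)       /* nonzero callback in the SRB */
{
    struct AspiSrb srb = { 0, on_complete };
    posted = 0;
    fake_aspi_exec(&srb);
    return posted && srb.status;
}

int run_polled_demo(void)       /* zero callback in the SRB */
{
    struct AspiSrb srb = { 0, NULL };
    fake_aspi_exec(&srb);
    return srb.status;
}
```

Note how, for a non-bus-mastering manager, both paths end up sequential anyway, which is why portable code shouldn't assume the callback runs concurrently with the caller.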
Ben Myers
Spirit of Performance
Harvard, Massachusetts


Post-Compile Optimization




Dear DDJ,


In his article, "Examining OPTLINK for Windows" (DDJ, November 1993), Matt
Pietrek did an excellent job of comparing the optimization capabilities of
SLR's OPTLINK with Microsoft's LINK and Borland's TLINK.
It should be noted that any code optimizations beyond those listed in the
article would require the complete disassembly, adjustment, and reassembly of
the input machine code and symbol tables. PC_Opt, a post-compile optimization
tool I've developed, performs such additional optimizations.
In its present form, PC_Opt performs far-to-near procedure conversion,
register parameter-passing conversion, stack-clearing conversion, and
unaccessed-code elimination on MS-DOS OMF-compatible object modules and
libraries. The resultant size and speed improvements vary, depending on the
application being optimized.
Although PC_Opt (which sells for $15.00 plus $5.00 shipping and handling)
lacks the features and flexibility of the tools Matt discussed, it operates
with a level of reliability and ease of use sufficient to demonstrate that a
complete, post-compile object-code optimization step should be considered an
integral part of any worthy software-development environment.
Jim Taylor
Optimite Systems
Dallas, Texas


Palindromes




Dear DDJ,


I enjoyed Tom Swan's "Algorithm Alley" column on palindromic encryption (DDJ,
November 1993) and the subsequent letters from readers contributing their own
palindromes. Some years ago, I worked with a fellow who had previously been
employed by Mostek (remember them?) and hated the company with all his heart
and soul. For his edification, I composed this palindrome: Market some Mostek
RAM. Thanks for an interesting article.
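For readers who want to verify candidate palindromes mechanically, here is a small checker of my own (not from the letter) that ignores case and anything that isn't a letter:

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if s reads the same forward and backward, ignoring
   case and non-letter characters (spaces, punctuation). */
int is_palindrome(const char *s)
{
    size_t i = 0, j = strlen(s);
    while (i < j) {
        while (i < j && !isalpha((unsigned char)s[i])) i++;
        while (i < j && !isalpha((unsigned char)s[j - 1])) j--;
        if (i < j) {
            if (tolower((unsigned char)s[i]) !=
                tolower((unsigned char)s[j - 1]))
                return 0;
            i++;
            j--;
        }
    }
    return 1;
}
```

It confirms that "Market some Mostek RAM" does indeed check out.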
Harold M. Martin
Bellaire, Texas


That Looks Familiar




Dear DDJ,



I noticed something very familiar on your January 1994 cover: one of my
company's products. The red-striped mainframe is an Amdahl 5995-a1400mp. Like
many DDJ readers, I have an eye for detail. When looking for the note as to
where this cover was taken (Apple Computer's data center), I noticed no
mention of the type or manufacturer of the mainframe.
Of course, being an Amdahl employee, I'm always looking for defense of our
mainframe-vendor honor. It's amusing to me that in the midst of all this PC
and workstation hype, Apple has a big Amdahl mainframe in their data center.
Yes, I'm nitpicking, but to hear the PC trade journals tell it, the mainframe is
dead and buried. We, as a company, are and will be getting smaller, as many
businesses are today. But mainframes still carry the major work loads for
business and industry. We are still selling and improving them. As speeds
improve in CMOS technology to what mainframe ECL logic is capable of, all
computers will get smaller. In the same light, as PC and workstation platforms
learn to use "RAS" (reliability, availability, serviceability) and
multiprocessing as mainframes do, the great fissure between the two camps will
look like a drainage ditch for the runoff of the companies that can't keep up
with change in user direction.
Michael R. Bonuchi
Chicago, Illinois


Forth Corrections




Dear DDJ,


Thanks for publishing my letter in the March 1994 issue of Dr. Dobb's Journal.
Unfortunately, there were a couple of errors in Example 2 on page 12. Example
1 here is a revised, corrected version.
William E. Drissel
Grand Prairie, Texas


Prior Patents




Dear DDJ,


In his March 1994 "C Programming" column, Al Stevens discussed a patent
(#4,540,292) for an electronic calendar display that's owned by Psytronics of
Princeton, NJ.
I wrote a program in 1977 to "electronically" display and print calendars in
conventional paper form. I later mailed it to D.E. Cortesi, Dr. Dobb's
resident intern. He published it in the July 1981 "Dr. Dobb's Clinic" column
on page 42. Because the typesetter left off the "<" and ">" symbols, the
program was published again several months later.
I have no idea when Psytronics applied for their patent, but if it was after
the July 1981 issue, couldn't the published article be used to establish prior
art?
In any event, Al was correct. Granting the patent was absurd.
L. Barker
Chicago, Illinois


Galileo! Newton! Where Are You When We Need You?




Dear DDJ,


This is regarding Peter Varhol's review of Bart Kosko's book Fuzzy Thinking
(DDJ, November 1993).
All this talk about multistate logic puts me in mind of a fad prominent with
science-fiction writers circa 1940: They called it "non-Aristotelian logic"
and abbreviated it null-A. There was a whole line of "Null-A" stories by
Heinlein, Asimov, and the like.
Or does "fuzzy" mean nonlinear? Or probabilistic? Or analog? Or precisely
what? I must say, it's aptly named--it's all so fuzzy. Is the wheel being
reinvented? In light of all this fuzzy thinking, Galileo's observation that
"the universe is written in the language of mathematics" bears repeating.
Galileo, where are you when we need you?
One can empathize with the author's disillusionment with "the complexities of
nonlinear mathematics," but there exists a little-known mathematical
discipline called "relaxation analysis" that enables relaxation oscillations
and other broken and finitely discontinuous functions to be handled directly
in the time domain with ease and aplomb--allowing them to be readily
differentiated and integrated, among other things. (See my book, Waveforms: A
Modern Guide to Nonsinusoidal Waves and Nonlinear Processes, Prentice-Hall,
1986.) Thus, some seemingly intractable problems of writing analytic
expressions for the nonsinusoidal functions often encountered in electronics
and other walks of life are easily handled. For example, Figure 1(a) shows the
integral of the AC component of a full-wave rectified sinusoid; in this case,
a cosine wave. Try graphing it using Basic or whatever, and you'll see the
skewed sinuous wave typical of the incompletely filtered output of a full-wave
rectified sinusoid. The curve is not sinusoidal, but you have to look closely
to see that! If you have a CGA (or better) card and a pipeable Basic on path
(such as GWBASIC 3.22), you can graph it using the command in Figure 1(b) at
the DOS prompt.
Still, whatever gets you thinking is cool. Long live fuzzy!
As for Michael Swaine's "Swaine's Flames" in the same issue, "the model is the
thing" is a case in point of "familiarity breeds contempt." In this case,
familiarity with the model breeds contempt for the real thing, and the model
then becomes the real thing in one's thinking. I thought Sir Isaac Newton got
mankind out of that destructive loop long ago.
Before Newton's extended use of the scientific method of experimentation, many
wise men were convinced they could uncover all the secrets of the universe by
simply reasoning about them. Boy, did that get us off track!
No, the model is not the real thing, any more than The Nightmare Before
Christmas is the real world. And thinking it is represents a reversion to
earlier times in which superstition ruled supreme. Lord deliver us from a
revisitation of those times.
Sir Isaac, where are you when we need you?
Homer Tilton
Tucson, Arizona

Example 1: A Forth text interpreter, compiler, and debugger in pseudocode.
forever {
    get the next word (delimited by whitespace)
    look it up in the dictionary
    if found
        if we are compiling
            if the word is immediate
                execute the word
            else
                "compile" the word into the dictionary
        else
            execute the word
    else  // (not found, must be a number or undefined)
        if it's a number
            if we're compiling
                "compile" a literal into the dictionary
            else
                push the number onto the stack
        else
            send word followed by "?" to screen
            stop compiling
            flush interpreter input buffer
            accept future input from keyboard
} // end of forever

Figure 1: (a) A cosine wave; (b) Basic command graphing a full-wave rectified
sinusoid.
(a) INTEGRAL OF {|COS(X)| - 2/PI} = SIN(ATN(TAN(X))) - (2/PI)ATN(TAN(X)) + constant

(b) echo h=6.28:v=1:screen 2:for j=0 to 600:x=h*(j/300-1):Q=atn(tan(x))
:y=sin(Q)-.64*Q:pset(j,100-95*y/v):next:wait 96,2 | GWBASIC






























May, 1994
Trends in Operating System Design


Will we gain portability at the expense of performance?




Peter D. Varhol


Peter is chair of the graduate computer science department at Rivier College
in New Hampshire. He can be contacted at varholp@alpha.acast.nova.edu.


Over the past several years, we've witnessed a number of trends affecting
operating-system design, foremost among them a move to modularity. Operating
systems such as Microsoft's NT, IBM's OS/2, and others are splintered into
discrete components, each having a small, well-defined interface, and each
communicating with others via intertask message passing. The lowest level is
the microkernel, which provides only essential OS services, such as context
switching. Windows NT, for example, also includes a hardware-abstraction layer
beneath its microkernel, which enables the rest of the OS to run
irrespective of the processor underneath. This high level of OS portability is
a primary driving force behind the modular, microkernel-based push.
For an example of a modular, operating-system architecture, there's no better
place to look than QNX Software's QNX operating system. QNX is a real-time OS
with a UNIX-like command language. QNX consists of a tiny (around 8-Kbyte)
microkernel that only handles process scheduling and dispatch, interprocess
communication, interrupt handling, and low-level network services, all of
which are accessible through 14 kernel calls. The size and simplicity of the
kernel allows it to fit entirely in the internal cache of processors such as
the 80486.
A minimal QNX system can be built by adding a process-manager module, which
creates and manages processes and process memory. To use a QNX system outside
an embedded or diskless system, a file system and device manager can be added.
These managers run outside kernel space, so the kernel remains small. For more
details, see the accompanying text box entitled, "QNX: A Scalable,
Microkernel-Based Operating System" and "A Message-Passing Operating System,"
by Dan Hildebrand (DDJ, September 1988).
Likewise, IBM's Workplace operating system (see Figure 1) is based on the Mach
3.0 microkernel, although IBM-specific extensions (developed with the OSF
Research Institute) support parallel processors and real-time operations. This
implementation counts five sets of features in its core design: interprocess
communication (IPC), virtual-memory support, processes and threads, host and
processor sets, and I/O and interrupt support.
Process dispatch is in the microkernel, but process scheduling is not. The
design goal behind this distinction is to separate policy from mechanism. In
this case, dispatch is a core mechanism that need never change, but scheduling
is a policy that might. This lets you swap the default scheduler for one that
provides stronger support for real time, for example, or for a specialized
scheduling policy for nonstandard uses.
Above the microkernel, IBM implements personality-neutral services (PNSs) that
implement a policy rather than a mechanism, and run outside kernel space.
Memory management, for instance, is divided between the microkernel and a PNS.
The kernel itself operates the paging functions of the CPU. The pager,
operating outside the kernel, determines the page-replacement strategy--that
is, which pages will be removed from memory to accommodate a page brought in
as a result of a page fault. The pager implements a policy, and the policy can
be changed through the use of an alternative pager. IBM is providing a default
pager to boot Workplace OS, but the primary paging mechanism is actually the
file system, which provides memory-mapped file I/O, caching, and
virtual-memory policies, combined.
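The policy/mechanism split can be illustrated with a function pointer. This sketch is mine, not IBM's actual interfaces: the fault-handling mechanism below is fixed, while the victim-selection policy is a parameter that can be swapped out (here, a simple FIFO policy), mirroring the microkernel/PNS division of labor.

```c
#define NFRAMES 3
#define EMPTY   (-1)

/* Illustrative sketch, not IBM's interfaces: the fault-handling
   *mechanism* is fixed; the page-replacement *policy* is pluggable. */
typedef int (*VictimPolicy)(const int loaded_at[], int nframes);

static int frames[NFRAMES] = { EMPTY, EMPTY, EMPTY };
static int loaded_at[NFRAMES];
static int tick;

/* One policy among many: evict the longest-resident frame (FIFO). */
int fifo_policy(const int when[], int n)
{
    int v = 0, i;
    for (i = 1; i < n; i++)
        if (when[i] < when[v])
            v = i;
    return v;
}

/* Mechanism: returns 1 if touching `page` caused a fault, 0 on a hit.
   Which page gets evicted is entirely the policy's decision. */
int touch_page(int page, VictimPolicy policy)
{
    int i, victim;
    tick++;
    for (i = 0; i < NFRAMES; i++)
        if (frames[i] == page)
            return 0;                               /* hit */
    for (i = 0; i < NFRAMES; i++)
        if (frames[i] == EMPTY) {                   /* free frame */
            frames[i] = page;
            loaded_at[i] = tick;
            return 1;
        }
    victim = policy(loaded_at, NFRAMES);            /* ask the policy */
    frames[victim] = page;
    loaded_at[victim] = tick;
    return 1;
}
```

Swapping in an LRU or real-time policy means writing a new VictimPolicy function; touch_page, the mechanism, never changes.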
PNSs include not only traditional OS services (such as the file system and
device drivers), but also networking and even database engines. Behind this
strategy is IBM's belief that placing application-oriented services such as
these close to the microkernel can improve the efficiency of data transfers
and queries. Third-party database vendors such as Oracle can then embed
database engines as PNSs to improve performance or make more-direct use of
kernel services.
The third layer of modules, closest to the user, is composed of individual
personalities. A "personality" is the appearance and behavior of an operating
system from the standpoint of the end user. OS/2 can be one personality,
Windows another, UNIX a third. The personality looks like the operating
system, and system services behave in the expected manner, but many of the
services are actually implemented at the PNS level, differently than in the
original OS. IBM has demonstrated a UNIX personality, which was simply the
entire OSF/1 image running on top of Mach.


Objects and Distributed Computing


Another major trend is objects finding their way into operating systems. The
primary characteristic of objects that makes them worth using in an operating
system is encapsulation. This makes possible, for example, object-embedding
technologies such as Microsoft's object linking and embedding (OLE) that would
have been difficult (if not impossible) using a file-based data paradigm.
Objects and message passing go hand in hand. In a classic object-oriented
system, messages carry data objects along with instructions on what to do with
that data. In an OS, message passing helps modularize the operating-system
architecture, since the transfer of data is not dependent upon having a
function to call.
Operating systems such as QNX and Windows NT already use message passing, at
least to some extent. Message passing in NT supports networking as well as
security. For example, the security gateways check every system message to
ensure that the user has the privileges to send that message. Consequently,
data and instructions are under better control than in a traditional OS.
Among the emerging object technologies are IBM's System Object Model (SOM) and
the Distributed System Object Model (DSOM), Microsoft's Component Object Model
(COM), the Object Management Group's Common Object Request Broker Architecture
(CORBA), NeXT's Portable Distributed Objects (PDO), and Taligent's Taligent
Operating Environment (TOE).


Performance is an Issue


One question that's hounded message-based operating systems from the start is
performance. Does communicating with different components through message
passing--as opposed to straight function calls--hurt performance? It clearly
can (although QNX claims that its message-passing architecture offers
performance comparable to that of traditional architectures). In
object-oriented languages such as Smalltalk, vendors claim decent but hardly
stellar performance for message passing. Whether the OS queues messages, or
whether a message blocks until the recipient executes a receive (as in QNX),
it is easy to see that this mechanism can be slower than a function call.
These new operating systems are using a variety of techniques to improve
message-passing performance. One common approach, used by IBM, involves the
use of shared memory space so that data doesn't have to be copied from one
memory address to another. However, this still requires that two processes
establish a connection before the shared-memory approach can work. This is
still a two-step process (connect, then exchange), so it is still more time
consuming than a straight procedure call.
Windows NT takes this one step further, with a special implementation of the
local procedure call for Win32 applications called the "quick LPC." This
technique opens one port to establish the connection between processes, then
passes multiple messages through a shared memory space without the need to
send additional messages through the port. However, there is a trade-off: NT
assigns a thread to every instance of the quick LPC, and this uses up system
resources.
Another performance issue revolves around memory utilization. If higher-level
OS services run in user space, as they do with QNX and Workplace OS, there's a
trade-off between efficiency and performance. Kernel processes cannot be
swapped out to disk, while user processes can. This means that an OS that
relies on user processes may run in less memory, at the expense of speed. One
solution to this is a configurable kernel. The next release of OSF/1 will let
system administrators determine whether to run large parts of the OS in kernel
or user space. Thus, you'll be able to tune the OS for specific needs.
Overall, reduced performance may be a consequence of the direction operating
systems are taking, as has been true over the last few years with windowed
systems. One alternative is to give priority to a particular operation at the
expense of others, as Windows NT does with I/O and context switching.


Conclusion


The modularity of emerging operating systems will not be very noticeable to
application programmers in the near future. Most of us will be programming on
commercial versions with most of the major building blocks built in. There
will probably be about the same number of APIs, although it may be important
to know which module a particular API applies to.
The benefits will be primarily indirect. The unified approaches to OSs, for
example, mean that porting applications will be easier. Particularly with
IBM's multiple personalities, the OS issue may not even matter, as long as the
CPU is the same.
The big change will come with objects. Both OLE and Apple's OpenDoc (a
compound-document architecture designed for sharing text, graphics, and video
objects across operating systems) will require that developers understand and
adhere to the underlying object model so that they can take advantage of the
ability to establish hot links between data objects into compound documents.
Applications will have access to OS services that will fundamentally change
how we view data.
The bad news is that there are competing object models. OpenDoc includes
support for Microsoft's OLE 2.0 spec so that an OLE application should work
with an OpenDoc operating system, but not vice versa. Other object models will
have their own ways of doing things. Multiple personalities, such as those of
the Workplace OS, will relieve some of the learning curve, but versatile
programmers will have to know not only C++ and objects, but how multiple
operating systems use them.
 Figure 1: IBM's Workplace operating system is based on the Mach 3.0
microkernel architecture.


QNX: A Scalable, Microkernel-Based Operating System


The operating system of the future may best be modeled by QNX Software's QNX,
a 32-bit multitasking OS that utilizes a tiny microkernel. QNX takes a modular
approach to services that lets you choose only those services necessary for a
particular use. QNX is not an implementation of UNIX, despite its UNIX-like
command language and POSIX compliance. It is a separate and distinct operating
system from the ground up, and it uses technologies just now starting to come
into the mainstream.

The heart of QNX is its microkernel, which implements interprocess
communication, low-level network services, process scheduling, and interrupt
dispatching; see Figure 2. Process scheduling is real time with preemption,
and scheduling is prioritized with round-robin, FIFO, and adaptive-scheduling
disciplines. All kernel services are available through 14 APIs, so the ways to
access the kernel services are limited.
QNX is a message-passing operating system that utilizes blocking versions of
Send, Receive, and Reply function calls. Messages don't queue--the message
facility is a process-to-process copy, which QNX claims provides performance
comparable to function calls. You can construct your own message queues using
built-in messaging primitives.
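The blocking rendezvous can be simulated with pthreads. This is a model of the semantics only: msg_send and server are my names, not the real QNX Send()/Receive()/Reply() kernel calls. The sender deposits a message and stays reply-blocked until the server replies; the server stays receive-blocked until a message arrives.

```c
#include <pthread.h>

/* pthreads simulation of QNX-style blocking rendezvous; not the real
   QNX Send()/Receive()/Reply() kernel calls. */
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static int msg, reply, has_msg, has_reply;

int msg_send(int value)                 /* like Send(): blocks for reply */
{
    int r;
    pthread_mutex_lock(&mu);
    msg = value;
    has_msg = 1;
    pthread_cond_broadcast(&cv);        /* wake the receiver */
    while (!has_reply)                  /* REPLY-blocked */
        pthread_cond_wait(&cv, &mu);
    r = reply;
    has_reply = 0;
    pthread_mutex_unlock(&mu);
    return r;
}

void *server(void *arg)                 /* like Receive() then Reply() */
{
    pthread_mutex_lock(&mu);
    while (!has_msg)                    /* RECEIVE-blocked */
        pthread_cond_wait(&cv, &mu);
    reply = msg * 2;                    /* "process" the request */
    has_msg = 0;
    has_reply = 1;
    pthread_cond_broadcast(&cv);        /* unblock the sender */
    pthread_mutex_unlock(&mu);
    return arg;
}

int send_demo(void)
{
    pthread_t t;
    int r;
    pthread_create(&t, NULL, server, NULL);
    r = msg_send(21);                   /* returns only after the reply */
    pthread_join(t, NULL);
    return r;
}
```

Because the message never queues, the exchange is a direct process-to-process copy, which is the basis for QNX's performance claim.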
However, the microkernel does not include process managers, device managers,
or a file system. The process manager, Proc, provides services such as process
creation and accounting, memory management, inheritance, and pathname-space
management. Together, the kernel and Proc provide the features necessary to
implement a bare-bones operating system. Fsys (the file-system manager) and
Dev (the device manager) can be added for more robustness. Like other QNX
processes, device drivers run in user space, but use a specific API to enable
them to access a kernel-interrupt vector.
The networking manager is an optional component, tied directly into the
microkernel. There is a private interface between the kernel and the network
manager, so that any messages passed from a local to a remote process are
queued to the network manager. Net manages the sending and receiving of
messages, essentially merging microkernels on different nodes into a single,
virtual microkernel.
The message-passing architecture, combined with networking services, produces
a seamless, distributed system. From the standpoint of user processes, there
is no difference between a local call and a call across the network. Likewise,
all services above the microkernel are transparently accessible to all
processes, whether or not they are local. For data acquisition, QNX can use a
private connection between microkernels on a network. This lets you mirror a
data-acquisition process without generating traffic on a network being used
for other activities.
QNX can be extended. New modules can be developed in user space and debugged
at the source level while still providing services normally associated with
the kernel. QNX claims that customized OS services can be easily developed by
application programmers. Because of the small number of APIs in the kernel and
the limited number of APIs in the other QNX-provided components, the QNX
learning curve isn't as difficult as with UNIX.
The QNX microkernel consists of 605 lines of source code. A complete
implementation of all of the services necessary to implement process
management, device management, the file system, and networking is under
16,000 lines. QNX also conforms to POSIX 1003.1, 1003.2 (shell and utilities),
and 1003.4 (real time). With POSIX compliance and a similar command-line
interface, is it possible to use QNX in place of UNIX? From my own
experiments, the answer appears to be yes, at least in some circumstances. QNX
Software is not positioning QNX as a general-purpose operating system, but
there's no reason why it can't be used for almost any purpose.
--P.D.V.


An Interview with Linus Torvalds, Creator of Linux




Sing Li




Sing, a products architect with microWonders in Toronto, specializes in
embedded-systems development, GUI portability, UNIX system programming, and
device drivers. You can contact him on CompuServe at 70214,3466.


Linus Torvalds is a student at the University of Helsinki (Finland) working
towards a masters degree in computer science. In 1990, he took an
operating-systems course on UNIX and C and became hooked on OS design. Linus
wanted to make his 386 PC function like the Sun workstations at the
university. What started out as a protected-mode utility posted on the
Internet, eventually resulted in Linux, a widely popular 32-bit,
protected-mode, preemptive multitasking operating system that runs on 386 PCs.
The Linux project now involves hundreds of programmers worldwide. It is
available at ftp sites around the world, the most popular distributions being
the MCC (Manchester Computer Center) in England and SLS (Softlanding Linux System)
in Canada. The full distribution consists of kernel sources, C, C++, man
pages, basic utilities, networking support, X Windows, XView/OpenLook, DOS
emulators, and much more. A comprehensive list of Linux distribution sites for
downloading, as well as related information, is available electronically (see
"Availability," page 3).
Linux supports an unlimited number of concurrent users. Each application runs
in its protected address space, greatly reducing the chance of system crashes
brought on by ill-behaved applications. Applications on Linux can make use of
either static or dynamically linked libraries.
Virtual memory is supported through demand paging, and up to a total of 256
Mbytes of usable swap space can be configured. Executables are demand loaded,
which ensures efficient memory usage as well as better system performance.
The memory manager supports shared executable pages with copy-on-write. There
is a common memory-cache pool for both system and application use, which
ensures that memory is best utilized wherever it is needed.
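Copy-on-write after fork() can be demonstrated in a few lines of POSIX C (a generic sketch of my own, not Linux kernel code): the child's write forces a private copy of the page, so the parent's data is unchanged.

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Generic POSIX sketch of copy-on-write: after fork(), parent and
   child share pages until one of them writes, so the child's store
   below must not be visible in the parent. */
int cow_demo(void)
{
    int *x = malloc(sizeof *x);
    pid_t pid;
    int status;

    *x = 7;
    pid = fork();
    if (pid == 0) {          /* child: the write forces a private copy */
        *x = 99;
        _exit(*x == 99 ? 0 : 1);
    }
    waitpid(pid, &status, 0);
    /* parent still sees its own, untouched page */
    return (*x == 7) && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

This is what makes fork() cheap in practice: nothing is physically duplicated until one side actually writes.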
Kernel support of networking is included for TCP/IP, both over standard
Ethernet hardware and over asynchronous lines (via SLIP, serial-line Internet
protocol). The operating system supports various national or customized
keyboards. The PC console can act as multiple virtual terminals under Linux,
using hot-key switching. Each virtual terminal acts independently and can be
in either graphic or character mode.
I recently linked up with Linus over the Internet and asked him about the
history (and future) of Linux.
SL: What was your motivation behind building Linux?
LT: I bought my first PC clone in early '91, and while I didn't want to run
MS-DOS on it, I couldn't afford a real OS for it either. I ended up buying
Minix, which I knew of from an OS course, and while it wasn't really what I
hoped for, I still had a reasonable UNIX clone on my desk.
Linux didn't start out to be an operating system: I just played around with
the hardware to learn about the new machine, and found the memory management
and process switching of the 386 especially interesting. After tinkering a few
months, my small project eventually became something that looked more and more
like an OS. So I decided I wanted to create something that I could use instead
of [running] Minix on my machine.
When I decided to create my own OS, compatibility became a major factor. I
wanted to write just the OS: I didn't want to rewrite every program under the
sun. That is still very much true, and Linux seems to be one of the easier
UNIXs to port things to--it's a good mix of POSIX/SysV/BSD/SunOS4. My search
for the POSIX documentation also got me in touch with arl, who was later to
create the Linux directory on nic.funet.fi, the site where I still release my
kernels.
SL: With commercial flavors of UNIX, standards are a hotly debated topic.
What's your viewpoint on standards compliance?
LT: Simple adherence to standards isn't the Linux way. I (and others) have
tried to make the system as usable as possible, and added some features just
because they were interesting. I've strived for a simple and clean design
within those constraints--at least as long as it's efficient. (I hate
inefficient code and still fall back to checking the compiler output every now
and then.)
SL: Since Linux seems to run almost everything that plain-vanilla UNIX will,
exactly how different is the internal architecture between commercial UNIX and
Linux?
LT: Well, the basic design has similarities: The kernel is monolithic, and
processes aren't forcibly preempted while in kernel mode. So the architecture
per se doesn't necessarily differ too wildly, but the actual code is likely to
be rather different.
SL: Tell us about your programming style when dealing with developing a
multitasking OS which runs a wide variety of software on a variety of hardware
configurations.
LT: I try to avoid subtle code: If it isn't obvious what a routine does, it's
likely to be buggy (or become so after a few changes). The way the scheduling
works is rather hard to follow at times, and some of the file-system checks
can seem incomprehensible unless you know what is happening. (I dislike
locking, so the file-system code has to be very careful in order to avoid race
conditions.) One of my personal favorites may be the select() code, which is
definitely not obvious, but avoids races in interesting ways.
One of the most challenging aspects has been the wide variety of PC hardware:
Drivers which work on most machines can fail subtly on others. Linux has good
support for different kinds of hardware, but it has in some cases been a real
trial to get it all to work, and there are still occasionally reports of
machines that simply don't work correctly with Linux. It can be rather
frustrating at times.
SL: What's in the future for Linux?
LT: I expect to continue working on it the same way I have so far: no real
long-term planning, only a general idea about what I want to have. I,
personally, have been handling only the actual kernel for a long time now, and
I expect to continue with that: I hope others will find interesting projects
in Linux (both in the kernel and in user space), as they have so far. I hope
the Windows-emulation project will work out, along with the iBCS2 ("real i386
unix" binary compatibility) project: Those will open up new user areas when
they arrive.
Figure 1: The QNX microkernel.


A Conversation with E. Douglas Jensen




Michael Floyd


Doug Jensen, technical director for real-time computer systems at Digital
Equipment Corporation (DEC), has had a long career developing real-time
systems. While an associate professor at Carnegie-Mellon University (CMU),
Jensen developed the notion of a decentralized OS and created the Alpha OS
kernel. Jensen's technology is now incorporated in DEC's Libra OS kernel. I
recently spoke with Jensen about the use of microkernel technology in
real-time operating environments, its benefits, and its future.
DDJ: How much of your work on the Alpha OS kernel is embodied in the Libra OS
architecture?
EDJ: The Libra OS architecture embodies my understanding and experience from
over 27 years of research and advanced-technology development in real-time
computers and operating systems. My Alpha OS kernel at CMU is one of the
primary intellectual progenitors of the Libra OS. Another is the Mach 3
kernel, which forms the commercial and standards context for the Alpha and
other new real-time OS technologies in Libra. The concepts of distributed
threads, time-value functions, and best-effort scheduling are based directly
on extensions of Alpha kernel functionality.
DDJ: You say you've created a new paradigm for resource management in
real-time systems. Describe this paradigm and tell us its relevance to other
microkernels.
EDJ: The Libra OS architecture reflects the important expansion of real-time
computing from its roots in small scale, centralized, low-level, sampled-data
subsystems. Many real-time computing systems are becoming more complex and
decentralized as they move up in the application-control hierarchy. But most
traditional real-time concepts and techniques don't scale up. These
small-scale ideas include hard deadlines as the only kind of
computation-completion timeliness constraint, the requirement for application
programmers to somehow map computation-completion time constraints onto fixed
priorities, and the limitation of the real-time OS's responsibility for
computational timeliness to starting the highest-priority computation as
quickly as possible. These notions all are based on the pretense that a system
can be deterministic, which is an oversimplification that usually works
adequately in small scale but not in the large--much as Newton's "law" of
gravity was revealed by relativistic physics to be a small-scale
simplification of space-time curvature.
Libra's real-time paradigm is a generalization of the traditional concepts and
techniques which allows the domain of real-time computing to encompass
larger-scale, more-dynamic, more-decentralized applications. For example, time
constraints can be expressed in terms of the benefit a computation provides,
as a function of the time that computation completes execution. Libra OSs
accept responsibility for adaptively managing resources according to those
time constraints, to attain the best system timeliness possible under the
the current conditions. And Libra does this on an end-to-end basis across
physically dispersed computing nodes. In contrast, commercial real-time OS and
executive products are centralized--"distributed real-time" systems are
actually non-real-time networks of centralized real-time nodes, without
OS-enforced end-to-end timeliness.
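The time-value idea Jensen describes can be sketched in a self-contained form: each task carries a function mapping its completion time to the benefit it yields, and a best-effort scheduler runs whichever pending task maximizes that value. All names and numbers here are illustrative, not Libra's actual interfaces.

```cpp
#include <cassert>
#include <vector>

struct Task {
    int id;
    double exec_time;                      // estimated run time
    double (*value)(double completion_t);  // time-value function
};

// A hard deadline at t = 10 is just a step-shaped value function...
double hard_deadline(double t) { return t <= 10.0 ? 100.0 : 0.0; }

// ...while a soft constraint can decay gradually after t = 5.
double soft_decay(double t) {
    if (t <= 5.0)  return 50.0;
    if (t >= 15.0) return 0.0;
    return 50.0 * (15.0 - t) / 10.0;
}

// Best-effort choice at time `now`: run the task whose estimated
// completion time yields the highest value.
int pick_next(const std::vector<Task>& tasks, double now) {
    int best = -1;
    double best_value = -1.0;
    for (const Task& t : tasks) {
        double v = t.value(now + t.exec_time);
        if (v > best_value) { best_value = v; best = t.id; }
    }
    return best;
}
```

Note how the hard deadline falls out as a special case: early in the run the deadline-bound task wins, but once its deadline can no longer be met, the scheduler adapts and picks the task that still yields some benefit.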

DDJ: What benefits do microkernels currently offer, and how will they evolve
over, say, the next six years?
EDJ: The real-time application domain implies that it is no longer possible
for one or two kinds of real-time OSs--a small real-time executive and a
full-function real-time UNIX, for instance--to meet user needs. It even
appears that a general-purpose, real-time distributed OS may be theoretically
impossible. The only feasible solution may be for real-time OSs to be
configured from modular components to match each application's needs;
microkernels will facilitate this structure. The classical layered
organization of OSs and
system and application software will relax to more of a "depends on" hierarchy
of distributed objects. A modular OS is more than an unconstrained collection
of building blocks--for manageability, it requires an OS architecture
specification which all these different configurations comply with.
First-generation microkernels exist today, but this kind of modular
OS--real-time or not--is still in the research stage.




























































May, 1994
A C++ Multitasking Class Library


Preemptive multitasking under DOS




Ken Gibson


Ken has been designing real-time embedded software for the last several years
and is currently a software engineer with Intel. He can be reached at
kenneth_gibson@ccm.jf.intel.com.


Multithreaded applications, programs capable of executing more than one
section of code concurrently, can solve a number of programming problems,
including those found in simulation and real-time device control. Languages
like Ada address this by providing support for concurrent processing in their
language definition. However, most languages (including C++) do not provide
built-in support for the execution of multiple threads.
This article presents a class library that lets you implement a program as a
set of concurrent threads. The multitasking class library is written to run
under DOS and is built with Microsoft C++ 7.0. In the library, I define a
Thread class that can be allocated for each thread of execution. A Scheduler
object is defined to schedule the processor for thread execution. In addition,
I provide a semaphore class for thread synchronization and a queue class which
can be used for interthread communications. To illustrate how you use the
class library, I'm providing a sample program electronically (see
"Availability," page 3).
Although designed for DOS, the multitasking class library is not difficult to
port to other systems. By adding the proper processor initialization and using
a locator program, a ROMable image can be created for use on embedded
processors in place of a real-time executive.


Design Goals


I wanted the library to be easy to use, flexible, and portable. As such,
concurrent processing is achieved by allocating instances of the thread class
and specifying a main() function in the constructor for each one. Flexibility
is enhanced through a priority-based, preemptive scheduler.
I chose counted semaphores for thread synchronization because they are
flexible and can be used to implement higher-level abstractions, such as
monitors and pipes. Furthermore, semaphores are designed so that they can be
signaled from interrupt service routines. Thus, high-priority threads can
preempt lower-priority ones for fast response to external events.
Portability is more difficult because of the processor- and compiler-specific
requirements of swapping processor context between threads. However, I've
isolated nonportable code to a few assembly functions in one assembly module
(Listing Three, page 98), with processor-specific definitions contained in one
header file (Listing Four, page 98).


Class Definitions


The first class in the library is a doubly linked queue. Queues are integral
to the operation of multitasking classes and are used by all of the other
classes. Many existing queue classes are available, but I've included my own
so that my class library can be used by itself.
The Dlque class is defined in Listing One, page 96. A Dlque object contains a
private forward and backward link to other Dlque objects, and public member
functions for standard operations to add items, remove items, and check for an
empty queue. Additional member functions include Delink(), which removes an
item from the middle of a Dlque, and Peak() which looks at the item on the
head of a Dlque without removing it.
The doubly linked queue class is defined so that both the queue head and the
items placed on the queue are Dlque objects: The constructor for a Dlque
object simply sets both forward and backward pointers to point to the object
itself.
Figure 1 shows an empty Dlque head. After items have been added to the queue,
the flink fields each point to the next item on the queue, and the blinks
point to the previous item. The exceptions are the blink on the head, which
points to the last item on the queue, and the flink on the last item, which
points back to the head. Figure 2 shows a queue with two items.
The Add() member function appends a new Dlque object to the end of a queue
(see Listing Two, page 96). Since the Dlque object to be added is not on a
queue, the constructor has initialized both flink and blink to point to the
new item. Add() first sets the blink of the new item to the blink of the head.
If the queue is empty, the blink in the head points to itself. Otherwise, it
points to the current end of the queue. Add() sets the flink of the new item
to point back to the queue head, because this item is added at the end of the
queue. Add() then sets the flink of the Dlque object pointed to by the blink
of the head to point to the new item being inserted. This could be the flink
of either the head or the previous last item on the queue. Finally, Add() sets
the blink of the queue head to point to the new item.
Remove() removes the first item on a queue. It first checks for an empty queue
and returns NULL if it finds one. Otherwise, it gets the pointer to the first
item from the flink in the head. Next, it updates the flink of the head to
point to the item that was pointed to by the flink of the item being removed.
This is either the next item on the queue, or a pointer back to the queue head
(if only one item was on the queue). Remove() then sets the blink of the next
item in the queue to the blink of the item being removed. If the queue had
only one item on it, the head is left with its blink and flink pointing to the
head itself. Finally, Remove() updates the flink and blink of the removed item
to point to itself.
The Delink() member function is used to remove a Dlque object from the middle
of a queue. It sets the flink of the previous item on the queue (referenced by
the blink) to the flink of the item being removed, and the blink of the next
item on the queue to the blink of the item being removed.
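The pointer manipulations described above can be condensed into a self-contained sketch that mirrors the Dlque semantics (sentinel head, items are nodes themselves). This is an illustrative reimplementation for experimentation, not the article's listing.

```cpp
#include <cassert>

struct Node {
    Node *flink, *blink;
    Node() { flink = blink = this; }   // unlinked node points to itself
    bool Empty() const { return flink == this; }
    void Add(Node *q) {                 // append q at the tail
        q->blink = blink;               // previous last item (or head)
        q->flink = this;                // q is now last: wrap to head
        blink->flink = q;
        blink = q;
    }
    Node *Remove() {                    // pop from the front
        if (Empty()) return nullptr;
        Node *item = flink;
        flink = item->flink;
        item->flink->blink = this;      // next item's blink back to head
        item->flink = item->blink = item;
        return item;
    }
    Node *Peak() { return Empty() ? nullptr : flink; }  // article's spelling
};
```

Because the head and the items share one node type, no allocation happens inside the queue operations themselves, which is exactly what makes this structure safe to use from the scheduler's innermost paths.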


The Scheduler Class


The Scheduler object can be viewed as the kernel of a multitasking application
that uses the class library presented with this article. Although users do not
use the Scheduler class directly, an instance of a Scheduler object is created
by the class library for use by the thread and semaphore objects in an
application. As Listing One shows, the scheduler contains a table of all
threads in the application; the prioritized ready list; and member functions
to set up threads at initialization time, select which threads to execute, and
swap processor context between threads. The scheduler does this through a
combination of portable C++ member functions and calls to processor-specific
assembly functions.
When the scheduler is created, its constructor initializes its thread table
(which contains all threads in the application) to be initially empty. It
initializes its pointer to the free stack space available for allocation to
individual threads. Each thread must have its own region of free stack, which
the scheduler must allocate from the application's stack space. The
CurStackBase member variable is initialized using the processor-specific
InitStackBase() in its constructor. This function simply returns the current
value of the stack pointer. As a thread is created, its constructor calls the
scheduler's GetStackSpace() member function to reserve stack space and obtain
an initial stack pointer. Assuming the stack grows toward low memory,
GetStackSpace() subtracts the thread's requested stack space from CurStackBase
to allocate a region of free stack for this new thread and leaves CurStackBase
pointing to the new beginning of free application stack. GetStackSpace() then
returns the previous value of CurStackBase as the base of the new thread's
stack. Figure 3 shows the stack configuration for an application that has
created two threads.
GetStackSpace() also enforces a minimum stack size for each thread, because
although a thread may not allocate any stack variables, on many processors
interrupts push data onto the application's stack using the stack pointer at
the time of the interrupt. If there's not enough free stack space to
accommodate this data, the stack pointer will cross into the adjacent thread's
stack and corruptits data.
Finally, the constructor creates the NULL thread, which guarantees that when
the scheduler reschedules the threads on the processor, the NULL thread will
be ready to run. This can happen when all of the application threads are
blocked, waiting on some event. The NULL thread executes an idle loop and runs
at a priority level below all application threads, so any thread that becomes
ready can preempt it.
The AddThread() member function adds new threads into the scheduler. It first
searches the thread table for an empty entry and, if one is found, enters a
pointer to the new thread. It then calls the new thread's MakeReady() member
function to set the thread's state to READY and enters the new thread on the
ready list at the specified priority level. The ready list is implemented as
an array of Dlque objects. Each Dlque holds the ready threads for one priority
level; AddReady() just adds the thread to the appropriate Dlque.
The Resched() member function selects threads in the application for
execution. It searches the array of ready queues in priority order for a
nonempty queue. The NULL thread ensures that at least one ready thread can be
scheduled. After selecting a thread, Resched() checks whether this new thread
is the same as the last current thread. If so, the processor is already
running in the correct thread's context, and Resched() simply returns. If not,
Resched() calls the new thread's MakeCurrent() member function, which marks
the new thread's state as CURRENT, points the CurrentThread pointer to the new
thread, and calls the ContextSwitch() member function. ContextSwitch() makes
an inline call to the assembly-language AsmContextSwitch(), which does the
processor-specific work of swapping context between the old and new threads.
AsmContextSwitch() takes as parameters pointers to the old and new thread's
pregs structures, processor-specific structures containing the registers that
must be part of a thread's saved context; see Listing Four. AsmContextSwitch()
gets the pointer to the old thread's saved-register area into an internal
register after saving the value of that register on the stack. Since this
function is always entered through a function call, registers not conserved
across function calls need not be saved. AsmContextSwitch() saves those that
must be conserved into the saved-register area. The current values of the
stack and instruction pointers are not saved, however. Instead, the return
address is taken off the stack and saved as the instruction pointer, and the
stack pointer is incremented so that when this thread is rescheduled, the
context will be restored as if it had just returned to Resched().
Next, AsmContextSwitch() gets the pointer to the new thread's saved registers and
restores all of the registers except the instruction pointer and the register
pointing to the saved-register area. The final steps are to push the saved
instruction pointer onto the stack, restore the register currently pointing to
the saved-register structure, and execute a return instruction that pops the
saved instruction pointer off the stack and begins executing in the new
thread. Listing Three provides an example of AsmContextSwitch() for the Intel
architecture.
Pause(), the next member function, allows a thread to voluntarily relinquish
control of the processor to other threads of the same priority. Pause() puts
the calling thread back on the end of the ready list and calls ReSched().
The last function in the scheduler is StartMultiTasking(), which is called at
initialization and transforms a single-threaded application to a set of
threads, each running in its own context. StartMultiTasking() first sets up
the scheduler such that the NULL thread appears to be the current thread by
removing it from the ready list, setting its state to CURRENT, and pointing
the scheduler's CurrentThread member variable to point to the NULL thread.
StartMultiTasking() calls Pause(), which calls Resched() to select the
highest-priority thread for execution and call ContextSwitch(). This saves the
current context as the NULL thread's context and begins executing the selected
thread. If the NULL thread is later rescheduled, it returns from the call to
Resched() in StartMultiTasking() and executes the next statement in this
function, which is an infinite loop.
The Scheduler class is not instantiated by users of the class library.
Instead, the library creates a Scheduler object during initialization. The
library must guarantee that only one instance of the scheduler is created and
that its constructor is executed before users can create any thread objects.
Allocating one instance of the scheduler in a .cpp module and referencing it
as an extern from a header file won't work because C++ does not guarantee the
execution order of constructors for objects statically allocated in different
modules. In that case, a class-library user could statically allocate thread
objects, and their constructors could be executed before the constructor for
the scheduler.
This is addressed by the SchedulerInit helper class in Listing One. The header
file that defines the multitasking classes allocates a static instance of
SchedulerInit in each module that includes it. SchedulerInit contains a static
count variable that is initialized to 0 at compile time. The constructor for
SchedulerInit increments this count each time it is called and only creates an
instance of the scheduler when the count is 0. It also initializes the static
pointer Scheduler::InstancePtr to point to this single instance of the
scheduler so that other classes can reference it.


The Thread Class



The Thread class in Listing One contains the context of a thread of execution
within the application. Private member variables include the Dlque object that
places the thread on the ready list or on semaphores, and the
processor-specific pregs structure for saving the thread's processor state
when the thread must block. A thread also stores its current state in its
private area. Figure 4 shows the allowable states and state transitions for a
thread. A thread can either be CURRENT, BLOCKED, or READY. A READY thread
waits on the ready list and will transition to the CURRENT state when it
becomes the highest-priority thread to run. A currently running thread can
transition to the BLOCKED state by waiting on a semaphore or back to READY if
preempted by a higher-priority thread, or if it signals a semaphore with a
waiting thread of equal or higher priority. A BLOCKED thread waiting on a
semaphore will return to the READY state when the semaphore is signaled and
the thread is at the head of the waiter's list.
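The transitions of Figure 4 can be captured in a small, illustrative check; the enumerators mirror the library's THREAD_* state values, but this helper is not part of the article's listings.

```cpp
#include <cassert>

enum ThreadState { READY, CURRENT, BLOCKED };

// Encodes the allowed state transitions described above.
bool CanTransition(ThreadState from, ThreadState to) {
    switch (from) {
    case READY:   return to == CURRENT;                 // dispatched by the scheduler
    case CURRENT: return to == READY || to == BLOCKED;  // preempted, or waits on a semaphore
    case BLOCKED: return to == READY;                   // its semaphore is signaled
    }
    return false;
}
```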
Public member functions on threads include functions to set the current state
of the thread as well as the constructor. The thread constructor takes a
pointer to a main() function as a parameter; optional parameters may be
provided to specify the amount of stack space and the priority. The
constructor allocates a region of free stack space from the scheduler, gets
its initial stack pointer, and places a pointer to the static ThreadRet() on
the thread's stack so that if the thread ever returns from its main()
function, it will return to ThreadRet().
The thread constructor also calls the assembly-language InitPregs() to get its
initial saved-processor registers. These are initialized so that the first
time the thread is scheduled, it begins executing at its main() function. An
InitPregs() for the Intel architecture is in Listing Three. Finally, the
constructor sets the thread state to READY and enters the thread into the
scheduler's ready list to await execution.


The Semaphore Class


A semaphore is defined as a subclass of a Dlque since one of its primary
functions is to queue waiting threads. Otherwise, it is a straightforward
implementation of a counted semaphore. The constructor for a semaphore allows
the option to specify a nonzero initial count: If none is provided, it
defaults to 0. Wait() first checks for a nonzero count. If nonzero, the count
is decremented and Wait() returns to the caller. If 0, then the calling thread
must be blocked. It changes the calling thread to the blocked state, queues it
on the list of waiting threads, and calls the scheduler's Resched() to switch
to another thread.
The semaphore's Signal() first checks for any waiting threads. If none are
present, it increments the count and returns to the caller. If threads are
waiting, Signal() removes the next waiting thread from the list, readies it to
run, and returns it to the ready list. Signal() then checks to see if this is
now the highest-priority thread and if so, returns the calling thread to the
ready list and tells the scheduler to perform a context switch.
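The Wait()/Signal() counting logic just described can be modeled in a self-contained, single-threaded form, where "blocking" is recorded rather than performed by a context switch, and a FIFO wakeup stands in for the library's priority handling. Names here are illustrative, not the article's code.

```cpp
#include <cassert>
#include <deque>

class CountedSem {
    int count;
    std::deque<int> waiters;   // IDs of "threads" that would block
public:
    explicit CountedSem(int init = 0) : count(init) {}

    // Wait: consume a count if one is banked, else block the caller.
    // Returns true if the caller may proceed immediately.
    bool Wait(int id) {
        if (count > 0) { --count; return true; }
        waiters.push_back(id);             // real code would reschedule here
        return false;
    }

    // Signal: wake the oldest waiter, or bank the count.
    // Returns the woken thread's ID, or -1 if none was waiting.
    int Signal() {
        if (waiters.empty()) { ++count; return -1; }
        int id = waiters.front();          // thread made READY again
        waiters.pop_front();
        return id;
    }
};
```

The key invariant, visible even in this toy form, is that the count and the waiter list are never both nonzero: a signal either satisfies a waiter or increments the count, never both.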


The Main() Function


When using the class library, application programmers provide the main-level
functions for the threads that they create. The class library provides the
main() for the application where it initiates concurrent processing among the
threads. When main() is executed, the scheduler's constructor will have
already executed. The scheduler requires that at least one thread be
statically allocated so that it will be initialized and entered into the ready
list when main() is called. As shown in Listing Two, main() calls
Scheduler::StartMultiTasking() to begin executing in a thread context.


Conclusion


The multitasking class library presented here allows C++ programmers to write
programs as a set of concurrent threads. It does so using thread and semaphore
classes and a scheduler object. While this is particularly relevant for
real-time system designers, it can also be a valuable addition to any C++
programmer's toolbox.
Figure 1: Empty doubly linked queue head.
Figure 2: Doubly linked queue with two items.
Figure 3: Stack use with two threads.
Figure 4: Thread states.
[LISTING ONE] (Text begins on page 28.)

// threads.h -- Multitasking class definitions
#ifndef THREADS_H
#define THREADS_H
#include "specific.h"
#define TRUE 1
#define FALSE 0

typedef void (*vfptr)();

class Thread;
class Semaphore;

// Stack size values
#define MIN_STACK 0x400 // Minimum stack size per thread
#define NULL_STACK 0x400 // Space for NULL thread
#define INIT_STACK 0x080 // Space for scheduler initialization
#define DEFAULT_STACK 0x400 // Default size

// Values for thread states
#define THREAD_UNUSED 0 // Thread Table entry unused
#define THREAD_READY 1 // Thread is ready to run
#define THREAD_CURRENT 2 // Thread is currently running
#define THREAD_BLOCKED 3 // Blocked on a sem or timer

// Thread priorities
#define LOWEST_PRIORITY 4
#define HIGHEST_PRIORITY 0
#define NULL_PRIORITY (LOWEST_PRIORITY+1)

// Max number of threads to allow

#define MAX_THREADS 8
#define MAX_THREADID (MAX_THREADS-1)

// Doubly Linked Queue class
class Dlque
{
private:
 Dlque *flink; // Forward Link
 Dlque *blink; // Backward Link
public:
 int Empty(); // Check for empty queue
 void Add( Dlque *Queue ); // Add to the back
 Dlque *Remove(); // Remove from the front
 void Delink(); // Remove from the middle
 Dlque *Peak(); // Look at front without removing.

 Dlque() { flink = blink = this; }
 ~Dlque() {}
};
// The SCHEDULER class
class Scheduler
{
private:
 Thread *ThreadTab[MAX_THREADS];
 Dlque ReadyList[NULL_PRIORITY+1];
 char *CurStackBase;
 Thread *NullThread;
 void ContextSwitch( pregs *OldRegs, pregs *NewRegs )
 { asmContextSwitch( OldRegs, NewRegs ); }
public:
 static Scheduler *InstancePtr; // Ptr to one and only instance
 Thread *CurrentThread; // Current thread
 char *GetStackSpace( unsigned Size );
 void ReSched(); // Reschedule threads
 void AddReady( Thread *pThread );
 void RemoveReady( Thread *pThread );
 char AddThread( Thread *pThread );
 void Pause();
 void StartMultiTasking();
 Scheduler();
 ~Scheduler() {}
};
// Scheduler initialization class. Insures that only one instance
// of the Scheduler is created no matter how many modules include
// threads.h. Also insures that it is created before any threads.
class SchedulerInit
{
private:
 static int count; // Compile time initialized to 0
public:
 SchedulerInit() { if( count++ == 0 )
 Scheduler::InstancePtr = new Scheduler; }
 ~SchedulerInit() { if( --count == 0 )
 delete Scheduler::InstancePtr; }
};
static SchedulerInit SchedInit;
// THREAD class
class Thread
{

private:
 friend class Scheduler;
 friend class Semaphore;
 Dlque Queue; // For putting threads on Queues
 pregs Regs; // Processor specific saved registers
 char State; // Current thread state
 static void ThreadRet(); // Called if a thread returns from main
public:
 char id; // Thread ID
 unsigned Priority;
 void MakeReady() { State = THREAD_READY; }
 void MakeCurrent() { State = THREAD_CURRENT; }
 void MakeBlocked() { State = THREAD_BLOCKED; }
 Thread( vfptr MainRtn,
 unsigned Priority=LOWEST_PRIORITY,
 unsigned StackSpace=DEFAULT_STACK );
 ~Thread() {}
};
// SEMAPHORE class
class Semaphore : Dlque
{
private:
 short count;
public:
 void Wait();
 void Signal();
 Semaphore( short InitCount=0 );
 ~Semaphore() {}
};
#endif // THREADS_H

[LISTING TWO]

// threads.cpp -- Implementation of Multitasking Classes
#include <stdio.h>
#include <stdlib.h>
#include "threads.h"

#define TRUE 1
#define FALSE 0

// Static count of SchedulerInit objects that have been created.
int SchedulerInit::count = 0;

// Pointer to the one instance of the scheduler.
Scheduler *Scheduler::InstancePtr;

// Dlque::Empty -- Returns TRUE if the Dlque is empty.
inline int Dlque::Empty()
{
 return( flink == this );
}
// Dlque::Add -- Adds an item to the end of a doubly linked queue.
void Dlque::Add( Dlque *Queue )
{
 Queue->blink = blink;
 Queue->flink = this;
 blink->flink = Queue;
 blink = Queue;

}
// Dlque::Remove -- Removes item at the head of the dlque. NULL if empty.
Dlque *Dlque::Remove()
{
 Dlque *Item;
 if( Empty() ) {
 return( NULL );
 }
 Item = flink;
 flink = Item->flink;
 Item->flink->blink = Item->blink;
 Item->flink = Item->blink = Item;
 return( Item );
}
// Dlque::Delink -- Delinks an item from the middle of a dlque.
void Dlque::Delink()
{
 blink->flink = flink;
 flink->blink = blink;
 flink = blink = this;
}
// Dlque::Peak -- Returns a pointer to the first item without removing it.
Dlque *Dlque::Peak()
{
 if( Empty() ) {
 return( NULL );
 }
 return( flink );
}
// Scheduler Constructor
Scheduler::Scheduler()
{
 short i;
 InstancePtr = this;
 // Initialize the Thread Table
 for( i=0; i<MAX_THREADS; ++i ) {
 ThreadTab[i] = NULL;
 }
 // Initialize System Stack Base to the current stack pointer
 CurStackBase = InitStackBase();
 // Allocate space for scheduler initialization
 CurStackBase -= INIT_STACK;
 // Create the NULL Thread.
 NullThread = new Thread( NULL, NULL_PRIORITY, NULL_STACK );
}
// GetStackSpace -- Used by new threads to get their initial SP
char *Scheduler::GetStackSpace( unsigned Size )
{
 char *Base;
 if ( Size < MIN_STACK ) {
 Size = MIN_STACK;
 }
 Base = CurStackBase;
 CurStackBase -= Size; // Assume stack grows toward low mem.
 return Base;
}
// Scheduler::AddThread -- Add a new thread into the Scheduler
char Scheduler::AddThread( Thread *pThread )
{

 register char id;
 for( id=0; id<MAX_THREADS; ++id ) {
 if( ThreadTab[id] == NULL ) {
 break;
 }
 }
 if( id == MAX_THREADS ) {
 return( FALSE );
 }
 ThreadTab[id] = pThread;
 pThread->MakeReady(); // Tell new thread to make itself READY
 AddReady( pThread ); // Add to ready list in the scheduler
 return( TRUE );
}
// AddReady -- Add the given thread to the ReadyList
inline void Scheduler::AddReady( Thread *pThread )
{
 ReadyList[pThread->Priority].Add( &pThread->Queue );
}
// RemoveReady -- Remove the specified thread from the ready list.
inline void Scheduler::RemoveReady( Thread *pThread )
{
 pThread->Queue.Delink();
}
// Scheduler::ReSched -- Picks next ready thread and calls ContextSwitch to
// perform the context switch to the new thread.
void Scheduler::ReSched()
{
    Thread *OldThread;
    Thread *NewThread;
    unsigned Priority;
    for( Priority=0; Priority<=NULL_PRIORITY; ++Priority ) {
        if( !ReadyList[Priority].Empty() ) {
            NewThread = (Thread *)ReadyList[Priority].Remove();
            break;
        }
    }
    // If calling thread is still ready and is the highest
    // priority ready thread, just return
    if( NewThread == CurrentThread ) {
        CurrentThread->MakeCurrent();
        return;
    }
    OldThread = CurrentThread;
    CurrentThread = NewThread;
    CurrentThread->MakeCurrent();
    ContextSwitch( &OldThread->Regs, &CurrentThread->Regs );
}
// Scheduler::Pause -- Checks for any ready threads that are equal or higher
// priority than the calling thread. If so, reschedules.
void Scheduler::Pause()
{
    short SavedPS;
    SavedPS = DisableInt();
    CurrentThread->MakeReady(); // Switch from Current to Ready
    AddReady( CurrentThread );  // Caller back on end of ReadyList
    ReSched();                  // Run new highest priority thread
    EnableInt( SavedPS );
}

// StartMultiTasking -- Perform transformation from a single threaded
// application to a set of threads running in individual contexts. This
// is done by first setting up the system variables to look like the Null
// thread is the current thread. Then, call Pause() which will cause the
// context of this routine to be saved as the Null thread's context. When
// the Null thread is rescheduled, the CPU will return to this routine.
// Rest of this routine then becomes the loop that runs in the Null thread.
void Scheduler::StartMultiTasking()
{
    RemoveReady( NullThread );
    CurrentThread = NullThread;
    NullThread->MakeCurrent();
    Pause();
    while( TRUE );  // Loop in the NULL thread
}
// Thread::Thread -- Creates a new thread based on the specified params.
Thread::Thread( vfptr MainRtn,
                unsigned TaskPriority, unsigned StackSpace )
    :Queue()
{
    short *StackPtr;
    // Set up the initial stack so that if the main routine for this
    // thread returns for some reason, it returns to ThreadRet
    StackPtr
        = (short*)Scheduler::InstancePtr->GetStackSpace(StackSpace);
    *StackPtr = (short)Thread::ThreadRet;
    // Call processor/compiler specific routine to initialize
    // the saved processor registers.
    InitPregs( &this->Regs, (short)StackPtr, (short)MainRtn );
    Priority = TaskPriority;
    Scheduler::InstancePtr->AddThread( this );
    MakeReady();    // Set our state to READY
}
// ThreadRet -- Routine that is placed on each thread's stack as the return
// address in case the thread's main routine ever returns.
void Thread::ThreadRet()
{
#ifdef _DEBUG
    printf( "A Thread returned from main()\n" );
#endif
    exit( 1 );
}
// Semaphore::Semaphore -- Constructor for objects of the class Semaphore.
Semaphore::Semaphore( short InitCount )
{
    count = InitCount;
}

// Semaphore::Wait -- Queue a thread as a waiter on a semaphore
void Semaphore::Wait()
{
    short SavedPS;
    SavedPS = DisableInt();
    if( count )     // No need to block waiter
    {
        --count;
    }
    else            // Waiter must block
    {
        Scheduler::InstancePtr->CurrentThread->MakeBlocked();
        Add( &Scheduler::InstancePtr->CurrentThread->Queue );
        Scheduler::InstancePtr->ReSched();
    }
    EnableInt( SavedPS );
}
// Semaphore::Signal -- Signal a semaphore
void Semaphore::Signal()
{
    short SavedPS;
    Thread *Waiter;
    SavedPS = DisableInt();
    if( Empty() )   // No waiters to reschedule
    {
        ++count;
    }
    else            // There are blocked waiters
    {
        Waiter = (Thread*)Remove();     // Get next waiter
        Waiter->MakeReady();            // Make it ready
        Scheduler::InstancePtr->AddReady( Waiter );
        if( Waiter->Priority <
                Scheduler::InstancePtr->CurrentThread->Priority ) {
            Scheduler::InstancePtr->CurrentThread->MakeReady();
            Scheduler::InstancePtr->AddReady(
                Scheduler::InstancePtr->CurrentThread );
            Scheduler::InstancePtr->ReSched();
        }
    }
    EnableInt( SavedPS );
}
// main()
void main()
{
    Scheduler::InstancePtr->StartMultiTasking();
}

[LISTING THREE]

; Intel Architecture specific routines.
 .MODEL small
 .CODE ; Create C compatible CS
; Offsets into the saved register area for each register
AX_OFST = 0
BX_OFST = 2
CX_OFST = 4
DX_OFST = 6
BP_OFST = 8
SI_OFST = 10
DI_OFST = 12
DS_OFST = 14
SS_OFST = 16
ES_OFST = 18
PSW_OFST= 20
PC_OFST = 22
SP_OFST = 24
INIT_PSW = 0200h ;Thread's initial Processor Status Word
; Return the current stack pointer. This will be used as a reference for
; assigning the stack base for each thread.
; C Prototype: char *InitStackBase( void );
 PUBLIC _InitStackBase
_InitStackBase PROC
 mov ax, sp
 sub ax, 2 ;Where it will be after return
 ret
_InitStackBase ENDP
; asmContextSwitch - Switches processor context between two threads
; C Prototype: void asmContextSwitch( pregs *OldRegs, pregs *NewRegs );
; 1. Assume SMALL or COMPACT memory model. Don't save and restore CODE,
; STACK, or DATA SEGMENTS. These always stay the same.
; 2. Assume Microsoft and Borland C calling conventions. This routine will
; always be called near, and the registers AX, BX, CX, DX do not need to be
; preserved across procedure calls and are not saved and restored here.
 PUBLIC _asmContextSwitch
_asmContextSwitch PROC
; Currently have: SP -> Return Address
; SP+2 -> Old reg save area pointer
; SP+4 -> New reg save area pointer
 push si ;Save old task's SI
 push bp ;And BP
 mov bp, sp ;Get back to the base of the stack frame
 add bp, 4
 mov si, [bp+2] ;Get pointer to old register save area
 pop [si+BP_OFST] ;Save old process's BP in save area
 pop [si+SI_OFST] ;and SI
 mov [si+DI_OFST], di ;and rest of the regs that must be saved
 mov [si+ES_OFST], es
 pushf ;Push PSW onto the stack
 pop [si+PSW_OFST] ;then pop into save area
; Save the return address as the saved PC and increment the SP before
; saving so context will be restored as if just returned to ReSched
 mov bx, [bp] ;Get return address off the stack
 mov [si+PC_OFST], bx
 mov bx, sp ;Increment SP
 add bx, 2
 mov [si+SP_OFST], bx ;and save
 mov si, [bp+4] ;Get new process's saved regs
 mov bp, [si+BP_OFST] ;and restore registers
 mov di, [si+DI_OFST]
 mov es, [si+ES_OFST]
 push [si+PSW_OFST] ;Push new PSW onto the stack
 popf ;then pop into PSW
 mov sp, [si+SP_OFST] ;Switch to new stack
; Push the saved PC on the stack to be restored when RET is executed
 push [si+PC_OFST]
 mov si, [si+SI_OFST] ;Finally, restore SI
 ret
_asmContextSwitch ENDP
; InitPregs -- Sets the initial saved processor register for a new thread.
; C Prototype: void InitPregs( pregs *pRegs, short InitStack, short MainRoutine );
 PUBLIC _InitPregs
_InitPregs PROC
 push si
 push bp
 mov bp, sp
 add bp, 4
 mov si, [bp+2] ;Get pointer to pregs

 ; Assume SMALL or COMPACT memory model and set the
 ; initial segments the same as the current ones
 mov [si+SS_OFST], ss
 mov [si+DS_OFST], ds
 mov [si+ES_OFST], es
 mov word ptr [si+PSW_OFST], INIT_PSW
 mov ax, [bp+4]
 mov [si+SP_OFST], ax ;Stackbase
 mov ax, [bp+6]
 mov [si+PC_OFST], ax ;Main Routine
 pop bp
 pop si
 ret
_InitPregs ENDP
; DisableInt - Disables Interrupts and returns current Processor Status Word
; C Prototype: short DisableInt( void );
 PUBLIC _DisableInt
_DisableInt PROC
 pushf
 pop ax
 cli
 ret
_DisableInt ENDP
; EnableInt - Enables interrupts IF enabled in saved Processor Status Word
; C Prototype: void EnableInt( short );
 PUBLIC _EnableInt
_EnableInt PROC
 push bp
 mov bp, sp
 mov ax, [bp+4] ;Get saved Processor Status Word
 and ax, 0200h ;If Interrupts were enabled
 jz NoEnable
 sti ;then re-enable them
NoEnable:
 pop bp
 ret
_EnableInt ENDP

 END

[LISTING FOUR]

// specific.h -- Processor and compiler specific definitions
#ifndef SPECIFIC_H
#define SPECIFIC_H
// Intel processor saved register area.
struct pregs
{
    short ax;   // Offset 0
    short bx;   // 2
    short cx;   // 4
    short dx;   // 6
    short bp;   // 8
    short si;   // 10
    short di;   // 12
    short ds;   // 14
    short ss;   // 16
    short es;   // 18
    short psw;  // 20
    short pc;   // 22
    short sp;   // 24
};
// Processor specific routines in specific.asm
extern "C" void asmContextSwitch( pregs*, pregs* );
extern "C" void InitPregs( pregs*, short, short );
extern "C" char *InitStackBase( void );
extern "C" short DisableInt( void );
extern "C" void EnableInt( short );

#endif
End Listings



May, 1994
MMURTL: Your Own 32-Bit Operating System


A message-based, multitasking, real-time kernel




Richard Burgess


Rich spent 20 years in the U.S. Coast Guard, primarily in systems analysis and
design. He now heads The D Group, a Lorton, Virginia consulting firm. Rich can
be reached via e-mail at rburgess@aol.com.
MMURTL (pronounced "Myrtle") is an operating system designed to run on 386SX
or better Intel-based PCs. MMURTL, short for "message-based, multitasking,
real-time kernel," supports flat, 32-bit, virtual-paged memory space and all
32-bit instructions (including device drivers) without resorting to thunking
16-bit BIOS code. Still, MMURTL's file system is DOS FAT-compatible, so to use
it, all you have to do is run the loader from MS-DOS, then boot MMURTL. In a
general sense, MMURTL's messaging types have a client/server flavor. One of
MMURTL's most attractive features, however, is that it's small (at least for a
multitasking OS), running in only one Mbyte, with room to spare for an
application or two.
I initially used MASM 5.x to develop MMURTL. After running into some problems
with 32-bit instructions, however, I switched to Turbo Assembler 2.x, then
TASM 3.x. More recently, I've been using DASM, the assembler included with
MMURTL. Still, the code assembles with TASM or MASM.
In this article, I'll discuss MMURTL's paged-memory management, specifically,
the code contained in the file MEMCODE.INC. The current version of the
complete MMURTL operating system--source code, executables, device drivers, C
compiler, assembler, documentation, and other utilities--is available
electronically (see "Availability," page 3).


MMURTL Terms and Definitions


Before going further, I'll define a few unique, MMURTL-specific terms.
A MMURTL job is an application or system service. Each job has its own linear
memory space provided by the OS and paging hardware. A job has one or more
tasks (threads of execution) managed with 32-bit Intel task-state segments
(TSS). Jobs are kept track of in MMURTL with a structure called a "job control
block" (JCB).
Physical memory includes the memory chips and their addresses as accessed by
the hardware. If I put address 00001 on the address bus of the processor, I'm
addressing the second byte of physical memory. Linear memory, on the other
hand, is what applications use as they run. This memory is actually translated
by the paging hardware to physical addresses that MMURTL manages. Programs
running in MMURTL have no idea where they are physically running in the
machine's hardware address space, nor would they want to. These are "fake"
addresses, but very real to every job on the system.
Logical memory is the memory that programs deal with and is based around a
"selector." A protected-mode program's memory is always referenced to a
selector, mapped (in a table) to linear memory by the OS and translated by the
processor. Generally, selectors are managed in a local or global descriptor
table (LDT or GDT); MMURTL, however, doesn't use LDTs. Logical memory is read
by the processor, where an additional address translation takes place. The GDT
allows you to set up a zero-based address space that really isn't at linear
address 0.
If you are familiar with segmented programming, you know MS-DOS uses tiny,
small, medium, large, and huge memory models to accommodate the variety of
segmented programming needs. The only memory model in MMURTL is analogous to
the small memory model, in which you have two segments: one for code, the
other for data and stack. This sounds like a restriction until you consider
that a single segment can be as large as all physical memory (or larger, with
demand paging).
MMURTL doesn't provide memory management in the sense that compilers and
language systems provide a heap or an area managed and cleaned up for the
caller. MMURTL is a paged-memory system: It allocates pages of memory as they
are requested and returns them to the pool of free pages when they are
deallocated. MMURTL manages all the memory in the processor's address space as
pages. Because linear addresses are "fake" and the real memory (physical
pages) can be allocated in any order, address-space fragmentation is really
not a concern.
A page is four Kbytes of contiguous memory on a 4-Kbyte boundary of physical
and linear addressing.


Segmentation


MMURTL uses three defined segments in the GDT--the OS code segment (08h), the
application code segment (18h), and a data segment (10h). Selectors (or
"segment numbers") are fixed.
Using the same data selector for the OS and all programs lets you use 32-bit
near-data pointers exclusively, thereby greatly simplifying application
development. This technique also speeds up code by maintaining the same data
selectors throughout the program's entire execution. The only selector that
changes is the code selector, which goes through a call gate into the OS and
back again. This means the only 48-bit pointers you'll use in MMURTL are for
an OS call address (16-bit selector, 32-bit offset).


Paging, Page Tables, and Page Directories


Paging lets you manage physical- and linear-memory addresses with simple table
entries. These table entries are used by the paging hardware to translate (or
map) physical to linear memory. Linear memory is what applications see as
their own address space. For instance, you can take the very highest 4-Kbyte
page in physical memory and map it into the OS's linear space as the second
page of its memory. This 4-Kbyte page of memory becomes addresses 4096--8191,
even though it's really sitting up at a physical 16-Mbyte address.
The structures that hold these translations are called "page tables" (PTs),
and each entry in a PT is called a "page-table entry" (PTE). Every PT has 1024
4-byte PTEs, so a single 4-Kbyte PT can manage four Mbytes of linear/physical
memory. That's not too much overhead for what we get out of it.
The paging hardware finds the page tables using a page directory (PD). Every
MMURTL job gets a unique PD, and each entry in a PD is called a "page
directory entry" (PDE). Each PDE is four bytes long and holds the physical
address of a PT. This means you can have 1024 PDEs in the PD, each pointing to
a different PT, which can have 1024 entries, each representing four Kbytes of
physical memory. This allows you to map the entire 4-gigabyte linear address
space: 1024 * 1024 * 4096 bytes = 4,294,967,296 bytes (4 gigabytes).


The Memory Map


MMURTL's OS code and data are both mapped into the bottom of every job's
address space. A job's memory space actually begins at the 1-gigabyte
linear-memory mark. Starting this high gives the OS and each application one
gigabyte of linear memory space. Leaving the OS zero-based
also greatly simplifies memory initialization. Figure 1, the map of a single
job and the OS, is identical for every job and service installed.
The OS has to know where to find all the tables allocated for memory
management and how to get to them quickly. I could have built a separate table
and managed it, but this wasn't necessary, and I wanted to keep overhead down.
The processor translates these linear (fake) addresses into real (physical)
addresses. First, it finds the current PD by looking at the current task's
value in the control register CR3, the physical address of the current PD. The
processor uses the upper ten bits of the linear address it's translating as an
index into the PD. The entry it finds is the physical address of the PT. The
processor then uses the next ten bits as an index into the PT. Now it's got
the PTE, the physical address of the page it's after. Sounds like a lot of
work, but it's done with very little overhead.
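The ten/ten/twelve split of a linear address can be sketched in C as follows (a minimal illustration of the scheme the article describes, not MMURTL source; the function names are ours):

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit linear address into its translation pieces:
   bits 31-22 index the page directory, bits 21-12 index the
   page table, and bits 11-0 are the offset within the page. */
static uint32_t pd_index(uint32_t linear)  { return linear >> 22; }
static uint32_t pt_index(uint32_t linear)  { return (linear >> 12) & 0x3FF; }
static uint32_t pg_offset(uint32_t linear) { return linear & 0xFFF; }
```

For example, linear address 0x40001004 (just past the 1-gigabyte mark where a job's space begins) yields PD index 256 -- the job-PT slot shown in Table 1 -- PT index 1, and byte offset 4.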
The OS has no special privileges as far as addressing physical memory. MMURTL
uses linear addresses (fake ones), just like the applications, which is fine
until you have to update or change a PDE or PTE. At that point, you can't just
get the value out of CR3 and use it to find the PT because you'll crash.
Likewise, you can't take a physical address out of a PDE and find the PT it
points to.
Finding the PD for an application isn't a problem. When you start the
application, you build the PD and stick the physical address in the TSS field
for CR3, then put the linear address of the PD in the JCB. This is fine for
one address (the PD). However, it's another story when you're talking about
dozens or hundreds of linear addresses for all the PTs that need to be
managed.
MMURTL keeps the linear address of all PTs in the upper two Kbytes of the PD.
(Two Kbytes doesn't sound like much to save, but when 10, 20, or even 30 jobs
are running, it adds up.) The upper two Kbytes are a shadow of the lower two.
Each PDE has the physical address of a PT. MMURTL needs to know the physical
address of a PT, given its linear address for aliasing addresses between jobs
(and it needs it fast).
MMURTL's shadow entry with the linear address of the PT is exactly 2048 bytes
above each real entry in a PD; the shadow entries are marked "not used."
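Because the PD is one 4-Kbyte page of 1024 four-byte slots, locating a shadow entry is a fixed-displacement calculation. A hedged sketch (the names here are ours, not MMURTL's):

```c
#include <assert.h>
#include <stdint.h>

/* Slots 0-511 of the page directory hold physical PT addresses for
   the paging hardware; the matching shadow slot, 2048 bytes (512
   four-byte slots) higher, holds the OS's linear address for the
   same PT. */
enum { SHADOW_DISP = 512 };     /* 2048 bytes / 4 bytes per slot */

static uint32_t shadow_slot(uint32_t pde_index) {
    return pde_index + SHADOW_DISP;
}
```

This reproduces the layout in Table 1: the OS PT at entry 0 shadows to entry 512, and the job PT at entry 256 shadows to entry 768.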


Page-Directory Entries



Table 1 lists the sample entries that describe the page directory. This
example assumes one PDE for the OS and one for the application.
The structure PDR1 in the source code describes a PTE. Each of the physical
and linear addresses stored are only 20 bits because the last 12 bits of the
32-bit address are below the granularity of a page (4096 bytes). These lower
12 bits for a linear address are the same as the last 12 bits for a physical
address. All the shadow entries are marked "not present," as are all entries
with nothing in them, so they don't exist as far as the processor is
concerned. If you decide to move the shadow information into separate tables
and expand the OS to address and handle 4 gigabytes of memory, it will be
transparent to applications.
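The 20-bit address plus 12-bit flag packing is the standard 386 entry format, and can be sketched as (flag values follow the Intel 386 paging format; the helper names are ours):

```c
#include <assert.h>
#include <stdint.h>

/* A PDE or PTE packs a page-aligned address into its upper 20 bits;
   the low 12 bits hold the present/read-write/user attribute flags. */
#define PTE_PRESENT 0x001u
#define PTE_WRITE   0x002u
#define PTE_USER    0x004u

static uint32_t make_pte(uint32_t phys, uint32_t flags) {
    return (phys & 0xFFFFF000u) | (flags & 0xFFFu);
}
static uint32_t pte_addr(uint32_t pte) { return pte & 0xFFFFF000u; }
```

A shadow or empty entry simply leaves PTE_PRESENT clear, which is why the processor ignores it.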


Allocation of Linear Memory


AllocPage, AllocOSPage, and AllocDMAPage are the only calls to allocate memory
in MMURTL. AllocPage allocates contiguous linear pages in the job's address
range. This is 1--2 gigabytes. The pages are all initially marked read/write
with the user-protection level. AllocOSPage allocates contiguous linear pages
in the OS address range; this is 0--1 gigabyte. These pages are all initially
marked read/write with the system-protection level; entries automatically show
up in all jobs' memory space because all OS PTs are listed in every job's PD.
AllocDMAPage allocates contiguous linear pages in the OS address range, but it
ensures that these pages are below the 16-Mbyte physical-address boundary.
Most direct memory access (DMA) hardware on ISA machines can't access physical
memory above 16 Mbytes. AllocDMAPage also returns the physical address needed
by the DMA users.
All allocation routines first check nFreePages to see if there are enough
physical pages to satisfy the request. If so, they call FindRun to determine
if that number of pages exists as contiguous free entries in one of the PTs.
If not, MMURTL will create a new PT (see AddOSPT). They then call FindRun
again. This is strictly on a first-fit basis. Adding a 4-Kbyte PT for four
more megabytes of clean linear address space creates less overhead than using
cleanup code or linked lists to manage that space. When a large enough run is
found, the allocation routines call AddRun to get the linear address that they
return to the customer. AddRun does the actual allocation of physical memory
for each page. All AllocPage calls return either an address to contiguous
linear memory or an error if it's not available. With a 1-gigabyte address
space, it's unlikely you won't find a contiguous section of PTEs. It's more
likely you'll run out of physical memory.


Deallocation of Linear Memory


When pages are deallocated, the caller passes in a linear address (from a
previous AllocPage call) along with the number of pages to deallocate. The
caller must ensure that the number of pages in DeAllocPage does not exceed
what was allocated. If it does, the OS will attempt to deallocate as many
pages as requested. This may run into memory allocated in another request (but
only from that caller's memory space). There will be no error, but the memory
will not be available for later use. If fewer pages are passed in, only that
number will be deallocated. With the sole exception of DMA users (device
drivers), the caller will never know (nor should it try to find out) where the
physical memory is located.


Allocation of Physical Memory


By handling translation of linear to physical memory, the processor takes a
great deal of work away from the OS. It is not important if pages of memory in
a particular job are physically next to each other (with the exception of
DMA). The main goal of physical memory management is simply to keep track of
how much physical memory there is and whether or not it's currently in use.
Physical-memory allocation is tracked by pages with a single array, the
page-allocation map (PAM), which is similar to a bit-allocation map for a
disk. Each byte of the array represents eight 4-Kbyte pages (1 bit/page). This
means the PAM would be 512 bytes long for 16 Mbytes of physical memory. The
current version of MMURTL can handle 64 Mbytes of physical memory, making the
PAM 2048 bytes long. The PAM is an array of bytes from 0--2047, with the
least-significant bit of byte 0 representing the first physical 4-Kbyte page
in memory (physical addresses 0--4095).
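The bit arithmetic behind such a map is straightforward; a minimal sketch of the one-bit-per-page scheme (our own helper names, not MMURTL's internal routines):

```c
#include <assert.h>
#include <stdint.h>

/* One bit per 4-Kbyte page: byte page/8, bit page%8.
   2048 bytes cover 16,384 pages, or 64 Mbytes of physical memory. */
enum { PAM_BYTES = 2048 };
static uint8_t pam[PAM_BYTES];

static void mark_page(uint32_t page) { pam[page >> 3] |=  (uint8_t)(1u << (page & 7)); }
static void free_page(uint32_t page) { pam[page >> 3] &= (uint8_t)~(1u << (page & 7)); }
static int  page_used(uint32_t page) { return (pam[page >> 3] >> (page & 7)) & 1; }
```

Allocating from the top down (AllocPage) or bottom up (AllocDMAPage) is then just a matter of which end of this array the free-bit scan starts from.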
Physical memory for AllocPage and AllocOSPage is allocated from the top down.
For AllocDMAPage, I allocate physical memory from the bottom up. Thus, even if
you install a device driver that uses DMA after your applications are up and
running, physical memory below 16 Mbytes will be available (that is, if any is
left). The PAM shows which pages of memory are in use, not who they belong to.
To get this information, you must go to the PDs and PTs.
That's it for memory management--simple but effective. MMURTL's messaging is
really its most powerful, and probably its most interesting, feature, but
you'll have to read the architecture section of the documentation to find out.


Acknowledgments


Many people have helped MMURTL along the way. Reginald B. Carey of CSSi
(Columbia, MD) is the inspiration behind the MMURTL kernel primitives. Thanks
also to Tom Clark and Dan Haynes of the U.S. Coast Guard's Telecommunications
and Information Systems Command (Alexandria, VA), Dave McLarty of Convergent
Consultants (Atlanta, GA), and Scott Bair of the U.S. Coast Guard.
Figure 1 Job map.
Table 1: Entries in the page directory.
=============================================================================
 Entry # Description
=============================================================================
 0 Physical address of OS PT
 1--255 Empty PTEs
 256 Physical address of job PT
 257--511 Empty PTEs
 512 Linear address of OS PT (shadow)
 513--767 Empty shadow PTEs
 768 Linear address of job PT (shadow)
 769--1023 Empty shadow PTEs
=============================================================================




May, 1994
Inside Windows NT Services


Somewhere between daemons and TSRs




Marshall Brain


Marshall works for Interface Technologies (Wake Forest, NC), which does
software design, consulting, and programmer training in Windows NT, Motif,
C++, and object-oriented design. He is the lead author for Prentice Hall's
five-book series on Windows, which includes Win32 System Services: The Heart
of Windows NT. He can be reached at brain@iftech.com.


Every operating system needs a way to execute background tasks that run
continuously, regardless of who is using the machine. These background tasks
can perform various services important to the system or its users. For
example, a messaging system might monitor the network and display a dialog box
whenever it receives a message from another machine. An application that sends
and receives faxes needs to start up at boot time and then continuously
monitor the fax modem for fax machines dialing in. A home or office security
program, or code that controls a piece of test equipment, may need to poll
sensors periodically and respond to them when appropriate. All of these tasks
require CPU time to perform their jobs but should not affect a user working at
the keyboard because they require so little of the total CPU power available.
In MS-DOS, background tasks like these are handled by
terminate-and-stay-resident (TSR) programs. These programs are usually started
in the AUTOEXEC.BAT file. In UNIX, background tasks are handled by daemons.
Standard UNIX daemons such as cron or finger are usually started at the end of
the UNIX boot sequence, before the system lets the first user log in. In
Windows NT, background tasks are called "services" and can start automatically
when NT boots; they remain running in the background regardless of who is
logged in.
Windows NT services are implemented as otherwise-normal executables that
follow a specific protocol allowing them to interact properly with the service
control manager (SCM). In this article, I'll discuss how to create and install
simple Win32 services in Windows NT. Once you understand simple services, it
is easy to build your own, because all services, no matter how complicated,
must contain the same basic SCM interface code. Once the basic requirements of
the SCM protocol are met, there's no real difference between the executable
for a service and that of a regular program.
A good working knowledge of NT services is important to both programmers and
system administrators. Programmers obviously benefit because they can create
their own services. The benefit to administrators is more subtle but equally
important. Background tasks, in general, can be dangerous. Both MS-DOS and
Macintosh systems make good viral hosts because, through their lack of
security, they allow any person or program to create background tasks at any
time. Windows NT and UNIX systems are secure, so only a system administrator
can add background tasks to the system. However, if the administrator adds a
destructive background task, then it is free to do its damage. The
administrator who understands the mechanisms and privileges available to
Windows NT services can be more selective in installing potentially harmful
background tasks.


The Basics of NT Services


Services come in two different varieties. Driver services use device-driver
protocols to interface NT to specific pieces of hardware. Win32 services, on
the other hand, implement general background tasks using the normal Win32 API.
This article focuses on Win32 services because of their general utility and
ease of creation. Any NT programmer with the normal NT SDK (or Visual C++) and
administrative access to an NT machine can implement and install Win32
services. If you need to create a program that starts at boot time and runs
continuously as a background task in Windows NT, you will likely use the Win32
service protocol.
Services are accessible to user manipulation via the NT Control Panel, which
has a Services applet that displays a list of all available Win32 services.
This applet lets you start, stop, pause, and resume services. A second dialog,
accessed by pressing the Start button, lets you change the startup behavior as
well as the default account used by the service. A service can start
automatically at boot time, it can be totally disabled, or it can be set to
start manually. When starting a service manually, a user can supply startup
parameters. You need to be logged in as the administrator or a power user to
do anything with this applet.
Windows NT ships with a number of pre-installed services that handle such
things as network messaging, command scheduling with the "at" command, and
distributed RPC naming. When you create your own services, you must perform a
separate installation step to insert them into the list managed by the
services applet. The installation process adds information about a new
service--its name, the name of its executable, its startup type, and the
like--into the registry so that the SCM knows about the new service the next
time the machine boots.


Creating a New Service


A program that acts as a service is a normal EXE file, but it must meet
special requirements so that it interfaces properly with the SCM. The
designers of NT have carefully choreographed the flow of function calls, and
you must follow that plan closely or the service will not work. I'll summarize
the flow here.
The first step in the SCM protocol is for your service to call the
StartServiceCtrlDispatcher function. Your service should call this function
right at the beginning of your program's main (or WinMain) routine. This
function provides the SCM with the address of your ServiceMain function, which
the SCM will call when it starts the service. (For details on the functions
discussed here, including all function parameters, consult the Win32
programmer reference manuals or the api32wh.hlp help file. You can find
additional information on services in my book, Win32 System Services: The
Heart of Windows NT.)
The next step in the protocol is for the SCM to call your ServiceMain function
when it wants to start the service. This happens, for example, when the system
administrator presses the Start button in the Services applet; the SCM will
execute ServiceMain in a separate thread. Your ServiceMain should call
RegisterServiceCtrlHandler, which registers a Handler function with the SCM.
This Handler function is used by the SCM for control requests. You can name
the Handler function anything you like, but it is listed in the documentation
under Handler. The RegisterServiceCtrlHandler function returns a handle that
your service uses when sending status messages to the SCM.
Your ServiceMain function must also start the thread that does the actual work
of the service. ServiceMain should not return until it's time for the service
to stop. When it returns, the service has stopped.
The last major piece of the SCM protocol consists of your Handler function.
This function usually contains a switch statement that parses control requests
received from the SCM. By default, the SCM can send any of the requests in
Table 1, each of which is identified by a defined constant. You can also
specify custom constants (which have integer values ranging from 128 to 255)
and send them through the SCM to the service.
A complete NT service, therefore, consists of an EXE containing the main,
ServiceMain, and Handler functions, as well as a function that contains the
thread for the service itself. Figure 1 summarizes the interactions between
these different functions and the SCM.
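A minimal, self-contained sketch of such a Handler follows. The control-code values match those defined in winsvc.h, but the state enum and tracking variable are our own simplification; a real Handler would also set terminateEvent on a STOP request and report back via SetServiceStatus:

```c
#include <assert.h>

/* Control-request codes as defined in winsvc.h */
#define SERVICE_CONTROL_STOP        1
#define SERVICE_CONTROL_PAUSE       2
#define SERVICE_CONTROL_CONTINUE    3
#define SERVICE_CONTROL_INTERROGATE 4

enum svc_state { SVC_RUNNING, SVC_PAUSED, SVC_STOPPED };
static enum svc_state current = SVC_RUNNING;

/* Skeleton Handler: parse the SCM's request and update state. */
static void Handler(unsigned long control)
{
    switch (control) {
    case SERVICE_CONTROL_STOP:        current = SVC_STOPPED; break;
    case SERVICE_CONTROL_PAUSE:       current = SVC_PAUSED;  break;
    case SERVICE_CONTROL_CONTINUE:    current = SVC_RUNNING; break;
    case SERVICE_CONTROL_INTERROGATE: /* just re-report status */ break;
    default: /* user-defined codes 128-255 land here */        break;
    }
}
```

The default arm is where any custom constants in the 128--255 range would be handled.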


A Simple Service


Listing One (page 100) shows the simplest possible service--a service that
simply beeps. By default, it beeps every two seconds. You can optionally
modify the beep interval with startup parameters. This service is complete in
that it will appropriately respond to the SCM for every control signal
possible. Because of that, you can use this code as a template for creating
your own services.
The main function calls StartServiceCtrlDispatcher to register the ServiceMain
function using an array of SERVICE_TABLE_ENTRY structures. In this case, the
program contains just one service, so there is only one entry in the table.
However, it is possible for you to implement several services within a single
EXE file, and in such cases, the table identifies the appropriate ServiceMain
function for each service.
Your initialization code can be placed in the main function prior to the call
to StartServiceCtrlDispatcher, but if it does not complete in less than 30
seconds, the SCM aborts the service on the assumption that something went
wrong.
The ServiceMain function gets called when the SCM wants to start the
service--either during the boot process or because of a manual start.
ServiceMain always contains the following steps:
1. RegisterServiceCtrlHandler is called to register the Handler function with
the SCM as this service's Handler function.
2. The SendStatusToSCM function is called to notify the SCM of progress. The
fourth parameter is a "click-count" value, which increments each time the
program updates the status. The SCM and other programs can look at the click
count and see that progress is being made during initialization. The last
parameter is a "wait hint" that tells the SCM how long (in milliseconds) it
should expect to wait before the click count gets updated again.
3. ServiceMain creates an event that will be used at the bottom of the
function to prevent it from returning until the SCM issues a STOP request.
4. ServiceMain checks for startup parameters. These can be passed in by the
user during a manual start (using the Startup Parameters line in the Service
applet). Any parameters are passed to the ServiceMain function via an
argv-style array.
5. If your service needs to perform other initialization tasks, they should be
placed here, just prior to the call to InitService.
6. The InitService function is called, starting the thread that does the
actual work of the service. If it succeeds, then ServiceMain lets the SCM know
that the service has successfully started.
7. ServiceMain now calls WaitForSingleObject, which waits efficiently for the
terminateEvent event object to be set in the Handler function. Once it is,
ServiceMain calls the terminate function to clean up and then returns to stop
the service.
There isn't much flexibility in this sequence: With the exception of step #5,
you must perform each of the tasks in the order indicated for the service to
start properly.
The terminate function cleans up any open handles and sends a status message
to the SCM to tell it that the service is stopped. The SCM calls the Handler
function whenever it wants to pause, resume, interrogate, or stop the service.
To stop the service, the handler sets terminateEvent. By doing this, it causes
ServiceMain, which is executing as a separate thread, to terminate and return.
Once ServiceMain returns, the service is stopped.
The SendStatusToSCM function consolidates all of the statements necessary to
send the service's current status to the SCM. The InitService function gets
called by ServiceMain when it needs to start the service's thread. This
function calls CreateThread to create a new thread for the service.
ServiceThread contains the actual work to be performed by the service. In this
case, the thread consists of an infinite loop that beeps and then sleeps for a
predetermined interval. When creating your own services, you can place any
code that you like in this thread, calling either Win32 functions or your own
functions.



Installing and Removing Services


In order to use the beep service, you have to install it. Installation makes
the SCM aware of the service and causes the SCM to add it to the list of
services that appears in the Services applet of the Control Panel. INSTALL.CPP
(Listing Two, page 101) demonstrates how to install a service. The code begins
by opening a connection to the SCM using the OpenSCManager function. In the
call to OpenSCManager, you must specify what you want to do so that the SCM
can validate that activity. If the account you are logged in under does not
have sufficient privilege, then the call will return NULL.
Installing the new service is accomplished by a call to CreateService. This
call uses the pointer to the SCM returned by OpenSCManager; additional
parameters include the name, label, and EXE file specified on the command
line, along with a set of standard parameters to fill in all of the other
values. The use of SERVICE_WIN32_OWN_PROCESS indicates that the service's EXE
file contains just one service, and the SERVICE_DEMAND_START parameter
indicates that the service is started manually rather than automatically. A
typical invocation of the install program from the command line might be:
install BeepService "Beeper" c:\winnt\beep.exe
The first parameter for this command specifies the name of the service used
internally by the SCM. You will use this name later to remove the service. The
second parameter specifies the label used to display the service in the
Services applet. The third parameter gives the fully qualified path to the
service's executable. After you install the service, start it using the
Services applet in the Control Panel. You can look up the error codes in the
online help file for the Win32 API.
To remove a service, follow the steps in REMOVE.CPP (Listing Three, page 101),
which starts by opening a connection to the SCM. It then opens a connection to
the service using the OpenService function and queries the service to find out
if it is currently stopped. If it is not, REMOVE.CPP stops it. The
DeleteService function removes the service from the Services applet in the
Control Panel. The usual way of invoking the removal program is: remove
BeepService.


Conclusion


Services are an essential part of Windows NT because they allow you to extend
the operating system. Using my code as a template, you will find that it is
easy to create new services of your own.
Table 1: Control requests sent by the SCM to your service.
=============================================================================
 Request                      Description
=============================================================================
 SERVICE_CONTROL_STOP         Tells the service to stop.
 SERVICE_CONTROL_PAUSE        Tells the service to pause.
 SERVICE_CONTROL_CONTINUE     Tells the service to resume.
 SERVICE_CONTROL_INTERROGATE  Tells the service to report its status immediately.
 SERVICE_CONTROL_SHUTDOWN    Tells the service that shutdown is imminent.
=============================================================================
Figure 1: The relationship between the SCM, the service's EXE, and the Install
program.
[LISTING ONE] (Text begins on page 48.)

//****************************************************************************
// BEEPSERV.CPP -- the simplest possible service for Windows NT. By Marshall
// Brain. This service beeps every 2 seconds, or at a user-specified interval.
// Code is also in the book "Win32 System Services: The Heart of Windows NT"
//****************************************************************************

#include <windows.h>
#include <stdio.h>
#include <iostream.h>
#include <stdlib.h>

#define DEFAULT_BEEP_DELAY 2000

//------- Global variables --------
char *SERVICE_NAME = "BeepService"; // The name of the service
HANDLE terminateEvent = NULL; // for holding ServiceMain from completing
int beepDelay = DEFAULT_BEEP_DELAY; // The beep interval in ms.
BOOL pauseService = FALSE; // Flags holding current state of service
BOOL runningService = FALSE;
HANDLE threadHandle = 0; // Thread for the actual work
SERVICE_STATUS_HANDLE serviceStatusHandle; // Handle used to communicate
 // status info with the SCM. Created
 // by RegisterServiceCtrlHandler.
//------------------------------------------------------------------
void ErrorHandler(char *s, DWORD err)
{ cout << s << endl << "Error number: " << err << endl;
 ExitProcess(err);
}
//--- SendStatusToSCM() -- consolidates status updates sent via SetServiceStatus
BOOL SendStatusToSCM (DWORD dwCurrentState,
 DWORD dwWin32ExitCode,

 DWORD dwServiceSpecificExitCode,
 DWORD dwCheckPoint,
 DWORD dwWaitHint)
{ BOOL success;
 SERVICE_STATUS serviceStatus;
 // Fill in all of the SERVICE_STATUS fields
 serviceStatus.dwServiceType = SERVICE_WIN32_OWN_PROCESS;
 serviceStatus.dwCurrentState = dwCurrentState;
 // If in the process of doing something, then accept
 // no control events, else accept anything.
 if (dwCurrentState == SERVICE_START_PENDING) {
 serviceStatus.dwControlsAccepted = 0;
 }
 else { serviceStatus.dwControlsAccepted =
 SERVICE_ACCEPT_STOP |
 SERVICE_ACCEPT_PAUSE_CONTINUE |
 SERVICE_ACCEPT_SHUTDOWN;
 }
 // If a specific exit code is defined, set up win32 exit code properly
 if (dwServiceSpecificExitCode == 0) {
 serviceStatus.dwWin32ExitCode = dwWin32ExitCode;
 }
 else { serviceStatus.dwWin32ExitCode = ERROR_SERVICE_SPECIFIC_ERROR;
 }
 serviceStatus.dwServiceSpecificExitCode = dwServiceSpecificExitCode;
 serviceStatus.dwCheckPoint = dwCheckPoint;
 serviceStatus.dwWaitHint = dwWaitHint;
 // Pass the status record to the SCM
 success = SetServiceStatus (serviceStatusHandle, &serviceStatus);
 return success;
}
//------------------------------------------------------------------
DWORD ServiceThread(LPDWORD param)
{ while (1)
 { Beep(200,200); Sleep(beepDelay);
 }
 return 0;
}
//---- InitService() -- initializes the service by starting its thread ----
BOOL InitService()
{ DWORD id;
 threadHandle = CreateThread(0, 0, // Start the service's thread
 (LPTHREAD_START_ROUTINE) ServiceThread,
 0, 0, &id);
 if (threadHandle==0) { return FALSE; }
 else { runningService = TRUE; return TRUE; }
}
//---- Handler() -- dispatches events received from SCM ----
VOID Handler (DWORD controlCode)
{ DWORD currentState = 0;
 BOOL success; // There is no START option because
 switch(controlCode) // ServiceMain gets called on a start.
 { // Stop the service.
 case SERVICE_CONTROL_STOP: // Tell SCM what's happening.
 success = SendStatusToSCM(SERVICE_STOP_PENDING,
 NO_ERROR, 0, 1, 5000);
 runningService=FALSE; // Set the event that is holding
 // ServiceMain, so that
 SetEvent(terminateEvent); // ServiceMain can return.

 return;
 case SERVICE_CONTROL_PAUSE: // Pause the service
 if (runningService && !pauseService)
 { // Tell SCM what's happening.
 success = SendStatusToSCM( SERVICE_PAUSE_PENDING,
 NO_ERROR, 0, 1, 1000);
 pauseService = TRUE;
 SuspendThread(threadHandle);
 currentState = SERVICE_PAUSED;
 }

 break;
 case SERVICE_CONTROL_CONTINUE: // Resume from a pause
 if (runningService && pauseService)
 { // Tell the SCM what's happening
 success = SendStatusToSCM( SERVICE_CONTINUE_PENDING,
 NO_ERROR, 0, 1, 1000);
 pauseService=FALSE;
 ResumeThread(threadHandle);
 currentState = SERVICE_RUNNING;
 }
 break;
 case SERVICE_CONTROL_INTERROGATE: // Update current status
 break; // it will fall to bottom and send status
 case SERVICE_CONTROL_SHUTDOWN:
 // Do nothing in a shutdown. Could do cleanup but must be quick
 return;
 default: break;
 }
 SendStatusToSCM(currentState, NO_ERROR, 0, 0, 0);
}
//---- terminate() -- handle an error from ServiceMain by cleaning up
//---- and telling SCM that the service didn't start.
VOID terminate(DWORD error)
{ // If terminateEvent has been created, close it.
 if (terminateEvent) CloseHandle(terminateEvent);
 // Send a message to SCM to tell about stoppage.
 if (serviceStatusHandle) {
 SendStatusToSCM(SERVICE_STOPPED, error,0, 0, 0);
 } // If the thread has started, kill it off.
 if (threadHandle) CloseHandle(threadHandle);
 // Do not need to close serviceStatusHandle.
}
//---- ServiceMain is called when SCM wants to start service. When it returns,
// the service has stopped. It therefore waits on an event just before the end
// of the function, and that event gets set when it is time to stop. It also
// returns on any error because the service cannot start if there is an error.
VOID ServiceMain(DWORD argc, LPTSTR *argv)
{ BOOL success; //
 // immediately call Registration function
 serviceStatusHandle = RegisterServiceCtrlHandler(SERVICE_NAME,Handler);
 if (!serviceStatusHandle) { terminate(GetLastError()); return; }
 // Notify SCM of progress
 success = SendStatusToSCM(SERVICE_START_PENDING, NO_ERROR, 0, 1, 5000);
 if (!success) { terminate(GetLastError()); return; }

 // create the termination event
 terminateEvent = CreateEvent (0, TRUE, FALSE, 0);
 if (!terminateEvent) { terminate(GetLastError()); return; }

 // Notify SCM of progress
 success = SendStatusToSCM(SERVICE_START_PENDING,
 NO_ERROR, 0, 2, 1000);
 if (!success) { terminate(GetLastError()); return; }
 if (argc == 2) // Check for startup params
 { int temp = atoi(argv[1]);
 if (temp < 1000) beepDelay = DEFAULT_BEEP_DELAY;
 else beepDelay = temp;
 } //
 // Notify SCM of progress
 success = SendStatusToSCM(SERVICE_START_PENDING, NO_ERROR, 0, 3, 5000);
 if (!success) { terminate(GetLastError()); return; }
 // Start the service itself
 success = InitService(); //
 if (!success) { terminate(GetLastError()); return; }
 // The service is now running.
 // Notify SCM of progress
 success = SendStatusToSCM(SERVICE_RUNNING, NO_ERROR, 0, 0, 0);
 if (!success) { terminate(GetLastError()); return; }
 //
 // Wait for stop signal, and then terminate
 WaitForSingleObject (terminateEvent, INFINITE);
 terminate(0);
}
//------------------------------------------------------------------
VOID main(VOID)
{ SERVICE_TABLE_ENTRY serviceTable[] =
 { { SERVICE_NAME, (LPSERVICE_MAIN_FUNCTION) ServiceMain },
 { NULL, NULL }};
 BOOL success;
 //
 // Register with the SCM
 success = StartServiceCtrlDispatcher(serviceTable);
 if (!success) ErrorHandler("In StartServiceCtrlDispatcher",
 GetLastError());
}

[LISTING TWO]

//-----------------------------------------------------------
// INSTALL.CPP -- this code installs a service. By Marshall Brain.
//-----------------------------------------------------------
#include <windows.h>
#include <iostream.h>
//-----------------------------------------------------------
// Report an error to the user and halt.
void ErrorHandler(char *s, DWORD err)
{ cout << s << endl << "Error number: " << err << endl;
 ExitProcess(err);
}
//-----------------------------------------------------------
void main(int argc, char *argv[])
{ SC_HANDLE newService, scm;
 if (argc != 4)
 { cout << "Usage:\n";
 cout << " install service_name service_label executable\n";
 cout << " service_name is the name used internally"
 " by SCM\n";
 cout << " service_label is the name that appears"
 " in the Services applet\n";
 cout << " (for multiple words, put them in"
 " double quotes)\n";
 cout << " executable is the full path to the EXE\n";
 cout << "\n";
 return;

 }
 // Open a connection to the SCM
 scm = OpenSCManager( 0, 0, SC_MANAGER_CREATE_SERVICE);
 if (!scm) ErrorHandler("In OpenScManager",GetLastError());
 // Install the new service
 newService = CreateService( scm, argv[1], // eg "beep_srv"
 argv[2], // eg "Beep Service"
 SERVICE_ALL_ACCESS,
 SERVICE_WIN32_OWN_PROCESS,
 SERVICE_DEMAND_START,
 SERVICE_ERROR_NORMAL,
 argv[3], // eg "c:\winnt\xxx.exe"
 0, 0, 0, 0, 0);
 if (!newService) ErrorHandler("In CreateService", GetLastError());
 else cout << "Service installed\n";
 // Clean up
 CloseServiceHandle(newService);
 CloseServiceHandle(scm);
}

[LISTING THREE]

//-------------------------------------------------------------
// REMOVE.CPP -- this code removes a service from the Services
// applet in the Windows NT Control Panel. By Marshall Brain.
//-------------------------------------------------------------
#include <windows.h>
#include <iostream.h>
//-----------------------------------------------------------
// Report an error to the user and halt.
void ErrorHandler(char *s, DWORD err)
{ cout << s << endl << "Error number: " << err << endl;
 ExitProcess(err);
}
//-----------------------------------------------------------
void main(int argc, char *argv[])
{ SC_HANDLE service, scm;
 BOOL success;
 SERVICE_STATUS status;
 if (argc != 2)
 { cout << "Usage:\n remove service_name\n"; return;
 }
 // Open a connection to the SCM
 scm = OpenSCManager(0, 0, SC_MANAGER_CREATE_SERVICE);
 if (!scm) ErrorHandler("In OpenScManager", GetLastError());
 // Get the service's handle
 service = OpenService(scm, argv[1], SERVICE_ALL_ACCESS | DELETE);
 if (!service) ErrorHandler("In OpenService", GetLastError());
 // Stop the service if necessary
 success = QueryServiceStatus(service, &status);
 if (!success)
 ErrorHandler("In QueryServiceStatus", GetLastError());
 if (status.dwCurrentState != SERVICE_STOPPED)
 { cout << "Stopping service...\n";
 success = ControlService(service, SERVICE_CONTROL_STOP,
 &status);
 if (!success) ErrorHandler("In ControlService",
 GetLastError());
 }
 // Remove the service
 success = DeleteService(service);
 if (success) cout << "Service removed\n";
 else ErrorHandler("In DeleteService", GetLastError());
 // Clean up
 CloseServiceHandle(service);

 CloseServiceHandle(scm);
}
End Listings



























































May, 1994
Optimizing Matrix Math on the Pentium


Hand-tuning using C and assembler




Harlan W. Stockman


Harlan, who earned his doctorate in geochemistry at MIT, works in the
geochemistry department at Sandia National Labs. He can be contacted at
hwstock@sandia.gov.


In many applications, matrix math is the rate-limiting step. With the 486,
programmers optimized matrix math by minimizing the number of instructions,
putting little emphasis on the flow of data onto or off the CPU, since
floating-point addition and multiplication were two to ten times slower than
loads and stores.
Times have changed. The Pentium can add or multiply floating-point numbers in
one or two clock cycles; cache misses and data-induced pipeline stalls are now
limiting factors in numeric performance. Pentium programmers must embrace new
concepts, familiar to RISC programmers, to achieve the highest speed. Yet the
Pentium retains a distinctly non-RISC architecture--the comparatively small
number of registers and the stack structure are hard for compilers to handle.
On the other hand, it is extremely easy to improve Pentium performance with
tiny fragments of assembly language; the performance gains are typically much
larger than you get with a 486 and hint at the power of future,
Pentium-optimized compilers.
This article examines methods to speed up matrix operations on the Pentium and
explores some of the common lore about optimizing C and assembly code. I'll
begin with a matrix-multiplication example, then move to the LINPACK routines
for solving simultaneous linear equations. Some of these methods have been
discussed elsewhere [Ross, 1993], but those discussions tend to be
qualitative. My intent is to provide a quantitative assessment of each method.
I used three machines for my testing: a 60-MHz Pentium (Gateway P5-60 with 256
Kbytes write-through L2 cache); a 33-MHz 486DX (Gateway 486/33C with 64 Kbytes
of write-through L2 cache); and a 100-MHz MIPS R4000 (SGI Elan with 1 Mbyte of
write-back L2 cache). The Pentium code was compiled with Symantec C++ 6.0 for
DOS, using the -mx -5 -o -f switches. This compiler aligns data on 8-byte
boundaries, but otherwise doesn't perform Pentium-specific optimizations for
the floating-point unit (FPU). I compiled the R4000 code with SGI's UNIX cc
version 3.1 under IRIX 4.0.5H, using the -O3 -mips2 switches; the latter
turn on all optimizations, including loop unrolling and global register
allocation and allow 64-bit loads and stores.


Matrix Multiplication: The Cache Counts


Table 1 compares the times required to multiply two double-precision matrices,
a[][] and b[][], to form a product matrix c[][], using five different
algorithms. Example 1, adapted from Mark Smotherman's mm.c benchmark, shows
the first three (and simplest) of these methods. By compiling with the
-DASMLOOP switch, you can replace the standard inner loop with one of the
handcoded assembly routines discussed shortly. Despite the simplicity of the
operation, there is nearly a twelve-fold variation in speed for the Pentium,
and a factor of 8.2 for the R4000. The slowest algorithm, by far, is the
"normal" method; see Example 1(a). Simple variations on the normal method,
such as transposing the b[][] matrix or switching the loop order, as in
Example 1(b) and 1(c), cause a two- to eight-fold increase in speed.
To understand why the normal method is so bad, consider how the processor
fills the L1 data cache. On the Pentium, the L1 cache is divided into 128
two-line sets, each line containing 32 bytes (or four double-precision
numbers). Each time a matrix element is read from memory, the processor checks
if that datum is available in one of the lines; if not, the element and its
three neighbors are read into a cache line. Thus, if your program tends to use
data in consecutive memory locations, the processor will generally find the
datum it needs in the cache, and won't have to look in the much slower main
memory. However, the number of lines is limited, so if a program reads
elements from widely separated places in memory, it will typically overwrite
cache lines before they are reused.
Now you can see why the normal method (see Figure 1) is so hard on the cache.
The inner loop forms the dot product, where the b[k][j] elements are accessed
sequentially down the jth column, each element separated by 8N bytes (N being
the number of elements in a row). Each time you read a b[k][j] element, the
cache tries to read its three neighbors, assuming they will soon be used.
However, these neighbors are in the next three columns, and the program won't
try to fetch them until it cycles through the next outer loop--by then, the
cache line will have been overwritten. Translating the inner loop into
assembly language (the third entry in Table 1) has little effect, since the
time taken to read data into the CPU far overshadows the time for addition and
multiplication. In contrast, the inner loops of Examples 1(b) and 1(c) access
the a[][], b[][], and c[][] arrays along rows and use the cache much more
efficiently.
There are two lessons to be learned from the matrix-multiplication example.
First, write C routines so that the innermost loops operate on rows whenever
possible, or at least reuse data as much as you can while it is in the cache.
Second, before you spend a lot of time writing assembly language, make sure
the overall algorithm is cache-efficient.
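The loop orderings discussed above can be sketched in portable C (a minimal sketch; the size N and the function names mult_ijk and mult_ikj are mine, not the benchmark's):

```c
#define N 64

/* "Normal" ijk order: the inner loop walks b[][] down a column,
   touching addresses 8*N bytes apart on every iteration. */
void mult_ijk(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* ikj order: the inner loop walks both b[k][] and c[i][] along rows,
   so consecutive iterations touch consecutive addresses. */
void mult_ikj(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = 0.0;
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            double aik = a[i][k];   /* hoist a[i][k] out of the inner loop */
            for (int j = 0; j < N; j++)
                c[i][j] += aik * b[k][j];
        }
}
```

Both functions compute the same product; only the memory-access pattern differs, which is exactly where the speed difference in Table 1 comes from.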
While searching for the most efficient C algorithm, be aware that many methods
are implicitly optimized for RISC chips and may not be suited for the Pentium.
An example is the warner() method (lines 11 and 12 of Table 1). In the pure C
version, this algorithm does not substantially improve Pentium performance
over the simpler reg_loops() method. However, the R4000 speed is greatly
improved by warner(). The latter algorithm copies array elements into numerous
temporary variables to keep the program from constantly recalculating
addresses in the innermost loop; this approach works well on a RISC machine,
since the compiler will keep these variables in registers. The Pentium has far
fewer registers than the R4000, so the C compiler tends to keep the temporary
variables in memory; thus the Pentium sees little benefit from the warner()
method unless one uses hand-optimized assembly language for the inner loop,
carefully keeping the temporary variables on the FPU stack.


Which Array Declaration is Best?


It is generally accepted [Press et al., 1992] that 2-D arrays are more
efficient when declared as pointers to pointers to double--that is, as double
**a instead of double a[500][500]. The latter declaration ostensibly requires
a multiplication by the row length to calculate the address of an element;
since integer multiplications are comparatively slow (10 or 11 clock cycles on
a Pentium or R4000), it is reasonable to assume that the a[500][500]
declaration will be less efficient. However, lines 2 and 5 in Table 1 tell a
different story. For the "normal" algorithm, the **a type declaration produces
substantially slower code on both the Pentium and R4000.
To find the source of this discrepancy, I disassembled the object code
produced by the R4000 and Pentium compilers. For both processors, the object
code for normal() contains 1.5 times as many loads from memory when the arrays
are declared **a instead of a[500][500], since the a[], b[], and c[] pointers
must also be read from memory. But equally important, with the
a[500][500]-type declaration both compilers managed to generate inner loops
free of integer multiplication. Both the Pentium and SGI compilers effectively
transformed the innermost loop of normal() into Example 2, which requires no
integer multiplication to calculate successive b[k][j] addresses.
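The transformation both compilers made can be sketched roughly as follows (a hypothetical reconstruction, not the compilers' actual output; the function names are mine):

```c
#define N 500   /* row length, as in the a[500][500] declaration */

/* Inner loop as written in C: b[k][j] means *(&b[0][0] + k*N + j),
   which appears to require an integer multiply on every iteration. */
double dot_naive(double a[N][N], double b[N][N], int i, int j, int n)
{
    double sum = 0.0;
    for (int k = 0; k < n; k++)
        sum += a[i][k] * b[k][j];
    return sum;
}

/* The strength-reduced loop: successive b[k][j] addresses differ by
   exactly N doubles, so a pointer bumped by N replaces the multiply. */
double dot_reduced(double a[N][N], double b[N][N], int i, int j, int n)
{
    double sum = 0.0;
    double *ap = &a[i][0];
    double *bp = &b[0][j];
    for (int k = 0; k < n; k++) {
        sum += ap[k] * *bp;
        bp += N;            /* next element down the jth column */
    }
    return sum;
}
```

With the **a declaration, the compiler cannot prove the rows are contiguous, so it cannot perform this reduction and must also reload the row pointers from memory.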


Optimizing With ASM


To eke more performance out of the Pentium, you can replace the inner loops
with two simple assembly-language routines. The function ddot() (Listing One,
page 104) returns the dot product of the vectors x and y; and daxpy() (Listing
Two, page 104) forms the vector sum aox+y, where a is a scalar, replacing the
old y values with the new. Both ddot() and daxpy() process n elements in an
unrolled loop of n/4 iterations, relying on a cleanup block to process the
last (n mod 4) elements. These functions have wide application and are at the
core of the LINPACK linear-algebra package.
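In portable C, the structure of these two routines can be sketched as follows (a sketch of the algorithm only, not the published assembly; the names ddot_c and daxpy_c are mine):

```c
#include <stddef.h>

/* ddot: dot product of x and y, unrolled by four, with a cleanup
   loop for the last (n mod 4) elements. */
double ddot_c(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        sum += x[i]   * y[i]   + x[i+1] * y[i+1]
             + x[i+2] * y[i+2] + x[i+3] * y[i+3];
    for (; i < n; i++)          /* cleanup block */
        sum += x[i] * y[i];
    return sum;
}

/* daxpy: y = a*x + y, replacing the old y values with the new,
   again unrolled by four. */
void daxpy_c(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]   += a * x[i];
        y[i+1] += a * x[i+1];
        y[i+2] += a * x[i+2];
        y[i+3] += a * x[i+3];
    }
    for (; i < n; i++)          /* cleanup block */
        y[i] += a * x[i];
}
```

The assembly versions follow the same unrolled-by-four shape; the hand-tuning discussed below is about how the four operations per iteration are scheduled on the FPU stack.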
From the assembly-language programmer's perspective, the Pentium FPU looks
much like its 80x87 predecessors. There are eight 80-bit registers (st0
through st7) arranged in a stack; for most instructions, one of the operands
must be st0, which leads to a "top-of-stack" bottleneck. However, there are
several significant differences between the Pentium and the 486 and 386/387.
First, with the Pentium it is much more important to avoid "data
collisions"--that is, successive FPU operations that reference the same
location in memory and stall the pipeline. Second, the fxch instruction--which
switches the contents of two registers--can often be executed in parallel with
another FPU instruction, making it essentially clockless. Careful use of fxch
can make the Pentium FPU stack look much like a set of general registers and
removes the top-of-stack bottleneck. Perhaps more important, fxch can be used
to avoid data collisions. However, lacing code with abundant fxch instructions
will slow down the 387 and 486, so you must be judicious. Third, while it is
still possible to achieve simultaneous execution of integer and floating-point
instructions on the Pentium, the value of this trick is diminished. FPU and
integer instructions share the first three stages of the U and V pipelines,
then feed into separate execution units. I found that interleaving integer
instructions with fadd, fmul, and fst generally slowed the code by a few
percent, since the smooth flow of the 8-stage FPU pipeline was disrupted.
To illustrate the importance of assembly-level optimization, let's examine the
object code produced by the compiler. Example 3 is C code for the inner loop
of daxpy() and the corresponding assembly language produced by Symantec C/6.0.
This compiler-produced ASM is pretty good, but has several inefficiencies. The
calculation of addresses is complicated; the compiler doesn't seem to notice
the simple relationship among the addresses of x[i], x[i+1], and so on, so
offsets are stored on the memory stack (referenced by ebp) and reloaded into
registers with each iteration of the loop. The value of a (in a·x[i]+y[i]) is also
kept in memory, even though it could easily be stored in one of the six free
FPU registers. Equally important are the pairs of fadd, fstp instructions that
set up a "data collision" that stalls the pipeline.
Example 4 shows three different ways to recode the loop of daxpy(), from the
least efficient, (a), to the most efficient, (c). (Example 5 is a schematic of
the FPU stack, so you can track the register contents after each operation.)
All three algorithms keep "a" on the FPU stack and have simple addressing,
which results in substantial savings on the Pentium. However, Example 4(a)
retains the adjacent instructions like fadd qword ptr [ebx] and fstp qword ptr
[ebx]. Since each instruction references the same address in memory, data
collisions stall the pipeline. The function in Example 4(b) removes three of
the collisions by accumulating four a·x[i]+y[i] results on the stack, then
successively popping and storing the results. Example 4(c) removes the last
collision with a single fxch instruction. Table 2 gives the time to perform
100 million floating-point operations with each version of daxpy(), for both
the Pentium and 33-MHz 486. For reference, the first line of Table 2 gives the
speed of a pure C version of the code. On the Pentium, the speedup from
Examples 4(a) to 4(b) is 17 to 26 percent, with an additional 6 to 9 percent
from Examples 4(b) to 4(c). The most efficient Pentium algorithm is 1.75 to
2.14 times as fast as the optimized, compiled C code. Table 2 also shows
several dramatic differences between the Pentium and the 486. First of all,
the advantage of assembly language, compared to compiled C code, is much
smaller for the 486; the fastest ASM routine is only 1.10 to 1.28 times faster
than the C code. Second, there is very little difference in speed--less than
3 percent--among the various ASM routines. The data collisions have
almost no effect on 486 speed, and adding the extra fxch slows the 486
algorithm by a mere 1.6 to 2.7 percent. Lastly, with optimized code the
Pentium is 7.4 to 9.0 times faster than the 486/33; normalized to the same
clock speed, the Pentium is 4.1 to 5.0 times faster. Note that daxpy() is
similar in purpose to the da() function given by Subramaniam and Kundargi
("Programming the Pentium Processor," DDJ, June 1993). The code used in da()
is easier to generate with a compiler, but requires four times as many fxch to
avoid data collisions, penalizing the 486 by an additional 7 percent.
Returning to Table 1, you see that the assembly-language routines speed up
matrix multiplication by a factor of 1.6 to 2.66 on the Pentium. In fact, these
simple ASM routines allow the Pentium to match or surpass the 100-MHz R4000,
while carrying an extra 16 bits of precision. The comparison is not completely
fair, since I've not hand-optimized the R4000 code. However, it is generally
not wise to hand-tune R4000 code on an SGI. First of all, SGI's compilers are
extremely good; I disassembled the R4000 code for daxpy() and several other
matrix routines and was hard pressed to find any improvements. Second, SGI's
assembler does a lot of instruction shuffling to take advantage of load-delay
slots. The instruction sequence in object code can be quite different from the
sequence specified in your assembly-language source, and hand-optimizations
may make the code run slower. But most important, SGI's global register
allocation (the -O3 compiler switch) makes assembly tuning almost impossible,
since you can't specify registers at will; individual program modules are
compiled to an intermediate ucode, and the linker decides how to allocate
registers only after examining all the modules. The global-allocation scheme
is a key to the R4000's excellent performance; the next-lower optimization
scheme (-O2) generated code up to 40 percent slower.


LINPACK


LINPACK is an extensive linear-algebra package for manipulating matrices and
solving linear systems [Dongarra et al., 1979]. The code was originally
written in Fortran, but portions have been translated into C, as in the
clinpack benchmark. Among the most widely used routines in LINPACK are dgefa()
and dgesl(), which can be combined to solve a linear system A·x=y by the LU
decomposition method, where the vector x is the "solution." This method first
factors the matrix A into lower and upper triangular matrices L and U, such
that A=L·U. Thus: (L·U)·x=y, or L·(U·x)=y. We define U·x=y*, where y* is a
column vector, and solve L·y*=y for y*; since L is triangular, the solution
follows very simply from forward substitution. We then solve U·x=y* for x by
back substitution. In LINPACK, dgefa() factors the matrix A into L and U, and
dgesl() solves the system by substitution. Typically, dgefa() takes the lion's
share of computation time.
Though the LU method may seem indirect and complicated, it is actually quite
fast and robust; moreover, it can be written so that the innermost loops
perform operations purely on rows, minimizing the cache thrashing we discussed
in the matrix-multiplication example. In fact, the inner loops of dgesl() and
dgefa() call the same daxpy() and ddot() routines discussed in the
matrix-multiplication example. I use the LU method extensively, and the matrix
A commonly contains at least 300x300 double-precision elements and is solved
up to a thousand times in a single program. Thus, the speed of these routines
is very important to me.
Table 3 shows results for the clinpack.c benchmark, which calls dgesl() and
dgefa() repeatedly to solve several sets of 100x100 linear systems, using
double precision. The matrices are embedded in larger (200x200) arrays to
ensure the cache is exercised. The standard benchmark runs ten iterations; I
found it necessary to use 100 iterations, due to the poor resolution of the PC
system clock. The assembly-language versions of daxpy() and ddot() were used
for the first line of the table; the second line is for the pure C algorithm.
The third line gives results of the Fortran version of the benchmark (Lahey
compiler, 486 optimizations), and the fourth, a C-compiled, f2c translation of
the original Fortran benchmark. (F2c is a public-domain program to translate
Fortran source into C [Feldman et al., 1992]; coupled with a good C compiler,
it is widely regarded as an inexpensive substitute for a real Fortran
compiler.)
Table 3 has several significant points. First, with a little ASM, the 60-MHz
Pentium surpasses a 100-MHz R4000. Second, the ASM routines once again have
far more of an effect on the Pentium than on the 486. Third, don't expect
compiled f2c translations to match the quality of a real Fortran compiler or
well-optimized, native C code. The f2c translation is perhaps acceptable for
the 486, but on the Pentium it is nearly three times slower than the best
C/ASM code.
The problem with f2c lies partly in the way it maps arrays. A 2-D Fortran
array a(i,k) is translated to a 1-D C array, each element referenced as
a[i+N*k], where N is the number of elements in a Fortran column, and i and k
are adjusted for the Fortran indexing offset. Consequently, the translated
code is peppered with integer multiplications, and most C compilers can't
optimize them away. On a 486, integer multiplication is roughly the same speed
as a floating-point operation but on the Pentium, an imul takes 10 or 11 clock
ticks, compared to just 1 or 2 for fadd or fmul. Thus on the Pentium, the
f2c-translated code can spend much more time using integer multiplication to
calculate addresses than it spends actually performing floating-point
operations.
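The cure is classic strength reduction: compute the base address once and replace the per-element imul with a pointer increment. The fragment below is my own illustration, not f2c output; it walks one Fortran-style row (stride N through the flattened array):

```c
#define N 100   /* leading dimension of the flattened Fortran array */

/* Naive form: an integer multiply hides in every index calculation. */
double row_sum_naive(double *a, int i)
{
    double s = 0.0;
    int k;
    for (k = 0; k < N; k++)
        s += a[i + N*k];      /* imul on each access */
    return s;
}

/* Strength-reduced form: one add per iteration instead of a multiply. */
double row_sum_reduced(double *a, int i)
{
    double *p = a + i;
    double s = 0.0;
    int k;
    for (k = 0; k < N; k++, p += N)   /* pointer bump replaces imul */
        s += *p;
    return s;
}
```

On the Pentium, where imul costs 10 or 11 clocks against 1 or 2 for the pointer add, the reduced form removes most of the addressing overhead.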
While LINPACK is still in wide use, it has a very powerful successor, called
LAPACK [Anderson, 1992]. The LAPACK routines carry cache optimization even
further, with sophisticated "blocking" algorithms that divide large matrices
into chunks small enough to fit in the L1 and L2 caches. Like LINPACK, LAPACK
uses basic linear-algebra subprograms (or BLAS) to perform low-level matrix
operations. However, the LAPACK BLAS are much more complicated than the simple
daxpy() and ddot() routines described earlier. Some workstation manufacturers
include hand-coded LAPACK BLAS in their compiler libraries, offering dramatic
improvements over simple, compiled C and Fortran code.
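The blocking idea itself fits in a few lines. The sketch below is illustrative, not LAPACK's kernels; the block size B and the loop order are my assumptions, and in practice B is tuned so that several B x B tiles fit in cache at once:

```c
#include <string.h>

#define N 64   /* matrix dimension (illustrative) */
#define B 16   /* block size; tuned to the cache in real code */

/* Blocked matrix multiply: each B x B tile of a, b, and c is reused
   while it is still cache-resident, instead of streaming whole rows
   and columns through the cache. */
void matmul_blocked(double a[N][N], double b[N][N], double c[N][N])
{
    int i0, j0, k0, i, j, k;
    memset(c, 0, sizeof(double) * N * N);
    for (i0 = 0; i0 < N; i0 += B)
        for (k0 = 0; k0 < N; k0 += B)
            for (j0 = 0; j0 < N; j0 += B)
                for (i = i0; i < i0 + B; i++)
                    for (k = k0; k < k0 + B; k++) {
                        double aik = a[i][k];      /* reused across j */
                        for (j = j0; j < j0 + B; j++)
                            c[i][j] += aik * b[k][j];
                    }
}
```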



Conclusions


The Pentium is not just a fast 486. The design of the Pentium pipeline and the
radical changes in relative speeds of floating-point and integer operations
require much more attention to the flow of data on and off the FPU. Currently,
the two best methods to speed Pentium matrix operations are to use
cache-efficient C or Fortran code and handcode the inner loops to simplify
address calculations and avoid pipeline stalls.
It is obvious from Table 1 and Table 3 that pure C code can be 1.3 to 2.5
times faster on the 100-MHz R4000 than on the 60-MHz Pentium. After
disassembling object code from the R4000 and Pentium, I concluded that much of
this speed difference was due to the sophistication of SGI's R4000 compiler,
rather than a hardware advantage. The SGI compilers unroll loops, minimize the
time lost in address calculations, and reshuffle instructions to avoid load
delays and data collisions. The simple hand optimizations discussed in this
article mimic those performed by the SGI compiler and allow the Pentium to
match or surpass R4000 performance. Let's hope that as Pentium compilers
mature, the hand optimizations will be less and less necessary.


Availability


The complete mm.c and LINPACK benchmarks are too large to include in this
article. However, they can be obtained from many ftp sites around the world
(including ftp.nosc.mil). They can be located via archie servers. The f2c
translator is available from netlib.att.com, and from many bulletin boards.
The mm.c benchmark was developed by Mark Smotherman, and contains
contributions from numerous programmers. In particular, the reg_loops() and
tiling() algorithms were developed by Monica Lam, and the warner() method was
modified from an algorithm by Dan Warner.


References


Anderson, E. LAPACK Users' Guide. SIAM (1992).
Dongarra, J.J., C.B. Moler, and J.R. Bunch. LINPACK Users' Guide. SIAM
(1979).
Feldman, S.I., D.M. Gay, M.W. Maimone, and N.L. Schryer. A Fortran-to-C
Converter. Computing Science Technical Report No. 149, AT&T Bell Laboratories,
Murray Hill, NJ (1992).
Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical
Recipes in C. Cambridge, U.K.: Cambridge University Press (1992).
Ross, J.W. "Calling C Functions with Variably Dimensioned Arrays." Dr. Dobb's
Journal (August 1993).
Subramaniam, R. and K. Kundargi. "Programming the Pentium Processor." Dr.
Dobb's Journal (June 1993).
Figure 1: The dot product in normal().
Table 1: Time in seconds to multiply two double-precision 500x500 matrices.
=============================================================================
                                        Pentium        R4000
                                        60 MHz         100 MHz
=============================================================================
 normal()                               145.7          72.34
 normal(), **a                          186.2          128.6
 normal(), asm inner loop               121.6          --
 transpose()                            58.66          29.87
 transpose(), **a                       61.08          30.61
 transpose(), asm ddot() inner loop     23.07          --
 reg_loops()                            43.44          34.82
 reg_loops(), asm daxpy() inner loop    26.60          --
 tiling()                               53.28          36.18
 tiling(), asm daxpy() inner loop       22.91          --
 warner()                               42.89          15.67
 warner(), asm inner loop               16.09          --
=============================================================================


Example 1: Matrix multiplication.
(a)
void normal()
{
    int i,j,k;
    for (i=0;i<N;i++){
        for (j=0;j<N;j++){
            c[i][j] = 0.0;
            for (k=0;k<N;k++){
                c[i][j] += a[i][k]*b[k][j];
            }
        }
    }
}

(b)
void transpose()
{
    int i,j,k;
    double temp;
    for (i=0;i<N;i++){
        for (j=0;j<N;j++){
            bt[j][i] = b[i][j];
        }
    }
    for (i=0;i<N;i++){
        for (j=0;j<N;j++){
#ifndef ASMLOOP
            temp = a[i][0]*bt[j][0];
            for (k=1;k<N;k++){
                temp += a[i][k]*bt[j][k];
            }
            c[i][j] = temp;
#else
            c[i][j] = ddot(N, a[i], bt[j]);
#endif
        }
    }
}

(c)
void reg_loops()
{
    int i,j,k;
    double a_entry;
    for (i=0;i<N;i++){
        for (j=0;j<N;j++){
            c[i][j] = 0.0;
        }
    }
    for (i=0;i<N;i++){
        for (k=0;k<N;k++){
#ifndef ASMLOOP
            a_entry = a[i][k];
            for (j=0;j<N;j++){
                c[i][j] += a_entry*b[k][j];
            }
#else
            daxpy(N, a[i]+k, b[k], c[i]);
#endif
        }
    }
}
Example 2: Inner loop of normal() after being transformed by both the Pentium
and SGI compilers.
aptr = a[i];
cptr = c[i] + j;
bptr = b[0] + j;
for(k=0; k<500; k++, bptr+=500){
    *cptr += *(aptr + k) * *bptr;
}



Table 2: Time in seconds to perform 100 million operations with daxpy(). N is
the number of elements per row.
                              Pentium, 60 MHz        486, 33 MHz
                              N=15      N=500        N=15      N=500

 Compiled C                   18.13     11.26        80.12     63.42
 ASM, Example 4(a)            10.49     9.12         62.82     58.14
 ASM, Example 4(b)            8.95      7.25         62.87     57.87
 ASM, Example 4(c)            8.46      6.65         64.25     59.41
 ASM, Example 4(c),
  loops unrolled by 8         8.90      6.43         63.64     58.64


Table 3: LINPACK benchmarks (MFLOPS).
                              Pentium     486        R4000
                              60 MHz      33 MHz     100 MHz

 clinpack.c/asm               10.59       1.64       --
 clinpack.c                   6.35        1.44       9.22
 linpack.for (Lahey FORTRAN)  6.83        1.43       --
 f2c translation of
  linpack.for                 3.57        1.09       --


Example 3: (a) Main loop of daxpy() in C; (b) corresponding assembly language
produced by compiler.
(a)

for (i=m; i<n; i=i+4) {
 y[i] = y[i] + a*x[i];
 y[i+1] = y[i+1] + a*x[i+1];
 y[i+2] = y[i+2] + a*x[i+2];
 y[i+3] = y[i+3] + a*x[i+3];
}

(b)

L124: mov ECX, EBX
 sub ECX, ESI
 mov EAX, 014h[EBP]
 fld qword ptr [EAX][ECX]
 fmul qword ptr 0Ch[EBP]
 mov EAX, 01Ch[EBP]
 fadd qword ptr [EAX][ECX]
 fstp qword ptr [EAX][ECX]
 fld qword ptr [EDX][ECX]
 fmul qword ptr 0Ch[EBP]
 mov EAX, 010h[EBP]

 fadd qword ptr [EAX][ECX]
 fstp qword ptr [EAX][ECX]
 wait
 mov EAX, 0Ch[EBP]
 fld qword ptr [EAX][ECX]
 fmul qword ptr 0Ch[EBP]
 mov EAX, 8[EBP]
 fadd qword ptr [EAX][ECX]

 fstp qword ptr [EAX][ECX]
 fld qword ptr [EBX]
 fmul qword ptr 0Ch[EBP]
 mov EAX, 4[EBP]
 fadd qword ptr [EAX][ECX]
 fstp qword ptr [EAX][ECX]
 wait
 add EBX, 020h
 add EDI, 4
 cmp EDI, 8[EBP]
 jl L124


Example 4: Three ways to code daxpy() for Pentium, from least efficient (a) to
most efficient (c).
(a)
loop1: fld qword ptr [eax]
       fmul st,st(1)
       fadd qword ptr [ebx]
       fstp qword ptr [ebx]
       ;next element
       fld qword ptr [eax+8]
       fmul st,st(1)
       fadd qword ptr [ebx+8]
       fstp qword ptr [ebx+8]
       ;next element
       fld qword ptr [eax+16]
       fmul st,st(1)
       fadd qword ptr [ebx+16]
       fstp qword ptr [ebx+16]
       ;next element
       fld qword ptr [eax+24]
       fmul st,st(1)
       fadd qword ptr [ebx+24]
       fstp qword ptr [ebx+24]
       add eax,32
       add ebx,32
       dec ecx
       jnz loop1

(b)
loop2: fld qword ptr [eax]
       fmul st,st(1)
       fadd qword ptr [ebx]
       ;next element
       fld qword ptr [eax+8]
       fmul st,st(2)
       fadd qword ptr [ebx+8]
       ;next element
       fld qword ptr [eax+16]
       fmul st,st(3)
       fadd qword ptr [ebx+16]
       ;next element
       fld qword ptr [eax+24]
       fmul st,st(4)
       fadd qword ptr [ebx+24]
       ;store new y[]'s
       fstp qword ptr [ebx+24]
       fstp qword ptr [ebx+16]
       fstp qword ptr [ebx+8]
       fstp qword ptr [ebx]
       add eax,32
       add ebx,32
       dec ecx
       jnz loop2

(c)
loop3: fld qword ptr [eax]
       fmul st,st(1)
       fadd qword ptr [ebx]
       ;next element
       fld qword ptr [eax+8]
       fmul st,st(2)
       fadd qword ptr [ebx+8]
       ;next element
       fld qword ptr [eax+16]
       fmul st,st(3)
       fadd qword ptr [ebx+16]
       ;next element
       fld qword ptr [eax+24]
       fmul st,st(4)
       fadd qword ptr [ebx+24]
       ;store new y[]'s
       fxch st(2) ;!!! fxch !!!
       fstp qword ptr [ebx+8]
       fstp qword ptr [ebx+16]
       fstp qword ptr [ebx+24]
       fstp qword ptr [ebx]
       add eax,32
       add ebx,32
       dec ecx
       jnz loop3

Example 5: (a) FPU stack for Example 4(a)--at the top of the loop, a is
already loaded onto the stack; (b) FPU stack for Example 4(c)--at the top of
the loop, a is already loaded onto the stack, skipping several instructions
near the bottom of the loop.
(a)

 fld qwptr [eax]   fmul st,st(1)   fadd qwptr [ebx]   fld qwptr [eax+8]
 st0: x[i]         st0: a*x[i]     st0: a*x[i]+y[i]   st0: x[i+1]
 st1: a            st1: a          st1: a             st1: a*x[i]+y[i]
                                                      st2: a

(b)

 fadd qwptr [ebx+24]    fxch st(2)             fstp qwptr [ebx+8]     fstp qwptr [ebx+16]
 st0: a*x[i+3]+y[i+3]   st0: a*x[i+1]+y[i+1]   st0: a*x[i+2]+y[i+2]   st0: a*x[i+3]+y[i+3]
 st1: a*x[i+2]+y[i+2]   st1: a*x[i+2]+y[i+2]   st1: a*x[i+3]+y[i+3]   st1: a*x[i]+y[i]
 st2: a*x[i+1]+y[i+1]   st2: a*x[i+3]+y[i+3]   st2: a*x[i]+y[i]       st2: a
 st3: a*x[i]+y[i]       st3: a*x[i]+y[i]       st3: a
 st4: a                 st4: a
[LISTING ONE] (Text begins on page 52.)

 .386P
 .model small

.data
ALIGN 4
dstorage DQ 0.0
.code
;******************* ddot() *********************
; double ddot(int n, double *xptr, double *yptr)
; ..forms the dot product of two row vectors..
; RETURNS product in edx:eax

 public _ddot
_ddot proc
 push ebx
 ;---STACK:---
;+--------+----------+--------+--------+--------+
;|  ebx   | ret addr |   n    |  xptr  |  yptr  |
;+--------+----------+--------+--------+--------+
; ^esp     ^esp+4     ^esp+8   ^esp+12  ^esp+16
 mov ecx, dword ptr [esp+8]
 test ecx, ecx ;<= 0 iterations ?
 jle badboy
 mov eax, dword ptr [esp+12] ;eax = xptr
 mov ebx, dword ptr [esp+16] ;ebx = yptr
 fldz ;initialize accumulator..
 ;---determine length of cleanup, main loop iterations---
 mov edx, ecx
 and edx, 3 ;edx is length of cleanup...
 shr ecx, 2 ;loops unrolled by 4, so adjust counter...
 jz cleanup1
 ;=======loop1=======
 loop1: fld qword ptr [eax]
 fmul qword ptr [ebx]
 fadd
 fld qword ptr [eax+8]
 fmul qword ptr [ebx+8]
 fadd
 fld qword ptr [eax+16]
 fmul qword ptr [ebx+16]
 fadd
 fld qword ptr [eax+24]
 fmul qword ptr [ebx+24]
 fadd
 add eax,32
 add ebx,32
 dec ecx ;faster than "loop" on pentium...
 jnz loop1
 ;=====END loop1=====
cleanup1: or edx,edx
 jz store1
 fld qword ptr [eax]
 fmul qword ptr [ebx]
 fadd
 dec edx
 jz store1
 fld qword ptr [eax+8]
 fmul qword ptr [ebx+8]
 fadd
 dec edx
 jz store1
 fld qword ptr [eax+16]

 fmul qword ptr [ebx+16]
 fadd
 ;----store result-------
 store1: fstp dstorage ;Zortech expects to see result in edx:eax...
 fwait ;Needed for 387...
 mov eax, dword ptr dstorage
 mov edx, dword ptr dstorage+4
 ;-------
 pop ebx
 ret
 badboy: xor eax,eax
 xor edx,edx
 pop ebx
 ret
_ddot endp

[LISTING TWO]

;******************* daxpy() ********************
; void daxpy(int n, double *aptr, double *xptr, double *yptr)
; ..forms the sum of a*x[i] + y[i], and stores in y[]..
; RETURNS nothing.
 public _daxpy
_daxpy proc
 push ebx
 ;---STACK:---
;+--------+----------+--------+--------+--------+--------+
;|  ebx   | ret addr |   n    |  aptr  |  xptr  |  yptr  |
;+--------+----------+--------+--------+--------+--------+
; ^esp     ^esp+4     ^esp+8   ^esp+12  ^esp+16  ^esp+20
 mov ecx, dword ptr [esp+8]
 test ecx, ecx ;<=0 iterations ?
 jle badboy5
 ;---load *aptr onto fp stack
 mov eax, dword ptr [esp+12] ;address of multiplier (aptr)..
 ;---test if *aptr is positive or negative 0.0---
 mov edx, dword ptr [eax+4] ;upper dword of *aptr..
 and edx, 01111111111111111111111111111111B ;mask off sign bit
 or edx, dword ptr [eax]
 jz badboy5
 ;---load *aptr onto stack if not 0.0---
 fld qword ptr [eax] ;multiplier now in ST(0)..
 mov eax, dword ptr [esp+16]
 mov ebx, dword ptr [esp+20]
 ;---determine length of cleanup, main loop iterations---
 mov edx, ecx
 and edx, 3 ;edx is length of cleanup...
 shr ecx, 2 ;loops unrolled by 4, so adjust counter...
 jz cleanup5
 ;=======loop5=======
 loop5: fld qword ptr [eax]
 fmul st,st(1)
 fadd qword ptr [ebx]
 ;---next element---
 fld qword ptr [eax+8]
 fmul st,st(2)
 fadd qword ptr [ebx+8]
 ;---next element---
 fld qword ptr [eax+16]

 fmul st,st(3)
 fadd qword ptr [ebx+16]
 ;---next element---
 fld qword ptr [eax+24]
 fmul st,st(4)
 fadd qword ptr [ebx+24]
 ;---store new y[]'s, clean stack---
 fxch st(2) ; !!! Avoid data collision !!!
 fstp qword ptr [ebx+8]
 fstp qword ptr [ebx+16]
 fstp qword ptr [ebx+24]
 fstp qword ptr [ebx]
 add eax,32
 add ebx,32
 dec ecx ;faster than "loop" on pentium,
 jnz loop5 ;due to instruction pairing...
 ;=====END loop5=====
cleanup5: or edx,edx
 jz stckcln5
 ;---1st cleanup element---
 fld qword ptr [eax]
 fmul st,st(1)
 fadd qword ptr [ebx]
 fstp qword ptr [ebx]
 dec edx
 jz stckcln5
 ;---next element---
 fld qword ptr [eax+8]
 fmul st,st(1)
 fadd qword ptr [ebx+8]
 fstp qword ptr [ebx+8]
 dec edx
 jz stckcln5
 ;---next element---
 fld qword ptr [eax+16]
 fmul st,st(1)
 fadd qword ptr [ebx+16]
 fstp qword ptr [ebx+16]
 ;---------------------
 stckcln5: fstp st(0) ;must clean *aptr off stack
 badboy5: pop ebx
 ret
_daxpy endp


















May, 1994
RTMK: A Real-Time Microkernel


This portable microkernel is based on the Sceptre standard




J.F. Bortolotti, P. Bernard, and E. Bouchet


The authors are electrical engineers and can be contacted at fax number +41 22
784 3452.


Digital-signal processors such as Texas Instruments' 320C30/40, Motorola's
96000, and Analog Devices' ADSP-210x are typically used as dedicated
coprocessors--number-crunchers hooked to interrupt vectors, for example. In
such configurations, the digital-signal processor (DSP) feeds a
general-purpose CPU with "predigested" data so it can take care of
housekeeping. Because the inevitable resource duplication is both expensive in
cost-sensitive applications and inefficient in design, we needed a way to get
rid of the general-purpose microprocessor in interrupt-driven systems we were
designing.
With interrupt-driven systems, time is intentionally sacrificed as a safety
margin while the CPU waits (via a loop) for the next interrupt. However, the
time spent in a background loop with high-performance DSPs is typically 10 to
20 percent of the total CPU time--power that's often more than the total power
of the general-purpose CPU. In designing our interrupt-driven system, we
decided to utilize this power, instead of resorting to another processor. The
only element we lacked was a simple method of scheduling the various
low-frequency processes (compared to the interrupt processing) that run on the
system. We needed a real-time kernel that would provide a fast context switch
(to spend as little time in the kernel as possible), a simple interface to a
standard programming language, interruptibility of the kernel to allow
high-priority immediate processes hooked to an interrupt, a standardized
approach to kernel services, and a high level of portability.
Since we weren't aware of any commercial software that could deliver total
control over the interrupt state of the processor (and because a fast
interrupt was time critical in our application), we ended up designing RTMK,
the real-time kernel presented in this article.
The source code for both DSP and PC implementations of the kernel is provided
with this article. Listings One through Three are the PC implementation.
Listing One (page 105) is the RTMK.H header file, Listing Two (page 105) is
the RTMK.C source file, and Listing Three (page 106) is TEST.C, the test
program. The source code for the DSP version (specifically, the TMS320C30) is
available electronically; see "Availability," page 3. (Executables for the PC
version are also available electronically.) The PC implementation and test
programs have been compiled with Borland C 3.1 but should compile as-is with
Microsoft C. The DSP version has been compiled with the TI C compiler (Version
4.5) for TMS320C30.


RTMK and the Sceptre Standard


In 1980, several European software houses, tired of waiting for the ideal
"real-time" language, proposed a simple but realistic set of commands for a
real-time kernel; see "Sceptre," BNI Y (the French computing-standards bureau)
September 1982. Our real-time kernel is based on this standard, although it
supports only a subset of the original Sceptre commands.
The authors of Sceptre defined a set of primitives that is portable only by
its specification. This means that once you've ported the kernel to a new
target system, all applications that require specific real-time services are
portable to this new target, as well. Services standardized by the Sceptre
kernel are listed in Table 1. Note that unlike other kernels, Sceptre events
are associated with tasks because events are an elementary mechanism that does
not require a system-event queue handled by the kernel. This minimizes the
addressing needed to identify which task to schedule and eliminates the need
to clock a kernel function at a submultiple of the frequency of the fastest
task in order to schedule the tasks in the system properly.
Remember that a real-time application is composed of programs under the
control of an executive system. Any program execution is achieved through an
active agent called a "task/process." The mechanisms that control the
execution of these tasks are part of the executive. The executive has to hide
specific properties of the target system by turning them into logical
properties, interface between different parts of an application (possibly
written in different languages), handle communications, and manage hardware
resources (files, memory, and so on).
Since most high-level languages already provide parts of this executive as
elements of their library, it is only necessary to design the kernel--a set of
operations and data structures that can be used through the procedural calling
mechanism of a high-level language.
The entities of the Sceptre standard include:
Tasks/processes. The agent responsible for the execution of a program. A task
has a name and attributes. A task context is the minimum set of information
necessary for the target processor to execute the task instructions. A stored
context is a memory area where the execution context of the task is saved when
the task is not currently executing.
Immediate Tasks. A task ensuring the interface between the real-time
application and the environment. Immediate tasks are often handled via the
interrupt mechanism. In RTMK, the immediate process is the
interrupt-service routine.
Scheduler. A software or hardwired module in charge of handling the CPU power
of a set of processors and dispatching it to a set of tasks. Generally, the
scheduler of the immediate tasks is the interrupt mechanism of the processor,
while the scheduler of standard (or deferred) tasks is part of the kernel.
Events. The event object stores a signal, which means that a condition became
True. In the Sceptre standard, events are associated with specific tasks. This
provides an efficient signalling process with low overhead.
Region. The region object is purely a mark of CPU ownership. Context switches
through the scheduler will not occur until a region is exited.
Queue. The queue is a communication object between tasks that behaves like
FIFO memory.
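RTMK itself does not implement the Sceptre queue services, but the Put, Get, Empty, and Full operations (see Table 1) map naturally onto a fixed-size ring buffer. A minimal sketch; QSIZE and the int payload are arbitrary choices, not part of the standard:

```c
/* Illustrative ring-buffer queue for the Sceptre Put/Get/Empty/Full
   services.  A real kernel would guard these against interrupts. */
#define QSIZE 16

struct queue {
    int buf[QSIZE];
    int head, tail, count;
};

int q_full(const struct queue *q)  { return q->count == QSIZE; }
int q_empty(const struct queue *q) { return q->count == 0; }

int q_put(struct queue *q, int elem)          /* Sceptre "Put" */
{
    if (q_full(q)) return 0;                  /* caller handles overflow */
    q->buf[q->tail] = elem;
    q->tail = (q->tail + 1) % QSIZE;
    q->count++;
    return 1;
}

int q_get(struct queue *q, int *elem)         /* Sceptre "Get" */
{
    if (q_empty(q)) return 0;
    *elem = q->buf[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    return 1;
}
```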
When designing a kernel, it's important to choose a well-behaved compiler that
permits context switches with a limited context save. It's also important to
select a hardware platform with a stack-oriented architecture and no
stack-size limitations. PC programmers take this for granted, but DSP
developers often work with architectures that are optimized for branching
speed--and thus have a limited stack or no stack pointer at all. The compiler
itself is important, and its behavior must be carefully analyzed to define
the appropriate context, which will be saved in the process descriptor.


Process Descriptors


In any multitasking kernel, the CPU has to be allocated to different
processes, even when they are not finished. During process changes (or
"context switches"), we preserve the environment the process was executing in
(the "process context"), while reinstating the next scheduled process.
A context switch consists of saving the current context, selecting the process
that will get processor time, and restoring the context of this new process.
Consequently, the context must contain all the necessary data to maintain the
integrity of the CPU and compiled code for the process. The area where all
this data is stored is a "process descriptor."
The RTMK PC process descriptor is shown in Example 1. The task priority is
handled by the structure's priority and pmask fields. The second item is
simply a mask that is precalculated during task definition in order to
accelerate the context switches by avoiding unnecessary left shifts at run
time. The expected_signals field is a mask. If bit n is set in this mask, then
signal n can activate the process. The received_signals is another 32-bit
field in which each bit set means that the corresponding signal has been sent
to the process. Another element of the context is the microprocessor context
that was saved (in the form of a jmp_buf buffer set by the setjmp function in
the PC implementation).
In the original Sceptre implementation of a real-time application, each
process is associated with a set of signals. Each signal is either arrived or
not_arrived. The process specifies its own activation condition through these
signals.
A process can be in one of the following states:
Waiting. The process is suspended until one of its signals has arrived.
Ready. The process has received a signal and will continue execution when the
kernel allocates it CPU time.
In progress. The process is currently running.
Sending or receiving signals is done through kernel services by the process.
In these primitives, the signals that have arrived are checked against those
expected. If a match occurs, the CPU can be reallocated to another process,
depending on its priority. This mechanism means that the process-ready list
can only be updated when a process sends or waits for a signal. Consequently,
a task switch from the kernel's point of view will only occur during the
execution of the signal and wait primitives.
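The match test at the heart of these primitives reduces to a single bitmask AND. A condensed illustration (the field and function names here are simplified from the PC listings, which give the full kernel versions):

```c
typedef unsigned long word32;

/* Condensed sketch of the Sceptre-style match test: a process becomes
   ready only when a bit set by send overlaps a bit it is waiting on. */
struct pcb {
    word32 expected;   /* signals the process is waiting for */
    word32 received;   /* signals that have arrived */
    int    ready;      /* stand-in for the kernel's ready bitmap */
};

void send_signal(struct pcb *p, word32 mask)
{
    p->received |= mask;                 /* record the arrival */
    if (p->received & p->expected) {     /* does it match a wait? */
        p->ready = 1;                    /* wake the process */
        p->expected = 0;                 /* reset expected signals */
    }
}
```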
In RTMK, we've implemented only the basic services. The signals are sent
directly to the task, not (as in many other operating systems) to the kernel.
This means RTMK signal handling is much easier and faster, thanks to its
Sceptre characteristics.


Process-Handling Services


RTMK provides a number of process-handling services, including:
send(PROCESS,SIGNALS). Sets the bits contained in SIGNALS in the
received_signals field of the process descriptor and calls the scheduler.
While this primitive is executing, CPU ownership can be lost if the signals
being sent activate a higher-priority task.

wait(SIGNALS). Sets the process descriptor's expected_signals field and calls
the scheduler, which usually suspends the process until the notified signals
arrive. This service returns the list of received signals; the return value
distinguishes which signals triggered the process, since two signals can be
active at the same time.
reset(SIGNALS). Clears the bits corresponding to SIGNALS in the process
descriptor's received_signals field in order to permit another cycle. By
clearing these bits just before the next call to the wait service, we can
ignore all the events that occurred during the processing (though it is not
common practice to do so).
arrived_signals(void). Returns the received_signals field to the caller.
In each of these calls, the signal is a mask containing the bit corresponding
to a given signal set. Consequently, multiple signals can be sent to a given
task at the same time. The tokens ALL_SIGNALS and ANY_SIGNAL improve the
readability of the code (see Example 2).
A process is simply defined by allocating a Process Control Structure (PCS)
and calling the create_process function with two arguments: a pointer to the
PCS and the address of the process's entry point (again, see Example 2).


An Example


In Example 2, the interrupt handler signals events directly to two processes.
Process1 is a short routine that writes a character directly on the PC screen.
Process2 is a long process that spends a lot of time in a useless loop and
prints dots on the screen through the standard I/O library.
Process1 is activated 200 times when Process2 is activated once. The long
calculation times of Process2 demonstrate that the direct screen writes are
continuing when Process2 is active, which shows that the CPU has been
temporarily reallocated to Process1 during the execution of Process2. The CPU
has not been directly stolen by the kernel, but by the interrupt (an
"immediate process" in Sceptre terminology) that is sending signals, thus
giving control to the kernel.


Performance


Context-switch performance is critical if the time spent in the kernel is to
be minimized. We've measured it at 11 microseconds with a full C version on a
Texas Instruments TMS320C30 clocked at 32 MHz. This time drops to 5.5
microseconds with a scheduler written in assembler. The PC version has not
been timed.
Table 1: Sceptre kernel services.
 Class             Operation         Parameters      Action

 Task Handling     Start             Task            Starts the execution of the task.
                   Stop              Task            Stops the execution of the task.
                   Continue          Task            Resumes the execution of the task.
                   Terminate                         Terminates the task that calls this service.
                   Change priority   Task, priority  Gives task the new priority.
                   State             Task            Returns the task state.
                   Priority          Task            Returns the task priority.
                   Current task                      Returns the id of the currently executing task.
 Signaling         Send              Event, task     Sends an event to a task.
                   Wait              Event list      Waits until one of the events has arrived.
                   Arrived           Event list      True if all the events have arrived.
                   Clear             Event list      Resets all events in the list to the nonarrived state.
 Communication     Put               Element, queue  Places the element into the queue.
                   Get               Element, queue  Gets the element from the queue.
                   Empty             Queue           True if the queue is empty.
                   Full              Queue           True if the queue is full.
 Mutual exclusion  Lock              Region          Requests exclusive ownership of the region.
                   Unlock            Region          Releases the region.

 Task States
 Nonexisting       No descriptor associated with the task (not created).
 Existing          A descriptor has been defined for the task.
 Nonexecutable     Task has a descriptor but can't start execution.
 Executable        Task has a descriptor and can start execution.
 Not in service    Task is executable but execution has not yet started or is finished.
 In Service        Task is executable and has started execution but has not finished.
 Waiting           Task is in service and waiting for a condition to become True to continue execution.
 Active            Task is in service and not waiting for anything except a free processor.
 Ready             Task is active and only waiting for a processor.
 In Progress       Task is active and currently executing on a processor.


Example 1: RTMK process descriptor (PC version).
typedef unsigned long word32; /* Double Word for 32 process max kernel */
typedef word32 SIGNALS; /* Signal type */
struct PCS {
 jmp_buf context; /* CPU Context (registers) */
 SIGNALS expected_signals;
 SIGNALS received_signals;
 word32 pmask;

 word32 priority;
 };

typedef struct PCS *PROCESS; /* pointer of PROCESS CONTEXT STRUCTURE */


Example 2: Defining a process.
#include "RTMK.H"
#include <stdio.h>
#include <conio.h>
#include <dos.h>
#include <signal.h>

#define IT 0x1C
#define VIDEO_RAM 0xB8000000

PROCESS p1,p2;

int i,j;
char far* p=(char far*)VIDEO_RAM+1;

void interrupt (*old_vector)();
void interrupt clock_it()
{
 outp(0x20,0x20);
 i+=1;
 if(i==200){
 i=0;
 send(p2,1);
 }
 else send(p1,1);
}
far process1()
{
 while(1) {
 p++;
 *p++=0x31;
 if(p>(char far *)VIDEO_RAM+25*80*2)
p=(char far* )VIDEO_RAM+1;
 wait(ANY_SIGNAL);
 reset(ALL_SIGNALS);
 }
}
far process2()
{static long n;
 enable();
 while(1) {
 printf("process 2:waiting\t");
 wait(1);
 printf("process 2: resetting signals\t");
 reset(1);
 printf("process 2:calculating");
 for(j=0;j<60;j+=1){
 for(n=0;n<100000;n+=1);
 printf(".");
 }
 printf("calculation terminated ");
 }
}

jmp_buf sys_context;
void terminate()
{
 longjmp(sys_context,1);
}
void main() {
 clrscr();
 create_process(&p1,process1);
 create_process(&p2,process2);
 old_vector=getvect(IT);
 disable();
 signal(SIGINT,terminate);
 setvect(IT,clock_it);
 if(!setjmp(sys_context)){
 run_kernel();
 }
 setvect(IT,old_vector);
}


[LISTING ONE]
/* functions' prototypes */
#include "rtmktype.h" /* declaration of system's types */

#ifndef RTMK_H
#define RTMK_H

void create_process(PROCESS *,void far *);
void send(PROCESS,SIGNALS);

SIGNALS wait(SIGNALS);
SIGNALS reset(SIGNALS);
SIGNALS arrived_signals(void);
SIGNALS process_state(PROCESS);

#define ANY_SIGNAL 0xffffffff
#define ALL_SIGNALS 0xffffffff

int run_kernel(void);

#endif /* RTMK_H */

[LISTING TWO]

/* RTMK.C Real Time Micro Kernel */

#include "RTMKTYPE.h" /* RTMK types' definitions */
#include <dos.h> /* include for context and interrupts management */

#define NULL 0
#define PROCESS_STACK_SIZE 500 /* Stack size for each process */

unsigned _stklen=20; /* minimal stack needed to start the kernel */

/********************* System's variables ****************/
struct PCS pcs_tab[32]; /* Process Context Structure table */

unsigned stack[32*PROCESS_STACK_SIZE]; /* stack table for all the processes */
unsigned nbr_process; /* number of processes declared */


PROCESS current_process; /* pointer to the current process's PCS */
word32 ready_process; /* bitmap of ready processes */

/************************************************************************/
/* create_process: declares the process where p is the identifier for */
/* the kernel and entry_point is the address of the process's code */
/* the context of the process is initialized. */
/************************************************************************/

void create_process(PROCESS *process_id,void far *entry_point())
{
 if (nbr_process<32){ /* 32 is the maximum number of processes */
 *process_id=pcs_tab+nbr_process;
 (pcs_tab[nbr_process].context)->j_ip=FP_OFF(entry_point);
 (pcs_tab[nbr_process].context)->j_cs=FP_SEG(entry_point);
 (pcs_tab[nbr_process].context)->j_flag=0;
 /* reset flag register to disable interrupts */
 /* process stack */
 (pcs_tab[nbr_process].context)->j_sp=
 (unsigned)stack+PROCESS_STACK_SIZE*(32-nbr_process);
 (pcs_tab[nbr_process].context)->j_bp=(pcs_tab[nbr_process].context)->j_sp; /*
bp=sp (stack) */
 (pcs_tab[nbr_process].context)->j_ds=FP_SEG((void far *)stack);
 (pcs_tab[nbr_process].context)->j_ss=FP_SEG((void far *)stack);
 nbr_process+=1;
 }
}
/****************************************************************************/
/* scheduler: the context of the current process is saved and the system */
/* switches to a ready process. If next_process==NULL, the highest-priority */
/* ready process is searched for; otherwise next_process itself is scheduled. */
/****************************************************************************/
void scheduler(PROCESS next_process)
{word32 n,i; /* i and n loop variables */

/* saves the context of current process */
 if (setjmp(current_process->context)==0){
 if (next_process)
 current_process=next_process; /* scheduled is one in next_process */
 else { /* determine the next_process */
 n=0;
 i=0x80000000;
 while (!(i&ready_process)) {
 n+=1;
 i>>=1;
 }
 current_process=pcs_tab+n; /* scheduled process is elected process */
 }
 longjmp(current_process->context,1); /* switch to the scheduled process */
 }
}
/************************************************************************/
/* SIGNALS MANAGEMENT: send(process,signals_mask), wait(signals_mask), */
/* reset(signals_mask), arrived_signals(), process_state(process) */
/************************************************************************/
/* send: send to process signals that are on (1) in signals_mask */
/************************************************************************/
void send(process,signals_mask)
PROCESS process;

SIGNALS signals_mask;
{
 process->received_signals|=signals_mask; /* update arrived signals */
 if (process->received_signals&process->expected_signals){
 /* if the process is waiting for the signals */
 ready_process|=process->pmask; /* put the process ready */
 process->expected_signals=0; /* reset expected signals */
 if (current_process->priority<process->priority){
 /* process's priority level is higher than current_process priority */
 scheduler(process); /* switch to process directly */
 }
 }
}
/*****************************************************************************/
/* wait: puts current process in wait for signals_mask; returns arrived */
/* signals */
/*****************************************************************************/
SIGNALS wait(signals_mask)
SIGNALS signals_mask;
{
 if (!(current_process->received_signals&signals_mask)){
 /* if signals in signals_mask are not arrived */
 /* update expected_signals */
 current_process->expected_signals=signals_mask;
 ready_process^=current_process->pmask; /* turn process not ready */
 scheduler(NULL); /* switch to next process */
 }
 return (current_process->received_signals); /* returns arrived signals */
}
/************************************************************************/
/* reset: puts signals that are on in signals_mask to zero(not arrived) */
/* returns signals that were arrived */
/************************************************************************/
SIGNALS reset(signals_mask)
SIGNALS signals_mask;
{SIGNALS old_signals;
 old_signals=current_process->received_signals; /* saves arrived signals */
 /* reset signals_mask */
 current_process->received_signals=
 current_process->received_signals&~signals_mask;
 return (old_signals); /* returns arrived signals before reset */
}
/************************************************************************/
/* arrived_signals: returns arrived signals of the current process */
/************************************************************************/
SIGNALS arrived_signals()
{
 return (current_process->received_signals); /* returns arrived signals */
}
/************************************************************************/
/* process_state: returns expected signals of process */
/************************************************************************/
SIGNALS process_state(process)
PROCESS process;
{
 return (process->expected_signals); /* returns expected signals */
}
/************************************************************************/
/* run_kernel: initialize the kernel's variables and switch to the */
/* first process, the last loop is the system idle process. */
/************************************************************************/
word32 free_time; /* time not used by process */
int run_kernel()
{int i; /* loop variable */
 word32 current_process_mask; /* manage the process mask */
 PROCESS pcs_ptr; /* pointer to process context structure */
 disable(); /* disable interrupts */
 /* initialization of process context structures */
 current_process_mask=0x80000000;
 ready_process=0;
 for(i=0;i<=nbr_process;i++){ /* for each process initialize pcs */
 pcs_ptr=pcs_tab+i; /* point to the pcs in the pcs table */
 pcs_ptr->received_signals=0; /* no events arrived */
 pcs_ptr->expected_signals=0; /* no events expected */
 pcs_ptr->pmask=current_process_mask; /* put the process mask */
 pcs_ptr->priority=nbr_process-i; /* set the priority */
 /* the process is now ready to take the CPU */
 ready_process|=current_process_mask;
 /* current_process_mask for the next process */
 current_process_mask=current_process_mask>>1;
 }
 current_process=pcs_tab+nbr_process; /* current process is idle process */
 free_time=0; /* reset free_time */
 scheduler(pcs_tab); /* switch to the higher priority process */
 enable(); /* enable interrupts in idle process */
 for(;;) /* loop forever : idle process */
 free_time+=1; /* one for each loop */
}

[LISTING THREE]

#include "RTMK.H"
#include <stdio.h>
#include <conio.h>
#include <dos.h>
#include <signal.h>

#define IT 0x1C
#define VIDEO_RAM 0xB8000000

PROCESS p1,p2;
int i,j;
char far* p=(char far*)VIDEO_RAM+1;

void interrupt (*old_vector)();
void interrupt clock_it()
{
 outp(0x20,0x20);
 i+=1;
 if(i==200){
 i=0;
 send(p2,1);
 }
 else send(p1,1);
}
far process1()
{
 while(1) {
 p++;
 *p++=0x31;
 if(p>(char far *)VIDEO_RAM+25*80*2) p=(char far* )VIDEO_RAM+1;
 wait(ANY_SIGNAL);
 reset(ALL_SIGNALS);
 }
}
far process2()
{static long n;
 enable();
 while(1) {
 printf("process 2:waiting\t");
 wait(1);
 printf("process 2: resetting signals\t");
 reset(1);
 printf("process 2:calculating");
 for(j=0;j<60;j+=1){
 for(n=0;n<100000;n+=1);
 printf(".");
 }
 printf("calculation terminated ");
 }
}
jmp_buf sys_context;
void terminate()
{
 longjmp(sys_context,1);
}
void main() {
 clrscr();
 create_process(&p1,process1);
 create_process(&p2,process2);

 old_vector=getvect(IT);
 disable();
 signal(SIGINT,terminate);
 setvect(IT,clock_it);
 if(!setjmp(sys_context)){
 run_kernel();
 }
 setvect(IT,old_vector);
}
End Listings



















May, 1994
OS/2 and UnixWare Interprocess Communication


Searching for common IPC ground




John Rodley


John is an independent consultant in Cambridge, MA. He can be contacted on
CompuServe at 72607,3142.


Multitasking operating systems such as UNIX and OS/2 bring interprocess
communication (IPC) explicitly into the realm of applications programming. As
a UNIX or OS/2 developer, you can design a single system with dozens of
independent processes (or threads) if that's what you want. However, to make a
coherent system, these processes usually have to pass information back and
forth, and that's where IPC comes in.
An applications programmer looking at the APIs of the two systems would be
hard-pressed to see similarities. However, the IPC models are similar, despite
greatly differing implementations. In this article, I'll look at IPC under
IBM's OS/2 2.1 and Novell's UnixWare 1.1 and see what common ground exists.
In designing a multiprocessing (or multithreaded) system, there are a number
of IPC issues you need to think about. The first one is, how are the processes
going to identify the shared resource? Since related processes (child
processes and threads) share a common set of open file handles with their
parent, they don't have to worry about resource naming. The parent can create
any number of anonymous resources (pipes, shared memory, and so on) and both
parent and child can refer to them simply by handle.
Unrelated processes don't enjoy the luxury of sharing open handles and need to
have another way to identify shared resources. Under OS/2, resources must have
unique, fully qualified filenames. Each resource type uses a common base
filename; for semaphores, this is \SEM32; for queues, \QUEUES; for shared
memory, \SHAREMEM; and for pipes, \PIPE.
Under UNIX, queues, semaphores, and shared memory share a single naming
scheme. These resources have an integer id that is typically created by a call
to ftok, which uses an agreed-upon filename and a char seed to create a
consistent integer id. Listing One (page 107) is a typical use of ftok to
create a resource ID.
With the resource named, the processes must then agree on who owns the
resource--who's responsible for creating and destroying it. System-wide
resources (those that unrelated processes can access) generally have to be
explicitly destroyed. That means that if your process disappears without
destroying the resource, the resource will still exist the next time the
process comes up, and the create call will fail. Take it from me, processes
disappear mysteriously a lot more often than anyone is willing to admit. Local
resources, like anonymous pipes, live and die with the process that creates
them. In general, you have to put some thought into what is needed if a
process fails to properly dispose of an unwanted resource.
Another important issue in any IPC situation is "atomicity." (I write atomic
code all the time--it blows up spectacularly.) In this case, atomicity refers
to whether or not a call can be task switched. For our purposes, a call is
said to be atomic if it is guaranteed not to be task switched while it is
being performed. Consider the case of a bunch of OS/2 processes trying to gain
ownership of a resource protected by a single, mutual-exclusion semaphore.
Each process makes a call to DosRequestMutexSem and blocks waiting for the
semaphore to clear. The current owner clears the semaphore. Only one of the
waiting processes can now become the new owner. If DosRequestMutexSem were not
atomic, you could get two or more processes waking up thinking they owned the
semaphore.
The last and perhaps most-important consideration is what mode you're going to
run the IPC channel in--blocking or nonblocking. In blocking mode, your
process sleeps until the call (read or write) can be satisfied. In nonblocking
mode, the call returns an error if it can't be satisfied. If, like the window
procedure in a Windows program, a process is solely concerned with processing
the message coming in over the IPC channel, it's simple, safe, and very
effective to use blocking I/O. Most real-world programs, though, can't afford
to block waiting for I/O. Usually, you'll have to split a program into
multiple related processes to use blocking I/O. A common OS/2 technique is to
spawn a thread for each blocking I/O channel and have each thread use
nonblocking I/O to get the information back to the main thread.


Shared Memory and Semaphores


There are roughly four types of IPC: shared memory, semaphores, pipes, and
queues. Shared memory is the simplest, most intuitive type of IPC. With shared
memory, multiple processes have access to a single block of memory. Shared
memory is particularly effective where the size of the data being communicated
is small and fixed. The difference between shared memory and the message-style
IPC APIs (pipes and queues) is the difference between a whiteboard and a pad
of post-it notes:
You can only fit so much on the whiteboard, but post-its go on forever.
The information on the whiteboard has no particular order, while the post-its
are always stuck to your terminal in either chronological or priority order.
Consider an example of an uninterruptible power supply (UPS). A typical smart
UPS keeps track of a couple of dozen variables (line voltage, load, and the
like), which it transmits to the system every couple of seconds over the
serial port. The UPS control program might consist of two processes: a daemon
that reads the serial port and updates the variables in the shared-memory
block and a user interface (UI) that displays the variables it reads in the
shared-memory block. The user is only interested in the current value of any
variable, so the UI doesn't need to try to display every change of value and
thus has no need to synchronize with the daemon. In this case, the
shared-memory block is used like the outfield scoreboard at a baseball game,
simply transmitting the scores to anyone who wants to read them.
This is a simple example, but it presents two potentially nasty atomicity
problems. The first is in initialization. The daemon process creates the
shared-memory block, and the UI process attaches to it. There is a small
window of time after the daemon makes the block when the values in the block
will be uninitialized. In that time window, the daemon could be task-switched,
and the UI could attach to the block and read the bogus values. Figure 1
illustrates the problem. The second situation occurs whenever the UI reads or
the daemon writes the block. Assume that one of our UPS values is a string.
The daemon could be strcpying our string to the shared-memory block when the
OS task switches to the UI, which then tries to use the half-completed string.
Not a good situation. The same thing can be said in reverse; if the UI is
halfway through strcpying the string to video memory and the daemon overwrites
the string, you also get inconsistent data.
This is where semaphores come in. With the UPS example, a single semaphore
could guard the block. On startup, the daemon would create the semaphore and
set it. Then it would create and initialize the shared-memory block. Only
after initialization would the daemon clear the semaphore. The UI would attach
to the semaphore and always block, waiting for it to clear before reading the
shared-memory block. Since the daemon would always set the semaphore before
beginning to write the block, the UI is guaranteed to never read a
half-complete value. Figure 2 shows how semaphore protection might work in
this case.
Table 1 and Table 2 show the OS/2 and UnixWare shared-memory and semaphore
APIs. As usual, under OS/2 there's a separate call for every step, while UNIX
lumps a bunch of functionality into a catchall, ioctl-style shmctl call. OS/2
goes even further, by dividing semaphores into two types: mutual exclusion
(such as the one in the previous example) and event (such as one that
indicates that a print job has finished). UNIX also operates by default on a
set of semaphores, making you jump through hoops to implement a single
semaphore, while OS/2 assumes a single semaphore and makes you use different
calls to operate on sets. I tend to agree with R.W. Stevens: In his book,
Advanced Programming in the UNIX Environment (Addison-Wesley, 1992) he calls
the SVR4 semaphore implementation "unnecessarily complicated." Stevens goes so
far as to suggest using file-record locking for simple semaphores. Listings
Two and Three (page 107) show OS/2 and UnixWare implementations of a
semaphore-protected, shared-memory initialization sequence.


Anonymous Pipes


Everyone knows that real comm programmers send messages, and if it's messages
you're after, you have only two choices--pipes and queues. A pipe is a FIFO,
and it comes in two flavors, named and anonymous, with two sizes, full and
half duplex. The original pipe was a UNIX invention, and it came in only one
flavor and size: anonymous, half-duplex. If you wanted full-duplex, you had to
buy two half-duplexes. A call to pipe yielded one read and one write handle.
The reading process used the read handle; the writing process, the write
handle. If Process A wrote a message using the write handle, then read a
message using the read handle, it would get back the message it just wrote.
Figure 3 is an example of a half-duplex pipe.
OS/2 anonymous pipes are always half-duplex. For Process A to talk to Process
B via an anonymous pipe, Process A creates a pipe, getting a read and a write
handle, then spawns Process (or thread) B, which inherits both of those
handles. Process A then closes the read handle, while Process B closes the
write handle. This leaves A with a valid write handle and B with a valid read
handle. Closing the useless handle ensures that a process gets notified when
the process on the other end of the pipe goes away. If B wants to talk to A,
they have to open another pipe.
A full-duplex pipe is a two-way channel that makes the name "pipe" a bit of a
misnomer because the content of a real-world pipe (like water) is always
pumped in one direction at a time: half-duplex. In the UNIX world, half-duplex
pipes were judged to be less than completely elegant, so as of SVR3.2, all
pipes are full duplex (also known as "anonymous-stream pipes"). A call to pipe
gives a handle to each end of a full-duplex pipe. Process A reads from and
writes to the first handle, while Process B reads from and writes to the
second handle. So a pipe shared by Processes A and B is more properly thought
of as two pipes, one from A to B and one from B to A. Figure 4 is an example
of a full-duplex pipe.
In service, a pipe is sort of a cross between a serial port and a file. Like a
file, you can generally choose how you want to read the data, binary or
translated, whether you want a byte at a time or CR-LF delimited lines. Like a
serial port, you can never read what you write (unless you're abusing a
half-duplex pipe), and read data always comes at you in FIFO order. You can
open it as a byte-stream (raw mode) or as a message stream (cooked mode). You
can also end up blocking on the open if there's no process on the other end,
on the read if there's no data available, on the write if the pipe's full, and
on the close if there's still data flowing.


Named Pipes


Named pipes are exactly that: IPC channels that are referred to by name and
can thus have unrelated processes on either end. OS/2 has a single, fairly
simple named-pipe API that provides full-duplex IPC channels.
On the other hand, if you plow into the UnixWare doc looking for named pipes,
you will be disappointed. UnixWare actually has two forms of named pipe: FIFOs
and named-stream pipes. With FIFOs, data always gets read in the order in
which it was written. But FIFOs are also files in that they actually exist
within the file system and can be seen with a simple directory listing. The
file I/O functions (open, close, read, write, and so on) work on them. You
create a FIFO by calling mkfifo or using the creat system call with the
FIFO_MODE bit set. FIFOs have one big drawback--they are only half-duplex.
If you want full-duplex named pipes, you need to use named-stream pipes.
Named-stream pipes are even more difficult to find in the doc than FIFOs (try
streams(7), or just read Stevens). This is mostly because they're implemented
by combining two other facilities: FIFOs and anonymous-stream pipes. On the
server end, you create a FIFO and an anonymous-stream pipe, push the streams
module connld onto the stream, then use fattach to attach the FIFO name to the
anonymous-stream pipe. Voilà! The anonymous-stream pipe now has a name. The
tricky part is that the file descriptor the server uses for reading and
writing is neither the original anonymous-stream pipe descriptor nor one
gotten from opening the FIFO, but one received via the ioctl call I_RECVFD.
The client end is easy: Simply open the FIFO and read and write it as if it
were an anonymous-stream pipe. If it all sounds a little convoluted, well,
it's not that bad. Nmpipe.c is a named-pipes demo program (available
electronically) for a client and server exchanging messages over a
named-stream pipe. Table 3 and Table 4 list the calls in the OS/2 and UnixWare
pipe APIs.
If you need to send a byte stream, named or anonymous pipes are the only IPC
choices you have. This is why the classic pipe examples always involve duping
standard input and output. In fact, a good application for named pipes is in a
multiuser game, where the users all sit there bashing the arrow keys to move
their game pieces around and shoot at each other. There's a single, global
game process and a UI process for each user. Each UI opens a pipe to the game
process. The UI sends the user's keystrokes to the game process over the pipe
and receives all the other users' moves from the game process over the same
pipe. (Someone at Microsoft wrote a great, multiuser tank game for LAN Manager
that used networked named pipes in this fashion.)


The Other End



Unlike all the other forms of IPC, pipes are true process-to-process
connections, so they pay close attention to the status of the processes on
both ends of the conversation. Depending on what the processes are doing,
something happening on one end of the pipe can result in a signal (SIG_PIPE),
an error return, or blocking on the other end.


Queues, Multiplexing, and More


Now that Windows has conquered the world, message queues are a painfully
familiar topic, but the Windows message queue is actually a good example of
when to use a queue:
All the events have to be received (forget about mouse moves for now). It
wouldn't do to miss any mouse clicks. In the UPS example, all the data-value
changes didn't have to be processed by the UI.
The order of events must be maintained. When you double click on an icon, the
double-click message has to come after the mouse move that took the mouse to
the icon. In the UPS example, the user doesn't care if the value for line
voltage was received before the value for load.
The event can be described with a small amount of data of fixed size. Most
Windows messages try to put the event data into the fixed-size message. When
the data won't fit, Windows uses a pointer.
There is an important difference between OS/2 and UNIX queues. UNIX queues can
hold a configurable maximum total number of bytes in the queue, so you can
copy variably sized messages directly into the queue, while OS/2 queues
operate strictly on 32-bit data values. Big deal, you say; just use pointers.
Pointers to what, though? Say Processes A and B are unrelated and trying to
communicate 10-byte messages via a queue. A passes B a pointer to a 10-byte
section of memory. Unless that pointer points to shared memory, B will GP
fault trying to read it. QSHMEM.C (available electronically) is an OS/2
program that uses queues and shared memory to pass greater than 32-bit
messages.
OS/2 queues allow you to choose either FIFO, LIFO, or priority ordering of
elements. You set the read order when you create the queue, and it can't be
changed. UNIX approaches read ordering a little differently. You create the
queue without specifying a read order. You specify (and can change) the read
order with each call to read. You can read either the first message or the
first message that has a particular integer id attached to it. By using the
target process's pid as the integer id, you can then use the queue as a
two-way, full-duplex FIFO channel. Table 5 and Table 6 show the UNIX and OS/2
queue APIs.
In many cases, you'll want to set up a single process that serves a number of
blocking IPC channels. UnixWare includes the ability to operate on multiple
I/O handles (not just pipes) via the poll and select calls. Under OS/2, you
have to attach a semaphore to each pipe or queue, then create a multiplexed
semaphore containing all the individual semaphores. QSHNMEM.C uses OS/2
multiplexed semaphores to implement a single process servicing multiple
queues.
UNIX strives to preserve a consistent approach to all system resources,
whether they be files, devices, or IPC channels. Thus, with all four forms of
UnixWare IPC, you have to pay careful attention to the read/write permissions
(execute mostly doesn't apply) for owner, group, and public the same way you
do for files. Along with permissions you also have to be aware of the
effective user id of the processes. In OS/2, permissions are generally binary:
Other processes either can access the resource or they can't, with no gray
areas.
I've ignored signals to this point. Under UnixWare, they provide a
significantly different IPC model whereby you can register a function to be
called when an event occurs, then "send" that event from within an unrelated
process. They suffer from serious restrictions, however. No information is
passed with the signal, and the number of user-defined signals is severely
limited. Also, if a process is in a blocking system call when a signal
arrives, the system call will generally return an error that must be
explicitly ignored. OS/2 uses the signaling model for handling exceptions like
Ctrl-C, but does not provide for user-defined signals.
Under UnixWare and OS/2, several other forms of IPC are also available. These
include OLE, DDE (which is built on shared memory, anyway), and the messaging
APIs of the Motif and Presentation Manager GUIs. Both also support
network-socket APIs; UnixWare, of course, comes bundled with the NetWare
networking APIs.


Conclusion


Like almost everything else, IPC is not portable between OS/2 and UnixWare.
However, though the implementation details differ greatly, the two systems do
share certain ways of thinking about IPC. They try to cover the same
functionality, and almost any style of IPC you implement in one can usually be
replicated in the other.
Figure 1: Unprotected shared-memory initialization.
 Time slice Daemon process User-interface process

 1 -- Loop waiting for shared memory to appear.
 2 Create uninitialized shared memory. --
 3 -- Read shared memory.
 4 Initialize shared memory. --
 5 -- Display bogus values from uninitialized
 shared memory.



Figure 2: Shared-memory initialization protected by a semaphore.
 Time Daemon process User-interface process slice
 1 -- Loop waiting for semaphore to appear.
 2 Create and set semaphore. --
 3 -- Block waiting for semaphore to clear.
 4 Create uninitialized shared memory. --
 5 -- Block...
 6 Initialize shared memory. --
 7 -- Block...
 8 Clear semaphore. --
 9 -- Unblock now that semaphore is clear;
 read shared-memory block and display
 initialized values.


Figure 3: A half-duplex pipe.
After call:
 pipe(fd);
fd(1) = write handle
fd(0) = read handle

write handle ----> read handle







Figure 4: A full-duplex pipe.
After call:
 pipe(fd);

fd(0)=server read/write handle
fd(1)=client read/write handle

 Server Client
 read/write -----> read/write
 handle <----- handle





Table 1: (a) OS/2 shared-memory API; (b) OS/2 semaphore API.
 Function Description

(a) DosAllocSharedMem Create shared memory.
DosFreeMem Destroy shared memory.
DosGetNamedSharedMem Attach to named shared memory.
DosGetSharedMem Attach to unnamed shared memory.
DosGiveSharedMem Give access to unnamed shared memory.

(b) DosCloseEventSem Destroy event semaphore.
DosCreateEventSem Create event semaphore.
DosOpenEventSem Attach to event semaphore.
DosPostEventSem Trigger an event.
DosQueryEventSem Query event sem status.
DosResetEventSem Clear an event from semaphore.
DosWaitEventSem Block waiting for event trigger.
DosCloseMutexSem Destroy a mutual exclusion semaphore.
DosCreateMutexSem Create mutex semaphore.
DosOpenMutexSem Attach to mutex semaphore.
DosQueryMutexSem Check status of mutex semaphore.
DosReleaseMutexSem Clear mutex semaphore.
DosRequestMutexSem Wait for semaphore to clear, then set it.
DosAddMuxWaitSem Add a semaphore to semaphore set on the fly.
DosCloseMuxWaitSem Destroy the semaphore set.
DosCreateMuxWaitSem Create semaphore set.
DosDeleteMuxWaitSem Delete a semaphore from semaphore set on the fly.
DosOpenMuxWaitSem Attach to existing semaphore set.
DosQueryMuxWaitSem Check semaphore's status.

Table 2: (a) UnixWare shared-memory API; (b) UnixWare semaphore API.
 Function Description

(a) shmget Create shared memory.
 shmctl Destroy shared memory.
 shmat Attach to existing shared memory.
 shmdt Detach from shared memory.

(b) semget Create a semaphore set.
 semctl Destroy a semaphore set, get and set semaphore values.

Table 3: OS/2 pipe API.

 Function Description

DosCallNPipe Does a transact and a close. See DosTransact and DosClose.
DosConnectNPipe Put server end of pipe into listen mode. Blocks until client
end calls DosOpen.
DosCreateNPipe Make a named pipe.
DosDisConnectNPipe Server acknowledges client calling DosClose.
DosPeekNPipe See if anything's in the pipe.
DosQueryNPipeHState See if pipe's listening, connected, closed.
DosQueryNPipeInfo Check pipe parameters.
DosQueryNPipeSemState Check on the semaphore attached to this pipe.
DosSetNPHState Change blocking and read modes.
DosSetNPipeSem Attach a semaphore to a pipe.
DosTransactNPipe Write a message and read a response.
DosWaitNPipe Wait for named pipe to be created.
DosClose Close the pipe.
DosCreatePipe Create anonymous pipe.
DosOpen Open a pipe.
DosRead Read a pipe.
DosWrite Write a pipe.


Table 4: UnixWare pipe API.
 Function Description

pipe Create anonymous stream pipe.
popen Exec a process and create a new anonymous stream pipe and attach it to
process's stdin and stdout.
pclose Close a pipe.
ioctl Used to push connld onto named pipe stream and to receive named pipe
file descriptor. See I_PUSH and I_RECVFD.
fattach Attach file system name to stream.
creat Create a FIFO (set FIFO_MODE).
mkfifo Create a FIFO.


Table 5: OS/2 queue API.
 Function Description

DosCloseQueue Destroy a queue.
DosCreateQueue Create a queue.
DosOpenQueue Attach to an existing queue.
DosPeekQueue See what's in the queue.
DosPurgeQueue Clean out the queue.
DosReadQueue Take a message off the queue.
DosWriteQueue Put a message into the queue.


Table 6: UnixWare queue API.
 Function Description

msgget Create a new queue or attach to an existing queue.
msgctl Destroy the queue.
msgsnd Send a message.
msgrcv Take a message off the queue.


[LISTING ONE] (Text begins on page 78.)

main( int argc, char *argv[] )
{
 /* The Unix integer id for a message queue. */
 int msgqid;
 /* The key used to create it. */
 key_t msgkey;
 /* Since argv[0] is the name of this executable file
 it is guaranteed to be a valid filename. */
 msgkey = ftok( argv[0], 'A' );
 ....

[LISTING TWO]

// SEMSHMEM.C - An OS/2 demonstration of a semaphore protected shared
// memory initialization sequence.
// Startup two command line windows. Run: SEMSHMEM -c to start the
// client and SEMSHMEM -s to start the server. You can see what each is
// doing by the printouts. The server will change the value at the base
// of shared memory 20 times and then exit.

#define INCL_BASE
#define INCL_DOSPROCESS
#include <os2.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <stdlib.h>

#define SHAREMEM_SIZE 100 // Size of shared memory
PSZ szShareMem = "\\SHAREMEM\\QSRV"; // Name of shared memory
char *pShareMem; // Pointer to base of shared memory
INT *pint; // Another pointer to base of shared mem.

PSZ szSem = "\\SEM32\\QSRV"; // Name of our mutex semaphore
HMTX hMutexSem; // Handle to our mutex semaphore

PID srvPid, mypid;
int i, j;
APIRET rc;
PTIB ptib; // Thread info block
PPIB ppib; // Process info block

int DoClient( void );
int DoServer( void );

// main - decide which process we are by the command line, then run
// either the client or server function.
int main( int argc, char *argv[] )
{
DosGetInfoBlocks( &ptib, &ppib );
mypid = ppib->pib_ulpid;
if( strcmp( argv[1], "-c" ) == 0 )
 DoClient();
else
 DoServer();
return( 0 );
}
// DoClient - Attach to the shared memory block, then to the mutex
// semaphore guarding the block. Block waiting on the semaphore
// until the server has initialized the shared memory.
int DoClient()
{
 printf( "Started client process %lu.\n", mypid );
 for( i = 0; i <= 30; i++ )

 {
 if( i == 30 )
 {
 printf( "Client attach error %d\n", rc );
 goto AttachError;
 }
 if(( rc = DosGetNamedSharedMem( (PVOID *)&pShareMem,
 szShareMem, PAG_READ | PAG_WRITE )) == 0 )
 break;
 else
 DosSleep( 1000L );
 }
 printf( "Opened shared memory %p!\n", pShareMem );
 if(( rc = DosOpenMutexSem( szSem, &hMutexSem )) != 0 )
 {
 printf( "Client couldn't open semaphore\n" );
 goto OpenError;
 }
 printf( "Opened semaphore: waiting for clear\n" );
 // Wait 90 seconds for the semaphore to clear
 if(( rc = DosRequestMutexSem( hMutexSem, 90000L )) != 0 )
 {
 printf( "Client sem request failed!\n" );
 goto RequestError;
 }

 printf( "Semaphore cleared!\n" );

 pint = (INT *)pShareMem;
 for( i = 0; i < 20; i++ )
 {
 printf( "Client read value %d\n", *pint );
 DosSleep( 1000L );
 }
RequestError:
 DosCloseMutexSem( hMutexSem );
OpenError:
 DosFreeMem( pShareMem );
AttachError:
return( 1 );
}
// DoServer - Create a semaphore to guard the block while it's being
// initialized, then create the shared memory block. Client waits to see the
// shared memory block, so create and set the semaphore first. Then sit
// twiddling the value at the base of shared memory for awhile.
int DoServer()
{
 printf( "Started server process %lu.\n", mypid );
 if(( rc = DosCreateMutexSem( szSem, &hMutexSem, 0, TRUE )) != 0 )
 {
 printf( "Server create semaphore error %d\n", rc );
 goto CreateError;
 }
 if(( rc = DosAllocSharedMem( (PVOID *)&pShareMem, szShareMem, SHAREMEM_SIZE,
 PAG_READ | PAG_WRITE | PAG_COMMIT )) != 0 )
 {
 printf( "Server alloc failed %d\n", rc );
 goto AllocError;
 }
 printf( "Server alloced share mem at %p\n", pShareMem );

 memset( pShareMem, '\0', SHAREMEM_SIZE );
 printf( "Server initializing shared memory ...\n" );
 DosSleep( 10000L );
 DosReleaseMutexSem( hMutexSem );
 pint = (INT *)pShareMem;
 for( i = 0; i < 20; i++ )
 {
 (*pint)++;
 printf( "Server changed value to %d\n", *pint );
 DosSleep( 1000L );
 }
 DosCloseMutexSem( hMutexSem );
AllocError:
 DosFreeMem( pShareMem );
CreateError:
 return( rc );
}
[LISTING THREE]

/* semshmem.c - A UnixWare demonstration of a semaphore guarding a shared
 memory block initialization sequence. Run: semshmem -c for the client
 process and semshmem -s for the server. Run them in separate shell
 windows and you can see from the printouts how client waits on semaphore.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <ctype.h>
#include <time.h>
#include <memory.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>

int DoClient( void );
int DoServer( void );

int shmkey; /* Shared memory integer id. */
int shmid; /* Shared memory's instance id. */
char *shmptr; /* Base of shared mem. */
int *pint; /* Both processes use this as integer at base of shared mem. */

int semkey; /* Semaphore integer id. */
int semid; /* Semaphore's instance id. */
struct sembuf semoparray[1]; /* Used for semaphore operations */

#define SHAREMEM_SIZE 100
#define MAXREPS 20
#define SEM_INDEX 0 /* This is our index in the semaphore array */
union semun {
 int val;
 struct semid_ds *buf;
 ushort *array;
 };
union semun us; /* Used for semaphore semop() calls */

/* main - Construct semaphore and shared memory keys and call the client or
 server process based on command line argv[1] */
int main( int argc, char *argv[] )
{
 /* Create the shared memory key */
 if(( shmkey = ftok( argv[0], 'A' )) < 0 )
 {
 printf( "ftok error\n" );
 goto error;
 }
 /* Create the semaphore key */
 if(( semkey = ftok( argv[0], 'S' )) < 0 )
 {
 printf( "ftok error\n" );
 goto error;
 }
 if( strcmp( argv[1], "-c" ) == 0 )
 DoClient();
 else
 DoServer();
error:
exit( 0 );
}
/* DoClient - Attach to the existing shared memory segment, then sit in a loop
 reading the integer at the base of the shared memory. */
int DoClient()
{
int rc = 0;
int i;
 printf( "Waiting for semaphore create ...\n" );
 for( i = 0; i <= 30; i++ )
 {
 if( i == 30 )
 {
 printf( "timed out waiting for semaphore create\n" );
 goto error;
 }
 /* Look up the semaphore created by the server */
 if(( semid = semget( semkey, 0, 0 )) < 0 )
 {
 printf( "waiting for semaphore create ...\n" );
 sleep( 5 );
 }
 else
 {
 printf( "got semid = %d\n", semid );
 sleep( 5 );
 break;
 }
 }
 printf( "Waiting for semaphore to clear ...\n" );
 semoparray[SEM_INDEX].sem_num = SEM_INDEX;
 semoparray[SEM_INDEX].sem_op = -1;
 semoparray[SEM_INDEX].sem_flg = SEM_UNDO;
 if( semop( semid, semoparray, 1 ) < 0 )
 {
 printf( "Can't set semaphore!\n" );
 goto error;
 }
 printf( "Semaphore cleared!\n" );
 /* Find the shared memory segment. */
 if(( shmid = shmget( shmkey, SHAREMEM_SIZE, IPC_ALLOC )) < 0 )
 {

 printf( "shmget error\n" );
 goto error;
 }
 /* Get a pointer to the shared memory block */
 if(( shmptr = shmat( shmid, 0, 0 )) == (void *)-1)
 {
 printf( "shmat error\n" );
 goto error;
 }
 pint = (int *)shmptr;
 /* Sit in a loop reading the integer */
 for( i = 0; i < MAXREPS; i++ )
 {
 printf( "Int is now %d.\n", *pint );
 sleep( 1 );
 }
 shmdt( shmptr );
error:
 return( rc );
}
/* DoServer - Create a shared memory block, then sit in a loop changing the
 integer value at base of shared memory block. Presumably, client process
 is reading this value. Compare the printfs the two processes do. */
int DoServer()
{
int rc = 0;
int i;
struct shmid_ds dummy;
 printf( "Creating semaphore\n" );
 /* Remove the semaphore if it already exists */
 if(( semid = semget( semkey, 1, IPC_ALLOC )) >= 0 )
 {
 printf( "Already existed, removing!\n" );
 semctl( semid, IPC_RMID, 0 );
 }
 /* Create the semaphore */
 if(( semid = semget( semkey, 1, IPC_CREAT | 0666 )) < 0 )
 {
 printf( "semget error\n" );
 goto error;
 }
 /* Set the semaphore to available. */
 printf( "Setting semaphore\n" );
 us.val = 1;
 if( semctl( semid, SEM_INDEX, SETVAL, us ) < 0 )
 {
 printf( "semctl setval error %d\n", errno );
 goto error;
 }
 /* Now claim the semaphore by setting it to 0 through semop. */
 semoparray[SEM_INDEX].sem_num = SEM_INDEX;
 semoparray[SEM_INDEX].sem_op = -1;
 semoparray[SEM_INDEX].sem_flg = SEM_UNDO;
 if( semop( semid, semoparray, 1 ) < 0 )
 {
 printf( "Can't set semaphore!\n" );
 goto error;
 }
 printf( "Semaphore is now set.\n" );

 /* Remove the shared memory segment if it already exists. */
 if(( shmid = shmget( shmkey, SHAREMEM_SIZE, IPC_ALLOC )) >= 0 )
 {
 shmctl( shmid, IPC_RMID, &dummy );
 printf( "Already existed!\n" );
 }
 /* Create the shared memory segment, and give everyone access to it. */
 if(( shmid = shmget( shmkey, SHAREMEM_SIZE, IPC_CREAT | 0666 )) < 0 )
 {
 printf( "shmget error\n" );
 goto error;
 }
 /* Get a pointer to the shared memory block */
 if(( shmptr = shmat( shmid, 0, 0 )) == (void *)-1)
 {
 printf( "shmat error\n" );
 goto error;
 }
 pint = (int *)shmptr;
 /* Fill the block with a non-zero value */
 memset( shmptr, 6, SHAREMEM_SIZE );
 /* Sleep for 20 seconds so we can visually see client blocking waiting
 on the semaphore, and both waking up when sem is cleared. */
 sleep( 20 );
 /* Clear the semaphore so the client can start reading the block */
 semoparray[SEM_INDEX].sem_num = SEM_INDEX;
 semoparray[SEM_INDEX].sem_op = 1;
 semoparray[SEM_INDEX].sem_flg = SEM_UNDO;
 if( semop( semid, semoparray, 1 ) < 0 )
 {
 printf( "Can't set semaphore!\n" );
 goto error;
 }
 printf( "Semaphore is now clear.\n" );
 /* Sit in a loop sleeping, then incrementing the integer at the
 base of the shared memory block */
 for( i = 0; i < MAXREPS; i++ )
 {
 (*pint)++;
 printf( "Int changed to %d\n", *pint );
 sleep( 1 );
 }
 sleep( 5 );
 shmdt( shmptr );
 /* Destroy the shared memory block */
 if( shmctl( shmid, IPC_RMID, 0 ) < 0 )
 printf( "shmctl(RMID) error\n" );
 semctl( semid, IPC_RMID, 0 );
error:
 return( rc );
}











May, 1994
Porting D-Flat++ to OS/2


Borland C++ for OS/2 eases the burden




Jon Wright


Jon is currently involved with the development of the "Guidelines" Visual C++
Development tool for OS/2 Presentation Manager. He can be contacted on
CompuServe at 71333,2641.


Borland's entry into the OS/2 world, Borland C++ for OS/2, has opened the door
for many DOS developers who have grown used to Borland-specific library
extensions. Of course, ANSI specifies a standard library of functions for C
and C++, but these functions aren't always adequate for real-world
applications, and developers end up supplementing them with extensions. With
Borland C++ for OS/2, familiar extensions are available to OS/2 developers.
In this article, I'll describe the toolset Borland has provided for OS/2 and
my experiences porting Al Stevens's D-Flat++ (DF++) library to OS/2. Regular
readers of DDJ are familiar with this C++ class library which allows you to
build a CUA '89-compliant user interface under DOS. (DF++ was first discussed
in Al's "C Programming" column in June 1992.) Because the DF++ listings are
lengthy, the entire DF++ for OS/2 project is available electronically; see
"Availability" on page 3. This article covers Version 1.x of the tools.


Touring the Tools


A complete installation of Borland C++ for OS/2 consumes roughly 27 Mbytes of
hard-disk space. Included in the toolset are all the tools you need for C++
development under OS/2. Browsing the directory structure reveals few surprises
to veteran Borland C++ programmers. However, when you run the
integrated-development environment (IDE) or debugger, you can see that much
has been rewritten. These tools take advantage of OS/2 and Presentation
Manager (PM) features, while providing interfaces that seem much smoother.
Borland C++ for OS/2 includes a set of header files to access the OS/2 APIs
and tools to manipulate and create OS/2-specific files. While IBM has
traditionally offered these as a separate toolkit, Borland bundles them with a
few modifications. Most of the sample code from the IBM toolkit is also
included, although a couple of the larger examples are missing.
A PM version of the Resource Workshop, for instance, combines the functions
provided by the separate Font, Icon, and Dialog editors in the IBM toolkit.
Borland also provides a 32-bit version of the TASM assembler, which I believe
is the only native OS/2 2.x assembler available. (Microsoft removed OS/2
support from MASM at Version 6.1.)


Porting D-Flat++


Porting DF++ to OS/2 was fairly painless. Most of the changes I made are
confined to half a dozen classes--Cursor, Keyboard, Mouse, Screen, Speaker,
and Timer--that perform platform-dependent access to the hardware. By
encapsulating the system interfaces and restricting the dependent code to
these modules, Al made the library far easier to port than I had expected. The
DOS versions of the affected classes mostly use the int86() and int86x() calls
to perform low-level system access, along with inp() and outp() for port I/O.
Because OS/2 is a protected-mode operating system, these calls are not
permitted. Hardware access requires either a device driver or a 16-bit segment
with I/O privilege level. In addition, the BIOS is real-mode code which is not
valid in protected mode. In place of these calls, OS/2 provides an API to
access system resources. Table 1 is a list of the mappings between the DOS
interrupt calls and the OS/2 API.
Listings One and Two (page 110) present the Cursor class. The class definition
was modified to include a VIOCURSORINFO structure instead of the registers
structure in the original DF++. This is used to hold details of the cursor
type when communicating with the VioGetCurType() and VioSetCurType() APIs. The
other pair of APIs, VioGetCurPos() and VioSetCurPos(), both use 16-bit x and y
values.
With the Screen class (see "Availability," page 3), I chose to simplify the
interface to the class by removing the Page(), isEGA(), isVGA(), isMono(), and
isText() member functions. These were only used within the screen module, so
they weren't really part of the public interface. I also removed the address,
page, and mode data members that these functions used. I replaced them with a
VIOMODEINFO structure that returns the screen dimensions using a VioGetMode()
call. The Keyboard class, also available electronically, maps OS/2's KbdXxx
functions to the DOS int 16h calls used in the original DF++; see Table 1. In
the constructor for the Keyboard instance, it's necessary to open the keyboard
as a device, then grab the focus. The system state of the keyboard is then set
to binary (RAW) mode, and shift state reading is turned on.
In the Speaker class (Listings Three and Four, page 110), the Beep method no
longer needs to do its own timing or write directly to the speaker hardware;
the function maps well onto DosBeep(); see Table 1.
The Timer class (Listing Five, page 110, and Listing Six, page 111) was
rewritten to use OS/2 system timers, as opposed to the original approach,
which involves the hooking of an interrupt to periodically update a
static-timer array. Borland already has a timer.hpp, so I changed the name of
the files to dtimer.hpp and dtimer.cpp, respectively. (I've found that relying
on path position to distinguish between identically named files can lead to
confusion.) High-resolution timing is not required, so the OS/2
DosAsyncTimer() and DosStopTimer pair is adequate. When the timer expires, an
Event semaphore is posted, which may be queried at any time to discover if the
timer is running. Note that the APIs used in this module are OS/2 2.x specific
and thus will not work under 16-bit versions of OS/2. This wasn't an issue for
me (Borland C++ for OS/2 will only generate 32-bit code), but alternative
schemes could be devised.
Mouse handling was the trickiest to understand and implement correctly. It
took me some time to appreciate that modal windows are constructed within the
Mouse::DispatchEvent call. With D-Flat++, a modal window, such as a message
box, suspends processing of the mouse event that invoked it and starts its own
version of the event-processing loop until it terminates. Processing of the
original event then resumes. Thus, the dispatcher must be reentrant. Failing
to appreciate this, I coded the mouse-event dispatcher as a state machine,
which caused several hours of head scratching. The moral is to beware of
assumptions! I modified my mouse handler, setting the next state before
dispatching the event, but a better approach could probably be found.
Drawing a state diagram showed that only one timer was required, rather than
the two in the original implementation. Therefore, I removed the Moved(),
LeftButton(), and ButtonReleased() member functions. These were only used
internally to the module and were not necessary for the ported module. Data
members were added to hold the timing constants that determine whether two
adjacent clicks are actually a double click and the typematic behavior. A
dialog could be written to manipulate these to accommodate user preferences.


Program Notes


When writing the mouse interface, I ran across an inconsistency between
Presentation Manager and CUA guidelines. According to CUA '91 (the version PM
follows), double-clicking is defined thus: "press and release a button on a
pointing device twice while a pointer is within the limits that the user has
specified for the operating environment." PM itself doesn't wait for the
second release; it acts on the preceding click. D-Flat++ follows CUA '89 (the
version of the standard used in Windows), so that's how I implemented it, too.
In practice, the difference is not really noticeable.
Under OS/2 text mode, performance can suffer unless text is written to the
screen in longer chunks. As it stands, DF++ usually writes a character at a
time, so this could be improved. On a development machine, performance is
acceptable with the test program. But on a slower machine, the tracking
rectangle tended to wallow about, trying to follow the mouse. With the text
editor (TED) in the most recent DF++, clearing the screen at start-up is slow
and could stand some redesign.
An interesting side effect of porting DOS code to OS/2 is the discovery of
latent defects. By running the application under the TD-GX debugger, you can
see that each time a dereference of a null pointer (for instance) is
attempted, TD-GX handles the exception and puts you back in control, opening
windows to show the source line where the exception occurred, the disassembled
instruction, the call stack, and other information. This caught a couple of
problems which had made it into the original code as posted (and which appear
to run fine under DOS). This may make the case for OS/2 as a first platform
for development, with a later port to DOS when the code runs smoothly.


Plus and Minus


Version 1.x of the Borland OS/2 compiler revealed a number of shortcomings
(although Version 2.0, which addresses many of these problems, may be
available as you read this). In particular:
TLINK's lack of support for Thread Fixup records (which causes problems when
linking with code generated by IBM's C Set ++ compiler).
A limit of 16 Mbytes for static arrays.
A limit of 68 threads allowed per process.
A limit of 40 file handles (in the run-time code).
Improper thunking handling (16- to 32-bit conversion).
The thread-fixups problem was corrected in a maintenance release, but I did
not have time to retest the other areas.
Overall, the compiler was speedy when compared with IBM tools. However, this
speed comes at a price--optimization is not as good as with IBM C Set ++ or
Watcom C++ 9.5, either of which would be preferable for a final compile of
production code on new projects (see the accompanying text box entitled,
"IBM's C Set ++").

The Borland debugger does exploit OS/2's capabilities. It debugs both
character-mode and PM programs, though a few major flaws make some PM apps
impossible to debug successfully. One flaw is that it starts debuggee programs
as children and doesn't allow you to bring them to the foreground from within
the tool. The child can easily be completely hidden behind the debugger
windows, requiring use of the system task list to bring it to the foreground.
Borland's IDE from DOS was carried on to OS/2 and rewritten to take advantage
of the Presentation Manager and CUA '91 controls. Settings for particular
projects are adjusted using "Notebooks," a form of multipage dialog. Help is
comprehensive, and context-sensitive help (using Ctrl+F1) for the OS/2 APIs
was added with the maintenance release. To achieve this, Borland has included
.HLP versions of the .INF files, which increases the amount of disk space
required considerably. This is because OS/2 treats INF and HLP files
differently, although internally, the difference is one byte. Color
highlighting of syntax works well in the editor, a big plus being the easy
detection of missing ends to comments which can result in large tracts of code
being effectively excluded from the compile.
I tend to work on larger multidirectory projects and have set up a build
system with a third-party make tool to coordinate all sources. The IDE is
based around a single project/source tree, with no provision for version
control, which I found a bit restricting. This, coupled with my enthusiasm for
a different editor, led me to do most of my work by integrating the
command-line version of the Borland tools into my existing system.
Although the main project was to get D-Flat++ working under OS/2, I thought
I'd also try out the Presentation Manager with a small piece of code. Using
the Hexadecimal calculator in Charles Petzold's Programming the OS/2
Presentation Manager (Microsoft Press, 1989), I wrote a 32-bit C++ version of
the original 16-bit C program, using the Borland-supplied samples as a
template. The calculator program came together very easily in one evening. No
class libraries were needed, which was fortunate, as Borland's Object Windows
Library (OWL) was not a part of the first release for the OS/2 platform. For
this example, I used the Resource Workshop for creating the dialog resources.


Conclusion


As it stands, Borland's OS/2 C++ tools are good. My main concern with using
them on a large project would be Borland's support record. Where a large body
of legacy code needs to be ported from a DOS environment, the availability of
the Borland extensions makes the task easier.
Borland C++ for OS/2 is easy to set up and use, fast, and a good starting
point for OS/2 programming, especially for programmers coming from DOS and
Windows. Its Resource Workshop and Assembler components in particular
guarantee it a place in my toolbox.
Table 1: Mappings between DOS interrupt calls and the OS/2 API.

 DOS Interrupt          OS/2 Function

 Int 10, Func 01h       VioSetCurType()
 Int 10, Func 02h       VioSetCurPos()
 Int 10, Func 03h       VioGetCurPos()
                        VioGetCurSize()
 Int 10, Func 06h       VioScrollUp()
 Int 10, Func 07h       VioScrollDn()
 Int 10, Func 12h       VioGetMode()
 Int 10, Func 15h       VioGetMode()
 Int 10, Func 1Ah       VioGetMode()
 peek()                 VioReadCellStr()
 poke(), movedata()     VioWrtCellStr()
 Int 16, Func 00h       KbdCharIn()
 Int 16, Func 01h       KbdPeek()
 Int 16, Func 02h       KbdGetStatus()


IBM compiler technology is just as mature and possesses all the functionality
of Borland C++, but it takes a different philosophical approach in its
fundamental design. While the Borland compiler provides a powerful set of
vendor-specific extensions, the C Set ++ (C and C++) compiler is built with an
emphasis on correct, optimized code and provides a high degree of ANSI, ARM,
and DWP compliance. For instance, it is able to globally optimize programs
across source files, and provides templates and exception handling.
However, such capabilities come at a price. The compiler expects to run on a
machine with a typical configuration of 80486 with 16 Mbytes of RAM and 100
Mbytes free on a fast hard drive (65 Mbytes of files and 30 Mbytes swap
space). C Set ++ will run on half the memory and disk space but with a penalty
in performance and available features. By contrast, Borland's compiler will
run twice as fast in 8 Mbytes of RAM.
IBM provides tools and class libraries--such as a Browser and Profiler--that
Borland does not. IBM's class libraries use exception handling and templates
throughout, and range from basic collection classes to an extensive high-level
encapsulation of the Presentation Manager API.
Also significant is the issue of support. Borland accumulates six to twelve
months worth of fixes, then does a major-upgrade release. IBM releases fixes
frequently, with a program to install them. These fixes are available
electronically, or mailed free if you report one of the bugs that it fixes.
Both Borland and IBM support teams on CompuServe, where much of the support
effort is concentrated. Borland uses dedicated support engineers, while IBM
support includes product developers and managers.
Borland's audience will mostly be developers moving to OS/2 from the
DOS/Windows world. They are familiar with the interface and tools. At first
glance, IBM seems pitched towards larger development projects and the
professional-compiler market. However, C Set ++ pricing puts it firmly in the
same league as Borland, and with decreasing hardware prices, Borland will need
to run fast to keep up.
--J.W.


IBM's C Set ++



Borland C++ for OS/2
Borland
1800 Green Hills Road
Scotts Valley, CA 95066

[LISTING ONE] (Text begins on page 86.)
// ------------- cursor.h -- modified for OS/2 operation - jw21sep93
#ifndef CURSOR_H
#define CURSOR_H

#define INCL_BASE
#define INCL_NOPMAPI
#include <os2.h>

const int MAXSAVES = 50; // depth of cursor save/restore stack
class Cursor
 {

 VIOCURSORINFO ci;
 // --- cursor save/restore stack
 int cursorpos[MAXSAVES];
 int cursorshape[MAXSAVES];
 int cs; // count of cursor saves in effect
 void GetCursor();
public:
 Cursor();
 ~Cursor();
 void SetPosition(int x, int y);
 void GetPosition(int &x, int &y);
 void SetType(unsigned t);
 void NormalCursor() { SetType(0x0607); }
 void BoxCursor() { SetType(0x0107); }
 void Hide();
 void Show();
 void Save();
 void Restore();
 void SwapStack();
 };
inline void swap(int& a, int& b)
 {
 int x = a;
 a = b;
 b = x;
 }
#endif

[LISTING TWO]

// ------------ cursor.cpp -- modified for OS/2 operation - jw13nov93

#include <dos.h>
#include "cursor.h"
#include "desktop.h"

Cursor::Cursor()
 {
 VioGetCurType(&ci, 0);
 cs = 0;
 Save();
 }
Cursor::~Cursor()
 {
 Restore();
 }
// ------ get cursor shape and position
void Cursor::GetCursor() { }
// -------- get the current cursor position
void Cursor::GetPosition(int &x, int &y)
 {
 USHORT sx, sy;
 VioGetCurPos(&sy, &sx, 0);
 x = sx;
 y = sy;
 }
// ------ position the cursor
void Cursor::SetPosition(int x, int y)
 {

 VioSetCurPos((USHORT)y, (USHORT)x, 0);
 }
// ------ save the current cursor configuration
void Cursor::Save()
 {
 USHORT x, y;
 if (cs < MAXSAVES)
 {
 VioGetCurPos(&y, &x, 0);
 cursorpos[cs] = (x<<8) | y;
 VioGetCurType(&ci, 0);
 cursorshape[cs] = (ci.yStart << 8) | ci.cEnd;
 cs++;
 }
 }
// ---- restore the saved cursor configuration
void Cursor::Restore()
 {
 if (cs)
 {
 --cs;
 USHORT row = (USHORT)(cursorpos[cs] & 0xff);
 USHORT col = (USHORT)((cursorpos[cs] >> 8) & 0xff);
 VioSetCurPos(row, col, 0);
 SetType(cursorshape[cs]);
 }
 }
/* ---- set the cursor type ---- */
void Cursor::SetType(unsigned t)
 {
 ci.yStart = (USHORT)((t >> 8) & 0xff);
 ci.cEnd = (USHORT)(t & 0xff);
 VioSetCurType(&ci, 0);
 }
/* ----- swap the cursor stack ------- */
void Cursor::SwapStack()
 {
 if (cs > 1)
 {
 swap(cursorpos[cs-2], cursorpos[cs-1]);
 swap(cursorshape[cs-2], cursorshape[cs-1]);
 }
 }
/* ------ hide the cursor ------ */
void Cursor::Hide()
 {
 USHORT t;
 t = ci.attr;
 ci.attr = 0xffff;
 VioSetCurType(&ci, 0);
 ci.attr = t;
 }
/* ------ show the cursor ------ */
void Cursor::Show()
 {
 VioSetCurType(&ci, 0);
 }

[LISTING THREE]


// --------- speaker.h
#ifndef SPEAKER_H
#define SPEAKER_H

#define INCL_BASE
#include <os2.h>
class Speaker
 {
public:
 void Beep();
 };
#endif

[LISTING FOUR]

// -------- speaker.cpp -- modified for OS/2 operation - jw21sep93

#include "speaker.h"
#include "dflatdef.h"
void Speaker::Beep()
 {
 DosBeep(1000, 250);
 }

[LISTING FIVE]

// ----------- dtimer.h -- modified for OS/2 operation - jw21sep93
// uses OS/2 Timer services rather than doing own timer management.
// renamed to avoid collision with Borlands Timer.h

#ifndef TIMER_H
#define TIMER_H

#define INCL_BASE
#define INCL_DOSDATETIME
#define INCL_NOPMAPI
#include <os2.h>

#include "dflatdef.h"
class Timer
 {
 HEV sem;
 HTIMER timer;
 APIRET rc;
 Bool disabled;
public:
 Timer();
 ~Timer();
 Bool TimedOut();
 void SetTimer(int ticks);
 void DisableTimer();
 Bool TimerRunning();
 void Countdown() { ; }
 Bool TimerDisabled() { return disabled; }
 };
#endif

[LISTING SIX]


// -------- dtimer.cpp -- OS/2 Version created to use the OS/2 Timer Services
// jw21sep93 -- renamed from 'timer.cpp' to match header

#include <stdio.h>
#include "dtimer.h"

Timer::Timer()
 {
 // create semaphore
 ULONG SemAttr = DC_SEM_SHARED; // needs to be shared
 disabled = True; // start in disabled state
 timer = 0;
 rc = DosCreateEventSem(NULL, &sem, SemAttr, 1);
 if(rc != NO_ERROR)
 {
 printf("DosCreateEventSem failed: rc = %d\n",rc);
 }
 }
Timer::~Timer()
 {
 rc = DosStopTimer(timer);
 rc = DosCloseEventSem(sem);
 }
void Timer::SetTimer(int ticks)
 {
 ULONG ct;
 disabled = False;
 if (timer)
 {
 rc = DosStopTimer(timer);
 }
 rc = DosResetEventSem(sem, &ct);
 if(rc != NO_ERROR && rc != ERROR_ALREADY_RESET)
 {
 printf("DosResetEventSem failed: rc = %d\n",rc);
 }
 rc = DosAsyncTimer(ticks*18, (HSEM)sem, &timer);
 if(rc != NO_ERROR)
 {
 printf("DosAsyncTimer failed: rc = %d\n",rc);
 }
 }
void Timer::DisableTimer()
 {
 disabled = True;
 }
Bool Timer::TimedOut()
 {
 ULONG ct = 0L;
 if (disabled == False)
 {
 rc = DosQueryEventSem(sem, &ct);
 if(rc != NO_ERROR)
 {
 printf("DosQueryEventSem failed: rc = %d\n",rc);
 }
 }
 return (Bool) (ct != 0L);

 }
Bool Timer::TimerRunning()
 {
 ULONG ct;
 if (disabled == False)
 {
 rc = DosQueryEventSem(sem, &ct);
 if(rc != NO_ERROR)
 {
 printf("DosQueryEventSem failed: rc = %d\n",rc);
 }
 return (Bool) (ct == 0L);
 }
 return(False);
 }
End Listings














































May, 1994
A Multicolumn List-Box Container for OS/2


Visual programming with GpfRexx




Brian Proffit


Brian was part of IBM's OS/2 development team. He is president of Visionary
Research, the author of OS/2 Application Development Tools and OS/2 Inside &
Out, and a contributing editor to OS/2 Magazine and OS/2 Developer. You can
contact Brian on CompuServe at 75300,1466.


The basic list box has become a staple of graphical user interfaces because it
provides a simple way of presenting scrollable lists of related information.
Using a list-box control is much easier than writing code to paint the
information in a panel, drawing the scroll bar, detecting user actions on the
scroll bar, and repainting the list text accordingly. Unfortunately, standard
list boxes are vector-based, with only one-dimensional information allowed.
Developers have looked for ways to take advantage of the coding ease of a list
box, while being able to display two-dimensional information in a more tabular
format. In short, they want a list box that can display multiple columns. Yet
multicolumn list-box controls aren't routinely provided for in operating
systems such as Windows and OS/2. In this article, I'll develop a multicolumn
list-box control in a Workplace Shell container using the multiple-record list
structure in Gpf Systems' GpfRexx.
Gpf is a C/C++ tool for creating GUIs for OS/2 Presentation Manager (PM)
applications. REXX (REstructured eXtended eXecutor) is a high-level procedural
language provided in OS/2 that uses English-like commands to implement program
logic. For its part, GpfRexx is a visual programming environment that combines
the power of REXX with the point-and-click simplicity of Gpf for rapidly
designing, testing, and generating OS/2 PM GUI programs.
In addition to the full CUA '91 control set, GpfRexx also supports both
standard OS/2 features (drag-and-drop, multithreading, and multitasking) and
advanced ones (PM multimedia and DB2/2 SQL). More importantly, GpfRexx allows
you to incorporate user-designed controls and custom bitmaps into programs,
and to control display attributes such as colors, fonts, object placements,
and the like--all without extensive knowledge of PM programming. For testing,
GpfRexx (which generates royalty-free apps) provides its own multithreaded
debugger.
While an interpretive environment like GpfRexx buys you rapid development (not
to mention a lower learning curve than you face with C or C++), you may take a
performance hit as your application grows. Luckily, for programs that must
maximize performance, you can import applications created using
GpfRexx into regular Gpf and generate native C or C++ code for OS/2 or
Windows--without losing any of your original interface-design work.
I'll first use GpfRexx to create a multicolumn list-box control, then generate
a C implementation of the control that you can later optimize. (Executable
versions, data definitions, and related files for the control are available
electronically; see "Availability," page 3.)


Creating a New Program


As with most visual-programming tools, the process of designing and creating
new programs or utilities (such as the multicolumn list box) involves "filling
out" multiple screens and selecting predefined functions. Figure 1 is the
first GpfRexx screen you encounter. The Application Properties dialog box
requires you to name the application, as well as define text that appears in
the Task List describing the application. You also check radio buttons to
determine if the resulting application is a stand-alone .EXE file or a smaller
executable file that calls the separate GpfRexx DLLs.
Then you begin laying out the overall UI. As you would expect, GpfRexx
generates a canvas on which to paint your interface. Standard default File and
Help menus, which can be modified or removed, are included. The size and
position of the window determines the starting size and position of your
application's window. Double-clicking on the title bar brings up the Window
Style dialog box that lets you change the text in the title bar and set user
options such as sizing, minimizing/maximizing, and menus; see Figure 2.
Changing the presentation parameters modifies the color and fonts used for
objects in this window. The icon that represents this application is set here,
and any entries you would like to add to the application's pop-up menu are
defined as well.
Next, you select Create/Container to position and define the characteristics
of the container object and actions you can perform on its contents from a
Container Control dialog box. After closing this dialog, click on the
container to select it, then size it to fill the main window.


Designing the Container


List-box entries are loaded as a series of records in a compound variable,
often called a "stem." For the purposes of example, I'll fill the stem from
prgrmrs.dat, a test file that contains a list of programmers, their current
projects, and their percentages ahead of or behind schedule. The test file is
a standard, comma-separated value (CSV) file, with records laid out as
follows: PROGRAMMER NAME, PROJECT TITLE, SCHEDULE PERCENTAGE (positive values
indicate ahead of schedule).
To make the list box more visually indicative, I'll use different icons to
identify those far behind (<= -10 percent), roughly on (-10 percent < x < 10
percent), or far ahead (>= 10 percent) of schedule. You must define these icons
within GpfRexx before they can be loaded into the stem. From the Create/Icon
Object menu, select the filenames of the icons you want to add, and give them
meaningful names for use in your program.
The key to this application is the use of a multiple-record list structure
designed into GpfRexx as a predefined stem for use in a container. The
necessary fields in the stem are shown in Figure 3. The container behaves much
like the Drives object on the Workplace Shell desktop, with predefined
Details, Icon, and Tree views, which present the information in three
different ways. The Details view shows the information defined in the _Area
fields. Only the _Icon and _Title fields are displayed in the Icon and Tree
views.
At this point, you need to create a user function to read the programmer file
and load the list-box container. GpfRexx includes standard File Open and File
Save As dialogs, as well as default functions to be executed when the
application is initialized and when it is exited. Select Create/User Function
Object to see the User Function Edit dialog box. This dialog displays a list
of the primary objects already defined for the application, the methods
available to act upon those objects, and the code associated with those
methods. This simplifies copying existing code into your new function. In this
case, however, I'll be creating all new code, so click on Zoom (so the entire
window is available for our instructions), name the procedure CreateListBox,
then move to the Multiple Line Edit (MLE) control and enter the instructions
in Example 1.
Once the list-box stem is loaded, you need to load the field-information stem,
again using the User Function Edit dialog box. Some tedious but uncomplicated
coding is required to lay out the columns in the list box. I have introduced a
second function (DefineFields) to compartmentalize this code for easier
reading of the source, debugging, and maintenance. Figure 4 is the general
layout of the field-information stem. Simply click on "add" in the User
Function dialog box to create DefineFields with the code in Example 2.


Running the Program


That's essentially all that's required to build the multicolumn list-box
control. However, you need to tell GpfRexx when you want CreateListBox to be
executed. The most logical time would be when the main window is initialized.
Returning to Figure 2, select the Action button to see the Action On dialog
box, which lists the events that can take place on the selected object--in
this case, the main window. You add CreateListBox to the window-initialization
processing by simply pointing-and-clicking.
The last step is creating the executable program. Referring again to Figure 1,
you select the Toolkit menu to have GpfRexx build a complete .EXE file ready
to be run. Upon execution, the program reads the sample data file and displays
the multicolumn list box in Figure 5.
Of course, the tabular view in Figure 5 isn't the only way to look at
information when using containers. GpfRexx lets you, for instance, utilize the
container's Icon View for a quick glance at data; see Figure 6. In this case,
programmers more than 10 percent behind schedule are represented by a frown,
those 10 percent or more ahead of schedule get a smiley face, and everyone
else is in-between. The radio buttons I've added at the bottom of the display
let you toggle between the two views using built-in GpfRexx functions.


REXX to C/C++ Conversion


Clearly, GpfRexx is a powerful tool for quickly building graphical programs.
You can put together simple applications or prototypes in a fraction of the
time needed in conventional development processes. But if your program grows
so large that the speed disadvantage of an interpreted language such as REXX
becomes a problem, you can import your GpfRexx application into the Gpf C/C++
toolkit and generate a C or C++ version. In other words, you can read an
application-interface file created by GpfRexx directly into Gpf to move REXX
programs to C.
All of the user-interface design work in the GpfRexx-created multicolumn
list-box control moved over to Gpf without a hitch. The only necessary change
was to rewrite the user functions in C. All of the hierarchy and navigation
code translated automatically. The result is a Gpf 2.1 ORC file that is the
same as the GpfRexx ORC, except that the REXX user-function objects are
replaced with C user-function objects. C source for these user functions, a
makefile to build them, and a user header that should be present when generating
the application (for automatic inclusion in the generated source) are
available electronically; see "Availability," page 3.
Translating the REXX is fairly obvious. GpfRexx provides functions for
accessing PM objects and parameters. Most of these functions map directly onto
PM APIs with only the type and number of parameters changing. This is because
the calls have been simplified to insulate REXX programmers from the details
of PM. In some cases, more work is required; for instance, GpfRexx provides
functions to load list boxes or containers with an array or stem. This is not
supported in C, so you must create a loop to sequentially insert or append
items to the data in the control. The most difficult change in multicolumn
list-box design involves loading data to a container control. In the user
function that loads the list, you can see that not only is a loop required to
load the data, but, since it is a container control, memory has to be allocated
to hold the data. Unlike a list box, a container holds pointers rather than
strings.
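The loop-plus-allocation pattern can be sketched in portable C. This is not the actual PM code: under Presentation Manager you would send CM_ALLOCRECORD and CM_INSERTRECORD messages for each item. The struct and function names below are hypothetical, chosen only to show why each record must be allocated and why the container ends up holding pointers.

```c
/* General pattern only; stand-ins for PM's per-record allocation. */
#include <stdlib.h>
#include <string.h>

typedef struct {            /* stand-in for a RECORDCORE-style record */
    char *title;
} Record;

typedef struct {
    Record **items;         /* the container holds pointers, not strings */
    size_t   count;
} Container;

/* REXX loaded the whole stem in one call; C needs a loop, allocating
 * a record for each item and storing its pointer in the container. */
void load_container(Container *c, const char **titles, size_t n)
{
    c->items = malloc(n * sizeof *c->items);
    c->count = n;
    for (size_t i = 0; i < n; i++) {
        Record *r = malloc(sizeof *r);
        r->title = malloc(strlen(titles[i]) + 1);
        strcpy(r->title, titles[i]);
        c->items[i] = r;
    }
}
```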
As with GpfRexx, the process is basically point-and-click once you load Gpf
and open the GpfRexx-generated file (PROGRMRS.ORC, in this case). Once you
specify a C or C++ compiler, you'll see the same main window you created with
GpfRexx--including the icons.

Figure 1 GpfRexx automatically creates a blank window into which you can place
graphical objects.

Figure 2 The Window Styles dialog box allows you to modify the appearance of
the window and its contents.
Figure 3: The fields in the predefined stem for use in a container.
Record.0 Number of records in list

For each record 1 through n:

Record.n._Attributes (e.g., COLLAPSED, DROPONABLE, etc.)
Record.n._Title User-defined title text
Record.n._Icon Icon name
Record.n._Data The user data record
Record.n._Area.0 Number of fields (x) in this record
Record.n._Area.1
 .
 . Contents of each field
 .
Record.n._Area.x


Figure 4: General layout of the field-information stem.
FieldInfo.0 Number of fields

 For each field 1 through n:

FieldInfo.n._Title Title text for this field
FieldInfo.n._Area A number (x) corresponding to this field's place,
 Record.n._Area.x, in the multiple-record
 list stem
FieldInfo.n._DataStyle Justification of data and column separator switch
FieldInfo.n._TitleStyle Justification of title text

Figure 5 The multicolumn list box.
Figure 6 Utilizing the container's Icon View.
Example 1: Code needed to create a user function to read the programmer data
file and load the list-box container.
/* Let user know this may take a second by changing cursor to an hourglass. */
SetPointer('SPTR_WAIT')
ProgrammerFile = 'prgrmrs.dat'

/* Initialize the record count in the first record */
ProgRecord.0 = 0

DO WHILE LINES(ProgrammerFile) /* As long as there's more, */
 ProgData = LINEIN(ProgrammerFile) /* read a record */

 /* Break the record into its component fields */
 Parse Var ProgData ProgName ',' Project ',' SchedPercent
 ProgRecord.0 = ProgRecord.0 + 1 /* increment counter */
 n = ProgRecord.0 /* current record number */
 /* Set appropriate icon */
 SELECT
 WHEN SchedPercent <= -10 THEN ProgRecord.n._Icon = 'FaceFrown'
 WHEN SchedPercent >= 10 THEN ProgRecord.n._Icon = 'FaceSmile'
 OTHERWISE ProgRecord.n._Icon = 'FaceOkay'
 END /* SELECT */

 /* Now load the data into the stem */
 ProgRecord.n._Title = ProgName /* Show name in all views */
 ProgRecord.n._Data = ProgData
 ProgRecord.n._Area.0 = 3
 ProgRecord.n._Area.1 = ProgName

 ProgRecord.n._Area.2 = Project
 ProgRecord.n._Area.3 = SchedPercent
 END /* DO WHILE LINES(ProgrammerFile) */
rc = STREAM(ProgrammerFile,'C','CLOSE') /* close file */
/* Define the layout of the columns of the listbox */
Perform('DefineFields')
/* Define the initial view in the container (just like the */
/* DRIVES object on the desktop has Detail, Icon, Tree) */
SetCnrView('ContProgrammers','DETAILS')
/* Finally, load the ProgRecord stem into the container */
Parent = InsertCnrRecordList('ContProgrammers','ProgRecord.')
/* Note that the pointer is automatically reset */


Example 2: Loading the field-information stem.
ProgField.0 = 3 /* Set field counter */
DO Field = 1 TO 3
 ProgField.Field._Area = Field /* Match fields to list areas */
 ProgField.Field._TitleStyle = "CENTER" /* Center titles */
 END /* DO Field */
/* Set parameters for Programmer Name */
ProgField.1._Title = 'Programmer'
ProgField.1._DataStyle = 'SEPARATOR' /* Default left justify */
 /* and specify vertical bar on right */
/* Set parameters for Project Title */
ProgField.2._Title = 'Project'
ProgField.2._DataStyle = 'SEPARATOR'

/* Set parameters for Schedule Percentage */
ProgField.3._Title = '% ahead of sched.'
ProgField.3._DataStyle = 'CENTER SEPARATOR' /* Center column */
SetCnrFieldInfo('ContProgrammers','ProgField.')

For More Information

GpfRexx
Gpf Systems Inc.
30 Falls Road
Moodus, CT 06469-0414
800-831-0017
$247.50





















May, 1994
PROGRAMMING PARADIGMS


OS Follies




Michael Swaine


IBM has been reevaluating its operating-system strategy. Good move, IBM. The
whole industry could stand to reevaluate what it's doing--or not doing--with
respect to operating systems. And some people may feel they could use some
help understanding what's what with PowerPC. I wish I could help.


The Year of the Pundit


At a recent meeting of the Software Entrepreneurs Forum in Silicon Valley,
panelists listed all the reasons why UNIX is prospering--then went on to list
all the reasons why it's barely surviving. That could be metonymical for the
state of uncertainty prevalent in the entire operating-system field. Other
authors have things to say about UNIX standardization, but I see no sign that
1994 will be the year of UNIX, or that UNIX is suddenly going to start moving
apps. Maybe it'll be the year of OS/2, yuk yuk.
It's certainly not the year of System 7. Granted, Macintosh System 7 supports
a lot of applications. Apple sold more personal computers than any other
computer manufacturer last year, according to some estimates, and System 7 is
really Apple's only PC operating system. But the Mac operating system is on
its last legs. Does anyone doubt that Apple could have replaced it by now if
it were not for the fact that replacing your hardware architecture and your
software architecture simultaneously is more than a little crazy?
The replacement waiting in the wings is the Taligent operating system,
code-named "Pink." Apple's future. And IBM has a say in what it contains and
when it gets released. Does this make any sense to you?
The bottom line used to be the beaten path to Bill Gates's door. DOS and
Windows rule the universe, a situation heavy with historical ironies, from the
$25,000 Tim Paterson got for writing DOS, to the carefully engineered
"success" of Windows. But the DOS/Windows picture gets a little cloudy out
beyond next Thursday. The fact that WordPerfect is no longer working on its
DOS version is bad news for DOS as a platform.
So, in theory, is Chicago, aka Windows 4, which is supposed to eliminate the
need for DOS, but won't. Windows, meanwhile, grows another head every time you
look at it. There are three versions of Win32 alone, not to mention the
proliferation of Windows for [Fillintheblank] announcements.
NT is Microsoft's future, but it isn't selling very well. Maybe one of the new
versions (Daytona? Indy? Cairo?) will turn the tide, but 173,000 real copies
sold in 1993 is not impressive. And even Chicago may be a hard sell to IS
managers still trying to complete the transition to Windows 3.1. Meanwhile,
OS/2 is selling hundreds of thousands of copies a month.


Random Rumors from Gloom-and-Doomers


Realizing that I couldn't make sense of these phenomena, I went trolling for
clearly articulated perspectives among some of the operating-system watchers
and builders I know. The perspectives I got were clearly articulated all
right, but...
One long-time observer of the operating-system scene and a professional cynic
predicted slow progress in operating systems from Apple and Microsoft so long
as a majority of their customers are using 68K and x86 machines. We'll see new
operating-system components added to existing operating systems, many of these
components having been borrowed from unreleased next-generation systems
languishing in their R&D labs. But those next-generation operating systems
will continue to be all promise and no delivery until the push of marketing
hype exceeds the drag of installed-base inertia.
One out-of-work former employee of an operating-system company grumbled that
no operating system these days can make it unless it's named Windows or System
7 or MS-DOS. Superior technology is no guarantee of success against an
established standard, he said, especially in a commoditized market, which is
what the PC market has become.
But if computers are commodities, they're getting to be awfully gaudy ones. An
employed app developer complained about how hard it is to keep up with the
galloping featuritis of modern GUIs. "Bare metal is starting to look awfully
good," he said. Yeah, but where is it an option? A few weeks later, I listened
as an Adobe developer explained how he deals with Apple's incessant demands
that Adobe add support for the latest operating-system gimmick to its apps.
"Sure, Apple," he kids them along, "We'll do that right after we implement
publish and subscribe for ATM."
After wading through some more observations along these lines, I decided I'd
better find out what IBM was saying. After all, IBM has been reevaluating its
operating-system strategy.


I Go Right to the Source


IBM's strategy currently seems to be that it will sell customers any operating
system they want, at least among the five (five!) 32-bit operating systems it
intends to support: WorkPlace OS (aka WPOS, formerly portable OS/2), Windows
NT, AIX now and PowerOpen later, Solaris, and Taligent.
The list doesn't include DOS or Windows, though DOS and Windows apps are
supported through OS/2. It also doesn't include the Macintosh operating
system. To run a Mac app on an IBM PowerPC machine, I gather, you'd have to
run Mac Application Services (Apple's unbundled System 7 components) on top of
PowerOpen on top of WorkPlace OS. PowerOpen is the not-yet-released
replacement for IBM's UNIX (AIX) that will also replace Apple's UNIX (A/UX).
IBM's argument for supporting five operating systems is that the user ought to
have a choice. On the other side of this tissue-thin argument is the fact that
a choice of five operating systems could mean a dearth of applications. Sales
of a million machines wouldn't represent a viable base for application
software if the machines were evenly distributed among five operating systems.
Of course, IBM offered three operating systems on its first personal computer
in 1981, and that situation shook itself out quickly. If the analogy holds, we
are in preshakeout, a good time to clean your basement.
Not all of IBM's lines of computers will support all five operating systems,
but the PowerPC machines will. They will be preloaded with the user's choice
among these 32-bit OSs as soon as they are available. IBM has demonstrated all
five running on PowerPC machines.


And Ask the Boss


According to Andy Jawlik, spokesperson for IBM's Power Personal Systems
Division, the reason for these new processors is to support the next
generation of OSs with several desiderata: robustness--that is, crash
protection, security, and preemptive multitasking; distributed processing; and
new human-interface technologies, including speech recognition, handwriting
ditto, and intelligent agents.
Agents? This refers to some intriguing work afoot at IBM. IBM Power Personal
Systems Division will, it is officially reported, provide tools for creating
object-oriented intelligent agents that use the power of the PowerPC to
provide more human-centered interaction with the computer. IBM expects a
cottage industry of agent developers to grow up around this technology, in a
two-guys-in-a-garage model.
There's another mission for these PowerPC machines: Under a recent reorg, the
client mission for RS/6000 products has been moved over to Power Personal
Division. IBM's first PowerPC machines will run RS/6000 apps. PowerPCs have to
coexist with RS/6000 machines. And run any of five operating systems.
Apple's PowerPC story is simpler.


They're Always on a Steady Course


The story is: Macintosh forever. Or until we say otherwise.

"Macintosh" here means the Mac operating system. Apple's Ross Ely assures us
that there will be multiple operating systems on Macintoshes with PowerPC
hardware, but what he means is just this: In year 1, there will be System 7
and A/UX, the latter of which will "migrate" to PowerOpen. PowerOpen will
appear first on Apple's server line.
Oh, and Apple is "looking at Windows NT."


In the Future, Everything Will Be Emulated


Emulation has a bad name. That's why, in the future, it will be called "having
a personality."
Taligent, when IBM and Apple release it, will have personalities for all the
important operating systems, meaning that it can run their apps. NT will have
or does have personalities for DOS, Windows, Win32, OS/2, POSIX, and native NT
apps. WPOS will have personalities for DOS, Windows, Win32, OS/2, and AIX.
Sun's Solaris has a Windows personality, but it's not an engaging personality,
since it is based on WABI, an incomplete Windows emulation.
Then there's the ability of IBM's new PowerPC machines to act like POWER
workstations. Although PowerPC is an IBM-Motorola line, the initial 601 chip
is purely an IBM chip. The 601 is designed to run the software of IBM's POWER
line of RISC computers; it's a bridge chip from POWER to the pure PowerPC
architecture. It's not a matter of emulating the POWER architecture, I gather.
(God no, it's not emulation!) They just included the POWER instruction set.
This would seem to mean that native-mode app developers had better be careful
what instructions they use.
Then there's the 615. In mid-1995, IBM will be in production with the PowerPC
615, which will have onboard x86 emulation. This will also be an IBM product,
not a Motorola one. It is expected to run x86 software as fast as a 66-MHz
Pentium. Of course, it would need something like SoftWindows to emulate the
peripheral hardware, which raises some performance questions.
SoftWindows is real emulation. It's the PowerPC version of Insignia Solutions'
SoftPC, which emulates a PC on a Mac. I say PC rather than x86 CPU because the
product supports PC peripherals, too. Microsoft is using Insignia SoftWindows
technology in Windows NT, for which Insignia gets a source license to Windows
3.1. That makes possible rock-solid compatibility within certain limits and
better performance than you might expect. But both these claims come with very
large caveats, which I'll go into when I talk about the new Power Macintoshes.
Yes, Power Macintosh. That's what the new PowerPC-based Macintoshes are
called. As I write this, that's a big secret. By the time you read it, it'll
either be wrong or old news. I know a lot of such secrets. You see, I got
sneaked. Let me tell you about it.


I Get Sneaked


It's November. I'm a little vague as to just what time it is, but I know it's
way too early for me to be up and about. Here I am, though, in the conference
room of Bandley 88, or is it DeAnza 6-5000, not entirely sure how I got here,
not entirely sure why. Across the room I see jelly donuts and some mystery
pastries with powdered sugar on top. Ten feet to the west of me is a huge
steaming coffee urn, and the breeze is to the east.
Gradually, I become aware of a presence. Between me and sustenance stands a
woman in a suit. She is saying something to me, holding out a pen and a piece
of paper that I slowly come to understand is an Apple nondisclosure agreement.
Savvy and cruel as any inner-city crack dealer, these Apple PR people withhold
the goodies until they get theirs.
Three cups of coffee and a sugar fix later, I know that I'm at a technical
press briefing covering Apple's plans for PowerPC. I smile as I watch several
dozen other computer-industry columnists, technology trackers, and high-tech
reporters stagger in and run the gauntlet. Several of the more technical types
gravitate to my table, attracted perhaps by my alert expression and cheery
greeting. A few cups later--or in the case of the nutritionally correct, a hit
of fructose later--their eyes begin to open, too.
We cover the basics quickly, reaching immediate consensus on which pastries
are to be avoided and which are to be stuffed in the pockets to get us through
what's ahead. A MacWorld editor comes by to say hello, but won't sit down
because there are MacUser editors at the table. Protocol.
Finally, we are all ushered across the hall to get sneaked. There will be two
days of sneaking, with lots of inside info that we are not permitted to leak
to a soul until the embargo date some eight weeks hence. Monday morning it'll
all be in MacWeek.


Power Comes to the Mac


There were three issues on people's minds at the sneak: price, performance,
and compatibility. And developer support. There were four issues...
These issues were the obvious ones. Apple knows it needs to cut into the
Windows/DOS/Intel CPU market if it is to grow. According to Ross Ely, the
biggest objections to Mac from Windows users have been--guess what? Price,
performance, and compatibility.
Is that all? Hey, no problem.
At least, that's the attitude Apple wants to project. The new Power Macs will
answer the first two concerns, goes the story, and the SoftWindows bundle
should help a lot with the last. How fanciful is all this?


The Price of Power


Price: Apple is pricing these machines aggressively. The cheapest, which is a
pretty hot machine, will cost about $2000, monitor included. There will be a
SoftWindows-bundled version for which I don't have a price now, but it should
come with 16 Mbytes of RAM and cost under $3000.
Performance, native mode: Compute-intensive apps should speed up by a factor
of three or four, based on early results I've seen from a variety of sources.
Since the PowerPC 601 is 40 percent faster than a Pentium on floating-point
operations, apps that use floating-point math will probably do better on a
PowerPC than on a Pentium machine. Unfortunately, many of the early Power Mac
native applications will be ports of mainstream applications that aren't
optimized for the PowerPC and wouldn't show off its power even if they were. One
company that will be pushing the PowerPC hardware from the start is Fractal
Designs of Santa Cruz, CA.
Performance, emulation: To get a fresh perspective on this, I talked to a
friend whose old Mac application hasn't had a major upgrade in several years.
He reported that it ran flawlessly on a PowerPC and was noticeably faster. His
experience seems not to be unusual. Because Apple has ported key elements of
the system to PowerPC, apps that rely on system calls will be sped up when
they run on a PowerPC machine. This isn't pure emulation.
Performance, SoftWindows: Insignia's SoftPC ran Windows apps on Macs at
unacceptable speeds. It looks like you can expect high-end 386 to 486SX/25
speed for Windows apps using SoftWindows on a PowerMac. That's apps; system
tasks will execute faster, though, reportedly rivaling Pentium speeds.
Insignia Solutions optimized Win 3.1 for PowerPC, writing drivers that map
Windows operations directly to QuickDraw. Claims are for high-end 486
performance for apps later this year, and better compatibility, when Insignia
Solutions delivers the 486 version. My friend Hal says that high-end 486
performance from current-generation PowerPCs via 486 emulation is impossible.
Compatibility, Mac apps: Solid, from what I saw, and heard from third-party
developers, at the sneak. An interesting scenario is shaping up. Corporate
buyers are universally expressing cautious optimism, and preparing to act on
the caution, not the optimism. If early appearances hold up, there will be few
problems with Mac compatibility, and these buyers may be revising their
purchase plans midstream.
Compatibility, Windows apps: Early reports describe really solid
compatibility. (Insignia has a Windows source license, and a complete licensed
copy of Windows 3.1 and MS-DOS is part of the package.) SoftWindows reportedly
deals appropriately with Novell NetWare, LAN Manager, Banyan Vines, TCP/IP,
COM and LPT ports, floppy drives, displays, and CD-ROM drives.
The caveat is that SoftWindows emulates a 286. 486 emulation is slated for
later this year. For now, SoftWindows is restricted to standard mode by the
286 emulation, so it won't run apps that check for a 386 on installation, or
that insist on a 386 or 486, or that require Enhanced mode. That includes
current versions of FrameMaker, Quattro Pro, Visual C++, Borland C++, MathCad,
Paradox, Improv, FoxPro, and Access, among other apps. That's a big caveat.
Developer support: Apple has a slew of developer tools for PowerPC
development, including the $399 Macintosh on RISC Software Developer's Kit,
which includes all you need to write PowerPC-based apps on a 68030 or 68040
Mac using C or C++.
But the real news for development is the arrival of Metrowerks on the scene.
This St. Laurent, Quebec company's CodeWarrior development system should be
shipping (not a prerelease version) by the time you read this. It lets you
write code on a PowerPC or 680x0 Mac using C, C++, or Pascal, and compile it
to a 680x0 or a PowerPC binary. The development environment is something like
MPW, something like Think. I've seen it in action, and compilation speed blows
away the competition, meaning Apple and Symantec. This thing is hot.
















May, 1994
C PROGRAMMING


Quincy: A C Interpreter Project




Al Stevens


This month launches a new "C Programming" column project, a C-language
interpreter with a D-Flat integrated development environment (IDE). The
program is named "Quincy," and its origins go back to the mid-eighties. It was
then that I downloaded a shareware program called the "Small C Interpreter"
(SCI) from a BBS. You can still find SCI among C-language shareware tools. SCI
is a K&R subset interpreter with a command-line user interface that resembles
early CP/M and MS-DOS Basic interpreters. SCI is the brainchild of Bob Brodt,
the author of BAWK and other widely used utility programs. I called Bob, and
we agreed to collaborate on a commercial upgrade to SCI. He would add some
features to the language interpreter, and I would rewrite the user interface
with a full-screen editor, debugger, help screens, and such. We named it "QC,"
booked some ads in DDJ and Computer Language, and sat back to wait for the
revenues to come pouring in.
The first sale, surprisingly, was to Microsoft. Their purchase order included
a letter saying that they like to keep abreast of new language products,
particularly in the C arena. I guess they wanted to see what a product named
QC looked like. We didn't know it when we named the product, but Microsoft was
about to introduce QuickC, which was bound to be nicknamed QC. Maybe they were
worried about the name conflict. They stopped worrying when they saw the
package. It was small potatoes, not all that impressive. (QuickC 1.0 wasn't
much better, though.)
Another call came from someone who was upset about the QC moniker and wanted
it changed. I forget his name, but he owned a C compiler also named QC and he
had a sabre to rattle. He was familiar with name problems. His company had
originally advertised as the CodeWorks, and another company, The Austin Code
Works, rattled some sabres of their own and made them change the name. Now The
Austin Code Works was marketing the CodeWorks QC compiler, and he didn't want
our product out there confusing the marketplace. He was right, of course, and
I learned a lesson. Do plenty of product-name research before you spend any
money that commits to a particular choice.
Not wanting a trademark fight, I was fretting about how to change the name of
the program without losing whatever product identification our advertising
efforts had yielded. Just then my daughter Wendy's white cat, Quincy (named
after Jack Klugman's TV character), strolled by. Problem solved, the program
was rechristened Quincy, close enough to QC to preserve the recognition and
far enough away to hold the lawyers at bay.
We had sold a few, a very few, copies of Quincy when I tired of the mail-order
software business. I was making more money playing the piano in a saloon.
Remembering what the QC guy told me, I called The Austin Code Works, and Scott
Guthery, who was to become a famous curmudgeon in these pages, agreed to
handle Quincy if we would allow him to include source code in the package. He
still carries the product and even sells one every now and then. Its proceeds
go to the Brevard County Food Bank.
While integrating Bob's interpreter code into the IDE, I had seen similarities
between it and the Tiny-C interpreter published in a book. Bob told me that he
used the Tiny-C implementation to learn about C interpreters. Scott Guthery
wrote the Tiny-C book. The cycle was complete.
Where is all this leading? Recently, I undertook a book project for an
introduction to programming in C. Not quite C for Dodos, but along those
lines. I wanted to include a compiler or interpreter on the companion diskette
after the fashion of other language tutorials. Contemporary C-compiler
products, as wonderful as they are, are disk-hungry behemoths with a heavy
emphasis on C++. Some vendors will supply limited-feature compilers to go
along with books about their languages and products, but they usually expect
royalties. Not wanting to give up royalties, I resurrected Quincy and looked
it over. Although it had promise, there were two problems. First, the language
implementation was a K&R subset with none of the ANSI improvements. Second,
the IDE did not comply with CUA, the user interface that everyone expects to
see in new applications. I set aside some time and completely rewrote the
program, using D-Flat for the IDE and putting as much of ANSI C as I could
into the interpreter. Not much of the original Quincy code survived, but I
kept the name in memory of that little white cat.
Quincy will be the tutorial environment in my new C book. Eventually, it might
serve as the basis for a book about interpreter design. And starting this
month, this column will run a series of articles about Quincy itself. As with
all of my projects, you can download or send for the source code.


Quincy Design Goals


Quincy (the interpreter) is meant to be used to learn C. It will include a
script-driven tutorial that uses D-Flat help windows to walk a student through
the exercises and explain the lesson being demonstrated. Quincy is not a
wall-to-wall implementation of ANSI C, although it is a healthy subset. The
language implementation lacks some features. Quincy does not support typedef,
multiple-dimension arrays, variable argument lists, goto, or concatenated
string literals. If needed, I'll add those features, but in its current
condition, the interpreter suffices to help teach C to a novice.
Quincy will not link multiple source-code files. It is an interpreter that
loads and executes a source-code file using Run and Debug commands. Just as
the original Small C Interpreter resembled MBasic in the user interface,
Quincy resembles QBasic in this respect. You can use the #include preprocessor
directive to load multiple source-code files into one program. The
preprocessor does not implement all of the ANSI C additions, such as
"stringizing."
I broke Quincy down into several logical parts. The IDE, which is implemented
in D-Flat, is one. The preprocessor is another. The debugger and its interface
to the interpreter are yet another. The C-language interpreter is the last
part. The purpose is to isolate the code that would change if I decide to
install Quincy into a different user interface, such as Windows, or use the
Quincy environment with an altogether different language interpreter.
The first version of Quincy is 4.0, reflecting its ancestry. There were three
versions of the K&R edition.
Quincy is relevant to this column not because DDJ readers need a C tutorial,
but because the program is a study in interpreter design and implementation.
It is also a study in the use of D-Flat to develop an application. You have to
build your own configuration-file definition, command codes, menus, and dialog
boxes in addition to or in place of the D-Flat defaults.
Quincy is also useful for developing small C programs that you can compile
later. It fits on and will run from a 360K diskette, so you can take it with
you no matter what the size of your road machine. (In 1990, I used Quincy
Version 2 to develop the Data Encryption Standard programs that I published in
this column. I was on the road, and my laptop was a T1000 4.77-MHz 8088 with
one 720K diskette drive and no hard disk. Now that you feel sorry for me and
my Spartan roots, let's return to the present.)


Quincy's IDE


Listing One, page 144, is interp.h, the header file that defines the Quincy
menu and command codes, some prototypes, and some external variables. The
command codes are in addition to the ones that the D-Flat Application window
class uses. There are other header files for the debugger and interpreter.
I'll discuss them in later columns.
Listing Two, page 144, is qnc.c, the source file that implements Quincy's IDE.
It is a direct knock-off from D-Flat's example Memopad program. Quincy is not
a multiple-document-interface application like the Memopad, though.
The main function initializes D-Flat, loads the configuration and help files,
creates the application window and its child editor window, and loads a
source-code file if one is named on the command line. The main function also
calls reBreak and unBreak. These functions manage DOS's contrary Ctrl+Break
and Ctrl+C operations. While the IDE is running, the interrupts for these keys
are directed to an empty interrupt-service routine to prevent the user from
exiting the program by pressing one of the keys. Such exits are inhospitable,
at least. They are untimely, too, because the D-Flat environment has other
interrupts hooked. An uncontrolled exit leaves DOS in an unstable state with
interrupt vectors pointing into the vapor. While the interpreter is running a
program, the break keys are set up to let the user interrupt the process to
get out of a dead loop or terminate an errant program.
The QuincyProc function is the window-processing module for Quincy's
application window. It intercepts D-Flat messages and processes those that
need something other than the routine application handling. The
CREATE_WINDOW message turns on the scroll-bar option if a mouse is installed.
The SIZE message adjusts the sizes of the application's child windows, which
consist of an editor control and a list-box control to display programmed
watch variables. The SHOW_WINDOW message deletes and adds scroll bars when the
user has made that selection from a menu. It also observes when the screen
size and the configured window height disagree and sends a SIZE message to
compensate. The application window intercepts attempts to give it the focus
and passes the focus on to the editor window. Each of the COMMAND messages
calls an associated command function. Some of them, such as the ones to run
and step through the program, are in the debugger. Others, such as the
commands to load and save source code files, are processed here in qnc.c.
The QncEditorProc module is the window-processing module for the Editor window
class. It intercepts its SCROLL and PAINT messages to highlight source-code
lines that have breakpoints set and the one that represents the current
program step position. If the user adds or deletes source-code lines with the
keyboard or by cutting and pasting to and from the clipboard, the program
adjusts the breakpoint table accordingly. The functions that make those
adjustments are in the debugger.
qnc.c includes the command function to display the output screen while the
user is viewing the IDE. Another command function displays a dialog box with
some user-configurable memory parameters.
Quincy uses D-Flat Version 18 or later. That version adds an Editor class to
deal with the problems of source code, such as expanding and collapsing tabs.
It also replaces the File Open and Save As dialog boxes with improved versions
and fixes a few small bugs. I built a special version of the D-Flat object
library, too, to include only those D-Flat features that Quincy needs. All the
MDI stuff is out, for example, which significantly reduces the size of the
executable.


"C Programming" Column Source Code


Quincy, D-Flat, and D-Flat++ can be downloaded from the DDJ Forum on
CompuServe or the Internet by anonymous ftp. See page 3 for details. If you
cannot get to one of the online sources, send a diskette and a stamped,
addressed mailer to me at Dr. Dobb's Journal, 411 Borel Ave., San Mateo, CA
94402. I'll send you a copy of the source code. It's free, but if you want to
support the Careware charity, include a dollar for the Brevard County Food
Bank. They help hungry and homeless citizens.


Template Functions: A Reader Responds


In the February column, I groused about how Borland C++'s template
implementation made building min/max template functions difficult. In March, I
pointed out that the Watcom C++ compiler did not exhibit the same behavior and
worked exactly the way I wanted it to. A reader, who asked to remain
anonymous, responded. He uses Borland compilers and follows the activities of
the ANSI C++ committee. I edited his communications to protect his identity
and to cast our e-mail exchanges into a conversation.
Reader: Regarding your comments about your min/max template functions, please
consider the following, taken from the ANSI C++ WP:
3 A template function may be overloaded either by (other) functions of its
name or by (other) template functions of that same name. Overloading
resolution for template functions and other functions of the same name is done
in three steps:


[1] Look for an exact match (§13.2) on functions; if found, call it.
[2] Look for a function template from which a function that can be called with
an exact match can be generated; if found, call it.
[3] Try ordinary overloading resolution (§13.2) for the functions; if a
function is found, call it.

If no match is found the call is an error. In each case, if there is more than
one alternative in the first step that finds a match, the call is ambiguous
and is an error.
4 A match on a template (step [2]) implies that a specific template function
with arguments that exactly matches the types of the arguments will be
generated (§14.5). Not even trivial conversions (§13.2) will be applied in
this case.
Please note the last sentence of the paragraph; I think it is pretty explicit
about not allowing const int -> int or int -> const int. I have asked a C++
authority about your example, and he has confirmed that Borland C++ and
Symantec C++ are correct in rejecting it, since it is not currently legal.
Having said that, though, I have to point out that the next ANSI meeting is to
consider relaxing the matching rules, such that your example would become
legal, and it is very likely that this proposed change will be accepted.
Stevens: According to a literal translation of the WP, the template in Example
1(a), which I copied from the Borland C++ 4.0 stdlib.h, would be crippled.
This template would work only with const types.
Reader: This is absolutely true, and unfortunately, that happens to be
precisely what the language definition implies, as of today.
I am not saying that the rules should stay this way; I am merely stating what
they are. It's always a difficult process to decide how faithfully to follow
the (evolving) standard, where it just plain doesn't make sense.
Stevens: The Borland compiler does not really work the way a literal
interpretation would indicate, however. Example 1(b) works fine.
The Watcom compiler works the other way. The arguments to a template function
can be any mix of const and non-const. I really don't see any overwhelming
problem with that implementation, and it solves the problems that I pointed
out in the column.
Reader: The above program is correctly flagged by Borland C++ 4.0 as an error,
but only in strict ANSI mode, so I still have to argue that Borland C++
implements these rules pretty consistently. Borland requires the -A switch
for complete compliance, and I am not aware of any options on Watcom that
would make the compiler correctly flag the illegal code above. Of course,
chances are that with the next ANSI meeting over, they will be the ones
correctly implementing this feature.
On this particular point, there is no ambiguity--the language here is
basically broken (but unambiguously so). And the C++ authority that I
consulted agrees that the code is definitely illegal, as far as the language
stands today.
I guess Borland did anticipate ANSI a little bit here by allowing what I just
described without -A, similarly to Watcom, only Borland didn't go quite as
far as Watcom did. With respect to the Watcom behavior, as long as ANSI
actually makes this legal, I'd agree; otherwise, I'd have to argue that by
taking advantage of an extension, you'd be writing nonportable code.
The reader's points are valid. I said in my original complaint that I did not
know which implementation is correct. Now that I know, I am happy to hear that
they are at least considering a repair to what, in my opinion, is a flawed
design.
The ANSI meeting that decides this issue has not taken place, as I write this,
although it should be history by the time you read these words. I won't guess
at the outcome, but remember that my original comments were not so much about
the problem itself, but about C++ issues of style. To circumvent the problem I
ignored the C++ homily that discourages #define and lapsed into classic C. We
do what we have to do until the language makers are finished. Then we do what
we have to do some more.
Example 1: Examining C++ template functions.
(a) template <class T>
    inline const T& min(const T& t1, const T& t2)
        { return t1 > t2 ? t2 : t1; }

(b) #include <stdlib.h>

    main()
    {
        int a = 123;
        int b = 234;
        int c;
        c = min(a, b);
    }


[LISTING ONE]

/* ------- interp.h -------- */

#ifndef INTERP_H
#define INTERP_H

#include "dflat.h"

#define QUINCY "Quincy"

/* ----- Menu and dialog box command codes ------ */
enum {
 ID_OUTPUT = 200,
 ID_SCROLLBARS,
 ID_RUN,
 ID_STEP,
 ID_STOP,
 ID_BREAKPOINT,
 ID_WATCH,
 ID_DELBREAKPOINTS,
 ID_COMMANDLINE,
 ID_EXAMINE,
 ID_CHANGE,
 ID_VALUE,
 ID_STEPOVER,
 ID_ADDWATCH,
 ID_SWITCHWINDOW,
 ID_MEMORY,
 ID_PROGSIZE,
 ID_DATASPACE,
 ID_VARIABLES,
 ID_STACK,
 ID_JUMP
};

void qinterpmain(unsigned char *source, int argc, char *argv[]);
void HideIDE(void);
void UnHideIDE(void);
void PrintSourceCode(void);
void unBreak(void);
void reBreak(void);
int PrintSetupProc(WINDOW, MESSAGE, PARAM, PARAM);

extern WINDOW editWnd;
extern WINDOW applWnd;
extern int currx, curry;
extern DBOX Display;
extern DBOX PrintSetup;
extern DBOX MemoryDB;

#endif

[LISTING TWO]

/* --------------- qnc.c ----------- */
#include <process.h>
#include <alloc.h>
#include "dflat.h"
#include "qnc.h"
#include "interp.h"
#include "debugger.h"

char DFlatApplication[] = QUINCY;
static char Untitled[] = "Untitled";
char **Argv;

WINDOW editWnd;
WINDOW applWnd;
WINDOW watchWnd;
BOOL Exiting;
BOOL CtrlBreaking;

static char *qhdr = QUINCY " " QVERSION;
extern unsigned _stklen = 16384;
static void interrupt (*oldbreak)(void);
static void interrupt (*oldctrlc)(void);
static BOOL brkct;
static int QuincyProc(WINDOW, MESSAGE, PARAM, PARAM);
static void SelectFile(void);
static void OpenSourceCode(char *);
static void LoadFile(void);
static void SaveFile(int);
void ToggleBreakpoint(void);
void DeleteBreakpoints(void);
static int QncEditorProc(WINDOW, MESSAGE, PARAM, PARAM);
static char *NameComponent(char *);
static void ChangeTabs(void);
static void FixTabMenu(void);
static void CloseSourceCode(void);
static void OutputScreen(void);
static void Memory(void);
static void ChangeQncTitle(void);

void main(int argc, char *argv[])

{
 int hasscr = 0;
 if (!init_messages())
 return;
 curr_cursor(&currx, &curry);
 setcbrk(0);
 reBreak();
 Argv = argv;
 if (!LoadConfig())
 cfg.ScreenLines = SCREENHEIGHT;
 applWnd = CreateWindow(APPLICATION, qhdr, 0, 0, -1, -1, &MainMenu,
 NULL, QuincyProc, HASSTATUSBAR );
 ClearAttribute(applWnd, CONTROLBOX);
 LoadHelpFile();
 if (SendMessage(NULL, MOUSE_INSTALLED, 0, 0))
 hasscr = VSCROLLBAR | HSCROLLBAR;
 editWnd = CreateWindow(EDITOR, NULL, GetClientLeft(applWnd),
 GetClientTop(applWnd), ClientHeight(applWnd),
 ClientWidth(applWnd), NULL, NULL, QncEditorProc,
 hasscr | HASBORDER | MULTILINE);
 ChangeQncTitle();
 SendMessage(editWnd, SETFOCUS, TRUE, 0);
 if (argc > 1)
 OpenSourceCode(argv[1]);
 while (dispatch_message())
 ;
 unBreak();
 cursor(currx, curry);
}
/* -- interception and management of Ctrl+C and Ctrl+Break -- */
static void interrupt newbreak(void)
{
 return;
}
int CBreak(void)
{
 if (!Stepping) {
 CtrlBreaking = TRUE;
 if (inSystem)
 longjmp(BreakJmp, 1);
 }
 return 1;
}
void unBreak(void)
{
 if (brkct) {
 setvect(0x1b, oldbreak);
 setvect(0x23, oldctrlc);
 ctrlbrk(CBreak);
 brkct = FALSE;
 }
}
void reBreak(void)
{
 if (!brkct) {
 oldctrlc = getvect(0x23);
 oldbreak = getvect(0x1b);
 setvect(0x23, newbreak);
 setvect(0x1b, newbreak);

 brkct = TRUE;
 }
}
/* --- change application window title to show filename --- */
static void ChangeQncTitle(void)
{
 char *ttl;
 char *cp = Untitled;
 char *cp1 = editWnd->extension;
 int len = 13;

 if (cp1 && *cp1) {
 cp = strrchr(cp1, '\\');
 if (cp == NULL)
 cp = strchr(cp1, ':');
 if (cp == NULL)
 cp = cp1;
 else
 cp++;
 len = strlen(cp) + 3;
 }
 ttl = DFmalloc(strlen(qhdr) + len);
 strcpy(ttl, qhdr);
 strcat(ttl, ": ");
 strcat(ttl, cp);
 AddTitle(applWnd, ttl);
 free(ttl);
}
/* --- window processing module for Quincy application --- */
static int QuincyProc(WINDOW wnd,MESSAGE msg,PARAM p1,PARAM p2)
{
 int rtn;
 static BOOL InterceptingShow = FALSE;
 BOOL ThisSbarSetting;
 static BOOL PrevSbarSetting;
 static int PrevScreenLines;
 switch (msg) {
 case CREATE_WINDOW:
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 if (SendMessage(NULL, MOUSE_INSTALLED, 0, 0))
 SetCheckBox(&Display, ID_SCROLLBARS);
 else
 ClearCheckBox(&Display, ID_SCROLLBARS);
 FixTabMenu();
 return rtn;
 case SIZE:
 {
 int dif = (int) p2 - GetBottom(wnd);
 int EditBottom = p2 - BottomBorderAdj(wnd);
 if (watchWnd != NULL &&
 TestAttribute(watchWnd, VISIBLE))
 EditBottom -= WindowHeight(watchWnd);
 if (dif > 0) {
 /* --- getting bigger--- */
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 SendMessage(watchWnd, MOVE, GetLeft(watchWnd),
 GetTop(watchWnd)+dif);
 SendMessage(editWnd, SIZE, GetClientRight(wnd), EditBottom);
 }

 else {
 /* --- getting smaller --- */
 SendMessage(editWnd,
 SIZE, GetClientRight(wnd), EditBottom);
 SendMessage(watchWnd, MOVE, GetLeft(watchWnd),
 GetTop(watchWnd)+dif);
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 }
 return rtn;
 }
 case SHOW_WINDOW:
 ThisSbarSetting =
 CheckBoxSetting(&Display, ID_SCROLLBARS);
 if (InterceptingShow) {
 if (ThisSbarSetting != PrevSbarSetting) {
 if (ThisSbarSetting)
 AddAttribute(editWnd,
 VSCROLLBAR | HSCROLLBAR);
 else {
 ClearAttribute(editWnd, VSCROLLBAR);
 ClearAttribute(editWnd, HSCROLLBAR);
 }
 }
 if (PrevScreenLines != cfg.ScreenLines)
 SendMessage(wnd, SIZE, GetRight(wnd), cfg.ScreenLines-1);
 }
 break;
 case CLOSE_WINDOW:
 SendMessage(editWnd, CLOSE_WINDOW, 0, 0);
 break;
 case SETFOCUS:
 if (p1 && editWnd) {
 SendMessage(editWnd, msg, p1, p2);
 return TRUE;
 }
 break;
 case COMMAND:
 switch ((int)p1) {
 case ID_NEW:
 OpenSourceCode(Untitled);
 return TRUE;
 case ID_OPEN:
 SelectFile();
 return TRUE;
 case ID_SAVE:
 SaveFile(FALSE);
 return TRUE;
 case ID_SAVEAS:
 SaveFile(TRUE);
 return TRUE;
 case ID_PRINTSETUP:
 DialogBox(wnd, &PrintSetup, TRUE, PrintSetupProc);
 return TRUE;
 case ID_PRINT:
 PrintSourceCode();
 return TRUE;
 case ID_OUTPUT:
 OutputScreen();
 return TRUE;

 case ID_RUN:
 RunProgram();
 if (Exiting)
 PostMessage(wnd, CLOSE_WINDOW, 0, 0);
 return TRUE;
 case ID_STOP:
 StopProgram();
 return TRUE;
 case ID_EXAMINE:
 ExamineVariable();
 return TRUE;
 case ID_WATCH:
 ToggleWatch();
 return TRUE;
 case ID_ADDWATCH:
 AddWatch();
 return TRUE;
 case ID_SWITCHWINDOW:
 SetNextFocus();
 return TRUE;
 case ID_STEPOVER:
 StepOverFunction();
 return TRUE;
 case ID_STEP:
 StepProgram();
 if (Exiting)
 PostMessage(wnd, CLOSE_WINDOW, 0, 0);
 return TRUE;
 case ID_BREAKPOINT:
 ToggleBreakpoint();
 return TRUE;
 case ID_DELBREAKPOINTS:
 DeleteBreakpoints();
 return TRUE;
 case ID_COMMANDLINE:
 CommandLine();
 return TRUE;
 case ID_MEMORY:
 Memory();
 return TRUE;
 case ID_TAB2:
 cfg.Tabs = 2;
 ChangeTabs();
 return TRUE;
 case ID_TAB4:
 cfg.Tabs = 4;
 ChangeTabs();
 return TRUE;
 case ID_TAB6:
 cfg.Tabs = 6;
 ChangeTabs();
 return TRUE;
 case ID_TAB8:
 cfg.Tabs = 8;
 ChangeTabs();
 return TRUE;
 case ID_DISPLAY:
 InterceptingShow = TRUE;
 PrevSbarSetting =

 CheckBoxSetting(&Display, ID_SCROLLBARS);
 PrevScreenLines = cfg.ScreenLines;
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 InterceptingShow = FALSE;
 return rtn;
 case ID_EXIT:
 case ID_SYSCLOSE:
 if (Stepping) {
 Exiting = TRUE;
 StopProgram();
 return TRUE;
 }
 break;
 case ID_ABOUT:
 MessageBox
 (
 "About Quincy",
 " \n"
 " \n"
 " \n"
 " \n"
 " \n"
 " \n"
 " The Quincy C Interpreter\n"
 " Version 4.0\n"
 " Copyright (c) 1994 Al Stevens\n"
 " All Rights Reserved"
 );
 return TRUE;
 default:
 break;
 }
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}
/* --- The Open... command. Select a file --- */
static void SelectFile(void)
{
 char FileName[64];
 if (OpenFileDialogBox("*.c", FileName))
 /* --- see if the document is already in --- */
 if (stricmp(FileName, editWnd->extension) != 0)
 OpenSourceCode(FileName);
}
/* --- open a document window and load a file --- */
static void OpenSourceCode(char *FileName)
{
 WINDOW wwnd;
 struct stat sb;
 char *ermsg;

 CloseSourceCode();
 wwnd = WatchIcon();
 if (strcmp(FileName, Untitled)) {
 editWnd->extension = DFmalloc(strlen(FileName)+1);
 strcpy(editWnd->extension, FileName);

 LoadFile();
 }
 ChangeQncTitle();
 SendMessage(wwnd, CLOSE_WINDOW, 0, 0);
 SendMessage(editWnd, PAINT, 0, 0);
}
/* --- Load source code file into editor text buffer --- */
static void LoadFile(void)
{
 FILE *fp;
 if ((fp = fopen(editWnd->extension, "rt")) != NULL) {
 char *Buf;
 struct stat sb;

 stat(editWnd->extension, &sb);
 Buf = DFcalloc(1, sb.st_size+1);

 /* --- read the source file --- */
 fread(Buf, sb.st_size, 1, fp);
 fclose(fp);
 SendMessage(editWnd, SETTEXT, (PARAM) Buf, 0);
 free(Buf);
 }
}
/* ---------- save a file to disk ------------ */
static void SaveFile(int Saveas)
{
 FILE *fp;
 if (editWnd->extension == NULL || Saveas) {
 char FileName[64];
 if (SaveAsDialogBox("*.c", FileName)) {
 if (editWnd->extension != NULL)
 free(editWnd->extension);
 editWnd->extension = DFmalloc(strlen(FileName)+1);
 strcpy(editWnd->extension, FileName);
 }
 else
 return;
 }
 if (editWnd->extension != NULL) {
 WINDOW mwnd = MomentaryMessage("Saving the file");
 if ((fp = fopen(editWnd->extension, "wt")) != NULL) {
 CollapseTabs(editWnd);
 fwrite(GetText(editWnd), strlen(GetText(editWnd)), 1, fp);
 fclose(fp);
 ExpandTabs(editWnd);
 }
 SendMessage(mwnd, CLOSE_WINDOW, 0, 0);
 ChangeQncTitle();
 SendMessage(editWnd, SETFOCUS, TRUE, 0);
 }
}
/* ------ display the row and column in the statusbar ------ */
static void ShowPosition(WINDOW wnd)
{
 char status[60];
 sprintf(status, "Line:%4d Column: %2d", wnd->CurrLine+1, wnd->CurrCol+1);
 SendMessage(GetParent(wnd), ADDSTATUS, (PARAM) status, 0);
}

/* ----- close and save the source code -------- */
static void CloseSourceCode(void)
{
 if (editWnd->TextChanged)
 if (YesNoBox("Text changed. Save it?"))
 SendMessage(applWnd, COMMAND, ID_SAVE, 0);
 SendMessage(editWnd, CLEARTEXT, 0, 0);
 SendMessage(editWnd, PAINT, 0, 0);
 if (editWnd->extension != NULL) {
 free(editWnd->extension);
 editWnd->extension = NULL;
 }
 DeleteAllWatches();
 DeleteBreakpoints();
 StopProgram();
}
/* ---- count the newlines in a block of text --- */
static int CountNewLines(char *beg, char *end)
{
 int ct = 0;
 while (beg <= end)
 if (*beg++ == '\n')
 ct++;
 return ct;
}
/* ---- count the newlines in a block of editor text --- */
static int CountBlockNewLines(WINDOW wnd)
{
 return TextBlockMarked(wnd) ?
 CountNewLines(TextBlockBegin(wnd), TextBlockEnd(wnd)) : 0;
}
/* ---- count the newlines in clipboard text --- */
static int CountClipboardNewLines(WINDOW wnd)
{
 return ClipboardLength ?
 CountNewLines(Clipboard, Clipboard+ClipboardLength-1) : 0;
}
/* ----- window processing module for the editor ----- */
static int QncEditorProc(WINDOW wnd,MESSAGE msg, PARAM p1,PARAM p2)
{
 int rtn;
 switch (msg) {
 case SETFOCUS:
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 if ((int)p1 == FALSE) {
 SendMessage(GetParent(wnd), ADDSTATUS, 0, 0);
 if (ErrorCode == 0)
 SendMessage(NULL, HIDE_CURSOR, 0, 0);
 }
 else
 ShowPosition(wnd);
 return rtn;
 case SCROLL:
 case PAINT:
 {
 int lno;
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 /* ---- update source line pointer and breakpoint displays ----- */
 for (lno = wnd->wtop+1;

 lno <= wnd->wtop+ClientHeight(wnd); lno++)
 if ((Stepping && lno == LastStep) ||
 isBreakpoint(lno))
 DisplaySourceLine(lno);
 return rtn;
 }
 case KEYBOARD:
 /* --- if user adds/deletes lines,
 adjust breakpoint table in debugger --- */
 if ((int) p1 == '\r')
 AdjustBreakpointsInserting(wnd->CurrLine+1, 1);
 else if ((int) p1 == DEL && *CurrChar == '\n')
 AdjustBreakpointsDeleting(wnd->CurrLine+2, 1);
 break;
 case KEYBOARD_CURSOR:
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 SendMessage(NULL, SHOW_CURSOR, 0, 0);
 ShowPosition(wnd);
 return rtn;
 case COMMAND:
 switch ((int) p1) {
 case ID_SEARCH:
 SearchText(wnd);
 return TRUE;
 case ID_REPLACE:
 ReplaceText(wnd);
 return TRUE;
 case ID_SEARCHNEXT:
 SearchNext(wnd);
 return TRUE;
 case ID_CUT:
 CopyToClipboard(wnd);
 SendMessage(wnd, COMMAND, ID_DELETETEXT, 0);
 SendMessage(wnd, PAINT, 0, 0);
 return TRUE;
 case ID_COPY:
 CopyToClipboard(wnd);
 ClearTextBlock(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 return TRUE;
 case ID_PASTE:
 /* --- if user pastes lines,
 adjust breakpoint table in debugger --- */
 AdjustBreakpointsInserting(wnd->CurrLine+1,
 CountClipboardNewLines(wnd));
 PasteFromClipboard(wnd);
 SendMessage(wnd, PAINT, 0, 0);
 return TRUE;
 case ID_DELETETEXT:
 /* --- if user deletes lines,
 adjust breakpoint table in debugger --- */
 AdjustBreakpointsDeleting(wnd->BlkBegLine+1,
 CountBlockNewLines(wnd));
 rtn = DefaultWndProc(wnd, msg, p1, p2);
 SendMessage(wnd, PAINT, 0, 0);
 return rtn;
 case ID_HELP:
 DisplayHelp(wnd, "QUINCY");
 return TRUE;

 default:
 break;
 }
 break;
 case CLOSE_WINDOW:
 CloseSourceCode();
 break;
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}
/* -- point to the name component of a file specification -- */
static char *NameComponent(char *FileName)
{
 char *Fname;
 if ((Fname = strrchr(FileName, '\\')) == NULL)
 if ((Fname = strrchr(FileName, ':')) == NULL)
 Fname = FileName-1;
 return Fname + 1;
}
/* ---- rebuild display when user changes tab sizes ---- */
static void ChangeTabs(void)
{
 FixTabMenu();
 CollapseTabs(editWnd);
 ExpandTabs(editWnd);
}
/* ---- update the tab menu when user changes tabs ---- */
static void FixTabMenu(void)
{
 char *cp = GetCommandText(&MainMenu, ID_TABS);
 if (cp != NULL) {
 cp = strchr(cp, '(');
 if (cp != NULL) {
 *(cp+1) = cfg.Tabs + '0';
 if (GetClass(inFocus) == POPDOWNMENU)
 SendMessage(inFocus, PAINT, 0, 0);
 }
 }
}
/* ------- display the program's output screen ----- */
static void OutputScreen(void)
{
 SendMessage(NULL, HIDE_CURSOR, 0, 0);
 HideIDE();
 getkey();
 UnHideIDE();
 SendMessage(NULL, SHOW_CURSOR, 0, 0);
}
/* ---- the user may change certain interpreter memory parameters --- */
static void Memory(void)
{
 char text[20], *tx;
 int i;
 static struct prm {
 unsigned *max;
 unsigned id;
 } parms[] = {

 { &MaxProgram, ID_PROGSIZE }, { &MaxDataSpace, ID_DATASPACE },
 { &MaxVariables, ID_VARIABLES }, { &MaxStack, ID_STACK },
 { &MaxJumps, ID_JUMP }
 };
 for (i = 0; i < (sizeof parms / sizeof(struct prm)); i++) {
 sprintf(text, "%u", *parms[i].max);
 SetEditBoxText(&MemoryDB, parms[i].id, text);
 }
 if (DialogBox(applWnd, &MemoryDB, TRUE, NULL)) {
 for (i = 0; i < (sizeof parms/sizeof(struct prm)); i++)
 *parms[i].max = atoi(GetEditBoxText(&MemoryDB, parms[i].id));
 }
}
End Listings


May, 1994
ALGORITHM ALLEY


Trouble in Paradise




Tom Swan


Algorithms are great at describing solutions to problems such as how to sort
an array or how to compress a text file. But algorithms don't usually provide
all the gory details needed to implement a method of solution--my favorite
definition for an algorithm. Error handling, for example, complicates the job
of turning an algorithm into a running program. What should an algorithm do to
help programmers cast out the demons in their code?
The answer depends on the algorithm's needs, its peculiarities,
operating-system requirements, and a host of other variables. How you deal
with trouble in paradise also depends on what you consider to be an error. The
word error suggests that something has gone wrong, but it isn't practical for
algorithms to account for every possible unplanned event such as a hard-drive
failure or a bad parity bit in RAM. Instead, errors at the algorithm level are
better thought of as exceptional conditions that arise during the course of an
algorithm's execution. In this sense, errors are normal, though unusual,
values, conditions, or other happenstance that require special handling.
Why, then, do most algorithms ignore exceptional conditions? I've never seen a
sorting algorithm that explains, for example, what to do with a stack
overflow, even though a proper implementation must plan for that unhappy
event. Programmers are expected to understand that, in addition to
implementing an algorithm's steps, other details such as error handling are
required. Perhaps if algorithms provided more of these details, fewer bugs
would be introduced by neglecting to account for potential problems known to
the algorithm's designer.
ANSI C++ exceptions provide the tools that algorithms can use to do exactly
that. The concepts of exceptions and exception handlers aren't new. They've
been around for years but have usually been provided as library functions or
operating-system interrupt services. As nonsystem-dependent language elements,
exceptions are suitable for incorporating error handling into algorithms. For
those who haven't used exceptions, the following is a brief tutorial. After
that, I'll list an algorithm in pseudo-Pascal and its implementation in C++,
both of which use exceptions to report illegal input values.


C++ Exceptions


Exceptions come with their own terminology. There are three main concepts to
learn--how to create an exception, how to handle one, and how to enable
exception handling.
To create an exception, a program throws an object that describes the nature
of the exception. The object can be a literal value, a string, an object of a
class, or any other object. (An object is not necessarily a class object. It's
just a value that describes an exceptional condition.)
To handle an exception, a program catches the value thrown by another process.
Program statements that catch exceptions are called exception handlers.
To enable exception handling, a program tries one or more processes that might
throw exceptions.
A function throws an exception, signaling a problem, by using a Throw
statement such as: throw 1;. Usually, however, you shouldn't throw literal
integers around--they don't mean much out of context. A better object to throw
is a string: throw "overflow";. That's more meaningful. Elsewhere in the
program, an exception handler can catch and display the message; see Example
1(a).
What happens at this point is up to you. The program might continue or end, or
it could restart the condition that caused the problem. An exception is a
mechanism for reporting and dealing with errors--an exception doesn't dictate
a course of action. Actually, exceptions aren't limited to error handling. An
empty list, for example, could report its bare cupboards as an exceptional
condition that requires special treatment. Exceptions can be objects of any
type, but they are most conveniently represented as instances of a class. For
example, you might declare an Overflow class--it doesn't need any substance,
just a name: class Overflow { };. You can throw an instance of the class as an
exception, throw Overflow();. Elsewhere, the program can catch the exception
in a Catch statement, as in Example 1(b).
The Throw statement throws an object of the Overflow class, which is caught by
the Catch statement at another place in the program (never mind exactly where
for the moment). The technique might be easier to understand by giving the
object a name; see Example 1(c).
You might call error-class member functions for the named exception object.
For example, class Overflow could declare a member function Report, as in
Example 1(d). The Catch statement can call Report for the thrown exception
object to display an error message; see Example 1(e).
Now it's time to toss in another wrinkle--try blocks. Consider the sample
function in Example 2(a) that throws a couple of exceptions. If conditionA is
true (whatever that is), the function throws a string exception that reports
"Big trouble!" If conditionB is true, the function throws an object of the
Overflow class, initialized by the default constructor. If the function
detects no errors, it returns normally, passing back the return value "123."
Throwing an exception immediately terminates the function. In other words, an
exception provides functions with an alternate return mechanism. Rather than
reserve a special value such as -1 that, if returned by a function, signals
an error, exceptions make it possible for functions to return a different type
of value that flags a problem.
To enable exception handling, call a function inside a try block. For example,
Example 2(b) shows how you might call AnyFunction.
A try block contains one or more statements for which you want to catch
exceptions. The most important rule to remember about try blocks is that Catch
statements must follow them immediately. You cannot have a try block in one
place and Catch statements elsewhere. That would be like having a baseball
pitcher in one stadium and the catcher in another. Example 2(c) is a more
complete example that keeps all the necessary players in the same ballpark.
The expanded code first tries to call AnyFunction. If the function returns
normally, the program assigns the function result to x. In that case, the
program skips the two Catch statements because there are no exceptions to
handle. If AnyFunction throws an exception, however, the try block immediately
terminates, and a subsequent Catch statement catches the thrown object.


Exceptions in Practice


There are several other esoteric details concerning exceptions, their
declaration forms, and the consequences of throwing exceptions in
constructors, destructors, and so forth. My aim here, though, is not to
provide a complete tutorial on C++ exceptions, but to introduce the main
concepts as a means for dealing with errors in algorithms expressed in Pascal.
Example 3 shows a sample algorithm in pseudo-Pascal for raising a real number
to any real-number power.
The method uses the formula exp(E * ln(B)), where E is the exponent and B is
the base. Textbooks on programming typically list this algorithm, but neglect
to account for conditions that cause the method to fail--raising a negative
base to a fractional exponent, for instance, and raising zero to an exponent
less than zero. As a consequence, many programs that use the standard formula
simply halt when fed illegal input--not exactly a robust implementation. In
the example, on attempting to raise a negative base to a fractional exponent,
Power accounts for illegal arguments by throwing an exception. The resulting
Error object completely describes the problem, and it includes copies of the
illegal input values: throw Error(b, e);.
I don't know of any Pascal compilers that have Catch, Throw, and Try keywords,
so you probably can't run the example as listed. Using an ANSI C++ compiler
such as Borland C++ 4.0, however, you can compile and run the algorithm's
implementation in Listing One (page 147). The sample program includes a Power
function that throws exceptions for illegal arguments. Call Power in a try
block like that in Example 4(a). If Power throws an exception, the final
output statement is skipped and the try block immediately ends. A subsequent
Catch statement can trap the exceptions; see Example 4(b).
Listing One's Power function also shows one way to use exceptions to trap
errors in source-code implementations--a function that doesn't account for all
possible input values, for example. Enable the last commented statement in
Power to throw the exception "Implementation Error" if the preceding code
doesn't account for all possible exponent and base arguments (see the Error
class's default constructor). This statement causes the compiler to generate
an expected "Unreachable Code" warning. The warning is desirable in this case
because it indicates that the preceding statements provide for all possible
exit paths from Power.


Unhandled Exceptions


You might wonder: What happens to dropped exceptions--those that the program
fails to catch? The answer is simple and logical. Unhandled exceptions pass
upwards in the call chain until handled by a Catch statement or until there
are no more exception handlers left. In that case, C++ calls one of three
global functions to deal with the exception: Abort, Terminate, or Unexpected:
Exceptions that are not handled by a Catch statement call the Unexpected
function. By default, Unexpected calls Terminate.
Exceptions that detect a corrupted stack or that result from a destructor
throwing an exception (a dangerous practice to be reserved only for the most
critical of problems) call the Terminate function. By default, Terminate calls
Abort.
The Abort function is lowest on the totem pole. As you might expect, Abort
ends the program immediately. Programs should never directly call Abort.
Obviously, Unexpected, Terminate, and Abort are intimately related. You can
replace the first two functions with your own code to deal with unhandled
exceptions in whatever way you wish. In some programs, it might make sense to
replace Unexpected with a default error handler. In others, you might want to
replace Terminate to sweep the floors and dust the rugs (and close open files)
in case a program ends prematurely. You cannot replace Abort. An Unexpected
function may throw an exception, in which case the search for an exception
handler (that is, a Catch statement) begins at the location that originally
caused Unexpected to be called. A Terminate function may not throw an
exception. Neither Unexpected nor Terminate may return normally to their
callers. Call the C++ set_terminate and set_unexpected functions to replace
Terminate and Unexpected with your own code. Because these functions are
defined at the language level, rather than as low-level system subroutines,
algorithms and their implementations can employ exceptions for error handling,
while still providing programmers complete control over the specific actions
to be taken when trouble brews.


Your Turn


Next month, more algorithms and techniques in Pascal and C++. Meanwhile, send
your favorite algorithms, tools, and comments to me in care of DDJ.
Example 1: (a) an exception handler can catch and display a message; (b) a
program can catch an Overflow exception in a Catch statement; (c) giving the
caught exception object a name; (d) class Overflow declares a member function
Report; (e) calling Report to display an error message.

(a) catch (char* message) { cout << "Error! -- " << message << endl;}

(b) catch (Overflow) { cout << "Overflow detected!" << endl;}

(c) catch (Overflow overObject) { /* ... */ }
(d) class Overflow { public: void Report() { cout << "Error: overflow" << endl; } };

(e) catch (Overflow overObject) { overObject.Report();}


Example 2: (a) throwing exceptions; (b) enabling exception handling by calling
AnyFunction inside a try block; (c) a complete try block with Catch statements.
(a) int AnyFunction()
    {
      if (conditionA) throw "Big trouble!";
      if (conditionB) throw Overflow();
      return 123; // normal return
    }

(b) try { int x = AnyFunction();}

(c) try {
      cout << "Here we go!" << endl;
      int x = AnyFunction();
      cout << "x == " << x << endl;
    }
    catch (char* message) {
      cout << "Error! -- " << message << endl;
    }
    catch (Overflow) {
      cout << "Overflow!" << endl;
    }


Example 3: Pseudocode for Algorithm #19 (Power).
function Power(Base, Exponent: double): double;
  function Pow(B, E: double): double;
  begin
    Pow := exp(E * ln(B))
  end;
begin
  if (Base > 0.0) then
    Power := Pow(Base, Exponent)
  else if (Base < 0.0) then
    begin
      if frac(Exponent) = 0.0 then
        if odd(trunc(Exponent)) then
          Power := -Pow(-Base, Exponent)
        else
          Power := Pow(-Base, Exponent)
      else
        Throw Error(Base, Exponent)
    end
  else
    begin
      if Exponent = 0.0 then
        Power := 1.0
      else if Exponent < 1.0 then
        Throw Error(Base, Exponent)
      else
        Power := 0.0
    end
end;


Example 4: (a) Calling Power in a try block; (b) trapping exceptions with a
Catch statement.
(a) try {
      double base, exponent, result;
      // ... prompt for base and exponent
      result = Power(base, exponent);
      cout << "result == " << result << endl;
    }

(b) catch (Error& e) {
      e.Report();
      return -1;
    }
    return 0;
[LISTING ONE] (Text begins on page 123.)
// powex.cpp -- Implementation of Power function
// requires Borland C++ 4.0 or ANSI C++ compiler with exceptions
// Copyright (c) 1994 by Tom Swan. All rights reserved


#include <iostream.h>
#include <math.h>

class Error;

double Pow(double b, double e);
double Power(double b, double e) throw(Error);

class Error {
 double b; // Base
 double e; // Exponent
public:
 Error()
 { cout << "Implementation error!" << endl; }
 Error(double bb, double ee)
 : b(bb), e(ee) { }
 void Report();
};

int main()
{
 cout << "Power Demonstration\n\n";
 cout << "This program displays the result of raising\n";
 cout << "a value (base) to a power (exponent). To\n";
 cout << "force an exception, enter a negative base\n";
 cout << "and a fractional exponent (e.g. -4 and 1.5)\n";
 cout << "Or, enter a zero base and an exponent less than\n";
 cout << "zero.\n\n";
 try {
 double base, exponent, result;
 cout << "base? ";
 cin >> base;
 cout << "exponent? ";
 cin >> exponent;
 result = Power(base, exponent);
 cout << "result == " << result << endl;
 }
 catch (Error& e) {
 e.Report();
 return -1;
 }
 return 0;
}

// Subfunction to Power
double Pow(double b, double e)
{
 return exp(e * log(b));
}

// Return b raised to the e power
double Power(double b, double e) throw(Error)
{
 if (b > 0.0) return Pow(b, e);
 if (b < 0.0) {
 double ipart;
 double fpart = modf(e, &ipart);
 if (fpart == 0) {
 if (fmod(ipart, 2) != 0) // i.e. ipart is odd
 return -Pow(-b, e);
 else
 return Pow(-b, e);
 } else
 throw Error(b, e);
 } else {
 if (e == 0.0) return 1.0;
 if (e < 1.0) throw Error(b, e);
 return 0.0;
 }
// throw Error(); // Unreachable code warning expected
}

// Display values that caused an exception
void
Error::Report()
{
 cout << "Domain error:"
 << " base:" << b
 << " exponent:" << e
 << endl
 << "Press Enter to continue...";
 char c;
 char buffer[80];
 if (cin.peek() == '\n') cin.get(c);
 cin.getline(buffer, sizeof(buffer) - 1);
}
End Listing


































May, 1994
UNDOCUMENTED CORNER


LA Law




Andrew Schulman


On Wednesday, February 23, 1994, a federal jury in Los Angeles delivered its
verdict in the case of Stac Electronics vs. Microsoft Corporation. Stac had
sued Microsoft, charging that DoubleSpace in MS-DOS 6 infringed on Stac's LZS
data-compression patent. Microsoft countersued, charging among other things
that, by reverse-engineering the undocumented "preload" interface in DOS 6 and
using it in Stacker 3.1, Stac had misappropriated Microsoft's "trade secrets."
The trial began on January 18, just after the devastating Los Angeles
earthquake, and, as a paid consultant and potential expert witness for Stac, I
had a front-row seat at the proceedings.
The jury's verdict was, in its own way, a small earthquake. The jury awarded
Stac $120 million in damages for patent infringement by Microsoft; this is $10
million more than Stac asked for. In turn, the jury also awarded Microsoft
$13.6 million in damages for trade-secrets misappropriation by Stac. Microsoft
lost both its own patent-infringement claim against Stac (Microsoft had bought
a data-compression patent which predated Stac's) and a "breach of contract"
claim.
In sum, Microsoft lost every part of this case except its trade-secrets
misappropriation claim.
While the large patent-infringement award to Stac seems like the big news, the
smaller trade-secret award to Microsoft is at least as interesting, because of
its direct connection to reverse engineering and the use of undocumented
interfaces in the PC software industry. Even a brief article in the New York
Times (February 24) picked up on the fact that the trade-secret claim
"centered on Stac's use of what is known as an undocumented call in MS-DOS."
By making Stacker 3.1's use of the undocumented preload interface out to be
"trade-secrets misappropriation," Microsoft put reverse engineering and the
use of undocumented interfaces on trial. And the eight-person LA jury agreed
with Microsoft that Stac's use of Nu-Mega's Soft-ICE debugger to
reverse-engineer the undocumented preload interface, and Stacker 3.1's use of
the preload interface when running under MS-DOS 6, constituted trade-secrets
misappropriation.


Why is This Interface Different from All Other Interfaces?


The jury's decision seems odd because Stac's reverse-engineering of the
undocumented preload interface, for use in Stacker 3.1 under MS-DOS 6, is no
different from dozens of previous uses of undocumented interfaces. Microsoft
has never before claimed that undocumented interfaces were "trade secrets."
The list of utilities that employ undocumented DOS or Windows interfaces is
quite long. A few examples include the Norton Utilities, Central Point PC
Tools, 386MAX, QEMM/386, DesqView, NetWare, and Sidekick. Microsoft has never
claimed that any of these products were misappropriating trade secrets. In
fact, Stac itself used other undocumented DOS calls in Stacker 1, 2, and 3.
This is the first time that calling an undocumented function has been viewed
as stealing a trade secret. Either Microsoft has decided to call into question
the entire past history of the PC software industry, or it somehow views
Stac's use of the undocumented preload interface as different from all
previous uses of undocumented calls.
The preload interface is what IO.SYS in MS-DOS 6.0 and higher uses to load a
block device driver named DBLSPACE.BIN early in the DOS boot sequence, before
processing CONFIG.SYS. A description of all the preload calls is given in
Geoff Chappell's DOS Internals (Addison-Wesley, 1994). A partial description
of the preload is given in Undocumented DOS, second edition (Addison-Wesley,
1993).
Frankly, the preload interface is no big deal. As Chappell puts it, the
preload "has the appearance of a hack." Figure 1 shows a pseudocode summary of
this $13.6 million interface, as seen from IO.SYS's perspective. According to
Microsoft, this interface is a valuable "trade secret" that took one man-year
to develop. To be preloaded, a driver must respond to these calls. What Stac
did was figure out this interface, and modify Stacker to respond appropriately
so that it would be preloaded under DOS 6.
It is difficult to see what makes this interface a valuable trade secret. With
the publication of Chappell's book, and to a certain extent with the earlier
publication of the second edition of Undocumented DOS, it's no longer a
secret. In any case, it seems no different from any other undocumented DOS
interfaces, such as the network redirector, the List of Lists, the Swappable
Data Area, the once-undocumented interfaces used by TSRs, or the COMMAND.COM
installable command interface, all of which applications have been using for
years.
Consider Microsoft's response to my article, "Examining the Windows AARD
Detection Code" (DDJ, September 1993). This article was explicitly based on
reverse-engineering an encrypted piece of code in WIN.COM that attempts to
detect if the user is running Windows on a non-Microsoft version of DOS. The
AARD code could only achieve its purpose if other vendors didn't know what the
code was testing for. If any piece of commercially available code were a
"trade secret," it would have to be this. Yet, in Microsoft vice president
Brad Silverberg's response ("Letters," DDJ, January 1994), not once did he
claim that this code was a trade secret, or that I shouldn't have
reverse-engineered it. Why the preload interface is a trade secret, while the
AARD code--which actually requires secrecy to serve its purpose--is not, is a
mystery.
Another example of how Microsoft's case against Stac represents a 180-degree
turn from previously held positions is the "Microsoft Statement on the Subject
of Undocumented APIs," issued on August 31, 1992 in response to a controversy
in the press over the book Undocumented Windows. A Q&A section in Microsoft's
statement included the question, "Why are there undocumented APIs?" The answer
provided half a dozen reasons, but the explanation that undocumented APIs are
trade secrets was nowhere among them. To the question, "How do ISVs uncover
undocumented APIs?," Microsoft answered that "Finding these APIs is quite
simple using the many debuggers available in the market." That, naturally, is
what Stac tried to point out during the trial.
So what explains the jury's verdict? Remember that Microsoft didn't win its
"breach of contract" claim, so the standard "You may not reverse engineer,
decompile, or disassemble the software" boilerplate in Microsoft's beta
agreements and shrink-wrap licenses was not an issue. These attempted
limitations on reverse engineering have not been found enforceable, and the
Stac verdict does not seem to change this.


Copying the Design?


Instead, the jury seems to have been convinced that Stac "copied the design"
of the preload, a phrase repeatedly used by Microsoft. Perhaps the jury, in
awarding damages to both Stac and Microsoft, felt that there was some parallel
between Microsoft's patent infringement and Stac's reverse engineering.
The "copied the design" phrase is interesting, since Microsoft made no
copyright infringement claim against Stac. Microsoft's expert witness said
that Stac didn't copy the "literal program code itself." But, he said, "they
copied the design part." This refers to nothing more than the fact that
Stacker 3.1 can be preloaded under MS-DOS 6, just like DoubleSpace.
This nebulous "copied the design" slogan obscured the point that Stac merely
figured out how to be compatible with an interface in MS-DOS 6. Stac didn't
release a competing DOS, but a product that used an undocumented feature in
DOS. Stac didn't copy the feature or design in their product; they interfaced
with it. This distinction between copying and interfacing is clear to any
programmer who has ever figured out an API and called it. However, the jury
decided that using the BOOT command in Soft-ICE to reverse engineer an
undocumented interface, and then using this interface, is equivalent to
copying a design.
Stac never saw Microsoft's source code; it learned everything it needed using
Soft-ICE on the binary MS-DOS code, but the jury may have thought that Stac
had taken MS's source code, viewing this as parallel to MS's infringement of
Stac's patent. An important part of Microsoft's case was a persistent use of
the term "source code" to describe disassembled listings. Microsoft made
frequent, irrelevant reference to how it "protects the underlying source code
as a trade secret."
Since Stac had never seen or taken Microsoft's actual source code, Microsoft
had to make it seem as if disassembly could produce an equivalent to the
original source code. Thus, referring to Stac's reverse engineering,
Microsoft's expert witness said that Stac "spent a great deal of time
effectively putting the comments back into the disassembled code." This
reflects a basic misunderstanding of what reverse engineering can and can't
do. It can't "put back" the original source-code comments, variable names, or
function names. A developer can, of course, come up with new comments and
names. But he cannot "put back" anything that has been removed during
compilation or assembly. Reverse engineering cannot turn a binary product back
into the original source code for the same reason that you can't turn a
McDonald's hamburger back into a cow. Yet, this "put back" phrase, and the
image of source-code copying it suggested, may have carried some weight with
the jury.


Trust Us


If reverse engineering and calling an undocumented interface now constitute
"copying a design," it is not clear how developers are supposed to develop
their products when sufficient documentation is absent. Microsoft's expert
witness (whose most recent publication dates from 1974) merely recommended
what may have been appropriate in the period predating the PC mass-market
software industry: You must ask the vendor (whom, perhaps, you know on a
first-name basis) for additional information.
In deposition testimony quoted during the trial, Microsoft's expert was asked,
"Do you know what general industry practices are in circumstances in which
there isn't enough information about the operating system available from the
operating-system vendor?" His answer was simply "No." During trial,
Microsoft's attorneys suggested that "when you wanted information, you called
the person who owned the product."
Thus, Microsoft's position is to turn back the clock: Trust the vendor. No
means of independent discovery are required, since the operating-systems
vendor will supply all you need. Anything the vendor doesn't supply, by
definition you don't need.
Microsoft chairman Bill Gates testified on January 28. At one point, Gates was
asked by a Stac attorney if good examples of reverse engineering would include
buying a toy and figuring out how it was made, chemically analyzing a cookie
to determine its ingredients, or General Motors buying a Japanese car and
taking it apart. Gates agreed these were all good examples of reverse
engineering, but "I know in our industry that type of reverse engineering is
prevented."
This was the position that Microsoft put before the jury: that reverse
engineering, and the use of undocumented calls, is uncommon (indeed,
"prevented") in the PC software industry. This, Microsoft knows, is simply
not true.


A Chilling Effect?


It is possible that the verdict will have little effect. After all, Microsoft
claims that Stac's reverse engineering of preload is somehow "totally
different" from any previous reverse engineering that has gone on in the PC
software industry. This is absurd, but for those developers who have used an
undocumented interface, it is convenient. Probably the safest way to avoid a
trade-secrets misappropriation claim from Microsoft is to not sue Microsoft
for patent infringement. Consider, too, that because of the Justice Department
investigation of Microsoft, the company may not have as free a hand as it
would like in taking this verdict as a precedent for further action against
DOS-utilities vendors.
Both Stac and Microsoft immediately asked the judge to set aside the jury's
decision, and both companies will surely appeal. The final chapter has yet to
be written in this case.

Still, the Stac verdict does seem to establish the right for a company such as
Microsoft to declare, out of the blue, that one of its undocumented interfaces
is suddenly a trade secret. In cross-examination of MS-DOS product manager
Brad Chase, a Stac attorney asked how Microsoft decides whether it's okay to
reverse engineer the preload. Chase's answer: "There's no set rules. It's done
on a case-by-case basis." Taken seriously, this no-rules rule could turn
undocumented interfaces into an unpredictable legal minefield.
Figure 1: The $13.6 million interface: The DOS 6.0 preload.
use 21/4B03 (Load Overlay) to load file named \DBLSPACE.BIN
check that offset 12h in file == 2E2Ch // signature ",."
fp = offset 14h in file // function pointer to driver
(*fp)(ax = 6, es:bx = DD init packet) // modified device driver init
size = (*fp)(bx = 4) // query size
(*fp)(bx = 6, es = new location for driver) // relocate
(*fp)(bx = 2, ah = 55h, al = number of drives) // mount drives
(*fp)(bx = 0) // preload done: driver should hook INT 2Fh, etc.
for each device= line in CONFIG.SYS
 if it's a block device driver
 call 2F/4A11 bx = 0 // documented GetDriverInfo
 (*fp)(bx = 2, ah = 55h, al = number of drives)
 DBLSPACE.SYS /MOVE:
 2F/4A11 bx = -1 to get driver size
 2F/4A11 bx = -2 to move driver













































May, 1994
PROGRAMMER'S BOOKSHELF


Applying Cryptanalysis




Al Stevens


Al is a DDJ contributing editor. He can be reached through the DDJ offices or
on CompuServe at 71101,1262.


cryptography (krip-tog'-ra-fee) n. The art or process of writing in or
deciphering secret code.
A Sherlock Holmes story tells how Professor Moriarty, despite his superior
intelligence and mathematical skills, is unable to break a simple code, one
that uses a particular edition of a rare book as the key. The story does not
reveal the secret of the code until the end. The ciphertext--the encrypted
message--consists of page and word numbers. There is no repetition, no
underlying formula that Moriarty could derive from the seemingly random
numbers in the ciphertext. To decrypt the message, Moriarty has to know the
algorithm--the cipher--after which he needs a matching edition of the key
codebook, of which only two copies exist, and they are in the possession of
the covert correspondents. Unbreakable code, according to the artistic license
exercised by Conan Doyle.
Cryptography is nearly as old as the written word. Surreptitious communication
has been a requirement of governments and their armies since the beginning of
organized society. Some of the first applications for digital computers were
computations to assist codebreakers. Programmers take to encryption/decryption
algorithms naturally. The subject intrigues us because it involves complex
puzzles and their solutions and because the solutions reveal the locked-away
secrets of others. We like to use our wits to get into locked places, to
outsmart those who contrive the locks, to contrive locks of our own that
others cannot pick. To do so requires an understanding of
code-making--cryptography--and code-breaking--cryptanalysis.
Applied Cryptography, by Bruce Schneier, is a comprehensive treatment of the
subject, describing its concepts and demonstrating its processes. It is the
definitive work on cryptography for computer programmers. According to this
book, applications for cryptography fall into two camps--those meant to deter
the mildly curious and those that must stand up to aggressive and expert
assault. The dedicated cryptographer has no use for the former and strives for
perfection in the latter. Schneier is a dedicated practitioner. He
characterizes the first type of applications as being suitable for defending
against your kid sister and the second for standing off the agents of major
governments. Apparently there is no middle ground.
Applied Cryptography begins with a discussion of the terms and processes of
cryptography. It describes several classical manual encryption processes and
some computer algorithms. It even provides an example of a simple XOR
algorithm written in C, borrowed and slightly modified from a DDJ "C
Programming" column I wrote several years ago. I was pleased to see my code
included in this work until I read the part where the author called the
algorithm an embarrassment, one that produces code that is trivial to break,
good enough only to keep your kid sister from reading your files. I don't have
any sisters, so that wasn't my intent. The point of the example, however, is
that the simple algorithm is the same one that many respectable commercial
applications use for encryption. The message is that if you depend on those
applications for security in more than the most benign of environments, you
are being short-changed. Although Schneier references the DDJ column and uses
its code to make the point, he fails to say that the column pointed out the
algorithm's weaknesses, then implemented the much more secure DES algorithm
that month and again with some corrections two months later. That omission is
my only criticism of the book, and it is a personal one. If you weren't me,
you wouldn't notice and it wouldn't matter.
Although Applied Cryptography describes itself as a reference book, it also
serves as an advanced wall-to-wall tutorial on cryptography. You cannot turn
to a chapter and expect to understand everything without some preparation. For
example, you should read Chapter 2, "Protocol Building Blocks" before going
further. This chapter defines a cast of characters used throughout the book in
scenarios about messages, encryption, decryption, and codebreaking. If you are
interested in public-key algorithms, for example, and you jump to Chapter 12,
you will find yourself reading about situations involving Bob and Alice, with
no explanation of who they are and what roles they play. Chapter 2 is a
prerequisite to much of the rest of the book.
Applied Cryptography discusses the degree of security that each algorithm
offers. A cryptanalyst can use a computer to decrypt ciphertext with
brute-force techniques, but, depending on the algorithm, the time required for
a Cray to do it might be measured in multiples of the life of the universe.
Decryption of an encrypted message requires three things: the decryption
algorithm, the key, and the message itself. We encrypt messages because the
carrier is not secure, and we assume that an interloper can eavesdrop. Much of
the book's discussion about codebreaking assumes that the intruder knows which
encryption algorithm is being used. How they could know this is not always
obvious. In a benign environment, it could be as simple as looking at your
application and knowing or reverse-engineering the algorithm the application
employs. In a more hostile environment, the enemy has to apply other
intelligence-gathering techniques to find out how you encrypt your messages.
The book assumes rightly that such techniques exist and are effective. If the
snoop learns the key as well as the algorithm, then no codebreaking is
necessary.
Many encryption algorithms depend on the use of a random-number generator. If
my computer and your computer both generate identical random-number sequences
given the same seed, then we can build a reasonably secure cryptosystem, or so
it would seem. As long as the sequence is truly random with no repeating
patterns, there are no patterns for the codebreaker to observe. The book
points out the problem with these schemes. Computer random-number generators
are effectively only pseudorandom-number generators. Give them enough cycles,
and they will repeat themselves. Repetitions produce patterns, on which a
cryptanalyst thrives.
Novice cryptographers might assume that their security lies in the secrecy of
the algorithm itself. You might reason that a codebreaker can have the
ciphertext message and even the key and still not be able to decrypt the
message because the algorithm itself is unknown. For example, a file
compressed with a Huffman tree appears to be a random pattern of bytes, not
even the same length as the original plaintext. Strip the character-frequency
array and use it as a key, and an essential piece of the decryption
(decompression) is missing from the ciphertext. What is left for a codebreaker
to work with? It looks pretty secure to the casual observer.
There are flaws in this logic. First, the mission of espionage is to uncover
that which is supposed to be kept secret. Spies have ways of learning your
algorithm. Remember that, before you can send an encrypted message, you and
your correspondent have to agree on a secure mode of communication, including
the algorithm and the key. That agreement itself is a communication, carried
out before a secure mode is established, and subject to interception by a
potential codebreaker. Second, if you are intent on maintaining the strictest
security, you will select an algorithm known among cryptographers as being
secure. The enemy knows what those algorithms are, too, and often knows how to
determine the algorithm by analyzing patterns in the ciphertext. To quote the
book, "trusting algorithms is like trusting snake oil." In the example of the
Huffman pseudo-encryption, the ciphertext turns out to be a simple
substitution algorithm, except that the substitution tokens are
variable-length bit streams rather than characters. Substitution algorithms
are among the easiest to break, the book teaches us.
Rather than rely on the security of the algorithm itself, most cryptographers
use well-known algorithms that resist attack. Chapter 7, "Keys," tells us that
a truly strong algorithm resists all but a brute-force attack, in which the
attacker tries every possible key value. Depending on the key length, this
process can take from milliseconds for a 1-byte key to 10^25 years for a
16-byte key, which should be long enough. The best algorithms resist all but
brute-force attacks, but even they might not always be secure enough. Given
the rapid growth of computational speed, the advances in parallel processing,
and the willingness of governments to commit unlimited resources to the
gathering of intelligence, the day might come when brute-force computer
attacks are more productive than applied cryptanalysis.
Applied Cryptography says that cryptography involves protocols, each of which
involves two or more people who understand and agree on the protocol. A
codebreaker is an outsider who attacks the protocol. I had a problem with the
requirement for two or more people, which assumes that all encrypted
information is passed to another person. It does not provide for private
encryption of private information to be secure from the prying eyes of kid
sisters and other interlopers. This is only a matter of interpretation,
however. Ignore it, and the book loses no information.
The book teaches about private keys, session keys, public keys, digital
signatures, and numerous cryptography protocols. Each protocol includes a
scenario where the cast of characters exchange messages in ways that explain
the protocol. These discussions are invaluable to someone who is evaluating
encryption methods to select one that best matches a particular set of
requirements.
The association between public-key encryption and digital signatures is
interesting. Each user of a public-key cryptosystem publishes a public key.
Correspondents use the public key to encrypt messages to the user. An
associated private key, known only to the user, decrypts the message. Everyone
can send encrypted messages but only the intended receiver can decrypt them.
Inversely, a message encrypted with the private key can be decrypted only with
the associated public key. Receivers know that a broadcast message originated
with a particular user because only that user's public key successfully
decrypts the message. The encryption becomes, in effect, a digital signature.
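That inverse relationship can be shown with a toy RSA keypair. The numbers below are a textbook-sized example of our own choosing (real keys run to hundreds of digits), and the function names are ours: signing is exponentiation with the private exponent, and anyone holding the public exponent can verify.

```c
/* Square-and-multiply modular exponentiation; 64-bit intermediates are
   enough because the toy modulus fits in 32 bits. */
static unsigned long long modpow(unsigned long long base,
                                 unsigned long long exp,
                                 unsigned long long mod)
{
    unsigned long long result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

/* Toy keypair: n = 61 * 53 = 3233, e = 17, d = 2753
   (e * d = 1 mod 3120, where 3120 = 60 * 52). */
#define TOY_N 3233ULL
#define TOY_E 17ULL
#define TOY_D 2753ULL

/* "Sign" a message by raising it to the private exponent... */
static unsigned long long toy_sign(unsigned long long m)
{
    return modpow(m, TOY_D, TOY_N);
}

/* ...and verify by raising the signature to the public exponent. */
static unsigned long long toy_verify(unsigned long long s)
{
    return modpow(s, TOY_E, TOY_N);
}
```

Because only the holder of d could have produced a signature that e recovers, verification doubles as proof of origin.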
Schneier's book describes the most popular encryption algorithms, with
C-language source-code examples implementing some of them. The companion
diskette set,
which you must order directly from the author, includes implementations of
most algorithms, including DES, RSA, Diffie-Hellman, and PEM. A text file on
the diskette explains how to get PGP.
There is a two-page treatment of Clipper, the NSA chip with the backdoor that
only the government can open. The discussion touches on the privacy issues
associated with such an insidious scheme.
Chapter 18 is about politics. It describes the missions of various government
agencies and private organizations with respect to cryptography. It discusses
and identifies several software patents that apply. It warns you about
exporting cryptographic technology under threat of federal arms-trafficking
laws.
Applied Cryptography represents a monumental body of knowledge, particularly
to the programmer. I do not know of another work that encapsulates as much
information about cryptography and then supplies the computer code to
implement the algorithms that it describes. Even a programmer who is only
mildly interested in cryptography will find this book fascinating. If you plan
to put encryption logic into an application, get the companion diskette, which
contains many more source-code examples than are printed in the book and is a
virtual grab bag of encryption algorithms. At the same time, educate yourself
on the legal implications of using whichever algorithms you choose. They might
be patented or you might need a license to export them.
No matter how you use this book, though, Applied Cryptography is an
interesting and comprehensive explanation of an enigmatic subject, and well
worth the time you will spend with it.
Applied Cryptography: Protocols, Algorithms, and Source Code in C
Bruce Schneier
John Wiley & Sons, 1994, 618 pp. $44.95
ISBN 0-471-59756-2























May, 1994
OF INTEREST
The Hypersignal for Windows Advanced Transmission Library from Hyperception is
a set of design and analysis blocks for designing, analyzing, and prototyping
digital transmission systems--radio, wireline, fiber-optic, and the like.
Designed by John Bellamy, author of the book Digital Telephony, the
Hypersignal transmission library provides software simulation of baseband
transmission models, modulation/demodulation (FSK, PSK, and QAM), arbitrary
filter design, system-performance measures (BER, eye patterns, and jitter
analysis), and the like. The transmission library, which is used in
conjunction with the Hypersignal for Windows Block Diagram toolkit, sells for
$1495.00. Reader service no. 20.
Hyperception
9550 Skillman, Suite 302
Dallas, TX 75243
214-343-8525
The Software Patent Institute has published its first issue of The SPI
Reporter, a quarterly newsletter that keeps developers up to date on the
latest developments in software patents. Published for and distributed to
members of the Software Patent Institute, the premier issue of the 12-page newsletter
provided an overview of SPI services and activities (the SPI online database,
for instance), as well as recent patent-related developments (software-patent
hearings, patent applications, and the like). Additionally, future issues will
focus on the SPI patent-database project, educational and fund-raising
activities, and so on. Contact the Software Patent Institute for information
on The SPI Reporter and other services. Reader service no. 21.
Software Patent Institute
2901 Hubbard Road
Ann Arbor, MI 48105-2467
313-769-4606
spi@iti.org
Image Processing in C, by Dwayne Phillips, has been released by R&D
Publications and will be distributed by Prentice Hall. The book, based on the
author's long-running series in The C Users Journal, examines the basic
concepts of analyzing and enhancing digital images. Central to the book is the
C Image Processing System (CIPS), software that lets you read, write, display,
and print TIFF images. ISBN 0-13-104548-2. The 500-page book retails for
$40.00. Reader service no. 22.
Prentice Hall
P.O. Box 11073
Des Moines, IA 50381-1073
515-284-6751
A pair of programming libraries for Texas Instruments' TMS320C40
digital-signal processor has been released by Sinectonalysis. DSP/Veclib
includes optimized DSP routines for Fourier transforms, convolutions and
correlations, spectral analysis, filtering, image processing, data
compression, and the like. In all, the library includes about 300 low-level
functions. DSP/Veclib supports environments that use the TI C compiler (SPOX,
3L, C, and Virtuoso), as well as C++ and Ada.
STD/Mathlib is a run-time library of 33 math functions that includes
hand-coded algorithms for trigonometric, transcendental, hyperbolic, log,
and square-root functions, among others. STD/Mathlib is compatible with any language
(including C++ and Ada) that adheres to TI C conventions.
Both libraries are available for DOS and Sun OS platforms. DSP/Veclib sells
for $3000.00 (both DOS and Sun), while STD/Mathlib is priced at $495.00 (for
DOS) and $695.00 (for Sun). Reader service no. 23.
Sinectonalysis Inc.
24 Murray Road
West Newton, MA 02165
617-894-8296
Up until now, even if you wanted to use the Qualitas Memory Tester only to
test RAM, you had to purchase Qualitas' 386MAX or BlueMAX packages. Now,
however, Qualitas has released RAMexam, a stand-alone version of its Memory
Tester. Unlike parity testing, which only catches a certain class of errors,
RAMexam uses a fault model based on the multiple ways in which RAM is known to
fail. This model employs a strategy of specific sequences of bit patterns
designed to detect specific types of memory failures. The DOS-based utility
sells for $29.95. Reader service no. 24.
Qualitas Inc.
7101 Wisconsin Ave., Suite 1024
Bethesda, MD 20814
301-907-6700
If you've had trouble keeping track of software bugs during the development
cycle, BugTraker from AFTek Software may enable you to keep better records.
Among other things, the program logs the type of bugs found in beta releases,
the number of unresolved bugs, and when the bugs were resolved. The
Windows-hosted program retails for $119.00. Reader service no. 25.
AFTek Software
Box 383
Troy, NH 03456
603-242-3876
If you're considering becoming an Apple Newton developer, it goes without
saying that you'll need a debugger. To that end, Creative Digital Systems has
released ViewFrame, a Newton debugging and exploration tool. The system,
written by Jason Harper, includes: ViewFrame, a utility that lets you examine
and modify Newton objects; ViewFrame Editor, a NotePad-like editor for
entering and executing NewtonScript; and the Programmer's Keyboard, a software
Newton keyboard for quick entry of NewtonScript expressions. ViewFrame sells
for $70.00. Reader service no. 26.
Creative Digital Systems
293 Corbett Ave.
San Francisco, CA 94114
415-621-4252
Designer Widgets are the latest Visual Basic custom controls from Sheridan
Software. The Dockable Toolbar control, for instance, lets you create floating
toolbars of buttons that the user can "dock" (attach) to the top, sides, or
bottom of an MDI form. Index-Tab controls let you design dialogs using the
index-tab metaphor to group collections of related options. The FormFX
control lets you customize the look of forms by manipulating captions and
borders. You can include multiline text and pictures, adjust fonts, or add a
3-D look. Designer Widgets sells for $129.00. Reader service no. 27.
Sheridan Software Systems
35 Pinelawn Road, Suite 206E
Melville, NY 11747
516-753-0985
Hummingbird Communications has announced that it is shipping eXceed/OS/2,
X-server software for OS/2. eXceed/OS/2 allows OS/2-based PCs to connect to
and display applications from X Window System-based computers running under UNIX and
VMS. The program also allows you to cut and paste data between the different
environments.
eXceed/OS/2 is a 32-bit X display server that runs as a native OS/2 app, fully
supporting X Window System Release 5. The software uses OS/2's 32-bit
multitasking architecture to enable concurrent execution of apps. The software
is compatible with IBM TCP/IP 1.2.1 and 2.0, FTP PC-TCP for OS/2 version 1.3,
and Novell LAN Workplace for OS/2 version 3.0. eXceed/OS/2 sells for $545.00.
Reader service no. 28.
Hummingbird Communications Ltd.
2900 John St., Unit 4
Markham, ON
Canada L3R 5G3
905-470-1203
Database developers who want to visually design their applications can turn to
a new set of tools that supports Sequiter Software's CodeBase 5.1 C, C++,
Basic, and Pascal DBMS development systems. CodeReporter 2.0, a Windows
interface designer and developer's report writer, includes an Instant Report
Wizard, drag-and-drop object-creation tools, a report API, and a
code-generation capability. Generated reports can run under DOS, OS/2,
Windows, NT, Macintosh System 7, and UNIX.
CodeControls 2.0 is a set of Windows custom controls for point-and-click
creation of a DBMS interface using Borland C++ Resource Workshop, Microsoft
Visual C++ App Studio, and Visual Basic. Among the tools is a Master Control
that allows for the point-and-click of record positioning, searching,
deleting, and undo. You can also access the API directly, giving complete
control of interface objects. CodeBase 5.1 and CodeBase++ 5.1 retail for
$495.00; CodeBase 5.1 and CodePascal 5.1 sell for $245.00. Reader service no.
30.
Sequiter Software Inc.
9644-54 Ave., Suite 209
Edmonton, AB
Canada T6E 5V1
403-437-2410
The SoftCop antipiracy software system has been released by SoftCopy
International. Using the software, developers embed security codes onto disks
or CD-ROMs during the development cycle. As the user installs the protected
application onto his PC, the SoftCop system prompts the user to call an 800
number for an authorization code. That code will be provided in 30 to 90
seconds by Bell Sygma, a Bell Canada subsidiary and SoftCop partner. Contact
SoftCop for licensing information. Reader service no. 29.
SoftCop International
920 Brant Street, Suite 5
Burlington, ON
Canada L7R 4J1
905-681-3213
Ligature Software has introduced the CharacterEyes SDK that enables developers
to integrate neural-network-based optical character recognition (OCR) into
software. The company claims that the CharacterEyes package allows users to
capture text at up to 300 characters/second with up to 99.6 percent accuracy.
The package supports a wide variety of fonts, including obscure typefaces such
as Greek letters, Gothic type, and all Western (Latin) character sets.
CharacterEyes supports all major scanners and image file formats, as well as
the TWAIN standard. CharacterEyes-based software requires less than 3 Mbytes
of hard-disk space and 4 Mbytes of RAM. The CharacterEyes SDK lets you
integrate Ligature's OCR engine into your applications. The SDK includes a
group of 32-bit DLLs containing the API, a C-syntax header file containing
definitions and data types, and external resource files including third-party
DLLs. The SDK, currently available only for Windows (the Macintosh version
will be ready by mid-1994) sells for $2995.00. Reader service no. 31.
Ligature Software Inc.
26 Burlington Mall Road, Suite 300
Burlington, MA 01803
617-238-6734
The Open Software Foundation has announced that OSF/Motif has been formally
approved by a unanimous vote of the balloting group of the IEEE Computer
Society Standards Board as the GUI standard P1295--the Modular Toolkit
Environment. The P1295 standard specification of more than 600 pages is
derived from the OSF/Motif documentation set, and specifies both the API and
appearance and behavior (look and feel) for Motif. Motif 1.2 is compliant with
the standard.
Additionally, the Motif API specification has also been submitted to X/Open's
Fast-Track review process. The Motif specification under X/Open review is a
superset consistent with the core functionality in P1295, and is also derived
from the OSF Motif documentation. The X/Open version of the Motif
specification contains additional Motif features such as drag-and-drop,
internationalization, and gadgets. Reader service no. 32.
Open Software Foundation
11 Cambridge Center
Cambridge, MA 02142
617-621-8700
Now that they've buried their respective GUI hatchets, Microsoft and Apple
have announced an agreement involving the interoperability of their messaging
and directory services.
As part of the agreement, the two companies will jointly develop a suite of
Messaging Application Program Interface (MAPI) service providers, and Apple
Open Collaboration Environment (AOCE) technology-based gateways to allow
programmers to build cross-platform client-server solutions.
The new gateways and service providers will provide MAPI-compliant
applications for Windows access to Apple's PowerShare Collaboration Servers;
and will also give applications (based on the AOCE technology and PowerTalk
APIs on Macs and PowerBooks) access to Microsoft information management
software. Microsoft and Apple also agreed to provide a gateway between their
server products and to support the basic send capability Common Mail Calls in
their products. Reader service no. 33.
Apple Computer
20525 Mariani Ave.
Cupertino, CA 95014
408-996-1010
Microsoft Corp.
1 Microsoft Way
Redmond, WA 98052
206-882-8080
If your development work requires you to keep more than one operating system
installed on your hard disk, V Communications' System Commander--a utility
that lets you install up to 42 different operating systems on a single PC--may
be just the tool you've been waiting for. System Commander manages up to 16
different versions of DOS as well as 26 other Intel-compatible operating
systems without your having to repartition your hard disk. Among others, these
OSs include MS-DOS 5.0/6.2, PC-DOS 6.1, Novell-DOS 7.0, Windows 3.1/4.0,
Windows for Workgroups, NT, OS/2 2.x, UNIX, and NetWare.
When loaded, System Commander displays a menu on boot-up. You simply cursor to
the OS you want, then run it. The utility handles CONFIG.SYS and AUTOEXEC.BAT
files for each version of DOS. System Commander sells for $99.95. Reader
service no. 34.
V Communications
4320 Stevens Creek Blvd., Suite 275
San Jose, CA 95129
408-296-4224


























May, 1994
SWAINE'S FLAMES


Prodigy Hits Potholes on the Information Highway


I see that Prodigy, the IBM-Sears joint venture in information-highway
construction, is still not profitable. This despite the fact that, with some
two million users, it has the largest customer base of any online service.
This despite the fact that it has a revenue stream most other online services
don't have--it carries paid advertising.
This despite the fact that they couldn't have spent much on that interface.
Still, I'm not surprised. Prodigy could be a case study in not understanding a
market. This is just my opinion, but...
Remember that censorship flap a couple of years back? The gist of it was,
Prodigy management decided it didn't like the content of some of the messages
being sent among users on its service and kicked some users off. And stirred
up a hornet's nest in the process.
Now, you can argue about whether online services have the right--or the
responsibility--to control the content of messages among users on the service.
You can raise the issue of pornography or of national security. You can posit
a need to keep commercial messages off a medium intended for noncommercial
communications.
You can try to defend Prodigy's position, but I don't think you can defend the
company's handling of the matter. Prodigy was apparently totally blindsided by
the vehement reaction of its users. Does that matter? After all, Prodigy does
have two million users; the censorship brouhaha couldn't have hurt it too
badly. I think it does matter. I think it demonstrates Prodigy management's
misunderstanding of its audience and of the nature of its business.
Then there's the advertising.
Prodigy was designed to tie two kinds of material together: material that you
select to read and material that some third party has paid to have presented
to you--that is, ads. You select the first, and you get the second, on the
same screen, like it or not.
Now, that's not a new idea. It's how magazines work. Magazines are physical
bundles of editorial material and advertising. You can't buy just the
editorial part of the magazine. This is part of an odd economic model, in
which you, the reader, don't pay what it costs to produce and distribute the
editorial material. Your reading of that material is subsidized by the
advertisers. Personally, I think the magazine model has a lot of advantages,
given the limitations of the print medium, but would this model hold up in a
medium where you can freely choose, and pay for, only what you really want to
read?
Prodigy attempted to extend the magazine model to electronic publishing. This
strikes me as a fundamental error. In the short term, it may be possible to
force awkward bundles of information on people (ads and edit mixed on one
screen), but it runs counter to the nature of the medium. One of the primary
reasons for storing information electronically is so that you can access it in
the way that is appropriate for you. Any system--or service--that works
against this logic is inherently crippled. And in the long run, crippled
systems and services don't compete well with healthy ones.
Does any of this explain Prodigy's current unprofitability? Probably not. But
I think it does call into question the service's long-run viability.
Michael Swaine
editor-at-large









































June, 1994
EDITORIAL


The Check is in the E-mail


Timing is everything, especially for those who enjoy irony at the expense of
public relations. Fast on the heels of the U.S. Postal Service's announcement
that it wants to raise rates next year, for instance, is the news that an
enterprising mail carrier in Chicago found a way of delivering on the promise
of a paperless society--just dump a couple hundred pounds of mail and put a
match to it. (By the time the fire department finished putting out the flames,
it really was junk mail.)
Still, there's nothing funny about rate increases that will average about 10
percent for all classes of mail. For most of us, this hike translates to three
cents more for mailing your mother a birthday card, four cents extra to send
you this magazine, seven cents more for your bank to mail out a statement, and
an additional $1 for overnight express mail.
The Postal Service is caught between a rock and a postage meter. On one hand,
the mail has to go everywhere to everyone--an expensive, yet not necessarily
profitable, process. On the other hand, the Post Office has to battle
competitors who aren't compelled to provide universal access, but who can
target and dine off of highly profitable markets such as business-to-business
overnight mail. And to make things worse, innovations in electronic
communication are posing new challenges. As Keith Smith, a senior marketing
analyst for the Postal Service, recently said, "The old postal monopoly has
been rendered obsolete by technology advances."
The impact of technology on the Post Office varies, depending on which
business you're looking at and how that technology is applied. Electronic
communication has had the greatest impact on the household-to-household
segment. (How many times have you phoned your mother instead of dropping her a
letter?) The business-to-business segment, however, has been more affected by
fax machines. Although no one really knows how many faxes are sent daily,
Smith estimates that 43 percent of them replace first-class mail and 33
percent overnight express. (Interestingly, we're in the midst of a nationwide
phenomenon whereby the growth in the number of phone lines being installed is
outstripping the growth in population. It's a safe bet that fax machines and
modems are tied to most of those lines.) In total, the Postal Service guesses
that it has lost about $2 billion to fax machines in the last few years.
Not that you can blame consumers. According to Southwestern Bell, it costs
just six cents at off-peak rates to send a one-page fax the 30 miles from
Dallas to Fort Worth (assuming a one-minute transmission time). Next year,
sending that same one-page document by first-class mail will cost 32 cents and
take a day or two at best. Granted, a first-class letter doesn't require the
up-front costs of fax machines and telephone lines. Nevertheless, the volume
of first-class mail continues to climb, contrary to the Office of Technology
Assessment's 1988 prediction that e-mail and faxes would cut first-class mail
to 40 billion pieces by 1990. Instead, it climbed to more than 92 billion
pieces in 1993.
To deal with technological innovations, the Postal Service recently
established a Technology Applications group that's charged with examining
advanced technologies, developing new products, and performing competitive
analysis. It's no surprise that handwriting recognition is high on their
priority list. Consider that sorting mail by hand used to cost the Post Office
$42 per thousand pieces. Current-generation automated systems have pared this
to $19 per thousand. Next-generation systems--based on handwriting recognition
and remote bar-coding--will cut this to as little as $3 per thousand pieces.
Looming on the electronic horizon, of course, are computer networks. According
to Smith, the Postal Service "is paying a lot of attention to the information
superhighway," even though most projects on the drawing board are
entertainment services aimed at upper-income consumers--a notion contrary to
the Post Office's charter of universal access. In recent hearings before the
Senate Governmental Affairs subcommittee, Postmaster General Marvin Runyon
reaffirmed this, stating that the Postal Service is "working with other
organizations to develop an interactive information kiosk and provide a
platform that can be used by other federal agencies." He went on to suggest
that "perhaps [post office] lobbies could serve as on-ramps, providing access
to anyone who wants to be on the electronic superhighway."
This is all well and good, but Runyon didn't stop there. He then stepped off
into the minefield of electronic security, privacy, and the Clipper chip,
saying that perhaps the Post Office should be certifying electronic messages
to safeguard privacy, "securing one company's market-sensitive information
from the intruding eyes of its competitors." Other postal officials
subsequently acknowledged that forays into encryption and security would be in
coordination with Commerce and Justice Department efforts.
The Postal Service has generally done a good job of cutting costs and
improving service, thereby fulfilling its central mandate. But teaming up with
agencies who would infringe upon our individual freedoms using Clipper-based
encryption technology runs counter to the idea of universal access and is an
idea that belongs in the dead-letter box.
Jonathan Erickson
editor-in-chief












































June, 1994
LETTERS


Revisiting Variably Dimensioned Arrays in C




Dear DDJ,


Many scientific and engineering programmers are changing from Fortran to C or
C++ for technical and numerical applications. As with Fortran, C and C++
allow general-purpose routines to be developed separately, collected into
libraries, and linked later with a variety of programs.
However, a serious weakness of C/C++ is the inability of either language to
handle arrays with two or more dimensions as arguments to a function--if the
array sizes are to be specified at run time.
With single-dimensioned arrays, there is no problem. In C or C++, the array
dimension can be omitted in the function (for example, float a[];). With
arrays of two or more dimensions, all sizes except the first must be known at
compile time, as these sizes are used by the compiler in generating the
machine instructions to access the array elements.
This has been a major cause for the slowness of scientists, mathematicians,
and engineers to accept C or C++ as a viable language for their work, despite
the availability of better data and program structuring features and
recursion.
In his article, "Variably Dimensioned Arrays in C" (DDJ, August 1993), John
Ross discussed a number of methods for overcoming this weakness of C or C++.
The most useful and transparent method for the user of the function or library
is to use pointer variables instead of arrays. The person using the library
would write a program calling such a function by passing the address of the
array and its dimensions as arguments. The person writing the function itself
would use the pointer variable and increment it as needed to process the
desired array elements.
This approach is most suitable if the array can be processed sequentially. It
has the disadvantage for the function writer that the use of pointers and the
need for sequential processing of the array can make the function itself
difficult to understand and write.
However, the approach can be extended and simplified by using the
preprocessor. It is possible to define a macro for performing an array-address
calculation. Then elements of an array can be referred to with a convenient
notation, and the function becomes much more readable.
To calculate the address of an element of the array, we need to add to the
base address (for instance, the address of the start of the array), the number
of elements in the rows before the current one, and the position of the
desired element in its row. If a is the start address of the array, m the
number of rows, and n the number of columns, the address of the i,j element
will be a+i*n+j.
A suitable macro might be #define A(I,J) (*(a+(I)*n+(J))), where we are using
the uppercase A to refer to elements of the array passed through the pointer
argument a. The macro notation differs from C array references, but it will
be familiar to Fortran programmers. We can refer to an
element of the array as A(1,3), for example.
This is not as fast as directly incrementing pointer variables, as John
demonstrated, but makes the program easier to read--and will be applicable to
a wider range of applications.
For example, an algorithm for matrix multiplication (Example 1) and its
implementation in C (Example 2) show a direct correspondence of the lines,
made possible by the use of simple macros for array references. Such a
function can be called from a C program where the arrays are declared normally
(Example 3).
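A minimal sketch of the technique (our own illustration along the lines Freeman describes, not the letter's exact examples): a matrix-multiplication function takes pointer arguments plus run-time dimensions and hides the address arithmetic behind macros, so the loop body reads like the textbook algorithm.

```c
/* Row-major address arithmetic, hidden behind macros so the function
   body can use Fortran-style A(i,j) references. The macros pick up
   a, b, c, n, and p from the enclosing function's parameters. */
#define A(I, J) (*(a + (I) * n + (J)))
#define B(I, J) (*(b + (I) * p + (J)))
#define C(I, J) (*(c + (I) * p + (J)))

/* c = a * b, where a is m x n, b is n x p, and c is m x p, all passed
   as pointers to their first elements with dimensions given at run time. */
static void matmul(const double *a, const double *b, double *c,
                   int m, int n, int p)
{
    int i, j, k;
    for (i = 0; i < m; i++)
        for (j = 0; j < p; j++) {
            C(i, j) = 0.0;
            for (k = 0; k < n; k++)
                C(i, j) += A(i, k) * B(k, j);
        }
}
```

A caller declares its arrays normally and passes the address of the first element, for example matmul(&x[0][0], &y[0][0], &z[0][0], 2, 3, 2); for a 2x3 times 3x2 product.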
You have to be careful if you try to mix such C functions with Fortran
programs, because the two languages use different storage orders for the
elements. Fortran holds the elements of an array column-wise, whereas C and
C++ hold the elements row-wise. This, in effect, is a reversal of the order of
the dimensions.
T. Graham Freeman
Australian Defense Force Academy


Reverse Engineering




Dear DDJ,


Regarding your editorial on reverse engineering (DDJ, March 1994), you're
right in saying that most programmers end up doing some reverse engineering,
but in my experience it is rarely out of choice. Most of us are not in the
business of reverse engineering some system just to find out how it does some
neat trick; we're reverse engineering it because we have to make our own code
work with it. When supplied systems have bugs, or are incorrectly documented,
or even undocumented, what is the alternative to reverse engineering? In my
opinion, the European Community law has it right--when you have to make your
code work with somebody else's system, you need to be able to reverse
engineer. That's not lenient, that's common sense! But I'm prepared to make a
concession. When someone delivers a system that is completely and correctly
documented, and that is entirely free from bugs, I'll be more than happy to
agree not to reverse engineer it.
John Hoyland
Swindon, England


One Ringy-Dingy...




Dear DDJ,


Your April 1994 editorial about Ma Bell hit a nerve. In November 1993,
Southwestern Bell implemented a plan called the "Metropolitan Calling Area"
(or MCA), which is supposed to reduce our phone rates. Yeah, right! In St.
Charles county, near St. Louis, SW Bell raised the basic phone rates 25
percent across the board in return for the ability to call places halfway
across the state for free. (Well, free if you don't consider the 25 percent
increase in your phone rates.) The way this was passed is that they held a
series of meetings out in the sticks, asking people if they wanted the
service. Of course, these people fell all over themselves as the 25 percent
increase was a drop in the bucket compared to their long-distance bills to
call St. Louis. Evidently armed with this support, SW Bell convinced the Public Service
Commission (PSC) to raise the rates. Conveniently, they had only one public
meeting in our area. SW Bell conveniently ignored the opinions of hundreds of
thousands of people living close to St. Louis.
If that weren't bad enough, the plan wasn't fully implemented after the rates
went up 25 percent. In the St. Charles area, the plan will not be fully
functional until the fall of 1994. So for 8 to 10 months, we have to pay 25
percent higher bills and still pay long-distance charges to call the places we
are supposed to be able to call for free. We are, in effect, being
double-billed for the calls. SW Bell doesn't think that there is anything
wrong with raising our rates to provide a service, then not providing the
service. Letters to the PSC get double-talk responses saying that the plan
can't be implemented all at once, and 8 to 10 months of double billing seems
reasonable to them.
It's just like Lily Tomlin used to say on the TV show "Laugh In": "the phone
company is omnipotent."
P. Lyle Mariam
St. Charles, Missouri
DDJ responds: Thanks for your note, Lyle. Subsequent to the April issue,
Southwestern Bell began a similar push in the Kansas legislature (where it is
the state's leading lobbyist) to free itself from the shackles of public
oversight. Partly in response to SW Bell's efforts, both Missouri and Kansas
are launching extensive studies to examine what their respective
telecommunication system needs will be in the future. A Missouri bill has, in
fact, proposed the establishment of a Commission on Information Technology
which will focus on future telecommunication needs in the education and health
fields.


Help for Help





Dear DDJ,


As the author of the Help Magician, I'd like to address "Help for Windows Help
Authors," by Al Stevens (DDJ, April 1994).
Al made several negative comments about the 3-D interface. In talking to our
customers, however, we find that the overwhelming majority likes the
interface. Those who do not, however, are the most vocal.
Regarding Help Magician's functionality, in addition to entering the page
number of the topic to access, the user can use Ctrl+PgUp and Ctrl+PgDn; click
on the PgUp and PgDn button; or go to topic by title, context string, and
context number.
It is not necessary to compile the entire help file to see the results of a
particular topic. There is a One Page Preview function that compiles faster.
Not only are there left and right border functions, but complete paragraph
formatting is available from the Paragraph submenu. Al did discover a bug.
While I can't say that we've never had bugs, I can say that we fix problems
the same day they are discovered and provide the customer with the help needed
to continue working effectively. Help Magician has been on the market for
about two years now and is very stable at this point.
In response to Al's request for a help tool that emulates WinHelp, our next
release, Version 3.0, will have a WYSIWYG editor that will look like WinHelp,
with a built-in test mode. It will also have a document-to-help conversion, a
Visual Basic source-code scanner that builds a help-file shell, automatic
glossary generation, a macro editor, a build-tag manager, and a more-standard
Windows interface.
Robert B. Heberger
Foster, Rhode Island


Dear DDJ,


I agree with Al Stevens (DDJ, April 1994) that Help Magician's interface is a
little too "3-D-ish." However, there is a checkable menu item that lets you
turn it off. I also agree that the Help Magician approach isn't quite as
polished as it might be--the paired markers are a little tough to work with,
especially if you're importing an existing project; but if you're starting
from scratch and have time to get more familiar with the tool, you can get
used to it.
Al didn't address the issues of speed and support. I've used RoboHelp for
years. Compared to Help Magician, RoboHelp is slow. With RoboHelp, lookups in
list boxes for jumps can take forever on large projects or in multifile
projects. Help Magician is lightning fast, especially when it comes to RTF
generation.
On the support side, I've never worked with a company as responsive to
developer input as the Help Magician folks. For instance, I found a small
glitch in the sizing of the primary help window. When you enter 0,0,0,0 for
location, height, and width while controlling the color of the nonscrolling
region in the main help window, Help Magician follows Microsoft's
documentation and outputs a _,(0,0,0,0),, in the [Windows] section of the .HPJ
file. Microsoft acknowledges this error in its documentation, but the correction
didn't get worked into the current Help Magician release. When I called them
about it, I got a crisp, "You're right. We'll fix that immediately." When was
the last time you heard a tool ISV respond that way?
Richard L. Warren
CompuServe 70750,3436


Varhol is Sent to the Office




Dear DDJ,


In reference to Peter Varhol's February 1994 "Programmer's Bookshelf," in
which he states, "Wherever the school of the future ends up, we--as computer
professionals--should be leading the way," please stick to programming and
stay out of education. Heaven help us all!
No thank you Mr. Varhol, computer professionals should not be leading the way!
Your industry is in its precious adolescence and does not have the credibility
to lead any educational revolution. I applaud your interest, as a parent and
citizen, to get involved in your child's education but as an industry--keep
your collective butts at the programming desk until you get that right.
I apologize if I have overreacted to your article, but either you wanted an
overreaction and stimulated discussion, or you are participating in the
"lynch-mob" mentality that is running rampant throughout the country. Your
article is full of quasi-statistics that are fueling the feeding frenzy of
criticism of the American educational system. Public education in America is
not "broken." It works, every day, everywhere!
I submit most educators (myself included) found the same thing you did: "I
quickly concluded that LOGO was limiting and difficult to use compared to
other [teaching tools] and set it aside." I don't know about your profession,
but there is very limited time to adapt materials to classroom use. There are
many tools (and certainly LOGO is one of them) that I would love to adapt to
my classroom. It is a question of priorities.
Unfortunately, you can't complain about the high cost of education and then
"fix" the problem by spending more money. Education is expensive and will get
more expensive; "doubling" has no bearing, it just whips up the fervor.
Please, do get involved with your local school, but don't march in and tell us
you have all of the answers. Try hitting a home run first. Just sit down and
write some simple program like "Carmen Sandiego" or "LOGO." Use your
expertise, but use it carefully! Teachers may make some mistakes, but then the
computer-programming industry is based on mistakes.
Ralph Hammersborg
Seattle, Washington
DDJ
DDJ welcomes your comments and suggestions. Mail your letters to DDJ, 411
Borel Ave., San Mateo, CA 94402, or send them electronically to CompuServe
76704,50 or via MCI Mail, c/o DDJ. If your letter is lengthy or contains code,
we ask that you include a disk. Please state your name, city, and state. DDJ
reserves the right to edit letters for length and/or content.

Example 1: Simplest algorithm for array multiplication.
for i = 1 to m
 for j = 1 to p
 C(i,j) = 0
 for k = 1 to n
 C(i,j) = C(i,j)+A(i,k)*B(k,j);


Example 2: C/C++ function using macros to simplify array references.
void matmul(int m, int n, int p,
            float *a, float *b, float *c)
/* "a" is m rows by n cols,
   "b" is n rows by p cols, and
   "c" is m rows by p cols */
{
#define A(I,J) (*(a+(I)*n+(J)))
#define B(I,J) (*(b+(I)*p+(J)))
#define C(I,J) (*(c+(I)*p+(J)))
    int i, j, k;
    for (i=0; i<m; ++i) {
        for (j=0; j<p; ++j) {
            C(i,j) = 0;
            for (k=0; k<n; ++k) {
                C(i,j) += A(i,k)*B(k,j);
            }
        }
    }
#undef A
#undef B
#undef C
}


Example 3: Main program to call the function in Example 2.
void matmul(int m, int n, int p,
            float *a, float *b, float *c);

main()
{
    float aa[6][4],
          bb[4][7],
          cc[6][7];
    /* read numbers into aa and bb */
    matmul(6,4,7,aa,bb,cc);
    /* matmul(6,4,7,(float*)aa,(float*)bb,(float*)cc);
       would suppress the compiler's pointer-type warning messages */
    /* print cc */
}





























June, 1994
Developing GUIs for Database Applications


Two solutions for the same problem




John Rodley


John is president of AJR Co., a Cambridge, MA consulting firm. He can be
contacted on CompuServe at 72607,3142.


Easel and Enfin are two GUI builders from the same company, offering similar
capabilities via radically different programming languages. Easel, the elder
statesman of the two, first appeared nearly a decade ago, long before the
current object-oriented programming craze. Enfin, on the other hand, was
developed from the ground up as an object-oriented system based on the
Smalltalk environment.
In its early days, Easel was a stand-alone 4GL for DOS application
development. Today, it is just one component of Easel Corp.'s Enterprise
Workbench client/server development environment, which also includes the
DB/Assist point-and-click visual tool for SQL access. The Easel-language
compiler supports OS/2 2.1, Windows, and DOS. This article focuses on the OS/2
implementation referred to as "Easel/32" (the Windows version is "Easel/Win").
For its part, Enfin (available for both OS/2 and Windows) is an
object-oriented, visual development environment that lets you select display
objects (fields, buttons, list boxes, menus, windows, dialog boxes, and the
like), position them on the screen, and define their attributes. Enfin then
automatically generates source code. As you might expect, the Enfin
environment also provides an SQL front end, class browser, inspector,
profiler, interactive debugger, and DLL/DDE support. You can also write custom
routines in C, Cobol, assembler, or other low-level languages.
To examine the strengths and weaknesses of Easel/32 and Enfin, I've designed a
personal, total-health tracking and management application which exercises, in
some depth, the main features of both toolkits. The package allows an
individual to enter and analyze personal-health data, a Quicken for the body.
The heart of the application is a database of personal data: meals, workouts,
illnesses, treatments, weigh-ins, and checkups. The theory is that, by looking
at the data over time, you can get a better feel for how to manage your
health. Of course, no total-health package would be complete without drug,
anatomy, and general-health references--all areas where CD-ROM publishers have
provided strong products, so all I provide for them are hooks to start up
external applications. Figure 1 shows a simplified data model for the
application.


The Easel Language


Easel tries hard to be English-like and, up to a point, succeeds. Statements
such as "make GenericList_DialogBox visible" and "change IngredientName_EditField
text to StringVar" are typical Easel. While this makes Easel extremely
readable, there are problems. For example, the statements "change text of
IngredientName_EditField to StringVar" and "copy StringVar to text of
IngredientName_EditField" are both made up of valid Easel constructs and mean
the same thing as "change IngredientName_EditField text to StringVar" in
English. In Easel, however, they don't compute. A minor gripe, admittedly, but
unlike with other languages, no matter how long I work with Easel, I still
find myself making those kinds of mistakes.
Easel supports a number of basic data types (string, integer, float, and
boolean), arrays of each type, variable-length Easel structures, and
fixed-length structures suitable for use by external C-language DLLs. It also
supports a large number of built-in object types--dialog boxes; sense,
graphical, and dialog regions; fonts; image maps; and the like. All of the
standard Presentation Manager (PM) controls also get their own object type:
button, entry field, list box, and so on. Easel is an explicitly event-driven
language; thus, all the usual PM suspects--window open and close, button
click, as well as program start and end--have their own events. Easel supports
four different procedure types: actions, subroutines, functions, and response
blocks. Actions, subroutines, and functions are stand-alone pieces of code,
roughly corresponding to C's macros, call-by-reference functions, and
call-by-value functions. Response blocks are explicitly attached in their
declaration to specific events from specific objects. Whenever the event
occurs, the response block executes. References to built-in object types can
be either hard-wired object names or string variables containing a valid
object name. Most of the power of Easel comes from combining this bit of
abstraction with response blocks. Every possible event in the system can, but
doesn't have to, have a response block associated with it.
Easel supports a suite of familiar control structures: If/Then/ Else, Switch,
For, and While. You can also get outside Easel with relative ease by invoking
external executables, DDE messaging, or by calling external DLLs. The flow of
control in an Easel program is roughly: object creation -> event
generation -> procedure call. The procedure calls in turn can create new objects,
which generate more events, and so on. Figure 2 shows the flow in a typical
Easel program.


The Enfin-Smalltalk Language


Going to Enfin Smalltalk, I feel a little like the American in Paris who is
startled to find that "they have a different word for everything over here!"
Enfin is a Smalltalk variant, and knowing something about that language gives
you a head start. Though neither a Smalltalk/V nor Smalltalk-80 variant, Enfin
promises to converge on ANSI Smalltalk whenever that arrives.
Enfin Smalltalk is object oriented (perhaps "object obsessed" is more
appropriate) from the ground up. Things you might consider basic--data types,
integer, and float, for example--are mere subclasses of more abstract classes.
There are really only four basic elements to think about in Enfin: classes,
methods, objects, and messages. A class implements one or more methods. An
object provides an instance of a class. A message sent to that object selects
a method for that object to execute. The statement System beepFrequency: 400
duration: 100. sends the message beepFrequency:duration: to the object named
System. As an object-obsessed language, Enfin is almost completely untyped.
For instance, the statements:
X := 1. X beepFrequency: 400 duration: 100.
compile, even though X is a number. X's object type isn't determined until run
time, at which point it generates the error "instance method
beepFrequency:duration: not found in class smallInteger." Figure 3 shows the
flow of control for a Smalltalk program.
Since Enfin Smalltalk is nothing but objects and methods, the power of the
language lies in the supplied classes and the methods they support. Most of
the usual basic data types (bool, int, float, and so on) are supplied, and
they support the usual operators. Some of your favorite control flow is
available (conditional exec, loop-on-expression, and loop-on-index), as well
as loop-over-collection. It's important to remember that control structures
are implemented as messages with blocks of code as arguments. Thus, the
statement X ifTrue: ['yes' out]. is more properly read as send the ifTrue:
message to X with ['yes' out] as the code-block argument. This means you can
easily implement your own control flow by writing a method with a block
argument.


User Interface


I chose to use different-style interfaces for the two implementations to
reflect the strengths and Zen of the tools. The Easel implementation has a
traditional PM/Windows style, with a hierarchy of menus available from a menu
bar off the main screen. The Enfin implementation implements a drag-and-drop
interface. Figures 4 and 5 show the first two levels of the app in both
versions. In the Enfin version, any of the icons in the top two lines can be
dragged and dropped onto the Edit or New icons in order to create a new item,
or edit an existing item.
This choice has important implications in Enfin. To implement the traditional,
hierarchical model, I would have created a Controller, with a single Form
containing a menu and multiple Subforms. Instead, I have a Controller, with a
single Form containing only a series of WorkplaceObjects. These
WorkplaceObjects appear within the main form as icons and have forms attached
to them. The WorkplaceObjects are direct substitutes for the menu items that
would populate the menu bar of a hierarchical app. When one WorkplaceObject is
dropped on another, the open method for the receiver is called. The default
WorkplaceObject>>open method opens the Form associated with the receiving
WorkplaceObject.
In the Easel version, you pick a type of item to work on from the menu. Then,
from the second- level dialog, you either select an existing item to edit or
create a new one. Each menu item gets its own response block which gets called
automatically when the user selects the item. The response block queries the
database, formats the displayable list of items, then calls a generic-list
dialog box. When the user selects New or Edit from the generic-list box, the
response block written for these buttons loads the dialog named by the
parameter.
The flow of the application from main menu to edit-item dialog combines many
of the most powerful features of Easel: parameters, object access-by-name, and
item classes. All menu items are collected into a single class (it's probably
easier to think of Easel classes as arrays of objects rather than OOP
classes). This class gets a single response block. Choosing one of the items
in the class causes the class response block to run. Each menu item also has a
unique parameter. Calling parameter of object within the response block gets
the parameter of the menu item that triggered the response. The response block
constructs the names of the actions (unique to each menu item) to call using
the parameter. The unique action blocks format the data for the list box.
Listing One (page 92) shows the primary region, menu bar, menu item, item
class, and response block code for a new Workout item.


Database Access


External database access is central to both Easel and Enfin. For this
application, I chose IBM's DB2/2 32-bit relational database system, which is
designed for both LANs and single-user systems, and which supports application
clients on DOS, Windows, and OS/2. I used the single-user OS/2 version
supported by both Easel and Enfin, and I wrote a C program for database
creation and initialization (with sample data).
Though previous versions of the Easel Workbench accessed local databases by
starting up a command-line OS/2 shell and reading/writing its standard I/O,
both Easel and Enfin now support seamless DLL access to a number of
different SQL databases, as well as ODBC. The Easel Workbench provides
automated SQL database linkage through the DB/Assist point-and-click visual
tool, a separate executable bundled with Easel. DB/Assist allows you to
construct and test SQL statements and link them to Easel variables. These
statements are turned into Easel action blocks and inserted into the project.
The Easel program calls OPEN_XXX, FETCH_XXX, and CLOSE_XXX (where XXX is the
name of the SQL statement) and the data from the select appears in the linked
variables. From there, you have to write code to enter these values into the
dialog box. There is no autocreation of data-entry forms.
Enfin, on the other hand, provides automatic creation of data-entry forms via
SQL Editor, a database tool included with the Enfin package. You pick a
database table, choose Data Entry Form, and voila--a pretty good first hack at
the dialog box. Enfin also comes with a lightweight, internal SQL database
that allows you to develop and test your application without actually running
the external database.
Enfin provides automatic links into the database, which allow a view into the
database to be automatically updated whenever the data behind it changes.
Listing Two (page 92) shows the initialization code for the list box
containing the list of workouts; you can choose one to edit or delete. Two
lines of code link the TabularListBox object to the database table. Two more
lines link the edit field in the dialog box to the field in the record. Rows
fetched from the database appear as Record objects, and the fields within a
Record object can be obtained by name. HEALTH never needs that kind of access.
All database operations are handled by the provided classes.
In Enfin, the three external applications--Anatomy, General Health, and Drug
references--are set up by subclassing the WorkplaceObject with my own ExecWPO
class. The default WorkplaceObject behavior is to open the object associated
with the icon, so ExecWPO provides setSessionName and setFileName methods that
set up the session title and executable filename, and a new open method that
uses the String class method startSessionProgram to execute the program.
Listing Three (page 92) shows the new ExecWPO class. The Easel version uses a
straightforward start application call from the menu-item response block.



Source Code and the IDE


The Easel Enterprise Workbench keeps the source for your project in a binary
.EWB file. Using a third-party editor to change source involves exporting it
from the integrated development environment (IDE), then importing it back
in--an awkward process. The IDE does maintain a map of source-to-source file
(project views) so that exporting is relatively painless. You can include
Easel source files through the use of a simple include statement. Easel
applications compile into an Easel binary (.EBI) file. In order to run the
application, users need a run-time version of Easel installed. Easel run-time
licenses carry a substantial price tag.
Enfin operates directly on ASCII-text .CLS files. You can edit outside the
Enfin environment as long as you remember to reload a changed source file. You
include Enfin-Smalltalk files in your application by adding the filename to
the application file. Figure 6 shows the source-file organization of a
three-form Enfin application. Designer, the interface source-code generator,
gets you through the look-and-feel phase of program building. Like most code
generators, Designer writes code its own way, and if you modify files outside
the system, you have to be careful where you do it. The Controller method
(initialize) seems to be reserved for Designer-generated code; if you throw
something in there, then modify an object through Designer, it can get lost.
The best policy is to not modify Designer-generated methods but to write your
own and simply call them from the Designer-generated method. Enfin
applications compile into an image (.IMG) file. Again, users need the run-time
Enfin file (a single executable) but the Enfin run-time can be distributed
royalty-free.


Results


The final Enfin application comes out to about 1700 lines (including white
space) of original code. Of that, approximately 1400 are instantiations of Form
objects (essentially, dialog-box definitions). The Form initializations are
mostly uninteresting, except for the code setting up links to the database.
I created a separate form for each list of database records, with a hardwired
link to a database table, simply because it was so easy using Enfin Data
Manager's automatic-default data-entry form facility. I could have used a
single-list form by subclassing Form, as I did in the Easel version. This
subclass would have to maintain variables for the table name and the edit form
that follows when the user selects Edit or double-clicks an item in the list
box. The subclass open method would take the new table name, remove any
existing table link, then add a link to the new table and update it.
The Easel application comes out to about 2500 lines of original code. Of that,
about 1400 consist of form initialization, 800 of DB/Assist-generated SQL, and
300 of manually entered code. This is a little deceptive, though. The only way
I got away with so little manual code in Easel was by taking care in how I
named the objects and actions. If I'd created a different form for each list
of items (as I did in Enfin), the line count would have ballooned.


Conclusion


Both Easel and Enfin are highly capable, complete application-production
environments. The classes provided with Enfin, in general, provide more
functionality than the built-in objects of Easel. For example, Form fields can
have validation attached, and database-item attachment to dialog controls is
handled automatically by the TableLink class, both of which require you to
write more code in Easel. The power of Enfin is somewhat muted, though, by the
IDE. Though you can subclass interface classes, when you do so, you lose the
ability to modify them within the Enfin visual designer. When I tried to
modify one of my ExecWPO objects through the designer, the code generator
turned it back into a WorkplaceObject. The choice between them comes down to
whether you're object obsessed or object tolerant. While object-obsessed Enfin
is clearly the wave of the future, Easel provides similar functionality in a
more traditional package.
 Figure 1: Simplified data model for test application.
 Figure 2: Control flow in an Easel program.
 Figure 3: Control flow in an Enfin-Smalltalk program.
 Figure 4: Hierarchical menu developed with Easel.
 Figure 5: Drag-and-drop menu developed with Enfin.
 Figure 6: Source-file structure of an Enfin application with three forms.
[LISTING ONE] (Text begins on page 18.)

# Definition of the menu bar. Note the parameter attached to every menu item.
action bar MainMenu_AB is
 enabled pulldown Activity_PD text "~Activity"
 enabled choice Workout_MC text "~Workout"
 parameter is "Workout"
 enabled choice Sleep_MC text "~Sleep"
 parameter is "Sleep"
 enabled choice Meal_MC text "~Meal"
 parameter is "Meal"
# the rest of the menu bar def. would be here
 ...
 ...
 ...
# This item class contains all the menu choices.
item class EditNewDelete_CL is
 Workout_MC Sleep_MC Meal_MC Weighin_MC Injury_MC
 Treatment_MC Checkup_MC Ingredient_MC Recipe_MC Diary_MC
# This is the main window definition for the application.
# It contains the menu bar MainMenu_AB.
enabled visible color 26
primary graphical region Main_GR size 475 179
 at position 83 112
 in desktop
window size 475 179 at position 0 0
color 27 foreground
size border
title bar "Total Health"
system menu
horizontal scroll bar scroll by 6
vertical scroll bar scroll by 2
action bar MainMenu_AB

minimize button
maximize button
# This response block is called whenever an item in
# class EditNewDelete_CL is chosen.
response to item EditNewDelete_CL
 copy parameter to Thread
 copy "OPEN_" Thread to OpenAction
 copy "FETCH_" Thread to FetchAction
 copy "CLOSE_" Thread to CloseAction
 copy Thread "ListAction" to ListAction
 make GenericList_DB visible
 change GenericList_DB text to Thread
 action FetchThemAll
# Loops over FetchAction until there are no more items in the table.
action FetchThemAll is
 clear GenericList_LB in GenericList_DB
 action OpenAction
 while(( Esqlca.Sqlcode != END_OF_DATA ) and
 (Esqlca.Sqlcode != CURSOR_CLOSED ))
 loop
 action FetchAction
 if(( Esqlca.Sqlcode != END_OF_DATA ) and
 (Esqlca.Sqlcode != CURSOR_CLOSED )) then
 action ListAction
 end if
 end loop
 action CloseAction
# This action formats a single line for the WorkoutList dialog box.
# There is an action like this one for every type of list: Meals, Injuries, ...
action WorkoutListAction is
 if( Description != "" and DateTime != "" ) then
 copy Description " " DateTime to TS
 add to GenericList_LB in GenericList_DB
 insert TS
 end if
# Response to the edit button in the single, generic list dialog box. It uses
# the variable Thread that we filled in earlier from parameter of menu item.
response to Edit_PB in GenericList_DB
 change GenericList_DB text to ""
 make GenericList_DB invisible
 copy Thread "EditAction" to LoadEditDBAction
 copy Thread "_DB" to DialogBoxName
 action LoadEditDBAction
# This is the action that fills in the edit item dialog box.
action WorkoutEditAction is
 make DialogBoxName visible # This brings up the dialog
 change Description_EF in Workout_DB text to Description
 change DateTime_EF in Workout_DB text to DateTime
 change Duration_EF in Workout_DB text to Duration
 change Intensity_EF in Workout_DB text to Intensity
 change Comments in Workout_DB text to Comments

[LISTING TWO]

"The initialization code for the workout list and workout item dialog boxes."
"Some of the control definitions have been left out for brevity."

"The workout list form."
 form := Form

 name: #WorkoutList
 rect: {22 431 2043 862}
 controller: controller.
 form setInitialFocusTo: #OKButton.
 cItem := controller add: #OKButton
 class: FormButton
 rect: {511 611 350 90}
 options: {#Return #Tab #Up #Down #Backtab #Left #Right}
 form: form
 text: '~Edit'.
".... Other control definitions would be here ..."
 cItem := controller add: #ListBox
 class: FormLinkedTabList
 rect: {49 41 1676 469}
 options: {#Return #Tab #Backtab}
 form: form.
 cItem setSplitPositionTo: nil.
 "The following 2 lines link the TabularListBox cItem to the"
 "database table #WORKOUT."
 tLink := TableLink newIdentifier: #WORKOUT attribute: #ALL.
 controller setLinksTo: cItem link: tLink.
 temp := AcceleratorTable new.
 temp at: #AltC put: #cancelList.
 temp at: #Altc put: #cancelList.
 temp at: #AltD put: #deleteWorkout.
 temp at: #Altd put: #deleteWorkout.
 temp at: #AltE put: #editItem.
 temp at: #Alte put: #editItem.
 form setAcceleratorTableTo: temp.
 "Here is the initialization code for the edit workout item dialog box. Note"
 "there is a link for every entry field to a field in the database record."
 form := Form
 name: #Workout
 rect: {135 7 1837 851}
 controller: controller.
 form setGridTo: false.
 form setSnapTo: false.
 form setXGridResTo: 82.
 form setYGridResTo: 82.
 temp := IdentityDictionary newEntries: 2.
 temp at: #cancelButton1 put: #cancelWorkout.
 temp at: #OKButton1 put: #saveWorkout.
 form setReturnActionsTo: temp.
 form setInitialFocusTo: #Description.
".... All the static control definitions would be here ..."
 cItem := controller add: #Description
 class: FormString
 rect: {514 50 420 60}
 options: {#HResize #Return #Tab #Backtab #Return
 #Return #Return}
 form: form.
 cItem setMaximumLengthTo: 20.
 tLink := TableLink newIdentifier: #WORKOUT attribute: #DESCRIPTION.
 controller setUpdateLinksTo: cItem link: tLink.
 cItem := controller add: #DateTime
 class: FormString
 rect: {514 135 420 60}
 options: {#HResize #Return #Tab #Backtab
 #Return #Return #Return}

 form: form.
 cItem setMaximumLengthTo: 26.
 tLink := TableLink newIdentifier: #WORKOUT attribute: #DATETIME.
 controller setUpdateLinksTo: cItem link: tLink.
 cItem := controller add: #Location
 class: FormString
 rect: {514 220 420 60}
 options: {#HResize #Return #Tab #Backtab
 #Return #Return #Return}
 form: form.
 cItem setMaximumLengthTo: 20.
 tLink := TableLink newIdentifier: #WORKOUT attribute: #LOCATION.
 controller setUpdateLinksTo: cItem link: tLink.
 cItem := controller add: #Duration
 class: FormNumber
 rect: {514 305 420 60}
 options: {#Return #Tab #Backtab #Return
 #Return #Return}
 form: form.
 tLink := TableLink newIdentifier: #WORKOUT attribute: #DURATION.
 controller setUpdateLinksTo: cItem link: tLink.
 cItem := controller add: #Intensity
 class: FormNumber
 rect: {514 386 420 60}
 options: {#Return #Tab #Backtab #Return
 #Return #Return}
 form: form.
 tLink := TableLink newIdentifier: #WORKOUT attribute: #INTENSITY.
 controller setUpdateLinksTo: cItem link: tLink.
 cItem := controller add: #Comments
 class: FormString
 rect: {514 469 420 60}
 options: {#HResize #Return #Tab #Backtab
 #Return #Return #Return}
 form: form.
 tLink := TableLink newIdentifier: #WORKOUT attribute: #COMMENTS.
 controller setUpdateLinksTo: cItem link: tLink.
".... The rest of the dialog control definitions would be here ..."

[LISTING THREE]

WorkplaceObject subclass: #ExecWPO
 instanceVariableNames: 'FileName SessionName'
 classVariableNames: ''
 poolDictionaries: '' !
!ExecWPO class methods !
!"End of class methods block"
!ExecWPO methods !
open
 "Override the default open method which would open the form associated with"
 "this WorkPlaceObject. This call starts an external application. Must call "
 " setSessionName and setFileName before opening this object!"
 SessionName startSessionProgram: FileName inputs: 'new'.
!"end open "

setFileName: newFileName
 "Set the filename of the executable we're going to run"
 FileName := newFileName.
!"end setFilename"
setSessionName: newSessionName
 "Set the title this session will have when opened'"
 SessionName := newSessionName.
!"end setSessionName"
!"End of method block"
End Listings




















































June, 1994
GUI Development for Real-Time Applications


Tools for creating graphical process control




Avram K. Tetewsky


Avram, who holds a BSEE from RPI and an MSEE from MIT, works at Draper
Laboratories in Cambridge, MA. He can be reached at atetewsky@draper.com.


At their best, real-time control applications have traditionally sported
nothing fancier than pick-a-number, character-based interfaces. Still, any
real-time control application which presents an operator interface can benefit
from a GUI. A GUI can, for instance, decrease the learning curve for new users
and lessen the chance of erroneous input by experienced users. There are
PC-based tools for developing GUIs for real-time systems, including both
RTWare's ControlCalc Version 1.78 and National Instruments' LabView 3.0. In
addition to letting you put a pretty face on real-time control apps, these
tools can enable rapid prototyping. In ControlCalc's case, you can even ROM
the application. And with today's 32-bit processors, you can have your GUI and
get real-time performance, even with one CPU.
While LabView has been examined several times in DDJ (most recently in "A
Visual Approach to Data Acquisition" by James F. Farley and Peter D. Varhol,
May 1993), you're probably less familiar with ControlCalc. Interestingly,
ControlCalc is based on a multipage-spreadsheet paradigm with each page
constituting a single thread of a multithreaded program. Of course, this
requires a real-time operating system. Consequently, ControlCalc utilizes
Microware's (Des Moines, IA) OS-9000 1.3 operating system and Gespac's (Mesa,
AZ) G-Windows 2.3 windowing package to provide the underlying multithreaded
environment and graphics. ControlCalc currently runs on both 680x0/OS-9 and
386/OS-9000 systems. LabView is available for the Windows, Macintosh, and Sun
platforms, none of which provides a real-time operating system.
Because of the many DOS/Windows applications for Intel processors, OS-9000
provides for dual-boot DOS/OS-9000, file exchange, and the ability to run DOS
and standard Windows programs as OS-9000 processes. G-Windows is the GUI
add-on that replaces OS-9000's text interface. The spreadsheet's slash
commands provide control of a built-in compiler, debugger, and run-time
environment.
In this article, I'll examine ControlCalc's performance and real-time
multithreaded programming paradigm, as well as take a quick look at National
Instruments' LabView for Windows data-flow driven software. In the process,
I'll explore each tool's programming metaphor, addressing issues such as how
you specify algorithms, dataflow, state machines, timing, and scheduling using
ControlCalc's multithreaded scheduling model and LabView's data-flow paradigm.


Real-Time Control Benchmarks


Example 1 is the pseudocode for a control benchmark (benchmark #1) that
contains a high-speed control-loop, data-logger, background-editing session,
and a GUI, all running on a single CPU. Code implementations and executables
for ControlCalc- and LabView-specific versions of the benchmarks are available
electronically, as are programmer's notes; see "Availability," page 3.
The acquisition, logging, and analysis portion of the benchmark can use
buffers to compensate for transient loads such as the GUI (due to a windowing
system/editor temporarily usurping CPU time). While buffering allows you to
maintain an overall average throughput rate, control problems usually require
absolute performance. In these cases, the control inputs and outputs must
occur on time; in some cases they may even need to be synchronized with
external events (control of certain laser servos or digital-phase tracking
loops in GPS, for example).
Quantitative measurements are made possible by allowing the operator to vary
the display and control rates and disk/display blocking factors. (With
benchmark #1, the external background job is a simple ASCII editor.) Finally,
there is an operator control that toggles between simulated
and real hardware. This is invaluable for testing when hardware hasn't been
built yet. All you have to do is specify a control algorithm.
To keep the closed-loop control part of the benchmark easy to understand, I
used a running mean and standard-deviation calculation. As with proportional
integral derivative (PID) control algorithms, state memory is required. Unlike
with PID algorithms, however, you don't have to be a control specialist or
have hardware to understand and verify correct operation of this benchmark.
Basically, as each new point is taken in, the effect of the old point leaving
the FIFO is removed; then the new, unscaled running-mean and mean-squared
values are calculated. Scaling is based on the size parameter. A new mean and
standard deviation is sent out at each sample point, which lets you test
without actually closing a loop. Inserting a test sine wave on a bias and
picking averaging times that are integral numbers of cycles should yield the
familiar s = A*sqrt(2)/2 and m = bias results. Using the Watcom 9.0D compiler
on a 66-MHz 486DX2 PC, this algorithm took 17 µsec per sample, or equivalently
a sustained rate of 57 kHz.
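The running mean/standard-deviation update described above (Example 1 gives the pseudocode) can be sketched in C++ as a fixed-size circular buffer; the class and names here are illustrative, not the author's benchmark code:

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Running mean and standard deviation over the last `size` samples.
// Each new sample displaces the oldest one, so the raw sums are updated
// incrementally instead of being recomputed from scratch.
class RunningStats {
public:
    explicit RunningStats(std::size_t size)
        : data_(size, 0.0), data2_(size, 0.0), scale_(1.0 / size) {}

    // Absorb one sample; returns {mean, standard deviation}.
    std::pair<double, double> update(double x) {
        raw_mean_ -= data_[start_];          // remove the departing sample
        raw_rms2_ -= data2_[start_];
        data_[start_]  = x;                  // overwrite with the new sample
        data2_[start_] = x * x;
        raw_mean_ += data_[start_];
        raw_rms2_ += data2_[start_];
        start_ = (start_ + 1) % data_.size();
        double mean = raw_mean_ * scale_;
        double var  = raw_rms2_ * scale_ - mean * mean;
        double std  = var > 0.0 ? std::sqrt(var) : 0.0;  // guard round-off
        return {mean, std};
    }

private:
    std::vector<double> data_, data2_;
    double raw_mean_ = 0.0, raw_rms2_ = 0.0, scale_;
    std::size_t start_ = 0;
};
```

Feeding it a sine of amplitude A on a bias, over an averaging window that is an integral number of cycles, yields the mean = bias and std = A/sqrt(2) check described above.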
I built a second benchmark (benchmark #2) to measure how well each tool
translates the algorithm into compiled code. Because real-time scheduling
isn't needed for benchmark #2, it is fair to compare ControlCalc and LabView
in this one area. The code consists of two nested loops. The inner loop has
some algebra, trigonometry, and two accumulations. Accumulators require state
memory between iterations; because state memory is important for control, DSP,
and many other scientific calculations, I felt it was an important part of
this test for each tool. Example 2 is the pseudocode for benchmark #2.


ControlCalc and Graphical Real-Time Paradigms


To completely describe a problem, real-time software requires information
about data flow, control flow, and scheduling/timing. No single pictorial or
abstract programming convention ties all of these diverse requirements
together. It isn't that people haven't tried timing diagrams, state-machines,
and Buhr diagrams, but it took Rich Clarke, ControlCalc's creator, to realize
that users could quickly create real-time applications without constraining
the solution with unproven graphical models.
Like LabView, ControlCalc's graphical real-time spreadsheet approach has two
parts: a GUI builder and an automated code-generation tool. G-Window's Wedit
program builds the front panels, storing them in separate files. For the
programming paradigm, ControlCalc uses a spreadsheet to generate the
executable code. At the top left of Figure 1 (which illustrates the complete
system), the G-Windows Wedit program allows for the design of front panels
with special-purpose GUI widgets (gauges and the like). Wedit uses standard
drag-and-drop editing techniques. After naming all widgets within a window,
each newly created window is stored in a file. The /GC graphics-configure
option presents a dialog box listing each widget, its attribute variables, and
call-back functions, allowing you to then select the spreadsheet cells you
wish to attach them to. Because this architecture is open, you can add your
own widgets (as long as compound attributes are not used).
Next, you create all of the separate threads of execution. Each thread is one
"scan" page of a multipage spreadsheet (see the middle of Figure 1). By using
the /SC scan-configuration menu option, each page can be scheduled as either a
cyclic thread with specified fixed priority, an interrupt-service routine, an
initialization page, a clean-up page, or a page that can be signaled by the
user (see the right side of Figure 1).
To use this scheduling model, you must know when each scan page can be
triggered and what happens when it is. When a page is executed, it is scanned
from top left to bottom right. If cells contain formulas, they are executed;
if they contain text, they're skipped; and if they contain macros, they're
usually skipped unless attached to a user-activated input widget or keyboard
macro.
Scan-page triggering is determined by both configuration scheduling settings
and explicit programming calls. In terms of the /SC scan configurations, the
initialization and cleanup scans run at the obvious times. Cyclic scans can
run at periods between 2 and 65,536 clock ticks. The default tic size is 100
msec, but on 486 PCs, there is enough throughput to support 0.5-msec tics.
(This essentially requires regeneration of the OS-9000 system.) Thus, 1-kHz
cyclic scheduling is possible depending on the processor-clock speed. For
faster triggering rates, the interrupt configuration must be used, and I will
measure the maximum rate using the control-loop benchmark. The
remaining /SC scan configuration, Signaled, is used to program explicit
interthread communication protocols such as "protect a critical region."
Signaled scan pages execute a wait just before they reach the first cell. When
they finally receive a signal or wakeup, they execute until hitting a sleep or
wait. Scan pages will yield to other scan pages during lengthy I/O operations.
When they hit the end of the scan, they go back to the top and wait. Given
these low-level building blocks, you can create most of the typical real-time
communication protocols such as the "share critical resource" in Figure 2.
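The SEND/WAIT key protocol of Figure 2 behaves like a counting signal key. A hedged C++ model (this is illustrative, not ControlCalc or OS-9000 code; SignalKey and the page functions are invented names):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Model of a ControlCalc-style signal key: SEND puts a copy of the key into
// play, WAIT blocks until a copy is available and consumes it. This is the
// protocol Figure 2 uses to protect a shared cell from simultaneous access.
class SignalKey {
public:
    void send() {                              // SEND(MY_KEY)
        std::lock_guard<std::mutex> lk(m_);
        ++count_;
        cv_.notify_one();
    }
    void wait() {                              // WAIT(MY_KEY)
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

SignalKey my_key;
double shared_cell = 0.0;                      // the protected resource

void init_page() { my_key.send(); }            // put one copy of MY_KEY into play

void writer_page(double v) {                   // like page 3 in Figure 2
    my_key.wait();                             // grab the shared resource
    shared_cell = v;                           // got it, now write it
    my_key.send();                             // release it
}

double reader_page() {                         // like page 2 in Figure 2
    my_key.wait();
    double v = shared_cell;                    // got it, now read it
    my_key.send();
    return v;
}
```

Because WAIT consumes the single key and SEND returns it, only one page at a time can be between its WAIT and SEND.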
ControlCalc allows several user-written ControlCalc programs to simultaneously
run and communicate with each other using signal numbers between 500 and 900.
The rules: a program can have up to 127 pages and up to 999 signal numbers. Signal
numbers 1--127 are synonymous with pages 1--127. Signal numbers 490--499 are
reserved for future use. Signal numbers greater than 500 can be linked with
OS-9000 events to talk with other applications, including other ControlCalc
programs.
Although there are 65,536 priority levels in OS-9000, G-Windows runs at
priority 50000. Thus, you effectively have about 10,000 priority levels, more
than enough for most projects to ensure that the GUI is placed at a lower
priority than the computations. But real-time GUI performance is actually
carried out by the built-in WINNER process (see Figure 1), which does all of
the interfacing between the scan pages and graphics windows, hiding the
details from the user. Input is always read at 5 Hz and is not missed.
Figure 3 shows the front panel for the ControlCalc/OS-9000 benchmark #1 while
it is running; Figure 4(a) shows the configuration of the six scan pages, and
Figure 4(b), the I/O page-definition file. Figure 5 is the graphics-
configuration page, while Figure 6 shows the interrupt handler's scan page in
formula-display mode. Because OS-9000 is a real-time OS, you can
spawn a separate process to allow applications such as the umacs editor to run
without degrading the higher-priority parts of your ControlCalc program.
Watcom C performed the control-loop portion of benchmark #1 at 57-kHz rates;
ControlCalc did that part alone at 47-kHz rates.
Table 1 provides the ControlCalc results for benchmark #1. Actual performance
numbers depend on whether you have an optimized graphics display driver and
whether the A/D board can have the sample ready when the interrupt occurs. For
example, National Instruments' AT-MIO-16X board can have a 16-bit sample ready
when the interrupt occurs, while the Keithly Metrabyte DAS 8/PGA board cannot;
it has a 30-µsec settling time. Thus, the DAS 8 board is limited to
33-kHz rates, while the NI board is good for 100-kHz sampling. In terms of
computing hardware, the local-bus video board on my 66-MHz/486 wasn't
supported by G-Windows. However, RTWare had a 33-MHz/486 with a TSENG Labs
ET4000 video board that G-Windows does support. By measuring performance with
and without various graphics features enabled, it was relatively easy to
predict peak performance by extrapolation using the formula in Example 3.
Those extrapolations are denoted with an asterisk (*) in Table 1.
For example, if the interrupt handler runs at 5 kHz and you can save 30 µsec
per sample by using the NI board instead of the DAS8, you have saved 5 kHz*30
µsec, or 15 percent of the CPU load. At 10 kHz, that's a 30-percent savings in load. Most
real-time system programmers like to see the CPU running at not more than
50-percent capacity. This allows you to add future enhancements. Table 1 shows
the ControlCalc results for the benchmark described in Example 1.
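The load arithmetic above is just Example 3's schedulability formula applied to one task. A hypothetical helper (the names are invented for illustration):

```cpp
#include <vector>

// Example 3's rule as code: total CPU load is the sum of frequency times
// computation time over all cyclic tasks, plus any background load, and it
// must stay below 1.0 (100 percent of one CPU) for the schedule to hold.
struct CyclicTask {
    double freq_hz;        // how often the task runs
    double comp_time_sec;  // how long one activation takes
};

double cpu_load(const std::vector<CyclicTask>& tasks, double background_load) {
    double load = background_load;
    for (const CyclicTask& t : tasks)
        load += t.freq_hz * t.comp_time_sec;
    return load;           // feasible only if this is < 1.0
}
```

At 5 kHz, a 30-µsec-per-sample saving works out to 5000 * 30e-6 = 0.15, the 15 percent quoted above.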
Even when ControlCalc is saturating, you can view spreadsheet pages and toggle
the display modes. Simple displays, gauges, readouts, buttons, and the like
can be running along with a 10-kHz control loop. But displays with intense
graphic operations eat up all available CPU loading. If you are doing intense
graphics displays on a single CPU system, 5 kHz seems more reasonable.
In terms of compilation, in benchmark #2 ControlCalc 1.77 took 18.48 µsec per
loop iteration compared with Watcom C's 12.48 µsec.
The ControlCalc spreadsheet is similar to VisiCalc in both features and use.
For example, the classic AA1-style cell-naming conventions are used, with an
additional page number placed in front of each cell--1AA1 refers to the first
cell on page 1. However, there are some key differences. First, the
spreadsheet is strongly typed, and the following types are supported: doubles,
integers, text, variable text, and in some cases, history cells (FIFO
buffers). Second, ControlCalc applications are compiled. Although ControlCalc
automatically compiles before running, during development you must sometimes
issue the /TC toggle compile. While this gives good performance, there are
certain side effects: Until the spreadsheet is compiled, some cells do not
always evaluate and display meaningful results. You must also use the SETVAL
functions in the initialization pages to force some initial conditions each
time you start and stop a run. This is especially true of cells that
modify/increment themselves. The most important difference is that the
ControlCalc spreadsheet supports history cells. Because history cells can act
like either FIFOs or stacks, it is possible to insert buffering into almost
any operation on an as-needed basis.
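To make the FIFO/stack duality of history cells concrete, here is a minimal sketch of a history-cell-like buffer; the interface is invented for illustration and is not ControlCalc's actual API:

```cpp
#include <cstddef>
#include <deque>

// A bounded "history cell": new values push in at one end; the buffer can be
// drained FIFO-style (oldest first, for stream buffering) or stack-style
// (newest first). When full, the oldest value is overwritten.
// Precondition for the pop functions: the cell is not empty.
class HistoryCell {
public:
    explicit HistoryCell(std::size_t capacity) : cap_(capacity) {}

    void push(double v) {
        if (buf_.size() == cap_) buf_.pop_front();  // overwrite oldest
        buf_.push_back(v);
    }
    double pop_fifo() {                 // oldest first
        double v = buf_.front(); buf_.pop_front(); return v;
    }
    double pop_stack() {                // newest first
        double v = buf_.back(); buf_.pop_back(); return v;
    }
    std::size_t size() const { return buf_.size(); }

private:
    std::deque<double> buf_;
    std::size_t cap_;
};
```

The same buffer serves both roles, which is why a history cell can drop buffering into nearly any spreadsheet operation.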
You can give symbolic constants meaningful names such as SIGNAL(MY_KEY) vs.
SIGNAL(400). Unfortunately, the /SC menu commands accept only a number, not a
named constant. Thus, you cannot give pages names such as the_reader_page or
the_writer_page, nor can you place comments in any of the configuration menus.
Although printing the /SC configuration lets you see how the program is
organized in time, you have no clue about data flow. This is similar to the
Gosub facility in early Basics, which had no local variables. Also, loops and
logic statements require Gotos, although you can label the targets.
Unless you are very organized, ControlCalc is best suited for small- and
medium-scale jobs. Thus, you could use ControlCalc to control the engines of
an aircraft, but I wouldn't use it to control an entire aircraft until a full
set of development tools is available. LabView has similar problems when
tackling large problems. Despite its ability to have a hierarchy of diagrams,
it has trouble with timing and state variables. At a minimum, both ControlCalc
and LabView need some type of code management, a data dictionary, and
organized data-flow tools before undertaking extremely large projects.


LabView's Modified Data-Flow Paradigm


Figures 7 and 8 show LabView's front panel and wiring diagram for benchmark
#1, respectively. Wiring diagrams use G-Language, National Instruments'
"graphical" language. Because the data flow is used to determine execution
flow, the data-flow model must be specially modified for state equations. The
LabView approach is to append shift registers to a While loop (the small
triangles in Figure 8). In fact, to create the equivalent of C external
variables (static memory), you must use a subroutine with an uninitialized
shift register, a switch for reading and writing these variables, and a switch
to do a first-time initialization (or use LabView 3.0's globals). Figure 9 is
the data-flow equivalent of x(next_time)=x(current_time)+1.0.
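In C++ terms, the uninitialized-shift-register pattern is just state that survives between calls, plus an explicit first-call initialization switch. A hedged sketch (the function name and the init flag are invented for illustration):

```cpp
// Emulates LabView's uninitialized shift register: `x` persists across
// calls, and the boolean switch plays the role of the first-time
// initialization case in the diagram.
double next_x(bool initialize = false) {
    static double x = 0.0;          // the "shift register"
    if (initialize) { x = 0.0; return x; }
    x = x + 1.0;                    // x(next_time) = x(current_time) + 1.0
    return x;
}
```

Each call shifts the old value out and the new value in, exactly what the two wiring patterns in Figure 9 accomplish graphically.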
National Instruments tries to mitigate the effects of the non-real-time host
operating systems (Macintosh System 7.x, MS-DOS 6.x/Windows 3.1, and Solaris
2.0) by building hardware buffers into all of its acquisition boards. This essentially allows
one extra layer of fast buffering so that, on average, LabView can keep up its
throughput whenever the non-real-time OSs preempt the CPU for uncontrolled
amounts of time.



Wrapping it Up


In terms of widget assortment and aesthetics, LabView outshines
ControlCalc/G-Windows. In particular, LabView graphs and charts all have
on-the-fly palette controls, and all charts/graphs support multiple curves
with legends. You can add online help for each widget, making your front
panels self-documenting (although you take a temporary performance hit in real
time when this facility is used). LabView input controls and output indicators
can be placed in 1- or 2-D arrays and in clusters. G-Windows minimally
supports 1-D arrays of widgets, while 2-D arrays are manually wrapped 1-D
arrays. ControlCalc does not support compound (clusters of) widgets, but its
WINNER process guarantees good performance. Although the widget selection is
limited and not as good as LabView's, you can add your own. ControlCalc also
supports password protection and touchscreens. Finally, the polygon bar graph
can be used to create chemical tank and other interesting displays.
It is potentially easier to add C code in ControlCalc because you get source
code to portions of the software. This lets you debug routines while running
ControlCalc from within the OS-9000 debugger. With LabView, you get a debug
print window and memory-check routines, but that's about it. Consequently, I
developed a mini-emulator: I inserted a dummy mainline into my LabView
code, created the LabView input/output arguments, and emulated a few key calls
so that I could test my routines without LabView. Without the emulator, you
usually crash LabView under MS-DOS or the Macintosh OS if you make a mistake.
Neither program is currently compatible with version-control software, but for
ControlCalc, it is relatively straightforward to output ASCII versions of the
spreadsheet-template files.
Although LabView is primarily for acquisition, analysis, and display output,
National Instruments does provide a PID control-application library. So
performance-wise, there is some merit in measuring maximum control rates while
simultaneously running low-rate graphic displays. No matter what I did to
minimize graphics calls on a 66-MHz 486 PC, I could not get more than 30-Hz
performance for my control loop. However, LabView clearly can get continuous
signal-processing rates of 20 kHz using an AT-MIO-16X board, including
displays. After the LabView people reviewed my benchmarks, this apparent
paradox was resolved.
The overhead between the 32-bit LabView model, 16-bit Windows, and
hardware-interrupt drivers limits you to 300 scans/second (transfers/second)
on a 66-MHz 486 PC. (The maximum scan rate on a 33-MHz 68030 Macintosh FX is
about 2 kHz, according to the author of the LabView PID package. Ellipsis
Inc., of Boston, MA, has found that if you write your C code with
operating-system-dependent callbacks and embed the control algorithm in the
A/D and D/A code, you can get synchronized kHz performance with low
update-rate graphics for symmetric-channel processing on a Macintosh.)
If you are only sending one point (BLOCK_SIZE=1), that's also the
control-algorithm limit. For DSP work (such as FFTs), you can process blocks
of data at a time. Thus, the effective sample rate is really limited to
MAX_SCAN_RATE*BLOCK_SIZE. As long as you have extra memory and your algorithm
can crunch BLOCK_SIZE pieces of data within one MAX_SCAN_RATE period, you can
keep up with the incoming data. Until LabView is hosted on a real-time OS, it
is best to use this tool for algorithms that can be block processed. You can
squeeze some additional performance by block processing your graphics. With
control benchmark #1, FIFO-driven displays were five times faster than the
non-FIFO versions.
With benchmark #2, ControlCalc 1.77's compiled code took 18.48 µsec per
iteration; Watcom C took 12.48 µsec, and LabView (using globals) took 82.4
µsec. By moving shift registers to the outside of the loops, I got LabView down to 24.9 µsec,
at the expense of cluttering the diagram.


The Future


Both ControlCalc and LabView have enormous potential for real-time developers.
RTWare is preparing to release ASCII file exports amenable to code-management
systems, improved syntax for loops and logic statements, and better
data-dictionary tools. And because it's internally parsed into parallel
multithreaded loops, LabView will certainly utilize these features, too, when
Windows NT and Solaris show true real-time performance.

Example 1: Pseudocode for benchmark #1.
during setup: size = sampling_rate * requested_averaging_time;
check that size <= max FIFO size (avoiding dynamic allocation for now is a
bit faster).
new_start = (start + 1) % size;
raw_mean -= data_buffer[start]; raw_rms2 -= data2_buffer[start];
data_buffer[start] = new_data; data2_buffer[start] = new_data*new_data;
raw_mean += data_buffer[start]; raw_rms2 += data2_buffer[start];
start = new_start; the_mean = raw_mean*scale; the_rms2 = raw_rms2*scale;
the_std2 = the_rms2 - the_mean*the_mean;
if the_std2 >= 0 then the_std = sqrt(the_std2) else the_std = 0, increment total_err;

Example 2: Pseudocode for benchmark #2 (n1=n2=2001 for all results).
do ii = 0, (n1-1)
 do jj = 0, (n2-1)
  x = ii; y = jj; xpy = x + y; xty = x*y;
  x1 = sin(xpy); x2 = sqrt(xty); x3 = (x+1)/(y+1);
  x2a = -x2 if xty > 0; x1a = x1 if jj%100 == 0;
  x4 = x1a + x2a;
  accumulators: x5 += 1.0; x6 += x1a;
 end jj;
end ii;
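For reference, Example 2 translates almost line for line into C++. This sketch resets x1a and x2a at the start of each outer pass (the pseudocode leaves their carry-over unspecified), and the accumulators x5 and x6 are the state memory the benchmark is designed to exercise:

```cpp
#include <cmath>

struct Bench2Result { double x5; double x6; };

// Direct rendering of Example 2's pseudocode (the article uses n1 = n2 = 2001).
Bench2Result benchmark2(int n1, int n2) {
    double x5 = 0.0, x6 = 0.0;                 // accumulators (state memory)
    for (int ii = 0; ii <= n1 - 1; ++ii) {
        double x1a = 0.0, x2a = 0.0;           // conditionally updated below
        for (int jj = 0; jj <= n2 - 1; ++jj) {
            double x = ii, y = jj;
            double xpy = x + y, xty = x * y;
            double x1 = std::sin(xpy);
            double x2 = std::sqrt(xty);
            double x3 = (x + 1.0) / (y + 1.0);
            (void)x3;                          // computed but unused, as in Example 2
            if (xty > 0.0) x2a = -x2;
            if (jj % 100 == 0) x1a = x1;
            double x4 = x1a + x2a;
            (void)x4;
            x5 += 1.0;                         // the accumulations
            x6 += x1a;
        }
    }
    return {x5, x6};
}
```

The point of the exercise is that x5 and x6 must persist between iterations, which is exactly what ControlCalc cells and LabView shift registers each handle differently.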

 Figure 1: ControlCalc overview.
Figure 2: Using signals to share/protect a resource from simultaneous
reads/writes. Using the Name utility, define the name MY_KEY as a value of 400
so that SIGNAL(400) can be written as SIGNAL (MY_KEY). The ControlCalc command
is /NU MY_KEY V 400.
Given the following Scan Page Configurations (/SC) of:
 Page Configuration Comments
 1 Initialization Put one copy of MY_KEY into play
 2 Signaled The Reader Page
 3 Signaled The Writer Page
 4 Cyclic Will signal page 2 when wanting to read
 5 Interrupt Will signal page 3 when wanting to write


Shared memory that must be protected from having two pages read/write at the
same time


Init Page
SEND(MY_KEY)


Page 2

 ... some code
 ... grab the shared
 ... resource
WAIT (MY_KEY)
 ... got it, now read it
 ... now release it.
SEND(MY_KEY)
... rest of page 2.


Page 3
 ... some code
 ... grab the shared
 ... resource
WAIT (MY_KEY)
 ... got it, now write it
 ... now release it.
SEND(MY_KEY)
... rest of page 3.



 Figure 3: Front panel of the ControlCalc running benchmark #1.

 Figure 4: (a) Scan configuration settings for the ControlCalc benchmark #1;
(b) I/O page settings for ControlCalc benchmark #1.

Table 1: ControlCalc performance for benchmark #1.

 Platform                    All graphics hidden       Graphics at 50 points per
                             and turned off            minute (two strip charts
                                                       and one x,y plot)

 33-MHz, VGA, DAS8,          @0.02 kHz, load=10%       @0.02 kHz, load=61%
 64K cache (case 1)          @1 kHz, load=28%          @1 kHz, load=70%
                             @2.5 kHz, load=53%        @2.5 kHz, load=89%
                             @5 kHz, load=96%          @5 kHz, saturates but runs

 33-MHz, Tseng, DAS8         @0.02 kHz, load=10%       @0.02 kHz, load=48%
 (case 2)                    @1 kHz, load=27%          @1 kHz, load=65%
                             @2.5 kHz, load=55%        @2.5 kHz, load=89%
                             @5 kHz, load=96%          @5 kHz, saturates but runs

 33-MHz, VGA, AT-MIO-16X     *@0.02 kHz, load=10%      *@0.02 kHz, load=61%
 (*saves rate_kHz x 30 µsec  *@1 kHz, load=25%         *@1 kHz, load=67%
 from case 1)                *@2.5 kHz, load=48%       *@2.5 kHz, load=82%
                             *@5 kHz, load=81%         *@5 kHz, saturates

 33-MHz, Tseng, AT-MIO-16X   *@0.02 kHz, load=10%      *@0.02 kHz, load=48%
 (*saves rate_kHz x 30 µsec  *@1 kHz, load=24%         *@1 kHz, load=62%
 from case 2)                *@2.5 kHz, load=48%       *@2.5 kHz, load=82%
                             *@5 kHz, load=81%         *@5 kHz, saturates but runs

 66-MHz, VGA, DAS8,          @0.02 kHz, load=2%        @0.02 kHz, load=42%
 256K cache (case 5)         @1 kHz, load=13%          @1 kHz, load=54%
                             @2.5 kHz, load=27%        @2.5 kHz, load=82%
                             @5 kHz, load=50%          @5 kHz, load=92%
                             @8 kHz, load=76%          @8 kHz, load=99%
                             @10 kHz, load=92%         saturates but runs

 66-MHz, Tseng, DAS8         No change from above      Could not reliably extrapolate

 66-MHz, VGA, AT-MIO-16X     *@0.02 kHz, load=2%       *@0.02 kHz, load=42%
 (*saves rate_kHz x 30 µsec  *@1 kHz, load=10%         *@1 kHz, load=51%
 from case 5)                *@2.5 kHz, load=20%       *@2.5 kHz, load=75%
                             *@5 kHz, load=35%         *@5 kHz, load=77%
                             *@8 kHz, load=52%         *saturates but runs
                             *@10 kHz, load=62%        *saturates but runs

 66-MHz, Tseng, AT-MIO-16X   No change from above      Could not reliably extrapolate

 * Extrapolated using the formula in Example 3.

Example 3: Formula used for performance extrapolation.
 cyclic_tasks
    SUM      (freq_cyclic_task_i * Comp_Time_i)  +  Background_Load  <  1
    i=1


 Figure 5: Graphics configuration page for ControlCalc benchmark #1.
 Figure 6: Interrupt-handler scan page using formula-display mode for
ControlCalc benchmark #1.
 Figure 7: LabView front panel for benchmark #1.
 Figure 8: LabView data-flow wiring diagram for benchmark #1.
 Figure 9: Two ways of implementing state memory with LabView's G-Language
dataflow notation; x(n+1)=x(n)+1.0.
For More Information

LabView
National Instruments
6504 Bridge Point Parkway, MS 53-02
Austin, TX 78730-5039
800-433-3488

ControlCalc
RTWare Inc.
714 Ninth Street, Suite 206
Durham, NC 27705
919-286-3114




































June, 1994
A Dual-UI Constraint Equation Solver in C++


When two UIs are better than one




Larry Medwin


Larry is the director of engineering at Advanced NMR Systems Inc., in
Wilmington, MA. He is currently responsible for the development of an MRI
(Magnetic Resonance Imaging) scanner dedicated to mammography. He can be
reached on the Internet at !uunet!thehulk!medwin!larry.
I recently implemented a set of C++ programs to solve constraint equations,
including a program that uses the InterViews GUI toolkit for the X Window
System. Designing a constraint-equation solver turned out to be an interesting
problem, both algorithmically and in terms of its user interface.
Algorithmically, it's interesting because equations are normally programmed to
be solved in one direction--one is always solving for a particular unknown,
and the rest of the variables are always given. But many systems are modeled
in terms of mathematical relationships among many quantities. In such systems,
any one value can be found if all the rest are known. You can represent the
relationships among the variables and operators of a linear algebraic equation
by using what's known as a "constraint network." With this structure (and
appropriate algorithms), you can compute the value of any one variable
automatically as soon as the values of all other variables are given.
In terms of the user interface, I found it a design challenge to present a
clear and simple model of program functionality to the user via a GUI. I also
used a dual-mode architecture, implementing two versions: one using a GUI and
the other driven by a tty interface (simple commands read from stdio). Both
are connected to a common, equation-solving back end.
Although the constraint solver is a relatively small system (about 1500 lines
of code), the design approach used should allow scaling up to more powerful
systems. This article describes the implementation of the program, starting
with the dual-mode architecture.


A Dual-UI Architecture


The two versions of the constraint solver are known as calc and icalc and
represent alternate front ends to the equation solver. The functionality of
the two programs is almost identical, but calc is accessed via a stdio-driven
interface, while icalc is a GUI program built with the InterViews toolkit for
X Windows. The application's ability to access the constraint-solving "engine"
via calc's stdio interface allows it to be used in batch-style regression
testing: calc receives input from test files, and the resulting output can be
compared automatically with known good output. Internally, user-visible
functionality is implemented via virtual functions. In calc, these become tty
commands for input, output, and control; in the GUI version, these functions
have overloaded equivalents.
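That virtual-function seam between the solver and its two front ends can be sketched as follows; the class and method names here are invented for illustration, not the article's actual code:

```cpp
#include <iostream>
#include <string>

// The solver talks only to this abstract interface; each front end
// overrides the virtual functions with its own input/output behavior.
class UserInterface {
public:
    virtual ~UserInterface() = default;
    virtual void show_value(const std::string& var, double value) = 0;
    virtual void report_error(const std::string& msg) = 0;
};

// The stdio-driven "calc" side: output goes straight to stdout, which is
// what makes batch-style regression testing possible.
class TtyUI : public UserInterface {
public:
    std::string last_line;   // kept so the output can be inspected

    void show_value(const std::string& var, double value) override {
        last_line = var + " = " + std::to_string(value);
        std::cout << last_line << "\n";
    }
    void report_error(const std::string& msg) override {
        std::cerr << "error: " << msg << "\n";
    }
};

// An "icalc"-style GUI front end would derive from UserInterface the same
// way, routing show_value() to a widget instead of stdout. The solver holds
// only a UserInterface*, so it never knows which front end it is driving.
```

Because the solver sees only the base-class pointer, swapping the tty front end for the InterViews one requires no change to the equation-solving engine.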
To use calc, you launch the program and type some simple commands--for
example, the keyword equation followed by an algebraic equation. An equation
is defined as two algebraic expressions joined by an equal sign. Other
commands include the set command, to force a value on a variable, and unset,
which leaves the value unspecified. Note that a variable must be unset before
it can be given a new value with set. Finally, you can simply type the name of
a variable to obtain its solved value. At any time during the session, you can
use the equation command to switch to another problem.
The GUI version of the constraint solver is similar to the command-line
version in that there are three basic operations: editing an equation,
specifying whether a variable is forcing its value, and editing that value. In
designing the user interface to icalc, I made sure the user does not have to
know about things such as equation parsing or be aware of internal entities.
All the user needs to know is that he or she can type in an equation and
specify some, but not all, of the variables. The user cannot specify a
variable without first indicating that its value will be "forced;" the program
does not allow forcing if all remaining variables have already been forced.
When icalc is launched, a window similar to Figure 1 is displayed. An equation
is preloaded into the equation-editing area, and the user can immediately set
values of variables. To set the value of a variable, you must first click on
the corresponding Force Value pushbutton to change the variable's state so
that it can force a value. Then you can enter a value, which is a
floating-point number. This value can be changed at any time. Clicking on the
Force Value pushbutton again changes the variable's state so that it will no
longer accept values.


The Constraint Solver


The constraint-solving engine is in the constraint.c module, available
electronically (see "Availability," page 3). My implementation is based on a
system described in Abelson and Sussman's The Structure and Interpretation of
Computer Programs (MIT Press, 1985), except that theirs is built using the
Lisp-like language Scheme.
I chose C++ because I found that the components of the constraint network can
be naturally modeled as objects that encapsulate functions and state.
Inheritance allows a "constraint protocol" to be defined for a virtual base
class, Constraint. The type and connectivity of objects of classes derived
from Constraint is known only to the equation parser and the objects
themselves. Polymorphism allows me to treat objects of classes derived from
Constraint as objects of class Constraint itself. Constrained values, from
Constants or user-controlled Syms, are propagated around the network by
Connectors according to a message protocol.
The bidirectional properties of the constraint network are the result of the
behavior of the primitive constraints--operators which enforce relationships
like x+y=z. In this case, knowing any two variables allows computation of the
third. The other primitive constraints are Constants and Syms. A Constant
always forces its value, but a Sym is a UI object for input (user sets value)
or output (value is calculated from other known values). In the InterViews
version, I added a class derived from Sym, called IVSym, which inherits from
both Sym and InterViews' MonoGlyph class.
With the addition of Connector objects, primitive constraint objects can be
combined to express more complex relationships. Connectors connect pairs of
constraints. They can hold and propagate values. When a connector gets a new
value from one constraint, it sends the value to its other one. A constraint
can also "unconstrain" a connector. In this case, the Connector will tell its
other constraint to forget the value previously forced by that Connector.
A Binop (BINary OPerator) is an operator that computes a result from two
operands. It is a primitive constraint attached to three connectors. It
represents the symmetrical relationship between two operands and a result. The
symmetry results from the fact that knowing any two values, the third can be
computed. For example, assume x+y=z. If x and y are known, z can be
calculated. Alternatively, if z and y are known, x can be calculated as z-y.
The same Binop represents both addition and subtraction by assigning the
proper connectors to its operands and results. In normal operation of the
network, a Binop is told every time one of its connectors gets or loses a
value.
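The Adder's propagation rule can be sketched as follows. This is a simplified, polling version of process_new_value() (in the real design, Connectors notify the constraint of changes); the struct names are illustrative:

```cpp
#include <optional>

// A Connector either holds a value or is still unknown.
struct Connector { std::optional<double> value; };

// The Adder Binop enforces x + y = z: whenever any two of its three
// connectors hold values, it computes the third. This symmetry is what
// makes the network bidirectional.
struct Adder {
    Connector *x, *y, *z;     // the three attached connectors

    void process_new_value() {
        if (x->value && y->value && !z->value)
            z->value = *x->value + *y->value;   // z = x + y
        else if (z->value && y->value && !x->value)
            x->value = *z->value - *y->value;   // x = z - y
        else if (z->value && x->value && !y->value)
            y->value = *z->value - *x->value;   // y = z - x
    }
};
```

One Adder thus represents both addition and subtraction, depending on which pair of connectors is known.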
The root of the inheritance hierarchy is the Constraint class. A Constraint
object must be able to determine if it has enough information to calculate a
new value, and to tell attached Connectors about that new value. Similarly, it
can withdraw a value lost when one of its inputs loses a value. These
behaviors are implemented by the member functions process_new_value() and
process_forget_value(), defined as pure virtual member functions in
Constraint and implemented only in its subclasses. This interface protocol allows
client classes, such as Connector, to invoke process_new_value() and
process_forget_value() on attached objects, knowing only that the objects are
derived from class Constraint.
The other common behavior of all Constraints has to do with the construction
of the constraint network. The member function replace_connector() asks the
Constraint to disconnect itself from one specified connector and replace it
with another. This is used when the equation parser sees an equal sign. All
Constraints are attached to one or more Connectors, but the connectivity is
specialized by the derived classes.
The process_delete() member function is similar to a destructor, except that
it requires a single argument to identify the caller. The network is destroyed
by traversing it and passing process_delete() messages recursively to each
network object, except the caller. If a conventional destructor were called,
the recursion step would pass a delete message to all attached Connectors,
including the caller, resulting in an infinite loop.
The Binop class encapsulates the behavior of a generalized undirected operator
taking two arguments. Binop has no state or value; it only responds to changes
in its inputs. Binop is also an abstract class: although it has a
constructor, Binop objects themselves cannot be instantiated.


Propagation of Constraints


Now that you know what makes up a constraint network, I'll present an example
that illustrates how constraints propagate. This particular network expresses
the relationship between the force exerted on a spring, F, and the length to
which it is stretched, L; see Figure 2. The spring constant K gives the
"stiffness" of the spring, while the constant L0 gives the length of the
spring with no forces acting on it. As you can see from the figure, the two
circles represent Binops: a Multiplier and Adder. The four rectangles
represent the four Syms. Connectors (labeled c1 through c5) attach the Binops
and Syms to each other.
Because the network is initially created by the parser, the Syms do not have
values to propagate. Assume the user starts off by setting the value of the
spring constant K. As soon as the value of the K Sym is set, it sets the value
of Connector c4; c4 tells the Multiplier that it has acquired a value, but the
Multiplier determines that it only has one value and cannot calculate
anything.
Now the user sets L0, the unstretched length of the spring. Again, the L0 Sym
sets the value of Connector c2, but the Adder doesn't have enough information
to perform any computations.
Say the user assigns a value to the Sym L that tells Connector c1 to perform a
computation; this, in turn, tells Adder to do the same. Adder finds that two
of its connectors have values, so it computes a third value (by subtracting
the value on Connector c2 from the value on Connector c1) and passes this
value to the Connector c3. Connector c3 passes its value to the Multiplier,
which finds that it has two values, so it multiplies the values on Connectors
c3 and c4 and passes the product to Connector c5. Finally, Connector c5 tells
the Sym to display its value.
Before you can use the network to compute the length L from a known force F,
you must first stop forcing the value of L. The user tells the Sym L to
"unforce" its value, and the Sym passes this message to the Connector c1. C1
sees that the value has been retracted by the Constraint that set it in the
first place, so it forgets its value, and informs the Adder that it no longer
has one. The Adder sees that one of its Connectors that had been forcing its
value has withdrawn that value, and tells Connectors c2 and c3 to forget their
values. C2 knows that its value was set by the Sym L0, so it retains its
value, but Connector c3 was forced by the Adder. So c3 forgets its value and
passes a "forget value" message along to the Multiplier. The process repeats
until the value of Sym F is no longer forced. At this point, the user can have
the Sym F force its own value on the network.


The User Interface


Listing One (page 93) is calc.c. This is the straightforward, stdio-driven
version of the user interface, which simply invokes a yacc-generated parser to
process input from stdin. The electronic version of the source code contains
the grammar and lexer source files (gram.y and lex.l).
The second version of the program is the GUI implementation in icalc.h and
icalc.c (Listings Two and Three, page 93). This UI is built using InterViews,
a public-domain GUI toolkit in C++ created by Mark Linton at Stanford
University. It contains class libraries for user-interface objects and for
drawing and interacting with users, as well as general-purpose classes such as
lists and strings. A central part of the design of InterViews is the glyph, a
lightweight screen object. Characters are implemented as glyphs, as are the
various layout objects: hbox, vbox, hglue, and vglue. These objects are all
constructed by calls to a LayoutKit instance, essentially a "factory" of
layout objects.
The FieldEditor, used in the equation editor and the IVSym object, is built by
an instance of DialogKit, another factory, this one for interactive objects.
The FieldEditor allows a line of text to be edited and displayed; a callback
function gets invoked when a line is entered. Another widget factory,
WidgetKit, is used to get instances of buttons. InterViews allows the
look-and-feel of these objects to be Motif-like or OpenLook-like, depending on
a command-line argument used when invoking the application.
The instance hierarchy of UI objects basically consists of two levels. An
instance of class App is at the root. The areas of the user interface that are
not background are the eq_editor_ (the equation editor), and sym_box_ (the
vbox holding the IVSym). The sym_box_ is built by traversing the symbol table
and appending each IVSym inside the sym_box_.

The instance of class IVSym is the most complex user-interface object in this
program. Nested inside its hbox are a Label, Patch, and Button, each of which
is separated from the other and the edges of the hbox by hglue. The Label
shows an IVSym's name, and the Button allows the user to force or unforce an
IVSym's value. The Patch causes a portion of the screen to be redisplayed when
it changes.
Underneath the Patch is a Deck, an unusual interface object which contains a
set of Glyphs but only displays one. Inside the Deck are two FieldEditors:
ed_, which represents the IVSym value when it can be edited; and uned_, which
shows the value but does not allow the user to change it. The Deck has a
member function, flip_to(), which gives the index of the glyph to display.
When the IVSym is built, the Deck is given a list of pointers to ed_ and
uned_, as in the following statement: deck_=layout.deck( uned_, uned_, ed_,
ed_);. Due to an InterViews bug, the symbol area is not redisplayed with the
new variable values when you delete the equation in the equation area and type
in a new one. Resizing the window (using the window manager) forces the symbol
area to be redrawn.


Future Directions


This program can be enhanced in several ways. One idea is to extend the
underlying constraint mechanism to solve simultaneous equations. This might
involve the propagation of partially constrained values around the network,
and algorithms to reduce sets of partially constrained values to known values.
This would be an interesting test of the generality and reusability of the
class design used in this program.
 Figure 1: Constraint solver with graphical-user interface.
 Figure 2: An example constraint network; F=K*(L-L0).


Fresco: The Next-Generation InterViews




Mark Linton




Mark, known as the father of the InterViews toolkit, is the principal
architect of Fresco. Portions of this article first appeared in X Resources,
published by O'Reilly & Associates. Mark can be reached at linton@sgi.com.


The X Consortium's officially supported successor to InterViews is Fresco, a
technology that supports "graphical embedding" by combining structured
graphics and application embedding into a single architectural framework. With
graphical embedding, you can compose both graphical objects (circles,
polygons, and the like) and application-level editors (word processors,
spreadsheets, and so on) with a uniform mechanism. Embedded application
objects can appear within graphical scenes and will be appropriately
transformed.
The Fresco API in X11R6 covers Xlib and Xt functionality, with additional
features for graphical embedding. The Fresco API is defined using the
Interface Definition Language (IDL) that's part of the Common Object Request
Broker Architecture (CORBA) from the Object Management Group (OMG). Among
IDL's advantages are:
IDL is more abstract than object-oriented programming languages, yet concrete
enough to be translated automatically into source code.
IDL can be mapped to several languages. OMG has defined a mapping to C and is
defining a mapping for C++ and Smalltalk.
IDL can support distribution--the sender and receiver of an operation may be
on different machines in a network.
Being able to distribute IDL-specified objects across address spaces or
machines is particularly important for application embedding. For instance,
you often might prefer relatively large applications to run as independent
processes. Distribution also increases the likelihood of concurrency because
of the desire to take advantage of available processing power. Even in the
absence of distribution, there are certain applications that are simpler when
implemented using multiple threads. The Fresco sample implementation (Fresco
1.0) supports multithreaded applications by locking access to shared data and
spawning an independent thread to handle screen redraw.
As part of the support for graphical objects, Fresco operations are
screen-resolution independent--coordinates are automatically translated by the
implementation from application-defined units to pixels. Consequently, an
application's appearance is the same on different screens or printers, without
special coding by the programmer. Fresco includes a type that provides
stencil/paint operations for rendering.
Fresco 1.0 in X11R6, written entirely in C++, includes an IDL-to-C++
translator, C++ include files, and a library that implements the Fresco
classes. This implementation does not completely support distribution, though
extensions to the run-time library would make that possible.
The sample implementation also includes an application called "Dish" that
allows invocation of Fresco operations through the Tcl scripting language
(developed at the University of California, Berkeley). Fresco does not include
a Tcl implementation, but if you have the Tcl library installed, then Dish
uses it along with CORBA dynamic invocation to create and manipulate Fresco
objects from a script.
To summarize, the distinguishing features of Fresco are a standard object
model (OMG CORBA) with a C++ implementation, resolution-independence, and
graphical embedding. The use of CORBA means Fresco objects can be distributed
across a network. Fresco also supports multithreaded applications in a single
address space.
[LISTING ONE] (Text begins on page 44.)

/***** CALC.C --- simple tty interface to constraint solver. *****/
#include <stdio.h>
#include "constraints.h"

extern int yyparse();

int main()
{
 Phase = INIT;
 yyparse(); // Parse input
 return 0;
}

[LISTING TWO]

// *** icalc.h -- Larry Medwin 4/5/93 (Abridged version)
// *** InterViews user interface for the constraint system

class IVSym;
class Lexio;
class App;
extern Style* global_style;

extern App* Myapp;

declareFieldEditorCallback(App)
declareFieldEditorCallback(IVSym)
declareActionCallback(IVSym)

//---------------------------------------------------------------------
class IVSym : public Sym , public MonoGlyph
{
 public:
 IVSym(Sym*);
 ~IVSym() {};
 void accept_editor(FieldEditor*); // FieldEditor callbacks
 void cancel_editor(FieldEditor*);
 void click_me(); // Pushbutton callback
 static Style* sym_style; // set by App::App()
 private:
 void IVSym_init();
 // UI stuff
 FieldEditor* ed_; // User enters values here
 FieldEditor* uned_; // Replaces ed_ when not forced
 Deck* deck_;
 Patch* patch_;
 Button* pbutton_; // Force/unforce button

 virtual void show_val(); // Display value and state
 virtual void show_no_val();
 virtual void show_state();
};
// Replaces lex I/O handlers to read from char arrays instead of stdio
class Lexio
{
 private:
 char* equation_; // char string for lex
 char* current_; // cursor for lex
 public:
 Lexio(String);
 ~Lexio();
 // Undefine lex I/O macros
# undef lex_input
# undef unput
 // Lex I/O handlers
 int lex_input();
 void unput(char);
};
//----------------------------------------------------------------------
class App // Canvas with equation editor and ActiveSyms
{
 private:
 String equation_; // FieldEditor String
 IVSym* network_; // Access to constraint network
 public:
 App(Style*);
 Lexio* lex_handler;
 FieldEditor* eq_editor_; // InterViews components
 Glyph* sym_box_; // vbox to hold sym_area_
 void accept_editor(FieldEditor*); // FieldEditor callbacks
 void cancel_editor(FieldEditor*);
 // Start off App with initial equation

 void kick_start(char *); // (shouldn't need this)
};

[LISTING THREE]

//*** icalc.c -- user interface for InterViews-based version of
// the constraint equation solver. By Larry Medwin, 1993

#include <stream.h>
#include "icalc.h"

Style* IVSym::sym_style; // Set by App::App(), read by IVSym::IVSym()
App* Myapp; // Global I/O handlers for lex
implementFieldEditorCallback(IVSym)
implementActionCallback(IVSym)

//*** class IVSym ***
IVSym::IVSym( Sym* source_sym) : Sym(source_sym->name())
{
 state_ = source_sym->state(); // Copy state_ and value_
 value_ = source_sym->value();
 IVSym_init();
}
void IVSym::IVSym_init()
{
 WidgetKit& kit = *WidgetKit::instance();
 const LayoutKit& layout = *LayoutKit::instance();
 //--- Init field editors (one for editing, one for uneditable display)
 kit.begin_style("ed");
 ed_ = DialogKit::instance()->field_editor(
 " ", sym_style,
 new FieldEditorCallback(IVSym)(
 this, &IVSym::accept_editor, &IVSym::cancel_editor)
 );
 kit.end_style();
 //-------------------
 kit.begin_style("uned");
 uned_ = DialogKit::instance()->field_editor(
 " ", sym_style,
 new FieldEditorCallback(IVSym)(
 this, &IVSym::cancel_editor, &IVSym::cancel_editor)
 );
 kit.end_style();
 //--- Put ed and uned in deck; arrange the glyphs in the deck in the
 // same order as enum state_ values, so that flip_to can index the
 // proper glyph by state
 deck_ = layout.deck( uned_, uned_, ed_, ed_);
 deck_->flip_to( state());

 patch_ = new Patch( deck_);
 //--- The Pushbutton
 pbutton_ = kit.push_button("Force value",
 new ActionCallback(IVSym)(this, &IVSym::click_me));
 //--- Put all this stuff in an hbox
 body(
 layout.vbox(
 layout.vglue(),
 layout.hbox(
 layout.hglue(), kit.label(name()), // Sym name

// layout.hglue(), kit.label(" "), // spaces
 layout.hglue(), patch_, // deck
 layout.hglue(), pbutton_,
 layout.hglue()
 ),
 layout.vglue()
 )
 );
 //-------------------
 ed_->field(""); // Start these guys up
 uned_->field("");

 return;
};
void IVSym::click_me() // Implement state transition
{
 switch(state()) {
 case UNCONSTRAINED:
 state(ACTIVE_NO_VALUE);
 break;
 case PASSIVE: // Can't make transition if Sym is constrained
 break;
 case ACTIVE_WITH_VALUE:
 state(UNCONSTRAINED);
 con_->forget_value( this );
 break;
 case ACTIVE_NO_VALUE:
 state(UNCONSTRAINED);
 break;
 default:

 assert(False); // should be unreachable
 }
 // see /home/iv/src/examples/zoomer/main.c
 patch_->redraw();
 return;
}
//------------------------------------------------------------------
void IVSym::accept_editor(FieldEditor* ed)
{
 float ed_val;
 assert( state() != PASSIVE); // Shouldn't try to set this
 // value if it is being forced
 const String& valueStr = *ed->text(); // Get value from field editor
 if (!valueStr.convert(ed_val)) { // Illegal number?
 cancel_editor(ed);
 }
 else {
 con_->forget_value(this); // Remove last constraint
 state(ACTIVE_WITH_VALUE); // Update state & propagate value
 set_value(ed_val);
 }
 return;
};
void IVSym::cancel_editor(FieldEditor* ed)
{ switch(state()) // Restore old value
 { case PASSIVE:
 case ACTIVE_WITH_VALUE:
 show_val();
 break;

 case UNCONSTRAINED:
 case ACTIVE_NO_VALUE:
 show_no_val();
 break;
 default:
 assert(False); // should be unreachable
 }
 return;
};
void IVSym::show_val()
{
 char tmp[100];
 assert( state() != ACTIVE_NO_VALUE);
 assert( state() != UNCONSTRAINED);
 switch(state())
 {
 case ACTIVE_WITH_VALUE:
 sprintf( tmp, "%6.2f", value());
 ed_->field(tmp);
 break;
 case PASSIVE:
 sprintf( tmp, "%6.2f", con_->get_value());
 uned_->field(tmp);
 break;

 default:
 assert(False); // should be unreachable
 }
 return;
}
//----------------------------------------------------------------------
void IVSym::show_no_val()
{
 assert( state() != ACTIVE_WITH_VALUE);
 assert( state() != PASSIVE);
 switch(state())
 {
 case ACTIVE_NO_VALUE:
 ed_->field("");
 break;
 case UNCONSTRAINED:
 uned_->field("");
 break;
 default:
 assert(False); // should be unreachable
 }
 return;
}
void IVSym::show_state()
{
 deck_->flip_to(state());
 patch_->redraw();
 return;
}
//*** class Lexio ***
Lexio::Lexio( String equation)
{
 // Copy into lex input string
 // Format: "equation %s\0"
 equation_ = new char[equation.length()+strlen("equation ")+1];

 sprintf( equation_, "equation %s", equation.string());
 current_ = equation_;
}
Lexio::~Lexio()
{
 delete[] equation_;
}
int Lexio::lex_input()
{
 if (current_ < equation_) // Don't step back over start of string...
 return 0; // EOF
 else // Return char, or 0 at end of string
 return *(current_++);
}
//---------------------------------------------------------------------
void Lexio::unput(char c)
{

 if (current_ > equation_) // If not before beginning
 *(--current_) = c; // Push back pointer and put in char
 return;
}
//*** class App ***
implementFieldEditorCallback(App)
App::App(Style* style) : network_(0)
{
 const LayoutKit& layout = *LayoutKit::instance();
 IVSym::sym_style = new Style(style); // Save copy of style
 // Init field editor
 eq_editor_ = DialogKit::instance()->field_editor(
 " ", IVSym::sym_style,
 new FieldEditorCallback(App)(
 this, &App::accept_editor, &App::cancel_editor
 )
 );
 sym_box_ = layout.vbox(); // Init CurSymArea & its enclosing box
}
void App::accept_editor(FieldEditor* ed)
{
 IVSym* tmp_sym;
 WidgetKit& kit = *WidgetKit::instance();

 // Get rid of old constraint network
 // if( network_ != 0) {
 // network_->process_delete((Connector*)0);
 // delete network_;
 //}
 for (GlyphIndex g = sym_box_->count(); g > 0; g--) // Clean up glyphs
 {
 sym_box_->remove(g-1);
 }
 lex_handler = new Lexio( *ed->text()); // Get FieldEditor string to lex
 delete Symtab; // Create an empty symbol table
 Symtab = new SymbolList;
 // Parse the equation, generate constraint network and symbol table
 Phase = PARSE;
 yyparse();
 // Build up sym_box area
 for (ListUpdater(SymList) i(*Symtab); i.more(); i.next())
 {

 Sym* cur_sym = i.cur();
 tmp_sym = new IVSym( cur_sym); // "Promote" Sym to IVSym
 // Replace Sym with IVSym in the constraint network
 Connector* tmp_con = cur_sym->con_;
 tmp_con->disconnect(cur_sym);
 tmp_sym->connect(tmp_con);
 // delete cur_sym;
 // Put the IVSym in the sym_box_ area
 sym_box_->append(tmp_sym);
 }
 // Keep a hook into the network so we can clean up later

 // network_ = tmp_sym;
 // UI gives us access to symbols, not the symbol table
 // delete Symtab;
 // Send expose event (How??)
}
//---------------------------------------------------------------------
void App::cancel_editor(FieldEditor* ed)
{
 ed->field(equation_); // Restore old equation
 return;
}
//---------------------------------------------------------------------
void App::kick_start(char* s) // Start it off
{
 eq_editor_->field( s ); // Set up initial string
 accept_editor( eq_editor_); // Get started
 return;
}
//*** main() ***
int main(int argc, char** argv)
{
 Session* session = new Session("FieldEditorTest", argc, argv);
 WidgetKit& kit = *WidgetKit::instance();
 const LayoutKit& layout = *LayoutKit::instance();
 Myapp = new App(session->style());
 Myapp->kick_start("a+b+c+d+e=0"); // Start it off
 return session->run_window( // Set up app window
 new ApplicationWindow(
 new Background(
 layout.hbox(
 layout.hglue(),
 layout.vbox(
 layout.vglue(), Myapp->eq_editor_,
 layout.vglue(), Myapp->sym_box_,
 layout.vglue()
 ),
 layout.hglue()
 ),
 kit.background()
 )
 )
 );
}
End Listings







June, 1994
Rethinking Memory Management


Portable techniques for high performance




Arthur D. Applegate


Arthur is the author of SmartHeap, the portable memory-management library
published by MicroQuill Software Publishing. He can be contacted at
applegate@bix.com.


Applications written in C and (especially) C++ spend much of their execution
time allocating and de-allocating memory. Moreover, when applications run in
virtual-memory environments, references to objects stored in dynamically
allocated memory are the main cause of excessive swapping--a condition that
can bring application performance to unacceptable lows.
In this article, I explain why the traditional approaches to dynamic-memory
management--malloc and operator new--are inadequate for memory-intensive
applications. I suggest alternative techniques that you can use on all
platforms to speed up allocation and minimize swapping. Finally, I present an
empirical case study that shows how applying these techniques can result in
very dramatic improvements in overall application performance.


Why Memory Management Matters


Today, most applications use a great deal of dynamic memory. Operating systems
and hardware platforms make ever-larger address spaces available, and
applications are growing in size to take advantage of this available memory.
All modern operating systems now provide virtual memory. While virtual memory
has real benefits, its presence can affect application performance more than
any other factor--a single memory reference can be 10,000 times slower if it
causes a page fault! The rate of page faulting is determined by only two
factors: available physical memory and how you manage memory in your
application.
Two other features of modern operating systems--multitasking and
multithreading--also increase the demands on memory management. Every time
there is a context switch, the data your application is referencing may be
swapped out to make room for the other thread or process. Unless each thread's
data is contained in a very tight working set, every context switch will cause
a lot of unnecessary swapping.
The increasing popularity of C++ also magnifies the importance of efficient
heap management. By their nature, C++ programs use dynamic memory much more
heavily than their C counterparts--often for small, short-lived allocations.


malloc Woes


The malloc API specified by ANSI performs a general-purpose search for an
arbitrarily sized free area of memory. (Note that compilers generally
implement C++ operator new directly in terms of malloc.) malloc must handle
all block sizes and allocation patterns. Since it does not take a parameter to
specify a heap, all allocations through malloc must share the same heap.
The traditional implementation of malloc is simply a list of free blocks that
is searched linearly until a suitably sized block is found. With a
steady-state pattern of allocations and frees of random sizes, the number of
free blocks will, on average, be about half the number of blocks in use; for
proof, see Donald Knuth's The Art of Computer Programming Volume 1, second
edition (Addison-Wesley, 1973). Consequently, the performance of malloc
degrades in proportion to the total number of allocation requests.
In virtual-memory environments, most malloc implementations scale
pathologically when the total heap size exceeds currently free physical
memory. When malloc's free list does not fit in memory, malloc can thrash,
with every call causing page faults. In some implementations, individual calls
to malloc can then cause hundreds of page faults, with a single call taking
seconds rather than microseconds.
Applications generally reference data items more than once. Therefore,
efficiency of references to your own data can be even more important than
references by malloc to its own free list. Good locality is the key to fast
references to your data in virtual-memory environments.
Because all callers of malloc in a given process share the same heap,
individual blocks allocated by malloc tend to be scattered arbitrarily
throughout the process's address space. Consequently, a single data
structure's elements are strewn across many pages, together with unrelated
data--the data has poor locality. The working set required to hold the entire
data structure is thus much larger than necessary, since the virtual-memory
manager swaps in units of pages. Traversing the data structure therefore
causes many expensive page faults. The problem is compounded even further if
there are context switches during the data-structure traversal.
Even in non-virtual-memory environments, malloc has another problem: It wastes
memory. Because malloc must handle blocks of variable size, it is prone to
fragmentation. This, of course, is the condition in which many free blocks too
small to be useful are trapped between larger, in-use blocks.
Another instance of memory waste involves granularity and overhead. The size
that your program passes to malloc gets rounded up to the nearest granularity
multiple--often between 8 and 32 bytes. Overhead consists of the internal
header associated with every memory block, which can be as much as 8 or 16
bytes per allocation. Memory waste is significant even if address space is
plentiful, because you pay for any increase in working-set size with page
faults.
In multithreaded applications, malloc incurs significant extra overhead in
serializing access to the heap. Having only one heap also increases contention
and prevents processors from being concurrently active in heap operations on
multiprocessor platforms.


Solutions


The single most important improvement on malloc is the provision of multiple
heaps, or memory pools. With multiple memory pools, you can partition your
program's allocations into groups that correspond to your program's data
structures.
Many benefits result from this change. First, allocation speed improves,
because heap sizes are smaller. Second, there is much less swapping, because
locality is vastly improved, which reduces working-set sizes. Third, there is
less fragmentation, because per-pool allocations vary less in size and extent.
Fourth, there is faster freeing, because you can free an entire memory pool
rather than individually freeing thousands of blocks. Finally, you can
concurrently allocate in separate heaps without expensive synchronization in
multithreaded applications.


Fixed-Size Allocators


Another approach is to change the allocator's algorithm. Remember that malloc
degrades with heap size: its cost is roughly O(n) in the number of free blocks
(at least while the heap fits in physical memory). What we'd really like is an
allocator closer to O(c), that is, constant time.
Remember that in most applications, especially those written in C++, the
allocations are usually quite small. In fact, most allocations are for small,
fixed-size class or structure objects. You could allocate these very
efficiently by maintaining free lists of blocks of just the right size for
each major class or structure. Allocation is then just a matter of popping off
the head of the free list, and deallocation is pushing on a new head.
The benefits of this method include: greater speed than with an allocator of
variable-size; better scaling, on the order of O(c) rather than O(n); no
internal fragmentation, because all free blocks are just the right size; good
locality, if implemented correctly; and, finally, potential elimination of
overhead and granularity, because there's no need to store the size with each
block (since blocks are of fixed size).


Overloading operator new in C++



In C++, you can overload operator new for individual classes. This allows you
to customize memory management on a per-class basis without users of the class
even knowing about it (they just call operator new in the usual way).
You can establish a fixed-size allocator for a class, and allocate objects
from fixed-size memory pools according to the data structure to which the
class object will belong. This elegantly combines the benefits of multiple
memory pools and fixed-size allocators.


A Linked-List Example


I'll illustrate the techniques I've discussed with an example that directly
applies to many applications: linked-list manipulation. The sample application
randomly inserts and deletes randomly sized strings in five linked lists. It
does a full linear traversal of each list and then deletes each list. The list
size is an input parameter.
The first implementation (Listing One, page 96) uses the default global
operator new. Element insertion demonstrates allocation speed. All list links
and values are allocated from the same heap. If you run the program with a
size such that total list data exceeds available physical memory, operator new
thrashes before insertion is complete.
List traversal demonstrates how poor locality impacts the speed of memory
references. If the total size of list data exceeds physical memory, each list
search thrashes. This is because the lists' memory is all allocated from the
same heap. Each page in the heap contains data from each of the lists, so the
total number of pages needed for one list (the working-set size) is five times
more than it needs to be.
When the program terminates, the destructors for each list are invoked. The
list destructor frees each element, demonstrating deallocation speed.


Improving Memory Management


A second version of the program is identical except for memory-management
calls. This version uses SmartHeap APIs to demonstrate multiple memory pools,
a fixed-size allocator, and memory-pool de-allocation.
Listing Two (page 97) shows the changed definitions in the new version. Each
List object has its own memory pool, from which Link objects and string values
of that list are allocated.
List's constructor calls MemPoolInitFS to initialize the memory pool with a
fixed-size threshold equal to the size of a Link. This allows all Link objects
to be allocated with a fixed-size algorithm.
List::Insert allocates a Link from List's pool and passes the pool to Link's
constructor. Note that the memory pool is passed to operator new with the
placement syntax.
Link defines its own operators new and delete. Link::operator new calls
MemAllocFS, the SmartHeap fixed-size allocator. Link's constructor also
allocates string values from the owning list's memory pool. This is
reasonable, since Link objects and their associated string values tend to be
referenced and freed together.
List's destructor simply calls MemPoolFree to deallocate the entire memory
pool. This frees all Links and all string values. There is no need to free
each Link object or string individually. Moreover, the pages in which these
objects reside need not even be touched (they might not be resident).
Note that main, List::RandomOp(), and List::Find are not changed at all, yet
the times for each of these functions are dramatically different in the two
implementations.


Performance Results


Table 1(a) shows the times, in seconds, printed by both versions of the
program when run with a count of 20,000 in Windows 3.1 on a 25-MHz 386 with 4
Mbytes of RAM. Table 1(b) shows the results with a count of 40,000 in Windows
NT running on a 33-MHz 486 with 16 Mbytes of RAM. Table 1(c) shows the results
with a count of 150,000 in HP-UX 9.0 on an HP 710 with 32 Mbytes of RAM.
In each case, changing only the memory-management calls results in an order of
magnitude or greater overall application performance improvement. This example
is admittedly somewhat contrived in that the program allocates more data than
will fit in available physical memory. However, it illustrates the impact
memory management can have on operations that applications commonly and
frequently perform. It also serves to alert you to the fact that memory
management cannot be ignored if your application is to perform well in a
virtual-memory environment.


Alternative Implementations


The techniques I've discussed are illustrated here with SmartHeap, a
commercial memory manager available for most platforms. However, other
alternatives are available in some cases.
For example, in Windows 3.x, you can use LocalAlloc to suballocate in
GlobalAlloced blocks to create multiple heaps. Paul Yao describes this
technique in detail in his Windows 3.0 Power Programming Techniques (Bantam
Books, 1990). The Win32 API provides a facility for creating multiple heaps;
see HeapAlloc in the Win32 API Reference.
For a fixed-sized allocator in C++, a good place to start is Bjarne
Stroustrup's The C++ Programming Language (Addison-Wesley, 1991), which
includes simple, fixed-size allocator implementations.
If you want to embark on your own implementation of a multiple-pool memory
manager, Knuth's The Art of Computer Programming, Volume 1, second edition, is
required reading.

Table 1: (a) Windows 3.1 benchmarks (20,000 iterations); (b) Windows NT
benchmarks (40,000 iterations); (c) HP-UX benchmarks (150,000 iterations).
(Note: All times are in seconds.)
 Operation Program 1 Program 2

 (a) Insertion 963.01 24.17
 Search 26.15 15.92
 Deletion 91.42 2.09
 Overall App 1086.65 45.15

 (b) Insertion 22.65 9.12
 Search 175.79 0.50
 Deletion 159.29 0.14
 Overall App 359.76 10.04

 (c) Insertion 319.57 19.97
 Search 171.15 2.82
 Deletion 320.04 0.35
 Overall App 813.21 23.15


[LISTING ONE] (Text begins on page 52.)

//--------------------------------------------------------------------
// Memory management exerciser. By Arthur Applegate.
// Program 1: Creates, traverses, and destroys linked lists, to show
// the effect of memory management on an application's performance.
//--------------------------------------------------------------------

#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stdlib.h>

//-----------------------------------------------------------------------------
// class Timer: Instances of this class, when declared as automatic variables,
// record and print the time taken to execute the block in which they are
// declared.
//-----------------------------------------------------------------------------
class Timer
{
private: const char *m;
 clock_t startTime;
public: Timer(const char *msg) : m(msg), startTime(clock()) { }
 ~Timer()
 { printf("\n%s: %.2f seconds.", m,
 (double)(clock()-startTime) / (double)CLOCKS_PER_SEC);
 }
};
//-------------------------------------------------------------------
class Link // Link objects are list elements
{ friend class List;
public: Link (const char *val, Link *p, Link *n);
 ~Link ();
private: char *value;
 Link *prev;
 Link *next;
};
//-------------------------------------------------------------------
class List // List objects are doubly-linked lists
{
public: List ();
 ~List ();
 const Link *Insert (Link *prev, const char *val);
 void Delete (Link *remove);
 void RandomOp ();
 const Link *Find (const char *val) const;
private: Link *head;
 Link *tail;
};
//------------------------------------------------------------------
 // create a list element
Link::Link (const char *val, Link *p, Link *n)
{
 value = new char[strlen(val)+1];
 strcpy(value, val);
 if ((prev = p) != NULL) prev->next = this;

 if ((next = n) != NULL) next->prev = this;
}
//-----------------------------------------------------------------

Link::~Link () // destroy a list element
{
 delete [] value; // value was allocated with new[]
 if (prev) prev->next = next;
 if (next) next->prev = prev;
}
//------------------------------------------------------------------
List::List () // create a new list: clear head, tail
{
 head = tail = NULL;
}
//------------------------------------------------------------------
List::~List () // destroy list: delete each Link individually
{
 while (head)
 Delete(head);
}
//-----------------------------------------------------------------
 // add a new link after specified element
const Link *List::Insert (Link *prev, const char *val)
{
 Link *next = prev ? prev->next : head;
 Link *l = new Link (val, prev, next);
 if (!prev) head = l;
 if (!next) tail = l;
 return l;
}
//-----------------------------------------------------------------
void List::Delete (Link *l) // remove a link; no-op if link NULL
{
 if (!l) return;
 if (head == l) head = l->next;
 if (tail == l) tail = l->prev;
 delete l;
}
//-----------------------------------------------------------------
void List::RandomOp () // insert (80%) or delete (20%) an element
{
 if (rand() % 5 != 0)
 {
 const int maxStr = 64;
 char buf[maxStr+1];

 // generate a string of random length between
 // 1 and 64 bytes, with random fill value
 int len = rand() % maxStr + 1;
 memset(buf, rand() % 128, len);
 buf[len] = '\0';
 Insert(tail, buf);
 }
 else
 Delete(head);

}
//-----------------------------------------------------------------
 // find the first element with a given value
const Link *List::Find (const char *val) const
{
 register Link *l;
 for (l = head; l; l = l->next)
 if (strcmp(l->value, val) == 0)
 break;
 return l;
}
//-----------------------------------------------------------------
void main(int argc, char *argv[])
{
 if (argc < 2)
 {
 printf("\nusage: lists <element-count>");
 return;
 }
 long count = atol(argv[1]);

 Timer t("Overall Application");
 List *list1 = new List, *list2 = new List, *list3 = new List,
 *list4 = new List, *list5 = new List;

 // test allocation speed
 // note that to properly benchmark allocations, we need
 // interspersed de-allocations since all allocators will
 // be fast with an empty free-list!
 {
 Timer t("Insertion");

 // generate and insert `count' strings of random lengths
 // and contents; occasionally delete to simulate a
 // dynamically shrinking and growing linked list
 while (count--)
 {
 // intersperse each operation so that elements are
 // chaotically distributed in memory if we do not
 // have multiple memory pools
 list1->RandomOp();
 list2->RandomOp();
 list3->RandomOp();
 list4->RandomOp();
 list5->RandomOp();
 }
 }
 // test locality: each list will touch five times as many
 // pages if allocations were from a single heap
 {
 Timer t("Search");

 list1->Find("not present");
 list2->Find("not present");
 list3->Find("not present");
 list4->Find("not present");
 list5->Find("not present");
 }
 // destructors for lists 1-5: test freeing speed
 {
 Timer t("Deletion");

 delete list1;
 delete list2;
 delete list3;
 delete list4;

 delete list5;
 }
}

[LISTING TWO]

//--------------------------------------------------------------------
// Memory management exerciser, version 2. By Arthur Applegate.
// This version has changes for improved performance. Only ten lines
// of code needed to be changed from version shown in Listing 1. These
// changes are indicated with "***" in the comment line.
//--------------------------------------------------------------------

#include <smrtheap.hpp> // *** include SmartHeap C++ header file

//--------------------------------------------------------------------
class Link
{ friend class List;
public: // *** constructor receives memory pool parameter
 Link (MEM_POOL pool, const char *val, Link *p, Link *n);
 ~Link ();
private: // *** overload new/delete to use fixed-size algorithm,
 // *** allocating from the owning List's memory pool
 void *operator new(size_t, MEM_POOL pool)
 { return MemAllocFS(pool); }
 void operator delete(void *mem) { MemFreeFS(mem); }
 char *value;
 Link *prev;
 Link *next;
};
//--------------------------------------------------------------------
class List
{
public: List();
 ~List();
 const Link *Insert(Link *prev, const char *val);
 void Delete(Link *remove);
 void RandomOp();

 const Link *Find(const char *val) const;
private: Link *head;
 Link *tail;
 MEM_POOL pool; // *** per-List memory pool data member
};
//--------------------------------------------------------------------
// *** create a list element
Link::Link(MEM_POOL pool, const char *val, Link *p, Link *n)
{
 // *** use placement syntax to allocate string
 // *** from owning List's memory pool
 value = new (pool) char[strlen(val)+1];

 strcpy(value, val);
 if ((prev = p) != NULL) prev->next = this;
 if ((next = n) != NULL) next->prev = this;
}
//--------------------------------------------------------------------
List::List() // create a new list
{

 // *** initialize memory pool for Link's and strings
 pool = MemPoolInitFS(sizeof(Link), 0, 0);
 head = tail = NULL;
}
//--------------------------------------------------------------------
List::~List() // destroy a list
{
 // *** no need to free individual links or values,
 // *** as freeing the pool frees all Links and string values
 // note that if properly implemented, freeing the memory pool
 // will not even _touch_ the pages that contain the individual
 // blocks -- so there is no swapping regardless of heap size
 MemPoolFree(pool);
}
//--------------------------------------------------------------------
 // add a new link after specified element
const Link *List::Insert(Link *prev, const char *val)
{
 Link *next = prev ? prev->next : head;

 // *** use placement syntax to allocate Link
 // *** from current List's memory pool
 Link *l = new (pool) Link(pool, val, prev, next);

 if (!prev) head = l;
 if (!next) tail = l;
 return l;
}
End Listings

































June, 1994
Writing PCMCIA Software


Programming for portable systems




Troy A. Miles


Troy runs Entertainment Software Partners and can be reached via the Internet
at troymiles@delphi.com.


The Personal Computer Memory Card International Association (PCMCIA) 1.0
specification defined a standard for adding 68-pin memory cards to a computer
system. PCMCIA 2.0 added support for I/O devices such as modems, pagers, LAN
cards, and hard disks. Unlike their ISA and MCA cousins, system- and
CPU-independent PCMCIA cards don't have jumpers or dip switches. Instead,
they're configured via software and can be inserted and extracted with the
system powered.
Nearly all portable DOS machines now include PCMCIA slots. Additionally, both
the IBM DOS and OS/2 operating systems include PCMCIA support (and it's
rumored that forthcoming versions of Microsoft MS-DOS and Windows will, too).
This article examines what's involved in writing PCMCIA software for DOS-based
systems.


PCMCIA Hardware Architecture


The first layer of PCMCIA hardware is the socket, a receptacle for a PCMCIA
card. There's no limit to the number of sockets a system can have, although
one, two, and four are the most common. Two or more sockets are usually joined
to an adapter, which is hooked into the system bus. Unlike hard disks or
modems, there are no standard I/O addresses used by adapters. Nor are there
any standard mappings for the registers on the adapters. All of this is left
to the system designer. The adapter allows the card's memory and I/O resources
to be mapped into the system under software control. For example, a modem card
can be made to appear at any available COM port by executing the appropriate
code. PCMCIA cards are about the size of a credit card. Type I cards are the
thinnest and are mainly used for memory cards. Type IIs are thicker and used
mostly for modems and LANs. Type IIIs are the thickest and are used by
rotating hard drives.
PCMCIA cards have both common and attribute on-board memory. Common memory
stores data and is usually present only in memory cards. Attribute memory,
present on all 2.0-compliant cards, is where information about the card is
kept. The card information structure (CIS), beginning at address 0 in
attribute memory, consists of a linked list of data blocks called "tuples"
which hold information about the type of card and the information the software
needs to configure it.


PCMCIA Software Architecture


PCMCIA software consists of socket and card services. Socket services, the
lowest layer (similar to BIOS), is written specifically to support a single
type of PCMCIA adapter. It is intended to be small and needs minimal RAM,
allowing it to be ROMable. Card services, on the other hand, is more like DOS.
It never deals directly with the hardware; it uses socket services instead.
Card services is the API most important to programmers. While it is possible
to bypass card services and go directly to socket services, doing so could be
dangerous. Card and socket services are tightly synchronized. If this
synchronization is lost, a system crash or hardware damage can result. If you
make direct calls to socket services, you'd better know what you're doing.


Card Services API


There are 54 card-services functions, divided into five types: client
services, resource management, bulk memory, client utilities, and advanced
client services. Client-service functions get information from card services.
Resource-management functions allocate and free system resources. (This is how
PCMCIA card resources are made visible to the rest of the system.) Bulk-memory
services are for accessing common memory, and client utilities parse tuple
information. Advanced client-service functions get even lower level
information from card services.
All of the functions are called with a similar argument list. In C, a
card-services
call has the form: results=CardServices(function, handle, pointer, arglength,
argpointer); in which function is a byte that indicates the card-services
function number, handle is a word used to send or receive a client handle, and
pointer is a far pointer to code or data (for most functions, pointer is
NULL). arglength holds the size of the buffer that argpointer points to.
argpointer is a far pointer to an argument packet which sends/receives
information from card services. If a card-services call is successful, results
will be 0; otherwise it holds an error number. Some functions return
additional error information in the argument packet.
Since a bad PCMCIA program can damage the hardware, card services does a lot
of checking on parameters passed to it. If any of the values are determined to
be bad, the function call will fail before any action is taken. It is critical
to always check the values card services returns because, unlike normal DOS
programming, PCMCIA is dynamic. Hardware can be added and removed from the
system at any time. You should never make assumptions about the state of the
system.


The Cardinfo Program


The Cardinfo program presented in Listing One (page 152) returns information
about cards plugged into the first or second socket of the system. It is
written mostly in C, although two functions are written in inline assembly.
I've tested Cardinfo on Toshiba, IBM, AST, and Compuadd notebooks, using both
Phoenix and SystemSoft PCMCIA software.
The most important function in this program is CardServices, which
encapsulates the call to card services with a C-callable function. Under DOS,
card services is called via interrupt 0x1A, function 0xAF. The value in AL
holds the number of the desired card-services function. Other registers hold
the other arguments. Putting this call into a C function and its argument
packet in a structure is easier to manage than manipulating bytes in assembly.
GET_CARDSERVICES_INFO is the first call every PCMCIA program should make,
since nothing will work without card services. If card services is installed,
the return value will be SUCCESS (or 0) and the signature word of the argument
packet will be set to CS. Both values must be checked, since other functions
use interrupt 0x1A, and any of those can accidentally zero the AX register
which holds the return value.
Besides confirming card-services installation, GET_CARDSERVICES_INFO returns
the number of PCMCIA sockets, the vendor revision number, and the version of
card services. All this information is displayed before the program enters its
main loop. In the 2.0 version of card services, sockets were numbered
beginning with 1. In version 2.01 and later, socket numbering begins with 0.
Cardinfo always begins its numbering with 1. It determines what the base
socket is by checking the PCMCIA version.
Since PCMCIA is a dynamic environment, card services provides a callback
mechanism for notifying you of card events as they occur. To use the callback,
a call is made to REGISTER_CLIENT. The argument packet first contains the
attribute which tells card services what kind of client your program is. This
allows card services to set your priority level, since there may be more than
one client in the system. Priority is given first to I/O clients, then to
memory-technology drivers, and finally to memory clients. The Cardinfo program
case assumes a memory client which will get notification after all other
clients. The next piece of information is the event mask, which tells card
services what kind of events you are interested in. Cardinfo needs only
card-detect change events, indicated by setting bit 7. The last piece of
information is the version of card services the program was written to be
compliant with. Since the program is compliant with both versions 2.0 and
2.01, it simply passes back the version number card services reported.
If the request for a callback is granted, card services returns with SUCCESS,
and ClientHandle will hold the client handle, which is used later to free the
callback. Callbacks are a limited system resource, so not all requests will be
granted. Be sure to check the value that is returned.
If the callback request is granted, the program enters the information-display
loop. The first pass through the loop will force an update by initially
setting fCardEvent to True. Afterwards, fCardEvent will be True only after an
event occurs.
Card services makes getting information from a card simple. A call to
GET_STATUS will return the state of a socket. If a card is in the socket, the
CARD_DETECT_FLAG of the CardState word will be set. The version 1.0
information tuple (CISTPL_VERS_1) contains the manufacturer and product-name
strings. The GET_FIRST_TUPLE and GET_TUPLE_DATA card-services calls retrieve
these strings in the GetLevel1Info function. The strings themselves will be in
a buffer with NULLs separating them and a double NULL marking the end of the
buffer.
A call to GetDeviceType is made to determine the type of card. It also calls
GET_FIRST_TUPLE and GET_TUPLE_DATA, except that the desired tuple is
CISTPL_DEVICE. Only memory types are defined. If it is an I/O card, the device
type will be DTYPE_NULL or DTYPE_FUNCSPEC. Memory devices also return their
size in bytes. If the memory type is DTYPE_SRAM, the state of the device's
battery is contained in wState.
The information-display loop continues to wait for card events until the user
presses a key. At that point, the callback is freed, and the program returns to
DOS. If the callback isn't freed, card services will call it after the next
card event and crash the system.


Conclusion



There are many ways to enhance Cardinfo. For instance, you could have it tell
the user what kind of I/O card is being used by comparing the I/O addresses
used by the card with those of known PC devices.
Finally, keep in mind that the PCMCIA specification is an open standard. If
you'd like more information on it, contact the Personal Computer Memory Card
International Association, 1030G E. Duane Ave., Sunnyvale, CA 94086
(408-720-0107).
[LISTING ONE] (Text begins on page 150.)

//*** CARDINFO.C -- by Troy-Anthony Miles -- A PCMCIA Information Utility ***

#include <dos.h>
#include <conio.h>
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include "cardinfo.h"

#define MAX_VENDOR_STRING (80-10)
#define MAX_SOCKET 2

BYTE buffer[256], heap[256];
WORD ClientHandle=0, fCardEvent=TRUE, bEventType=0;
WORD wPCMCIAVersion, wVendorVersion, wSockets, wBaseSocket;

DWORD SizeCodes[] = {
 0x200, //* 512 bytes
 0x800, //* 2k
 0x2000, //* 8k
 0x8000, //* 32k
 0x20000, //* 128k
 0x80000, //* 512k
 0x200000 //* 2m
};
char *OldCard = "Information Unavailable";
char *strLines[] = {" product:", " line 3:", " line 4:"};
char *strDevices[] = {"ROM","OTPROM","EPROM","EEPROM","FLASH","SRAM","DRAM"};
WORD CardServices(WORD, WORD, DWORD, WORD, void far *);
WORD GetLevel1Info(WORD socket, char *buffer);
WORD GetDeviceType(WORD socket, DWORD *size, char *buffer);
WORD SetCallbackHandler(void);
void far Callback(void);
char *CreateMemoryString(DWORD value);
void FilterString(char *string);
void SetCursorPos(WORD x, WORD y);
void ClearScreen(void);
//*******************************************************************
void main()
{
 STATUS_INFO *statInfo = (STATUS_INFO *)heap;
 CS_INFO *csInfo = (CS_INFO *)heap;
 DWORD dwSize;
 WORD wState, wStatus, wType;
 ClearScreen();
 SetCursorPos(0, 0);
 puts("CardInfo - A PCMCIA Information Utility");
 //* See if CS is installed and get basic information
 csInfo->Signature = 0;
 wStatus = CardServices(GET_CARDSERVICES_INFO,0,0L,sizeof(CS_INFO),csInfo);
 if(wStatus != SUCCESS || csInfo->Signature != SIGNATURE)
 {
 puts("ERROR: Card Services not installed.");
 return;
 }

 SetCursorPos(0, 2);
 wSockets = csInfo->Count;
 wVendorVersion = csInfo->Revision;
 wPCMCIAVersion = csInfo->CSLevel;
 //* after v2.0 socket numbering began at 0, before it was 1
 wBaseSocket = (wPCMCIAVersion > 0x200)? 0: 1;
 //* Display the basic information
 printf("PCMCIA version: %X.%02X, Vendor version: %X.%02X, Sockets: %d",
 HIBYTE(wPCMCIAVersion), LOBYTE(wPCMCIAVersion),
 HIBYTE(wVendorVersion), LOBYTE(wVendorVersion), wSockets);
 //* Try to get a callback handle, exit if can't get one
 if(SetCallbackHandler())
 {
 puts("ERROR: Unable to allocate a callback handle.");
 return;
 }
 //* Display the Vendor Information string
 SetCursorPos(0, 3);
 if(csInfo->VStrLen)
 {
 FilterString(csInfo->VendorString);
 printf("Vendor: [%s]", csInfo->VendorString);
 }
 //* This is the information display loop
 do
 {
 //* Wait until a card event occurs
 if(fCardEvent)
 {
 WORD ndx, wCurrSocket;
 fCardEvent = FALSE;
 //* Display the event
 SetCursorPos(0, 4);
 printf("Last Card Event: 0x%02X", bEventType);
 //* Display info for each socket
 for(wCurrSocket=wBaseSocket, ndx=1;
 ndx <= wSockets && ndx <= MAX_SOCKET;
 wCurrSocket++, ndx++)
 {
 int iYLine;
 //* Erase the socket information
 for(iYLine=0; iYLine < D_LINES; iYLine++)
 {
 SetCursorPos(0, (ndx-1)*D_LINES + D_Y + iYLine);
 printf(" ");
 }
 //* Get the socket state information
 statInfo->Socket = wCurrSocket;
 CardServices(GET_STATUS, 0, 0L, sizeof(STATUS_INFO), statInfo);
 wState = statInfo->CardState;
 //* Is there a card in the socket?
 if(wState & CARD_DETECT_FLAG)
 {
 WORD wOffset, wNumLines, wCurrLine;
 //* Display level 1 information
 wNumLines = GetLevel1Info(wCurrSocket, buffer) - 1;
 SetCursorPos(0, (ndx-1)*D_LINES + D_Y);
 printf(" Socket %d: %s", ndx, buffer);
 //* Loop through and display each line of level 1 info

 for(wOffset=0, wCurrLine=1; wCurrLine < 4; wCurrLine++)
 {
 SetCursorPos(D_X, (ndx-1)*D_LINES + D_Y + wCurrLine);
 printf("%s", strLines[wCurrLine-1]);
 wOffset = strlen(&buffer[wOffset]) + wOffset + 1;
 if(wNumLines)
 {
 printf(" %s", &buffer[wOffset]);
 wNumLines--;
 }
 else
 printf(" %s", "UNDEFINED");
 }
 //* Display type information
 wType = GetDeviceType(wCurrSocket, &dwSize, buffer);
 SetCursorPos(D_X, (ndx-1)*D_LINES + D_Y2 + 0);
 printf("Device Type: %s", buffer);
 //* Only memory cards have size information
 if(wType && wType != DTYPE_FUNCSPEC)
 {
 SetCursorPos(D_X, (ndx-1)*D_LINES + D_Y2 + 1);
 printf(" Size: %s", CreateMemoryString(dwSize));
 printf(", Write Protect: %s",(wState & WRITE_PROTECT)?"ON": "OFF");
 //* Only an SRAM will have a battery
 if(wType == DTYPE_SRAM)
 {
 printf(", Battery: ");
 if(wState & BATTERY_DEAD_FLAG)
 printf("Dead");
 else if(wState & BATTERY_LOW_FLAG)
 printf("Low");
 else
 printf("Okay");
 }
 }
 }
 else
 {
 SetCursorPos(0, (ndx-1)*D_LINES + D_Y);
 printf(" Socket %d: Empty", ndx);
 }
 }
 }
 }while(!_kbhit());
 //* Flush the keyboard buffer
 while(_kbhit())
 _getch();
 //* Deallocate the callback else CS will crash later
 if(ClientHandle)
 if(CardServices(DEREGISTER_CLIENT, ClientHandle, 0L, 0, 0L))
 puts("ERROR: Unable to free the callback handle.");
 SetCursorPos(0, 23);
}
//***** Call CS via its interrupt *****
WORD CardServices(WORD Function, WORD Handle, DWORD Pointer,
 WORD ArgLength, void far *ArgPointer)
{
 WORD wStatus, wHandle;
 DWORD pointer;

 _asm
 {
 push es
 push di
 push si
 mov bx, WORD PTR ArgPointer
 mov ax, WORD PTR ArgPointer+2
 mov es, ax
 mov dx, Handle
 mov si, WORD PTR Pointer
 mov di, WORD PTR Pointer+2
 mov cx, ArgLength
 mov al, BYTE PTR Function
 mov ah, CARD_SUBFUNCTION
 int CARD_INTERRUPT
 mov wStatus, ax
 mov wHandle, dx
 mov WORD PTR pointer, si
 mov WORD PTR pointer+2, di
 pop si
 pop di
 pop es
 }
 if(Handle && Function == REGISTER_CLIENT)
 *((WORD *)Handle) = wHandle;
 return(wStatus);
}
//*** Receives Card Services callback when a card event occurs; sets the flag
//* ENTRY: AL - holds the event type
//* EXIT: AX - is set to zero to indicate no errors
void far Callback()
{
 _asm
 {
 push ds
 mov bx, SEG fCardEvent ;get the data segment
 mov ds, bx
 inc fCardEvent ;set the event flag
 mov BYTE PTR bEventType, al ;save the event type
 xor ax, ax
 pop ds
 clc
 }
}
//*** Get Level one information (manufacturer, product name, etc) from card.
//* ENTRY: socket - the socket to retrieve the information from
//* buffer - a pointer to a buffer which will receive the info
//* it should be at least 256 bytes which is the max tuple length
//* EXIT: buffer - holds the info strings, separated by '\0'
//* RETURN: the number of strings in buffer
WORD GetLevel1Info(WORD socket, char *buffer)
{
 WORD len, ndx1, ndx2, wStatus1, wStatus2, strCount;
 TUPLE_INFO *pTup;
 TUPLE_DATA_INFO *pTupData;
 strCount = 0;
 //* We want the level 1 info tuple. Find it with GET_FIRST_TUPLE
 //* Then retrieve its data with GET_TUPLE_DATA
 pTup = (TUPLE_INFO *)heap;

 pTupData = (TUPLE_DATA_INFO *)heap;
 pTup->Socket = socket;
 pTup->Attributes = 0;
 pTup->DesiredTuple= CISTPL_VERS_1;
 pTup->Reserved = 0;
 wStatus1 = CardServices(GET_FIRST_TUPLE, 0, 0L, sizeof(TUPLE_INFO), pTup);
 wStatus2 = CardServices(GET_TUPLE_DATA, 0, 0L, sizeof(TUPLE_DATA_INFO),
 pTupData);
 //* If data was retrieved successfully, sort it out
 if(wStatus1 == SUCCESS && wStatus2 == SUCCESS)
 {
 ndx1 = 2;
 ndx2 = 0;
 while(pTupData->TupleData[ndx1] != 0xFF && strCount < 4)
 {
 strCount++;
 strcpy(&buffer[ndx2], &pTupData->TupleData[ndx1]);
 len = strlen(&pTupData->TupleData[ndx1]) + 1;
 ndx2 += len;
 ndx1 += len;
 }
 }
 //* If an error occurred, Information Unavailable
 else
 {
 strCount++;
 strcpy(buffer, OldCard);
 }
 return(strCount);
}
//*** Retrieves information from the Device Information tuple. If the
//* card is memory, its size is also returned
//* ENTRY: socket - the socket to retrieve the information from
//* dwSize - a pointer to a DWORD which will receive the device's size
//* buffer - a pointer to a buffer which will receive the info
//* it should be at least 256 bytes which is the max tuple length
//* EXIT: buffer - holds the type string
//* RETURN: the device type
WORD GetDeviceType(WORD socket, DWORD *dwSize, char *buffer)
{
 WORD wStatus1, wStatus2, wType;
 TUPLE_INFO *pTup;
 TUPLE_DATA_INFO *pTupData;
 *dwSize = 0;
 //* We want the device tuple. Find it with GET_FIRST_TUPLE
 //* Then retrieve its data with GET_TUPLE_DATA
 pTup = (TUPLE_INFO*)heap;
 pTupData = (TUPLE_DATA_INFO*)heap;
 pTup->Socket = socket;
 pTup->Attributes = 0;
 pTup->DesiredTuple= CISTPL_DEVICE;
 pTup->Reserved = 0;
 wStatus1 = CardServices(GET_FIRST_TUPLE, 0, 0L, sizeof(TUPLE_INFO), pTup);
 wStatus2 = CardServices(GET_TUPLE_DATA, 0, 0L, sizeof(TUPLE_DATA_INFO),
 pTupData);
 //* If data was retrieved successfully, sort it out
 if(wStatus1 == SUCCESS && wStatus2 == SUCCESS)
 {
 wType = (WORD)((pTupData->TupleData[0] & 0xF0) >> 4);

 if(wType != DTYPE_NULL)
 {
 if(wType >= DTYPE_ROM && wType <= DTYPE_DRAM)
 {
 strcpy(buffer, strDevices[wType - 1]);
 strcat(buffer, " Memory");
 if(pTupData->TupleData[1] != 0xFF)
 {
 WORD wSizeCode, wUnits;
 wSizeCode = (WORD)(pTupData->TupleData[1] & 0x07);
 wUnits = (WORD)(((pTupData->TupleData[1] & 0xF8) >> 3) + 1);
 *dwSize = (DWORD)(wUnits * SizeCodes[wSizeCode]);
 }
 }
 else if(wType == DTYPE_FUNCSPEC)
 {
 strcpy(buffer, "FUNCTION SPECIFIC");
 }
 }
 else
 {
 strcpy(buffer, "Input/Output");
 }
 }
 else
 {
 strcpy(buffer, OldCard);
 wType = 0;
 }
 return(wType);
}
//*** Sets the callback handler to our callback function. If status
//* is !0, an error occurred.
//* ENTRY: none
//* EXIT: none
WORD SetCallbackHandler(void)
{
 WORD wStatus;
 REGISTER_CLIENT_INFO pRC;
 //* We are a memory-client device driver; we want only card-detect events.
 //* Pass back the card-services version we were told.
 pRC.Attributes = 0x0001;
 pRC.EventMask = 0x0080;
 pRC.Version = wPCMCIAVersion;
 wStatus = CardServices(REGISTER_CLIENT, (WORD)&ClientHandle,
 (DWORD)(void far *)Callback, sizeof(REGISTER_CLIENT_INFO), &pRC);
 if(wStatus)
 ClientHandle = 0;
 return(wStatus);
}
//*** Converts a DWORD byte value into an ASCII string containing the
//* value in normalized byte form
//* ENTRY: value - holds the byte value to be converted
//* EXIT:
//* RETURN: a pointer to the size string. This string will be overwritten
//* the next time this function is called
char *CreateMemoryString(DWORD value)
{
 static char line[40];

 if(value < 0x400L)
 sprintf(line, "%ld bytes", value);
 else if(value < 0x100000L)
 sprintf(line, "%ld KB", value / 0x400L);
 else
 sprintf(line, "%ld MB", value / 0x100000L);
 return(line);
}
//*** This function limits a string to MAX_VENDOR_STRING characters and
//* changes any non-printable characters to '.'
//* ENTRY: szString - the string to be filtered
//* EXIT: szString - the filtered string
void FilterString(char *szString)
{
 int ndx;
 for(ndx=0; ndx < MAX_VENDOR_STRING && *szString; ndx++, szString++)
 {
 if(!isprint(*szString))
 *szString = '.';
 }
 *szString = '\0';
}
//*** Set the cursor position using BIOS ***
void SetCursorPos(WORD x, WORD y)
{
 _asm
 {
 mov ax, 0200h
 mov bh, 00
 mov dl, BYTE PTR x
 mov dh, BYTE PTR y
 int 10h
 }
}
//*** Clear the screen using BIOS ***
void ClearScreen()
{
 _asm
 {
 mov ax, 0600h
 mov bx, 0700h
 xor cx, cx
 mov dx, (25-1)*100h+(80-1)
 int 10h ;Scroll up using BIOS
 }
}
//*** CARDINFO.H -- by Troy-Anthony Miles ***
//* GENERAL DEFINITIONS
typedef unsigned char BYTE;
typedef unsigned short int WORD;
typedef unsigned long DWORD;

#define LOBYTE(w) ((BYTE)(w))
#define HIBYTE(w) ((BYTE)(((WORD)(w) >> 8) & 0xFF))
#define LOWORD(l) ((WORD)(DWORD)(l))
#define HIWORD(l) ((WORD)((((DWORD)(l)) >> 16) & 0xFFFF))
#define FALSE 0
#define TRUE 1
#define CARD_INTERRUPT 0x1A

#define CARD_SUBFUNCTION 0xAF
#define SIGNATURE 'SC'

//* POSITIONS
#define D_LINES 7
#define D_Y 6
#define D_Y2 (D_Y + 4)
#define D_X 0

//* FUNCTION CODES
#define GET_CARDSERVICES_INFO 0x0B
#define REGISTER_CLIENT 0x10
#define DEREGISTER_CLIENT 0x02
#define GET_STATUS 0x0C
#define RESET_CARD 0x11
#define SET_EVENT_MASK 0x31
#define GET_EVENT_MASK 0x2E

#define REQUEST_IO 0x1F
#define RELEASE_IO 0x1B
#define REQUEST_IRQ 0x20
#define RELEASE_IRQ 0x1C
#define REQUEST_WINDOW 0x21
#define RELEASE_WINDOW 0x1D
#define MODIFY_WINDOW 0x17
#define MAP_MEM_PAGE 0x14
#define REQUEST_SOCKET_MASK 0x22
#define RELEASE_SOCKET_MASK 0x2F
#define REQUEST_CONFIGURATION 0x30
#define GET_CONFIGURATION_INFO 0x04
#define MODIFY_CONFIGURATION 0x27
#define RELEASE_CONFIGURATION 0x1E

#define OPEN_MEMORY 0x18
#define READ_MEMORY 0x19
#define WRITE_MEMORY 0x24
#define COPY_MEMORY 0x01
#define REGISTER_ERASE_QUEUE 0x0F
#define CHECK_ERASE_QUEUE 0x26
#define DEREGISTER_ERASE_QUEUE 0x25
#define CLOSE_MEMORY 0x00

#define GET_FIRST_TUPLE 0x07
#define GET_NEXT_TUPLE 0x0A
#define GET_TUPLE_DATA 0x0D
#define GET_FIRST_REGION 0x06
#define GET_NEXT_REGION 0x09
#define GET_FIRST_PARTITION 0x05
#define GET_NEXT_PARTITION 0x08

#define RETURN_SS_ENTRY 0x23
#define MAP_LOG_SOCKET 0x12
#define MAP_PHY_SOCKET 0x15
#define MAP_LOG_WINDOW 0x13
#define MAP_PHY_WINDOW 0x16
#define REGISTER_MTD 0x1A
#define REGISTER_TIMER 0x28
#define SET_REGION 0x29
#define VALIDATE_CIS 0x2B

#define REQUEST_EXCLUSIVE 0x2C
#define RELEASE_EXCLUSIVE 0x2D
#define GET_FIRST_CLIENT 0x0E
#define GET_NEXT_CLIENT 0x2A
#define GET_CLIENT_INFO 0x03
#define ADD_SOCKET_SERVICES 0x32
#define REPLACE_SOCKET_SERVICES 0x33
#define VENDOR_SPECIFIC 0x34
#define ADJUST_RESOURCE_INFO 0x35

//* FLAGS
#define WRITE_PROTECT 0x01
#define BATTERY_DEAD_FLAG 0x10
#define BATTERY_LOW_FLAG 0x20
#define CARD_DETECT_FLAG 0x80

//* RETURN CODES
#define SUCCESS 0x00
#define BAD_ADAPTER 0x01
#define BAD_ATTRIBUTE 0x02
#define BAD_BASE 0x03
#define BAD_EDC 0x04
#define BAD_IRQ 0x06
#define BAD_OFFSET 0x07
#define BAD_PAGE 0x08
#define READ_FAILURE 0x09
#define BAD_SIZE 0x0A
#define BAD_SOCKET 0x0B
#define BAD_TYPE 0x0D
#define BAD_VCC 0x0E
#define BAD_VPP 0x0F
#define BAD_WINDOW 0x11
#define WRITE_FAILURE 0x12
#define NO_CARD 0x14
#define UNSUPPORTED_FUNCTION 0x15
#define UNSUPPORTED_MODE 0x16
#define BAD_SPEED 0x17
#define BUSY 0x18
#define GENERAL_FAILURE 0x19
#define WRITE_PROTECTED 0x1A
#define BAD_ARGS_LENGTH 0x1B
#define BAD_ARGS 0x1C
#define CONFIGURATION_LOCKED 0x1D
#define IN_USE 0x1E
#define NO_MORE_ITEMS 0x1F
#define OUT_OF_RESOURCE 0x20
#define BAD_HANDLE 0x21

//* TUPLES
#define CISTPL_NULL 0x00
#define CISTPL_DEVICE 0x01
#define CISTPL_CHECKSUM 0x10
#define CISTPL_LONGLINK_A 0x11
#define CISTPL_LONGLINK_C 0x12
#define CISTPL_LINKTARGET 0x13
#define CISTPL_NO_LINK 0x14
#define CISTPL_VERS_1 0x15
#define CISTPL_ALTSTR 0x16
#define CISTPL_DEVICE_A 0x17

#define CISTPL_JEDEC_C 0x18
#define CISTPL_JEDEC_A 0x19
#define CISTPL_CONFIG 0x1A
#define CISTPL_CFTABLE_ENTRY 0x1B
#define CISTPL_DEVICE_OC 0x1C
#define CISTPL_DEVICE_OA 0x1D
#define CISTPL_VERS_2 0x40
#define CISTPL_FORMAT 0x41
#define CISTPL_GEOMETRY 0x42
#define CISTPL_BYTEORDER 0x43
#define CISTPL_DATE 0x44
#define CISTPL_BATTERY 0x45
#define CISTPL_ORG 0x46
#define CISTPL_END 0xFF

//* DEVICE TYPES
#define DTYPE_NULL 0x0
#define DTYPE_ROM 0x1
#define DTYPE_OTPROM 0x2
#define DTYPE_EPROM 0x3
#define DTYPE_EEPROM 0x4
#define DTYPE_FLASH 0x5
#define DTYPE_SRAM 0x6
#define DTYPE_DRAM 0x7
#define DTYPE_FUNCSPEC 0xD
#define DTYPE_EXTEND 0xE

//* EVENTS
#define PM_RESUME 0x0B
#define PM_SUSPEND 0x0C
#define BATTERY_DEAD 0x01
#define BATTERY_LOW 0x02
#define CARD_INSERTION 0x40
#define CARD_LOCK 0x03
#define CARD_READY 0x04
#define CARD_REMOVAL 0x05
#define CARD_RESET 0x11
#define CARD_UNLOCK 0x06
#define EJECTION_COMPLETE 0x07
#define EJECTION_REQUEST 0x08
#define ERASE_COMPLETE 0x81
#define EXCLUSIVE_COMPLETE 0x0D
#define EXCLUSIVE_REQUEST 0x0E
#define INSERTION_COMPLETE 0x09
#define INSERTION_REQUEST 0x0A
#define REGISTRATION_COMPLETE 0x82
#define RESET_COMPLETE 0x80
#define RESET_PHYSICAL 0x0F
#define RESET_REQUEST 0x10
#define MTD_REQUEST 0x12
#define CLIENT_INFO 0x14
#define TIMER_EXPIRED 0x15
#define SS_UPDATED 0x16

//* STRUCTURES
typedef struct {
 WORD InfoLen;
 WORD Signature;
 WORD Count;

 WORD Revision;
 WORD CSLevel;
 WORD VStrOff;
 WORD VStrLen;
 BYTE VendorString[80];
}CS_INFO;

typedef struct{
 WORD Socket;
 WORD CardState;
 WORD SocketState;
}STATUS_INFO;

typedef struct {
 WORD Socket;
 WORD Attributes;
 BYTE DesiredTuple;
 BYTE Reserved;
 WORD Flags;
 DWORD LinkOffset;
 DWORD CISOffset;
 BYTE TupleCode;
 BYTE TupleLink;
}TUPLE_INFO;

typedef struct {
 WORD Socket;
 WORD Attributes;
 BYTE DesiredTuple;
 BYTE TupleOffset;
 WORD Flags;
 DWORD LinkOffset;
 DWORD CISOffset;
 WORD TupleDataMax;
 WORD TupleDataLen;
 BYTE TupleData[];
}TUPLE_DATA_INFO;

typedef struct {
 WORD Attributes;
 WORD EventMask;
 BYTE ClientData[8];
 WORD Version;
}REGISTER_CLIENT_INFO;

//# END

End Listing














June, 1994
Optimizing MC68882 Code


Squeezing more performance out of pipeline architectures




Gary McGrath


Gary has a PhD in physics and is currently working on the B factory at the
Stanford Linear Accelerator Center. He can be reached at
mcgrath@slac.stanford.edu.


The MC68882 floating-point coprocessor (FPCP) adds 46 instructions to the
MC68020/030 32-bit microprocessor, substantially increasing the speed of
floating-point calculations. The FPCP implements the ANSI-IEEE 754-1985 binary
floating-point arithmetic standard and performs calculations in 80-bit
extended precision (64-bit mantissa, sign bit, and 15-bit signed exponent), as
well as six other formats: byte, word, long word, single, double, and packed
decimal. Along with implementing floating-point calculations in hardware, the
FPCP performs many of its calculations concurrently with the MPU for
additional speed benefits.
In short, the FPCP is a powerful addition to the MPU, not only because of
their parallel operation, but because of the FPCP's pipeline architecture. The
use of a pipeline allows certain instructions to execute either partially or
fully concurrently within the FPU itself. Thus, efficient programs not only
take advantage of the coprocessor, but follow a few basic rules that ensure
optimal use of the pipeline. This article examines how certain instruction
combinations are faster than others, and how to use that knowledge when
programming the FPU.


Programming Model


The MC68882 maintains the sequential-programming model of its predecessor,
the MC68881. Although instructions can execute either partially or fully
concurrently through the pipeline, a coprocessor interface register (CIR)
ensures that, if a certain result is required, that instruction will complete
before the next instruction begins. This makes the MC68882 completely downward
compatible with the MC68881, while still providing the opportunity for
enhanced performance.
The FPU has eight 80-bit floating-point user registers (FP0--FP7) that always
contain extended precision numbers, a control register (FPCR), a status
register (FPSR), and an instruction-address register (FPIAR). Instructions can
operate directly to memory or to user registers, and most speed optimizations
result from properly utilizing the user registers to minimize memory accesses;
this is accomplished by keeping the most-frequently accessed variables in
registers. In addition to providing user registers for frequently used
variables, the FPU provides many frequently used constants in on-chip ROM that
can be moved directly into a register, which is much faster than moving them
from memory.
The FPU utilizes a conversion unit (CU) to operate on and return data in seven
formats (B,W,L,S,D,X,P). Moreover, a floating-point formatted number can
appear as normalized, denormalized, zero, infinity, or not-a-number (NaN).
Because the calculations are performed internally in extended precision,
variables are kept in the extended-precision format to minimize the number of
format conversions, and this often results in speed gains.
Writing FPU code directly in assembly language permits many speed and
code-size optimizations. For instance, you can often make better use of
registers than possible with compiler-generated code. There are also many
peephole optimizations which come to light in assembly language; for example,
replacing fsin and fcos combinations with fsincos is difficult--if at all
possible--in a high-level language, yet trivial in assembly.


MC68882-Specific Optimizations


Beyond the usual advantages of assembly-language programming, the MC68882
architecture benefits from code written specifically to maximize use of the
pipeline. There are several general techniques for making sure the code is
properly written, including unrolling loops, eliminating register conflicts,
utilizing FPU instruction concurrency, and achieving concurrency with the MPU.
To quantify the differences in execution times for different instruction
combinations, you can measure times directly or calculate them from a table of
execution times. Table 1 is a partial list of execution times for the
instructions used in this article; the complete listing is available in
MC68881/MC68882 Floating-Point Coprocessor User's Manual (Prentice Hall,
1989). The times are expressed in clock cycles and separated into two
categories: head times and tail times.
The head times in Table 1 combine all the times for pipeline units that can
operate concurrently with the arithmetic-processing unit (APU); the tail gives
the time for the APU execution itself. If there are no conflicts, parallel
execution effectively hides the head time, up to a maximum of the preceding
tail. In other words, as long as the previous result is not immediately
needed, the pipeline units will begin working on the next instruction while
the APU is busy. This ability to perform parallel execution is the basis for
all pipeline optimizations. Four specific programming techniques can greatly
improve the pipeline efficiency for many routines.


Eliminate Register Conflicts


Eliminating register conflicts is the most important technique for improving
pipeline efficiency, as it is necessary for the other techniques to work. A
register conflict occurs when the destination register is the source register
of the next concurrent instruction. Because the MC68882 adheres to a
sequential programming model, register conflicts force the second instruction
to wait for the first to complete, thereby forgoing any savings from
concurrency. As evident from Table 1, register
conflicts can increase the time for an instruction like fadd from 35 to 56
cycles. For fully concurrent instructions like fmove, the difference is 100
percent, as the instruction was otherwise free.
Unfortunately, register conflicts are common because of the tendency to
program linearly, which results in instructions that depend on the immediate
results of their predecessors. For example, if two fadd instructions occur
sequentially, the second instruction typically depends on the result from the
first; see Example 1(a). Therefore, the second instruction must wait for the
first to complete. If the second instruction were to instead utilize registers
that are not in conflict, then the result would be as in Example 1(b).
Once the CU and bus-interface unit (BIU) finish with FP0, the CU hands off
the data to the APU, and the MPU can launch the next instruction. That next
instruction puts FP2 through the CU and BIU, after which it waits for the APU
to become free; thus, the effective head time is 0. All in all, the second
instruction pair is approximately 16 percent faster than the first.


Unroll Loops


A normal loop contains instructions that operate once during each indexed
iteration, and unrolling these loops is one of the best optimizations for
creating efficient MC68882 code. The primary reason why loop unrolling is
associated with more efficient code is that it allows one to better perform
the other optimizations, like eliminating register conflicts and maximizing
fmove concurrency. In addition, the unrolled version usually has fewer or
less-complex instructions because some of the index-addressing calculations
have been eliminated.
An additional benefit is that the unrolled version contains fewer branch
instructions, and branches stall the pipeline. When the MPU or FPU comes to a
branch instruction, it must flush the pipeline by waiting for the instructions
currently executing to finish. Thus, tight loops usually result in inefficient
use of the pipeline, because the pipeline is flushed for each index value, so
eliminating branching instructions like dbra can be especially beneficial.
Unrolling a loop is trivial if the loop index runs over a fixed range of
values, without variation. For example, if a routine were to total the
elements of a three-dimensional vector, it might take the form of a small
loop. The rolled loop might look like Example 2(a), and the unrolled like
Example 2(b). The unrolled version better utilizes the pipeline and eliminates
the branching instruction.


Maximize FPU Concurrency


Once pipeline concurrency is available, proper instruction placement can
further maximize the concurrency. As evident from Table 1, the fmove
instructions will execute fully concurrently with others if there are no
register conflicts. In Example 3(a), the fmove execution occurs sequentially.
However, eliminating the register conflict, as in Example 3(b), permits the
fmove to execute concurrently, making the second segment 29 percent faster.
To best use this concurrency, it is beneficial to rearrange code so that the
longest fmove instructions follow the longest arithmetic instructions; doing
so minimizes the chance of the APU completing before the fmove finishes.
Code written to follow the logical order of an operation often results in like
instructions being grouped. In Example 4(a), unrolling a routine that
normalizes a three-dimensional vector leaves such a grouping. Unfortunately,
the ordering of these instructions stalls the pipeline in a few places, as the
next instruction waits for the previous to finish. However, vector operations
naturally lend themselves to concurrency optimizations, as the grouped
instructions seldom rely upon one another. For instance, the execution time of
Example 4(a) can be reduced by alternating between mathematical operations and
the fmove instructions, while still minimizing register conflicts. Applying
this technique to Example 4(a) yields Example 4(b).

Switching around a few lines lets a few more fmove instructions be executed in
parallel with the APU. A timing profile indicates that this simple trick
yielded a routine that is approximately 11 percent faster than its
predecessor; at the same time, the readability of the source code is not
dramatically impaired.


Utilizing MPU Concurrency


Example 5 demonstrates how code is written so that like instructions are
clustered together. For instance, FPU instructions tend to appear sequentially
with other FPU instructions. Since the execution times for FPU instructions
tend to be much longer than those for the MPU, the MPU is usually
underutilized through a block of FPU instructions, unless those instructions
require complex memory-address calculations--the MPU performs address
calculations for the FPU. After the MPU launches an FPU instruction and
performs the necessary address calculations, it is idle and ready to execute
the next instruction. If that instruction is for the FPU, it must wait for the
FPU to request it; otherwise it may proceed and execute the next MPU
instruction.
If the FPU instruction is time consuming, as almost all of them are, there is
ample time for the MPU to complete at least one MPU instruction between FPU
instructions. Therefore, one can globally minimize a program's execution time
by maximizing FPU/MPU concurrency. To utilize this concurrency, you must
rearrange instructions so that the code alternates between MPU and FPU
instructions as much as possible. Moreover, an ideal optimization would pair
the longest MPU instruction just after the longest FPU instructions.
Although time matching is seldom feasible, the basic interleaving of
instructions will shield a large fraction of the MPU instruction times from
the overall execution time, because they will be performed in what was
previously idle time. As a simple example of this, you can insert MPU
instructions between two long FPU instructions; see Example 5(a). Execution
time does not change if two average-length MPU instructions are put in
between, as in Example 5(b), demonstrating that many MPU instructions can be
executed for free, if they occur in the right place. Although these situations
are somewhat rare, sizable gains are possible for the few occasions that
arise.


The Net Effect


Reductions in execution time are made possible by following a few simple rules
to best utilize the MC68882 pipeline, but the examples given thus far only
hint at some of the possibilities. Therefore, we'll examine the net effect of
the various techniques on a larger routine: a three-dimensional,
vector-rotating routine optimization.
Listing One (page 98) shows a simple vector-rotation routine written in C with
many coding tricks already applied. This routine is specific to
three-dimensional vectors, so the loops are completely unrolled. Additionally,
some strength reduction is achieved by substituting multiplications for pow()
calls (divisions are intentionally left as is). Listing Two (page 98) shows a
handcoded assembly-language version. Notice that the natural structure of the
routine has a fairly minimal number of register conflicts.
If you doubt the effectiveness of rewriting a few target routines in assembly
language, you should note that the execution time of this assembly-language
version is approximately 46 percent shorter than the THINK C compiled version
on a 16-MHz 68020/68882 combination. However, the focus here is on pipeline
optimizations, so the measurement uses the version in Listing Two as the
baseline.
After removing the comments and applying these techniques, the routine is more
efficient for the MC68882 pipeline; Listing Three (page 99) is the result.
Although theoretically possible, it is treacherous to calculate the difference
in execution times from the table of head and tail times. Instead, measuring
the execution times illustrates the savings. In this example, the optimized
version is approximately 14 percent faster than its assembly-language
predecessor, and roughly 54 percent faster than the compiled version. Because
this routine is the most frequently used in my Monte Carlo simulation, the
increase is substantial for the entire application.


Conclusion


The addition of an FPCP to the MC68020/030 greatly increases the floating-point
performance, but the MC68882 is especially advantageous over the MC68881
because of its pipeline architecture. Although assembly-language programming
can result in substantial execution speed gains and/or reductions in code
size, further optimizations to maximize pipeline efficiency can increase the
FPU performance even more, and the readability of the code deteriorates only
slightly because the optimizations are usually local.
The tactics I've described here do not require much effort to implement, yet
they offer an easy way to increase the performance of a routine. In addition
to better programming the MC68882, these skills and ideas are important, as
they pertain to many other pipeline architectures. Although these techniques
will not make up for inefficient algorithms, sometimes the best algorithms are
not enough. Next time your application needs to squeeze out all the
floating-point performance it can, following these simple rules can result in
faster code with relatively little added effort.
Table 1: Execution times (register to register) expressed in cycles.
 Instruction Head Tail Total

 fadd 17 35 56
 fsub 17 35 56
 fmul 17 55 76
 fdiv 17 87 108
 fsqrt 17 89 110
 fmove 21 0 21
 fmovecr 10 0 32
 fsin 17 373 394
 fcos 17 373 394
 fsincos 17 433 454


Example 1: Dealing with register conflicts. (a) If two fadd instructions occur
sequentially, the second instruction depends on the result of the first; (b)
if the second instruction utilizes registers that aren't in conflict, this is
the result.
(a)

fadd.x fp0, fp1 ; HEAD=17, TAIL=35
fadd.x fp1, fp2 ; HEAD=17, TAIL=35


(b)

fadd.x fp0, fp1 ; HEAD=17, TAIL=35
fadd.x fp2, fp3 ; (HEAD=17), TAIL=35


Example 2: (a) A small loop that totals the elements of a three-dimensional
vector; (b) the unrolled version of the small loop better utilizes the
pipeline.
(a)

 lea Array, a0
 move.l #3, d0
 fmovecr #0xf, fp0
@loop:
 fadd.x (a0), fp0
 lea 0xfff4(a0), a0
 dbra d0, @loop


(b)

 lea Array, a0
 fmovecr #0xf, fp0
 fadd.x (a0), fp0
 fadd.x 12(a0), fp0
 fadd.x 24(a0), fp0

Example 3: The fmove execution in (a) occurs sequentially; (b) eliminating the
register conflict permits the fmove to execute concurrently, making it 29
percent faster.
(a)


fadd.x fp0, fp1 ; HEAD=17, TAIL=35
fmove.x fp1, fp2 ; HEAD=21

(b)

fadd.x fp0, fp1 ; HEAD=17, TAIL=35
fmove.x fp2, fp3 ; (HEAD=21)




Example 4: (a) Unrolling a routine that normalizes a three-dimensional vector
leaves a grouping; (b) improving the execution time by alternating between
mathematical operations and the fmove instructions, while still minimizing
register conflicts.
(a)

 move.l a0, -(sp)
 movea.l Array, a0
 fmovecr #0xf, fp0
 fmove.x (a0), fp1
 fmove.x 12(a0), fp2
 fmove.x 24(a0), fp3
 fadd.x fp1, fp0
 fadd.x fp2, fp0
 fadd.x fp3, fp0
 fdiv.x fp0, fp1
 fdiv.x fp0, fp2
 fdiv.x fp0, fp3
 fmove.x fp3, 24(a0)
 fmove.x fp2, 12(a0)
 fmove.x fp1, (a0)
 move.l (sp)+, a0


(b)

 move.l a0, -(sp)
 movea.l Array, a0
 fmovecr #0xf, fp0
 fmove.x (a0), fp1
 fmove.x 12(a0), fp2
 fadd.x fp1, fp0
 fmove.x 24(a0), fp3
 fadd.x fp2, fp0
 fadd.x fp3, fp0
 fdiv.x fp0, fp1
 fdiv.x fp0, fp2
 fmove.x fp1, (a0)
 fdiv.x fp0, fp3
 fmove.x fp2, 12(a0)
 fmove.x fp3, 24(a0)
 move.l (sp)+, a0




Example 5: (a) Two long FPU instructions; (b) there is no difference in
execution time if two average-length MPU instructions are put in between them.
(a)

 fsin.x fp1, fp0
 fcos.x fp1, fp0

(b)

 fsin.x fp1, fp0
 add.w d0, d1
 add.w d0, d1
 fcos.x fp1, fp0

[LISTING ONE] (Text begins on page 58.)

#include<math.h>
#include<asm.h>

void RotVect( double *Cx, double *Cy, double *Cz, double theta, double phi )
{
double pd1, pd2, pd3, pd4, V, Cxx, Cyy, Czz, x, y, z;

 x = *Cx;
 y = *Cy;
 z = *Cz;

 pd1 = sin( theta );
 pd2 = cos( theta );
 pd4 = sin( phi );
 pd3 = cos( phi );

 V = sqrt( 1.0 - z*z );

 Cxx = pd1/V*(y*pd3 - x*z*pd4) + x*pd2;
 Cyy = -pd1/V*(x*pd3 + z*y*pd4) + y*pd2;
 Czz = pd1*V*pd4 + z*pd2;

 V = sqrt( Cxx*Cxx + Cyy*Cyy + Czz*Czz );
 *Cx = Cxx/V;
 *Cy = Cyy/V;
 *Cz = Czz/V;
return;
}

[LISTING TWO]

#include<asm.h>


void FRotVect( double *Cx, double *Cy, double *Cz, double theta, double phi )
{
 asm 68020, 68882{
 fmovem fp4-fp7, -(a7) ;Save the registers
 move.l a2, -(a7)

 movea.l Cz(a6), a2
 fmove.x (a2), fp7 ; Cz -> fp7

 fmove.x fp7, fp0
 fmul.x fp0, fp0
 fmovecr #0x32, fp5 ; 1.0 -> fp5
 fsub.x fp0, fp5
 fsqrt.x fp5 ; V -> fp5

 fsincos.x theta(a6), fp4:fp3 ; pd2, pd1
 fsincos.x phi(a6), fp1:fp2 ; pd4, pd3

 fmove.x fp3, fp0 ;\
 fmul.x fp5, fp0 ; pd1*V*pd4
 fmul.x fp2, fp0 ;/

 fdiv.x fp5, fp3 ; pd1/V -> fp3

 fmove.x fp7, fp6
 fmul.x fp4, fp6
 fadd.x fp6, fp0 ; Czz -> fp0

 movea.l Cx(a6), a0 ;
 fmove.x (a0), fp5 ; Cx -> fp5
 movea.l Cy(a6), a1 ;
 fmove.x (a1), fp6 ; Cy -> fp6

 fmove.x fp0, (a1) ; save Czz to memory

 fmove.x fp4, fp0
 fmul.x fp5, fp0 ; Cx*pd2 -> fp0
 fmove.x fp0, (a0) ; save to memory
 fmul.x fp6, fp4 ; Cy*pd2 -> fp4

 fmove.x fp2, fp0
 fmul.x fp7, fp0
 fmul.x fp5, fp0 ; Cx*Cz*pd4 -> fp0
 fmul.x fp6, fp2
 fmul.x fp7, fp2 ; Cz*Cy*pd4 -> fp2
 fmul.x fp1, fp6 ; Cy*pd3 -> fp6
 fsub.x fp0, fp6 ; ( ... ) -> fp6

 fmul.x fp3, fp6
 fadd.x (a0), fp6 ; Cxx -> fp6

 fmul.x fp1, fp5
 fadd.x fp5, fp2
 fmul.x fp2, fp3
 fsub.x fp3, fp4 ; Cyy -> fp4

 fmove.x fp6, fp2
 fmul.x fp2, fp2 ; Cxx*Cxx -> fp2
 fmove.x fp4, fp3

 fmul.x fp3, fp3 ; Cyy*Cyy -> fp3

 fmove.x (a1), fp5 ; get Czz from memory
 fmove.x fp5, fp0
 fmul.x fp0, fp0 ; Czz*Czz -> fp0

 fadd.x fp3, fp0
 fadd.x fp2, fp0
 fsqrt.x fp0
 fdiv.x fp0, fp6 ;\
 fdiv.x fp0, fp4 ; Cx, Cy, Cz
 fdiv.x fp0, fp5 ;/
 fmove.x fp6, (a0) ;\
 fmove.x fp4, (a1) ; and move to memory
 fmove.x fp5, (a2) ;/

 move.l (a7)+, a2
 fmovem (a7)+, fp4-fp7
 }
}

[LISTING THREE]

#include<asm.h>

void FRotVect( double *Cx, double *Cy, double *Cz, double theta, double phi )
{
 asm 68020, 68882{
 fmovem fp4-fp7, -(a7)
 fsincos.x theta(a6), fp4:fp3
 move.l a2, -(a7)
 movea.l Cz(a6), a2
 fmove.x (a2), fp7
 fmovecr #0x32, fp5
 fmove.x fp7, fp0
 movea.l Cx(a6), a0
 fmul.x fp0, fp0
 fsincos.x phi(a6), fp1:fp2
 fsub.x fp0, fp5
 fmove.x fp3, fp0
 fsqrt.x fp5
 movea.l Cy(a6), a1
 fmul.x fp5, fp0
 fmul.x fp2, fp0
 fdiv.x fp5, fp3
 fmove.x fp7, fp6
 fmul.x fp4, fp6
 fmove.x (a0), fp5
 fadd.x fp6, fp0
 fmove.x (a1), fp6
 fmove.x fp0, (a1)
 fmove.x fp4, fp0
 fmul.x fp5, fp0
 fmul.x fp6, fp4
 fmove.x fp0, (a0)
 fmove.x fp2, fp0
 fmul.x fp7, fp0
 fmul.x fp6, fp2
 fmul.x fp5, fp0

 fmul.x fp7, fp2
 fmul.x fp1, fp6
 fsub.x fp0, fp6
 fmul.x fp3, fp6
 fadd.x (a0), fp6
 fmul.x fp1, fp5
 fadd.x fp2, fp5
 fmul.x fp3, fp5
 fmove.x (a1), fp2
 fsub.x fp5, fp4
 fmove.x fp6, fp2
 fmove.x fp4, fp3
 fmul.x fp2, fp2
 fmul.x fp3, fp3
 fmove.x fp5, fp0
 fadd.x fp2, fp3
 fmul.x fp0, fp0
 fadd.x fp3, fp0
 fsqrt.x fp0
 fdiv.x fp0, fp5
 fdiv.x fp0, fp6
 fmove.x fp5, (a2)
 fdiv.x fp0, fp4
 move.l (a7)+, a2
 fmove.x fp6, (a0)
 fmove.x fp4, (a1)
 fmovem (a7)+, fp4-fp7
 }
}
End Listings
































June, 1994
Extending imake


Taking a tool beyond the X Window System




Kamran Husain


Kamran is an independent consultant specializing in designing X/Motif and
real-time systems for geophysical and telecommunications applications. He can
be reached at 713-265-1635.


Imake is a utility that works with make so that code can automatically be
configured, compiled, and installed on different UNIX platforms. It is
currently used to configure systems such as the X Window System and Kerberos
authentication. In his book, Software Portability with imake (O'Reilly &
Associates, 1993), Paul DuBois points out that much of X's success can be
credited to its portability, and this portability is in large part due to
imake. While primarily an X tool, imake is useful for any project that
involves porting to multiple UNIX systems.
Imake generates makefiles from an Imakefile and its templates--a set of C
preprocessor macros. Makefiles are generally not portable across different
machines. Separating machine dependencies from items being built, however,
renders Imakefiles platform independent. imake uses Imakefiles to generate a
makefile for each platform for a given application; see Figure 1. It is
invaluable for making a release available on a wide variety of machines. The X
Window System imake (distributed by MIT with the standard X Window System
release 3 and greater) generates platform-specific makefiles by using
descriptions defined in Imakefiles. In this article, I'll discuss imake, its
template and rule files, and Imakefiles. And since imake isn't restricted to
X, I'll also show you how to extend Imakefiles beyond the X Window System to
AIX, SunOs, Linux, and the like. I'll refer to the site-specific config files
as siteConfFile (and the directory they're stored in as siteConfDir) and
project-specific configuration files as projConfFile (and the directory
they're stored in as projConfDir).


How imake Works


Imake uses the cpp preprocessor's conditionals to determine how to create a
makefile: It requires you to define an Imakefile and a template file with all
default rules, definitions, compiler options, and special make rules.
Just as make relies on makefiles, imake relies on Imakefiles. Imakefiles
contain definitions of make variables and one or more invocations of macro
functions to build the desired parts of a makefile; see Example 1.
Imake looks at template files to determine how to create a makefile from your
instructions. This makefile will have the clean, depend, install, and all
targets defined for you. imake looks for its template file, Imake.tmpl,
through the path specified in the command-line option -Ipathname.
The Imake.tmpl file contains definitions common to all makefiles. The general
structure of this file is in Example 2. The preprocessor picks up definitions
in the files defined. If these definitions are not overridden in subsequent
files, they will be used to create makefiles. Think of the Project.tmpl file
as a way to override definitions in the site.def file which, in turn, can
override declarations in the platform.cf file.
Using the Imake.rules file, imake appends a set of common make rules at the
bottom of a makefile. These rules are generally for the all, depend, clean,
and install directives.
The Imake.tmpl file should be modified with great care since it contains
platform-specific definitions. These files can usually be found in the
./lib/X11/config directory where you've installed the X Window System. If you
don't have this directory, use a find command to find the files.


Imakefiles


Example 3 shows the definition of a platform in the Imake.tmpl file. If you're
running on a Sun, for instance, you should have a #define sun declared before
these constructs. There are about ten such constructs for machines in the
template file (DEC, Apollo, Cray, and the like) and one for a generic file, in
case a machine isn't defined. For example, if #define sun is declared at the
top, then the sun.cf file will be included in Imake.tmpl. If nothing is
declared, the generic template file will be used.
The site.def file defines a preprocessor variable called ProjectRoot, which is
the root directory for all your projects; see Example 4. The Project.tmpl file
has multiple levels of nested if/ifndef/else/endif pairs that can be
confusing. Consequently, it's a good idea to print out the file and examine
the nest levels. Since in most cases, you'll have to modify variables related
only to your project, you probably should take the Project.tmpl provided here
and modify it to fit your needs.


Extending Imakefiles for Multiple Projects


The file structures I've defined so far are for general imake use. However,
you can modify these file structures. Example 5, for instance, shows how the
Imake.tmpl file can be modified to work with multiple projects.
Once Imake.tmpl has been modified, you can create four empty files
(platform.pcf, site.pdef, Project.ptmpl, Imake.prules) in the same directory
as Imake.tmpl. Next you can create a pmkmf from the xmkmf shell script, as in
Example 6. The inclusion of the search path for the template files forces
imake to look at the current project directory before the standard directory.
(I'm using a flag called UseInstalled in the xmkmf file. If this flag is not
defined in the command line, imake will attempt to make itself.)
In each of the project directories, you should have the option of overriding
anything in the standard directories by creating your own platform.pcf,
site.pdef, Project.ptmpl, and Imake.prules files. If you choose to work with
the standard files, then you don't have to do anything for that project. The
empty files in the standard directory will be picked up, satisfying the
preprocessor requirements.
If you do have to override the declarations in the standard files, then you
can create your own files in your project directory with those changes. The
empty file will not be picked up in that event since you search your local
directory (PROJECTDIR) first.
In Example 6, I defined two empty files, Motif.rules and Motif.tmpl, for my
particular installation in the project-files area. You may have to do the same
if your Imake.tmpl file requires some files unnecessary to your project. This
makefile is available electronically (see "Availability," page 3). Of special
interest are the targets depend, all, and clean that have been defined in the
makefile by default.
Recall that almost all the declarations in the standard files are of the type:

#ifndef variable
#define variable
#endif
This forces the need for the template files for projects to be in front of the
standard files so that these declarations can be seen by the preprocessor
first. However, you must be careful to check that the variable is not defined
elsewhere. If a similar variable is defined earlier in the execution cycle,
then this particular declaration will not be used; if a similar variable is
defined later, then this declaration will take precedence. Be sure to grep all
the template files before declaring a new variable.
However, if you want to explicitly override a variable (even if it was not
previously defined), then use the following construct if you do not want
warning messages from the preprocessor:
#ifdef variable
#undef variable
#endif
#define variable asSomethingDifferent
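As a concrete sketch (CcCmd is hypothetical here; substitute whatever
variable you need to force):

```c
/* Explicitly override the variable, defined earlier or not. The
 * #ifdef/#undef pair silences any preprocessor redefinition warning. */
#ifdef CcCmd
#undef CcCmd
#endif
#define CcCmd gcc
```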



Working with Multiple Projects


Assume that you have three projects--alpha, beta, and theta--under your
directory. In each of these directories you would include an Imakefile similar
to Example 1; each Imakefile contains the rule necessary to build that
directory's project. To invoke imake, you would then add a script file,
similar to Example 6, somewhere in your path.
You then have two options: You can either browse through the Imake.rules to
find the rule that best fits your needs or write your own. In most cases,
you'll find a rule close to what you need. Obviously, any new global rules you
create will go in the standard Imake.rules file; if they're specific to this
project, they'll go in the Imake.prules file.
Imake rules have the format in Example 7(a), which is illustrated by the rule
in Example 7(b). When called, MakeMyProgram(OurProject,BigFile.o Onefile.o
OurProject.o,-lm) expands as in Example 7(c). There are two preconditions, one
each for the C-preprocessor and imake parsers: the entire macro must be one
continuous (logical) line, and each line of the definition must be terminated
by the @@\ symbol. If you forget one or the other, you'll see some really
unusual rule expansions. imake strips the @@\ symbols away before handing the
file off to the C preprocessor.
In almost all cases, you want to consider macros offered by Imake.rules before
writing your own. Table 1 lists some of the macros in the Imake.rules file.
Table 2 lists some of the predefined variables that you can use in your
Imakefiles and macros. You can always look in the generated makefile for the
complete list of predefined symbols.
For my projects, I typically use:
- SimpleProgramTarget, which lets you specify only one simple target with just
one source and one object file.
- SingleProgramTarget, which lets you specify a target composed of the objects
defined in the list OBJS (see Example 1).
- ComplexProgramTarget, for when you have more than one executable or new
libraries to install on your system (use ComplexProgramTarget_1,
ComplexProgramTarget_2, and so on).
Actually, SingleProgramTarget is sometimes considered an obsolete version of
NormalProgramTarget, which allows you to have dependent libraries. I still
find it simpler to use, and in most cases, I don't have dependent libraries.
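For example, a sketch of an Imakefile that builds two programs from one
directory might look like this (program and file names are hypothetical; the
numbered macros expect correspondingly numbered SRCSn/OBJSn variables and a
PROGRAMS list, so check your own Imake.rules for the exact conventions):

```
SRCS1 = first.c
OBJS1 = first.o
SRCS2 = second.c
OBJS2 = second.o
PROGRAMS = first second

ComplexProgramTarget_1(first,$(LOCAL_LIBRARIES),-lm)
ComplexProgramTarget_2(second,$(LOCAL_LIBRARIES),-lm)
```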
If an existing macro doesn't meet your particular rule requirements, you can
modify a similar one that's defined in Imake.rules.
The Imake.rules macro MakeSubdirs($(SUBDIRS)) allows you to descend into a
list, issuing $(MAKE) in subdirectories listed in $(SUBDIRS). Since I didn't
find the rule to fit my exact needs, I wrote my own macro (see Example 8),
which lets you issue a command that can be executed in each of the directories
listed in dir, with flags passed into cmd. (In your particular case, the cmd
would be your shell script to invoke imake.)
This lets you invoke your own command set using imake on various sets of
directories. The shell script is extensible for adding man pages, copying
binaries to a passed location, and so on. If you generalize this rule, it can
still make use of all of the rules in Imake.rules. If you want to use other
predefined rules within this rule, then put your rule either at the end of the
Imake.rules file, or in another file (say, Imake.postRules) that's searched
after the Imake.rules file. Then include the reference to it, as in Example 2.
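With the three-project layout described earlier, a top-level Imakefile could
then invoke the macro from Example 8 along these lines (pmkmf is the script
from Example 6; the first argument is an arbitrary name, and NullParameter is
the usual placeholder for an empty argument):

```
SUBDIRS = alpha beta theta

FireUpSubDirs(Makefiles,$(SUBDIRS),pmkmf,NullParameter)
```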


Testing Macros and/or Rules


Naturally, you want to test macros without clobbering existing makefiles. To
avoid overwriting these files, use imake's -s Filename option, which writes
the generated makefile to the file you name instead of to Makefile
(specifying - as the filename sends the output to standard output).
Therefore, if you invoke imake with -s and a temporary filename (say,
myMakefile), you can compare myMakefile and the old makefile using diff to
see what changes were incurred by including your new macro. The -v option on
the imake command line will tell you what's being passed to the C
preprocessor.
Remember that macros are difficult to debug. When in doubt, put parentheses
around any variable that might be expanded, and make sure the parentheses are
balanced. If you see unusual expansions, check that you have placed the \ and
@@\ terminators correctly at the ends of all continued lines. When in doubt,
place echo statements in the rule to show what's going on during expansion.
It's important to remember that in makefiles the $ is special: make expands
only the single character (or parenthesized name) that follows it. So $MAKE is
expanded as $M followed by the literal text AKE; be sure to write $(MAKE)
instead.
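A toy makefile fragment (not from any of the examples here) makes the pitfall
visible:

```make
# With no variable M defined, make expands $M to nothing, so the
# first echo prints just the leftover literal text "AKE". The second
# echo prints the value of the MAKE variable, as intended.
demo:
	@echo $MAKE
	@echo $(MAKE)
```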
For example, consider this construct expanded to a rule in a makefile:

a.o b.o : $$@.bkp
	cp $@.bkp $@

This is used to derive a.o from a.o.bkp and b.o from b.o.bkp. make always
reads a dependency line twice: once when it's initially reading the makefile,
and again when it's generating the dependency list for the target. In each
pass, it performs a macro expansion. The $$@ is thus expanded to $@ on the
first pass, and $@ is expanded to the actual target name on the second.
Another catch when using special make macros in imake rules is that, unlike
dependency lists, the target-name part is scanned only once per invocation of
make. So a $$(NAME) in the target area will not expand beyond $(NAME) for a
target name and will yield unexpected results.
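A toy fragment shows the difference (again, not from the article's files):

```make
NAME = prog

# In the prerequisite position, $$@ survives the first scan and is
# re-expanded per target. In the target position, $$(NAME) is scanned
# only once, leaving a target literally named "$(NAME)" rather than
# "prog".
$$(NAME):
	@echo this target is not named prog
```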
 Figure 1: Building programs with imake and make.


Example 1: Definitions of make variables.
# This is a simple Imakefile for a
# single program target myfile
SRCS=myfile.c another.c
OBJS=$(SRCS:.c=.o)
MYLIBS=-lm

SingleProgramTarget(myfile,$(OBJS),
$(MYLIBS),NullParameter)


Example 2: General structure of Imake.tmpl file; (a) system descriptions; (b)
general rules.
(a) #include <platform.cf>
    #include <site.def>

(b) #include <Project.tmpl>
    #include <Imake.rules>


Example 3: Definition of a platform in the Imake.tmpl file.
#ifdef sun

#define MacroIncludeFile <sun.cf>
#define MacroFile sun.cf
#undef sun
#define SunArchitecture
#endif /* sun */

#ifdef hpux

#define MacroIncludeFile <hp.cf>
#define MacroFile hp.cf
#undef hpux
#define HPArchitecture
#endif /* hpux */

Example 4: The site.def file defines a preprocessor variable called
ProjectRoot, the root directory for all your projects.
/* This is the site.def file for my projects */

#define ProjectRoot /home/kamran/proj
#define BinDir Concat(ProjectRoot,/bin)


Example 5: Modifying the Imake.tmpl file to work with multiple projects; (a)
system descriptions; (b) general rules.
(a) #include <platform.pcf>
    #include <platform.cf>
    #include <site.pdef>
    #include <site.def>


(b) #include <Project.ptmpl>
    #include <Project.tmpl>
    #include <Imake.prules>
    #include <Imake.rules>

Example 6: Creating a pmkmf from the xmkmf shell script.
#!/usr/bin/sh
CONFIGSTDDIR=/usr/local/X4M1.1/usr/lib/X11/config
PROJECTDIR=.
imake -DUseInstalled -I$PROJECTDIR -I$CONFIGSTDDIR $MAKEDEFINES

Example 7: (a) imake rule format; (b) typical imake rule; (c) expanding the
imake rule.
(a) #define RuleName(arg1,arg2,..,argN) @@\
        definitions @@\
        definitions @@\
        definitions

(b) #define MakeMyProgram(program,objs,libs) @@\
program: objs @@\
        cc -o program objs libs

(c) OurProject: BigFile.o Onefile.o OurProject.o
        cc -o OurProject BigFile.o Onefile.o OurProject.o -lm


Table 1: Some sample macros in the Imake.rules file.
 Macro Description

 NormalProgramTarget program,objects,deplibs,locallibs,syslibs
 SimpleProgramTarget program
 ComplexProgramTarget program
 ComplexProgramTarget_1 program,locallib,syslib
 ComplexProgramTarget_2 program,locallib,syslib
 ComplexProgramTarget_3 program,locallib,syslib
 ServerTarget server,subdirs,objects,libs,syslibs
 InstallLibrary libname,dest
 InstallSharedLibrary libname,rev,dest
 InstallLibraryAlias libname,alias,dest
 InstallLintLibrary libname,dest


Table 2: Some Imakefile variables.
 Variable Description

 CC C compiler invocation
 SRCS C source files
 OBJS Object files
 DEFINES Application-specific preprocessor symbols
 INCLUDES Application-specific header files
 SYS_LIBRARIES X11 libraries
 DEPLIBS Libraries used for dependencies

 CDEBUGFLAGS Compiler options, such as -g, -O, and so on
 LOCAL_LDFLAGS Linker options


Example 8: Macro which lets you issue a command that can be executed in each
of the directories listed in dir, with flags passed into cmd.
/* An example rule for the top level directory of a multi-project
 * directory. This rule should be placed in Imake.rules or
 * Imake.prules.
 * FireUpSubDirs - For each directory in dirs, do cmd with flags.
 * Look at a similar example in Imake.rules called MakeSubdirs
 * for an alternative.
 */

#ifndef FireUpSubDirs
#define FireUpSubDirs(name,dirs,cmd,flags) @@\
 for i in dirs ;\ @@\
 do \ @@\
 (cd ./$$i ; echo "In $(CURRENT_DIR)/$$i..."; \ @@\
 cmd flags ); \ @@\
 done
#endif




June, 1994
Examining Symantec C++


Updating the PT periodic-table program




Michael Yam


Michael is an independent consultant and has served New York's financial
district since 1984. He can be reached on CompuServe at 76367,3040.


Symantec's C++ Professional Compiler is an eclectic collection of tools that
includes the Symantec C++ compiler (formerly Zortech C++), SLR Systems'
Optlink linker, the Multiscope debugger, visual tools from Blue Sky and the
Whitewater Group, and version 2.0 of the Microsoft Foundation Class (MFC)
library. Furthermore, Symantec provides access to (but does not bundle)
Intersolv's PVCS toolkit for team development and version control.
Of course, simply collecting a set of powerful programming tools such as these
is one thing; integrating them into a cohesive working environment is another.
In particular, I was curious as to how the compiler handles the subtleties of
MFC, since I've done quite a bit of MFC 2.0 development. In "Examining MFC
2.0" (DDJ, June 1993), for instance, I created a Windows application, called
PT, which displays a periodic table using a modeless dialog box as its main
window. PT allows the user to point and click on atomic elements in the
periodic table, then displays edit fields containing the name, symbol, atomic
number, and atomic weight of that element. In this article, I'll revisit PT
and examine some of the changes required to get it running under Symantec C++
(SC++) 6.x. I used Symantec C++ 6.0 in this project; Symantec has since
issued version 6.1, a maintenance release available free of charge to
registered 6.0 users. Where it matters for porting PT, I've noted the
differences between versions.


The SC++ Environment


If you plan a complete installation of SC++, be prepared to give up about 53
Mbytes of disk space. For that, you'll get a complete environment that
supports DOS and Windows development. I found that a workable configuration
which supported Windows development, the large memory model, MFC libraries,
help files, and sample programs, required 38 Mbytes. If you use a CD-ROM
drive, SC++ can be configured to occupy only 12 Mbytes of your hard disk. To
run SC++, you'll need at least a 386 system and 4 Mbytes of RAM. For practical
purposes, however, you really should have a 33-MHz/486 with 8 Mbytes of RAM,
although I found that even that was slow when compiling with full debugging
information. (Full debugging information is necessary if you want to use
Multiscope to browse class libraries, such as MFC, and to debug DLLs.)
I also recommend a Super VGA display. The integrated development and debugging
environment (IDDE) can get cluttered, and the extra surface area can save you
from getting lost. To minimize this clutter, SC++ implements five virtual
screens to separate editing, source-level debugging, low-level debugging,
compiling, and output viewing (Figure 1). The IDDE is not a classic MDI
application--the windows "float," much like those in Microsoft's Visual Basic.
It's a matter of personal taste, but after working in the IDDE for a couple of
days, I discovered I liked floating windows.
The IDDE, however, is not without problems. It lacks, for instance,
context-sensitive help. If you highlight strcpy in the editor and press F1,
both the Microsoft and Borland environments will bring up a description of
that topic. With Symantec's IDDE, however, you must open the proper help file
and request a search. The editor could also be improved. For example, while it
allows Shift+Del to cut text, it requires Ctrl+V to insert text. Finally, when
restarting SC++, the IDDE intelligently reloads your workspace but neglects to
reload your project.


Drawing Hydrogen


In dusting off PT, I couldn't resist the temptation to enhance the program.
With the help of the MFC device context class (CDC), this version of PT can
display the atomic structure of hydrogen; see Figure 2. The model is simple,
though it's possibly incorrect in the context of quantum mechanics. (A more
accurate rendering would describe hydrogen with a nucleus surrounded by an
electron cloud, or electron-density distribution corresponding to a wave
function.) The class CATOM (Listing One, page 100) stores drawing methods and
structural information (the number of electrons and the number of shells). The
instantiation and drawing of hydrogen takes place in a modeless dialog box.
Details about the dialog box are encapsulated in the CATOMDialog class, and
the object is created when the user clicks on the "Show Atom" bar. When the
CATOMDialog object receives a WM_PAINT message, control goes to
CATOMDialog::OnPaint(); see Listing Two, page 100.
With SDK techniques, you would need to place your painting code between
BeginPaint() and EndPaint() calls. With MFC, however, you can use the CPaintDC
class, as shown in Figure 3(a). This isn't much better because you still wind
up sandwiching your code. A better solution would be to create the CPaintDC
object on the stack; see Figure 3(b). When your painting routine is complete,
pdc is automatically deleted and the device context freed.
With the device context established for the dialog box, the CATOMDialog object
has the CATOM object draw itself. To define the drawing area, I used a group
box which is a CWnd object in MFC. Doing so required that I obtain another
device context. MFC provided a CClientDC class for just such a purpose, and I
declared the object on the stack, like CPaintDC, mentioned earlier. Device
contexts are a precious resource (Windows 3.1 permits five), but I used an
extra one because it was easier to draw in an area exclusively reserved for an
atom. It also generalized the drawing routine to write to any CWnd object.
The remainder of the painting code uses ellipses to draw the hydrogen atom.
Note that Ellipse() produces a solid figure; the interior is filled with the
current brush. To prevent the electron shell from partially eclipsing the
electron and totally covering the nucleus, I drew from the outside-in, or
back-to-front: first the electron shell, then the electron, and finally, the
nucleus. The nucleus and electron appear solid because their ellipses are
filled with a BLACK_BRUSH. To make the electron shell appear transparent, I
employed a HOLLOW_BRUSH to fill the ellipse using the background color.


Building PT


Compiling was straightforward, although I had to replace the
Microsoft-specific _stricmp() with the standard stricmp(). Also, in one case,
I called MessageBox() only to have the compiler tell me it "cannot implicitly
convert from: char _near * to: unsigned." This meant that one of my four
arguments was of an incorrect type. I double checked them and they appeared
correct. Puzzled, I compiled the identical source using Microsoft Visual C++
and received this message: "'MessageBox' : function does not take four
parameters." Microsoft's message was clearer because it reminded me that I was
using MFC's MessageBox(), which accepts three arguments, not the SDK's
MessageBox(), which takes four.
Upon linking, I was presented with a list of unresolved externals because the
linker could not automatically locate the MFC libraries. I had to edit the
project file by adding two libraries: LIBW.LIB and LAFXCW.LIB. I also
discovered that to work with MFC, I needed the -k flag to keep segments in
.DEF order.
Curiosity made me peek at the makefile SC++ produced. I was pleasantly
surprised to find that the makefile reproduced in Listing Three (page 100) was
comprehensible, unlike those of Microsoft and Borland. In these days of
integrated environments and project files, readable makefiles may not seem
very important. But, they do offer a small degree of flexibility, especially
if you deal with cross-platform development.
Moving PT's resources over to Symantec's Resource Toolkit required only a
modicum of effort. The Resource Toolkit did not permit me to load PT.RC
directly; it wanted the compiled resource, PT.RES. I had to load PT.RC from
the IDDE, where it gave me a choice of editing the resource file visually or
as text. Selecting the visual approach compiled PT.RC, producing PT.RES, which
was then loaded into the Resource Toolkit. After building PT, running it
corrupted the USER kernel.


Postmortem


SC++ comes with MED, an execution monitor reminiscent of, yet more powerful
than, Microsoft's Dr. Watson or Borland's WinSpector. MED will trap run-time
errors as well as general-protection faults. Also, in the event of an infinite
loop or a hung program, pressing Ctrl+Alt+SysRq will force MED to dump to a
file. To access this additional debugging power, you need to compile your
programs with a MED header file and link with a MED library. I could not test
this feature with 6.0 because the compiler could not locate MEDW.H. It turns
out that this file was missing in 6.0; it is, however, included in 6.1.
A related debugging tool, the Crash Analyzer, reads the postmortem dump
created by MED and helps you determine the point of failure at the source
level. Think of it as an interactive version of Borland's WinSpector Assistant
(DFA.EXE). Again, I couldn't test this tool because of the missing header
file--unfortunate because the Crash Analyzer would have been useful in
tracking PT's crash problem.
After compiling and linking with debugging options, PT ran successfully. Since
enabling debugging had disabled all code optimizations, I hypothesized that
faulty compiler optimizations were making PT crash. Sure enough, when I
recompiled with both debug information and code optimizations turned off, PT
again ran successfully.
SC++ provides 11 optimization flags. To determine which option was causing the
problem, I built and tested versions of PT with only one flag enabled at a
time. As it turned out, any of the options will cause PT to die. (I did not
test this under 6.1.) I should mention that an MFC sample program, CHKBOOK,
compiled and ran successfully with all optimizations enabled. Perhaps PT's
unorthodox approach of using a dialog box as the main window was giving the
SC++ optimizer problems. (VC++, however, didn't have problems building an
optimized PT.)


Extending PT


When I first wrote PT, I took the lazy approach and kept the element names,
symbols, atomic numbers, and atomic weights in an in-memory structure.
This worked well because the information was static. Since this updated
version of PT only draws hydrogen, I have left the data in memory. If you plan
to extend PT to incorporate atomic drawings for all the elements, you should
consider maintaining the information in a database. The constructor to CATOM
accepts the name of the element, which can be used as a lookup key. If you
overload the constructor to accept the atomic number or atomic weight, you can
also use them as lookup keys. You can't, however, overload the constructor to
accept the atomic symbol because both symbol and element names are character
strings.
Hydrogen is instantiated from the CATOM class because hydrogen is a kind of
atom. When creating instances of other atoms, resist the temptation to derive
atoms from other atoms. For example, you wouldn't want to derive helium, which
has two electrons, from hydrogen, which has one. Helium is not a kind of
hydrogen, and should be instantiated from CATOM. Similarly, avoid deriving
"down" the periodic table: Don't derive silicon from carbon, for example. This
is tempting because carbon and silicon are in the same family of elements
(valence of four) but differ in behavior--carbon is the stuff of organic life,
silicon the stuff of computer life. Yet, can silicon be considered a kind of
carbon? If a particular derivation sparks a philosophical debate, it must be
an ambiguous object, and thus, can only serve your program in an ambiguous
fashion. There is little to discuss when stating carbon or silicon is a kind
of atom.



Conclusion


Symantec C++ is a natural upgrade path for Zortech users. The Zortech compiler
has traditionally appealed to an elite group, gaining special capabilities
usually before Microsoft and Borland, such as 16- and 32-bit versions,
DOS-extender support, cross-platform capabilities, and native C++. Under
Symantec, Zortech users will have access to a useful development environment,
debugger and linker, class library, and visual-programming tools. SC++ users
upgrading to 6.1 will also benefit from new features such as support for
debugging templates, syntax-directed color highlighting, an improved project
manager, and a 32-bit version of MFC 2.0 (CD-ROM version only).
However, the advantages for Microsoft and Borland users are less clear. SC++
is no longer the only package sporting a 32-bit compiler for Windows. With
Microsoft's Visual C++ 32-bit edition and Borland C++ 4.0, there is less of a
compelling reason to make the switch. Additionally, visual tools, GUI class
libraries and Windows-hosted development environments have become standard
fare. Symantec does offer an advantage to developers by including the
Multiscope debugger and Optlink linker, which are superior tools. Microsoft
and Borland developers, however, have access to these same tools as
third-party add-ons.
 Figure 1: SC++ implements five virtual screens to separate editing,
source-level debugging, low-level debugging, compiling, and output viewing.
 Figure 2: Updated version of PT.
Figure 3: (a) MFC's CPaintDC method; (b) creating the CPaintDC object on the
stack.
(a) CPaintDC *pdc = new CPaintDC(this);
    [painting routine here]
    delete pdc;

(b) CPaintDC pdc(this);
    [painting routine here]


[LISTING ONE] (Text begins on page 80.)

//----- PTATOM.H - Declares class interface for Periodic Table ---

#ifndef __PTATOM_H__
#define __PTATOM_H__

#include "ptdefs.h"

class CATOM
{
private:
 char Name[PT_NAMELEN+1];
 int NumberOfElectrons;
 int NumberOfShells;
public:
 CATOM (char *szName);
 ~CATOM();
 int DrawAtom (CWnd *Parent);
};
class CATOMDialog : public CDialog
{
private:
 CATOM *Atom;
public:
 CATOMDialog (char *AtomName);
 ~CATOMDialog ();

 //{{ AFX_MSG (CATOMDialog)
 afx_msg void OnPaint();
 afx_msg void OnOK();

 //}} AFX_MSG

 DECLARE_MESSAGE_MAP()
};
#endif

[LISTING TWO]

//------ PTATOM.CPP - Periodic Table for Windows -------

#include <afxwin.h>
#include <windows.h>


#include <string.h>

#include "resource.h"
#include "ptatom.h"

//----------------------------------------------------------------------------
// CATOMDialog Constructor -- Creates a modeless dialog box to display an
// atom. Also sets dialog caption to atom name and creates atom object.
//----------------------------------------------------------------------------
CATOMDialog::CATOMDialog(char *AtomName)
{
 if (stricmp (AtomName, "Hydrogen"))
 {
 MessageBox ("This version of PT can only draw Hydrogen", "SORRY",
 MB_OK | MB_ICONINFORMATION | MB_TASKMODAL);
 return;
 }
 Atom = new CATOM (AtomName);
 if (Create ("ATOM") == FALSE)
 MessageBox ("Cannot create modeless dialog box.", "ERROR", MB_OK);
 else
 SetWindowText (AtomName);
}
//---------------------------------------------------------------------------
// ~CATOMDialog Destructor -- Destroys atom object and modeless dialog box.
//---------------------------------------------------------------------------
CATOMDialog::~CATOMDialog()
{
 delete Atom;
 DestroyWindow();
}
//--------------------------------------------------------------
// OnOK -- User pressed OK button. Destroy the dialog box.
//--------------------------------------------------------------
void CATOMDialog::OnOK()
{
 delete this;
}
//---------------------------------------------------------------------
// OnPaint -- Received a WM_PAINT message. Get the device context and
// draw the atom inside the GroupBox.
//---------------------------------------------------------------------
void CATOMDialog::OnPaint()
{
 CPaintDC pdc(this); // paint device context on stack
 CWnd *GroupBox = GetDlgItem (IDD_GROUPBOX);
 if (GroupBox != NULL && Atom != NULL)
 Atom->DrawAtom (GroupBox);
}
//--------------------------------------------------------------
// CATOM Constructor
//--------------------------------------------------------------
CATOM::CATOM (char *szName)
{
 strcpy (Name, szName);
}
//-----------------------------------------------------------------------
// CATOM Destructor -- No handling necessary. Included for completeness.

//------------------------------------------------------------------------
CATOM::~CATOM ()
{
}
//----------------------------------------------------------------------------
// DrawAtom - This method only draws the Hydrogen atom. Output goes to a
// CWnd object.
// Drawing is done from outside-in: electron shell, electron, then nucleus.
//----------------------------------------------------------------------------
int CATOM::DrawAtom(CWnd *Parent)
{
 RECT rc;
 int HorzSF, VertSF; // scale factors

 CClientDC pdc(Parent); // client device context on stack

 // scale down rectangle. Use Ellipse() to describe
 // electron orbits around nucleus.
 Parent->GetClientRect (&rc);
 HorzSF = rc.right/5;
 VertSF = rc.bottom/5;

 rc.left = HorzSF;
 rc.top = VertSF;
 rc.right -= rc.left;
 rc.bottom -= rc.top;

 pdc.SelectStockObject (HOLLOW_BRUSH);
 pdc.SelectStockObject (BLACK_PEN);
 pdc.Ellipse (&rc); // electron orbit

 // set up to draw electron. Easier if we position
 // at 12 o'clock.
 rc.left = rc.left + (rc.right - rc.left)/2 - 4;
 rc.top -= 4;
 rc.right = rc.left + 8;
 rc.bottom = rc.top + 8;

 pdc.SelectStockObject (BLACK_BRUSH);
 pdc.Ellipse (&rc); // electron

 // scale down rectangle to draw nucleus.
 Parent->GetClientRect (&rc);

 rc.left = HorzSF*2;
 rc.top = VertSF*2;
 rc.right -= rc.left;
 rc.bottom -= rc.top;

 pdc.SelectStockObject (BLACK_BRUSH);
 pdc.Ellipse (&rc); // nucleus

 return 0;
}
//--------------------------------------------------------------
// MESSAGE MAP
//--------------------------------------------------------------
BEGIN_MESSAGE_MAP (CATOMDialog, CDialog)
 //{{ AFX_MSG_MAP (CATOMDialog)
 ON_WM_CLOSE ()

 ON_COMMAND (IDOK, OnOK)
 ON_WM_PAINT ()
 //}} AFX_MSG_MAP
END_MESSAGE_MAP()

[LISTING THREE]

ORIGIN = Symantec C++
ORIGIN_VER = Version 6.0
VERSION = DEBUG

PROJ = SCPT
APPTYPE = WINDOWS EXE
PROJTYPE = EXE

CC = SC
MAKE = MAKE
RC = RCC
HC = HC
ASM = SC
DISASM = OBJ2ASM
LIBR = IMPLIB
LNK = LINK
CVPK = CVPACK

DLLS =
HEADERS = pt.h resource.h ..\..\..\sc\mfc\include\afx.h \
 ..\..\..\sc\mfc\include\afxver_.h \
 ..\..\..\sc\include\windows.h \
 ..\..\..\sc\include\shellapi.h \
 ..\..\..\sc\mfc\include\afxres.h \
 ..\..\..\sc\mfc\include\afxcoll.h \
 ..\..\..\sc\include\win16\print.h \
 ..\..\..\sc\mfc\include\afxmsg_.h \
 ..\..\..\sc\mfc\include\afxdd_.h \
 \sc\mfc\include\afx.h \sc\mfc\include\afxver_.h \
 \sc\include\windows.h \sc\include\shellapi.h \
 \sc\mfc\include\afxres.h \sc\mfc\include\afxcoll.h \
 \sc\mfc\include\afxmsg_.h \sc\mfc\include\afxdd_.h \
 \sc\include\win16\windows.h \
 ptatom.h ptdefs.h
LIBS = ..\..\..\sc\lib\libw.lib \
 ..\..\..\sc\mfc\lib\lafxcw.lib \
 LIBW.LIB COMMDLG.LIB SHELL.LIB
DEFFILE = pt.def
CFLAGS = -Jm -ml -C -W1 -s -2 -c -g -gh -gf
HFLAGS = $(CFLAGS)
LFLAGS = /CO /LI /NOI /INF /RC -k :pt.RES
MFLAGS =
RESFLAGS =
AFLAGS = -c
HELPFLAGS =

MODEL = L
DEFINES =
RCDEFINES =
LIBDIRS =
INCLUDES = -I\SC\INCLUDE -I\SC\MFC\INCLUDE


OBJS = pt.OBJ ptatom.OBJ
RCFILES =
RESFILES = pt.RES
SYMS = pt.SYM resource.SYM
HELPFILES =
BATS =

.C.OBJ:
 $(CC) $(CFLAGS) $(DEFINES) $(INCLUDES) -o$*.obj $*.c
 $(CC) $(CFLAGS) $(DEFINES) $(INCLUDES) -o$*.obj $*.cpp
.CXX.OBJ:
 $(CC) $(CFLAGS) $(DEFINES) $(INCLUDES) -o$*.obj $*.cxx
 $(CC) $(CFLAGS) $(DEFINES) $(INCLUDES) -o$*.obj $*.cp
.H.SYM:
 $(CC) $(HFLAGS) $(DEFINES) $(INCLUDES) -HF -o$*.sym $*.h
.HPP.SYM:
 $(CC) $(HFLAGS) $(DEFINES) $(INCLUDES) -HF -o$*.sym $*.hpp
.HXX.SYM:
 $(CC) $(HFLAGS) $(DEFINES) $(INCLUDES) -HF -o$*.sym $*.hxx
.C.EXP:
 $(CC) $(CFLAGS) $(DEFINES) $(INCLUDES) -e $*.c -l$*.lst
 $(CC) $(CFLAGS) $(DEFINES) $(INCLUDES) -e $*.cpp -l$*.lst
.CXX.EXP:
 $(CC) $(CFLAGS) $(DEFINES) $(INCLUDES) -e $*.cxx -l$*.lst
 $(CC) $(CFLAGS) $(DEFINES) $(INCLUDES) -e $*.cp -l$*.lst
.ASM.EXP:
 $(CC) $(CFLAGS) $(DEFINES) $(INCLUDES) -e $*.asm -l$*.lst
.OBJ.COD:
 $(DISASM) $*.OBJ >$*.cod
.EXE.COD:
 $(DISASM) $*.EXE >$*.cod
.COM.COD:
 $(DISASM) $*.COM >$*.cod
.OBJ.EXE:
 $(LNK) $(LFLAGS) @$(PROJ).LNK
.OBJ.COM:
 $(LNK) $(LFLAGS) @$(PROJ).LNK
.DLL.LIB:
 $(LIBR) $*.LIB $*.DLL
.DEF.LIB:
 $(LIBR) $*.LIB $*.DEF
.RTF.HLP:
 $(HC) $(HELPFLAGS) $*.HPJ
.ASM.OBJ:
 $(ASM) $(AFLAGS) $(DEFINES) $(INCLUDES) $*.ASM
.RC.RES:
 $(RC) $(RCDEFINES) $(RESFLAGS) $(INCLUDES) $*.rc
.DLG.RES:
 echo \#include "windows.h" >$$$*.rc
 echo \#include "$*.h" >>$$$*.rc
 echo \#include "$*.dlg" >>$$$*.rc
 $(RC) $(RCDEFINES) $(RESFLAGS) $$$*.rc
 -del $*.res
 -ren $$$*.res $*.res

all: $(PROJ).$(PROJTYPE) done
$(PROJ).$(PROJTYPE): $(PROJS) $(OBJS) $(RCFILES) \
 $(RESFILES) $(HELPFILES) $(BATS)
 $(LNK) $(LFLAGS) @$(PROJ).LNK

 $(CVPK) $$SCW$$.$(PROJTYPE)
 -del $(PROJ).$(PROJTYPE)
 -ren $$SCW$$.$(PROJTYPE) $(PROJ).$(PROJTYPE)
done:
 -echo $(PROJ).$(PROJTYPE) done
buildall: clean all
clean:
 -del $(PROJ).$(PROJTYPE)
 -del SCPH.SYM
 -del pt.OBJ
 -del ptatom.OBJ
 -del pt.SYM
 -del resource.SYM
cleanres:
res: cleanres $(RCFILES) link
link:
 $(LNK) $(LFLAGS) @$(PROJ).LNK
 $(CVPK) $$SCW$$.$(PROJTYPE)
 -del $(PROJ).$(PROJTYPE)
 -ren $$SCW$$.$(PROJTYPE) $(PROJ).$(PROJTYPE)
pt.OBJ: \
 pt.cpp \
 \sc\mfc\include\afx.h \
 \sc\mfc\include\afxver_.h \
 \sc\include\windows.h \
 \sc\include\shellapi.h \
 \sc\mfc\include\afxres.h \
 \sc\mfc\include\afxcoll.h \
 ..\..\..\sc\include\win16\print.h \
 \sc\mfc\include\afxmsg_.h \
 \sc\mfc\include\afxdd_.h \
 resource.h \
 pt.h \
 ptdefs.h \
 ptatom.h
ptatom.OBJ: \
 ptatom.cpp \
 \sc\mfc\include\afx.h \
 \sc\mfc\include\afxver_.h \
 \sc\include\windows.h \
 \sc\include\shellapi.h \
 \sc\mfc\include\afxres.h \
 \sc\mfc\include\afxcoll.h \
 ..\..\..\sc\include\win16\print.h \
 \sc\mfc\include\afxmsg_.h \
 \sc\mfc\include\afxdd_.h \
 resource.h \
 ptatom.h \
 ptdefs.h

End Listings











June, 1994
Cross-Platform Database Development


Strategies for FoxPro developers




J. Randolph Brown


I have been writing database applications in FoxPro for a number of
years--first on DOS, then Windows, and more recently on the Macintosh
(including FoxBase+/Mac). FoxPro for Macintosh (FPM) was closely adapted from
its predecessors, FoxPro for Windows (FPW) and FoxPro for DOS (FPD), hence its
emphasis on cross-platform database development.
Among other features, FPM provides a rich API and a powerful command set that
includes SQL commands to accompany its query builder. It also includes a
comprehensive screen builder that offers three-dimensional objects and
supports System 7 features such as AppleScript, Balloon Help, and QuickTime
movies. FPM also provides FoxPro's first wizard tools for automated building
of screens and reports. (Microsoft's recently released version 2.6 of FPD and
FPW also includes wizards.)
But just as there's no such thing as a free lunch, neither is it entirely
possible to effortlessly move software from one platform to
another--particularly when it comes to user interfaces. Consequently, in this
article, I'll present some hard-won strategies for cross-developing FoxPro
screens, focusing in particular on screen objects and font characteristics.
I'll also share some more general FoxPro cross-platform strategies; see the
accompanying text box, "FoxPro Development Tips."


FoxPro Screens


Early in my cross-platform development efforts, I made a conscious decision to
isolate screen code into two basic components:
Interface design, or platform-specific, code (SCX files).
Database operation, or platform-transparent, code (PRG files).
When code is generated for screen SCX/SCT files, "code snippets" and
interface/environment code are generated simultaneously. These snippets
(screen setup, cleanup, object valids, and the like), which are embedded
directly within SCX/SCT files, define how the application handles typical
database operations such as record movement or deletion. In essence, the
screen--all its objects and their functionality--can be entirely
self-contained in a manner similar to that of object-oriented programming. As
with any other non-library FoxPro file, an SCX/SCT file can be ported directly
to another platform. When a screen file is opened on another platform, the
FoxPro transporter intercedes to create a duplicate set of screen objects
specifically for that platform. And when you regenerate screen code, often
twice as much code is created, much of it redundant. Consequently, I avoid
using snippets with my screen files.
Code snippets are merely expressions with calls to procedures/functions in the
same or higher calling program (PRG), which contains mostly
platform-transparent database operation code. The SCX/SCT files, on the other
hand, hold the platform-specific interface code. Editing a PRG is quicker and
doesn't call GENSCRN each time a change is made. If you are storing code in
your screen snippets, you must ensure that any change is also made to those
same objects for all other platforms in the SCX/SCT file.
In general, working between FPM and FPW is relatively easy because both
support virtually the same set of screen objects; see Table 1. While FPD makes
use of many GUI-like controls, it cannot include many of the options supported
by FPW and FPM. (There are third-party tools available, such as Espia, from
Espia Corp. of Indianapolis, IN, that provide a true graphics feeling to FPD
applications.)
While it's easy to say, "only use objects supported by all platforms being
used," I don't adopt this strategy or want such a limit on my development.
There is no reason why FPW shouldn't be able to use picture buttons and FPD
normal pushbuttons. (This isn't to say there aren't limitations: People still
use monochrome Macs, and Apple still makes 640x400 Powerbooks and Macs with
9-inch, 512x384 displays.)
When a screen is ported to another platform and opened for the first time,
FoxPro invokes the "transporter"--a program (TRANSPRT.PRG) that creates new
platform screen objects from existing ones. As you might imagine, complex
heuristics are involved in mapping an object from a character-based coordinate
system (FPD) to a graphical one (FPW or FPM). Topping the list of these
calculations is font handling. Transporting between FPM and FPW, however, is
simply a matter of remapping fonts with similar characteristics--fontmetrics.
In fact, it is likely that less than 10 percent of the 384K TRANSPRT.PRG file
is devoted to GUI transports.
The FoxPro transporter does an adequate job of transporting files between FPW
and FPM. You should expect to make minor adjustments to both the position and
size of many objects. Once a screen is converted, however, the transporter
does an excellent job of keeping objects consistent between platforms.
Listing One (page 102), written entirely in the FoxPro native language,
provides an alternate, yet basic, transporter for converting a screen. The
main advantage of this program over the FoxPro transporter is that it gives
you the ability to specify both default screen and object fonts.


Fontmetrics


To understand what is happening with the transporter, you need to understand
how fonts work in FoxPro. One of the dilemmas Microsoft faced when developing
FPW was how to address object size and positioning, since the company wanted
software to be compatible on both GUI and character-based platforms. The
solution was a unit of measurement known as a "foxel"--a cross between a pixel
and a FoxPro row/column. Example 1 is a FoxPro command that displays an input
field on a screen at coordinates of 9.063,40.125. These are actually rows and
columns on the screen. These coordinates, however, are not controlled by the
object's font (Geneva,9,Normal), but are based entirely on the default screen
font. This is often set by window definitions such as Example 2. (Examples 1
and 2 are both GENSCRN output from a sample screen file.)
Each font has its own unique set of attributes, commonly known as
"fontmetrics." These values are always measured in pixels in both FPM and FPW.
The values I'll examine here primarily affect the font's height and width
dimensions:
FontMetric(1). Character height in pixels.
FontMetric(5). Extra leading in pixels (not available in FPM).
FontMetric(6). Average character width in pixels.
For example, the formula in Example 3 yields the single foxel row and column
values, as well as the calculations of the example field's position. When you
calculate the foxel values as in Example 3(b), the result is the location of
the @..GET field; see Example 3(c). The pixel values 145 and 321 represent the
coordinate position of the field from the upper left of the defined window.
The field sizes (1.000,3.200) are based on the fontmetric values of the object
font (Geneva,9). As you might expect, the foxel values for the object are
smaller; see Example 4.
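To make the foxel arithmetic concrete, here is a minimal Python sketch of the conversion just described, using the fontmetric values the article quotes for the Geneva,13 screen font (16 pixels per foxel row, 8 per foxel column). The function name and hardcoded metrics are mine for illustration, not FoxPro's; in FoxPro you would query the metrics at runtime.

```python
# Foxel-to-pixel conversion as described in the text. The per-foxel
# pixel values are those quoted for Geneva,13:
#   pixels per foxel row    = FontMetric(1) + FontMetric(5) = 16
#   pixels per foxel column = FontMetric(6)                 = 8

def foxels_to_pixels(rows, cols, row_px=16, col_px=8):
    """Convert a foxel coordinate pair into pixel offsets from the
    upper left of the defined window."""
    return round(rows * row_px), round(cols * col_px)

# The @..GET field from Example 1, placed at 9.063,40.125 foxels:
y, x = foxels_to_pixels(9.063, 40.125)
print(y, x)  # 145 321
```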
When you take another look at the DEFINE WINDOW command (refer to Example 2)
for the screen definition, you see a similar pattern. SIZE coordinates are
based on the window's own font, but the AT coordinates are based on the
global FoxPro default font. FoxPro works with multiple coordinate systems.
All objects within a single screen are based on the local coordinate system of
that screen, while the screen itself is based on a more global coordinate
system. Each coordinate system varies because of the differences in foxel
values.
The FoxPro transporter only allows one font per platform--the object font.
Although Microsoft may change this in future releases, there are currently no
accommodations for specifying the default screen font used in the window
definition. In fact, FoxPro defaults to these two default fonts (one on each
platform) when transporting between the platforms; see Table 2(a).
It is virtually impossible to obtain an identical-looking screen when you
first transport. The single-pixel differences in the fontmetrics of the
platforms affect two critical components of the transport process. First, the
position of objects will be off because their coordinate system is based on a
font with different dimensions. Second, the size of the screen is altered,
since its definition is based on this same font. It is much more crucial that
the fontmetrics of default screen fonts match those of object fonts.
It just so happens that the two fonts mentioned are also the default screen
fonts used when a new screen is created (using the CREATE SCREEN command).
Most programmers don't bother changing the default screen font because it
doesn't visually impact the look of a screen, since each object can have its
own font. The key to a truly successful transport is finding default screen
fonts with exact fontmetrics. Table 2(b) lists my preferences for default
screen fonts. I chose these fonts because both are common for their respective
platforms. No doubt, there are countless combinations of fonts with similar
matching values. If possible, you should use common fonts that you know exist
on the computers running your screens. This is especially important if you
plan to redistribute your applications.
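The font-matching check described above can be sketched in Python. The fontmetric values are hardcoded from Table 2 purely for illustration; in FoxPro you would query them at runtime rather than keeping a table like this.

```python
# Candidate default-screen-font pairs and their (FontMetric(1),
# FontMetric(6)) values, taken from Table 2. Hardcoded here only to
# illustrate the matching rule.
metrics = {
    ("Windows", "MS Sans Serif,10,B"): (16, 8),
    ("Macintosh", "Geneva,13,N"):      (16, 8),
    ("Windows", "MS Sans Serif,8"):    (13, 5),
    ("Macintosh", "Geneva,10"):        (12, 6),
}

def metrics_match(win_font, mac_font):
    """True when a Windows/Mac pair has identical foxel dimensions --
    the property that lets a transported screen keep its layout."""
    return metrics[("Windows", win_font)] == metrics[("Macintosh", mac_font)]

print(metrics_match("MS Sans Serif,10,B", "Geneva,13,N"))  # True: the preferred pair
print(metrics_match("MS Sans Serif,8", "Geneva,10"))       # False: the FoxPro defaults
```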
There is more room for variation in the fontmetrics of the object fonts, since
most people leave extra space on the screen. The transporter is actually quite
smart in how it handles object fonts. You can specify a single object font for
all objects to convert. If you choose, however, not to specify an object font
and instead use the default transporter font (Geneva for FPM), both font size
and style are retained. With FPM, the font used will either be Geneva or
Chicago, depending on which font it finds in the FPW objects. The transporter
even transports controls, such as pushbuttons, to default system fonts.
The screen's look-and-feel is up to your discretion. If you're looking for a
simple cross-platform font strategy for screen objects, you may want to try
the fonts in Table 2(c), which have a similar look-and-feel on their
respective platforms and are close in fontmetrics. In addition, these are
common Windows and Mac system fonts.


Conclusion


Much of the screen fontmetric discussion can be applied to other aspects of
FoxPro, such as reports. Spend some time up front devising your cross-platform
strategies. It will be well worth the effort in the long run.
Table 1: FoxPro screen objects.
 Screen Object           DOS   Windows   Macintosh
 Static text              x       x         x
 Fields                   x       x         x
 Lines                    x       x         x
 Boxes                    x       x         x
 Rounded rectangles               x         x
 Pushbuttons              x       x         x
 Invisible buttons        x       x         x
 Picture buttons                  x         x
 Radio buttons            x       x         x
 Picture radio buttons            x         x
 Check boxes              x       x         x
 Picture check boxes      x       x         x
 Popups                   x       x         x
 Lists                    x       x         x
 Edit regions             x       x         x
 Spinners                         x         x
 3-D effects                                x
 OLE objects                      x         x
 Pictures                        BMP     BMP, PICT


Randy is a consultant for Sierra Systems, specializing in FoxPro development
on Macintosh, Windows, and DOS. He is the author of FoxPro 2.5 OLE and DDE
(Pinnacle Publishing, 1994) and is currently working on a new book, FoxPro
MAChete: Hacking FoxPro for Macintosh, to be published by Brady. He can be
reached via CompuServe at 71141,3014.




Example 1: FoxPro command that displays an input field on a screen at
specified coordinates.
@ 9.063,40.125 GET m.state ;
 SIZE 1.000,3.200 ;
 DEFAULT " " ;
 FONT "Geneva", 9 ;
 PICTURE "@K XX" ;
 COLOR ,RGB(,,,255,255,255)

Example 2: A window definition.
IF NOT WEXIST("_qls1cbchi")
 DEFINE WINDOW _qls1cbchi ;
 AT 0.000, 0.000 ;
 SIZE 18.188,62.500 ;
 TITLE "Customer" ;
 FONT "Geneva", 13 ;
 FLOAT ;
 COLOR RGB(,,,192,192,192)
 MOVE WINDOW _qls1cbchi CENTER
ENDIF


Example 3: (a) Formula yielding single-foxel row and column values and field's
position; (b) calculating the foxel values; (c) the result of the calculation.
(a) 1 Foxel Row = FontMetric(1) + FontMetric(5)
    1 Foxel Column = FontMetric(6)

(b) 1 Foxel Row = Font(1,'Geneva',13) + Font(5,'Geneva',13) = 16 pixels
    1 Foxel Column = Font(6,'Geneva',13) = 8 pixels

(c) Row position = number of rows * pixels per foxel row = 9.063 * 16 = 145
    Column position = number of columns * pixels per foxel col = 40.125 * 8 = 321


Example 4: Foxel values for the object in Example 3.
1 Foxel Row = Font(1,'Geneva',9) + Font(5,'Geneva',9)
= 12 pixels

1 Foxel Column = Font(6,'Geneva',9)
= 5 pixels



Table 2: (a) FoxPro default fonts; (b) preferred default screen fonts; (c)
fonts for a simple cross-development strategy.
(a)
 Platform    Default Screen Font   FontMetric(1)   FontMetric(6)
 Windows     MS Sans Serif,8            13              5
 Macintosh   Geneva,10                  12              6

(b)
 Windows     MS Sans Serif,10,B         16              8
 Macintosh   Geneva,13,N                16              8

(c)
 Windows     MS Sans Serif,8            13              5
 Macintosh   Geneva,9                   12              5



FoxPro Development Tips


Before starting any cross-platform database project with FoxPro, you might
want to make adjustments to your Macintosh CONFIG.FPM configuration file
(Figure 1), which is loaded when FoxPro is launched. I don't touch the
MACDESKTOP and KEYCOMP options in my CONFIG.FPM file since I'd rather
Macintosh users have a true Mac look-and-feel. The MACDESKTOP setting (set
with the SET command) can make FoxPro act like a true Windows app, in which
all windows exist within the confines of the main FoxPro desktop window.
Normally when this setting is on (by default), all windows exist within the
Macintosh desktop. The KEYCOMP setting gives FPM Windows-equivalent keyboard
shortcuts. Experienced FoxPro developers often increase the MVCOUNT setting,
which tells FoxPro the maximum allocation for memory variables. This ceiling
has been raised from 256 to 1024 with FPM (MVCOUNT limits for PC versions 2.6
have also been raised to 1024).
Once set up, you can begin writing code, porting files, and generating
applications. One of FoxPro's unique features is that an application can run
without recompilation on any FoxPro platform. This means you can create an
application in FPW, port the APP file to the Macintosh, and run it unmodified
in FPM. While binary APP files port directly, it's still your responsibility
to ensure that the code within is platform-ready.
The VOLUME setting (Figure 1) reassigns the name of my Mac startup volume to a
DOS-like C:\ drive notation. This hard-drive reassignment setting allows me to
port project, screen, and report files from the Mac to a PC without error when
I reopen them. One of the more powerful uses of the VOLUME setting is that you
can assign any Mac path to a specific single-letter drive VOLUME (such as C,
D, Z, or X) and refer to that abbreviated path reference whenever you need to
specify the path in your code.
Internally, FoxPro handles the filenames and paths used in commands and
functions in standard DOS shorthand notation. The FPM function SYS(2027) returns the
file/pathname in true Macintosh notation; see Figure 2. Even though native
FoxPro commands and functions properly handle paths regardless of platform,
there are circumstances when you'll use SYS(2027). Example 5, for instance,
uses the API function fxNewFolder (found in FOXTOOLS.MLB) to create a new
Macintosh folder. Since the routine uses native-Macintosh toolbox routines
requiring Mac pathing conventions, it would fail without SYS(2027).
FPM is reasonably smart in handling paths, especially those with spaces. As
you might expect, spaces (common in Macintosh paths) will cause applications
to bomb on PCs, which have more rigid naming conventions. It's a good idea to
use quotes around paths. When you are working with cross-platform
applications, avoid hardcoding paths in your applications and use memory
variables for storing paths and/or filenames.
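As a rough illustration of the kind of translation SYS(2027) performs, here is a Python sketch that maps a DOS-style path to Macintosh colon notation under a VOLUME c=\ style assignment. The volume table is an assumption made for this example; FoxPro reads the real mapping from CONFIG.FPM.

```python
# Hypothetical drive-letter -> Mac volume table, standing in for the
# VOLUME settings FoxPro reads from CONFIG.FPM.
VOLUMES = {"C": "COSMIC II"}

def dos_to_mac(path, volume="C"):
    """Translate a DOS-style path like \\FOXPRO\\ into Macintosh colon
    notation like COSMIC II:FOXPRO: (a sketch of what SYS(2027) does)."""
    parts = [p for p in path.split("\\") if p]
    mac = VOLUMES[volume] + ":" + ":".join(parts)
    if path.endswith("\\"):
        mac += ":"  # preserve the trailing directory separator
    return mac

print(dos_to_mac("\\FOXPRO\\"))  # COSMIC II:FOXPRO:
```

This mirrors the CURDIR()/SYS(2027) pair shown in Figure 2.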
The FoxPro language is generally platform transparent. However, users are
always going to want applications to adhere to platform standards.
Consequently, it is a good idea to bracket platform code using IF..ENDIF or DO
CASE..ENDCASE statements; see Example 6(a). While it is easy to hide specific
platform commands within these CASE statements, you will likely encounter
problems working with the FoxPro Project Manager (PM) on different platforms.
For example, if you try to rebuild an application in FPW with a program
containing the code in Example 6(b), a compile error will be generated since
the Windows version was developed prior to the Mac version and has no
awareness of certain FPM commands and functions. A way around this is macro
substitution; see Example 6(c). In fact, you might create a function whose
sole purpose is to execute a command passed to it as a character string.
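Translated into Python terms, the macro-substitution trick looks like this: the platform-specific statement travels as a character string, so it is only parsed at the moment the guard passes, and a compiler for the other platform never has to understand it. The flag and command text here are illustrative stand-ins, not FoxPro code.

```python
# Analogue of FoxPro macro substitution (&cmd): carry the
# platform-specific statement as a string and execute it only when
# the platform check passes.

def run_command(cmd_text, namespace):
    """Execute a command passed to it as a character string, as the
    helper function suggested in the article would do with &cmd."""
    exec(cmd_text, namespace)

_MAC = False  # pretend we are building on the Windows side
ns = {}
if _MAC:
    run_command("status = 'ran Mac-only setup'", ns)
else:
    ns["status"] = "skipped Mac-only command"

print(ns["status"])  # skipped Mac-only command
```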
Working with API libraries also presents a problem for the FoxPro PM. Any
reference to SET LIBRARY pulls in that library as an external project file;
see Example 7(a). This helps the PM resolve any calls made by other programs
to functions contained within the library.
A useful coding strategy is to make a call to that library indirectly so the
PM doesn't pull in the file; see Example 7(b). The PM will still raise an
error when it encounters a call to a function it can't locate (such as one
defined in the indirectly named library). One popular option used by developers is to add a
dummy program file containing just the function names. The dummy program, like
Example 7(c), is never called, but the PM searches each program for
procedures/functions.
Taking this a step further, the Foxtools library, which ships with both FPW
and FPM, contains many functions that you can call from your applications.
Unfortunately, there is no equivalent library in FPD (although, there is a
file called FPATH.PLB that contains some of the functions). One way around
this is to use a SET PROCEDURE file. As Example 7(d) shows, you can place a
function such as YMsgbox in a program called DOS_PROC.PRG.
--J.R.B.

Figure 1: This Macintosh CONFIG.FPM configuration file is loaded when FoxPro
is launched. It is configured for a Windows user.
VOLUME c=\
MACDESKTOP=off
KEYCOMP=windows


Figure 2: The FPM function SYS(2027) returns the file/pathname in true
Macintosh notation.
? CURDIR()
 \FOXPRO\

? SYS(2027,CURDIR())
 COSMIC II:FOXPRO:



Example 5: Using the API function fxNewFolder to create a new Macintosh
folder.
SET LIBRARY TO foxtools
mydir=GETDIR('Select directory:')
retval=fxNewFolder(SYS(2027,m.mydir+'TEMP FOLDER'))

Example 6: (a) Bracketing platform code using IF..ENDIF or DO CASE..ENDCASE
statements; (b) rebuilding an application in FPW with a program containing
this code will generate a compile error; (c) this macro substitution will help
you avoid such errors.
(a)

DO CASE
CASE _DOS
CASE _WINDOWS

CASE _MAC
CASE _UNIX
ENDCASE


(b)

IF _MAC
 SET XCMDFILE TO 'xalert'
ENDIF


(c)

IF _MAC
 XMac_Cmd='SET XCMDFILE TO "xalert"'
 &XMac_Cmd
ENDIF



Example 7: (a) Reference to SET LIBRARY pulls in the specified library as an
external project file; (b) making a call to that library indirectly so that
the PM doesn't pull in the file; (c) adding a dummy program file, which isn't
called; (d) an alternate method for FPD, which lacks an equivalent library.
(a) SET LIBRARY TO foxtools
(b) SET LIBRARY TO ('foxtools')
(c) * dummy.prg
    FUNCTION justpath
    FUNCTION juststem
    FUNCTION msgbox
(d) IF _DOS
      SET PROCEDURE TO dos_proc
    ENDIF


[LISTING ONE] (Text begins on page 84.)

PRIVATE objfont,objfsize,objfstyle
PRIVATE scrnfont,scrnfsize,scrnfstyle
PRIVATE nosize,nostyle,SysControl
PRIVATE mystamp,splatform,splatform2
PRIVATE tmparr,tmpcurs,tmpalias,scrnfile
* Select fonts you want to use in transporting
DO CASE
CASE _MAC
 * default screen font
 scrnfont='Geneva'
 scrnfsize=13
 scrnfstyle=0
 * object font
 objfont='Geneva'
 objfsize=9
 objfstyle=0
CASE _WINDOWS
 * default screen font
 scrnfont='MS Sans Serif'
 scrnfsize=10
 scrnfstyle=1
 * object font
 objfont='MS Sans Serif'
 objfsize=8
 objfstyle=0
ENDCASE
splatform = IIF(_MAC,'MAC','WINDOWS')
splatform2 = IIF(_MAC,'WINDOWS','MAC')
m.nosize = .F. &&retain original font size
m.nostyle = .T. &&retain original font style
m.SysControl = .F. &&use system font for controls
* Select screen file to transport
m.scrnfile=GETFILE('SCX','Select Screen File:')
IF !'.SCX'$UPPER(m.scrnfile)
 RETURN
ENDIF
* If the file already has platform objects it is kicked out.
* You can manually delete these objects and retransport.
m.tmpalias='_'+LEFT(SYS(3),7)
SELECT 0
USE (m.scrnfile) ALIAS (m.tmpalias) EXCLUSIVE
LOCATE FOR platform=m.splatform
IF FOUND()
 WAIT WINDOW 'File has already been transported.'
 USE IN (m.tmpalias)
 RETURN
ENDIF
WAIT WINDOW 'Transporting Screen...' NOWAIT
* Create cursor of new platform objects to be appended to original file later.
=AFIELDS(tmparr)
m.tmpcurs='_'+LEFT(SYS(3),7)
CREATE CURSOR (m.tmpcurs) FROM ARRAY tmparr
APPEND FROM DBF(tmpalias) FOR platform = m.splatform2
* Add new platform
REPLACE ALL platform WITH m.splatform
* Handle porting of objects
DO CASE
CASE m.nostyle AND m.nosize &&change only fontface
 REPLACE ALL fontface WITH m.objfont;
 FOR INLIST(objtype,5,11,12,13,14,15,16,22,23)
CASE m.nostyle &&don't change fontstyle
 REPLACE ALL fontface WITH m.objfont,;
 fontsize WITH m.objfsize;
 FOR INLIST(objtype,5,11,12,13,14,15,16,22,23)
CASE m.nosize &&don't change fontsize
 REPLACE ALL fontface WITH m.objfont,;
 fontstyle WITH m.objfstyle;
 FOR INLIST(objtype,5,11,12,13,14,15,16,22,23)
OTHERWISE
 REPLACE ALL fontface WITH m.objfont,;
 fontsize WITH m.objfsize,fontstyle WITH m.objfstyle;
 FOR INLIST(objtype,5,11,12,13,14,15,16,22,23)
ENDCASE
* Add system fonts for controls if option set
IF m.SysControl
 DO CASE
 CASE _MAC
 * use Geneva,10,N for controls
 REPLACE ALL fontface WITH 'Geneva',;
 fontsize WITH 10,fontstyle WITH 0;
 FOR INLIST(objtype,11,13,14,16,22)
 * use Geneva,10,B for text buttons
 REPLACE ALL fontface WITH 'Geneva',;
 fontsize WITH 10,fontstyle WITH 1;
 FOR objtype=12
 CASE _WINDOWS
 * use MS Sans Serif,8,B for controls
 REPLACE ALL fontface WITH 'MS Sans Serif',;
 fontsize WITH 8,fontstyle WITH 1;
 FOR INLIST(objtype,12,13,14,16,22)
 * use MS Sans Serif,8,N for lists
 REPLACE ALL fontface WITH 'MS Sans Serif',;
 fontsize WITH 8,fontstyle WITH 0 FOR objtype=11
 ENDCASE
ENDIF
* Handle screen default font objects
* - picture buttons, invisible buttons
* - picture check boxes, picture radios
REPLACE ALL fontface WITH m.scrnfont,;
 fontsize WITH m.scrnfsize,fontstyle WITH m.scrnfstyle ;
 FOR INLIST(objtype,1,20) OR '@*B'$picture OR ;
 '@*RB'$picture OR '@*CB'$picture
* Note: can add code here to replace objtype 23 info
* Cleanup a little
SELECT (m.tmpalias)
APPEND FROM DBF(m.tmpcurs)
USE IN (m.tmpalias)
USE IN (m.tmpcurs)
WAIT CLEAR
MODIFY SCREEN (m.scrnfile) NOWAIT
RETURN
End Listing






































June, 1994
PROGRAMMING PARADIGMS


Mushroom Programming for Newton




Michael Swaine


I admit, somewhat sheepishly, that the source code for a Newton application
accompanies this column. Why, you ask; and I ask myself, is this an exercise
in futility?
The computer press has not been easy on Apple's Newton MessagePad, the
purported realization of John Sculley's dream of a Personal Digital Assistant.
Jokes are made, and references to the Apple III and the Lisa occur with
distressing frequency.
Distressing, at least, to someone who has invested in Newton's future by
buying a MessagePad, a developer's kit, a place at the developer's conference,
and so on. I have the receipts in front of me now, as motivation.
But I am not downhearted. I take solace in this truth: Newton is a technology,
not a platform. Writing for Newton doesn't mean writing for the MessagePad, or
the MessagePad 100, as the original has been renamed now that there's a new
and improved model 110. Certainly there were problems with the 100, and
certainly not all of them have been solved in the 110, although it is a
significant improvement on the original design.
The 110 has a faster infrared beaming port (38.4 kbps instead of 19.2), a
different power system, more internal RAM (1 Mbyte total rather than 640
Kbytes, which works out to a big proportional increase in user-usable RAM), a
slightly different form factor, a slightly different size of screen, a
different pen, a flip-up lid, and a new ROM that includes deferred recognition
and a try-by-letter recognition option. The ROM is available as an upgrade for
the original MessagePad. Deferred recognition should mean the difference
between usability and unusability in some note-taking situations.


Newton's Flaws


But I believe that the significance of the MessagePad's handwriting
deficiencies has been inflated. In my humble opinion, the main technical
problems with the original Newton MessagePad, in decreasing order of
importance, were:
1. There was no built-in modem. Isn't that a rather serious flaw in a
so-called communication device?
2. It didn't fit in the average pocket. It just missed, but that was a big
miss for a so-called portable device. Maybe everybody at Apple carries a bag,
but some of us out here grew up in the Midwest and rely on our pockets.
3. It was buggy. This was less serious. Okay, it was unacceptable, but it was
also predictable. This is a very new technology. This was version 1.0. There's
a ROM upgrade.
4. The handwriting recognition failed to recognize handwriting often enough to
make the device unusable as a meeting note-taker, which is just what any
computer journalist would try to use it for if given the slightest
encouragement.
I regard the handwriting problem as being not as serious as the other problems
mentioned, although it has received most of the press. That was Apple's fault.
The worst problem with the machine, in fact, was Apple's positioning of it. If
you ask me, it should have been sold as a device for communications,
name-and-address storage, to-do lists, and appointment reminders. The
note-taking capability and handwriting recognition should have been presented
as a novelty feature, a hands-on demo of technology under development, an ATG
(Advanced Technology Group, also known as "Alan's Toys and Games") freebie.


Life Stinks, or BYTE's Reality


Which brings me back to the question that motivated my writing a Newton app:
What is a Newton good for?
Apparently not for implementing the game of Life. DDJ contributing editor
David Betz described his efforts in this direction in the March 1994 issue of
BYTE. He found NewtonScript grindingly slow for this application. Okay,
scratch that.
For what it's worth, I'd say that the current Newton devices are well adapted
to three kinds of applications:
1. Personal Information Manager (PIM) stuff. Electronic to-do lists,
appointment calendars, name-and-address databases. Small apps you'd like to
carry around in your pocket. (Oops. The pocket problem again.)
2. Communications. A built-in modem is still lacking, but the pager on a
PCMCIA card is amazing, and the two-way cellular communication from a Newton
device redefines portable computing.
3. What I think of as "tap apps"--applications that don't require much typing
or writing. These apps just ask the user to tap a few buttons or menu choices,
then return some brief text or illustration. Mobile kiosks, you might call
them. This is not a real category, of course; it could include anything from a
calculator to an expert system for medical diagnosis. But it seems like a
reasonable way of thinking about the question of how to make the Newton
useful, a question that would be heavy on your mind, too, if you had these
receipts in front of you.
And in fact I think expert systems are not at all a bad idea for Newton
applications.


AI Lite


Why expert systems?
The interface seems right. Expert systems, at least those I've come across,
typically take input from users in small chunks and return brief textual
opinions (possibly augmented by lengthy explanations). Sounds good for a
device with a small black-and-white screen.
Expert systems need not be compute-heavy apps that require heavy iron. Even a
Newton ought to have the horsepower for a simple expert system. That suggests
a subtle third point.
Devices like Newton could spur interest in truly simple expert systems. AI
lite. Users might have quite different expectations of an expert system that
can be carried in the pocket (oops, the pocket problem again) and that boots
(machine and app) in four seconds or so.
Imagine using an expert system in the field on your conventional portable
computer. If you have to turn on the portable, wait through its boot cycle,
load the app, and take an occasional hike in the middle of using it to get an
answer to a question (portable or not, you don't want to move a computer while
it's running), you probably aren't going to be satisfied with a response of
"Gee, I dunno."
Now picture using a handheld device that is carried in your pocket (let's
imagine that Apple can crack the pocket problem) and can be consulted in a few
seconds. You might shrug off a "Gee, I dunno" response more philosophically.
For some situations, it seems to me, the place where you need that expert
advice is in the field, and the time when you need it is ASAP, and pretty good
right now is better than excellent some time later. Small expert systems might
be ideal in such situations. (You'll notice that I don't actually name any
such situations, but take it on faith that there are some.)
Anyway, whether there's a market or not, I'm writing a small expert system for
Newton. This month you'll see the application shell and the user interface.
Next month, I'll present the inference engine and the knowledge-base
structure.
A two-part presentation actually makes sense. The project as I've conceived it
breaks down nicely into the front-end and back-end components: the UI and the
smarts. Since the Newton Toolkit (NTK) provides an abundance of templates for
user-interface development, this first part of the project is mainly an
exercise in using the NTK and its templates. On the other hand, the NTK
doesn't provide any templates or classes for what I'll be doing in the second
half, so that'll be more an exercise in writing NewtonScript code. Just so you
know, I mean an exercise for the author, not the reader. You'll be watching me
learn here. Scary for all concerned.


A Fungus Amongus



I wanted a tool I could take with me into the woods when I go hunting for
mushrooms. It didn't have to give authoritative advice on mushroom species,
but it should help me decide whether it was worth throwing the latest find in
the basket to take home and look up in my mushroom books. A fallible but
helpful advisor.
That sounded like a good candidate for a tap app. Tap to select mushroom
features like color and cap shape, tap a button to start the identification,
and read off the identification in a text field.
Some mushrooms can be identified from a few features, some require more. The
program should accept partial data. It should also allow refinement of the
feature list: Hmm, that identification doesn't look right; let's try calling
this thing red rather than brown. Or: Here's another mushroom like that last
one I identified, but it has a scaly stem; I'll change just that feature and
ask for another identification. And, so that I can learn which features matter
in identifying different kinds of mushrooms, it should give feedback on what
features it used in making its decision, but this information should be shown
only on request.
It was clear that the program would need to display a hierarchy of mushroom
features, since over 100 features might be relevant for some identifications.
For the current version, though, I restricted it to a single screen of feature
choices.


Frame and Fortune


A frame is a ubiquitous data structure in NewtonScript, and Listing One (page
137) consists of a collection of frames. A frame consists of an unordered
collection of slots, each of which comprises a label and a value. The slot's
value can be any data type, including a frame or a function. You access a
slot's value using dot notation; see Example 1(a).
Methods are defined for frames by creating slots whose values are functions.
NewtonScript uses braces to enclose the frame and commas to separate the
slots, so you can create a frame like Example 1(b). The top-level frames in
Listing One define views, which are UI components.
In creating views, NTK lets you select from a lot of predefined elements
called "prototypes," or "protos." I used the supplied application-shell proto
named protoApp to create the base view, Mushrooms, for the program Fungus.
ProtoApp comes with slots for a title, view bounds, format, various flags,
attributes, methods, and a required slot named declareSelf, which has a
default value of 'base for an application and which identifies the view to
the system. (The single quote in front of base identifies it as a symbol.)
You can add slots, of course, to any view that you construct using this or any
proto. viewSetupFormScript is a standard method, executed before any of a
view's other slots are evaluated. It's the place to set screen coordinates of
the view, for example. Now that Apple has come out with a second MessagePad
(the 110) with different screen dimensions, it's important to create views
that work with different screen sizes. The code in viewSetupFormScript sizes
the view to fit the screen. All subviews should then be sized to fit in this
view, and I confess I haven't done that yet.
The observations frame holds the features selected by the user. Its collection
of frames will grow as I add more features to be identified, and I'll probably
have to rework other parts of the program if these features become
hierarchical, as they should; see Example 1(c).
The advisor method will be replaced by a simple expert-system advisor. These
are just a few If/Then tests to let the app return some kind of identification
based on the observations.
The _proto slot identifies this view as having been derived from the protoApp
proto. The _proto slot defines an inheritance path in one of NewtonScript's
two inheritance mechanisms. A view can inherit from its proto (in this case a
system proto residing in ROM) and from its parent view. This base view,
Mushrooms, is the parent view for all the other views in this program. They
normally have access to its slots, but not vice versa. There is a mechanism
for making child views visible to the parent: You declare the child to the
parent. This installs an extra slot in the parent, pointing to the child. It
has some overhead, and you should only do it when necessary. I use it with the
MessageBox and Size views in this app.


The Kids Are All Protos


Most of the user's selection of mushroom features is handled using views based
on the protoLabelInputLine proto. This displays a label with a dotted line
next to it. When you tap on the label, a list of possible values pops up.
Tapping on one of these selects it, displays it on the dotted line, and places
it in a slot of the view. Listing One shows one of the views based on this
proto.
The normal way of reading off the user's selection with this proto is by using
the textChanged method. The MessagePad has a fixed Undo button at the bottom
of its screen, so you can add undo capability wherever it's appropriate.
Undoing a selection in this view is a simple matter of putting back the
previous selection, so I implemented that. It's necessary to register the undo
method with the system, since the Undo button belongs to the system, rather
than to your app.
Listing One also shows how I implemented a slider, adding two extra views, one
to display a label like those of the protoLabelInputLine views, and one to
display the slider's value as a number of centimeters. Declaring the latter
to the base view and having the slider view message the base view to update
the centimeter display is one way to synchronize these two sibling views--
the slider and the centimeter display. I doubt, however, that it's the best
way.
The last two views in the listing are the Advise and MessageBox views. The
only interesting thing about the Advise view, which defines the button the
user taps to start the identification, is that it's a picture button. You can
include PICT-format pictures in Newton apps, for illustrations or icons, by
placing them in resource files and choosing a menu item that adds the files.
Figuring I'd let Apple do as much work for me as possible, I created this
button's icon using a mushroom picture that I found in the Apple-supplied
HyperCard art-bits stack.
The MessageBox view displays the identification to the user. Its text slot is
initialized to a brief introductory message. MessageBox is declared to the
base view, Mushrooms, so that the base view can update its text slot with the
identification.
Programming with the NTK at this level is a mixture of coding and visual
programming. You do a certain amount of clicking and dragging to initially
create your views, then add functionality by writing methods. Next month will
be all coding, though, as I try to put some smarts into the Advisor method.

Example 1: (a) Accessing a slot's value using dot notation; (b) using braces
to enclose the frame and commas to separate the slots so as to create a frame;
(c) hierarchical approach to mushroom program.
(a) viewBounds.top := b.appAreaTop + 2

(b) myFrame := { slot1 : 1000, frameSlot : { name : "David", game : "Life" },
methodSlot : func() begin /* method body */ end}
(c) observations := { cap : { cap_surface : "", cap_color : "", cap_shape :
"" }, gills : { ... }}
[LISTING ONE] (Text begins on page 105.)

// Fungus -- A Mushroom Identification Program for Newton by Mike Swaine
// Fungus presents a single screen of mushroom attributes, lets the user set
// values for some or all of them, and tries to identify the mushroom based
// on these values. NB: This is a demo of NewtonScript, not a useful app.
// Its "advice" is NOT to be relied upon! The base view of the application is
// a frame named Mushrooms. It's based on the protoapp proto.
Mushrooms :=
 { viewSetupFormScript: /* executed during view creation */
 func()
 begin
 // Set view bounds relative to screen dimensions.
 local b := GetAppParams();
 self.viewBounds.top := b.appAreaTop + 2;
 self.viewBounds.left := b.appAreaLeft + 2;
 self.viewBounds.bottom := self.viewBounds.top+b.appAreaHeight-4;
 self.viewBounds.right := self.viewBounds.left+b.appAreaWidth-4;
 end,
 title: "Mushroom Field Guide",
 viewflags: 5, /* visible, clickable, etc. */
 viewFormat: 328017, /* pen, frame, etc. */
 declareSelf: 'base, /* required for base view */

 observations: /* attributes of mushroom to be identified */

 {color : "", /* all initialized to emptiness */
 size : 0,
 cap_shape : "",
 cap_surface : "",
 gill_type : "",
 gill_attachment : "",
 stem_position : "",
 stem_surface : "",
 veils : ""},

 advisor: /* the mushroom identification engine */
 func()
 begin
 // Dummy code. Could be extended to a humungous list of IF-THEN tests,
 // but plan is to replace with a simple expert system. Note: in
 // NewtonScript, all = tests on structured objects compare pointers,
 // while < and > tests compare contents. Hence variations in syntax.
 if strEqual(base.observations.color,"brown")
 AND base.observations.size > 3
 AND strEqual(base.observations.gill_type,"absent")
 then base : advise("a Bolete",1);
 else if strEqual(base.observations.color,"brown")
 AND base.observations.size <= 3
 then base : advise("an LBM (little brown mushroom)",1);
 else base : advise("too little data for a conclusion",0);
 end,

 advise: /* outputs the identification */
 func(m,c)
 begin
 setValue(MessageBox,'text,"It looks like you have" && m & ".\nConfidence level:" && c & ".");
 end,

 showSize: /* updates Size display to current SizeSlider value */
 func(n)
 begin
 setValue(Size,'text,n && "cm"); /* value shown as centimeters */
 end,

 _proto: protoapp, /* proto inheritance link */
 };

// Cap Shape, based on the protolabelinputline proto, is a child view of
// base view. It displays a label and an input line. Tapping the label
// shows a list of values. Tapping a value puts it in the input line.
Cap Shape :=
 { viewSetupFormScript: /* executed during view creation */
 func()
 begin
 // This should set base-relative view bounds.
 // The protolabelinputline proto that this view is based on
 // has a child view, entryLine, responsible for the input line.
 // This is how its slots are accessed:
 entryLine.viewFont := userFont10;
 entryLine.text := "";
 prevText := entryLine.text; /* save for undo */
 end,
 label: "Cap Shape", /* the label */

 labelCommands: /* the values displayed */
 ["cylindrical", "conical", "bell", "convex", "flat", "concave"],

 textChanged: /* invoked when text in input line is changed */
 func()
 begin
 // Store user selection in slot in observations frame.
 base.observations.cap_shape := entryline.text;
 // Register this method's undo method with the system
 // so the Undo button will know what to do.
 AddUndoAction('undoTextChange,[prevText]);
 end,

 labelClick: /* invoked when user taps the label */
 func(unit)
 begin
 prevText := entryLine.text; /* save for undo */
 return nil; /* otherwise method is not passed */
 end,
 undoTextChange: /* the undo method for this method */
 func(t)
 begin
 entryLine.text := t;
 base.observations.cap_shape := entryline.text;
 end,
 _proto: protolabelinputline, /* proto inheritance link */
 };
// ...and so on. There are also frames for user input of Cap Surface, Gill
// Type, etc., but they look like this frame for Cap Shape.
// The frame(s) for input of Size, though, are a little different:
// SizeSlider, SizeLabel, and Size are views that implement a kind of slider,
// an alternative to the protolabelinputline used in Cap Shape.
// SizeSlider is based on the protoslider proto.
SizeSlider :=
 { viewSetupFormScript: /* executed during view creation */
 func()
 begin
 viewValue := 5; /* initial setting */
 // Display initial slider setting in Size view.
 base : showSize(self.viewValue);
 prevValue := viewValue; /* save for undo */
 end,

 // Slider settings are interpreted by interpolating between
 // minValue and maxValue. This app treats the result as centimeters.
 minValue: 0,
 maxValue: 24,
 changedSlider: /* invoked when slider is moved to a new position */
 func()
 begin
 // Store user selection in slot in observations frame.
 base.observations.size := viewValue;
 // Register this method's undo method with the system
 // so the Undo button will know what to do.
 AddUndoAction('undoValueChange,[prevValue]);
 end,
 trackSlider: /* invoked as slider is moved */
 func()
 begin

 // Display slider setting in Size view.
 base : showSize(self.viewValue);
 end,
 viewClickScript: /* invoked when user touches slider */
 func(unit)
 begin
 prevValue := viewValue; /* save for undo */
 return nil; /* otherwise method is not passed */
 end,
 undoValueChange: /* the undo method for this method */
 func(v)
 begin
 viewValue := v;
 base.observations.size := viewValue;
 end,
 _proto: protoslider, /* proto inheritance link */
 };
// SizeLabel, based on the protostatictext proto, labels SizeSlider.
SizeLabel :=
 { text: "Size",
 _proto: protostatictext, /* proto inheritance link */
 };
// Size displays the setting of SizeSlider numerically.
// It's based on the protostatictext proto.
Size :=
 { text: "",
 _proto: protostatictext, /* proto inheritance link */
 };
// View Size is accessible from Mushrooms.

// Advise is the button pressed to start the identification.
// It's based on the protopicturebutton proto.
Advise :=
 { icon: GetPictAsBits("Mushrooms", 1),
 buttonClickScript: /* invoked when button clicked */
 func()
 begin
 base : advisor(); /* fire up the identification engine */
 end,
 _proto: protopicturebutton, /* proto inheritance link */
 };
// Messagebox displays messages to user. Based on clParagraphView view class.
MessageBox :=
 { text: /* initial value */
 "Select the characteristics that best describe your find and tap the picture of the mushroom.",
 viewclass: 81,
 };
// View MessageBox is accessible from Mushrooms.
End Listing












June, 1994
C PROGRAMMING


The Quincy Preprocessor




Al Stevens


Last month I introduced Quincy, a new "C Programming" column project. Quincy
is a C-language teaching interpreter with an interactive D-Flat user
interface. Its original version was a K&R interpreter. The new project is much
closer to Standard C with a CUA integrated environment.
This month I'll discuss the interpreter's preprocessor, which implements a
subset of Standard C's preprocessing operators. Quincy supports #if, #ifdef,
#ifndef, #else, #elif, #endif, #define, #undef, #include, and the backslash
(\) line-continuation character in macros. It does not support the #
"stringizing" and ## concatenation operators in macros, but I might add them
later. Quincy also does not support the #line, #error, or #pragma directives.
A preprocessor reads C source code and translates it for the compiler. The
preprocessor deletes comments, excess white space, and any code that
compile-time conditionals (#if, and so on) exclude. It also resolves #define
macros and
inserts other source-code files that the #include directive specifies. The
preprocessor maintains line-number integrity in the output source code so a
source-level debugger can set breakpoints and step through the code.
Traditionally, the preprocessor is a stand-alone program that runs as the
first pass of a compile, producing a temporary file for the second pass to
read. Quincy is an interactive interpreter, so the preprocessor is implemented
through a function that the interpreter calls before it begins translating the
code.


A p Descendant


A preprocessor is a complex piece of code. The original Quincy had no real
preprocessing beyond simple #define macro substitutions without parameters.
Other preprocessing directives were treated as comments. You could
put an #include statement in, for example, but it did nothing. All of the
library functions were built in, and K&R C did not have prototypes, so a
preprocessor was not necessary. The current version has header files with
prototypes and macros. Some header files even have functions. Consequently, a
preprocessor became necessary.
Not wanting to reblaze old trails, I went looking for an existing C
preprocessor to adapt. My first thought was to download the Gnu version. I'm
sure it's tucked away somewhere in one of those megabytes of Gnu uploads, but
I couldn't tell which one from the file descriptions, and I sure didn't want
to download all of that stuff. A search of the likely CompuServe libraries
with PREPROCESSOR keywords didn't turn up anything productive, either, so I
did the obvious--I turned to the Doctor for help.
Years ago, DDJ published an article with a preprocessor for the Small C
compiler. The program was called "p." I found it in one of the annual bound
editions. Because of its age, the source code is not available electronically,
so I typed it in and compiled it. By gum, it worked. It's not the program you
see in this issue, but the example showed me how to handle all of the nested
#if, #ifdef, #ifndef, #else, and #elif operators. The p program is an
interesting study in how we used to recklessly treat pointers and integers
interchangeably. I used to write programs that way. Trying to adapt the p code
to ANSI C showed me how much the standard language encourages better coding
practices. Eventually, I gave up and just extracted the logic I wanted. Even
though I couldn't use the p code itself, the exercise demonstrates the
endurance of the early DDJ issues. Don't throw anything away.


Preprocessing


Listing One, page 143, is preproc.c, the Quincy preprocessor. There are other
parts, which the preprocessor shares with the interpreter, and I will discuss
them in later columns, but preproc.c is the main thread.
Quincy calls the PreProcessor function after the programmer types or loads a
source program and tells Quincy to run it. The function accepts two
parameters, a pointer to the preprocessed code, and a pointer to the raw
source code. When the function returns, the preprocessed code is ready to be
translated.
The Quincy source-code model consists of one source-code module in memory,
which may have been loaded from disk, and zero or more #include files that are
on disk. Because the environment is an interactive interpreter, there is no
link process, so there are no other compiled object modules or libraries with
which to link. The preprocessor translates the source code of the main and
#include files into one source-code stream. Each input source-code file and
the preprocessed source-code file must, therefore, fit into a 64K buffer.
After some housekeeping, the PreProcessor function calls the PreProcess
function to translate the code. This function is the top level of the
preprocessing loop, which calls itself from a lower level when it encounters
an #include statement in the code. The function processes source code one line
at a time. The program passes through the input buffer by calling the
ReadString function, which first determines the length of the next line in the
input buffer, allocates a line buffer to hold the line, and copies it into the
line buffer.
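The line-at-a-time walk described above can be sketched in isolation. This is a minimal reconstruction under assumed names (read_line and a caller-supplied input pointer), not Quincy's actual ReadString:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a ReadString-style routine: scan the in-memory source
   buffer for the next newline, allocate a line buffer just big enough,
   copy the line (including its newline) into it, and advance the
   caller's input pointer. Returns NULL at the end of the buffer. */
static char *read_line(const char **ip)
{
    const char *nl;
    size_t len;
    char *line;

    if (**ip == '\0')
        return NULL;                     /* end of source buffer */
    nl = strchr(*ip, '\n');
    len = nl ? (size_t)(nl - *ip) + 1 : strlen(*ip);
    line = malloc(len + 1);
    if (line == NULL)
        return NULL;
    memcpy(line, *ip, len);
    line[len] = '\0';
    *ip += len;                          /* advance past the copied line */
    return line;
}
```

The caller frees each returned line when it finishes processing it, matching the allocate-per-line scheme the column describes.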
Throughout the preprocessing, the program uses the ExtractWord function to
pull logical words from the input stream. This function accepts a pointer to a
buffer to receive the word, the address of the input-stream pointer, and a
string of special characters that are allowed in the word. The function copies
characters as long as they are alphabetic, numeric, or one of the specified
allowed characters. Usually the underscore is the only non-alphanumeric
character allowed in C identifiers. Preprocessing tokens themselves allow no
special characters. When the program extracts the filename from the #include
directive, it allows periods, dollar signs, underscores, and backslashes.
Tests for white space in the source code are done by the isSpace macro in
preproc.h, Listing Two, page 146. This test recognizes Quincy's internal
notation for tab expansion, which uses the tab and form-feed characters with
the most-significant bit set.


Preprocessing Directives


Each preprocessing directive must, by definition, be on its own source-code
line. If the first non-white-space character on the line is a pound sign (#),
the line is assumed to be a preprocessing directive, and the function
extracts the directive keyword. To convert the directive into a
token, the program calls FindPreProcessor, passing the directive's keyword.
This function is in a different place in Quincy--the place where all symbol
translations occur. There are functions that translate C-language identifiers
and keywords into character tokens. A switch statement tests the directive
token and calls a function to process it.


#include


The #include directive tells the program to include another source file. The
program maintains a linked list of source-code files that contribute to the
running program. This list stays in place while the program is running so the
interpreter can identify the location of errors. Quincy recognizes the
difference between #include <filename> and #include "filename". If you use
angle brackets, Quincy looks for the file in the subdirectory where the Quincy
executable is located. Otherwise, it looks in the current subdirectory. The
preprocessor makes sure the source program does not include a file more than
once. This is to avoid #include loops, such as when file A includes file B,
which includes file A.
Each source file being processed has its own context, and the #include logic
saves the current context, reads the new file into a fresh buffer, and calls
PreProcess to continue the process. When PreProcess returns, the program frees
the buffer, restores the context, and returns to continue processing the
previous source file. Each context includes a file number and line number. As
Quincy emits preprocessed source-code lines, it generates newline tokens,
which are just newline characters followed by C comments that specify the
current file and line number like: /*1:3*/. This format is valid C-language
source code and provides the debugger with file- and line-number information
for setting breakpoints and reporting errors.
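Emitting such a marker is a one-liner. The helper name here is hypothetical, but it produces the file:line comment form the column describes:

```c
#include <stdio.h>

// Sketch of generating a newline token: a newline followed by a C
// comment carrying the current file and line number, e.g. "\n/*1:3*/".
// Because the marker is a valid comment, the preprocessed stream stays
// legal C while still mapping each line back to its source.
static int emit_newline_token(char *out, size_t n, int fileno, int lineno)
{
    return snprintf(out, n, "\n/*%d:%d*/", fileno, lineno);
}
```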


#define and #undef


Quincy supports the #define directive with recursive argument substitutions.
That operation divides into two parts, the logic that records the macro itself
and the logic that substitutes arguments for parameters when the source
program calls the macro.
The DefineMacro function adds a new macro to a linked list of defined macros,
first making sure the macro is not already defined. A macro may or may not
have a parameter list; a macro with no parameters may have an empty
parameter list or no parentheses at all, and the parenthesis-free form is
meant for simple substitutions. The DefineMacro function breaks the macro
into three strings:
the macro name, its parameter list, and the macro definition. Then it calls
the AddMacro function. This function builds an array of pointers to the
parameter identifiers in the macro. Then it converts the matching identifiers
in the macro definition into parameter-number tokens. A macro that looks like
this in source code: #define min(a,b) (a<b?a:b) looks like this internally:
min (#0<#1?#0:#1).
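The identifier-to-token rewrite can be sketched as follows. This is a hypothetical helper, not Quincy's AddMacro; it scans the macro body, looks each identifier up in the parameter array, and replaces matches with #n tokens:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Rewrite a macro body so each parameter identifier becomes a #n
// parameter-number token: "(a<b?a:b)" with parameters {"a","b"}
// becomes "(#0<#1?#0:#1)". Non-parameter identifiers and all
// punctuation pass through unchanged.
static void tokenize_params(char *out, const char *body,
                            const char **parms, int nparms)
{
    while (*body) {
        if (isalpha((unsigned char)*body) || *body == '_') {
            char word[64];
            int n = 0, i;
            while (isalnum((unsigned char)*body) || *body == '_')
                word[n++] = *body++;     /* collect the identifier */
            word[n] = '\0';
            for (i = 0; i < nparms; i++)
                if (strcmp(word, parms[i]) == 0)
                    break;
            if (i < nparms)
                out += sprintf(out, "#%d", i);    /* parameter token */
            else
                out += sprintf(out, "%s", word);  /* ordinary name */
        } else {
            *out++ = *body++;                     /* copy punctuation */
        }
    }
    *out = '\0';
}
```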
The ResolveMacro function (to be discussed in a later column) substitutes the
arguments in the parameter call with the matching argument numbers in the
macro definition. If I decide to implement the # and ## operators later, I
will probably need to use a different token for the internal parameter
numbers.
The #undef directive removes the macro named by its argument from the linked
list of #define macros. If no such macro is defined, the program ignores the
directive.
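The list surgery #undef performs is ordinary singly-linked-list removal. The MACRO layout here is assumed for illustration; the ignore-if-missing behavior matches the description above:

```c
#include <stdlib.h>
#include <string.h>

// Minimal stand-in for the #define macro list node.
typedef struct macro {
    char id[32];
    struct macro *next;
} MACRO;

// Sketch of #undef: unlink and free the named macro from the linked
// list; if no such macro is defined, do nothing. Using a pointer to
// the link itself avoids a special case for the list head.
static void undef_macro(MACRO **head, const char *name)
{
    MACRO **pp = head;
    while (*pp) {
        if (strcmp((*pp)->id, name) == 0) {
            MACRO *dead = *pp;
            *pp = dead->next;            /* unlink the node */
            free(dead);
            return;
        }
        pp = &(*pp)->next;
    }
    /* not found: silently ignore, per the #undef semantics above */
}
```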



Compile-Time Conditionals


The #ifdef and #ifndef directives test whether the macro specified by the
argument is defined and set the Skipping variable accordingly. The #if and
#elif directives each evaluate their respective constant arguments, which
may involve calls to other macros, and set the Skipping variable when the
value is false. The Skipping
variable tells the preprocessor when to skip source code. Since these #if
forms can be nested, they each increment the IfLevel variable and use it to
set the Skipping variable. This is the logic I borrowed from the
aforementioned p.
The #if and #elif directive functions call MacroExpression, which is a
recursive-descent parsing algorithm that evaluates constant expressions. I'll
be discussing expression evaluation in a later column. For now, it is enough
to know that MacroExpression returns a false value if the argument expression
evaluates to 0, or returns a true value otherwise.
The #else and #endif directives manage the Skipping value based on the current
IfLevel setting. These variables have the following meaning: If the Skipping
variable is greater than 0, the preprocessor ignores all source-code lines
except those that have compile-time conditional directives. While the IfLevel
variable is greater than zero, the program is within one or more levels of
nested #ifs and #elses. Every #if form increments IfLevel and, if Skipping is
not set and the argument's value is false, sets Skipping to the IfLevel value.
For the #endif, #else, or #elif directives to be valid, the IfLevel variable
must be greater than 0. #endif decrements the IfLevel variable. If the IfLevel
variable is greater than 0 at the end of the preprocessing stage, there is an
unterminated #if macro form somewhere in the source code.
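One plausible reconstruction of this bookkeeping, using the article's variable names as assumptions, looks like this. Recording the level at which skipping began means deeper #ifs inside a skipped region cannot accidentally turn output back on:

```c
// Nested compile-time-conditional state, as the column describes it:
// Skipping > 0 means lines are being ignored; IfLevel counts nesting.
static int IfLevel, Skipping;

static void do_if(int cond)     /* #if / #ifdef / #ifndef */
{
    ++IfLevel;
    if (!Skipping && !cond)
        Skipping = IfLevel;     /* skip until the matching #else/#endif */
}

static void do_else(void)       /* #else */
{
    if (Skipping == IfLevel)
        Skipping = 0;           /* false branch ends: resume output */
    else if (!Skipping)
        Skipping = IfLevel;     /* true branch ends: skip the rest */
}

static void do_endif(void)      /* #endif */
{
    if (Skipping == IfLevel)
        Skipping = 0;
    --IfLevel;
}
```

A nonzero IfLevel after the last line, as the column notes, signals an unterminated #if somewhere in the source.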


Code Output


If the first character in the source-code line was not a pound sign, and the
program is not skipping source lines because of a compile-time conditional,
the function calls the OutputLine function to process a source-code line.
Every identifier on a source line is looked up in the table of #define
macros to see if it is a macro. Every nonidentifier--operators, constants,
literals, and so on--is passed to the preprocessed output. The
OutputLine function inserts the file/line-number token comments and strips
white space and comments from the input.


Resolving Macros


To convert identifiers, the OutputLine function calls the ResolveMacro
function, which writes its result into the string pointed to by its first
argument. The result is either the identifier itself when it is not a macro
invocation, or the resolution of the macro. Resolving macros is a recursive
operation, because macros often call other macros. The ResolveMacro function
is a part of the code that evaluates expressions.


Quincy Error Checking


Quincy does some of its error checking during code compilation and some during
run time. This reflects its interactive interpreter status. I could go
overboard and turn Quincy's dialect of C into a strong run-time type and
bounds-checking language, but that would belie Quincy's role as a C
interpreter. The original Quincy allowed you to use full expressions to
initialize global variables, for example. That was easy to do because
everything was interpreted at run time. That does not, however, reflect the
way C works, and so, even though it added work to change the behavior, the new
Quincy emulates the compiled C program when it interprets the source code.
Error checking stops the compiling or interpretation of the program at the
first error and returns control of the IDE to the editor, with the cursor on
the offending source-code line and an appropriate error message displayed. If the
error is in an #include file, the error message names the file and the line
number where the error was found. Since #include files may contain executable
code--some of the standard header files do--these errors, too, can occur
during translation or run time.
The programmer sees no difference between compiling and run time. When you
tell Quincy to run or step through a program, it runs the preprocessor, the
lexical scanner, the translator, and then begins interpreting.


Subsets


Looking at Quincy's subset of C, in both the interpreter and the preprocessor,
I find it reflects the ways I use the C language. For example, last month I
said that Quincy does not support the typedef operator. It does now. I kept
missing it.
A notable exception to that rule is the goto statement. I never use it in a
program, but I put support for it into Quincy. The original Quincy did not
support goto because of the way the interpreter constructed and destroyed
local variables. goto would have been hard to implement. The new interpreter
uses different logic for local variables, and goto is relatively easy to
accommodate. Rather than force my view of goto on students and other teachers,
I decided to include it and let them decide for themselves.
The only reason Quincy does not yet support multidimensional arrays is that
the code necessary to parse and process their initializers is hard to fit into
the program. Even though most of the existing program is gone, the underlying
structure of the interpreter is the same, and I keep running into walls I have
to tear down in order to add something. It bothers me that the feature is
missing, however, and I intend to put it in.
If you find yourself wanting a particular feature, let me know. Remember
Quincy's purpose, though, which is to help students teach themselves C.
Whether or not I add a feature depends on how difficult it is and how relevant
it is to learning C at the primary level. For instance, I probably won't put
#pragmas in.
C is not an easy language to interpret. It has some nutty constructs. There
are comma-separated declarators, with and without initializers; initializers
that must be constants under some circumstances and may be full expressions
under others; auto-increment and decrement operators on either side of a
variable identifier; an incestuous relationship between pointers and arrays;
and so on. Don't misunderstand me. As a programmer, I like using those
features in C and C++. But parsing and interpreting them are something else
again. The compiler builders have my respect. Doing a translator by hand makes
you appreciate why they came up with tools such as LEX and YACC to make the
job easier.


Quincy's Influence on D-Flat


Using D-Flat as the user interface for Quincy was a natural choice.
Practically everything I needed was already there, and of course, there was no
learning curve. I did, however, find some things about D-Flat I wanted to
change as a direct result of using Quincy.
The first area to improve was the editor. For years D-Flat users beat me up
for not having an editor that expands and collapses tabs. My answer was always
that D-Flat provides a basic edit-box class. If you need more than that, use
the window-class derivation technique to build one. Well, finally, I needed
one for Quincy, so I built the Editor class specifically for that purpose. You
can stop beating me up now.
The second area was the Help system. To begin with, there has been an
insidious bug in the Help system for a while. For some reason, it would crash
an application upon exit to DOS if you did a lot of navigating around the help
database using the hypertext links. I always suspected a heap problem but
could never get the program to crash consistently enough to find it. Quincy
relies heavily on the Help system in its tutorials. I had to fix that bug. I
tore apart all of the hypertext stuff and overhauled it to not use the heap so
flamboyantly. The bug seems to have gone away.
Next was the size of the Help database. D-Flat loads the database by reading
all of the text and building an internal table of help windows. Quincy's
database is going to be big. It was taking a long time on slower machines just
to start the program. I modified D-Flat's program that compresses the help
file to build the table and add it at the end of the file. Now D-Flat
applications load much faster regardless of the help database size.
I never liked the D-Flat File Open and Save As common dialogs. I designed them
according to the CUA spec. When I built D-Flat++, I improved the design to
look more like those in Windows 3.1. Before I started Quincy, I decided to
port the improved design to D-Flat.
The last change was to accommodate the tutorial. Not all Quincy users will
need or want it, so I built it as a second Help database. I had to modify
D-Flat to allow an application to switch between Help databases.
As a result of these changes, you need D-Flat version 18 or later to build
Quincy.


Why Not D-Flat++?


You might wonder why Quincy uses D-Flat rather than D-Flat++. Sometimes I ask
myself the same question. First, Quincy is a C program. Converting it to C++
would have added work. In retrospect, I can see that it might have saved some
work, too, but that's another story. Second, D-Flat has more features than
D-Flat++, most notably the hypertext Help system, which is central to the
tutorial. Porting that feature to D-Flat++ would have been a sizeable job.
Finally, Quincy is a C interpreter. Something said to me that writing a C
interpreter in C++ was backwards, kind of like going to a hog-calling contest
in a Lexus. It just didn't sit right.


C Programming Source Code



Quincy, D-Flat, and D-Flat++ are available to download from the DDJ Forum on
CompuServe and on the Internet by anonymous ftp. See page 3 for details. If
you cannot get to one of the online sources, send a diskette and a stamped,
self-addressed mailer to me at Dr. Dobb's Journal, 411 Borel, San Mateo, CA
94402. I'll send you a copy of the source code. It's free, but if you want to
support the Careware program, include a dollar for the Brevard County Food
Bank. They help hungry and homeless citizens.
[LISTING ONE] (Text begins on page 111.)

/* -------- preproc.c -------- */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <dos.h>
#include <sys\stat.h>
#include "qnc.h"
#include "preproc.h"

static MACRO *FirstMacro;
int MacroCount;

/* --- #included source code files --- */
typedef struct SourceFile {
 unsigned char *fname;
 struct SourceFile *NextFile;
} SRCFILE;
static SRCFILE *FirstFile;
static SRCFILE *LastFile;
static SRCFILE *ThisFile;
static unsigned char FileCount;

static int Skipping;
static int IfLevel;
static unsigned char *Line;
static unsigned char *Word;
static unsigned char *FilePath;
static unsigned char *Ip, *Op;
static unsigned char *IncludeIp;

/* ------ local function prototypes ------- */
static void FreeBuffers(void);
static void PreProcess(void);
static void OutputLine(void);
static void DefineMacro(unsigned char*);
static void Include(unsigned char*);
static void UnDefineMacro(unsigned char*);
static void If(unsigned char *);
static void Elif(unsigned char *);
static void IfDef(unsigned char *);
static void IfnDef(unsigned char *);
static void Else(void);
static void Endif(void);

static void UnDefineAllMacros(void);
static int ReadString(void);
static void WriteChar(unsigned char);
static void WriteWord(unsigned char*);

/* --- preprocess code in SourceCode into pSrc --- */
void PreProcessor(unsigned char *pSrc,unsigned char *SourceCode)
{
 Op = pSrc;
 Ip = SourceCode;

 Ctx.CurrFileno = 0;
 Ctx.CurrLineno = 0;
 IfLevel = 0;
 Skipping = 0;
 Word = getmem(MAXMACROLENGTH);
 FilePath = getmem(128);
 PreProcess();
 if (IfLevel)
 error(IFSERR);
 FreeBuffers();
}
/* --- delete all preprocessor heap usage on error --- */
void CleanUpPreProcessor(void)
{
 FreeBuffers();
 DeleteFileList();
}
/* ---- free heap buffers used by preprocessor ---- */
static void FreeBuffers(void)
{
 UnDefineAllMacros();
 free(IncludeIp);
 free(Line);
 free(FilePath);
 free(Word);
 IncludeIp = NULL;
 FilePath = NULL;
 Word = NULL;
 Line = NULL;
}
/* ---- bypass source code white space ---- */
void bypassWhite(unsigned char **cp)
{
 while (isSpace(**cp))
 (*cp)++;
}
/* ---- extract a word from input --- */
void ExtractWord(unsigned char *wd, unsigned char **cp, unsigned char
*allowed)
{
 while (**cp) {
 if (isalnum(**cp) || strchr(allowed, **cp))
 *wd++ = *((*cp)++);
 else
 break;
 }
 *wd = '\0';
}
/* ---- internal preprocess entry point ---- */
static void PreProcess()
{
 unsigned char *cp;
 while (ReadString() != 0) {
 if (Line[strlen(Line)-1] != '\n')
 error(LINETOOLONGERR);
 cp = Line;
 bypassWhite(&cp);
 if (*cp != '#') {
 if (!Skipping)
 OutputLine();

 continue;
 }
 cp++;
 /* --- this line is a preprocessing token --- */
 bypassWhite(&cp);
 ExtractWord(Word, &cp, "");
 switch (FindPreProcessor(Word)) {
 case P_DEFINE:
 if (!Skipping)
 DefineMacro(cp);
 break;
 case P_ELSE:
 Else();
 break;
 case P_ELIF:
 Elif(cp);
 break;
 case P_ENDIF:
 Endif();
 break;
 case P_IF:
 If(cp);
 break;
 case P_IFDEF:
 IfDef(cp);
 break;
 case P_IFNDEF:
 IfnDef(cp);
 break;
 case P_INCLUDE:
 if (!Skipping)
 Include(cp);
 break;
 case P_UNDEF:
 if (!Skipping)
 UnDefineMacro(cp);
 break;
 default:
 error(BADPREPROCERR);
 break;
 }
 }
}
/* ----- find a macro that is already #defined ----- */
MACRO *FindMacro(unsigned char *ident)
{
 MACRO *ThisMacro = FirstMacro;
 while (ThisMacro != NULL) {
 if (strcmp(ident, ThisMacro->id) == 0)
 return ThisMacro;
 ThisMacro = ThisMacro->NextMacro;
 }
 return NULL;
}
/* ----- compare macro parameter values ---- */
static int parmcmp(char *p, char *t)
{
 char tt[80];
 char *tp = tt;

 while (alphanum(*t))
 *tp++ = *t++;
 *tp = '\0';
 return strcmp(p, tt);
}
/* ---- add a newly #defined macro to the table ---- */
static void AddMacro(unsigned char *ident,unsigned char *plist,
 unsigned char *value)
{
 char *prms[MAXPARMS];

 MACRO *ThisMacro = getmem(sizeof(MACRO));
 ThisMacro->id = getmem(strlen(ident)+1);
 strcpy(ThisMacro->id, ident);
 /* ---- find and count parameters ---- */
 if (plist) {
 /* ---- there are parameters ---- */
 ThisMacro->isMacro = 1;
 plist++;
 while (*plist != ')') {
 while (isspace(*plist))
 plist++;
 if (alphanum(*plist)) {
 if (ThisMacro->parms == MAXPARMS)
 error(DEFINERR);
 prms[ThisMacro->parms++] = plist;
 while (alphanum(*plist))
 plist++;
 }
 while (isspace(*plist))
 plist++;
 if (*plist == ',')
 plist++;
 else if (*plist != ')')
 error(DEFINERR);
 }
 }
 /* --- build value substituting parameter numbers --- */
 if (value != NULL) {
 /* ---- there is a value ---- */
 ThisMacro->val =
 getmem(strlen(value)+1+ThisMacro->parms);
 if (ThisMacro->parms) {
 char *pp = ThisMacro->val;
 while (*value) {
 if (alphanum(*value)) {
 int p = 0;
 ExtractWord(Word, &value, "_");
 while (p < ThisMacro->parms) {
 if (parmcmp(Word, prms[p]) == 0) {
 sprintf(pp, "#%d", p);
 pp += 2;
 break;
 }
 p++;
 }
 if (p == ThisMacro->parms) {
 strcpy(pp, Word);
 pp += strlen(Word);

 }
 }
 else
 *pp++ = *value++;
 }
 *pp = '\0';
 }
 else
 /* --- no parameters, straight substitution --- */
 strcpy(ThisMacro->val, value);
 }
 ThisMacro->NextMacro = FirstMacro;
 FirstMacro = ThisMacro;
 MacroCount++;
}
/* ----- #define a new macro ----- */
static void DefineMacro(unsigned char *cp)
{
 unsigned char *vp = NULL, *vp1;
 unsigned char *lp = NULL;
 bypassWhite(&cp);
 ExtractWord(Word, &cp, "_");
 if (FindMacro(Word) != NULL)
 error(REDEFPPERR); /* --- already defined --- */
 /* ---- extract parameter list ---- */
 if (*cp == '(') {
 lp = cp;
 while (*cp && *cp != ')' && *cp != '\n')
 cp++;
 if (*cp++ != ')')
 error(DEFINERR);
 }
 bypassWhite(&cp);
 /* ---- extract parameter definition ---- */
 if (*cp)
 vp = getmem(strlen(cp)+1);
 vp1 = vp;
 while (*cp && *cp != '\n') {
 char *cp1 = cp;
 while (*cp && *cp != '\n')
 cp++;
 --cp;
 while (isSpace(*cp))
 --cp;
 cp++;
 strncpy(vp1, cp1, cp-cp1);
 vp1[cp-cp1] = '\0';
 vp1 = vp + strlen(vp)-1;
 if (*vp1 != '\\')
 break;
 ReadString();
 cp = Line;
 bypassWhite(&cp);
 vp = realloc(vp, strlen(vp)+strlen(cp)+1);
 if (vp == NULL)
 error(OMERR);
 vp1 = vp + strlen(vp)-1;
 }
 if (strcmp(Word, vp))
 AddMacro(Word, lp, vp);
 free(vp);
}
/* ----- remove all macros ------ */
static void UnDefineAllMacros(void)
{
 MACRO *ThisMacro = FirstMacro;
 while (ThisMacro != NULL) {
 MACRO *tm = ThisMacro;
 free(ThisMacro->val);
 free(ThisMacro->id);
 ThisMacro = ThisMacro->NextMacro;
 free(tm);
 }
 FirstMacro = NULL;
 MacroCount = 0;
}
/* ------ #undef a macro ------- */
static void UnDefineMacro(unsigned char *cp)
{
 MACRO *ThisMacro;
 bypassWhite(&cp);
 ExtractWord(Word, &cp, "_");
 if ((ThisMacro = FindMacro(Word)) != NULL) {
 if (ThisMacro == FirstMacro)
 FirstMacro = ThisMacro->NextMacro;
 else {
 MACRO *tm = FirstMacro;
 while (tm != NULL) {
 if (ThisMacro == tm->NextMacro) {
 tm->NextMacro = ThisMacro->NextMacro;
 break;
 }
 tm = tm->NextMacro;
 }
 }
 free(ThisMacro->val);
 free(ThisMacro->id);
 free(ThisMacro);
 --MacroCount;
 }
}
/* ------ #include a source code file ------ */
static void Include(unsigned char *cp)
{
 FILE *fp;
 int LocalInclude;
 int holdcount;
 unsigned char holdfileno;
 SRCFILE *holdfile;
 unsigned char *holdip;
 struct stat sb;

 holdfile = ThisFile;
 *FilePath = '\0';
 bypassWhite(&cp);
 /* ---- test for #include <file> or #include "file" ---- */
 if (*cp == ")
 LocalInclude = 1;

 else if (*cp == <')
 LocalInclude = 0;
 else
 error(BADPREPROCERR);
 cp++;
 /* ---- extract the file name ---- */
 ExtractWord(Word, &cp, ".$_\\");
 if (*cp != (LocalInclude ? '"' : '>'))
 error(BADPREPROCERR);
 /* ---- build path to included file ---- */
 if (!LocalInclude) {
 unsigned char *pp;
 strcpy(FilePath, _argv[0]);
 pp = strrchr(FilePath, '\\');
 if (pp != NULL)
 *(pp+1) = '\0';
 }
 strcat(FilePath, Word);
 /* --- test to see if the file was already included --- */
 ThisFile = FirstFile;
 while (ThisFile != NULL) {
 if (stricmp(Word, ThisFile->fname) == 0)
 return;
 ThisFile = ThisFile->NextFile;
 }
 /* ---- add to list of included files --- */
 ThisFile = getmem(sizeof(SRCFILE));
 ThisFile->fname = getmem(strlen(Word)+1);
 strcpy(ThisFile->fname, Word);
 if (LastFile != NULL)
 LastFile->NextFile = ThisFile;
 ThisFile->NextFile = NULL;
 LastFile = ThisFile;
 if (FirstFile == NULL)
 FirstFile = ThisFile;
 /* ----- get file size ----- */
 stat(FilePath, &sb);
 /* - save context of file currently being preprocessed - */
 holdip = Ip;
 holdcount = Ctx.CurrLineno;
 holdfileno = Ctx.CurrFileno;
 /* --- file/line numbers for #included file --- */
 Ctx.CurrFileno = ++FileCount;
 Ctx.CurrLineno = 0;
 /* -------- open the #included file ------ */
 if ((fp = fopen(FilePath, "rt")) == NULL)
 error(INCLUDEERR);
 /* ---- allocate a buffer and read it in ---- */
 Ip = IncludeIp = getmem(sb.st_size+1);
 fread(Ip, sb.st_size, 1, fp);
 fclose(fp);
 /* ----- preprocess the #included file ------ */
 PreProcess();
 free(Ip);
 IncludeIp = NULL;
 /* restore context of file previously being preprocessed */
 Ctx.CurrFileno = holdfileno;
 Ctx.CurrLineno = holdcount;
 Ip = holdip;

 ThisFile = holdfile;
}
/* ---- delete files from the file list ---- */
void DeleteFileList(void)
{
 ThisFile = FirstFile;
 while (ThisFile != NULL) {
 SRCFILE *sf = ThisFile;
 free(ThisFile->fname);
 ThisFile = ThisFile->NextFile;
 free(sf);
 }
 FirstFile = LastFile = NULL;
 FileCount = 0;
}
/* -------- #if preprocessing token -------- */
static void If(unsigned char *cp)
{
 IfLevel++;
 if (!Skipping) {
 if (MacroExpression(&cp) == 0)
 Skipping = IfLevel;
 }
}
/* -------- #ifdef preprocessing token -------- */
static void IfDef(unsigned char *cp)
{
 IfLevel++;
 if (!Skipping) {
 bypassWhite(&cp);
 ExtractWord(Word, &cp, "_");
 if (FindMacro(Word) == NULL)
 Skipping = IfLevel;
 }
}
/* -------- #ifndef preprocessing token -------- */
static void IfnDef(unsigned char *cp)
{
 IfLevel++;
 if (!Skipping) {
 bypassWhite(&cp);
 ExtractWord(Word, &cp, "_");
 if (FindMacro(Word) != NULL)
 Skipping = IfLevel;
 }
}
/* -------- #else preprocessing token -------- */
static void Else()
{
 if (!Skipping && IfLevel == 0)
 error(ELSEERR);
 if (Skipping == IfLevel)
 Skipping = 0;
 else if (Skipping == 0)
 Skipping = IfLevel;
}
/* -------- #elif preprocessing token -------- */
static void Elif(unsigned char *cp)
{

 if (IfLevel == 0)
 error(ELIFERR);
 if (Skipping == IfLevel)
 Skipping = (MacroExpression(&cp) == 0);
}
/* -------- #endif preprocessing token -------- */
static void Endif()
{
 if (!Skipping && IfLevel == 0)
 error(ENDIFERR);
 if (Skipping == IfLevel)
 Skipping = 0;
 --IfLevel;
}
/* ----- write a preprocessed line to output ----- */
static void OutputLine()
{
 unsigned char *cp = Line;
 unsigned char lastcp = 0;
 while (isSpace(*cp))
 cp++;
 if (*cp != '\n') {
 char eol[20];
 sprintf(eol, "\n/*%d:%d*/", Ctx.CurrFileno, Ctx.CurrLineno);
 WriteWord(eol);
 }
 while (*cp && *cp != '\n') {
 if (isSpace(*cp)) {
 while (isSpace(*cp))
 cp++;
 if (alphanum(*cp) && alphanum(lastcp))
 WriteChar(' ');
 }
 if (alphanum(*cp)) {
 ResolveMacro(Word, &cp);
 WriteWord(Word);
 lastcp = 'x';
 continue;
 }
 if (*cp == '/' && *(cp+1) == '*') {
 int inComment = 1;
 cp += 2;
 while (inComment) {
 while (*cp && *cp != '\n') {
 if (*cp == '*' && *(cp+1) == '/') {
 cp += 2;
 inComment = 0;
 break;
 }
 cp++;
 }
 if (inComment) {
 lastcp = ' ';
 if (ReadString() == 0)
 error(UNTERMCOMMENT);
 cp = Line;
 }
 }
 continue;

 }
 else if (*cp == ") {
 WriteChar(*cp++);
 while (*cp != ") {
 if (*cp == \n' *cp == \0')
 error(UNTERMSTRERR);
 WriteChar(*cp++);
 }
 }
 lastcp = *cp++;
 WriteChar(lastcp);
 }
}
/* ----- write single character to output ---- */
static void WriteChar(unsigned char c)
{
 *Op++ = c;
}
/* ----- write a null-terminated word to output ----- */
static void WriteWord(unsigned char *s)
{
 int lastch = 0;
 while (*s) {
 if (*s == ") {
 /* --- the word has a string literal --- */
 do
 WriteChar(*s++);
 while (*s && *s != ");
 if (*s)
 WriteChar(*s++);
 continue;
 }
 if (isSpace(*s)) {
 /* --- white space --- */
 while (isSpace(*s))
 s++;
 /* --- insert one if char literal or id id --- */
 if (lastch == '\'' ||
 (alphanum(lastch) && alphanum(*s)))
 WriteChar(' ');
 }
 lastch = *s;
 WriteChar(*s++);
 }
}
/* ------ read a line from input ---- */
static int ReadString()
{
 unsigned char *lp;
 Ctx.CurrLineno++;
 if (*Ip) {
 int len;
 /* --- compute the line length --- */
 lp = strchr(Ip, '\n');
 if (lp != NULL)
 len = lp - Ip + 2;
 else
 len = strlen(Ip)+1;
 if (len) {

 free(Line);
 Line = getmem(len);
 lp = Line;
 while ((*lp++ = *Ip++) != '\n')
 if (*(lp-1) == '\0')
 break;
 if (*(lp-1) == '\n')
 *lp = '\0';
 return 1;
 }
 }
 return 0;
}
/* ----- find file name from file number ---- */
char *SrcFileName(int fileno)
{
 ThisFile = FirstFile;
 while (ThisFile != NULL && --fileno)
 ThisFile = ThisFile->NextFile;
 return ThisFile ? ThisFile->fname : NULL;
}

[LISTING TWO]

/* ------- preproc.h -------- */

#ifndef PREPROC_H
#define PREPROC_H

/* ---- #define macro table ---- */
typedef struct MacroTbl {
 unsigned char *id; /* macro identification */
 unsigned char *val; /* macro value */
 int isMacro; /* true if () macro */
 unsigned char parms; /* number of parameters */
 struct MacroTbl *NextMacro;
} MACRO;

extern int MacroCount;

#define isSpace(c) \
 (c == ' ' || c == '\t' || c == '\t'+128 || c == '\f'+128)
#define MAXMACROLENGTH 2048

int MacroExpression(unsigned char **);
int ResolveMacro(unsigned char *, unsigned char **);
MACRO *FindMacro(unsigned char*);
void ExtractWord(unsigned char *, unsigned char **, unsigned char *);
#endif

End Listings











June, 1994
ALGORITHM ALLEY


Fractal Rulers




Tom Swan


With the growing popularity of GUIs, on-screen rulers are becoming standard
elements in word processing, drawing, and other software. But displaying a
true-to-life image of a ruler on a graphics screen (see Figure 1) is not as
simple as it may seem. The key to discovering an efficient algorithm is to
realize that a ruler's markings are a kind of fractal--they exhibit the
self-similar characteristics of fractal geometry in which any section of an
image looks like any other regardless of magnification. A ruler's markings are
literally scale-symmetric, and drawing them on a computer screen naturally
leads to a recursive algorithm with many practical applications.
A couple of years ago, I wrote a function to display a ruler in a Windows
program for Borland's ObjectWindows (OWL) C++ and Pascal videos. I wasn't
entirely happy with that code, so, when revising the program for OWL 2.0, I
decided to upgrade the ruler-display procedure. I began with a recursive
implementation similar to one in Robert Sedgewick's Algorithms in C++. I
improved that code, then, using recursion-removal techniques, I wrote several
new versions, finally arriving at a nonrecursive method that can easily be
adapted to any programming language. The set of functions also demonstrates
valuable techniques for removing recursion, which you might need, for example,
when implementing recursive algorithms in a nonrecursive programming language
such as assembly.


Recursive Rulers


Example 1 is my improved recursive algorithm in Pascal for drawing a ruler's
markings. The method differs from Sedgewick's in two ways. First, I use a
Level parameter, which, when equal to 0, ends the procedure's recursions. (The
original algorithm stops calling itself when the marker height equals 0,
making it difficult to develop a program that can display different levels of
markings of the same relative sizes.) Second, I throw an exception if the
marker height H becomes 0 or less before the procedure reaches its deepest
recursion. In place of Throw, you could instead halt the program or call an
Error function. To use the procedure, you need to define several global
values--a constant smallMarkerSize equal to the largest size of a ruler's
submarkings, another constant smallMarkerIncr equal to the difference in size
between submarkers, a variable Top for the ruler's top-border coordinate, and
a Line (X1, Y1, X2, Y2) function. The method works by repeatedly finding the
midpoint between L and R, drawing a line at that point, then calling itself
recursively one time each for the left- and right-hand subdivisions--an
approach that closely resembles recursive divide-and-conquer algorithms for
tree traversal, sorting, and other well-known programming techniques.


Removing Recursion


Although recursion usually makes algorithms easier to understand, it isn't
always desirable. A procedure that calls itself wastes stack space and can
slow performance by reexecuting entry and exit code added by the compiler.
Removing recursion, however, is a painstaking chore that requires the utmost
care. To master the process, it helps to follow the step-by-step stages of an
algorithm such as Ruler in its transformation from a recursive procedure to a
nonrecursive one.
Example 2 shows the first such stage, which makes only one recursive call. I
removed the second recursive statement from Example 1 by using a technique
called "end-recursion removal." In general, when a recursive procedure's last
statement calls that procedure, you can replace the statement by performing
two simple operations: assigning the recursive statement's argument values to
the procedure's parameters and using a Goto to restart the procedure from the
top. In fact, a smart optimizing compiler should be able to remove end
recursions automatically. After all, the compiler already generates code that
pushes parameters onto the stack and calls the subroutine--it could just as
easily reassign those values and jump to the subroutine's beginning. I don't
know why more compilers don't perform end-recursion removal, but every
compiler should offer this optimization.
Goto statements make me uneasy, but they are easily replaced with structured
loops. As in Example 2, the Goto-less procedure in Example 3 removes the
original's end recursion but uses a While loop in place of Goto. If you can
derive the structured code directly from the original, go ahead, but I find it
easier to waddle through the intermediate step. Removing recursions is one of
the few practical uses I've found for the notorious Goto.
Removing a recursive call from the middle of a procedure is much more
difficult than removing an end recursion. The trick is to think like a
compiler, which generates code to push variables onto the stack before
recursively calling the subroutine. In this case, however, you can't simply
assign a few values and execute a Goto. You also have to add code to handle
the processes that would occur after the original procedure's recursions
unwind. Usually, the best way to write that code is to push parameters and
local variables onto a stack variable. (You can implement the stack however
you wish--as an array, for example, or a list.) Upon reaching the innermost
loop--analogous to reaching the final recursion--pop the saved parameters and
variables from the stack and restart the function. That description may seem
simple enough, but writing the code to remove an inner recursion can be
exceedingly difficult.
Again, I find it easier to accomplish the task by using unstructured code.
Starting with Example 2, for instance, I derived Example 4, in which Gotos
replace all recursive function calls. The code may seem highly obscure at this
stage. I added a third Goto label, RulerRestart, which marks the processes
that the original version executes after each recursion unwinds. If the stack
is not empty at the end of the procedure, a Goto statement jumps to
RulerRestart, which pops the stack and restarts the procedure from its
beginning--exactly what the compiler-generated code does in the original
recursive design. For illustration, I also inserted obviously unnecessary
statements such as the assignment of L to L, simulating the code a compiler
generates for pushing variables onto the stack.
All Gotos and no Whiles is about as much fun as all work and no play, so let's
get rid of all those ugly labels and jumps. Example 5 replaces the Gotos in
Example 4 with structured While and If statements for a fully nonrecursive
version of the original procedure. Here again, I left in various
inefficiencies, simulating the code a compiler might generate. As a general
rule, when removing recursion, it's best to postpone optimizations for the
final result. Inefficiencies don't matter at this stage. The new procedure in
Example 5 tests whether the stack is empty at the procedure's beginning, so
the first statement pushes parameters onto the stack before the While loop
begins. Because local variable M isn't initialized at this point, the
statement pushes a literal 0 in its place.
The nonrecursive, Goto-less code is nearing its final stage, and at this
point, a useful trick for optimizing the procedure is to examine Push and Pop
statements in the hope of weeding out null operations. For instance, Example 5
pushes L onto the stack, then later pops L, but immediately assigns it the
value of M. Obviously, an improved procedure can simply push M and delete the
assignment. Similarly, the procedure pushes and pops Level, but decreases it
by one after each such operation. Instead, the code may as well first decrease
Level, push that value, and delete the second subtraction. Parameter H is also
reduced in two places by smallMarkerIncr, and the value can therefore be
reduced before it is pushed. By making these optimizations, you can delete the
three assignments in the If statement, leaving a Pop statement followed by a
Push of the same values--another obvious null operation that you also can
remove. In fact, the entire If statement isn't needed at all! Deleting other
unnecessary statements such as the assignment of L to L leads to the final
nonrecursive, optimized Ruler procedure in Example 6.
The final procedure lacks the intuitive simplicity of the original in Example
1, but the on-screen results are the same. I didn't profile any of the code
listed here, so I can't say whether the nonrecursive version runs faster, but
removing recursion usually produces a speed boost. If anyone cares to analyze
the procedures, I'll list the results in a future "Algorithm Alley."


Listings


Listings One through Four (page 148) implement Examples 1 through 6 in C++ for
Windows 3.1 or Windows NT. The final program displays the window in Figure 1,
and it also shows miscellaneous steps for drawing the ruler's outline,
labeling inch markings, and so on. I included each intermediate stage of the
Ruler algorithm as comments so you can compare them. The program requires
Borland C++ 4.0 with ObjectWindows 2.0. If you have the files on disk, open
the .IDE project file and compile. You might have to adjust directory
pathnames. If you are typing the listings, create a 16- or 32-bit Windows
project for RULER.CPP, RULER.RC, and RULER.DEF.
When you run the program, you'll notice that the ruler is not drawn to scale.
I ignored that problem to keep the algorithms as general as possible, but you
can easily produce a real-size ruler by passing MM_LOENGLISH to the Windows
SetMapMode function in reference to a display context. If you make that
change, you also have to negate y coordinates because, with a mapping mode in
effect, coordinate 0,0 is located at lower left rather than its normal
pixel-based position at upper left. The printed ruler should be close to real
size, but because Windows expands displayed graphics to keep text readable,
the on-screen image still appears larger than life. If you need a foot-long
ruler that actually measures 12 inches, you'll have to perform your own
display mapping--but that's a subject for another time.


Your Turn


Next month, more algorithms and techniques in Pascal and C++. Meanwhile, send
your favorite algorithms, tools, and comments to me in care of DDJ.
 Figure 1: On-screen ruler.

Example 1: Pascal code for Algorithm #20: Ruler (recursive version).
procedure Ruler(L, R, H, Level: Integer);
var
 M: Integer;
begin
 if H <= 0 then
 Throw('Levels incomplete');
 if Level > 0 then
 begin
 M := (L + R) DIV 2;

 Line(M, Top, M, Top + H);
 Ruler(L, M, H - smallMarkerIncr, Level - 1);
 Ruler(M, R, H - smallMarkerIncr, Level - 1);
 end;
end;


Example 2: Pascal code for Algorithm #20: Ruler (end recursion removed using
Goto).
procedure Ruler(L, R, H, Level: Integer);
label
 RulerStart, RulerEnd;
var
 M: Integer;
begin
RulerStart:
 if H <= 0 then
 Throw('Levels incomplete');
 if Level <= 0 then
 goto RulerEnd;
 M := (L + R) DIV 2;
 Line(M, Top, M, Top + H);
 Ruler(L, M, H - smallMarkerIncr, Level - 1);
 L := M;
 H := H - smallMarkerIncr;
 Level := Level - 1;
 goto RulerStart;
RulerEnd:
end;

Example 3: Pascal code for Algorithm #20: Ruler (end recursion removed using
While).
procedure Ruler(L, R, H, Level: Integer);
var
 M: Integer;
begin
 while Level > 0 do
 begin
 if H <= 0 then
 Throw('Levels incomplete');
 M := (L + R) DIV 2;
 Line(M, Top, M, Top + H);
 Ruler(L, M, H - smallMarkerIncr, Level - 1);
 L := M;
 Level := Level - 1;
 H := H - smallMarkerIncr;
 end;
end;

Example 4: Pascal code for Algorithm #20: Ruler (all recursion removed using
Goto).
procedure Ruler(L, R, H, Level: Integer);
label
 RulerStart, RulerRestart, RulerEnd;
var
 M: Integer;
begin
RulerStart:
 if Level = 0 then
 goto RulerEnd;
 if H <= 0 then
 Throw('Levels incomplete');

 M := (L + R) DIV 2;
 Line(M, Top, M, Top + H);
 Push(L, R, M, H, Level);
 L := L;
 R := M;
 H := H - smallMarkerIncr;
 Level := Level - 1;
 goto RulerStart;
RulerRestart:
 Pop(L, R, M, H, Level);
 L := M;
 Level := Level - 1;
 H := H - smallMarkerIncr;
 goto RulerStart;
RulerEnd:
 if not StackEmpty then
 goto RulerRestart;
end;


Example 5: Pascal code for Algorithm #20: Ruler (all recursion removed using
While and If).
procedure Ruler(L, R, H, Level: Integer);
var
 M: Integer;
begin
 Push(L, R, 0, H, Level);
 while not StackEmpty do
 begin
 Pop(L, R, M, H, Level);
 while Level > 0 do
 begin
 if H <= 0 then
 Throw('Levels incomplete');
 M := (L + R) DIV 2;
 Line(M, Top, M, Top + H);
 Push(L, R, M, H, Level);
 L := L;
 R := M;
 H := H - smallMarkerIncr;
 Level := Level - 1;
 end;
 if not StackEmpty then
 begin
 Pop(L, R, M, H, Level);
 L := M;
 Level := Level - 1;
 H := H - smallMarkerIncr;
 Push(L, R, M, H, Level);
 end;
 end;
end;




Example 6: Pascal code for Algorithm #20: Ruler (final optimized version).
procedure Ruler(L, R, H, Level: Integer);
var
 M: Integer;

begin
 Push(L, R, 0, H, Level);
 while not StackEmpty do
 begin
 Pop(L, R, M, H, Level);
 while Level > 0 do
 begin
 if H <= 0 then
 Throw('Levels incomplete');
 M := (L + R) DIV 2;
 Line(M, Top, M, Top + H);
 H := H - smallMarkerIncr;
 Level := Level - 1;
 Push(M, R, M, H, Level);
 R := M;
 end;
 end;
end;

[LISTING ONE] (Text begins on page 117.)

EXETYPE WINDOWS
CODE PRELOAD MOVEABLE DISCARDABLE
DATA PRELOAD MOVEABLE MULTIPLE
HEAPSIZE 4096
STACKSIZE 5120

[LISTING TWO]

// ruler.rh -- Resource header file
#define ID_MENU 100

[LISTING THREE]

#include <owl\window.rh>
#include "ruler.rh"

ID_MENU MENU
BEGIN
 POPUP "&Demo"
 BEGIN
 MENUITEM "E&xit", CM_EXIT
 END
END

[LISTING FOUR]

/* =========================================================== *\
** ruler.cpp -- Ruler algorithms implemented for Windows 3.1 **
** Requires Borland C++ 4.0 and ObjectWindows 2.0 **
** Copyright (c) 1994 by Tom Swan. All rights reserved. **
\* =========================================================== */
#include <owl\applicat.h>
#include <owl\framewin.h>
#include <owl\scroller.h>
#include <owl\dc.h>
#include <classlib\stacks.h>
#include <string.h>
#include <cstring.h>

#include <stdio.h>
#pragma hdrstop
#include "ruler.rh"
// === The application's main window ===
class TRulerWin: public TFrameWindow {
public:
 TRulerWin(TWindow* parent, const char far* title);
protected:
 BOOL StackEmpty()
 { return stack.IsEmpty(); }
 void Line(int x1, int y1, int x2, int y2)
 { dc->MoveTo(x1, y1); dc->LineTo(x2, y2); }
 void Rectangle(int left, int top, int right, int bottom)
 { dc->Rectangle(left, top, right, bottom); }
 void TextAt(int x, int y, const char *s)
 { dc->TextOut(x, y, s, strlen(s)); }
 void InchRuler(int xOutline, int yOutline, int numInches);
 void Paint(TDC& paintDC, BOOL erase, TRect& rect);
 void Ruler(int l, int r, int h, int level);
 void Push(int l, int r, int m, int h, int level);
 void Pop(int& l, int& r, int& m, int& h, int& level);
private:
 TDC* dc; // Device context for member functions
 int unitsPerInch; // Display scale
 int numDivisions; // Number of ruler marker divisions
 int largeMarkerSize; // Size of main markers at labels
 int smallMarkerIncr; // Size of sub marker increments
 int smallMarkerSize; // Size of largest sub marker
 int left, top, right, bottom; // Ruler outline coordinates
 TStackAsVector<int> stack; // Push-down stack
};
TRulerWin::TRulerWin(TWindow* parent, const char far* title)
 : TFrameWindow(parent, title),
 TWindow(parent, title)
{
 AssignMenu(ID_MENU);
 Attr.Style = WS_VSCROLL | WS_HSCROLL;
 Scroller = new TScroller(this, 1, 1, 2000, 2000);
 dc = 0; // Set pointer to null
 unitsPerInch = 100; // 1 pixel == 1/100 inch
 numDivisions = 4; // Recursion level (i.e. to 1/16-inch)
 smallMarkerIncr = 4; // In units per inch (i.e. 0.04 inch)
 left = top = right = bottom = 0; // Ruler coordinates
 smallMarkerSize = // Size of largest sub marker
 smallMarkerIncr + (smallMarkerIncr * numDivisions);
 largeMarkerSize = // Size of markers at digit labels
 smallMarkerSize + (smallMarkerIncr * 2);
}
// Display ruler
void
TRulerWin::InchRuler(int xOutline, int yOutline, int numInches)
{
 int i; // For-loop control variable
 int x, y; // Working coordinate variables
 char s[4]; // Holds ruler digits in text form
// Initialize and draw ruler outline
 left = xOutline;
 top = yOutline;
 right = left + (numInches * unitsPerInch);

 bottom = top + (largeMarkerSize * 3);
 Rectangle(left, top, right, bottom);
// Label main ruler markers at every inch
 y = top + largeMarkerSize;
 x = left;
 for (i = 1; i < numInches; i++) {
 x += unitsPerInch;
 Line(x, top, x, y);
 sprintf(s, "%d", i);
 TextAt(x, y, s);
 }
// Call Ruler() function to display ruler markings
 x = left;
 for (i = 0; i < numInches; i++) {
 try {
 Ruler(x, x + unitsPerInch, smallMarkerSize, numDivisions);
 }
 catch (const char *msg) {
 throw TXOwl(msg);
 }
 x += unitsPerInch;
 }
}
/* // Ruler implementation #1 (recursive version)
void
TRulerWin::Ruler(int l, int r, int h, int level)
{
 int m;
 if (h <= 0)
 throw "Levels incomplete";
 if (level > 0) {
 m = (l + r) / 2;
 Line(m, top, m, top + h);
 Ruler(l, m, h - smallMarkerIncr, level - 1);
 Ruler(m, r, h - smallMarkerIncr, level - 1);
 }
}*/
/* // Ruler implementation #2 (end-recursion removed using goto)
void
TRulerWin::Ruler(int l, int r, int h, int level)
{
 int m;
RulerStart:
 if (h <= 0)
 throw "Levels incomplete";
 if (level <= 0)
 goto RulerEnd;
 m = (l + r) / 2;
 Line(m, top, m, top + h);
 Ruler(l, m, h - smallMarkerIncr, level - 1);
 l = m;
 level--;
 h -= smallMarkerIncr;
 goto RulerStart;
RulerEnd:
}*/
/* // Ruler implementation #3 (end-recursion removed using while)
void
TRulerWin::Ruler(int l, int r, int h, int level)

{
 int m;
 while (level > 0) {
 if (h <= 0)
 throw "Levels incomplete";
 m = (l + r) / 2;
 Line(m, top, m, top + h);
 Ruler(l, m, h - smallMarkerIncr, level - 1);
 l = m;
 level--;
 h -= smallMarkerIncr;
 }
}*/
/* // Ruler implementation #4a (all-recursion removed using goto)
// Derived from implementation #2
void
TRulerWin::Ruler(int l, int r, int h, int level)
{
 int m;
RulerStart:
 if (level == 0)
 goto RulerEnd;
 if (h <= 0)
 throw "Levels incomplete";
 m = (l + r) / 2;
 Line(m, top, m, top + h);
 Push(l, r, m, h, level);
 l = l;
 r = m;
 h -= smallMarkerIncr;
 level--;
 goto RulerStart;
RulerRestart:
 Pop(l, r, m, h, level);
 l = m;
 level--;
 h -= smallMarkerIncr;
 goto RulerStart;
RulerEnd:
 if (!StackEmpty())
 goto RulerRestart;
}*/
/* // Ruler implementation #4b (all recursion removed using while and goto)
// Because this intermediate version jumps into a while loop, it may not be
// allowed in all languages. Derived from implementation #3
void
TRulerWin::Ruler(int l, int r, int h, int level)
{
 int m;
RulerStart:
 while (level > 0) {
 if (h <= 0)
 throw "Levels incomplete";
 m = (l + r) / 2;
 Line(m, top, m, top + h);
 Push(l, r, m, h, level);
 l = l;
 r = m;
 h -= smallMarkerIncr;

 level--;
 goto RulerStart;
RulerRestart:
 Pop(l, r, m, h, level);
 l = m;
 level--;
 h -= smallMarkerIncr;
 }
 if (!StackEmpty())
 goto RulerRestart;
}*/
/* // Ruler implementation #5 (structured all-recursion removed)
// Non-optimized version
void
TRulerWin::Ruler(int l, int r, int h, int level)
{
 int m;
 Push(l, r, 0, h, level); // 0 == m, which is uninitialized
 while (!StackEmpty()) {
 Pop(l, r, m, h, level);
 while (level > 0) {
 if (h <= 0)
 throw "Levels incomplete";
 m = (l + r) / 2;
 Line(m, top, m, top + h);
 Push(l, r, m, h, level);
 l = l;
 r = m;
 h -= smallMarkerIncr;
 level--;
 }
 if (!StackEmpty()) {
 Pop(l, r, m, h, level);
 l = m;
 level--;
 h -= smallMarkerIncr;
 Push(l, r, m, h, level);
 }
 }
}*/
// Ruler implementation #6 (structured all-recursion removed)
// Final optimized version
void
TRulerWin::Ruler(int l, int r, int h, int level)
{
 int m;
 Push(l, r, 0, h, level); // 0 == m, which is uninitialized
 while (!StackEmpty()) {
 Pop(l, r, m, h, level);
 while (level > 0) {
 if (h <= 0)
 throw "Levels incomplete";
 m = (l + r) / 2;
 Line(m, top, m, top + h);
 h -= smallMarkerIncr;
 level--;
 Push(m, r, m, h, level);
 r = m;
 }

 }
}
// Push integer arguments onto stack
void
TRulerWin::Push(int l, int r, int m, int h, int level)
{
  stack.Push(l);
  stack.Push(r);
  stack.Push(m);
  stack.Push(h);
  stack.Push(level);
}
// Pop integer arguments from stack
void
TRulerWin::Pop(int& l, int& r, int& m, int& h, int& level)
{
  level = stack.Pop();
  h = stack.Pop();
  m = stack.Pop();
  r = stack.Pop();
  l = stack.Pop();
}
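The stack member that Push and Pop wrap is not shown in this excerpt. A minimal sketch of the integer LIFO they assume (the class name TIntStack is hypothetical; the program may well have used a Borland container class instead):

```cpp
#include <cassert>
#include <vector>

// Minimal integer LIFO matching the interface the Push/Pop wrappers rely on:
// Push(int), Pop() returning the most recently pushed value, and an
// emptiness test backing StackEmpty().
class TIntStack {
public:
    void Push(int v) { items.push_back(v); }
    int Pop() {
        int v = items.back(); // last value pushed
        items.pop_back();
        return v;
    }
    bool Empty() const { return items.empty(); }
private:
    std::vector<int> items;
};
```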
// Respond to WM_PAINT messages
void
TRulerWin::Paint(TDC& paintDC, BOOL /*erase*/, TRect& /*rect*/)
{
  dc = &paintDC;       // Address paintDC with object's private dc
  InchRuler(3, 3, 12); // x == 3, y == 3, length == 12 inches
}
// === The application class ===
class TRulerApp: public TApplication {
public:
  TRulerApp(const char far* name)
    : TApplication(name) {}
  void InitMainWindow();
};
// Initialize the program's main window
void
TRulerApp::InitMainWindow()
{
  EnableCtl3d(); // Use Windows 3D controls and dialogs
  EnableBWCC();  // Use Borland Custom Controls
  MainWindow = new TRulerWin(0, "Ruler");
}
#pragma argsused
// Main program
int
OwlMain(int argc, char* argv[])
{
  TRulerApp app("RulerApp");
  return app.Run();
}
End Listings









June, 1994
UNDOCUMENTED CORNER


OS/2 for Windows: IBM's Patch-O-Rama




Art Rothstein, Roger Alley, and others


Art Rothstein and Roger Alley are software developers in the San Francisco
area. They can be contacted on CompuServe at 70353,47 and 71163,2407,
respectively.




Introduction




by Andrew Schulman


The commercial viability of IBM's OS/2 depends on its ability to run Windows
applications. For some time, to run Win apps, every copy of OS/2 included a
modified version of Windows, WIN-OS/2. IBM's agreement with Microsoft gave it
source code for Windows 3.10; IBM modified the Windows 3.10 source to create
WIN-OS/2.
Modifications are necessary because the normal retail version of Windows can't
run as a well-behaved client under OS/2. While Microsoft recommends that Win
apps follow certain rules, Windows itself observes almost no rules at all--it
is what the industry (including Microsoft) calls an "ill-behaved application."
For example, while Windows provides DOS protected-mode interface (DPMI)
services--it is a DPMI "host"--the Windows kernel does not itself consistently
use DPMI services--it is not a DPMI "client". Instead, it directly manipulates
protected-mode structures, such as the local descriptor table (LDT). There has
been some speculation as to why Windows is not a DPMI client: Is it only
performance considerations? In any case, IBM had to modify the Windows source
code, in part, to make WIN-OS/2 a well-behaved DPMI client.
By including a version of Windows with OS/2, IBM paid Microsoft royalties for
every copy of OS/2 sold. This was not only expensive, but gave Microsoft the
inside track on OS/2 sales. Given the competition between Microsoft's Windows and
IBM's OS/2, IBM was in an awkward position.
Since so many PCs already have Windows, in late 1993 IBM came out with "OS/2
for Windows" (OS2fW)--OS/2 without Windows--which uses the Windows already on
your PC to run Win apps under OS/2. This is a neat solution: The user gets to
run Win apps, and IBM doesn't have to pay Microsoft for including this ability
in OS/2.
But, since Windows can't run as is under OS/2, OS2fW must modify the copy of
Windows a user has installed on the machine. How? That's the subject of this
month's "Undocumented Corner." The short answer: IBM patches Windows in
memory. The in-memory patches have the same net effect as the source-code
changes made in WIN-OS/2.
Patching someone else's code is often thought to be the ultimate hack.
Quarterdeck patched Windows 3.0 Standard mode to force VCPI clienthood, Adobe
patched Windows to make it use ATM, and, naturally, Windows itself massively
patches MS-DOS. Now IBM is patching Windows.
Patching requires reverse engineering. Even with access to the source code,
IBM had to reverse engineer something, to find the offsets where the patches
go, and the number of bytes to patch. This is "deep" reverse engineering, to
use Microsoft's terminology from the recent Stac vs. Microsoft case (see the
May 1994 "Undocumented Corner").
IBM's patches are custom-tailored for Windows 3.10. Microsoft has recently
come out with Windows 3.11. There is also a downloadable patch file for
turning a 3.10 system into 3.11. Guess what: OS2fW doesn't work with Windows
3.11.
OS2fW will install without complaint over Windows 3.11, but when you attempt
to load a 3.11 session, you get an OS/2 SYS3176 illegal-instruction exception
("A program in this session encountered a problem and cannot continue"), and
the session fails. The failure is not graceful, and the error message does not
refer to the Windows version number.
Upon the discovery of OS2fW/Windows 3.11 incompatibility, OS/2 enthusiasts
immediately claimed that Microsoft deliberately broke OS2fW. Apparently, some
initial test versions of 3.11 did not break OS2fW, only the final released
version did. IBM's Dave Whittle stated on CompuServe's Canopus forum that
Windows 3.11's "primary feature is that it renders Windows unable to run under
OS/2 for Windows."
Microsoft says that OS2fW depends upon fixed locations in Windows 3.10, so
that even the most trivial bug fixes in Windows would throw off OS2fW patches.
In response, IBM representatives claimed that OS2fW uses "smart" patching
that isn't dependent on fixed locations of code. As you'll see here, the OS2fW
patch engine unfortunately is not "smart" and does depend heavily (on the
order of 100 separate patches) upon fixed locations in Windows 3.10. As noted
by Dan Gillmor of the Detroit Free Press, "I haven't seen any evidence that
IBM's patch contained that much intelligence--just IBM's assurances that it
did."
All of this was discussed in the Undocumented Corner area in the DDJ Forum (GO
DDJFORUM) on CompuServe. Because this online conversation yielded so much
solid information, this month we have an epistolary "Undocumented Corner."
We'll just listen in from when Larry Seltzer of PC Week first raised the
question, to when Art Rothstein and Roger Alley figure out exactly how OS2fW
patches Windows. This is a heavily edited transcript of the online
conversation; the entire thread is available electronically (see
"Availability," page 3).


How's It Work?


19-Feb-94 17:52:44 Fm: Larry Seltzer [PCWLabs] There is a patch available for
making a Win 3.1 into a Win 3.11. The filename is WW0981.EXE and it's a hair
under 600,000 bytes.
I heard from someone whose brother knows a guy who installed it (not
literally, it's just that I haven't seen it myself) that one of the files the
patch modifies is KRNL386.EXE. Yet, Microsoft claims Windows 3.11 is just a
"driver revision." Spencer Katt got a tip this week that the patch breaks
OS2fW. Now mind you, OS2fW isn't Microsoft's problem, but the situation would
be pretty awful for OS2fW if the rumor's true. After all, MS says all the OEMs
have 3.11.
19-Feb-94 22:31:10 Fm: Andrew Schulman Remember that, even if Windows 3.11
"breaks" OS2fW, it doesn't mean that there was any attempt to "thwart" OS2fW.
Not every incompatibility is a deliberate incompatibility!
I'd feel a lot more sure about all of this if I first understood how OS2fW
works. What aspects of Windows 3.1 does it rely on? Does it patch, or what?
How does it differ from the old WIN-OS/2 modifications to Windows source?
23-Feb-94 01:41:51 Fm: Larry Seltzer [PCWLabs] I'm amazed, but it appears as
if OS2fW makes no modifications to KRNL386.EXE.
23-Feb-94 22:47:57 Fm: Tim Farley Larry, have you checked both statically and
dynamically? It might not patch the file on disk, but instead, as it loads
into memory. QEMM used to do this to support Standard mode in Win 3.0.
24-Feb-94 09:18:41 Fm: Andrew Schulman The bit about QEMM patching Windows 3.0
Standard mode is important. Geoff Chappell describes it on pp. 562--565 of DOS
Internals. Briefly: "QEMM modifies the DOSX code so that it complies with VCPI
(one Quarterdeck spokesperson has apparently referred to this as 'we ram VCPI
down Windows' throat')."
I would think this is the kind of thing OS2fW would have to do, though in
their case it would be forcing KRNL386, etc. to be DPMI compatible. Windows of
course is a DPMI host, but it's not a DPMI client. (Sort of the opposite of
that guy in the "Hair Club for Men" commercials: "I'm not only the Hair Club
president, I'm also a client.")
The issue of Windows not being DPMI compliant is discussed in Undocumented
DOS, second edition, pp. 27 and 142: "the kernel bypasses DPMI. This means
that IBM must hack Windows to create a slightly [??] different WIN-OS/2". So I
wonder how OS2fW works.
25-Feb-94 15:40:15 Fm: Larry Seltzer [PCWLabs] If IBM is patching Windows in
memory, MS would have to freeze Win 3.1 forever not to break OS2fW.


Finally, Some Answers



10-Mar-94 05:35:20 Fm: Art Rothstein I installed OS2fW on a machine tonight
and looked around a little. Windows is started from WINOS2.COM, supplied by
IBM. Some "shallow" disassembly <grin> suggested that WINOS2 loads KRNL386.EXE
via DOS function 4B01h [Load but don't execute] and patches it in memory.
There are eight small DLLs with .SCR [script] suffixes. Each corresponds to a
Windows DLL. There is USERS.SCR, GDIS.SCR, MOUSES.SCR, etc. Each one's LibMain
calls the same entry point in FIXMGR.DLL.
11-Mar-94 01:14:18 Fm: Art Rothstein I did a little more work, dumping the
kernel's code segments from memory in Windows 3.1 and in OS2fW, then examining
the differences. By my count there are 60 places [in KRNL386 alone] where the
code is patched to do a far jump, probably to a patch module. Most of the
patches are very dependent on the exact code, e.g., the number of residual
bytes to NOP after the far JMP. These patches could not survive more than a
trivial change to KRNL386.EXE. It's no wonder they are incompatible with
Windows 3.11.
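The patch shape Art describes — a 5-byte far JMP written over existing instructions, with any leftover bytes of the overwritten instruction NOPed out — can be sketched mechanically. This is an illustration of the technique, not IBM's code; the buffer and target addresses in the test below are taken from the 1.2017 patch shown in Listing One.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Overwrite patchLen bytes of 16-bit code at position `at` with a far JMP
// (opcode EAh, then offset and segment, both little-endian) to seg:off,
// filling residual bytes with NOP (90h). Returns false if it will not fit.
static bool FarJumpPatch(std::vector<std::uint8_t>& code, std::size_t at,
                         std::uint16_t off, std::uint16_t seg,
                         std::size_t patchLen) {
    if (patchLen < 5 || at + patchLen > code.size())
        return false;
    code[at]     = 0xEA;                                  // far JMP opcode
    code[at + 1] = static_cast<std::uint8_t>(off & 0xFF); // offset, low byte
    code[at + 2] = static_cast<std::uint8_t>(off >> 8);   // offset, high byte
    code[at + 3] = static_cast<std::uint8_t>(seg & 0xFF); // segment, low byte
    code[at + 4] = static_cast<std::uint8_t>(seg >> 8);   // segment, high byte
    for (std::size_t i = at + 5; i < at + patchLen; ++i)
        code[i] = 0x90;                                   // NOP residual bytes
    return true;
}
```

For instance, patching the 6-byte cmp cs:[0032],0 with a jump to C7:16B3 yields the byte sequence EA B3 16 C7 00 90 — the same bytes Listing One shows in the patched copy of KRNL386 at 1.2017.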
15-Mar-94 10:57:50 Fm: Art Rothstein I have uploaded OS2DIF.ZIP in DDJFORUM
data library 3. There are four DIFF.B0? files in this archive, one for each
global heap code segment of KERNEL. We wrote a program that used TOOLHELP
functions to find each such code segment and dump it to disk. We ran the
program under retail Windows 3.1 and under OS2fW. We used the DOS FC command
to compare the resulting files byte for byte. For example, [LISTING ONE, PAGE
130] from WIN.B01, is an in-memory comparison of two fragments from KRNL386,
under Windows 3.10 vs. OS/2 for Windows.
We used Sourcer to disassemble KRNL386.EXE and started matching disassembled
instructions to patched instructions. We have almost completed this exercise
for segment 1. Most patches take the form of a far-jump instruction, a 5-byte
sequence beginning with 0EAH. Segment 1 has 47 such patches. Segment 2 has 12.
Segments 3 and 4 have none.
Our current focus is to determine how the patches are applied, in particular
to understand the type of validation performed prior to application of a
patch. The code of interest appears to be at offset 0C55H in WINOS2.COM's code
segment. Among the data structures used are a module table, which has the
signature 'MH' at offset 26H; and a script table, which has the signature 'SH'
at offset 1AH. The eight .SCR files (COMMS, CONTROLS, GDIS, MOUSES, TIMERS,
USERS, VGAS and WINFILES), which are actually DLLs, contain the same
structures and appear to be processed by the same code in WINOS2.
15-Mar-94 20:32:31 Fm: Andrew Schulman Information Week (March 14) has an
article on OS/2 for Windows. Rogers Weed, Microsoft's marketing product
manager for Windows, is quoted saying (rightly, I think) that IBM should bear
the burden of future compatibility with Windows. "But Weed suggests that IBM
may not be up to the task. Since OS/2 relies on a fixed image of Windows in
memory to run correctly, Weed notes that 'any change to Windows will require
that IBM change OS2fW to keep up.' Because IBM, as of last September, no
longer has access to the Windows source code, it now must reverse engineer
Windows every time the code changes so its products can remain compatible."
It came out during Stac vs. Microsoft that IBM reverse engineered the preload
interface in MS-DOS 6--the same preload interface that Stac reverse
engineered, and that Microsoft claims is a trade secret. Now it turns out that
IBM is going to have to reverse engineer Windows, if they aren't doing so
already.
The obvious question is whether Microsoft will next go after IBM for doing the
exact same thing Stac did. IBM's patching of Windows obviously involves far
"deeper" reliance on specific Microsoft code than anything Stac did! [And IBM
couldn't get specific bytes and offsets by looking at the Windows 3.10 source
code.]
Secondly, it's interesting that IBM is engaged in reverse engineering, while
the company's lawyers are actively trying to have it outlawed.


Analyzing the Patches


17-Mar-94 17:21:13 Fm: Andrew Schulman Art, here are a few quick comments, based
on your diff files, about what IBM is patching in the Windows kernel:
KRNL386 seg #1: The variable at 1.0032 (labelled data_0368 in the Sourcer
disassembly you show) [LISTING ONE] is a writeable selector to the LDT.
KRNL386 creates this using a call at 1.C3B0 to INT 2Fh function 168Ah (see
Pietrek, Windows Internals, pp. 16, 18, 90). In Undocumented Windows (entries
for GetSelectorBase and SetSelectorBase), this variable is called WIN_LDT. The
actual name in Windows, Pietrek explains, is very confusing: "GDTDsc." In any
case, we now know from your DIFF.B01 that OS2fW replaces some references to
WIN_LDT/GDTDsc with far jumps.
Your DIFF.B01 shows ten patches involving 1.0032. My KRNL386.LST shows many
more references than that: at least 32. And those are just the ones that
Sourcer has correctly resolved. There are many, many more that Sourcer didn't
link up properly. :-)
It's confusing that we don't see many more differences between 3.10 and OS2fW.
Maybe the WIN_LDT selector is still valid but read-only?
KRNL386 seg #1, offset 80E2h: They're NOPing out a call to 2F/1689, which is
the KERNEL idle call (Pietrek, pp. 418--420). KRNL386 generates a ton of these
calls, one every time it gets to the idle portion of the internal Reschedule
routine.
19-Mar-94 01:35:47 Fm: Roger Alley Description of areas being patched by OS/2:
I've mainly relied on Matt Pietrek's Windows Internals (WI). The following
function names are from that book, and the page number where a routine's
general description appears is noted.
Art's listing shows that cs:[0032] contains 0117h, probably a valid handle,
but the R/W status can't be determined. I'm pretty sure it's valid and can be
read. For example, 1.2261 is an unpatched function called from various
locations which uses cs:[0032] to read descriptor information.
Here are some of the patches: [Only a few of the patches Roger analyzed are
shown here; the full thread is available electronically; see page 3.]
1.13FC: In GlobalPageLock (1.13E9, WI p. 159), replacing the call to DPMI
function 4 (a supposedly obsolete [undocumented] function which locks the
pages of the selector in bx) with a jump to C7:155D. I guess IBM's DPMI did
not implement this function.
1.1E18: In (the undocumented) AllocSelectorArray (1.1DF5, WI p. 89). This
routine gets a sequential list of LDT selectors (via Get_Sel), and then sets
the DATA and PRESENT access-rights flags in each of their descriptors. The
patch replaces the instruction which sets the flags, and the following
instruction which moves bx to the next selector, with a jump to C7:1665. [In a
later message, Roger shows that the jumped-to replacement code calls DPMI int
31h function 9.]
1.2017: In Free_Sel (1.2009, WI p. 94). The patch is replacing a test of
GDTDsc (WIN_LDT, cs:[0032]) with a jump to C7:16B3. My guess is that 16B3 is
just setting the Z flag, as the next instruction (jz Free_to_DPMI) will jump
around the descriptor mucking.


More on the Patch Engine


19-Mar-94 13:20:45 Fm: Art Rothstein [It's been suggested that OS2fW might be
sensitive to changes in the size of Windows executable files.] The issue is
not the length of the file, but the length of the segments that will be
patched. The first script table for each module table contains a nonzero WORD
length at offset 4 if WINOS2.COM should validate the segment limit. At least
for GDI.EXE, USER.EXE, TIMER.DRV, COMM.DRV and WINFILE.EXE, the value in the
script table matches exactly the size of one of the module's code segments in
Retail 3.1, as reported by EXEHDR. Aware that code segments are allocated from
the global heap, the patch machine massages the value from the script table
before comparing it with the value returned by LSL for the target selector. If
the comparison is not exact, WINOS2.COM will not apply any of the patches in
that script table.
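The gate Art describes reduces to a simple check. A hedged sketch follows; the exact "massage" the patch machine applies to the script-table value is not spelled out in the thread, so it appears here as a caller-supplied adjustment (the sample SizeToLimit, converting a segment size to a limit, is purely illustrative):

```cpp
#include <cassert>
#include <cstdint>

// A script table carries the expected size of the target code segment
// (0 means "skip validation"). The patch machine adjusts that value and
// refuses the whole script table if the result differs from the live
// segment limit (what LSL would return for the target selector).
static bool ShouldApplyScript(std::uint16_t expectedSize,
                              std::uint16_t liveLimit,
                              std::uint16_t (*massage)(std::uint16_t)) {
    if (expectedSize == 0)
        return true;                         // no validation requested
    return massage(expectedSize) == liveLimit;
}

// Illustrative adjustment only (an assumption, not IBM's actual massage):
// treat the stored value as a size and convert it to a limit.
static std::uint16_t SizeToLimit(std::uint16_t size) { return size - 1; }
```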
19-Mar-94 22:10:52 Fm: Art Rothstein WINOS2.COM loads KRNL386.EXE via DOS
function 4B01h, then executes its patch machine on a short script (from
DS:050B). The script copies 8 bytes of instructions from the kernel's
initialization code to a patch area in WINOS2.COM, then replaces these
instructions with a far jump to the patch area. To see the instructions
patched, run DEBUG KRNL386.EXE and U 75 L 8. At the end of the patch code is a
far jump back to the first instruction in the patched KRNL386 code.
This patch code, at CS:12FB in WINOS2.COM, executes the patch machine on
another short script (from DS:04BF). The effect of this patch is to insert
over 400 bytes of instructions after the ADD AX,10H in segment 1 of KRNL386.
The original kernel code appears to be initializing DPMI services. As part of
this lengthy patch, WINOS2.COM executes the patch machine on a long script
(from DS:17C7) that places 71 distinct jumps from the kernel to code in
WINOS2.COM.
One of the many patches applied from the DS:17C7 script is at offset B488 in
kernel segment 1, where the kernel processes parameters from the BOOT section
of SYSTEM.INI. The patch code loads FIXMGR.DLL if it is not already loaded. In
its LibInit routine, FIXMGR loads 8 DLLs. All the filenames have the SCR (as
in SCRipt) extension. The prefixes are GDIS, USERS, MOUSES, CONTROLS,
WINFILES, COMMS, TIMERS and VGAS. The LibInit of each routine calls the
REGISTERSCRIPT entry point in FIXMGR, which in turn calls a routine in
WINOS2.COM (its address was passed to FIXMGR's LibInit routine) that adds the
respective DLL's module table to the singly linked module table chain. Another
patch applied from the DS:17C7 script is at offset 759B in kernel segment 1
[in a subroutine used by FarSegmentLoad; WI p. 260]. This patch runs the
module table chain, looking for one that matches the current selector. If it
finds a matching module table, it runs the patch machine. [LISTING TWO, PAGE
130, SHOWS THE FORMAT OF THE SCRIPT TABLE, AND OF EACH PATCH BLOCK TYPE.]
As an example, [LISTING THREE, PAGE 130] shows the data for the first script
in the bootstrap process, from the script table at DS:050B in WINOS2.COM.


Analyzing More Patches


20-Mar-94 Fm: Roger Alley Some additional areas patched by OS/2. [Again, just
a few of Roger's descriptions for KRNL386 are shown here.]
1.214C: In PrestoChangoSelector (1.213E, WI p. 97). This patch is right at
the start of the routine--only the epilog, register saving, and ds loaded from
GDTDsc have been done. The code for loading es with GDTDsc, setting si to the
source descriptor, and then masking off the bottom three bits is replaced with
a jump to 16E5.
1.21C1: In AKA (1.21A3, WI p. 99). The code from "Save aliasSelector in BX"
through and including "if (isData) Turn on the CODE bit in the alias
descriptor" is replaced with a jump to C7:16E5. The code being replaced copies
and then manipulates a selector.
1.269F: In Get_Blotto (1.2674), which gets the "blotto" selector, used to zero
out memory segments.
1.2735: In SetSelectorBase (1.2725, WI p. 97). The patch replaces a save and
subsequent load of ds (with GDTDsc) with a jump to C7:182A. [The jumped-to
code ends up calling DPMI INT 31h function 7 (Set Selector Base). This is
important, not only because OS/2 can't allow Windows to write to the LDT, but
also because testing by Art Rothstein showed that OS/2's implementation of
DPMI prevents certain operations allowed by Windows, such as mapping the
linear address of the Global Descriptor Table [GDT] into a program's address
space.]
1.277D: In SetSelectorLimit (1.276D, WI p. 95). The routine loads bx with the
selector, saves ds, and loads ds with GDTDsc. The next two instructions, which
strip the flag bits from bx and load ax with LOWORD(limit), are replaced with
a jump to C7:1839.
1.299D: In Get_Arena_Pointer32 (1.297C). This routine takes a handle and
returns the 32-bit offset to its arena (by looking it up in the Selector
Table--see WI, p. 110). The routine is allowed to construct the offset into
the Selector Table, and then the patch replaces the instructions which add in
the base of the Selector Table, and move the arena pointer to eax, with a jump
to C7:1880.
1.2A47: In xLockLinearRegion (1.2A33), called by GReAlloc. This routine takes
a linear address and size, and simply passes them to DPMI-Lock Linear Region.
The patch replaces the mov ax,0600/int 31 with a jump to C7:1896.
1.2B66: In DPMIProc (1.2B42). This routine is just a DPMI wrapper that catches
functions 000B, 000C, 0007, 0006, and 0009 and processes those itself (by
reading or writing from/to the LDT). The patch allows all calls with ah != 0,
or ax == 000B, to process normally. It replaces the check for al == 0C (Set
Descriptor), and the next 2 instructions, with a jump to C7:18F6.
1.2BAD: In DPMIProc also, replacing the test for al = 09 (Set Descriptor
Access Rights), and the next 2 instructions, with a jump to C7:18FB. Because
of the presence of this and the previous patch, I would guess that only these
2 functions (Set Descriptor, Set Descriptor Access Rights) are being
overridden.
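Taken together, Roger's reading of the two DPMIProc patches implies a simple filter on the incoming AX value. A hedged sketch, using the function numbers from his analysis:

```cpp
#include <cassert>
#include <cstdint>

// Per the analysis above, the patched DPMIProc reroutes only INT 31h
// functions 0009h (Set Descriptor Access Rights) and 000Ch (Set Descriptor)
// to OS/2's replacement code; calls with AH != 0, and function 000Bh
// (Get Descriptor), proceed through the original paths.
static bool ReroutedToOS2(std::uint16_t ax) {
    std::uint8_t ah = static_cast<std::uint8_t>(ax >> 8);
    std::uint8_t al = static_cast<std::uint8_t>(ax & 0xFF);
    if (ah != 0 || al == 0x0B)
        return false;                 // processed normally
    return al == 0x09 || al == 0x0C; // the two overridden functions
}
```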
1.408A: In GAlloc (at 1.4025, WI p. 128), this code replaces a DPMI function
0004 call (at the bottom of p. 129) with a jump to C7:1900.
22-Mar-94 14:02:57 Fm: Roger Alley 71163,2407
Here's a listing for the few patches which I have fully analyzed. [Up to now,
Roger had shown what Windows 3.10 code OS2fW would overwrite with far jump
instructions to OS/2 segment C7h; in the descriptions that follow, he examines
what happens at the jumped-to code in segment C7h.]
1.1E18: In undocumented AllocSelectorArray (1.1DF5, WI p. 89). This routine
gets a sequential list of LDT selectors (via Get_Sel), and then sets the DATA
and PRESENT flags in each of those selectors. The patch replaces the
instruction which sets the flags directly in Kernel's copy of the LDT with a
call to DPMI Set Descriptor Access Rights. [LISTING FOUR, PAGE 130.]
1.1E3C: In Get_Sel (1.1E2B, WI p. 90). This patch replaces "if (*FirstFreeSel
== -1) goto try_DPMI" with "goto try_DPMI." This has the effect of always
using DPMI, and skipping a lot of descriptor-mucking code. [LISTING FIVE, PAGE
130.]
1.1FB4: In Alloc_Sel (1.1F55). This patch changes some descriptor mucking. The
original code just made the changes directly to Win_LDT. The new code copies
the descriptor from Win_LDT into local storage, makes the changes, and then
issues DPMI Set Descriptor. Note that, on entry to this code, the contents of
ax are the same as the value on the top of the stack, and es:di points to a
descriptor in local storage. [LISTING SIX, PAGE 130.]
1.2017: In Free_Sel (1.2009, WI p. 94). The patch effectively replaces the
statement "if (GDTDsc == 0) goto Free_to_DPMI" with "goto Free_to_DPMI,"
skipping the direct descriptor-mucking code and using DPMI instead.

So What Changed in Windows 3.11?
24-Mar-94 06:25:13 Fm: Roger Alley I have completed a first pass over the
differences between Windows 3.10 and 3.11 for KRNL386.EXE. [This Windows 3.11
KRNL386.EXE is identical to the one in Windows for Workgroups 3.11.] The file
seems to have been regenerated by a rebuild. The segments begin at different
offsets in the file, and the new file contains the VERSIONINFO resource. The
differences represent minor changes.
Data segment: The string "Incorrect MS-DOS version. MS-DOS 3.1 or greater
required" was replaced with a very long string that began with "Windows for
Workgroups 3.11 could not start. Make sure_". The rest of the string states
that MS-DOS 3.30 is required, and that "files=" has to be at least 20. This
seemed to be the only change in the data segment, and it threw the addresses
off D5h bytes. A later paragraph alignment returned 5 bytes, so addresses
after that were off D0h bytes.
Kernel segments 2 and 3: No changes.
Kernel segment 1: GetExePtr (WI p. 475) now checks if the passed-in handle is
a valid selector. Because GetExePtr is called from various routines, and the
parameter passed in can be one of many different things (such as a module
handle, an instance handle, a memory handle), it's possible that an invalid
parameter may not be caught in the normal validation, and thus GetExePtr could
receive a garbage value. If that value was odd, but not a valid selector,
KERNEL would GP fault (at 1.4E84). This fixes the bug by verifying that the
parameter, if odd, is a valid selector.
At this point, 5 bytes have been added.
There's a null byte, which does not seem to be used, inserted in front of
UnlinkObject. [This aligns the routine on a word boundary, but there's no
particular reason to align this one when so many other routines are
unaligned.]
At this point, we're +6.
The routine at 5A34, called indirectly by OpenFile, has a number of
(unanalyzed) changes to it.
Amazingly, we're even again (this saved a lot of time).
The routine at 88C8, called from PreloadResources, replaces a call to a
routine which does nothing (at 52EF) with a test and then, possibly, a call to
the routine at 5302 (previously 52FC). I'm not sure what the effect of this
is.
Now +14 (decimal).
At (old) A820, there's a large table, which is now 2 bytes larger, possibly
because of a paragraph align.
Now +16.
In InitDosVarP (WI p. 22), part of the kernel initialization, the program now
checks for at least DOS 3.30 instead of DOS 3.10.
That's it, pretty minor from what I can tell so far. [Of course, there are
other modules, such as USER.EXE.]


What's It All Mean?


23-Mar-94 16:05 EST Fm: Art Rothstein After discussing IBM's business reasons
for creating OS2fW (even the name of the product is inspired), you should
launch into a discussion of the technical problems such a product must solve.
Among the problems that come to mind are modification of descriptor tables;
direct screen modification when a WinApp or Windows itself (yes, you can do
this) runs in a window on the PM desktop; spawning of DOS applications. OS/2
must also provide replacements for system services that enhanced mode Windows
requests from VMM, such as DPMI.
The fact that IBM solved these technical problems, and that they got it right
the first time, is a minor miracle. The sheen on the miracle fades, however,
when we try to run OS2fW with Windows 3.11, Microsoft's so-called refresh
release of 3.1. ("Windows 3.11. It's not an upgrade. It's just our way of
letting you know who's boss.")
You should reflect on the sheer number of patches and on their dependence on
the exact location of Windows 3.1 code and the exact instructions at those
locations. Keeping up with Microsoft's changes, even if Microsoft does not
intentionally create obstacles, can keep at least a few IBM developers
constantly busy.
While the patch machine does validate the size of most patched segments, it
does not validate instructions. Most patch programs, going back to IBM's Super
Zap utility on MVS, make replacement contingent on successful validation.
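The verify-then-replace discipline Art alludes to — the patch carries the bytes it expects to find, and is applied only if the target still matches — can be sketched in a few lines (an illustration of the general technique, not any particular tool; the bytes in the test are the Windows 3.10 fragment from Listing One):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Apply `replace` over `code` at position `at` only if the bytes currently
// there match `expect`. A patch engine built this way refuses to touch code
// that has changed since the patch was authored.
static bool VerifyPatch(std::vector<std::uint8_t>& code, std::size_t at,
                        const std::vector<std::uint8_t>& expect,
                        const std::vector<std::uint8_t>& replace) {
    if (expect.size() != replace.size() || at + expect.size() > code.size())
        return false;
    for (std::size_t i = 0; i < expect.size(); ++i)
        if (code[at + i] != expect[i])
            return false;             // target has changed: refuse to patch
    for (std::size_t i = 0; i < replace.size(); ++i)
        code[at + i] = replace[i];    // safe to overwrite
    return true;
}
```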
[LISTING ONE]: In-memory comparison of two fragments from KRNL386: Windows
3.10 vs. OS2fW.

00000030: 87 2F DGROUP
00000032: 17 C7
00000033: 01 00
 ...
000013FC: EA B8 1.13FC B8 0004 mov ax,4
000013FD: 5D 04 1.13FF CD 31 int 31h
000013FE: 15 00
000013FF: C7 CD
00001400: 00 31
 ...
00002017: EA 2E 1.2017 2E: 83 3E 0032 00 cmp cs:data_0368,0 ; (1.0032=0)
00002018: B3 83
00002019: 16 3E
0000201A: C7 32
0000201C: 90 00
 ...

[LISTING TWO]: OS2fW patching-script format and patch block types (Art
Rothstein, 19-March-1994).

Script table
 0- 1 Offset in this segment of next script table, 0 if none
 2- 3 Which segment of this module patches refer to:
 4- 5 Segment limit to check, 0 if none
 6- 7 Offset of patch data
 8- 9 Offset of validation table, FFFF if none
 0A-0B Selector of patchee for block types 06 and 07
 0C-0D Selector of patchee for block type 04
 12-13 Offset to place in jump instruction patches
 14-15 Selector of patcher for block types 04 and 06
 16-17 Selector of patcher for block type 07
 18-19 Unknown
 1A-1B SH' signature


Patch data is a set of contiguous blocks. Each block has a 2-byte header.
 0- 0 Block type
 01 Change internal state
 02 Set patchee offset for subsequent block types 04, 06 and 07.
 03 Determine patchee offset to use in a subsequent block type 04.
 04 Make patcher return to patchee.
 05 Determine patchee offset to use in subsequent block types 06 and 07.
 06 Save patchee instructions in patcher, replace with NOPs.
 07 Make patchee jump to patcher (EA offset segment).
 08 May not be interesting.
 FF End of patch data
 1- 1 Number of bytes remaining in the block after this byte

The remaining bytes in a patch block depend on the patch type:

Type 01 Change internal state
 2- 3 New internal state

Type 02 Set patchee offset for subsequent block types 04, 06 and 07.
 0A-0B 0001
 0C-0D Patchee offset.

Type 03 Determine patchee offset to use in a subsequent block type 04.
 08-09 Value to use unconditionally, unless FFFF.
 0A-0B Value to pass for massaging if FFFF in 08-09.

Type 04 Create instructions in patcher to return to patchee.
 2- 3 Instruction type
 0001 Far jump (EA offset segment)
 0002 IRET (CF)
 0003 Far jump (PUSH segment, PUSH offset, RETF)
 4- 5 Offset in patcher
 The segment of the patchee comes from [BX+0C].
 The segment of the patcher comes from [BX+14].


 The offset in the patchee is the sum of offsets derived from the
 preceding block types 02 and 03. For the kernel, at least, block
 type 02 always contributes a zero.

Type 05 Determine patchee offset to use in subsequent block types 06 and 07.
 08-09 Value to use unconditionally, unless FFFF.
 0A-0B Value to pass for massaging if FFFF in 08-09.

Type 06 Save patchee instructions in patcher, replace with NOPs.
 2- 3 Offset in patcher
 4- 5 Number of bytes to patch
 The segment of the patchee comes from [BX+0A].
 The segment of the patcher comes from [BX+14].
 The offset in the patchee is the sum of offsets derived from the
 preceding block types 02 and 05. For the kernel, at least, block
 type 02 always contributes a zero.

Type 07 Make patchee jump to patcher (EA offset segment).
 1- 2 0001
 2- 4 Offset in patcher
 The segment of the patchee comes from [BX+0A].
 The segment of the patcher comes from [BX+16].

 The offset in the patchee is the sum of offsets derived from the
 preceding block types 02 and 05. For the kernel, at least, block
 type 02 always contributes a zero.

[LISTING THREE]: A sample OS2fW patch script.

01 02 01 00
03 0A FF 00 FF 00 FF 00 FF FF 7D 00
04 04 01 00 35 13
05 0A FF 00 FF 00 FF 00 FF FF 75 00
06 04 FB 12 08 00
07 04 01 00 FB 12
FF 00
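The block layout in Listing Two — a type byte, then a count of the bytes remaining in the block after the count byte, with FFh terminating the stream — can be walked mechanically. A sketch (not IBM's code) that recovers the block types from a patch-data stream such as Listing Three:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Walk a patch-data stream per the Listing Two layout: byte 0 of each block
// is the type, byte 1 the number of payload bytes that follow it; type FFh
// ends the stream. Returns the block types seen before the terminator, or an
// empty vector if the stream ends without one.
static std::vector<int> PatchBlockTypes(const std::vector<std::uint8_t>& data) {
    std::vector<int> types;
    std::size_t i = 0;
    while (i < data.size()) {
        if (data[i] == 0xFF)
            return types;              // FFh: end of patch data
        if (i + 1 >= data.size())
            break;                     // truncated block header
        std::size_t len = data[i + 1]; // payload bytes after the count byte
        types.push_back(data[i]);
        i += 2 + len;                  // skip header and payload
    }
    return {};                         // no terminator: malformed
}
```

Run over the Listing Three bytes, this yields the block sequence 01, 03, 04, 05, 06, 07 — matching the bootstrap script Art dumped from DS:050B.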

[LISTING FOUR]: AllocSelectorArray.

1:1E18   Old code           New code

         ; BX is an LDT offset
         mov [bx+5],cx      mov ax,0009h
                            or bx,7      ; make into selector
                            int 31h

[LISTING FIVE]: Get_Sel.

1:1E3C   Old code           New code

         mov ax,[si]        mov ax,[si]
         inc ax             inc ax
         jz try_DPMI        cmp ax,ax
                            jz try_DPMI

[LISTING SIX]: Alloc_Sel.

1:1FB4   Old code           New code

         and bl,F8h         add sp,2
         pop [bx+5]         cld
         mov ax,[bp+4]      movsd
         mov [bx],ax        movsd
         mov [bx+7],cl      sub si,8
                            sub di,8
                            mov es:[di+5],ax
                            mov ax,[bp+4]
                            mov es:[di],ax
                            mov es:[di+7],cl
                            or bx,7
                            mov ax,000Ch
                            int 31h
End Listings












June, 1994
PROGRAMMER'S BOOKSHELF


Custom Controls and Windows Programming




Michael Floyd


When it comes to the popularity of custom and visual controls, there's no
mystery. As a custom-control developer, you can create a control that is used
in the Visual Basic (VB) environment just as any standard control is used.
When loaded, the control appears on the tool bar. In Visual Basic, you can
edit the control's properties and event code just as with standard
controls--extending, in effect, the VB environment.
Think of custom controls as data encapsulation on a grand scale. A custom
control is an object contained within a .VBX file--it has a predefined set of
features and a clearly defined interface to that feature set. Yet developing
custom controls adds a level of complexity and isn't without drawbacks.
Although a .VBX is a DLL, developing a custom control is far more involved
because a custom control is self-contained--it must encapsulate the machinery
required to operate in both the application space and the visual-design
environment. From the programmer's perspective, there are compatibility issues
between versions of the VB API. And you must determine whether your controls
need to be visual, nonvisual (for use only in an application), or both.
Windows Programming Power with Custom Controls by Paul Cilwa and Jeff
Duntemann offers insight into these problems and provides useful tools for the
custom-control developer. Windows Programming Power with Custom Controls takes
a workbench approach to developing custom controls. First, it walks you
through the development of a generic control, then guides you through new
concepts while creating more interesting and useful controls.


Tools for Development


If you plan on creating the visual versions of the controls in the book,
you'll need to get either the Visual Basic 3.0 Professional Edition, which
includes the Control Development Kit (CDK), or the Visual Control Pack. The
CDK, a C-based SDK, contains files needed for writing VBXs, as well as the
visual layer required by the controls. You'll also need a C compiler and
linker capable of generating a Windows DLL. Duntemann and Cilwa suggest
Microsoft C 6.x, Borland or Turbo C++, or Symantec C++. Because the authors
use Borland C/C++, you'll need to make some changes to the modules to get
these controls to compile under other development environments. An obvious
example is the authors' use of Borland's #pragma argsused to eliminate certain
warning messages when compiling. This doesn't appear in the book's listings,
but is in the source code supplied on the accompanying disk. The Microsoft
equivalent is #pragma warning.
When I loaded the .RC files into AppStudio, the error "undefined keyword or
key name VOS__WINDOWS16" appeared. This is a constant specified with the
VERSIONINFO statement. Including VER.H in the resource-definition file solved
the problem.
Since the Borland compiler doesn't directly support the creation of .VBX
files, you must define your project as a .DLL and rename the output file using
a .VBX extension. The authors provide a utility DLL2VBX to aid in this.
Interestingly, Borland C++ 4.0 allows you to rename the file's extension from
within the IDE. I tried this, but the compiler bombed out on the build with
the cryptic error message: "IDE error." I solved this problem by restoring the
file's extension to .DLL.


A Bare-Bones Control


The book's first project creates a "skeleton" control that implements the
basic functionality of a custom control. The skeleton is used in later
chapters as a basis for creating more-interesting controls. Upon finishing
this chapter, you'll know how a generic custom control performs its job.
You'll also have a functional custom control that you can use as a
custom-control template in your own development. The skeleton control is
minimal by design, so you may want to enhance this template for real work.
In keeping with a modular design, the skeleton control is broken into several
components. INTERNAL.H is a private #include file that supplies the constants
and prototypes used by the skeleton control. SKELETON.H contains the #defines
for message IDs and data structures used by a control. Because the skeleton
control doesn't define any messages, this file is initially empty. The
SKELETON header file is renamed and used in subsequent chapters.
MAIN.C is where the Windows message handler resides. In this respect, this
module looks like a typical Windows application. However, there are some
differences in how a custom control handles certain messages. For instance, a
control can respond to both Windows messages and messages generated by visual
development environments like VB. The difference is that VB messages refer to
properties and events. The discussion here glosses over this point, but is
still better than a blow-by-blow description of the source code. For example,
the authors provide useful tidbits, such as how to use static near to simulate
function nesting (a feature supported by languages such as Borland Pascal).
Duntemann and Cilwa also point out that DLLs require the large memory model.
This is, in fact, a requirement of C++. However, you can still make near
(intrasegment) calls.
VISUAL.C, an optional module that supports controls in a visual environment,
is a fine introduction to the inner workings of the VB environment. This
module zeros in on properties, methods, and events. Properties help define how
a visual control will appear to the environment. The discussion of properties
primarily refers to custom properties. It's worth noting that VB also supplies
a library of standard properties and events which help to define a uniform
interface to visual controls so that you don't have to reinvent the wheel.
When you can, use them. To implement a standard property, simply include the
appropriate constant in the property table.
It's interesting that the authors decide to set up properties at compile time.
Typically when a control is loaded, it's registered, and the MODEL control
data structure is initialized. Standard programming practice is to store a
visual control's property-name strings in a resource pool to be loaded when
the control is initialized. This added overhead can slow things down. Setting
up properties at compile time can significantly improve performance.
One topic the authors skim over is the MODEL structure, merely mentioning that
MODEL is a sort of master structure that points to all other structures in the
control. This is partially true. However, MODEL contains information about the
control's general behavior, window styles, class names and styles, and so on.
Understanding this is key to the behavior of your control. Figure 1 recreates
the MODEL structure as defined in VISUAL.C. VB_VERSION is an unsigned int
defined in VBAPI.H, and specifies the version of VB being used. Next, two
flags are listed. The MODEL_fFocusOk flag allows the control to receive the
focus at run time, and MODEL_fArrows allows the arrow keys on the keyboard to
be used within the control. Normally, arrow keys are used to move between
controls. PCTLPROC specifies the address of the control procedure. CS_VREDRAW,
CS_HREDRAW, and WS_BORDER are window styles. The rest of the structure is as
advertised. The Control Development Guide documents the MODEL data structure.
Other modules include DIALOG.C (to provide support for dialog boxes), PAINT.C
(paint routines for displaying a control), and HELP.C (for online help).


Other Controls


Subsequent chapters begin the process of creating something real. First, a
panel control is created. The panel control is a customization of the skeleton
control and shows how you can give a three-dimensional look to controls by
adding sunken and raised bevels. The control also supports flood filling
within the beveled control. Because the panel control is similar to that in
VB, the authors reasoned that it was not necessary to support the visual
environment hooks. Therefore, the VISUAL module is not included for this
control and it can only be used within a standard Windows app. I found this
surprising since the authors hint elsewhere in the book that other visual
environments will soon support visual controls.
On the upside, I was pleasantly surprised by the inclusion of a Pascal unit
that allows the panel control to be used in Borland Pascal for Windows
applications.
Other controls created in the book include a virtual list-box control which
shows how you can extend list boxes to contain up to 32 Kbytes; a database
Browser control that allows you to browse records in a database; a page-list
control that allows the control user to select a document page from a list of
page icons; a text-file viewer; and a text-file editor based on the
file-viewer control. Throughout these sections, a number of useful
embellishments are provided, including the use of offset text to highlight key
points and the use of text boxes to highlight some rather interesting asides.
One text box, for example, explains how Borland's Resource Workshop extends
the CTLINFO data structure, which defines the class name and version number
for a custom control. The CTLINFO structure also contains an array of CTLTYPE
structures, each of which lists commonly used combinations of control styles
(called "variants") with a short description and information about the
suggested size. Borland redefines this structure as RWCTLINFO in CUSTCNTL.H.
The authors have created their own, called CONTROLINFO. These and other
tidbits are sprinkled throughout the book.
Likewise, every good book has a hidden gem. In Windows Programming Power, this
jewel is the .INI file manager control, called "IniData." This VB-only control
allows users to view, process, and manage Windows .INI files--a task not
easily accomplished in VB. One unique feature of this control allows you to
encrypt key values to prevent users from reading or modifying an application's
.INI file. The algorithm uses simple substitution, where each plaintext
character is replaced by a ciphertext character. Most substitution algorithms
are easy to break because they merely shift to the right a specified number of
characters to obtain the encrypted text. Such algorithms do nothing to hide
the frequency of characters. Instead, this algorithm uses a substitution table
to supply the encrypted text characters. The algorithm is not sophisticated.
It converts all letters to upper case and only works with alphabetic
characters; nonalphabetic characters are passed through. However, the purpose
here is to prevent a user from accidentally blowing away a sensitive section
of the .INI file.


Missing in Action


The Microsoft approach to developing custom controls is through Visual C++ and
the Microsoft Foundation Class (MFC) Library. MFC 2.5 emulates portions of VB
to allow an MFC application to use .VBXs. If you're looking for help in this
area, look elsewhere. This topic is missing, but not missed. The authors'
SDK-style approach to developing custom controls precludes any discussion of
the MFC approach. And with OLE Custom Controls on the horizon, future support
of .VBXs in MFC is uncertain.
The three levels of support for custom controls coincide with the three
versions of VB. Each version adds support for new features, and older versions
can be viewed as a subset of subsequent versions. Visual C++ only supports VB
1.0-compliant controls. Naturally, you may have to support multiple versions.
It's possible to develop a single custom control that behaves correctly in
each host environment. This is detailed in the Control Development Guide.
Unfortunately, the book does not cover this.
Creating custom-control wrappers is another topic you might find useful. A
wrapper is basically a custom control that provides a high-level interface to
DLL functions. Many developers who have created DLLs have the potential to
convert these to custom controls. Even if the DLL has no UI components, you
can create an invisible control which displays its bitmap at design time, but
is hidden at run time.


Conclusion


Windows Programming Power with Custom Controls is not a replacement for the
CDK documentation. In fact, you'll want to keep the Custom Control Reference
and the Control Development Guide nearby. Windows Programming Power does
provide a generic custom control which can serve as a starting point for any
custom control, and it gives you useful controls that can be used as-is, or
embellished for your specific needs. The authors have designed this book so
that the learning process is incremental--each new control introduces a new
concept while leveraging concepts built in previous chapters. Within this
process, Duntemann and Cilwa add their own unique insights into the design and
development of custom controls. In the end, you'll walk away with a better
understanding of custom controls and some new and useful tools.

Windows Programming Power with Custom Controls
Paul Cilwa and Jeff Duntemann
Coriolis Group Books, 1994, 480 pp.
$39.95
ISBN 1-883577-00-4

Figure 1: The MODEL data structure.
MODEL Model =
 {
 VB_VERSION,
 MODEL_fFocusOk | MODEL_fArrows,
 (PCTLPROC) CtlProc,
 CS_VREDRAW | CS_HREDRAW,
 WS_BORDER,
 sizeof (VBDATA),
 8000,
 NULL,
 NULL,
 NULL,
 Properties,
 Events,
 IPROPINFO_STD_NAME,
 IPEVENTINFO_STD_CLICK,
 -1
 };





































June, 1994
OF INTEREST
Drag-it, a Visual C++ add-on class library from Performix, allows Windows
programmers to incorporate drag-and-drop functionality into Visual C++
applications. Drag-it includes a class library, starter files, a "builder,"
and a sample application. You use the builder to draw or import bitmap
symbols, then place them on the palette. Drag-it sells for $495.00. Reader
service no. 20.
Performix
6618 Daryn Drive
Westhills, CA 91307
818-992-0840
Repository Technologies has announced an interface between its ControlFirst
system (CFS) and Intersolv's PVCS source-code version-control system. CFS
provides problem tracking, work-flow management, and release control for
software-development sites. For its part, PVCS is a configuration-management
tool that tracks changes to components of a software system.
CFS provides a central repository for all information related to software
changes during the development process. With CFS, you can GET and PUT
source-code modules associated with a given problem. All PVCS
locking/unlocking features are also accessible. A file-server version of CFS
sells for $895.00. Reader service no. 21.
Repository Technologies Inc.
6825 Hobson Valley Drive, Suite 201
Woodridge, IL 60517
708-515-0780
VGA Animate 2.0, a set of C language tools for DOS-based graphics animation
from Nexus Software, uses the undocumented page-switching Mode X to achieve
high-speed, flicker-free animation. Mode X, which is available on all VGA/SVGA
cards and provides page-flipping and off-screen image storage, was examined
extensively by Michael Abrash in his "Graphics Programming" column (DDJ, July
and August, 1991).
VGA Animate 2.0 includes over 130 functions for creating and animating images.
It supports PCX and FLI file formats and is compatible with Microsoft and
Borland C compilers. The royalty-free toolkit sells for $39.00 without source
code, and $79.00 with source. Reader service no. 22.
Nexus Software
P.O. Box 341126
Milwaukee, WI 53234-1126
414-321-6792
The HyperBase Tool Box from Amzi! makes it possible for you to integrate
hypertext and Prolog code to create interactive documents. HyperBase allows
Prolog code to be "attached" to hypertext buttons or pages. The Tool Box runs
under Cogent Prolog and contains full source code for all modules and
programs, including the hyperdocument developer and reader, encoder/sealer,
and PCX utilities.
Existing Cogent Prolog 2.0 users can buy the HyperBase Tool Box for $129.00.
The complete Cogent Prolog development system, including a royalty-free
run-time license, sells for $377.00. Reader service no. 23.
Amzi! Inc.
40 Samuel Prescott Drive
Stow, MA 01775
508-897-7332
Graphics Gems IV, edited by Paul Heckbert and published by AP Professional, is
the latest in the series of what really are gems for graphics programmers.
This edition includes coverage of topics such as polygons and polyhedra, ray
tracing, shading, image processing, frame buffering, transformations, and
more. In addition to background information, algorithms, and performance
analysis, each selection also includes C source-code implementations. The
$49.95 hardcover book comes with an MS-DOS or Macintosh 3.5-inch disk
containing source code from all four volumes of the series. ISBN
0-12-336155-9. Reader service no. 24.
AP Professional
6277 Sea Harbor Drive
Orlando, FL 32887
800-321-5068
Microsoft has released TCP/IP and data-link control (DLC) protocol support
for Windows for Workgroups 3.11. Microsoft's TCP/IP product allows IP-based
connectivity to Windows NT and Windows NT Advanced Server, as well as
UNIX-based LANs and wide-area networks. The TCP/IP for Windows for Workgroups
includes support for the Windows Sockets API. Microsoft DLC for Windows for
Workgroups allows Windows for Workgroups-based PCs to operate in IBM SNA
environments and to connect to mainframes and minicomputers, such as the
AS/400. The DLC and TCP/IP tools make it possible for Windows for
Workgroups-based systems to support networking protocols including TCP/IP,
IPX/SPX, and NetBEUI, and run seamlessly on networks such as Windows NT
Advanced Server, Novell NetWare, Banyan Vines, DEC Pathworks, and SunSelect
PC-NFS. Both TCP/IP and DLC for Windows for Workgroups 3.11 can be downloaded
at no charge from the Microsoft Download Service (206-936-6735), CompuServe
(GO MSCLIENT), and the Internet (ftp.microsoft.com/advsys/msclient/wfw).
Reader service no. 25.
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052-6399
206-882-8080
IXI, a subsidiary of the Santa Cruz Operation, has begun shipping the Wintif
Developer's Pack, which provides a Windows look-and-feel to UNIX applications.
Wintif will be provided free of charge with the next upgrade of Premier Motif,
IXI's upgraded OSF/Motif developer's toolkit. The pack will be available
initially for SunOS and Solaris platforms, with support for SCO Open Desktop
thereafter. Future versions of Wintif will support Microsoft's OLE 2.0 and
other key interchange formats already supported by Windows applications. IXI's
Wintif technology currently complies with OSF/Motif 1.2, XPG4, X11R5, and the
Microsoft Windows 3.1 style guide. Reader service no. 26.
IXI Ltd.
400 Encino Street
Santa Cruz, CA 95061
408-427-7700
Swim, an OSF/Motif 1.2.3 run-time and development system, has been released by
Sequoia International. Swim runs on a variety of UNIX platforms including
Coherent 4.2 (in fact, Swim is the first Motif-compatible implementation for
Coherent), Linux 0.99, BSD/386 1.x, FreeBSD 1.0.2, and NetBSD 0.9.
Swim includes mwm, the window manager; a shared library, including libXm
(Linux only); static libraries, libXm, libMrm, and libUil; header and include
files; online manual pages; source code for OSF/Motif demo programs
(drag-and-drop, clipboard, periodic table, text editor, and more); and the
OSF/Motif user's guide. Swim sells for $149.95. Reader service no. 27.
Sequoia International Inc.
600 West Hillsboro Blvd., Suite 300
Deerfield Beach, FL 33441
305-480-6118
A suite of tools for developers of client/server applications for wireless
data networks, such as RAM Mobile Data and ARDIS, has been announced by Client
Server Technologies. Central to the MECCA (short for "Mobile Empowered
Collaborative Computing Architectures") toolset is AirClient, which contains
Windows DLLs required by RAM Mobile and ARDIS. Any app which supports an
external C call can be MECCA-enabled for wireless communication using
AirClient. The tool also includes file-transfer utilities and support for
several DOS commands. Other components of the MECCA toolkit include the
yet-to-be-released AirServer and MECCA/LAN which will enable RF and cellular
wireless apps for NetWare- and UNIX-based servers. The first release of the
MECCA toolkit supports the following GUIs: Paradox for Windows, Visual Basic,
Powerbuilder, and Gupta's SQL Windows.
An AirClient license costs $199.00 per PC, plus $495.00 for each GUI-specific
MECCA toolkit. A technical white paper discussing wireless communication is
available free of charge from Client Server. Reader service no. 28.
Client Server Technologies Inc.
1920 North Thoreau Drive, Suite 122
Schaumburg, IL 60173
708-397-7300
RJSwantek has released Dis Doc for Windows, a Windows version of its venerable
Dis Doc disassembler. With Dis Doc for Windows, you can disassemble Windows,
OS/2, OBJ, EXE, COM, and BIOS files. At the same time, the company announced
that it has upgraded its Dis Doc Professional for DOS disassembler to
disassemble Pentium-specific programs. Dis Doc for Windows sells for $299.00,
while Dis Doc Professional for DOS sells for $249.00. Reader service no. 29.
RJSwantek Inc.
33 Spencer Brook Rd.
New Hartford, CT 06057
800-336-1961
The SoftwareWedge from T.A.L. Enterprises lets you add two-way serial I/O
capabilities to any DOS, Windows, OS/2, or NT application. The tool also
provides the ability to parse and filter incoming data and add keystrokes and
date/time stamps. The Windows version of the tool supports DDE, letting you
place serial I/O buffers directly in other Windows or OS/2 apps.

SoftwareWedge Professional for DOS sells for $295.00, while SoftwareWedge for Windows
sells for $395.00. Reader service no. 30.
T.A.L. Enterprises
2022 Wallace Street
Philadelphia, PA 19130
215-763-2620
LOOX 2.0, from LOOX Software, is an object-oriented development tool for
creating GUIs for UNIX programs. With LOOX, you create a graphic
representation of any object with which the user must interact using
LOOXMaker, a vector-based drawing program for creating diagrams, schematics,
toolboxes, animation sequences, and the like. Then, with LOOX-lib, a
vector-based C-function library (with more than 200 functions), you build
LOOXMaker-created objects into apps. LOOX is compatible with most UNIX
systems, including those from Sun, HP, IBM, SGI, DEC, and SCO. LOOX is
integrated with OSF/Motif and the X Window System. The LOOX graphics
development system sells for $9950.00. Reader service no. 31.
LOOX Software Inc.
151 South Bernardo Ave., Suite 45
Sunnyvale, CA 94086
415-903-0942
Together/C++ for Windows is an object-oriented modeling and programming
environment which automatically and simultaneously updates an application's
object model and code. Together/C++, which was developed by Object
International, lets you edit in either an object-modeling window or a C++
programming window, which are displayed side-by-side. The tool then keeps the
two continuously in sync with each other. Together/C++ includes a full C++
parser to catch syntax problems, configuration management for team
programming, and an SQL-generation tool for building relational tables from
object-modeling results. Together/C++ for Windows sells for $4400.00. Reader
service no. 32.
Object International Inc.
8140 N. MoPac 4-200
Austin, TX 78759
512-795-0202
A book/CD-ROM combination entitled, Cross-Platform Power Tools, by Steve
Petrucci, has been released by Random House Electronic Publishing. The book is
based on XPLib, a set of libraries for Macintosh, Windows 3.1, and Windows NT
developers. The libraries include an API with over 300 functions for checking
and converting characters; managing events, dialogs, and menus; and controlling
fonts, text output, and more. The CD-ROM also includes utility and debugging
functions, as well as sample programs. In the book, Petrucci discusses aspects
of cross-platform development ranging from managing memory to printing between
platforms. ISBN 0-679-79147-7. The 432-page book sells for $55.00. Reader
service no. 33.
Random House Electronic Publishing
800-733-3000
PC Media, a DSP-based approach to incorporating sound, speech, video, and
other multimedia capabilities into one system (thereby eliminating the need
for multiple add-in cards), has been announced by Motorola. The system is
based on a speech-compression technology called "Truespeech," that was
developed by the DSP Group and licensed to Motorola for PC Media. System
designers can get Truespeech driver software from either Motorola or the DSP
Group.
Truespeech compresses a 1-minute voice file down to 60 Kbytes without
noticeable degradation. This compression is suitable for voice mail, voice
annotation, dictation, and the like. The compression scheme is based on
algorithms derived from the way airflow from our lungs is shaped by the
throat, mouth, and tongue when we speak. The DSP Group claims that its
approach is 5 to 15 times more efficient than other methods of digital voice
storage. Reader service no. 34.
DSP Group Inc.
2855 Kifer Road, Suite 200
Santa Clara, CA 95051
408-986-4300
Motorola Microcontroller Technologies Group
6501 William Cannon Drive
Austin, TX 78735
512-891-2030
The KIPP ImageControl toolset (from Kofax Image Products) is a suite of
object-oriented, imaging development tools for Visual Basic and Visual C++
developers who need to add imaging capabilities to applications or build
new Windows-based imaging apps. The KIPP (short for "Kofax Image Processing
Platform") drag-and-drop tools enable scanning and displaying, image deskewing
for OCR, text annotation, bar-code recognition, and the like.
When using the tools, you simply drop the appropriate image-control icon (such
as scanner control) into the programming workspace. You can also customize the
control settings through the Visual Basic properties interface. The KIPP suite
supports any Windows-compatible display and includes a royalty-free software
decompression engine with built-in Cornerstone ImageAccell support. KIPP sells
for $1495.00. Reader service no. 35.
Kofax Image Products
3 Jenner Street
Irvine, CA 92718
714-727-1733




























June, 1994
SWAINE'S FLAMES


Infobahn Cliché Kit


It was the phrase "road kill on the information superhighway" that got me
paying particular attention to the number and variety of highway metaphors
that have appeared in print and conversation lately, inspired by the term
"information highway." Although I guess now we're supposed to say "infobahn."
It seems there's a new infobahn cliché every day, until one columnist was
driven to call for a motortorium--uh, moratorium--on such metaphors.
But this is just misguided turf protection. Professional writers don't own the
language; at best, we have grazing rights. Granted, some of these figures of
speech are as attractive as road kill, but there is a natural progression
here. New language always passes through an annoying stage, in which it is
widespread but not yet so familiar that we no longer notice it. Cliché is just
a kitschy rest stop on the road from inventive to invisible.
In support of your right to neologize and metaphorm freely, I have compiled a
list of road-oriented language. Note that this is not a collection of infobahn
clichés; rather, it's an Infobahn Cliché Developer's Toolkit. Clichés that you
develop with it can be used freely, with no royalty or license fee. Here 'tis:
The infobahn (and the various infostreets, inforoads, infodriveways,
infotrails, infoblind alleys, and infogarden paths that lead to it) will
doubtless have its tollbooths and tollgates; cloverleafs, overpasses,
underpasses, bypasses, and business routes; stop lights, stop signs, direction
signs, warning signs, under-construction signs, Burma-Shave signs, and
billboards; truck stops (where Mrs. Gore is a big tipper), rest stops, pit
stops; chuckholes, S curves, steep grades, curbs and berms and islands and
soft shoulders.
We'll have our disparaging characterizations of our fellow drivers and
passengers: He's just a backseat driver, hitchhiker, carjacker, Sunday driver,
day tripper, joy rider, road warrior, road runner, crash-test dummy.
But what sort of vehicles will take us on our infodrives, spins, jaunts, and
junkets? Sports cars, luxury cars, trucks, or buses? We'll want a Porsche, but
will we settle for a Yugo? Will they have seatbelts and airbags, or running
boards and rumble seats? Will we plaster them with "Honk if you love Jesus"
and "Baby on board" stickers, next to those smileys? :-)
We'll watch the traffic reports, avoiding the bumper-to-bumper traffic, and
cross the yellow line to get in the fast lane or the passing lane or (if we
qualify) the car-pool lane, unless we're not in a hurry and decide to take the
scenic route.
But you've got to keep on truckin', put the pedal to the metal on that old
eight-lane, and get this show on the road. And if you miss your exit while
folding up the map, well then all roads lead to Rome--even the yellow brick
road, the road to Morocco, the road less taken, the back road, and off the
beaten track where the streets have no name and I'm in a rut.
All this assumes that the highway construction proceeds apace, because
somebody's got to pave the way, and you know the road to hell is paved with
good intentions (or is it inventions), and the road is my middle name so I'll
just have one for the road, I'm just along for the ride, we must explore every
avenue, so why don't we do it in the road?
That's it. I'm out of gas.
Michael Swaine, editor-at-large











































Special Issue, 1994
EDITORIAL


Why Ask Why?


Is multimedia really a solution without a problem? Can sound, music,
full-motion video, and graphics effects increase your productivity? Do you
really need a talking spreadsheet that tells you "I'll be back" upon
termination? Even the big multimedia-related vendors have trouble answering
questions such as these. That's why their response usually sounds more like a
"why-ask-why?" commercial for Bud Dry--an answer that tastes good, but is less
than filling.
New products--hardware and software--are cropping up faster than developers,
users, and even standards committees can keep up with them. Like it or not,
multimedia is reaching critical mass.
On the upside for developers, multimedia programming can be downright fun,
albeit at times confusing. Which video file format should you use? How can you
support all of the available sound cards? What are the best compression
schemes? Can a DSP really increase performance without sacrificing
flexibility? To help you answer this ever-growing list of questions, we're
bringing you Dr. Dobb's Sourcebook of Multimedia Programming, a special issue
of Dr. Dobb's Journal focusing on multimedia software development. Typical of
the tricks you'll uncover inside are Scott Anderson's techniques for morphing
and Neil Rowland's C++ class library for encapsulating Windows' low-level
waveform audio services. Other authors examine digital-video file formats,
audio-compression techniques, and programming animation under Windows. You'll
also read about the VESA committee's efforts to create a standardized software
interface for audio, and--in the spirit of the original Dr. Dobb's--a feature
on how to roll your own RS-232-based sound system. By the time you've finished
reading Dr. Dobb's Sourcebook of Multimedia Programming, you'll be on your way
to deciding what's hot and what's not when it comes to multimedia applications
development.
For instance, among the "what's hot" items is QSound, a virtual-audio
technology from Archer Communications (discussed by John Ratcliff) that allows left
and right panning within a 180-degree "soundscape." The sound technology was
first used by the film industry in the movie Willow, and has been used by
recording artists such as Paula Abdul and Madonna. Game developers have now
begun to include QSound in CD-ROM games. Although the QSound algorithms are
patented, Creative Technology has licensed the technology from Archer and is
providing, free of charge, a QSound API to registered Creative Labs/Technology
developers. Sierra Semiconductor has also licensed QSound technology and is
currently providing it in their Aria chipset.
Another area that's warming up is speech recognition, and several companies
are clamoring to tap into what some experts project will become a
one-billion-dollar industry by the end of the decade. At the high end, IBM has
ported its RS/6000-based, continuous-speech technology to the 486 and Pentium.
Included in its product line is a developer's edition codeveloped with
Carnegie Mellon University that sells for $299 and includes a 1000-word
vocabulary. Interestingly, both Creative Labs and Sierra Semiconductor have
added speech-recognition technology to their product lines. Sierra's Aria
Listener is a low-cost speech-recognition engine included as part of its
multimedia chipset that also includes the Aria Synthesizer for general-MIDI
support. Creative also has a Windows-based speech-recognition engine called
VoiceAssist. As with QSound, Creative is providing a VoiceAssist API, which is
available for $99.95. Just as interesting is a recent agreement between
Borland and Creative Labs to bundle an OWL 2.0-compatible VoiceAssist API in
Borland C/C++ 4.0.
Of course, all of this new capability comes at a price. To keep up with
emerging technologies, the Multimedia PC Marketing Council has upped the ante
on the definition of a "basic" multimedia system. The new MPC level 2
specification defines a multimedia PC as having a 25-MHz 486SX CPU, a minimum
of 4 Mbytes of RAM (with 8 Mbytes recommended), a double-speed CD-ROM drive
capable of transferring data at 300 Kbytes/second, a 160-Mbyte hard disk, and
a VGA+ adapter with a display resolution of 640x480 and 64,000 colors. (For
more information, contact the Multimedia PC Marketing Council, 1730 M Street
NW, Suite 707, Washington, DC 20036.)
Given this relatively high cost for entry, the technical hoops to make it
work, and the lack of clear productivity gains, you have to wonder if
multimedia is worth the effort.
Recall that in the early days of the PC, color added little to the immediate
productivity of users, yet the CGA quickly became standard fare. Obviously, if
history is any gauge, appearances are indeed important. Now, sight and sound
allow multimedia developers to involve users in ways not readily possible
before. What you see and hear is what you get.
Michael Floyd
Executive Editor












































Special Issue, 1994
Morphing on Your PC


An easy-to-use algorithm for animation and morphing




Scott Anderson


Scott is the president of Wild Duck, a software publishing and development
company in Rohnert Park, California. He is also the author of the animation
program Fantavision, and the recently released book Morphing Magic from SAMS.
He can be reached at 73710,1055 on CompuServe.


Morphing has become almost as ubiquitous as bubble-gum under theater seats. As
the July 1993 issue of Dr. Dobb's Journal illustrated, from autos turning into
tigers to bulls into bears, we see examples of morphing everywhere on
television and in movies. While the effect is compelling, morphing is useful
beyond the magical transformations we've grown used to seeing.
This article examines the history and mathematics behind morphing, and
includes three utility programs that allow you to run full-screen animated
morphs on your PC.


Warping, Tweening, and Dissolving


Morphing is a blend of three separate algorithms: warping, tweening, and
dissolving. Warping is the mathematical trick of stretching and squashing an
image as if it were painted on rubber. This article discusses an
implementation of one of the latest warping routines, based on the method used
by Pacific Data Images (PDI) to make the Michael Jackson video "Black or
White." The algorithm uses lines to control the warping. Lines make the
warping specification much easier than using points, giving the artist a
break.
Tweening is short for in-betweening, the interpolation of two images to yield
a smooth-flowing animation. Tweening is typically done with points, lines, or
polygons. Since this warping algorithm is line oriented, tweening fits right
in. By tweening the position of the control lines, you can smoothly warp an
image around. With warping and tweening alone, you can create photorealistic
animations from single photographs. And it's simple to do. (For more details,
see my book Morphing Magic, SAMS, 1993.)
Dissolving, or cross-dissolving, is Hollywood-speak for fading out one scene
while fading in the next. In the middle, you have a double exposure. This
powerful effect was used in the early Wolfman movies. While poor Lon Chaney,
Jr. stuck his head in a vise, makeup people swarmed about, adding hair and
putty. After each little change in makeup was completed, another few frames of
film were squeezed off. Each short take was cross-dissolved with the previous
one to complete the illusion.
When you put all three of these effects together, you end up with morphing.
Here's how it works. Let's say you want to morph yourself into a tiger. On the
first frame of your self-portrait, you mark the key areas with lines. You
might place a control line from one eye to the other, down the nose, and
across the lips. These three lines capture the most important aspects of a
face. Then, on the tiger's ferocious countenance, you would draw control lines
on the same features: eyes, nose, and mouth. That's enough information for the
computer to morph the two. To create a single morph in the middle, the control
lines are tweened halfway and then the images are warped to follow the new
lines. The algorithm warps your face midway to the tiger, and warps the tiger
midway to you. This is the fifty-fifty interpolation between the two sets of
control lines--the central tween. It is also simply the average of the two
sets.
After creating the two warped images, the routine cross-dissolves them,
finally producing the morph. If you use more tweens, say ten interpolations
between you and the tiger, you can make a smoothly animated sequence of you
turning into the tiger. The first frame uses 10 percent of the tiger mixed
with 90 percent of you. The second frame has 20 percent of the tiger with 80
percent of you. By the ninth frame, there is 90 percent tiger, and you have
faded out to 10 percent.
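This per-frame mix can be sketched in C. The names targetFraction and
blendChannel are illustrative helpers for this article, not routines from the
listings:

```c
#include <assert.h>

/* For tween i of n (i = 1..n), the fraction of the target image mixed in;
   with n = 9, frame 1 is 10 percent target and frame 9 is 90 percent. */
double targetFraction(int i, int n)
{
    return (double) i / (n + 1);
}

/* Cross-dissolve one 8-bit color channel: t is the target's share. */
int blendChannel(int src, int dst, double t)
{
    return (int) (src * (1.0 - t) + dst * t + 0.5);
}
```

Applying blendChannel to the red, green, and blue channels of every pixel, with
t stepping through targetFraction(1..n, n), produces the dissolve half of the
morph.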


Warping Algorithms


There are as many ways to warp as there are sort routines. The method employed
here is a variation on the PDI line method. As I mentioned before, this
approach is friendly to the artist. Other routines use dots to warp the image.
These changes are local and act only over a short distance, so you need
hundreds of them. Misplacing some of these dots and ruining the warp isn't at
all difficult. With the new method, each line represents the string of points
that compose it, so a few short lines can stand in for hundreds of points. As
you move a control line, the pixels around it are pulled as well.
One of the interesting peculiarities of this algorithm is the global nature of
the warping. A single line can specify any combination of scaling,
translation, and rotation. This can be a nice effect in itself. Just draw,
say, a vertical line, rotate it 90 degrees, and the image will also be
rotated. When there are more lines (as there usually are), they will each
compete for influence, but the effect will still be global. The downside is
that it takes a little longer to perform the warping.
In the algorithm presented here, the influence of the line falls off as
1/distance^2. This is just like the influence of gravity, but you can select
any variation on this that you desire. This formulation was chosen based
largely on speed considerations. A mathematical discussion of the warping
algorithm is in the text box entitled, "Warping with Control Lines." The basic
warping algorithm is as follows:
1. Find the distance between a pixel and the source line by dropping the
perpendicular d (see Figure 1).
2. Find the fractional distance f along the source line to the perpendicular.
The fraction is normalized to go from 0 to 1.
3. Move the point to a spot that is the same distance from, and fraction of,
the target line.
This algorithm is carried out by two routines, sumLines and getSourceLoc; see
Listing One, page 75. In getSourceLoc, the warped pixel is determined for each
line, while sumLines adds up the influence of each line. GetSourceLoc performs
the mathematics given in the text box entitled, "Warping with Control Lines."
SumLines uses the distance returned by getSourceLoc to determine the weight
contributed by each control line. In a loop through the lines, SumLines
calculates the individual weights, as shown in Figure 2. You can experiment
with the weight calculation (.001 is added to avoid division-by-zero errors).
Try equations that depend on the length of the line, or inversely on the cube
of the distance. The method used here is simple and fast, but don't be afraid
to try something new. You never know what will happen!


Tweening Algorithm


The second piece of the morphing trio is tweening. Compared with warping,
tweening is a piece of cake. In the simplest case of linear tweening, all you
do is interpolate. Linear interpolation is just finding a point on the line
connecting two others. Although there are dozens of ways to interpolate, for
our purposes linear tweening is just fine. All we do is take the lines
describing the source image and tween them into the lines describing the
target. For each intermediate tween, we warp the images accordingly.
For speed, divide the distance between the points into as many steps as you
desire, say ten. You will have an X and a Y component for the length of these
segments, called "deltas." For a regular division of the line, all these
deltas will be equal to one-tenth of the line distance, so you only need to
calculate it once and then simply add it to the starting point. After ten
additions of this delta, you will have arrived at the target point. The deltas
are given by the simple equations in Figure 3. Adding a delta in this fashion
is fast, but care must be taken to avoid round-off errors.
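A minimal sketch of this interpolation, with tweenCoord as an illustrative
helper rather than a routine from the listings:

```c
#include <assert.h>

/* Position of a tweened endpoint coordinate at step i, where i = 0 is the
   source and i = tweens + 1 is the target, per the deltas in Figure 3.
   The final +0.5f rounds rather than truncates, which assumes nonnegative
   screen coordinates. */
int tweenCoord(int src, int dst, int tweens, int i)
{
    float delta = (float) (dst - src) / (tweens + 1);
    return (int) (src + i * delta + 0.5f);
}
```

Recomputing src + i * delta each step, as here, sidesteps the drift that
repeatedly adding a rounded delta would accumulate.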


Dissolving Algorithm


Now we are at the third and final part of morphing, the dissolve. This is
where you combine the two images you are warping and tweening. Mix any two
colors just the way you might expect. The color midway between two colors is
given by the formulas in Figure 4. On a 24-bit color system, the color you get
from this calculation will always be another color you can display. Not so
with color-mapped graphics cards. You can easily derive a color that isn't in
the puny list of displayable colors called the "palette." The best most VGA
cards can offer is 256 colors, which is pretty sad for this application.
To fix this you need to calculate all the colors as if you had 24 bits to play
with. Then, you need to create a new palette that uses the 256 most popular
colors. The rest of the colors must be forced into the closest color available
from those 256. This process is referred to as "palette mapping."
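The "closest color available" step is typically a brute-force nearest-neighbor
search over the palette. A hedged sketch follows; nearestIndex and the Rgb
struct are illustrative, not taken from the listings:

```c
#include <assert.h>
#include <limits.h>

typedef struct { unsigned char r, g, b; } Rgb;

/* Map a 24-bit color to the palette entry with the smallest squared
   distance in RGB space. */
int nearestIndex(const Rgb *pal, int palSize, Rgb want)
{
    long best = LONG_MAX;
    int bestIdx = 0;
    int i;
    for (i = 0; i < palSize; i++) {
        long dr = (long) pal[i].r - want.r;
        long dg = (long) pal[i].g - want.g;
        long db = (long) pal[i].b - want.b;
        long dist = dr * dr + dg * dg + db * db;
        if (dist < best) {
            best = dist;
            bestIdx = i;
        }
    }
    return bestIdx;
}
```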
So dissolving, which should be the smallest part of this whole algorithm,
suddenly swells into an unsightly carbuncle on this otherwise straightforward
code. Those are the basics of morphing. There are many different algorithms
for performing this terrific effect, but this should serve as a good
jumping-off point for further experimentation.


The Morphing Programs



So that you can experiment, I've included three programs: MORPH.C (Listing
Three, page 77), LOAD.C (Listing Four, page 78), and FIX.C (Listing Five, page
78). The header containing the #defines for all three of these utilities is in
Listing Two, page 76. MORPH gets the user input and creates the morphing
sequence, FIX finds a good palette and LOAD plays the sequence back from
memory. All the programs work with PCX files having 256 colors and 320x200
resolution. You can capture your own PCX files with any screen-capture
program, or you can find GIF images on CompuServe and convert them with a
program such as ZSoft's Paint Shop. Remember to convert the number of colors
to 256. If you have access to a scanner, then your troubles are over. You can
scan in your own photos and magazine clips and go from there.
When specifying a PCX file, you don't need to type in the file extension. The
programs automatically append ".PCX" to the end of each filename, saving you
the trouble.
Some of these programs produce a numbered sequence of output files. These are
also PCX files. The maximum sequence size is 99. The sequence number, from 1
to 99, is appended to the root name of the file sequence. This takes two
characters away from the already-tiny DOS allotment of eight. Therefore, the
<OutFile> name must be six characters or less.
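The 8.3 constraint can be sketched as a small helper; seqName is illustrative,
not one of the utilities' actual routines:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Build an 8.3 sequence filename such as JIMBOB7.PCX. With up to a
   two-digit sequence number, the root must be six characters or less to
   fit DOS's eight-character limit; returns NULL on overflow. */
const char *seqName(const char *root, int seq)
{
    static char name[13];   /* "XXXXXX99.PCX" plus the terminator */
    if (strlen(root) > 6 || seq < 1 || seq > 99)
        return NULL;
    sprintf(name, "%s%d.PCX", root, seq);
    return name;
}
```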
Figure 5 shows the syntax used on the command line to call the MORPH program.
MORPH first reads the two input files: the source and target PCX images. Then
it looks on the disk for any existing control lines that might be associated
with the source or target image. If the user okays it, these line files are
loaded in. Otherwise, the user creates a new set of control lines from
scratch, and these are saved to the disk. Morph metamorphoses between the two
in the given number of steps, to make a sequence of files. The files are
numbered and can be displayed by the LOAD program. As an example, issuing the
command line MORPH JIM BOB will cause MORPH to load the files JIM.PCX and
BOB.PCX. It will create the middle tween and display it. No files are created.
However, MORPH JIM BOB 7 will load the files JIM.PCX and BOB.PCX, create a
sequence of seven frames, and display them. Again, no files are created. Finally,
MORPH JIM BOB 7 JIMBOB will also load JIM.PCX and BOB.PCX and create a
sequence of seven frames. Each frame is displayed while it is saved to the
disk. The files will be named JIMBOB1.PCX, JIMBOB2.PCX, and so on.
Figure 6 shows the syntax used to call the LOAD program. LOAD reads a sequence
of PCX files indicated by the given name. It reads as many as it can fit into
memory, then it displays them sequentially. You can control the playback rate
by pressing a number from 1 (slow) to 9 (fast). To single-step through the
pictures, press Spacebar. To continue playback, press Enter. To quit, press
Esc. The number of images that can be animated depends on how much free RAM
you have. Get rid of your TSRs and use the new memory features of DOS 6.0. The
best you can do in 640K is about eight images.
The command-line syntax used to call FIX is in Figure 7. FIX takes a sequence
created by MORPH and forces each picture in the sequence to have the same
palette. FIX then writes the sequence out under the new name. The command FIX
JIM FXJIM reads the sequence JIM1.PCX, JIM2.PCX, and so on, outputting the
same number of files named FXJIM1.PCX, FXJIM2.PCX, and so on. Likewise, FIX JIM JIM
reads the JIM sequence and writes over the original files. Be careful! Make
sure you don't need the originals before you do this.


Compiling


These programs were all compiled with Microsoft C/C++ 7.0. If you didn't
include the graphics library when you installed your compiler, you will need
to link that library in at compile time. I've included a makefile, which is
available electronically; see "Availability," page 2. The makefile is set up
assuming that you haven't installed the graphics library. If you have
installed the graphics library, then you'll have to modify this makefile
accordingly.
To make the programs work with other C compilers, you will probably need to
make some changes to the non-ANSI library calls. In Microsoft C, these library
calls are prefixed with the underscore character. The non-ANSI library calls
are all in IO.C (available electronically) and LINECALC.C (Listing One).


Conclusion


The most obvious upgrade to this program is the ability to use extended RAM,
so you can hold more frames in memory. If you have access to some of the new
software on the market that attempts to leap the 640K barrier, use it. If you
can work with smaller images, say, quarter-screen, 160x100 resolution, you
should be able to quadruple the number of frames that can be held in memory,
not to mention speeding up the animation. The display-picture routines you
need to change are in Listing Five. Currently, they slam a continuous string
of bytes into memory. For smaller screens, you will need to move the image a
line at a time.
If you have a true-color card, you don't need the agonizing color-collapsing
routines from Listing Six (page 79). Just yank them all out. You will need to
save the files in 24-bit color mode, which can present some data problems, so
look at the drivers and documentation for your particular graphics card.
With these programs, you can create some amazing morphs. With a video capture
board and a computer capable of displaying 30 frames per second, you can
output to videotape. Now you can play with some of the same toys they have at
Industrial Light and Magic and Pacific Data Images. Have fun!
 Figure 1: A given point P has a relationship to the line AB that gets mapped
to the warping line A'B'.
Figure 2: Calculating individual weights.
distance = getSourceLoc(&orig,origline,warp,warpline);
weight = 1/(.001+distance*distance);
deltaSumX += (orig.x - warp->x) * weight;
deltaSumY += (orig.y - warp->y) * weight;


Figure 3: Equations to calculate the interpolation deltas.
DeltaX = (TargetX - SourceX) / (Tweens + 1)
DeltaY = (TargetY - SourceY) / (Tweens + 1)


Figure 4: Calculating the color midway between two colors.
NewRed = (SourceRed + TargetRed) / 2
NewGreen = (SourceGreen + TargetGreen) / 2
NewBlue = (SourceBlue + TargetBlue) / 2


Figure 5: Command-line syntax for calling MORPH.
MORPH <File1> <File2> [<Steps> [<OutFile>]]

<File1> is the name of the source PCX file to morph.
<File2> is the name of the target PCX file.
<Steps> is the optional number of output files.

If you don't specify the number of steps in your sequence, the program
produces one in-between morph. You must specify <Steps> if you want to output
files.
<OutFile> is the root name of the optional output files. The value of <Steps>
determines the number of files to output. The step number is appended to the
root name, so the <OutFile> name must be six characters or less.
Figure 6: Using the LOAD program.
LOAD <File>

<File> is the root name of a PCX sequence to display.
Figure 7: Command-line syntax used to invoke the FIX program.
FIX <InFile> <OutFile>

<InFile> is the root name of the PCX sequence to fix.
<OutFile> is the root name of the fixed PCX sequence.



A History of Warping


Warping's history reaches back to the '60s space program, when NASA was
snapping shots of the earth like an eager tourist. But when it came time to
put all the pictures together, NASA discovered that nothing quite lined up
correctly because of the different altitudes, angles, times, and optics for
each shot. Consequently, researchers at NASA developed algorithms that treated
the digitized data as points on a polynomial surface that could be stretched
to fit a set of reference points. They called their efforts image
registration. Those marvelous Landsat pictures were thus stretched and pulled
into a big quilt of pictures blanketing the globe. This was a great start for
a little warping routine.
Warping was dusted off in the mid-seventies, when Viking 2 landed on Mars.
Unfortunately, Viking came to rest at an angle, with its camera pointing down.
The landscape it depicted looked like the Big Valley, with the horizon
strongly curved. All the images of Mars were pushed through a warping
algorithm that corrected for the odd optics before the public ever saw them.
Warping has found its way into the medical world, too. In a procedure called
"digital subtraction angiography," an X-ray of a patient is taken before and
after the injection of a dye into the area of interest. By subtracting the
first image from the second, everything is removed but the dyed arteries. This
uncluttered image is of great value to the diagnostician. Unfortunately, if
the patient moves, the effect is wrecked. And, except for tree surgeons,
moving patients are the norm. As you might have guessed, warping is the
perfect tool to ensure registration.
-- S.A.


Warping with Control Lines


Given point P in the source image, you want to deduce point P' in the target
image (see Figure 1). The distance d is the projection of vector AP on the
perpendicular, which yields AP cos a, where AP represents the magnitude of the
vector. The dot product of vector AP with the perpendicular to AB (denoted
^AB) is defined in Example 1(a), where AP and AB are the magnitudes, and the
subscripted variables are the x and y components of the vectors. You can see
that this solution in terms of components, (APx*^ABx+APy*^ABy), doesn't
involve any trigonometry. Example 1(b) solves for d.
The magnitude of the line itself, AB, is used instead of the magnitude of the
perpendicular, ^AB. The reason is that these two are equal, and you can reuse
this number. The vector dot product allows you to calculate the desired values
without computing any angles or using any transcendental functions. That makes
it both simple and fast. From Figure 1, note that the distance represented by
f is the projection of AP on AB itself, AP cos b, which is the quantity
(AP·AB)/AB. We want the fractional part of line AB, so we divide by the length
of AB again as shown in Example 1(c).
Examples 1(b) and 1(c) represent a translation into a new, scaled orthogonal
coordinate system based on d and f, instead of x and y. Now you are ready to
transfer these two important relationships over to the new line, A'B'. The
fractional part, the "f-axis," is simply the fractional part of the new line,
A'B': f*A'B'. Next, apply the distance to the "d-axis," which is perpendicular
to the new line. The unit d-vector is (^A'B')/A'B'. So, with the new origin at
A', the source pixel P (represented as a vector from the origin) is
transformed into the destination pixel P'; see Example 1(d). This algorithm
uses the relationship a point has with the original line and applies it to the
new line. To be of real use, however, you need more lines. They all have to
compete for pixels. As mentioned in the text, this implementation uses a
weighting proportional to one over the distance squared.
In the calculation for two control lines, the distance to the point is
computed from each line, and a new pixel is calculated as before. But this
time there are two lines and therefore two warped pixels, so some further work
is needed. As shown in Figure 8, from the original point P to the new points
P1' and P2' there are two displacements, D1 and D2; see Example 1(e). The
routine calculates the weighted average of the two displacements to arrive at
the final position, P'; see Example 1(f). If there are three lines, there are
three displacements to average, and so on. Finally, the general equation for
calculating with n lines is shown in Example 1(g).
--S.A.
 Example 1(a): Dot product of vector AP with the perpendicular to AB; (b)
solving for d; (c) calculating the fractional part of line AB; (d)
transforming the source pixel P into the destination pixel P'; (e) determining
the displacements D1 and D2; (f) weighted average of the two displacements;
(g) solving for the general case using n lines.
 Figure 8: Where two lines contribute influence, the warped point is the
weighted average of the displacements D1 and D2.
[LISTING ONE] (Text begins on page 4.)

/****************************************************************
* FILE: linecalc.c
* DESC: Warping calculations and line-handling functions.
* HISTORY: Created 3/11/1993 LAST CHANGED: 5/ 6/1993
* Copyright (c) 1993 by Scott Anderson
****************************************************************/

/* ----------------------INCLUDES----------------------------- */
#include <conio.h>
#include <stdio.h>
#include <io.h>
#include <math.h>
#include <graph.h>
#include <malloc.h>
#include <memory.h>
#include <string.h>
#include "define.h"

/* -----------------------MACROS------------------------------ */
#define PIXEL(p,x,y) ((p)->pixmap[(y) * (long) (p)->wide + (x)])
#define SQUARE(x) (((long) (x))*(x))

/* ----------------------PROTOTYPES--------------------------- */
/**** line routines ****/
int xorLine(int x1, int y1, int x2, int y2);
int getLine(int *argx1, int *argy1, int *argx2, int *argy2);
int findPoint(LINE_LIST *lineList, int * line, int * point, int x, int y);
int movePoint();
/**** warping and morphing routines ****/
int sumLines(PICTURE *picture, COLOR *color, LINE *origline,
 POINT *warp, LINE *warpline);
float getSourceLoc(POINT *orig, LINE *origline, POINT *warp, LINE *warpline);
int setLength(LINE *line);
void setupScreen(PICTURE *pic, int editFlag);

/* ----------------------EXTERNALS---------------------------- */

/* set from last picture loaded */
extern int Xmin, Ymin, Xmax, Ymax;
extern int NumLines;
extern LINE SrcLine[MAX_LINES];
extern LINE DstLine[MAX_LINES];

/* ----------------------GLOBAL DATA-------------------------- */
int TargFlag=0;
/******** These are the basic warping calculations **********/
/*****************************************************************
* FUNC: int sumLines(PICTURE *picture, COLOR *color,
* LINE *origline, POINT *warp, LINE *warpline)
* DESC: Sum and weight the contribution of each warping line
*****************************************************************/
int
sumLines(PICTURE *picture, COLOR *color, LINE *origline,
 POINT *warp, LINE *warpline)
{
 int x, y;
 float weight, weightSum;
 float distance;
 int line;
 POINT orig;
 int paletteIndex;
 float deltaSumX = 0.0;
 float deltaSumY = 0.0;
 /* if no control lines, get an unwarped pixel */
 if (NumLines == 0)
 orig = *warp;
 else {
 weightSum = 0.0;
 for (line = 0; line < NumLines; line++, origline++, warpline++) {
 distance = getSourceLoc(&orig,origline,warp,warpline);
 weight = 1/(.001+distance*distance);
 deltaSumX += (orig.x - warp->x) * weight;
 deltaSumY += (orig.y - warp->y) * weight;
 weightSum += weight;
 }
 orig.x = warp->x + deltaSumX / weightSum + .5;
 orig.y = warp->y + deltaSumY / weightSum + .5;
 }
 /* clip it to the nearest border pixel */
 x = clip(orig.x, Xmin, Xmax);
 y = clip(orig.y, Ymin, Ymax);
 paletteIndex = PIXEL (picture, x, y);
 color->r = picture->pal.c[paletteIndex].r;
 color->g = picture->pal.c[paletteIndex].g;
 color->b = picture->pal.c[paletteIndex].b;
 return (paletteIndex);
}
/*****************************************************************
* FUNC: float getSourceLoc(POINT *orig, LINE *origline,
* POINT *warp, LINE *warpline)
* DESC: For a given line, locate the corresponding warped pixel
*****************************************************************/
float
getSourceLoc(POINT *orig, LINE *origline, POINT *warp, LINE *warpline)
{
 float fraction, fdist;
 int dx, dy;

 float distance;
 dx = warp->x - warpline->p[0].x;
 dy = warp->y - warpline->p[0].y;
 fraction = (dx * (long) warpline->delta_x + dy
 * (long) warpline->delta_y) / (float) (warpline->length_square);
 fdist = (dx * (long) -warpline->delta_y + dy
 * (long) warpline->delta_x) / (float) warpline->length;
 if (fraction <= 0 )
 distance = sqrt(dx*(long) dx + dy * (long) dy);
 else if (fraction >= 1) {
 dx = warp->x - warpline->p[1].x;
 dy = warp->y - warpline->p[1].y;
 distance = sqrt(dx*(long) dx + dy * (long) dy);
 }
 else if (fdist >= 0)
 distance = fdist;
 else
 distance = -fdist;
 orig->x = origline->p[0].x + fraction * origline->delta_x -
 fdist * origline->delta_y / (float) origline->length + .5;
 orig->y = origline->p[0].y + fraction * origline->delta_y +
 fdist * origline->delta_x / (float) origline->length + .5;
 return distance;
}
/*****************************************************************
* FUNC: int setLength(LINE *line)
* DESC: Set the deltas, the length and the length squared for a given line.
*****************************************************************/
int
setLength (LINE *line)
{
 line->delta_x = line->p[1].x - line->p[0].x;
 line->delta_y = line->p[1].y - line->p[0].y;
 line->length_square = SQUARE(line->delta_x) + SQUARE(line->delta_y);
 line->length = sqrt(line->length_square);
}
/********************* The line routines **********************/
/*****************************************************************
* FUNC: int xorLine(int x1, int y1, int x2, int y2)
* DESC: Draw a line on the screen using the XOR of the screen index.
*****************************************************************/
int
xorLine(int x1, int y1, int x2, int y2)
{
 int oldcolor = _getcolor();
 _setcolor(WHITE); /* Use white as the xor color */
 _setwritemode(_GXOR);
 _moveto (x1,y1);
 _lineto (x2,y2);
 _setcolor(oldcolor); /* restore the old color */
}
/*****************************************************************
* FUNC: int getLine(int *argx1, int *argy1, int *argx2, int*argy2)
* DESC: Input a line on the screen with the mouse.
*****************************************************************/
int
getLine (int *argx1, int *argy1, int *argx2, int *argy2)
{
 int x1,y1, x2,y2;
 int oldx, oldy;

 int input;
 /* save the current mode */
 short old_mode = _getwritemode();
 /* get input until we have a real line, not just a point */
 do {
 /* wait for button or key press */
 while (!(input = mousePos (&x1, &y1)));
 if (input & KEYPRESS) {
 _setwritemode(old_mode);
 return 1;
 }
 oldx=x1, oldy=y1;
 hideMouse();
 /* prime the pump with this dot */
 xorLine (x1, y1, oldx, oldy);
 showMouse();
 while (input = mousePos (&x2, &y2)) {
 /* rubber band a line while the mouse is dragged */
 if (x2 != oldx || y2 != oldy)
 {
 hideMouse();
 xorLine (x1, y1, oldx, oldy);
 xorLine (x1, y1, x2, y2);
 showMouse();
 oldx=x2, oldy=y2;
 }
 }
 } while (x1 == x2 && y1 == y2);
 *argx1 = x1, *argy1 = y1;
 *argx2 = x2, *argy2 = y2;
 _setwritemode(old_mode); /* get out of XOR mode */
 return (0);
}
/*****************************************************************
* FUNC: int findPoint(LINE_LIST *lineList,int * line,int * point,int x, int y)
* DESC: loop thru dstline and find point within GRAB_DISTANCE,
* return 1 if found, 0 otherwise.
*****************************************************************/
int
findPoint (LINE_LIST *lineList, int * line, int * point, int x, int y)
{
 int l, p;
 int minl, minp;
 long length;
 long minlength = SQUARE(640) + SQUARE(480);
 for (l = 0; l < lineList->number; l++) {
 for (p = 0; p <= 1; p++) {
 length = SQUARE(lineList->line[l].p[p].x - x)
 + SQUARE(lineList->line[l].p[p].y - y);
 if (length < minlength) {
 minlength = length;
 minl = l;
 minp = p;
 }
 }
 }
 if (minlength > GRAB_DISTANCE)
 return 0;
 *line = minl;

 *point = minp;
 return 1;
}
/*****************************************************************
* FUNC: int movePoint(LINE_LIST *lineList)
* DESC: Grab a point and move it. Return 1 when key is pressed, else return 0.
*****************************************************************/
int
movePoint(LINE_LIST *lineList)
{
 int stuckx, stucky, movex,movey;
 int oldx, oldy;
 int input;
 int line, point;
 /* save the current mode */
 short old_mode = _getwritemode();
 do {
 /* keep getting input until we have a mouse button */
 while (!(input = mousePos (&movex, &movey)));
 if (input & KEYPRESS) {
 _setwritemode(old_mode);
 return 1;
 }
 if (!findPoint(lineList, &line, &point, movex, movey)) {
 _setwritemode(old_mode);
 return 0;
 }
 /* establish fixed end point */
 stuckx = lineList->line[line].p[1-point].x;
 stucky = lineList->line[line].p[1-point].y;
 oldx=movex, oldy=movey;
 hideMouse();
 /* erase the old line */
 xorLine (stuckx, stucky,
 lineList->line[line].p[point].x,
 lineList->line[line].p[point].y);
 /* and prime the pump with the new line */
 xorLine (stuckx, stucky, oldx, oldy);
 showMouse();
 while (input = mousePos (&movex, &movey)) {
 /* rubber band a line while the mouse is dragged */
 if (movex != oldx || movey != oldy) {
 hideMouse();
 xorLine (stuckx, stucky, oldx, oldy);
 xorLine (stuckx, stucky, movex, movey);
 showMouse();
 oldx=movex, oldy=movey;
 }
 }
 } while (stuckx == movex && stucky == movey);
 lineList->line[line].p[point].x = movex;
 lineList->line[line].p[point].y = movey;
 _setwritemode(old_mode); /* get out of XOR mode */
 return (0);
}
/*****************************************************************
* FUNC: void createLines(PICTURE *pic, LINE_LIST *lineList)
* DESC: create a list of line segments for a picture
*****************************************************************/

void
createLines(PICTURE *pic, LINE_LIST *lineList)
{
 setupScreen(pic, 0); /* set for enter prompt */
 initMouse();
 showMouse();
 for (lineList->number = 0;lineList->number < MAX_LINES;
 lineList->number++) {
 if (getLine(&lineList->line[lineList->number].p[0].x,
 &lineList->line[lineList->number].p[0].y,
 &lineList->line[lineList->number].p[1].x,
 &lineList->line[lineList->number].p[1].y))
 break;
 }
 hideMouse();
}
/*****************************************************************
* FUNC: void editLines(PICTURE *pic, LINE_LIST *lineList)
* DESC: move around some existing lines
*****************************************************************/
void
editLines(PICTURE *pic, LINE_LIST *lineList)
{
 int segment;
 setupScreen(pic, 1); /* set for edit prompt */
 initMouse();
 for (segment = 0; segment < lineList->number; segment++) {
 xorLine(lineList->line[segment].p[0].x, lineList->line[segment].p[0].y,
 lineList->line[segment].p[1].x, lineList->line[segment].p[1].y);
 }
 showMouse();
 /* move the endpoints around */
 while(!movePoint(lineList));
 hideMouse();
}
/*****************************************************************
* FUNC: void setupScreen(PICTURE *pic, int editFlag)
* DESC: Print a message introducing the screen, wait for input,
* then set the graphics mode and display the screen.
*****************************************************************/
void
setupScreen(PICTURE *pic, int editFlag)
{
 static char *editMess[2] = {"enter", "edit"};
 static char *targMess[2] = {"source", "target"};
 setTextMode();
 _settextposition(VTAB, HTAB);
 printf("When you are ready to %s the control lines", editMess[editFlag]);
 _settextposition(VTAB+2, HTAB);
 printf("for the %s image, press any key.", targMess[TargFlag]);
 waitForKey();
 setGraphicsMode();
 displayPicture(pic);
}

[LISTING TWO]

/****************************************************************
* FILE: define.h
* DESC: These are the main defines for dissolve, warp, morph, load and fix.
* HISTORY: Created 1/11/1993 LAST CHANGED: 5/ 6/1993
* Copyright (c) 1993 by Scott Anderson
****************************************************************/

/* ----------------------DEFINES------------------------------ */
#define ON 1
#define OFF 0
#define MAX_TWEENS 99 /* Maximum tweens (2 digits) */
/* minus 2 digit tween# appended to end */
#define MAX_NAME_SIZE (8-2)
#define HEADER_LEN 128 /* PCX header length */
/* Number of colors in the palette */
#define COLORS 256
/* bytes in palette (COLORS*3) */
#define PALETTE_SIZE (3*COLORS)
/* Maximum number of morphing lines */
#define MAX_LINES 32
/* max number of pixels wide we handle */
#define MAX_WIDE 320
/* max number of pixels tall we handle */
#define MAX_TALL 200
/* Size of screen buffer */
#define MAX_BYTES (MAX_WIDE*(long) MAX_TALL)
/* Number of components per color (RGB) */
#define COMPS 3
/* largest color component value */
#define MAX_COMP 32
/* the midpoint of the colors - for gray */
#define MID_COMP (MAX_COMP/2)
/* enough to handle about 10 different palettes */
#define MAX_FREQ 1023
#define MAX_FILES 10
/* length of a file name including directory */
#define MAX_PATHLEN 80
#define ENTER 13 /* Keyboard values */
#define ESC 27
#define HTAB 18 /* Position for text messages */
#define VTAB 8
/* The mouse button & keyboard constants */
#define NO_BUTTON 0
#define LEFT_BUTTON 1
#define RIGHT_BUTTON 2
#define KEYPRESS 4
/* the square of min dist for grabbing pt */
#define GRAB_DISTANCE 25
/* Some of the graphics colors */
#define BLACK 0
#define WHITE 255
#define EXT_PCX ".PCX" /* pcx file extension */
/* primary line file holder extension */
#define EXT_LINE1 ".LN1"
#define EXT_LINE2 ".LN2" /* aux file for warp lines */
#define ERROR -1 /* General-purpose error code */

typedef enum {
 NO_ERROR, /* first entry means everything is ok */
 MEMORY_ERR, /* Not enough memory */
 READ_OPEN_ERR, /* Couldn't open file for reading */
 READ_ERR, /* Trouble reading the file */
 WRITE_OPEN_ERR, /* Couldn't open the file for writing */
 WRITE_ERR, /* Couldn't write the file */
 MOUSE_ERR, /* No mouse driver found */
 WRONG_PCX_FILE, /* PCX file format not supported yet */
 READ_CONTENTS_ERR /* error in .LN file */
}
ERR;
/* -----------------------MACROS------------------------------ */
#define MIN(a,b) (((a)<(b)) ? (a) : (b))
#define PIXEL(p,x,y) ((p)->pixmap[(y) * (long) (p)->wide + (x)])
#define SQUARE(x) (((long) (x))*(x)) /* note: evaluates x twice */
/* ----------------------TYPEDEFS----------------------------- */
typedef struct {
 int x,y; /* the screen coordinates of the point */
}
POINT;
typedef struct {
 POINT p[2];
}
LINE_SEGMENT;
typedef struct {
 int number; /* number of segments to follow */
 LINE_SEGMENT line[MAX_LINES];
 char *filename; /* name of file holding the line list */
}
LINE_LIST;
typedef struct {
 POINT p[2]; /* the endpoints */
 int delta_x, delta_y; /* x & y displacement */
 float length; /* the precalculated length of the line */
 long length_square; /* the length squared */
}
LINE;
typedef struct {
 /* red, green, and blue color components */
 unsigned char r, g, b;
}
COLOR;
typedef struct {
 COLOR c[COLORS]; /* a 256 entry palette */
}
PALETTE;
typedef struct {
 int xmin, ymin; /* the upper left corner */
 int xmax, ymax; /* the lower right corner */
 int wide, tall; /* the width and height */
 int pal_id; /* an ID number for each palette */
 PALETTE pal; /* the actual palette is here */
 unsigned char far *pixmap; /* a pointer to the pixel map */
}
PICTURE;
typedef struct linko {
 struct linko *next;
 char *str;
}
LINKED_LIST;
/* ----------------------PROTOTYPES--------------------------- */
/**** file handling routines ****/

extern PICTURE *loadPicture(char *filename);
extern int loadPalette(FILE *fp, PALETTE *palette);
extern int getBlock (unsigned char *byte, int *count, FILE *fp);
extern int mustRead(FILE *fp, char *buf, int n);
extern int saveScreen(PALETTE *pal);
extern int putBlock(unsigned char num, unsigned char color, FILE *fp);
extern int writeByte(unsigned char *byte, FILE *fp);
/**** screen and color routines ****/
extern int defaultPalette(PALETTE *palette);
extern int setPalette(PALETTE *palette);
extern int displayPicture(PICTURE *picture);
extern int displayNoPal(PICTURE *picture);
extern int freePicture(PICTURE *pic);
/**** mouse routines ****/
extern int initMouse();
extern int hideMouse();
extern int showMouse();
extern int mousePos(int *x, int *y);
/**** general purpose routines ****/
extern int clip(int num, int min, int max);
extern int quitCheck();
extern void quit(int err, char *name);
extern int wait(int count);
extern int waitForKey();
extern char lineAsk(char *name);
/* ----------------------GLOBAL DATA-------------------------- */
extern int TargFlag;
extern int Key;

[LISTING THREE]

/****************************************************************
* FILE: morph.c
* DESC: Create a metamorphosing sequence between two given images. This
* program lets you specify two files to morph, then prompts you for
* control lines. It uses the lines to warp the underlying images a step
* at a time, combine them, and optionally save them as numbered PCX files.
* HISTORY: Created 1/13/1993 LAST CHANGED: 5/ 6/1993
* Copyright (c) 1993 by Scott Anderson
****************************************************************/

/* ----------------------INCLUDES----------------------------- */
#include <conio.h>
#include <stdio.h>
#include <io.h>
#include <math.h>
#include <graph.h>
#include <malloc.h>
#include <memory.h>
#include <string.h>
#include <stdlib.h> /* for atoi() and exit() */
#include "define.h"

/* ----------------------DEFINES------------------------------ */
#define MORPH_TWEENS 1

/* ----------------------PROTOTYPES--------------------------- */
int tweenMorph(PICTURE *src, PICTURE *dst);

/* ----------------------EXTERNALS---------------------------- */

/**** color routines ****/
extern int closestColor(int r, int g, int b, PALETTE *palPtr);
extern void collapseColors(PALETTE *palPtr);
/**** line routines ****/
extern int setLength(LINE *line);
extern int sumLines(PICTURE *picture, COLOR *color,
 LINE *origline, POINT *warp, LINE *warpline);
/**** io routines ****/
extern LINE_LIST *loadLines(char *filename, char *extension);
extern void saveLines(char *filename,
 LINE_LIST *lineList, char *extension);

/***** variables used to compute intermediate images ****/
/* number of colors in tweened image before reduction*/
extern int Ncolors;
/* r, g, b frequency counter array */
extern unsigned int far Freq[MAX_COMP][MAX_COMP][MAX_COMP];
/* tweened images red, grn, and blu components*/
extern unsigned char far Red[MAX_WIDE][MAX_TALL];
extern unsigned char far Grn[MAX_WIDE][MAX_TALL];
extern unsigned char far Blu[MAX_WIDE][MAX_TALL];
extern PALETTE TweenPal; /* resulting palette */

/**** other variables ****/
extern char *OutFilename;
/* set from last picture loaded */
extern int Xmin, Ymin, Xmax, Ymax;
/* ID of palette currently being displayed */
extern int CurrentPal;

/* ----------------------GLOBAL DATA-------------------------- */
PICTURE *Src; /* source & destination picture pointers */
PICTURE *Dst;
LINE SrcLine[MAX_LINES];
LINE DstLine[MAX_LINES];
int Tweens;
int NumLines;

/*****************************************************************
* FUNC: main (int argc, char *argv[])
* DESC: Read in a filename to load
*****************************************************************/
main (int argc, char *argv[])
{
 int segment;
 LINE_LIST *lineSrcList;
 LINE_LIST *lineDstList;
 char answer;
 /* load the pcx file if one is given */
 if ((3 > argc) || (argc > 5)) {
 printf("Usage: morph <source><dest> [<steps> [<output>]]\n\n");
 printf("Where: <source> is the source PCX filename\n");
 printf(" <dest> is the destination filename\n");
 printf(" <steps> is the optional sequence size\n");
 printf(" (the max is %d, the default is %d)\n",
 MAX_TWEENS, MORPH_TWEENS+2);
 printf(" <output> is the optional output filename\n");
 printf(" (defaults to no output)\n\n");
 printf("Note: The output filename can be at most %d characters long.\n",
 MAX_NAME_SIZE);
 printf(" The PCX extension is added automatically, so don't\n");
 printf(" include it in the filename.\n");
 printf(" Morph only accepts PCX files with %d X %d resolution\n",
 MAX_WIDE, MAX_TALL);
 printf(" and %d colors.\n", COLORS);
 exit(0);
 }
 if (argc > 3) {
 /* subtract two from the series count to get the tweens
 * since the starting and ending frame are included. */
 Tweens = clip (atoi(argv[3]) - 2, 1, MAX_TWEENS);
 if (argc > 4)
 OutFilename = argv[4];
 }
 else
 Tweens = MORPH_TWEENS;
 printf("Loading the file %s\n", argv[1]);
 Src = loadPicture(argv[1]);
 if (Src == NULL)
 quit(MEMORY_ERR, "");
 printf("Loading the file %s\n", argv[2]);
 Dst = loadPicture(argv[2]);
 if (Dst == NULL)
 quit(MEMORY_ERR, "");
 lineSrcList = loadLines(argv[1], EXT_LINE1);
 if (lineSrcList->number != 0) {
 if (lineAsk(argv[1]) == 'N')
 createLines(Src, lineSrcList);
 else
 editLines(Src, lineSrcList);
 }
 else
 createLines(Src, lineSrcList);

 TargFlag = 1; /* For the screen intro message */
 NumLines = lineSrcList->number;
 if (NumLines) {
 lineDstList = loadLines(argv[2], EXT_LINE1);
 /* inconsistent warp target*/
 if (lineDstList->number != NumLines)
 lineDstList->number = 0;
 if (lineDstList->number) { /* ask what he wants to do */
 if (lineAsk(argv[2]) == 'N')
 lineDstList->number = 0;
 }
 if (lineDstList->number == 0) { /* create a warp target */
 /* copy the source lines */
 lineDstList->number = NumLines;
 for (segment = 0; segment < NumLines; segment++)
 lineDstList->line[segment] = lineSrcList->line[segment];
 }
 editLines(Dst, lineDstList);
 saveLines(argv[1], lineSrcList, EXT_LINE1);
 saveLines(argv[2], lineDstList, EXT_LINE1);
 beep();
 for (segment = 0; segment < NumLines; segment++) {
 DstLine[segment].p[0]=lineDstList->line[segment].p[0];

 DstLine[segment].p[1]=lineDstList->line[segment].p[1];
 setLength(&DstLine[segment]);
 SrcLine[segment].p[0]=lineSrcList->line[segment].p[0];
 SrcLine[segment].p[1]=lineSrcList->line[segment].p[1];
 setLength(&SrcLine[segment]);
 }
 }
 tweenMorph(Src, Dst);
 setTextMode();
}
/*****************************************************************
* FUNC: int tweenMorph(PICTURE *src, PICTURE *dst)
* DESC: calculate a pixel to plot, from the warping function
*****************************************************************/
#define TOTAL_WEIGHT (100) /* Good for up to 99 tweens */
tweenMorph(PICTURE *src, PICTURE *dst)
{
 int color;
 POINT warp;
 int x,y;
 COLOR scolor, dcolor;
 LINE warpLine[MAX_LINES];
 int t, i, p;
 int r, g, b;
 unsigned int srcweight, srcpaletteindex;
 unsigned int dstweight, dstpaletteindex;
 displayPicture(src);
 saveScreen(&src->pal);
 /* src is on screen, now tween to the target */
 for (t = 1; t <= Tweens; t++) {
 /* Tween the lines used to warp the images */
 for (i = 0; i < NumLines; i++) {
 for (p = 0; p < 2; p++) {
 warpLine[i].p[p].x = SrcLine[i].p[p].x +
 ((DstLine[i].p[p].x - SrcLine[i].p[p].x) * t)
 /(Tweens+1);
 warpLine[i].p[p].y = SrcLine[i].p[p].y +
 ((DstLine[i].p[p].y - SrcLine[i].p[p].y) * t)
 /(Tweens+1);
 }
 setLength(&warpLine[i]);
 }
 dstweight = t * TOTAL_WEIGHT / (Tweens+1);
 srcweight = TOTAL_WEIGHT - dstweight;
 /* Zero out the buffers */
 initFreq();
 /* set background to black */
 _fmemset(Red, 0, sizeof Red);
 _fmemset(Grn, 0, sizeof Grn);
 _fmemset(Blu, 0, sizeof Blu);
 /* Go through the screen and get warped source pixels */
 for (warp.y = Ymin; warp.y <= Ymax; warp.y++) {
 if (quitCheck())
 quit(0, "");
 for (warp.x = Xmin; warp.x <= Xmax; warp.x++) {
 sumLines(src, &scolor, SrcLine, &warp, warpLine);
 sumLines(dst, &dcolor, DstLine, &warp, warpLine);
 r = (scolor.r * srcweight + dcolor.r * dstweight)
 / TOTAL_WEIGHT;

 g = (scolor.g * srcweight + dcolor.g * dstweight)
 / TOTAL_WEIGHT;
 b = (scolor.b * srcweight + dcolor.b * dstweight)
 / TOTAL_WEIGHT;
 if (Freq[r][g][b] == 0) /* A new color */
 Ncolors++;
 /* Keep it to one byte */
 if (Freq[r][g][b] < MAX_FREQ)
 Freq[r][g][b]++;
 /* put RGB components into temporary buffer */
 Red[warp.x][warp.y] = r;
 Grn[warp.x][warp.y] = g;
 Blu[warp.x][warp.y] = b;
 }
 }
 collapseColors(&TweenPal);
 setPalette(&TweenPal);
 for (y = Ymin; y <= Ymax; y++) {
 if (quitCheck())
 quit(0, "");
 for (x = Xmin; x <= Xmax; x++) {
 color = closestColor( Red[x][y], Grn[x][y],
 Blu[x][y], &TweenPal);
 _setcolor (color);
 _setpixel (x, y);
 }
 }
 /* no output file name on command line */
 if (!OutFilename) {
 beep();
 waitForKey(); /* so pause to enjoy the pictures */
 }
 else
 saveScreen(&TweenPal);
 }
 if (OutFilename) { /* save the last pic in this series */
 CurrentPal = 0; /* force a new palette */
 displayPicture(dst);
 saveScreen(&dst->pal);
 }
}

[LISTING FOUR]

/****************************************************************
* FILE: load.c
* DESC: This program loads a PCX file or a list of them. It crams as
* many into memory as it can, then it flips quickly through them.
* HISTORY: Created 1/13/1993 LAST CHANGED: 3/20/1993
* Copyright (c) 1993 by Scott Anderson
****************************************************************/
/* ----------------------INCLUDES----------------------------- */
#include <conio.h>
#include <stdio.h>
#include <io.h>
#include <math.h>
#include <graph.h>
#include <string.h>
#include <stdlib.h> /* for exit() */
#include "define.h"

/* ----------------------EXTERNALS---------------------------- */
/* External functions */
extern int quitCheck();
extern LINKED_LIST *rootSequence(int argc, char *argv[]);
/* External variables */
extern int Wait;
extern int Key;
extern int EndWait;
/* ----------------------GLOBAL DATA-------------------------- */
PICTURE *Src[MAX_FILES]; /* source picture pointer */
/*****************************************************************
* FUNC: main (int argc, char *argv[])
* DESC: Display the file or sequence passed on the command line. Read in as
* many files as will fit in memory, then display them in a loop.
*****************************************************************/
main (int argc, char *argv[])
{
 int file, fileNum;
 int direction;
 int i;
 LINKED_LIST *pcxList;
 LINKED_LIST *pcxListHead;
 if (argc == 1) {
 printf("Usage: load <name>\n\n");
 printf("Where: <name> is the root name of a sequence\n");
 exit(23);
 }
 setGraphicsMode();
 file = 1;
 pcxListHead = rootSequence(argc, argv);
 for (pcxList=pcxListHead; pcxList; pcxList = pcxList->next) {
 Src[file] = loadPicture(pcxList->str);
 if (Src[file] == NULL)
 break;
 displayPicture(Src[file++]);
 }
 fileNum = file - 1;
 if (fileNum == 1) /* there's only one file */
 waitForKey(); /* so wait for the user to quit */
 else if (fileNum > 1) {
 file = 1;
 direction = 1;
 while (!(quitCheck())) {
 if ((file += direction) >= fileNum)
 direction = -1;
 if (file <= 1)
 direction = 1;
 displayPicture(Src[file]);
 if (EndWait && (file == 1 || file == fileNum))
 wait(Wait);
 }
 }
 /* Reset to original mode, then quit */
 setTextMode();
}

[LISTING FIVE]

/****************************************************************
* FILE: fix.c
* DESC: This program inputs a list of pictures, creates a best
* fit palette, remaps the pictures, and writes them out.
* HISTORY: Created 1/13/1993 LAST CHANGED: 3/10/1993
* Copyright (c) 1993 by Scott Anderson
****************************************************************/
/* ----------------------INCLUDES----------------------------- */
#include <conio.h>
#include <stdio.h>
#include <io.h>
#include <math.h>
#include <graph.h>
#include <malloc.h>
#include <memory.h>
#include <string.h>
#include <stdlib.h> /* for exit() */
#include "define.h"
/* ----------------------EXTERNALS---------------------------- */
extern LINKED_LIST *rootSequence(int argc, char *argv[]);
/**** color routines ****/
extern int closestColor(int r, int g, int b, PALETTE *pal);
extern void collapseColors(PALETTE *palPtr);
extern int mergePalette(PICTURE *pic);
extern int remapPicture(PICTURE *picPtr, PALETTE *palPtr);
/**** line routines ****/
extern int getLine(int *argx1, int *argy1, int *argx2, int *argy2);
extern int movePoint();
extern int setLength(LINE *line);
/**** other variables ****/
extern char *OutFilename;
/* set from last picture loaded */
extern int Xmin, Ymin, Xmax, Ymax;
/* ----------------------GLOBAL DATA-------------------------- */
PICTURE *Src; /* source & destination picture pointers */
/***** variables used to compute intermediate images ****/
/* number of colors in tweened image before reduction*/
extern int Ncolors;
/* r, g, b frequency counter array */
extern unsigned int far Freq[MAX_COMP][MAX_COMP][MAX_COMP];
/* tweened images red, grn, and blu components*/
extern unsigned char far Red[MAX_WIDE][MAX_TALL];
extern unsigned char far Grn[MAX_WIDE][MAX_TALL];
extern unsigned char far Blu[MAX_WIDE][MAX_TALL];
extern PALETTE TweenPal; /* resulting palette */
/*****************************************************************
* FUNC: main (int argc, char *argv[])
* DESC: Read in a list of filenames to load, change their palettes
* to the best-fit palette, and write them out.
*****************************************************************/
main (int argc, char *argv[])
{
 int file;
 LINKED_LIST *pcxList, *pcxListHead;
 /* load the pcx file if one is given */
 if (argc < 3) {
 printf("Usage: fix <infile> <outfile>\n\n");
 printf("Where: <infile> is the input sequence name\n");
 printf(" <outfile> is the output sequence name\n");
 exit(0);
 }

 OutFilename = argv[argc-1];
 initFreq();
 pcxListHead = rootSequence(argc-1, argv);
 for (pcxList = pcxListHead; pcxList; pcxList=pcxList->next) {
 printf("Loading the file %s\n", pcxList->str);
 Src = loadPicture(pcxList->str);
 if (Src == NULL)
 quit(MEMORY_ERR, "");
 mergePalette(Src);
 freePicture(Src);
 }
 collapseColors(&TweenPal);
 setGraphicsMode();
 setPalette(&TweenPal);
 for (pcxList = pcxListHead; pcxList; pcxList=pcxList->next) {
 Src = loadPicture(pcxList->str);
 if (Src == NULL)
 quit(MEMORY_ERR, "");
 remapPicture(Src, &TweenPal);
 displayNoPal(Src);
 saveScreen(&TweenPal);
 freePicture(Src);
 }
 setTextMode();
}

[LISTING SIX]

/****************************************************************
* FILE: color.c
* DESC: This file contains the color routines used by morph, dissolve and fix.
* HISTORY: Created 3/18/1993 LAST CHANGED: 5/ 6/1993
* Copyright (c) 1992 by Scott Anderson
****************************************************************/
#include <stdio.h>
#include <memory.h>
#include "define.h"
/* ----------------------DEFINES------------------------------ */
/* ----------------------TYPEDEFS/STRUCTS--------------------- */
/* ----------------------PROTOTYPES--------------------------- */
int closestColor(int r, int g, int b, PALETTE *palPtr);
void collapseColors(PALETTE *palPtr);
int mergePalette(PICTURE *pic);
int remapPicture(PICTURE *picPtr, PALETTE *palPtr);
int initFreq();
/* ----------------------EXTERNALS---------------------------- */
/* set from last picture loaded */
extern int Xmin, Ymin, Xmax, Ymax;
/* ----------------------GLOBAL DATA-------------------------- */
/* number of colors in tweened image before reduction*/
int Ncolors;
/* r, g, b frequency counter array */
unsigned int far Freq[MAX_COMP][MAX_COMP][MAX_COMP];
/* tweened images red, grn, and blu components*/
unsigned char far Red[MAX_WIDE][MAX_TALL];
unsigned char far Grn[MAX_WIDE][MAX_TALL];
unsigned char far Blu[MAX_WIDE][MAX_TALL];
PALETTE TweenPal; /* resulting palette */
/*****************************************************************
* FUNC: void collapseColors(PALETTE *palPtr)
* DESC: Collapse the colors in the Freq table until
* Ncolors < COLORS, then put it in the given color palette.
*****************************************************************/
void
collapseColors(PALETTE *palPtr)
{
 int freqCutoff;
 int r, g, b;
 int index;
 int ncolors;
 static int freqCount[MAX_FREQ+1];
 memset(freqCount, 0, sizeof freqCount);
 for (r = 0; r < MAX_COMP; r++)
 for (g = 0; g < MAX_COMP; g++)
 for (b = 0; b < MAX_COMP; b++)
 freqCount[Freq[r][g][b]]++;
 ncolors = 0;
 for (freqCutoff = COLORS-1; freqCutoff > 1; freqCutoff--) {
 ncolors += freqCount[freqCutoff];
 if (ncolors > COLORS) break;
 }
 /* Collapse color space to 256 colors */
 r = g = b = 0;
 while (Ncolors >= COLORS) {
 for (; r < MAX_COMP; r++, g=0) {
 for (; g < MAX_COMP; g++, b=0) {
 for (; b < MAX_COMP; b++) {
 if (Freq[r][g][b] && Freq[r][g][b]
 <= freqCutoff)
 goto castOut; /* the ultimate no no */
 }
 }
 }
 r = g = b = 0;
 freqCutoff++;
 continue;
 castOut:
 Freq[r][g][b] = 0; /* just remove this low freq color */
 Ncolors--;
 }
 /* build a palette out of all the remaining non zero freq's */
 index = 0;
 for (r = 0; r < MAX_COMP; r++)
 for (g = 0; g < MAX_COMP; g++)
 for (b = 0; b < MAX_COMP; b++)
 /* we have a color we need to map */
 if (Freq[r][g][b]) {
 palPtr->c[index].r = r;
 palPtr->c[index].g = g;
 palPtr->c[index].b = b;
 /* remember index in palette */
 Freq[r][g][b] = index;
 index++;
 }
}

/*****************************************************************
* FUNC: int closestColor(int r, int g, int b, PALETTE *palPtr)
* DESC: return the palette index of the color closest to rgb.
*****************************************************************/
int
closestColor(int r, int g, int b, PALETTE *palPtr)
{
 int index;
 int distance;
 int min_distance = 3200; /* a big number */
 int min_index;
 /* The value in Freq is now the index into the color table */
 if (Freq[r][g][b]) return Freq[r][g][b];
 /* If zero, search for the closest color */
 for (index = 1; index < Ncolors; index++) {
 /* this is really the distance squared, but it works */
 distance = SQUARE (r - palPtr->c[index].r) +
 SQUARE (g - palPtr->c[index].g) +
 SQUARE (b - palPtr->c[index].b);
 if (distance < min_distance) {
 min_distance = distance;
 min_index = index;
 if (distance <= 2) break; /* close enough! */
 }
 }
 /* New index - for future reference */
 Freq[r][g][b] = min_index;
 return min_index;
}
/*****************************************************************
* FUNC: int mergePalette(PICTURE *picPtr)
* DESC: Merge a palette into Freq count table.
*****************************************************************/
int
mergePalette(PICTURE *picPtr)
{
 int r, g, b;
 unsigned int pos;
 unsigned char index;
 PALETTE *palPtr = &picPtr->pal;
 unsigned char far *bufPtr = picPtr->pixmap;
 for (pos = 0; pos < MAX_BYTES; pos++) {
 index = *bufPtr++;
 r = palPtr->c[index].r;
 g = palPtr->c[index].g;
 b = palPtr->c[index].b;
 if (Freq[r][g][b] == 0) /* A new color */
 Ncolors++;
 if (Freq[r][g][b] < MAX_FREQ) /* Keep it managable */
 Freq[r][g][b]++;
 }
}
/*****************************************************************
* FUNC: int remapPicture(PICTURE *picPtr, PALETTE *palPtr)
* DESC: Remap a picture with a different palette.
*****************************************************************/
int
remapPicture(PICTURE *picPtr, PALETTE *palPtr)
{
 int x, y;
 int index;

 int r, g, b;
 unsigned int pos;
 unsigned char lookup[COLORS];
 unsigned char far *bufPtr;
 /* Create the cross-reference lookup table */
 for (index = 0; index < COLORS; index++) {
 r = picPtr->pal.c[index].r;
 g = picPtr->pal.c[index].g;
 b = picPtr->pal.c[index].b;
 lookup[index] = closestColor(r, g, b, palPtr);
 }
 /* Save the new palette in the picture's palette */
 for (index = 0; index < COLORS; index++) {
 picPtr->pal.c[index].r = palPtr->c[index].r;
 picPtr->pal.c[index].g = palPtr->c[index].g;
 picPtr->pal.c[index].b = palPtr->c[index].b;
 }
 /* Remap the individual pixels to point to the new colors */
 for (bufPtr = picPtr->pixmap, pos = 0; pos < MAX_BYTES;
 bufPtr++, pos++)
 *bufPtr = lookup[*bufPtr];
}
/*****************************************************************
* FUNC: int initFreq()
* DESC: zero out the frequency color space table
*****************************************************************/
int
initFreq()
{
 int bytes = (sizeof Freq) / 2;
 _fmemset(Freq, 0, bytes);
 /* divide because of element size */
 _fmemset(Freq+(bytes/sizeof *Freq), 0, bytes);
 /* Guarantee a black color */
 Freq[0][0][0] = MAX_FREQ;
 /* a grey color */
 Freq[MID_COMP-1][MID_COMP-1][MID_COMP-1] = MAX_FREQ;
 /* and a white color */
 Freq[(long)MAX_COMP-1][MAX_COMP-1][MAX_COMP-1] = MAX_FREQ;
 Ncolors = 3;
}
End Listings




















Special Issue, 1994
Digital Video File Formats


Understanding QuickTime and Video for Windows


Mark is president of the San Francisco Canyon Company, which developed
QuickTime for Windows for Apple Computer. Canyon publishes the Movie Toolkit,
a C++ class library for manipulating QuickTime and AVI movie files, and Canyon
Clipz!. Canyon's How to Digitize Video will be published by John Wiley in
early 1994. Mark can be reached at 415-398-9957; through CompuServe at
72371,104; or through AppleLink at CANYON.


Fads come and go, and like object-oriented programming, artificial
intelligence, and bell-bottom dungarees, digital video is currently in vogue.
Desktop digital video was pioneered--and proven--by Apple on the Macintosh.
Over two years ago, QuickTime emerged as a strong standard with a loyal and
talented following of developers. In late 1992, Apple announced QuickTime for
Windows at the same time Microsoft ushered in Video for Windows, each vying to
become desktop standards.
As yet, there's no clear winner. But with the advent of powerful programs
such as Adobe's Premiere for Windows, it's clear that digital video is
approaching maturity. What is digital video, though, and how can you, as a
programmer, harness its power?


The 30,000-foot Perspective


Digital-video movies on your PC can be viewed as nothing more than a
collection of rather large files that otherwise look like regular DOS files.
Instead of holding eye-glazing columns of accounts-receivable data, however,
these files contain digitized sequences of video and sound. (Incidentally,
although I'll talk a lot about digital video, strictly speaking I'm referring
to time-based data. For example, a QuickTime movie of a performance of Tosca
might contain a track of video, a track of stereo sound, and additional text
tracks of the libretto in English, German, and Italian, each synchronized to
the other. Another point worth noting is that while the technocrats are well
aware of the symmetry of using the Latin terms video ["I see"] and audio ["I
hear"], sound is, for some reason, often preferred.)
An implementation of digital video must solve three problems to be viable.
First, just like the analog systems that preceded it (CD-DA or NTSC for TV),
it must define an architecture. This architecture must be robust enough to
endure (so that content providers can be sure that their material won't
quickly become obsolete) but flexible enough to adapt to the future. QuickTime
and Video for Windows mainly embody their architectures in the data structures
of their respective file formats (in the case of Video for Windows, this is
called AVI, for Audio Video Interleave). I'll examine these file formats in
this article.
Secondly, digital video must provide extremely efficient compression and
decompression implementations. A quick exercise in arithmetic shows why.
Full-screen (say, 640x480), full-motion (either 24 frames per second in the
movies, or 29.97 fps on your TV), uncompressed video needs between 22.1 and
27.6 Mbytes per second. One of my favorite movies, The Great Escape, would
need 307 gigabytes to store uncompressed, which, if laid end-to-end ... well, you
get the picture. And space requirements are only part of the story. Imagine if
mass-storage devices were cheap enough that you could afford 307 gigabytes for
a single movie. Your hardware would still have to support a sustained data
rate of more than 22 Mbytes per second to play it back.
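The data-rate figures above are easy to verify with a few lines of C (the
function name here is my own, for illustration only):

```c
#include <assert.h>

/* Bytes per second of uncompressed 24-bit RGB video: width x height
 * pixels, 3 bytes per pixel, at the given frame rate. The +0.5 rounds
 * fractional rates like 29.97 to the nearest whole byte. */
long videoBytesPerSec(long width, long height, double fps)
{
    return (long)(width * height * 3 * fps + 0.5);
}
```

At 640x480 this comes to 22,118,400 bytes per second at 24 fps and
27,620,352 bytes at 29.97 fps -- the 22.1 and 27.6 Mbytes quoted above.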
Finally, a successful digital-video implementation must provide an engine that
can play back the movies it digitizes at realistic frame sizes and rates on
general-purpose desktop PCs. Both QuickTime and Video for Windows more or less
succeed in this goal.


QuickTime Movie File Format


The first important point to note about the QuickTime movie file format is
that QuickTime uses a strict subset of the Mac file format on the PC, making
life easier for content providers. One consequence of this is that the byte
ordering of structured data is Motorola, not Intel (because the Mac
implementation came first). This may, at first, make it confusing to relate
some of the discussion in this article to a hex dump of a QuickTime file
(although I find Motorola hex easier to read!). In general, I'll still talk
mostly about the PC, because it uses a simpler subset of the full QuickTime
specification.
QuickTime files have a recursive, atom-based format. An atom is prefixed by a
32-bit length and a 32-bit identifier. It can contain either data or more
atoms. Figure 1(a) shows the basic atom layout. The semantics of an atom are
implied by its identifier. Each atom identifier is a four-character mnemonic,
which is also the value of the identifier itself. This may seem odd to
Windows programmers, to whom constructs like: #define SOME_ATOM_ID 0x12345678
/* unreadable value */ are more familiar. On almost every other platform,
32-bit compilers have been quite happy to accept character constants like moov
or mdat.
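On the Intel side, reading an atom header means undoing the Motorola byte
order by hand. Here is a minimal sketch (the struct and function names are
mine, not from Apple's toolkit):

```c
#include <string.h>

/* A QuickTime atom header: a 32-bit big-endian length (which counts
 * the 8 header bytes themselves), then a four-character identifier. */
typedef struct {
    unsigned long size; /* total atom size, header included */
    char type[5];       /* four-character code, NUL-terminated */
} ATOM_HEADER;

/* Convert a 4-byte big-endian (Motorola) value to host order. */
static unsigned long swap32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Parse the 8 header bytes of an atom from a raw file buffer. */
void parseAtomHeader(const unsigned char *buf, ATOM_HEADER *hdr)
{
    hdr->size = swap32(buf);
    memcpy(hdr->type, buf + 4, 4);
    hdr->type[4] = '\0';
}
```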
Clearly, a QuickTime movie can be viewed as a tree structure. Normally, of
course, trees have a single root, but QuickTime movies have two. On the Mac,
movie data (video frames and sound samples) is stored in the file's data fork;
the atoms that describe that data can be stored in the resource fork. DOS, of
course, does not have this concept, so both tree structures are concatenated,
or flattened in QuickTime jargon. QuickTime on the Mac is quite happy to play
these flattened movies.
Movie data is stored in the mdat atom, which always comes first. It contains
only data, not other atoms, that data being the video frames and sound samples
that comprise the actual movie. The moov atom (pronounced "moo-vee") is the
root to a structure of atoms that act as an index to the movie data. Figure
1(b) shows the basic structure of all QuickTime movies on the PC.


The moov Atom


I mentioned earlier that atoms can be viewed as a tree. Table 1 shows the
basic tree structure of the moov atom. While the semantics of a particular
atom constrain it to a certain level in the tree, the ordering of atoms at a
given level is arbitrary. Moreover, software that parses the tree is expected
to ignore atoms it doesn't recognize. It is this simple facility that gives
the QuickTime movie file structure the flexibility to adapt to future needs.
You can explore this structure for yourself by using Apple's DUMPMOOV program.
Under DOS, it generates output like that shown in Listing One, page 17. Space
constraints prevent a detailed examination of each atom here; one can be found
in Apple's QuickTime Movie Exchange Toolkit and Canyon's Movie Toolkit.
The mvhd atom defines the overall characteristics of the movie, principally
its time scale and duration. A time scale is simply the units (in events per
second) in which time values are expressed. For example, a time scale of 1000
means that time values are interpreted as milliseconds. However, time scales
of 100 or 1000, while seemingly convenient, are not often used. You are more
likely to see a scale of 600, because 600 has more factors, allowing integer
arithmetic to be performed with less loss of precision.
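To make the arithmetic concrete, here is a hypothetical helper (not a toolkit routine) that converts a duration in time-scale units to milliseconds. Note why 600 is popular: frame times at 15, 24, and 30 fps are all exact integers (40, 25, and 20 units, respectively).

```c
/* Hypothetical helper: convert a duration expressed in movie
   time-scale units to milliseconds. With a time scale of 600, a frame
   at 30 fps lasts exactly 20 units; at 24 fps, 25; at 15 fps, 40. */
long TimeToMs(long lDuration, long lTimeScale)
{
    return lDuration * 1000L / lTimeScale;
}
```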
The Movie Header time scale and duration provide the key to synchronization.
Rather than synchronize video and sound to a particular fixed frame rate (like
analog systems or Microsoft's AVI), QuickTime synchronizes all its tracks to
the Movie Header. In digital video (as opposed to analog video), frame rates
do not have to be constant. There's no celluloid driven by sprockets in front
of a beam of light. A single digital image can be displayed on the CRT for as
short or as long a time as necessary. This is where the stts atom comes into
play. Conceptually, it is an array of durations for each frame in the movie,
each of which can, of course, be a different value. In practice, a simple
compression scheme allows a single value to be applied to multiple frames. For
example, Figure 2 specifies that all 1270 frames have a duration of 100.
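The run-length scheme can be sketched as follows (the struct and function names are my own, not toolkit API); Figure 2's single entry of 1270 frames at duration 100 sums to the movie duration of 127000.

```c
#include <stdint.h>

/* Hypothetical representation of one stts table entry: a run of
   consecutive frames sharing the same duration. */
typedef struct {
    uint32_t dwSampleCount;     /* number of frames in this run */
    uint32_t dwSampleDuration;  /* duration of each, in media time units */
} STTSENTRY;

/* Total media duration is the sum of count * duration over all runs. */
uint32_t SttsTotalDuration(const STTSENTRY *pEntries, uint32_t dwNumEntries)
{
    uint32_t dwTotal = 0, i;
    for (i = 0; i < dwNumEntries; i++)
        dwTotal += pEntries[i].dwSampleCount * pEntries[i].dwSampleDuration;
    return dwTotal;
}
```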
A QuickTime movie may contain an arbitrary number of trak atoms, one for each
of its tracks. Like their analog counterparts, tracks can hold video or
sound. Additionally, QuickTime supports text tracks, although they are not yet
implemented in the Windows version. Any number of video tracks can be present.
QuickTime will choose the one that will look best when played on the target
device. For example, you could digitize video into three tracks using 8-bit
color, 16-bit color, or 24-bit (so-called) true color. If the movie is played
on a PC with a video adapter capable of only 8-bit color, the 8-bit color
track will be chosen. Similarly, any number of sound tracks may be present.
Each can be recorded in a different language, for example. QuickTime will
select the track that matches the current Windows language specification.
Tracks can, and typically do, have time scales and durations different from
those in the Movie Header. For example, the natural time scale for a sound
track is its sampling rate, say 11.025 kHz. A QuickTime movie can have any
number of tracks, and no one type of track is conceptually favored. For
example, movies are not required to have video tracks, and sound-only movies
are quite common. They are often used in multimedia presentations, along with
more conventional movies, instead of Microsoft WAVE files. In this way, a
single API can be used to control all aspects of the application.
A track may have an arbitrary number of elst atoms. These atoms are mainly
generated by movie-editing software like Adobe's Premiere. They allow selected
parts of a track to be played out of sequence. You may have seen the recent
Woody Allen movie, Manhattan Murder Mystery, in which Woody Allen and Diane
Keaton attempt to blackmail their neighbor, whom they suspect of murder, by
recording his girlfriend's audition for a play they've faked. Then they
literally cut-and-paste the tape to produce a convincing, but quite different,
shake-down message. In QuickTime, elst atoms are the digital equivalent of
Woody's razor blade.
The stsd atom tells QuickTime how the track's video or sound data is
compressed. The accompanying text box entitled, "Video Compression
Technologies" explores this subject further. An additional text box entitled,
"Selecting a Decompressor" describes how a decompressor is selected on
playback.
A track may have multiple stsd atoms. At first, it's hard to see why this is
useful. It implies that different parts of the track can be encoded using
different compressors, and appears to be a somewhat esoteric feature. But
consider a movie-editing package that might glue together parts of existing
movies to form a new movie. If the source movies used different compressors,
but multiple stsd atoms weren't supported, the new movie would have to be
recompressed using a single compressor. However, each time a frame is
compressed with a lossy compressor, it loses quality, much like a video tape
that is copied.
The stsc, stco, and stsz atoms are used to extract data from the mdat atom.
stsc allows video frames or sound samples to be grouped into chunks, to
improve performance on playback. Typically, chunking is performed by
postproduction optimization software. The stsc atom gives the number of video
frames or sound samples in each chunk. stco gives the offset
of each chunk within the mdat atom. stsz gives the size of each video frame or
sound sample.
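Putting the three tables together, locating sample n reduces to: find the chunk that contains it (stsc), start at that chunk's offset (stco), and skip the sizes of the earlier samples in that chunk (stsz). A simplified sketch, with my own names, assuming the stsc table has already been expanded into a per-chunk sample count:

```c
#include <stdint.h>

/* Simplified sketch: find the absolute file offset of sample n, given
   tables expanded from the stsc, stco, and stsz atoms.
   pdwSamplesPerChunk holds the per-chunk sample counts (from stsc),
   pdwChunkOffset the chunk offsets (from stco), and pdwSampleSize the
   per-sample sizes (from stsz). Returns 0 if n is out of range. */
uint32_t SampleOffset(uint32_t n,
                      const uint32_t *pdwSamplesPerChunk,
                      const uint32_t *pdwChunkOffset,
                      uint32_t dwNumChunks,
                      const uint32_t *pdwSampleSize)
{
    uint32_t dwChunk, dwFirst = 0;
    for (dwChunk = 0; dwChunk < dwNumChunks; dwChunk++) {
        uint32_t dwLast = dwFirst + pdwSamplesPerChunk[dwChunk];
        if (n < dwLast) {
            /* Sample n lives in this chunk; skip its predecessors. */
            uint32_t dwOffset = pdwChunkOffset[dwChunk], i;
            for (i = dwFirst; i < n; i++)
                dwOffset += pdwSampleSize[i];
            return dwOffset;
        }
        dwFirst = dwLast;
    }
    return 0;   /* sample number out of range */
}
```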


The mdat Atom


The mdat atom is simply a stream of video frames and sound samples.
Theoretically, the physical ordering is unimportant because the stsc, stco,
and stsz atoms are used as indexes. However, in order to play back from
relatively slow devices like CD-ROM, seeks must be avoided at all costs, so in
practice, physical ordering is extremely important. QuickTime prefers that
video frames and sound samples be physically grouped into half-second chunks,
with sound leading. The text box entitled, "Sound Encoding Techniques"
describes how sound samples are stored.


Reading and Writing QuickTime Movie Files


The first routine we'll need to tackle the task of reading and writing
QuickTime movies is a fast WORD and DWORD flip routine, which converts Intel
ordering to Motorola, and vice versa. Listing Two, page 18 shows Flip16 and
Flip32, both of which can be conveniently called from C/C++ code using the
prototypes in Figure 3. In production code, you'll want to implement
Flip16Many and Flip32Many in assembler so that you don't have to iterate over
Flip16 or Flip32 to flip multiple WORDs or DWORDs.
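For reference, portable C equivalents of the two routines might look like this (Listing Two shows the assembler versions); each routine is its own inverse.

```c
#include <stdint.h>

/* Portable C sketches of the byte-flip routines. Each converts between
   Intel and Motorola ordering; applying one twice is a no-op. */
uint16_t Flip16(uint16_t w)
{
    return (uint16_t)((w << 8) | (w >> 8));
}

uint32_t Flip32(uint32_t dw)
{
    return (dw << 24) | ((dw & 0x0000FF00UL) << 8) |
           ((dw & 0x00FF0000UL) >> 8) | (dw >> 24);
}
```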
I'll use recursive descent to parse QuickTime movies. This technique has the
advantage of simplicity and elegance, as the structure of the code exactly
matches the structure of the file itself. Listing Three, page 18, shows the
CollectAtomsFromFile routine, the heart of the recursive-descent logic.
Listing Four, page 18, shows the actual parsing code. I use the Windows
multimedia I/O calls (mmioOpen, mmioSeek, mmioRead, and mmioClose) for
convenience; in this context, they're equivalent to any other I/O interface. I
use the Windows mmioFOURCC macro to construct atom identifier constants; if
you had a 32-bit compiler (as for Windows NT), you could simply code moov, for
example, directly.
You may have noticed that CollectAtomsFromFile flips the atom size atmh.lSize
(in order to perform arithmetic) but not its identifier atmh.lName. This isn't
a bug. Rather, it allows you to code mmioFOURCC constants in their natural,
readable Motorola order.
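To illustrate the recursive-descent idea without the mmio plumbing, here is a sketch that walks atoms in an in-memory buffer; the function names and the (abbreviated) list of container atoms are my own simplification of what Listing Three does against a file.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of recursive-descent atom parsing over an in-memory buffer.
   The size is read in Motorola order; identifiers are compared in
   their natural byte order, as discussed above. Returns the number of
   atoms seen, descending into the container atoms this simplified
   walker knows about. */
uint32_t BigEndian32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

int IsContainer(const uint8_t *pName)
{
    return memcmp(pName, "moov", 4) == 0 || memcmp(pName, "trak", 4) == 0 ||
           memcmp(pName, "edts", 4) == 0 || memcmp(pName, "mdia", 4) == 0 ||
           memcmp(pName, "minf", 4) == 0 || memcmp(pName, "dinf", 4) == 0 ||
           memcmp(pName, "stbl", 4) == 0;
}

int CollectAtoms(const uint8_t *pBuf, uint32_t cbBuf)
{
    uint32_t off = 0;
    int nAtoms = 0;
    while (off + 8 <= cbBuf) {
        uint32_t cbAtom = BigEndian32(pBuf + off);   /* flip the size... */
        if (cbAtom < 8 || cbAtom > cbBuf - off)
            break;                                   /* malformed: stop */
        nAtoms++;
        if (IsContainer(pBuf + off + 4))             /* ...not the name */
            nAtoms += CollectAtoms(pBuf + off + 8, cbAtom - 8);
        off += cbAtom;
    }
    return nAtoms;
}
```

Note how the code's shape mirrors the file's shape: each container atom becomes one recursive call over its children.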

Although there isn't room here to show the actual code, writing a QuickTime
movie uses a structurally parallel technique. I use the following procedure:
1. Write out a dummy mdat atom with a zero length.
2. Write out all the movie data (video frames and sound chunks) in the desired
order.
3. At the same time, accumulate, in internal tables, the information you'll
need to build the moov atoms.
4. Seek to the beginning of the file, and then write out the true length of
the mdat atom.
5. Seek to the end of the mdat atom.
6. Write out all the moov atoms from your internal tables. Mirror the
recursive-descent technique to write leaf atoms first, working up the tree
toward the root. Each routine that writes an atom returns its length. This
way, routines that write nonleaf atoms simply accumulate their length from the
routines they call.
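Steps 1 and 4 of this procedure, the dummy-length trick and the backpatch, can be sketched like so (a hypothetical helper, using stdio rather than the mmio calls for brevity):

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical sketch: write a dummy mdat header (step 1) and the
   movie data (step 2), then seek back and patch the true length in
   Motorola order (step 4) and return to the end of the atom (step 5).
   Returns the atom's length, in the spirit of step 6. */
void WriteBigEndian32(FILE *fp, uint32_t dw)
{
    uint8_t ab[4] = { (uint8_t)(dw >> 24), (uint8_t)(dw >> 16),
                      (uint8_t)(dw >> 8),  (uint8_t)dw };
    fwrite(ab, 1, 4, fp);
}

long WriteMdat(FILE *fp, const uint8_t *pData, uint32_t cbData)
{
    long lStart = ftell(fp);
    WriteBigEndian32(fp, 0);             /* step 1: dummy length */
    fwrite("mdat", 1, 4, fp);
    fwrite(pData, 1, cbData, fp);        /* step 2: the movie data */
    fseek(fp, lStart, SEEK_SET);         /* step 4: backpatch the length */
    WriteBigEndian32(fp, 8 + cbData);
    fseek(fp, 0L, SEEK_END);             /* step 5: back to the end */
    return (long)(8 + cbData);
}
```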


AVI Movie File Format


AVI files are stored as a specialization (form in Microsoft jargon) of the
Microsoft RIFF (Resource Interchange File Format) standard. Microsoft defines
RIFF as a tagged-file specification used to define standard formats for
multimedia files. Other forms are WAVE for waveform audio data and RDIB for
bitmaps. An introduction to RIFF can be found in the Windows Multimedia
Programmer's Guide, and a discussion of the AVI form can be found in the Video
for Windows Development Kit Programmer's Guide.
The AVI RIFF form starts with a standard 12-byte header; see Figure 1(c). Of
course, Intel byte ordering is used for all fields. In order to code
identifiers, such as RIFF and AVI, naturally, Microsoft provides the
mmioFOURCC macro. For example, the following type of construct is common:
#define formtypeAVI mmioFOURCC('A', 'V', 'I', ' ')
In general, RIFF files consist of chunks, lists of chunks, or a combination of
both. The AVI form specifies which chunks are defined and the order in which
they are expected. All programs that read RIFF files are expected to ignore
chunks they don't recognize (but preserve them when the file is written). A
chunk is very similar in both form and concept to a QuickTime atom; it
consists of a 4-byte identifier and a 4-byte length, followed by the chunk
data; see Figure 1(d). The semantics of a chunk or list are implied by its
identifier. A list of chunks is prefixed by a 12-byte header, as in Figure
1(e).
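The three headers of Figure 1(c) through 1(e) can be sketched as C structures (the names are mine; all fields are Intel order on disk, and a chunk's length does not include its own 8-byte header):

```c
#include <stdint.h>

/* Hypothetical sketches of the RIFF on-disk layouts. */
typedef struct {
    uint32_t dwRiff;      /* 'RIFF' -- Figure 1(c) */
    uint32_t dwLength;    /* length of the rest of the file */
    uint32_t dwForm;      /* form type, e.g. 'AVI ' */
} RIFFHEADER;

typedef struct {
    uint32_t dwId;        /* chunk identifier, e.g. 'avih' -- Figure 1(d) */
    uint32_t dwLength;    /* length of the chunk data that follows */
} CHUNKHEADER;

typedef struct {
    uint32_t dwList;      /* 'LIST' -- Figure 1(e) */
    uint32_t dwLength;    /* length of the list contents */
    uint32_t dwType;      /* list type, e.g. 'hdrl' or 'movi' */
} LISTHEADER;
```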
Microsoft supplies two good programs for exploring AVI files. RIFFWALK works
under DOS, and generates output like that shown in Listing Five, page 19. It's
worth taking a look at this code; armed with the information discussed so far,
you'll be able to infer a lot about the AVI file structure. FILEWALK displays
similar output under Windows. Table 2 shows the required chunks and lists of
chunks in an AVI file. I'll discuss the highlights of the important chunks
shortly. Unlike QuickTime atoms, the ordering of AVI chunks is important.


The hdrl List


The hdrl list must come first in the AVI file. Its function is analogous to
that of the QuickTime moov atom. The avih chunk defines the overall
characteristics of the movie, principally the number of streams (more
conventionally known as "tracks") the movie contains, and the frame rate and
size of the video. This scheme has three problems. First, it is biased towards
video. It is conceptually impossible for an AVI file to not have a video track
(like a QuickTime sound-only movie). Second, all video tracks must have frames
of identical size. Third, AVI movies are bound to the old analog concept of a
fixed frame rate. As we noted earlier, digital-video engines are free to
display frames for as long or as short a time as they like.
Recall that CD-ROM transfer speed is currently a tremendous limiting factor on
playback rate, regardless of the rate at which the movie was recorded and the
speed of the decompressor. For example, a common CD-ROM drive can sustain a
transfer rate of about 150 Kbytes per second. Simple arithmetic shows that if
each frame is 10 Kbytes, a maximum 15 fps frame rate can be expected.
Consequently, some sophisticated postproduction software has been written to
limit the data rate of digital-video movies. Apple's MovieShop for QuickTime
does a decent job all the way down to 90 Kbytes per second (for single-speed
CD-ROM playback). At the risk of oversimplifying, one technique it may use is
to combine similar frames. To preserve sound synchronization, it simply adjusts
the length of time the frame is displayed in the stts atom. This technique
cannot be used for AVI movies.
The hdrl list contains one strl list for each track. Currently, Video for
Windows requires exactly one video track and at most one sound track. The
ordering of the strl lists is unimportant; however (for reasons that will
become apparent when we look at the movi list), the first is denoted as stream
00, the second as stream 01, and so on. The strh and strf chunks further define
the characteristics of each track (such as the sampling rate and size for
sound).
Video and sound tracks are not synchronized to a common time base. Instead,
video is expected to be played at its frame rate, and sound at its sampling
rate, with some element of faith that they will match up. This simple scheme
does not allow, for example, effects such as discontinuous sound without
artificially inserting periods of silence (and increasing the data rate).


The movi List


The movi list follows the hdrl list, and contains the actual movie data. It is
analogous to the QuickTime mdat atom. Often, the movi list is preceded by a
so-called "junk" chunk so that the first data chunk is aligned on a 2K
boundary, improving playback performance from CD-ROM. These chunks are used
only for alignment and have no other semantics.
Data in the movi list can be structured either in chunks or in lists of
chunks. Sound data is stored in ##wb chunks, while video data is stored in
##dc chunks, where ## represents the corresponding stream number. Video for
Windows prefers that sound and video chunks be paired, with the video chunk
holding a frame and the corresponding sound chunk holding a frame's worth of
sound.
For efficiency, these sound-video chunk pairs can themselves be grouped inside
a rec list. The playback engine will read the entire contents of a rec list at
once. Often, the list will end with a junk chunk so that its length is a
multiple of 2K.
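A hypothetical helper for building these stream chunk identifiers, following the stream-numbering convention described above (stream numbers rendered as two ASCII digits, as in the "stream 00, stream 01" scheme of the hdrl discussion):

```c
#include <stdio.h>

/* Hypothetical helper: build the four-character chunk identifier for a
   given stream, e.g. stream 0 video -> "00dc", stream 1 sound -> "01wb".
   pszSuffix is "dc" for video or "wb" for sound; achId receives the
   identifier plus a terminating NUL. */
void MakeStreamChunkId(int nStream, const char *pszSuffix, char achId[5])
{
    sprintf(achId, "%02d%.2s", nStream, pszSuffix);
}
```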


The idx1 Chunk


Although it is technically optional, almost all AVI movies end with an index
(idx1) chunk. Each entry in the index points to a chunk or rec list in the
movi list. Figure 1(f) shows the AVI index-entry format. If the index is
present (as denoted by flags in the avih chunk), you are expected to use it to
parse the data in the movi list. The ordering of index entries defines the
order in which video and sound chunks must be played. One trick about using
the index is worth noting. It normally records chunk offsets relative to the
start of the movi list. However, Microsoft reportedly changed its mind during
the Video for Windows beta period and some early encodings record chunk
offsets relative to the beginning of the file. To determine which is being
used, I read the first index entry. If its chunk offset is large (greater than
2K), I assume the old encoding; if it is small, I assume the new.
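The index entry of Figure 1(f) and the offset heuristic just described might be sketched like this (the structure and function names are my own):

```c
#include <stdint.h>

/* Hypothetical sketch of one AVI index entry, per Figure 1(f). */
typedef struct {
    uint32_t dwChunkId;     /* e.g. '00dc' or '01wb' */
    uint32_t dwFlags;       /* key frame, rec list, and so on */
    uint32_t dwOffset;      /* relative to the movi list -- usually! */
    uint32_t dwLength;      /* chunk length */
} AVIINDEXENTRY;

/* Return the absolute file offset of the chunk an entry points at.
   If the first entry's offset is large (greater than 2K), assume the
   old beta-era convention of file-relative offsets; otherwise assume
   offsets relative to the start of the movi list (lMoviStart). */
long IndexEntryFileOffset(const AVIINDEXENTRY *pFirst,
                          const AVIINDEXENTRY *pEntry, long lMoviStart)
{
    int fFileRelative = pFirst->dwOffset > 2048;
    return fFileRelative ? (long)pEntry->dwOffset
                         : lMoviStart + (long)pEntry->dwOffset;
}
```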


Reading and Writing AVI Movie Files


The Windows multimedia I/O calls (mmioOpen, mmioClose, mmioRead, mmioWrite,
mmioSeek, mmioDescend, and mmioAscend) are designed to process RIFF files. In
particular, mmioDescend and mmioAscend allow chunks and lists of chunks to be
processed quite conveniently. As a point of comparison, QuickTime provides no
such assistance, and dealing with AVI files is considerably simpler.
Listing Six (page 19) shows how to parse a basic AVI movie file. For clarity,
error checking has been omitted. Again, there isn't room to show the code for
writing an AVI file, but I use the following technique:
1. Seek to an offset of 2K into the file.
2. Write out all the movie data (video frames and sound chunks) in the desired
order.
3. At the same time, accumulate, in internal tables, the information you'll
need to build the hdrl list.
4. Seek back to the beginning of the file and create a RIFF chunk and the
required chunks in the hdrl list.
5. Create a junk chunk to pad the end of the hdrl list to the beginning of the
movi list.
6. Create a movi list chunk.
7. Seek to the end of the file and create an index chunk.


Conclusion


Content developers often ask whether they should develop for QuickTime or
Video for Windows. On the one hand, I think that QuickTime is technically
superior. As far as production is concerned, the Intel Smart Video Recorder
(ISVR card) can capture QuickTime and AVI movies with equal ease, and products
like Adobe's Premiere bring first-rate editing capabilities to both. And on
the Mac, where Video for Windows is not even a player, there exists a vast
pool of equipment, software, and (most importantly) production talent, all
dedicated to QuickTime.

On the other hand, Apple is fast losing ground to Microsoft by daring to play
in Microsoft's sandbox. The decision to do Windows was a bold one, but unless
Apple begins exhibiting a much stronger commitment to QuickTime for Windows,
it may ultimately be overwhelmed.
Mark Florence
 Figure 1: (a) Basic atom format; (b) basic QuickTime movie file structure;
(c) AVI RIFF form header; (d) basic chunk format; (e) list-header format; (f)
AVI index-entry format.
Table 1: moov atom tree structure.
Atom Purpose
moov Movie atom.
-mvhd Movie header. Defines the time scale and duration of the movie.
-trak Track atom.
--tkhd Track header. Defines the dimension, time scale, and duration of the
track.
--edts Edit list.
---elst Edit-list entry. Allows selections of the track be played out of
sequence.
--mdia Media atom.
---mdhd Media header. Defines the characteristics of the media holding this
track's data.
---hdlr Handler. On the Mac, defines the component that handles the media.
---minf Media information.
----vmhd or smhd Video- or sound-media information header. Defines basic media
requirements.
----hdlr Handler. On the Mac, defines the component that handles the video or
sound.
----dinf Data information.
-----dref Data reference. On the Mac, can point to another file holding this
track's data.
----stbl Sample table.
-----stsd Sample description. Describes how the track's video or sound is
compressed.
-----stts Time-to-sample. Gives the duration of each video frame.
-----stss Sync sample. Indicates the location of key frames.
-----stsc Sample-to-chunk. Groups video frames or sound samples into chunks.
-----stco Chunk offset. Gives the offset into the mdat atom of each chunk.
-----stsz Sample size. Gives the size of each video frame or sound sample.
-trak As many additional tracks as required.
Figure 2: Applying a duration value to multiple frames.
stts (24) Time To Sample
-Version/Flags: 0x00000000
-Number Of Entries: 1
- 0: Sample Count 1270, Sample Duration 100.


Figure 3: Prototypes for the Flip16 and Flip32 routines, which convert Intel
ordering to and from Motorola.
WORD PASCAL Flip16(WORD);
DWORD PASCAL Flip32(DWORD);


Table 2: AVI file structure.
 Code Purpose
RIFF AVI File header.
 LIST hdrl Defines structure of data in the movi list.
 avih Defines basic movie format.
 LIST strl One strl list per stream (video or sound data)
 strh Defines stream format.
 strf
 LIST strl
 strh
 strf
 ...
 junk Optionally, provides padding (otherwise ignored).
 LIST movi Contains actual movie data.
 LIST rec Groups video and sound data for efficiency.
 ##wb Sound data.
 ##dc Video data.
 LIST rec
 ##wb
 ##dc

 ...
 idx1 An optional index into movi list.


Video Compression Technologies


Table 3 summarizes the compression technologies available today for QuickTime
and Video for Windows. By the time you read this, more may be known about the
Captain Crunch and Indeo R3 compressors (both still in beta at the time of
this writing). Those compressors producing encodings that are identical in
both systems are indicated with an asterisk. I have deliberately omitted MPEG,
motion JPEG, and other technologies, simply because no QuickTime or Video for
Windows CODECs exist for them yet.
I've also indicated a typical frame size and rate, although these numbers
should be taken with a grain of salt. I've assumed software-only decompression
on a 486/33 PC. In my opinion, the current leader of the pack is clearly
CinePak, although Captain Crunch and Indeo R3 show signs of catching up.
When analyzing the performance of a CODEC, the most important gating factor is
the CD-ROM transfer rate, because most movies are distributed this way.
Consider that common CD-ROMs have 150--200 Kbytes/second transfer rates. A
good CODEC will attempt to compress the data as tightly as possible (which
gates the maximum playback rate from CD) in such a way that it can be decoded
as quickly as possible (which gates the actual playback rate). For example, if
the average frame size is 10K, then no more than 15 to 20 fps from CD can be
expected, regardless of the speed of the decompressor.
--M.F.
Table 3: Compression technologies currently available for QuickTime and Video
for Windows (*common to both QuickTime for Windows and AVI).
Compressor Identifier Frame Size/Rate Comments
Apple Animation rle 320x240/12 fps Optimized for animations and cartoons.
Gives poor performance for real-life video content.
Apple Graphics smc 160x120/15 fps A modest performer optimized for 8-bit
content. (The identifier is the initials of its patent holder, Sean
Callaghan.)
Apple Video rpza 160x120/15 fps Also known as "road pizza" because of its good
compression ratio, it is now superseded by CinePak.
Captain Crunch* klic 320x240/15 fps From Media Vision; currently in beta.
CinePak* cvid 320x240/15 fps The one apparent disadvantage currently is that
the algorithm is highly asymmetrical. It takes up to 100 seconds to compress
one second of video. For content with a high turnover and a short life, this
can be critical.
Intel Indeo R2* rt21 160x120/12 fps A modest performer without hardware
assistance. When available, it will be superseded by Indeo R3. Intel's ISVR
card captures directly into this format.
captures directly into this format.
Intel Indeo R3* iv31 320x240/15 fps Currently in beta.
Intel YVU9 yvu9 Primarily used only during capture; available only in Video
for Windows. Content is almost always converted into another format.
JPEG jpeg Primarily used only during capture; available only in QuickTime.
Content is almost always converted into another format.
Microsoft RLE mrle 160x120/12 fps Optimized for animations and cartoons. Gives
poor performance for real-life video content. Not the same as Apple's
Animation compressor.
Microsoft Video 1* msvc 160x120/12 fps Media Vision's ProMovie Studio captures
directly into this format.


Selecting a Decompressor


Key to the flexibility of both QuickTime and Video for Windows is their open
architecture for compressors and decompressors (CODECs). Today,
vector-quantization compression techniques allow playback rates of
approximately 12 to 15 fps of 240x180 frames on most general-purpose
computers. Tomorrow, perhaps wavelets or fractals will double this. It's vital
that both QuickTime and Video for Windows accommodate this growth without
changing their file formats or architecture. Fortunately, they do, and we are
starting to see a wide range of powerful CODECs from Apple, Microsoft, and
third-party developers.
QuickTime decompressors are structured as components. Components are a Mac
concept, ported to the PC in QuickTime for Windows. A component is a special
kind of DLL (in Windows, they normally use the .QTC extension) that negotiates
its capabilities with its callers through a predefined set of entry points. A
single .QTC file, which Windows views as a DLL, can contain multiple
components. Full details are in Apple's QuickTime documentation.
You may recall from the general discussion that the stsd atom describes how a
track's video is compressed. It does this by encoding the four-character
identifier of the compressor. The assignment of these identifiers is regulated
by Apple to ensure that they remain unique across all third-party developers.
They look just like atom identifiers; for example, cvid is assigned to
SuperMac's CinePak CODEC.
When QuickTime for Windows starts to play a video track, it negotiates with
all the decompressor components it can find, using the standard Windows
LoadLibrary search strategy. Each decompressor is asked, of course, if it can
handle the identified encoding. But it also has the opportunity to check if a
preferred environment (for example, special hardware) is present. In any
event, it will report whether or not it can perform the decompression and, if
so, how fast, speed being measured as the number of milliseconds necessary
to decode a 320x240 frame. QuickTime then uses the fastest decompressor.
Even when a movie is playing, QuickTime can switch decompressors. For example,
if the video frame becomes clipped by another window, QuickTime will repeat
the decompressor selection process. It does this because a decompressor that
uses hardware assistance may wish to defer to a software-only decompressor for
nonrectangular frames.
This elegant scheme is simple and effective, although it does place a burden
on the decompressor writer to develop the correct negotiation logic. It has
the advantage that decompressors can simply be dropped into the user's
system without any SYSTEM.INI changes. For example, content providers can
deliver a CD of movies and a proprietary QuickTime decompressor without fear
of a conflict with the existing environment or special installation
requirements. Further, multiple decompressors for the same encoding can be
present, and QuickTime will automatically choose the most appropriate.
Video for Windows decompressors are drivers (DLLs with the extension .DRV)
written to the specification Microsoft documents in the Video for Windows
Development Kit Programmer's Guide. In a manner similar to QuickTime, the AVI
file format encodes the four-character identifier of the compressor in the
video stream header, strh. Again, the assignment of these identifiers is
regulated by Microsoft to ensure uniqueness, although we can be sure that the
level of coordination between Apple and Microsoft is fairly low! Fortunately,
where an encoding is supported in both systems, its identifier is constant.
For example, Microsoft has also assigned cvid to SuperMac's CinePak CODEC.
Before it plays a video stream, Video for Windows simply takes the encoder
identifier, prefixes it with VIDC, and uses it to look up the name of the
CODEC in the [drivers] section of SYSTEM.INI; see the [drivers] section in
Figure 4.
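The key construction can be sketched as follows (a hypothetical helper; the actual lookup would then go through the Windows profile API against SYSTEM.INI):

```c
#include <string.h>
#include <ctype.h>

/* Hypothetical sketch of the lookup key Video for Windows builds
   before consulting the [drivers] section of SYSTEM.INI: the literal
   prefix "VIDC." followed by the compressor's four-character
   identifier, upper-cased as the entries in Figure 4 show. achKey
   receives the 9-character key plus a terminating NUL. */
void MakeVidcKey(const char *pszFourCC, char achKey[10])
{
    int i;
    strcpy(achKey, "VIDC.");
    for (i = 0; i < 4; i++)
        achKey[5 + i] = (char)toupper((unsigned char)pszFourCC[i]);
    achKey[9] = '\0';
}
```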
The scheme is simple and effective, but it has disadvantages compared to
QuickTime. An installation procedure of some kind is required, and multiple
decompressors for the same encoding cannot coexist. This means that a new
version of a decompressor cannot specialize the capabilities of existing
versions; it must totally replace them. Imagine that Intel wants to develop a
new version of the Indeo decompressor optimized especially for the XYZ video
chipset. Under QuickTime, it need only perform this one task, and can defer to
other decompressors if the XYZ chip is not present. Under Video for Windows,
a decompressor must assume all the functionality of prior versions.
--M.F.
Figure 4: Example [drivers] section of a Windows SYSTEM.INI file.
[drivers]
VIDC.MSVC=msvidc.drv
VIDC.YVU9=isvy.drv
VIDC.IV31=indeor3.drv
VIDC.RT21=indeo.drv
VIDC.CVID=iccvid.drv
VIDC.MRLE=msrle.drv




Sound Encoding Techniques


Both QuickTime and AVI formats store sound in similar ways. At the time of
this writing, neither supports compressed sound. Table 4 summarizes the
encoding techniques each uses.
When sound is digitized, analog signals are converted to numbers. The size of
those numbers is referred to as the sample size. The rate at which the analog
signal is sampled is called the sample rate. In general, the larger the
sample size and rate, the better the quality of the digitization. As a point
of reference, CD-DA (standard audio CDs) is the equivalent of 16-bit, 44.1 kHz
sound.
For sample sizes of 8 bits, each sample represents one of possibly 256
different values; for 16-bit samples, 65,536 discrete values can be
represented. You might visualize the difference in quality to be analogous to
that more easily perceived between 8- and 16-bit color. Each sample represents
the deviation of a waveform from a midpoint. Two conventions exist for the
midpoint. In AVI, 8-bit samples use 0x80 as the midpoint (the so-called "raw
format"), and 16-bit samples use conventional signed numbers with 0x0000 as
the midpoint (so-called "two's-complement" format). QuickTime can use either
format with either sample size. Figure 5 shows this more clearly.
To complicate matters a little, Microsoft does not actually use the jargon raw
and two's complement. Instead, it uses the acronym PCM to refer to its 8-bit
and 16-bit encodings. To convert between the two formats, simply XOR each
sample with 0x80 or 0x8000 as appropriate.
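As a sketch (the function names are mine), the conversion in both directions is the same XOR:

```c
#include <stdint.h>

/* Convert between "raw" (unsigned, midpoint 0x80 or 0x8000) and
   two's-complement (signed, midpoint 0) samples by XORing the top bit,
   as described above. Each routine is its own inverse. */
uint8_t ConvertSample8(uint8_t bSample)
{
    return (uint8_t)(bSample ^ 0x80);
}

uint16_t ConvertSample16(uint16_t wSample)
{
    return (uint16_t)(wSample ^ 0x8000);
}
```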
QuickTime stores 16-bit sound samples in Motorola order; AVI uses Intel order.
Byte ordering is, of course, moot for 8-bit samples! Consequently, 16-bit
sound samples must be flipped when converting from AVI to QuickTime and vice
versa.

Most PC sound cards can only digitize and play back at the three standard MPC
rates of 11.025, 22.05, and 44.1 kHz. Many QuickTime movies are captured on
the Mac and their sample rates can appear as weird numbers like 11.12754 kHz.
Both QuickTime and AVI share the same convention for stereo sound in that the
left-channel sample appears before the right-channel sample in the stream.
The interplay of sample size, rate, and number of channels has a great effect
on the ability of the QuickTime or AVI engine to play back a movie. For
example, CD-DA quality sound (16-bit, 44.1 kHz, stereo) requires a sustained
data transfer rate of 176.4 Kbytes per second. Single-speed CD-ROM drives are
capable of a peak rate of 150 Kbytes per second, which doesn't leave a lot of
room for video! For this reason, most digital video movies you'll see today
use 8-bit, 11-kHz mono sound (which doesn't sound too bad through most PC
speakers). This situation is unlikely to improve much until we see a quantum
leap in hardware performance.
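The data-rate arithmetic is worth writing down once (a hypothetical helper):

```c
/* Hypothetical helper: sustained sound data rate in bytes per second =
   sample rate * (sample size / 8) * channels. CD-DA quality (44100 Hz,
   16-bit, stereo) works out to 176,400 bytes/second -- more than a
   single-speed CD-ROM's 150-Kbyte/second peak. */
long SoundBytesPerSecond(long lSampleRate, int nBitsPerSample, int nChannels)
{
    return lSampleRate * (long)(nBitsPerSample / 8) * nChannels;
}
```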
Interleave is a primary characteristic of digital video files, so much so that
the AVI file format is named after the concept. However, interleave is mainly
a factor for slow playback devices such as CD-ROM. The trick that both the
QuickTime and AVI engines have had to master is to stream enough data from the
CD-ROM to keep themselves busy. It is a delicate balance of RAM buffer sizes,
transfer rates, seek times, and playback rate. Note that streaming is actually
quite the opposite of the more conventional caching. A cache (like SMARTDrive)
attempts to improve performance by anticipating that data, once read, will be
read again. Streaming assumes that data will be read once, from beginning to
end, and attempts to steadily supply that data at the same rate that it is
consumed.
Although QuickTime and AVI acknowledge the same concept, their engines have
different requirements for interleave. QuickTime prefers sound and video in
half-second chunks, with sound leading. AVI prefers sound and video
interleaved on a frame-by-frame basis. That is, each video frame is physically
followed by a frame's worth of sound. To complicate matters, though, sound
samples are skewed ahead of video by 0.75 second. In a dump of an AVI file,
you'll see the first few sound samples unmatched by video frames, and the last
few video frames unmatched by sound samples; look at the end of Listing Five,
page 19, for an example.
When an AVI file is converted to QuickTime, or vice versa, the interleave
factor should be adjusted to these preferred values. If it is not, you can
expect poor playback performance from a CD-ROM.
--M.F.
Table 4: Sound-encoding techniques.
             QuickTime                            AVI
Sample Size  8 bit, 16 bit.                       8 bit, 16 bit.
Sample Rate  Continuum of rates, normally         Normally the discrete MPC rates
             between 11.0 and 44.1 kHz.           of 11.025, 22.05, and 44.1 kHz.
Channels     Mono and stereo.                     Mono and stereo.
Interleave   Half-second chunks, sound leading.   Frame-by-frame, sound skewed
                                                  ahead by 0.75 second.
 Figure 5: Comparison of "raw" and "two's-complement" sound.
two's    min     max     mid
8 bit    80      7F      0
16 bit   8000    7FFF    0

raw      min     max     mid
8 bit    0       FF      80
16 bit   0       FFFF    8000
For More Information
QuickTime Developer's Kit
Apple Computer
P.O. Box 319
Buffalo, NY 14207
800-282-2732
$195

Video for Windows
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052-6399
Available free on CompuServe

Canyon Movie Toolkit
San Francisco Canyon Company
150 Post Street, Suite 620
San Francisco, CA 94108
415-398-9957
$795


[LISTING ONE] (Text begins on page 10.)


moov (16658) Movie Atom
 mvhd (108) Movie Header
 -Version/Flags: 0x00000000
 -Creation Time: Thu Aug 19 13:26:31 1993
 -Modification Time: Thu Aug 19 13:26:31 1993
 -Time Scale: 1000 per second
 -Duration: 127000
 -Preferred Rate: 1
 -Preferred Volume: 0x00ff
 -Matrix: 1 0 0
 0 1 0
 0 0 1
 -Preview Time: 0
 -Preview Duration: 0
 -Poster Time: 0

 -Selection Time: 0
 -Selection Duration: 0
 -Current Time: 0
 -Next Track ID: 2
 trak (5524) Track Atom
 tkhd (92) Track Header
 -Version/Flags: 0x0000000f
 -Creation Time: Thu Aug 19 13:26:31 1993
 -Modification Time: Thu Aug 19 13:26:31 1993
 -Track ID: 0
 -Time Scale: 0 per second
 -Duration: 127000
 -Movie Time Offset: 0
 -Priority: 0
 -Layer: 0
 -Alternate Group: 0
 -Volume: 0
 -Matrix: 1 0 0
 0 1 0
 0 0 1
 -Track Width: 320
 -Track Height: 240
 edts (36) Edit List
 elst (28) Edit Entry
 -Version/Flags: 0x00000000
 -Number Of Entries: 1
 - Entry 0: Duration 127000, time 0, rate 1.
 mdia (5388) Media Atom
 mdhd (32) Media Header
 -Version/Flags: 0x00000000
 -Creation Time: Thu Aug 19 13:26:31 1993
 -Modification Time: Thu Aug 19 13:26:31 1993
 -Time Scale: 11025 per second
 -Duration: 14001750
 -Language: 0x0000
 -Quality: 0x0000
 hdlr (32) Handler
 -Version/Flags: 0x00000000
 -Component Type: mhlr
 -Component Subtype: soun
 -Component Manufacturer: appl
 -Component Flags: 0x00000000
 -Component Flags Mask: 0x00000000
 minf (5316) Video Media Info
 smhd (16) Sound Media Information
 -Version/Flags: 0x00000000
 -Balance: 0
 hdlr (32) Handler
 -Version/Flags: 0x00000000
 -Component Type: dhlr
 -Component Subtype: alis
 -Component Manufacturer: appl
 -Component Flags: 0x00000000
 -Component Flags Mask: 0x00000000
 dinf (36) Data Info
 dref (28) 00 00 00 00 00 00 00 01 00 00 00 0c 61 6c 69 73
 00 00 00 01
 stbl (5224) Sample Table
 stsd (52) Sample Description

 -Version/Flags: 0x00000000
 -Number Of Entries: 1
 raw (36) Sound Description
 -Data reference ID: 0x0000
 -Version: 0x0000
 -Codec Revision Level: 0x0000
 -Codec Vendor: appl
 -Number of Channels: 1
 -Bits/Sample: 8
 -Compression ID: 0
 -Packet Size: 0
 -Sample Rate: 11025.
 stts (24) Time To Sample
 -Version/Flags: 0x00000000
 -Number Of Entries: 1
 - 0: Sample Count 14001750, Sample Duration 1.
 stsc (3832) Sample To Chunk
 -Version/Flags: 0x00000000
 -Number Of Entries: 318
 - 0: First Chunk 1, Sample per Chunk 4410, Chunk Tag 1.
 ...
 - 317: First Chunk 318, Sample per Chunk 1500, Chunk Tag 1.
 stsz (20) Sample Size
 -Version/Flags: 0x00000000
 -Sample Size: 1
 -Number Of Entries: 14001750
 stco (1288) Chunk Offset
 -Version/Flags: 0x00000000
 -Number Of Entries: 318
 8 18268 71942 125877 181638 239367 295397 351595 408044 466170
 -Dumping 1232 bytes
 trak (11018) Track Atom
 tkhd (92) Track Header
 -Version/Flags: 0x0000000f
 -Creation Time: Thu Aug 19 13:26:31 1993
 -Modification Time: Thu Aug 19 13:26:31 1993
 -Track ID: 1
 -Time Scale: 1000 per second
 -Duration: 127000
 -Movie Time Offset: 0
 -Priority: 0
 -Layer: 0
 -Alternate Group: 0
 -Volume: 0
 -Matrix: 1 0 0
 0 1 0
 0 0 1
 -Track Width: 320
 -Track Height: 240
 edts (36) Edit List
 elst (28) Edit Entry
 -Version/Flags: 0x00000000
 -Number Of Entries: 1
 - Entry 0: Duration 127000, time 0, rate 1.
 mdia (10882) Media Atom
 mdhd (32) Media Header
 -Version/Flags: 0x00000000
 -Creation Time: Thu Aug 19 13:26:31 1993
 -Modification Time: Thu Aug 19 13:26:31 1993

 -Time Scale: 1000 per second
 -Duration: 127000
 -Language: 0x0000
 -Quality: 0x0000
 hdlr (32) Handler
 -Version/Flags: 0x00000000
 -Component Type: mhlr
 -Component Subtype: vide
 -Component Manufacturer: appl
 -Component Flags: 0x00000000
 -Component Flags Mask: 0x00000000
 minf (10810) Video Media Info
 vmhd (20) Video Media Information Header
 -Version/Flags: 0x00000000
 -Graphics Mode: 64
 -Op Color: 0x0000, 0x0000, 0x0000
 hdlr (32) Handler
 -Version/Flags: 0x00000000
 -Component Type: dhlr
 -Component Subtype: alis
 -Component Manufacturer: appl
 -Component Flags: 0x00000000
 -Component Flags Mask: 0x00000000
 dinf (36) Data Info
 dref (28) 00 00 00 00 00 00 00 01 00 00 00 0c 61 6c 69 73 00 00 00 01
 stbl (10714) Sample Table
 stsd (102) Sample Description
 -Version/Flags: 0x00000000
 -Number Of Entries: 1
 cvid (86) Image Description (cvid)
 -Version: 1
 -Revision Level: 1
 -Vendor: appl
 -Temporal Quality: 0x3ff
 -Spatial Quality: 0x3ff
 -Width (in pixels): 320
 -Height (in pixels): 240
 -Horizontal Resolution: 72
 -Vertical Resolution: 72
 -Data Size: 0
 -Codec name: Movie Toolkit (cvid)
 -Depth: 24
 -Dumping 2 bytes
 stts (24) Time To Sample
 -Version/Flags: 0x00000000
 -Number Of Entries: 1
 - 0: Sample Count 1270, Sample Duration 100.
 stss (356) Sync Sample
 -Version/Flags: 0x00000000
 -Number Of Entries: 85
 1 16 31 46 61 76 91 106 121 136
 -Dumping 300 bytes
 stsc (28) Sample To Chunk
 -Version/Flags: 0x00000000
 -Number Of Entries: 1
 - 0: First Chunk 1, Sample per Chunk 1, Chunk Tag 1.
 stsz (5100) Sample Size
 -Version/Flags: 0x00000000
 -Sample Size: 0

 -Number Of Entries: 1270
 13850 12357 12148 12439 12320 12338 12323 12481 12383 12499
 -Dumping 5040 bytes
 stco (5096) Chunk Offset
 -Version/Flags: 0x00000000
 -Number Of Entries: 1270
 4418 22678 35035 47183 59622 76352 88690 101013 113494 130287
 -Dumping 5040 bytes

[LISTING TWO]

 ALIGN 16
Flip16 PROC FAR VALUE:WORD
 MOV AX, VALUE
 ROL AX, 8
 RET
Flip16 ENDP


 ALIGN 16
Flip32 PROC FAR VALUE:DWORD
 MOV DH, BYTE PTR VALUE
 MOV DL, BYTE PTR VALUE + 1
 MOV AH, BYTE PTR VALUE + 2
 MOV AL, BYTE PTR VALUE + 3
 RET
Flip32 ENDP

[LISTING THREE]

typedef int (*ATOMFILPARSER) (HMMIO hmmio, long lName, long lOffset, long
lSize);

int CollectAtomsFromFile (HMMIO hmmio, long lOffset, long lSize,
 ATOMFILPARSER apfil) {
 struct {long lSize, lName; } atmh;

 // Process the various atoms as we find them
 for (; lSize > 0; lOffset += atmh.lSize, lSize -= atmh.lSize) {
 if (lSize < sizeof atmh)
 return FALSE ;
 mmioSeek (hmmio, lOffset, SEEK_SET);
 if (mmioRead (hmmio, (HPSTR) &atmh, sizeof atmh) != sizeof atmh)
 return FALSE ;
 atmh.lSize = Flip32 (atmh.lSize);
 if (atmh.lSize < sizeof atmh)
 return FALSE ;
 if (! apfil (hmmio, atmh.lName, lOffset+sizeof atmh,
 atmh.lSize-sizeof atmh))
 return FALSE ;
 }
 // If the movie is well-formed, we should end on an atom boundary
 return (lSize == 0);
}

[LISTING FOUR]

 ...
hmmio = mmioOpen (szFileName, NULL, MMIO_READ | MMIO_DENYNONE);
CollectAtomsFromFile (hmmio, 0, mmioSeek (hmmio, 0, SEEK_END),

&ParseWholeMovie);
mmioClose (hmmio);
 ...

int ParseWholeMovie (HMMIO hmmio, long lName, long lOffset, long lSize)
{
 switch (lName) {
 case mmioFOURCC ('m','o','o','v'):
 return CollectAtomsFromFile (hmmio, lOffset, lSize,
&ParseMoovAtom);
 default:
 return TRUE;
 }
}
int ParseMoovAtom (HMMIO hmmio, long lName, long lOffset, long lSize) {
 switch (lName) {
 case mmioFOURCC ('m','v','h','d'):
 /* your code */
 return TRUE;
 case mmioFOURCC ('t','r','a','k'):
 return CollectAtomsFromFile (hmmio, lOffset, lSize, &ParseTrakAtom);
 default:
 return TRUE;
 }
}
int ParseTrakAtom (HMMIO hmmio, long lName, long lOffset, long lSize) {
 switch (lName) {
 case mmioFOURCC ('t','k','h','d'):
 /* your code */
 return TRUE;
 case mmioFOURCC ('e','d','t','s'):
 return CollectAtomsFromFile (hmmio, lOffset, lSize, &ParseEdtsAtom);
 case mmioFOURCC ('m','d','i','a'):
 return CollectAtomsFromFile (hmmio, lOffset, lSize, &ParseMdiaAtom);
 default:
 return TRUE;
 }
}
int ParseEdtsAtom (HMMIO hmmio, long lName, long lOffset, long lSize) {
 switch (lName) {
 case mmioFOURCC ('e','l','s','t'):
 /* your code */
 return TRUE;
 default:
 return TRUE;
 }
}
int ParseMdiaAtom (HMMIO hmmio, long lName, long lOffset, long lSize) {
 switch (lName) {
 case mmioFOURCC ('m','d','h','d'):
 /* your code */
 return TRUE;
 case mmioFOURCC ('h','d','l','r'):
 /* your code */
 return TRUE;
 case mmioFOURCC ('m','i','n','f'):
 return CollectAtomsFromFile (hmmio, lOffset, lSize, &ParseMinfAtom);
 default:
 return TRUE;

 }
}
int ParseMinfAtom (HMMIO hmmio, long lName, long lOffset, long lSize) {
 switch (lName) {
 case mmioFOURCC ('s','t','b','l'):
 return CollectAtomsFromFile (hmmio, lOffset, lSize, &ParseStblAtom);
 default:
 return TRUE;
 }
}
int ParseStblAtom (HMMIO hmmio, long lName, long lOffset, long lSize) {
 switch (lName) {
 case mmioFOURCC ('s','t','s','d'):
 /* your code */
 return TRUE;
 case mmioFOURCC ('s','t','t','s'):
 /* your code */
 return TRUE;
 case mmioFOURCC ('s','t','s','s'):
 /* your code */
 return TRUE;
 case mmioFOURCC ('s','t','s','c'):
 /* your code */
 return TRUE;
 case mmioFOURCC ('s','t','s','z'):
 /* your code */
 return TRUE;
 case mmioFOURCC ('s','t','c','o'):
 /* your code */
 return TRUE;
 default:
 return TRUE;
 }
}

[LISTING FIVE]

00000000 RIFF (00E5DE86) AVI 
0000000C LIST (000007D4) 'hdrl'
00000018 avih (00000038)
 TotalFrames : 1270
 Streams : 2
 InitialFrames: 8
 MaxBytes : 307200
 BufferSize : 30720
 uSecPerFrame : 100000
 Rate : 10.000 fps
 Size : (320, 240)
 Flags : 0x00000710
 AVIF_HASINDEX
 AVIF_ISINTERLEAVED
 AVIF_VARIABLESIZEREC
 AVIF_NOPADDING
00000058 LIST (00000074) 'strl'
00000064 strh (00000038)
 Stream Type : vids
 Stream Handler: cvid
 Samp/Sec : 10.000
 Priority : 0

 InitialFrames : 0
 Start : 0
 Length : 1270
 Length (sec) : 127.0
 Flags : 0x00000000
 BufferSize : 14654
 Quality : 7500
 SampleSize : 0
000000A4 strf (00000028)
 Size : (320, 240)
 Bit Depth : 24
 Colors used : 0
 Compression : cvid
000000D4 LIST (0000005C) 'strl'
000000E0 strh (00000038)
 Stream Type : auds
 Stream Handler: <default>
 Samp/Sec : 11025.000
 Priority : 0
 InitialFrames : 8
 Start : 0
 Length : 1399470
 Length (sec) : 126.9
 Flags : 0x00000000
 BufferSize : 1103
 Quality : 7500
 SampleSize : 1
00000120 strf (00000010)
 wFormatTag : WAVE_FORMAT_PCM
 nChannels : 1
 nSamplesPerSec : 11025
 nAvgBytesPerSec : 11025
 nBlockAlign : 1
 nBitsPerSample : 8
00000138 vedt (00000008)
000007E8 LIST (00E4E7F6) 'movi'
000007F4 LIST (0000045C) rec 
00000800 01wb (0000044F)
00000C58 LIST (0000045A) rec 
00000C64 01wb (0000044E)
000010BA LIST (0000045C) rec 
000010C6 01wb (0000044F)
0000151E LIST (0000045A) rec 
0000152A 01wb (0000044E)
00001980 LIST (0000045C) rec 
0000198C 01wb (0000044F)
00001DE4 LIST (0000045A) rec 
00001DF0 01wb (0000044E)
00002246 LIST (0000045C) rec 
00002252 01wb (0000044F)
000026AA LIST (0000045A) rec 
000026B6 01wb (0000044E)
00002B0C LIST (00003A7E) rec 
00002B18 00dc (0000361A)
0000613A 01wb (0000044F)
00006592 LIST (000034A8) rec 
0000659E 00dc (00003045)
000095EC 01wb (0000044E)
00009A42 ...

00E4E9CA LIST (00000614) rec 
00E4E9D6 00dc (00000608)
00E4EFE6 idx1 (0000EEA0)
00E5DE8E

[LISTING SIX]

void ParseAVIMovie (char *szFileName) {
 MMCKINFO ckAVI, ckAVIH, ckHDRL, ckSTRL, ckSTRH, ckSTRF, ckIDX1, ckMOVI;
 MainAVIHeader avihdr;
 AVIStreamHeader strhdr;
 AVIINDEXENTRY avindx;
 HMMIO hmmio;
 long lStream;

 // Open file
 hmmio = mmioOpen (szFileName, NULL, MMIO_READ);

 // Read the AVI header
 mmioSeek (hmmio, 0, SEEK_SET);
 ckAVI.ckid = ckidRIFF;
 ckAVI.fccType = formtypeAVI;
 mmioDescend (hmmio, &ckAVI, 0, MMIO_FINDRIFF);
 ckHDRL.ckid = ckidLIST;
 ckHDRL.fccType = listtypeAVIHEADER;
 mmioDescend (hmmio, &ckHDRL, &ckAVI, MMIO_FINDLIST);
 ckAVIH.ckid = ckidAVIMAINHDR;
 mmioDescend (hmmio, &ckAVIH, &ckHDRL, MMIO_FINDCHUNK);
 mmioRead (hmmio, (HPSTR) &avihdr, sizeof(MainAVIHeader));

 // Read each stream header
 for (lStream = 0; lStream < (long) avihdr.dwStreams; lStream++) {
 ckSTRL.ckid = ckidLIST;
 ckSTRL.fccType = listtypeSTREAMHEADER;
 mmioDescend (hmmio, &ckSTRL, &ckHDRL, MMIO_FINDLIST);
 ckSTRH.ckid = ckidSTREAMHEADER;
 mmioDescend (hmmio, &ckSTRH, &ckSTRL, MMIO_FINDCHUNK);
 mmioRead (hmmio, (HPSTR) &strhdr, sizeof(AVIStreamHeader));
 mmioAscend (hmmio, &ckSTRH, 0);

 // Is it video?
 if (strhdr.fccType == streamtypeVIDEO) {
 /* your code */
 }

 // Or is it sound?
 else if (strhdr.fccType == streamtypeAUDIO) {
 /* your code */
 }
 // Loop until all streams processed
 mmioAscend (hmmio, &ckSTRL, 0);
 }

 // Done reading headers
 mmioAscend (hmmio, &ckHDRL, 0);
 mmioAscend (hmmio, &ckAVI, 0);

 // Locate movi data
 mmioSeek (hmmio, 0, SEEK_SET);

 ckAVI.ckid = ckidRIFF;
 ckAVI.fccType = formtypeAVI;
 mmioDescend (hmmio, &ckAVI, 0, MMIO_FINDRIFF);
 ckMOVI.ckid = ckidLIST;
 ckMOVI.fccType = listtypeAVIMOVIE;
 mmioDescend (hmmio, &ckMOVI, &ckAVI, MMIO_FINDLIST);
 /* your code */
 mmioAscend (hmmio, &ckMOVI, 0);
 mmioAscend (hmmio, &ckAVI, 0);

 // Locate index
 mmioSeek (hmmio, 0, SEEK_SET);
 ckAVI.ckid = ckidRIFF;
 ckAVI.fccType = formtypeAVI;
 mmioDescend (hmmio, &ckAVI, 0, MMIO_FINDRIFF);
 ckIDX1.ckid = ckidAVINEWINDEX;
 mmioDescend (hmmio, &ckIDX1, &ckAVI, MMIO_FINDCHUNK);
 /* your code */
 mmioAscend (hmmio, &ckIDX1, 0);
 mmioAscend (hmmio, &ckAVI, 0);

 // Close file
 mmioClose (hmmio);
}
End Listings





































Special Issue, 1994
Compressing Waveform Audio Files


Cut your .WAV files in half with Windows' low-level waveform API




Neil G. Rowland, Jr.


Neil is a programmer at Gradient Technologies, porting OSF's Distributed
Computing Environment to DOS/Windows machines. He can be reached on CompuServe
at 72133,426, on the Internet as neil_r@gradient.com, and on Channel One as
"Neil Rowland."


If you're a Windows multimedia developer and have been working with waveform
audio files, you know that sampled waveform data can grow in size pretty fast.
In fact, a single channel with a minimal sampling rate of 11.025 kHz at eight
bits per sample translates into 11,025 bytes per second, or roughly 660 Kbytes
per minute of audio data. Higher sampling rates and stereo each double these
storage requirements.
Recently, I developed a C++ application that uses the low-level waveform
services to do some signal processing. It also happens to do something
interesting: It compresses waveform files to about half their original size.
In this article, I'll show you how I accomplished this, and provide a class
library that tames the Windows waveform API. To compile and run the source
code from this article, you'll need a C++ compiler and the Microsoft
Multimedia Development Kit (MDK). Of course, you'll also need a sound card
with Windows drivers.


NYB1 Compression


The NYB1 format is a "lossy" compression scheme that I've developed. That is
to say, it "loses" (throws out) some of the information in the input file in
order to save space. The trick of it is to throw out the least-important
information. NYB1 uses a single nybble for each sample, plus some slight
overhead, and minus some compression of silence. If the input file uses one
byte per sample (the usual case), then there is about 50 percent compression,
guaranteed. I used some educated guesses, plus a little trial and error, to
decide what should go in that nybble. The most obvious thing to try is just to
store the top nybble of each sample. You still get a representation of the
waveform, but it's less precise. In practice, this leads to a great deal of
hiss in the output. This quantization noise is always present in digital
audio, but is normally unobtrusive. Clearly, four bits per sample is not
enough to keep it unobtrusive, at least by this brute-force method.
It's possible to improve the performance in the general case by storing the
first-order differential of the input signal, then reintegrating it upon
playback. Something similar to this goes on in analog tape recording. This
works because of the way sound energy is distributed in the real world. Each
octave in the audible range has roughly the same amount of sound on average as
every other. (By roughly, I mean within an order of magnitude.) Human hearing
seems constructed to take advantage of this. So is audio equipment. Look at
any graphic equalizer with a spectrum analyzer. The bands are divided into
equal sections of a logarithmic scale of frequency, usually by octaves. The
spectrum analyzer outputs pink noise, which has an exactly equal distribution
of energy per octave. When you view pink noise on the spectrum analyzer, it
looks like a flat line.
But the brute-force method of digital coding, where you just convert each
sample into a linear number, works best for a very different situation. The
most complex waveform you can throw at it is a completely random value for
each sample. This amounts to what's called "white noise." White noise has its
energy equally distributed across frequencies on a linear frequency scale (not
a logarithmic one), which is very rare in nature.
When you store the sort of sounds you hear in nature this way, all but the
highest frequencies are stored inefficiently. Because the amplitudes of the
lower frequencies are typically much lower than those of the higher
frequencies, a scaling factor that does justice to the high frequencies makes
the low frequencies use only part of the range of numbers. Thus, the
low-frequency sounds are stored with less than the full precision the sample
has room for. This translates into a poorer signal-to-noise ratio in the lower
frequencies. (The "noise" in this signal-to-noise is quantization noise.)
You can turn pink noise into white noise by taking its first-order
differential. Do the same to an audio waveform, and you've translated
something similar to pink noise into something similar to white noise. Store
this quasi-white noise digitally, and you're making the most of the digital
medium. You've adapted the nature of the input to map to the medium's
strengths.
During less-loud parts of the waveform, the quantization noise can swamp the
"real" waveform because there isn't enough precision in the sample to
represent small (soft) details. Even with the first-order differential, this
is a problem. One solution is to change the scale of samples to reflect the
range of the waveform values. That way, precision isn't wasted.
I've broken the sample into two parts, a mantissa and an exponent. The
mantissa is the sample proper. It represents the first-order differential of
the waveform, to a certain scale. Each mantissa is one nybble. The exponent
indicates the scale. I call the exponent "shrite," short for "shift right." On
encoding, I take the first-order differential as a 16-bit integer, then I
shift it right by the exponent. This gives me a mantissa in the bottom four
bits that I can store. On decoding, I do the reverse.
It takes one nybble for the mantissa and one nybble (roughly) for the
exponent. This adds up to one byte, the same size as a typical, uncompressed
.WAV file sample. Obviously, you gain nothing if you store the exponent with
each mantissa. Therefore, I divide the incoming waveform samples into groups
of seven each. I take their first-order differentials and determine the
exponent that is appropriate for the highest amplitude sample in the group.
This will be the shrite value for the entire group. Thus, I store four bytes
(seven mantissa nybbles plus one exponent nybble) for each seven samples in
the original waveform. This is reasonably close to 2:1 compression.
The number seven is somewhat arbitrary. There's a trade-off between precision
and compression ratio. If you make the group smaller, then there are more
exponents (shrites) in the file, and the storage efficiency suffers. If you
make the group larger, then the one shrite chosen for the greatest-magnitude
sample is less likely to be well suited for any other sample in the group.
Consider the extreme case, where there's one exponent for all the mantissas,
which is the same as having no exponent at all. It's a case of one size fits
one, and can be made to do for a few, but not for many.
Finally, I compress silent sections by leaving out the mantissas altogether. I
simply store the exponent of each group, which is 0.


Encoding/Decoding Logic


The code to work with WAV files is in WAVELIB.CPP; see Listing Two (page 24).
The prototypes are in WAVELIB.H; Listing One (page 24). The WAV library has
three classes: WAVUSR, which is a superclass; WAVRDR to read WAV files; and
WAVWRT to write them. WAVUSR's base class is RIFFUSR, which contains logic
common to RIFF files. Both WAVRDR and WAVWRT have APIs that let the caller
deal with them on a sample-by-sample basis, and all the messy details of
buffering are hidden inside the classes. The classes also partially hide the
differences between mono and stereo and the detail of how many bits each
sample consists of.
At any given point, an open WAV object has a file position that is on a given
sample. When you read a sample via mNextSample() or write a sample via
mWriteSample(), the object steps to the next sample in the file. Each class
has three public members named sample, left, and right. These unsigned 16-bit
values hold the current sample; the value 32,768 represents an amplitude of 0.
The left and right members hold the current values for the left and right
channels, and sample is the average of the two. Each mNextSample() reads in
the next sample from
the WAV file and fills in these three values accordingly. mWriteSample()
writes out sample if it's a mono file, or left and right if it's stereo.
The NYB1 library (see Listing Three, page 25 and Listing Four, page 26) shares
code with the WAV library by means of the RIFFUSR class. It contains the
classes NYBUSR, NYBRDR, and NYBWRT. These all have interfaces identical to the
corresponding WAV classes. So the caller can treat them as if they were WAV
classes. The process of encoding and decoding in the NYB1 format is hidden
inside these classes.
The NYBWRT version of mWriteSample() collects incoming samples into groups.
When it's just been called for the last sample in a group, it processes the
group it's built. First, it figures the exponent by taking a first-order
differential of all the samples, figures the appropriate shrite value for the
largest differential it sees, and calls iNegFeedbackStage() for each mantissa
in the group. iNegFeedbackStage() does three things: It applies the encoding
by calling iEncode(); writes the encoded group out to the NYB1 file; and
decodes each encoded mantissa and compares the result to the original value.
iNegFeedbackStage() determines the sign of the error value and on the next
sample applies this as a bias to the encoded sample before writing it out.
This has the effect of preventing cumulative error, by a sort of negative
feedback.
The decoding is done by the NYBRDR version of mNextSample(), which reads in
nybbles from the NYB1 file, stores shrite values when it encounters them, and
decodes mantissas. There's no need to worry about cumulative error at this
phase. The iNegFeedbackStage() routine acted to prevent this when the file was
created. Finally, mNextSample() recognizes when you're in a zero-compressed
group and dummies up seven silent samples (sample=32,768).


Using the Library


The sample application that I've provided demonstrates how to use the NYB1
library. WinMain() in Listing Five (page 27) simply shovels samples between
the input and output objects, one sample at a time. On conversion, it opens
the input WAV file by means of a WAVRDR object, and the output NYB1 file by
means of a NYBWRT object. Then it repeatedly calls mNextSample(), feeds the
sample to the NYBWRT object, and calls mWriteSample() for that object. On
playback, the input object is a NYBRDR, and the output object is a WAVEPLAYER.
Again, it's a matter of ferrying samples from one to the other, one at a time,
until there are no more samples at the input end.
When running the program, notice that the main screen is a small dialog with
two edit fields. The top field is for an input wave file. If you want to
convert a file from WAV to NYB1 format, enter the full path and file
specification of the source file here. To simply play an existing NYB1 file,
leave this blank. The second edit control takes the full path and file
specification of the NYB1 file. If you leave it blank, it defaults to \X.NYB.
For playback, this is the input specification. For conversion, it is the
output specification.


The WAVEPLAYER Class


The output stage is encapsulated in the WAVEPLAYER class (see Listing Two).
The waveform portion of a sound-card driver for Windows takes its input in
buffers. Then it plays the current buffer in the background while the CPU
prepares the next buffer for it. This is good for efficiency, but you'll want
to deal with the waveform output as a serial string of samples. The Windows
API does almost nothing to hide the details of buffering from us. So, I do it
in the WAVEPLAYER class. The programmer using the class feeds it a string of
samples, one at a time. The WAVEPLAYER object handles the buffering. The code
calling WAVEPLAYER need not bother with any part of the Windows waveform API
calls.
When WAVEPLAYER is started, it opens the wave-playing side of the sound card.
The WAVEPLAYER::mOpen() method uses Windows API call waveOutOpen() to open a
WAVE_MAPPER device, which specifies a default device for waveform handling.
Currently, it's the only device ID supported by Windows. The mOpen() method
then allocates a buffer header, of type WAVEHDR. This is the Windows object
that manages the buffers for the sound card. When WAVEPLAYER::mPlaySample() is
called for a sample, it's usually appended to the samples in the current
buffer. But when there's no room in the buffer, it must flush the buffer.
iCloseoutSampleBuffer() contains the logic to flush the previous buffer. The
call to waveOutUnprepareHeader() tells Windows you're done with a buffer. It
won't return (successfully) unless and until Windows itself is done playing
the contents of the buffer. When waveOutUnprepareHeader() returns, you can
safely free the old buffer.
A note on the parameter to WAVEPLAYER::mOpen() is needed. Though its type is a
pointer to type WAVEFORMAT, it is really being passed a PCMWAVEFORMAT. This
oddity is a result of the way the Windows API call waveOutOpen() is
prototyped. The call takes a pointer to type WAVEFORMAT. But in reality, when
the format tag in the passed structure is WAVE_FORMAT_PCM (and it always is),
it should point to a PCMWAVEFORMAT. The difference is that PCMWAVEFORMAT has
an extra field, wBitsPerSample, that Windows must have. In NYB1.CPP, PlayIt
makes the call Play.mOpen (&NybIn.Fmt.wf). This is the same as saying
Play.mOpen ((WAVEFORMAT*)(&NybIn.Fmt)).

mPlaySample() "hands" Windows the buffer just filled, calling the iPlay()
method to handle the mechanics of handing over a buffer. iPlay() sets up the
WAVHDR to point to the new buffer, and calls waveOutPrepareHeader() and
waveOutWrite(). Both calls are necessary to tell Windows to play the buffer.
iCloseoutSampleBuffer() then sets the buffer pointer to NULL and returns. The
next call to mPlaySample() will see this null pointer, allocate a new buffer,
and start filling it. For now, mPlaySample() simply returns to the caller.
When you've finished with the output, and closed the WAVEPLAYER, flush the
current buffer, even though it's probably not full. Otherwise, you'll never
hear the last bit of the waveform. iCloseoutBuffer() is the method that does
this. First it waits until the sound card is done with the previous buffer;
the WHDR_DONE flag in the WAVEHDR that manages the buffer is set when this
happens. Then it executes a final waveOutUnprepareHeader() to tell Windows
you're done with the final buffer. Finally, it frees the buffer and returns.


RIFFUSR and Friends


Windows Multimedia file formats are all special cases of the RIFF file format.
Therefore, a WAV file is a type of RIFF file. For consistency, I've decided to
make my NYB1 file format a RIFF format as well. Windows has an API to help
parse RIFF files. Both the WAV and NYB1 libraries need it, so I've
encapsulated it in the RIFFUSR class.
A RIFF file consists of "chunks," each of which has a 4-byte type, a length
field, and the actual data. These chunks, which vary according to format, can
be nested. In the case of a WAV file, there is one top-level RIFF chunk of
type WAVE, containing a format chunk (type fmt) and a large data chunk (type
data) that takes up most of the file. I use the same scheme in NYB1 format to
help share code. The only difference is that the top-level chunk is of type
NYB1. The RIFFUSR methods iOpenRead() and iOpenWrite() are made possible by
this consistency. iOpenRead() and iOpenWrite() use the Windows call mmioOpen()
to open the file in such a way that it can be manipulated with the mmio*()
calls. The option flag MMIO_ALLOCBUF instructs Windows to allocate buffers for
use.
The Windows API for RIFF files deals with these chunks almost as if they were
files. There are calls to open, close, read, and write chunks. The API calls
mmioAscend() and mmioDescend() determine which chunk within the file is
currently open. mmioDescend() moves the read pointer into a chunk within the
current chunk; if no chunk is currently open, it moves into a top-level chunk.
Given the chunk's name in the parameter ck, it finds the chunk and moves the
file pointer to the first data byte of that chunk, essentially opening it.
Note that since chunks can be nested, so can mmioDescend() calls.
mmioRead() and mmioWrite() are analogous to the standard library read() and
write() calls, except that they deal with the current chunk instead of the
whole file. mmioRead() won't read past the end of the chunk. mmioWrite()
appends to the current chunk.
Finally, mmioAscend() is analogous to close(). On writing, it closes out the
chunk in a tidy manner. On reading, it tells Windows that you are done reading
that chunk. You have then "ascended" one level higher in the hierarchy of
nested chunks. You can then do another mmioDescend() to select another chunk
to work with. Also note that because mmioDescend() calls can be nested,
mmioAscend() calls are also nested. Every mmioDescend() must have a matching
mmioAscend().
This API is somewhat object oriented. All of the calls with the "mmio" prefix
take a handle of type HMMIO as the first parameter. HMMIO is a scalar that
acts like a file handle, but it is especially for RIFF files and for the mmio
calls. There's also an MMIOINFO structure, which holds various useful bits of
information about the RIFF file. The most important items in MMIOINFO are the
next and end pointers to the current buffer (fields pchNext and pchEndWrite).
Though I use mmioSetBuffer() to tell Windows to allocate buffers for me, I
still need to know where they are since I must manipulate them directly. The
call mmioGetInfo() reads the MMIOINFO data. I store both the HMMIO handle and
the MMIOINFO data in the RIFFUSR object.
The inline functions iReadByte() and iWriteByte() encapsulate this buffer
business. Using these calls, I can write my code as if I were dealing with a
stream of bytes, instead of a buffer-oriented API. The calls are in WAVELIB.H.
When they need to go to another buffer, they call iReadBuf() and iWriteBuf().
These routines use the Windows API call mmioAdvance() to get another buffer
and simultaneously update your copy of the MMIOINFO data. One messy detail: On
writing, you must set the MMIO_DIRTY flag in MMIOINFO so that Windows knows
that you want this buffer written out. Then you must do mmioSetInfo() so that
Windows' internal information is in sync with yours and knows you've set the
flag.
Finally, when you close an output RIFF file, the last buffer needs to be
flushed. Consider RIFFUSR::iClose() in Listing Two. First set MMIO_DIRTY and
do the mmioSetInfo() to tell Windows to flush the buffer. However, Windows
doesn't flush it just then. Do an mmioAscend(), which writes the last buffer
out to disk, and takes the file pointer out of the chunk. But you're still in
the RIFF chunk (recall that these chunks are nested). One more mmioAscend() to
get out of this, and finally an mmioClose() to tell Windows that you're done
with the RIFF file.


Future Enhancements


NYB1 is not the final word on compressing waveforms cheaply and easily, but
it's a good starting point, and many areas can be improved or enhanced. I've
left out certain optimizations for clarity's sake. For example, you may want
to rewrite iNegFeedbackStage() so it no longer does the differential twice for
each sample.
There are also possibilities for enhancements in the format itself. For
example, you may want to use more bits for louder sections. When the shrite
value is high, the quantization noise that isn't filtered out by various
tricks is very noticeable. To counter this, use more than four bits for each
mantissa. Experiment to determine just how much more and where the cutoff
point should be. Remember, you're sacrificing compression here for greater
clarity.
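The trade-off can be made concrete with a rough, portable sketch (this is not the article's Shrite()/UnShrite() code; the function names and the midpoint-reconstruction detail are invented for illustration). Widening the mantissa lets the same delta range use a smaller shift, which shrinks the worst-case quantization error:

```cpp
#include <cstdlib>

// Quantize a delta to a 'bits'-wide mantissa by right-shifting, then
// reconstruct to the middle of the quantization bin.
int quantizeRoundTrip(int delta, int bits, int range) {
    // Find the smallest shift that lets 'bits' bits span 'range'.
    int shift = 0;
    while (((range - 1) >> shift) >= (1 << bits)) ++shift;
    int mantissa = delta >> shift;                 // encode
    return (mantissa << shift) + (1 << shift) / 2; // decode to bin midpoint
}

// Worst-case reconstruction error over all deltas in [0, range).
int maxError(int bits, int range) {
    int worst = 0;
    for (int d = 0; d < range; ++d) {
        int e = std::abs(quantizeRoundTrip(d, bits, range) - d);
        if (e > worst) worst = e;
    }
    return worst;
}
```

For a delta range of 1024, a 4-bit mantissa needs a shift of 6 (error up to half a 64-unit step), while a 6-bit mantissa needs only a shift of 4, cutting the worst-case error by a factor of four at the cost of 50 percent more data per sample.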
Also, consider the case of 16-bit input samples. Should NYB2 be able to
creditably handle high-fidelity waveforms? This could call for a different
trade-off. Consider a separate formula for increasing the bits per mantissa in
the case of a 16-bit input. Remember, the header of the .NYB file contains a
copy of the header of the original WAV file, so you'll know which formula to
use on playback.
A final enhancement might be to provide for stereo. This is simple enough. If
the header says the input was stereo, then use left/right channel pairs of
mantissas. The exponent (shrite) needn't be doubled in this way. Even when the
overall loudness varies between left and right channels, the listener probably
won't notice any improvement in clarity in the quieter channel, because the
louder channel will drown it out.
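The interleaved left/right layout could reuse the existing nybble packing unchanged. As a hedged sketch (packNybbles and unpackNybbles are invented helpers; they pack the high nybble first, matching the order iWriteNybble() and iReadNybble() use), the pairs simply flow through the same two-nybbles-per-byte stream:

```cpp
#include <cstdint>
#include <vector>

// Pack a sequence of 4-bit values two per byte, high nybble first.
std::vector<uint8_t> packNybbles(const std::vector<uint8_t>& nybs) {
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < nybs.size(); i += 2) {
        uint8_t hi = nybs[i] & 0xF;
        uint8_t lo = (i + 1 < nybs.size()) ? (nybs[i + 1] & 0xF) : 0;
        out.push_back((uint8_t)((hi << 4) | lo));
    }
    return out;
}

// Recover the nybble stream from the packed bytes.
std::vector<uint8_t> unpackNybbles(const std::vector<uint8_t>& bytes) {
    std::vector<uint8_t> out;
    for (uint8_t b : bytes) {
        out.push_back((uint8_t)(b >> 4));
        out.push_back((uint8_t)(b & 0xF));
    }
    return out;
}
```

Whether a nybble is an exponent, a mono mantissa, or one half of a left/right pair is purely a matter of the reader and writer agreeing on the order, so stereo costs nothing in the packing layer.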
[LISTING ONE]


//****************************** WAVELIB.H *************************************
// Class library for Windows waveforms and MIDI.
// Copyright (c) 1993 by Neil G. Rowland, Jr. 04-JUN-93
#ifndef __WAVELIB_H
#define __WAVELIB_H
extern "C"
 {
 #include <windows.h>
 #include <mmsystem.h>
 }
#pragma hdrstop
//******************************************************************************

class RIFFUSR
 { // Base class for a user for a RIFF file.
 public:
 HMMIO hmmio; // handle to open WAVE file
 MMCKINFO ckRIFF; // chunk info. for RIFF chunk
 MMCKINFO ck; // info. for a chunk
 MMIOINFO mmioinfo; // current status

 RIFFUSR() { hmmio = NULL; };

 protected:
 BOOL iOpenRead(char* _pszInFile, char* _pFmt, int _fmtlen,
 char _t1, char _t2, char _t3, char _t4);
 void iCloseRead();
 BOOL iReadBuf();
 BOOL iReadByte(BYTE& _byte)
 { // If we are at end of the input file I/O buffer, fill it.
 // Test that we don't hit end of file while (lSamples > 0).
 if (mmioinfo.pchNext == mmioinfo.pchEndRead) {

 if (!iReadBuf()) { MessageBeep(0); return FALSE; }
 if (mmioinfo.pchNext == mmioinfo.pchEndRead) return FALSE;
 }
 _byte = *(mmioinfo.pchNext);
 mmioinfo.pchNext++;
 return TRUE;
 };
 BOOL iOpenWrite(char* _pszOutFile, const char* _pFmt, int _fmtlen,
 char _t1, char _t2, char _t3, char _t4);
 void iCloseWrite();
 BOOL iWriteBuf();
 BOOL iWriteByte(BYTE _byte)
 {
 if (!mmioinfo.pchNext) return FALSE;
 if (mmioinfo.pchNext >= mmioinfo.pchEndWrite)
 { // If at end of output file I/O buffer, flush it.
 if (!iWriteBuf())
 { MessageBeep(0); return FALSE; }
 }
 if (!mmioinfo.pchNext) { MessageBeep(0); return FALSE; }
 *mmioinfo.pchNext = _byte;
 mmioinfo.pchNext++;
 return TRUE;
 };
 };
//------------------------------------------------------------------------------
class WAVUSR : public RIFFUSR
 { // User for a .WAV file.
 public:
 PCMWAVEFORMAT Fmt; // format of WAVE file.
 WORD sample, left, right; // average, left and right channels.
 };
class WAVRDR : public WAVUSR
 {
 public:
 BOOL mOpenRead(char* _pszInFile);
 BOOL mNextSample();
 void mClose();
 protected:
 BOOL iReadChanSample(WORD& _sample);
 };
class WAVWRT : public WAVUSR
 {
 public:
 BOOL mOpenWrite(char* _pszOutFile, const PCMWAVEFORMAT* _pFmt);
 BOOL mWriteSample();
 void mClose();
 protected:
 BOOL iWriteChanSample(WORD _sample);
 };
typedef struct
 { // Play waveform output
 HWAVEOUT hwaveout;
 HANDLE hHdr;
 LPWAVEHDR lpHdr;
 HANDLE hBuf; // current waveform buffer.
 // accum buffer for feeding in a sample at a time...
 HANDLE hBufS;
 LPSTR lpBufS;

 unsigned countS;
 BOOL mOpen(LPWAVEFORMAT _pFmt);
 void mClose();
 void mPlaySample(WORD _sample);
 protected:
 inline void iCloseoutBuffer();
 inline void iCloseoutSampleBuffer();
 inline void iPlay(MMIOINFO* _pInfo);
 inline void iPlay(HANDLE _hbuf, LPSTR _lpBuf, int _len);
 }
WAVEPLAYER;
#endif //ndef __WAVELIB_H

[LISTING TWO]

//****************************** WAVELIB.CPP ***********************************
// Class library for Windows waveforms.
// Copyright (c) 1993 by Neil G. Rowland, Jr. 04-JUN-93
#include "wavelib.h"
//******************************************************************************

void lmemcpy(LPSTR _pszDest, LPSTR _pszSrc, DWORD _len)
 {
 if (!_pszDest) return;
 if (!_pszSrc) return;
 if (!_len) return;
 while (_len--) {
 *(_pszDest++) = *_pszSrc;
 _pszSrc++;
 }
 }
//********************************* RIFFUSR ************************************
#define BUFSIZE 25000
BOOL RIFFUSR::iOpenRead(char* _pszInFile, char* _pFmt, int _fmtlen,
 char _t1, char _t2, char _t3, char _t4)
 { // open a generic RIFF file for reading.
 // on success, it is descended into the data chunk,
 // and *_pFmt holds a copy of the fmt chunk.
 hmmio = mmioOpen(_pszInFile, NULL, MMIO_ALLOCBUF | MMIO_READ);
 if (hmmio == NULL) return FALSE; // cannot open RIFF file

 mmioSetBuffer(hmmio, NULL, BUFSIZE, 0); // allocate buffers
 // Descend the input file into the 'RIFF' chunk.
 if (mmioDescend(hmmio, &ckRIFF, NULL, 0) != 0)
 goto ERROR_BAD_FORMAT;
 // Make sure the input file is of the desired type....
 if ((ckRIFF.ckid != FOURCC_RIFF) ||
 (ckRIFF.fccType != mmioFOURCC(_t1, _t2, _t3, _t4)))
 goto ERROR_BAD_FORMAT;
 // Search the input file for the 'fmt ' chunk.
 ck.ckid = mmioFOURCC('f', 'm', 't', ' ');
 if (mmioDescend(hmmio, &ck, &ckRIFF, MMIO_FINDCHUNK) != 0)
 goto ERROR_BAD_FORMAT; // no 'fmt ' chunk
 // Expect the 'fmt ' chunk to be at least as large as _fmtlen;
 // if there are extra parameters at the end, we'll ignore them.
 if (ck.cksize < (long) _fmtlen)
 goto ERROR_BAD_FORMAT; // 'fmt ' chunk too small
 // Read the 'fmt ' chunk into *_pFmt.
 if (mmioRead(hmmio, (HPSTR) _pFmt, (long)_fmtlen) != (long)_fmtlen)

 return FALSE; // truncated file, probably
 // Ascend the input file out of the 'fmt ' chunk.
 if (mmioAscend(hmmio, &ck, 0) != 0) return FALSE; // truncated file?
 // Search the input file for the 'data' chunk, and descend.
 ck.ckid = mmioFOURCC('d', 'a', 't', 'a');
 if (mmioDescend(hmmio, &ck, &ckRIFF, MMIO_FINDCHUNK) != 0)
 goto ERROR_BAD_FORMAT; // no 'data' chunk
 mmioGetInfo(hmmio, &mmioinfo, 0);
 return TRUE;
 ERROR_BAD_FORMAT:
 MessageBox(NULL, "Input file must be a RIFF file", "RIFFUSR::iOpenRead",
 MB_ICONEXCLAMATION | MB_OK);
 iCloseRead();
 return FALSE;
 }
void RIFFUSR::iCloseRead()
 {
 if (!hmmio) return;
 mmioSetInfo(hmmio, &mmioinfo, 0); // properly close out input file.
 mmioClose(hmmio, 0);
 hmmio = NULL;
 }
BOOL RIFFUSR::iReadBuf()
 { return mmioAdvance(hmmio, &mmioinfo, MMIO_READ) == 0; }
//------------------------------------------------------------------------------
BOOL RIFFUSR::iOpenWrite(char* _pszOutFile, const char* _pFmt, int _fmtlen,
 char _t1, char _t2, char _t3, char _t4)
 {
 // Open the output file for writing using buffered I/O. Note that
 // if the file exists, the MMIO_CREATE flag causes it to be truncated
 // to zero length.
 hmmio = mmioOpen(_pszOutFile, NULL, MMIO_ALLOCBUF | MMIO_WRITE |
 MMIO_CREATE);
 if (hmmio == NULL) return FALSE; // cannot open WAVE file

 mmioSetBuffer(hmmio, NULL, BUFSIZE, 0); // allocate buffers
 // Create the output file RIFF chunk of desired type.
 ckRIFF.fccType = mmioFOURCC(_t1, _t2, _t3, _t4);
 if (mmioCreateChunk(hmmio, &ckRIFF, MMIO_CREATERIFF) != 0)
 goto cantwrite; // cannot write file, probably
 // We are now descended into the 'RIFF' chunk we just created.
 // Now create the 'fmt ' chunk. Since we know the size of this chunk,
 // specify it in the MMCKINFO structure so MMIO doesn't have to seek
 // back and set the chunk size after ascending from the chunk.
 ck.ckid = mmioFOURCC('f', 'm', 't', ' ');
 ck.cksize = _fmtlen; // we know the size of this ck.
 if (mmioCreateChunk(hmmio, &ck, 0) != 0) goto cantwrite;
 // Write the *_pFmt data to the 'fmt ' chunk.
 if (mmioWrite(hmmio, (HPSTR) _pFmt, _fmtlen) != _fmtlen)
 goto cantwrite;
 // Ascend out of the 'fmt ' chunk, back into the 'RIFF' chunk.
 if (mmioAscend(hmmio, &ck, 0) != 0) goto cantwrite;
 // Create the 'data' chunk that holds the waveform samples.
 ck.ckid = mmioFOURCC('d', 'a', 't', 'a');
 if (mmioCreateChunk(hmmio, &ck, 0) != 0) goto cantwrite;
 mmioGetInfo(hmmio, &mmioinfo, 0);
 return TRUE;
 cantwrite:
 iCloseWrite();

 return FALSE;
 }
void RIFFUSR::iCloseWrite()
 {
 if (!hmmio) return;
 // flush the output RIFF chunk...
 mmioinfo.dwFlags = MMIO_DIRTY;
 if (mmioSetInfo(hmmio, &mmioinfo, 0) != 0)
 goto ERROR_CANNOT_WRITE; // cannot flush, probably
 // Ascend the output file out of the 'data' chunk -- this will cause
 // the chunk size of the 'data' chunk to be written.
 if (mmioAscend(hmmio, &ck, 0) != 0)
 goto ERROR_CANNOT_WRITE; // cannot write file, probably
 // Ascend the output file out of the 'RIFF' chunk -- this will cause
 // the chunk size of the 'RIFF' chunk to be written.
 if (mmioAscend(hmmio, &ckRIFF, 0) != 0)
 goto ERROR_CANNOT_WRITE; // cannot write file, probably
 goto close;
 ERROR_CANNOT_WRITE:
 MessageBox(NULL, "Error closing out output file", "RIFFUSR::iCloseWrite",
 MB_ICONEXCLAMATION | MB_OK);
 close:
 mmioClose(hmmio, 0);
 hmmio = NULL;
 }
BOOL RIFFUSR::iWriteBuf()
 {
 mmioinfo.dwFlags = MMIO_DIRTY;
 if (mmioAdvance(hmmio, &mmioinfo, MMIO_WRITE) != 0) return FALSE;
 return TRUE;
 }
//******************************************************************************
BOOL WAVRDR::mOpenRead(char* _pszInFile)
 {
 if (!RIFFUSR::iOpenRead(_pszInFile, (char*)&Fmt,
 sizeof(PCMWAVEFORMAT), 'W', 'A', 'V', 'E'))
 return FALSE;
 // Make sure the input file is a PCM WAVE file of a variety we support.
 if (Fmt.wf.wFormatTag != WAVE_FORMAT_PCM)
 goto ERROR_BAD_FORMAT; // bad input file format
 if ((Fmt.wBitsPerSample != 8) && (Fmt.wBitsPerSample != 16))
 goto ERROR_BAD_FORMAT; // bad input file format
 return TRUE;
 ERROR_BAD_FORMAT:
 MessageBox(NULL, "Input file must be a PCM WAVE file",
 "WAVRDR::mOpenRead", MB_ICONEXCLAMATION | MB_OK);
 ERROR_BAD_FORMAT1:
 mClose();
 return FALSE;
 }
BOOL WAVRDR::iReadChanSample(WORD& _sample)
 { // read one sample for one channel.
 BYTE c;
 if (!iReadByte(c)) return FALSE;
 _sample = c << 8;
 if (Fmt.wBitsPerSample > 8) { // 16 bit
 if (!iReadByte(c)) return FALSE;
 _sample |= c;
 }
 return TRUE;
 }
BOOL WAVRDR::mNextSample()
 { // read the next sample into fields sample, left and right.
 if (Fmt.wf.nChannels == 1) { // mono

 if (!iReadChanSample(sample)) return FALSE;
 left = right = sample;
 }
 else { // stereo
 if (!iReadChanSample(left)) return FALSE;
 if (!iReadChanSample(right)) return FALSE;
 sample = (left>>1) + (right>>1);
 }
 return TRUE;
 }
void WAVRDR::mClose()
 { RIFFUSR::iCloseRead(); }
//******************************************************************************
BOOL WAVWRT::mOpenWrite(char* _pszOutFile, const PCMWAVEFORMAT* _pFmt)
 {
 if (!RIFFUSR::iOpenWrite(_pszOutFile, (const char*)_pFmt,
 sizeof(PCMWAVEFORMAT), 'W', 'A', 'V', 'E'))
 return FALSE;
 Fmt = *_pFmt;
 return TRUE;
 cantwrite:
 mClose();
 return FALSE;
 }
void WAVWRT::mClose()
 { RIFFUSR::iCloseWrite(); }
BOOL WAVWRT::iWriteChanSample(WORD _sample)
 {
 if (mmioinfo.pchNext >= mmioinfo.pchEndWrite-1) {
 // If we are at the end of the output file I/O buffer, flush it.
 if (!iWriteBuf()) { MessageBeep(0); return FALSE; }
 }
 *(mmioinfo.pchNext)++ = HIBYTE(_sample);
 if (Fmt.wBitsPerSample > 8) // 16 bit
 *(mmioinfo.pchNext)++ = LOBYTE(_sample);
 return TRUE;
 }
BOOL WAVWRT::mWriteSample()
 {
 if (Fmt.wf.nChannels == 1) // mono
 if (!iWriteChanSample(sample)) return FALSE;
 else { // stereo
 if (!iWriteChanSample(left)) return FALSE;
 if (!iWriteChanSample(right)) return FALSE;
 }
 return TRUE;
 }
//******************************************************************************
inline void WAVEPLAYER::iCloseoutBuffer()
 { // Flush and free the current buffer.
 if (!lpHdr || !hBuf) return;
 if (!(lpHdr->dwFlags&WHDR_PREPARED)) return;
 // Finish up with previous buffer...
 while ((lpHdr->dwFlags&WHDR_DONE) == 0) Yield(); // wait
 waveOutUnprepareHeader(hwaveout, lpHdr, sizeof(WAVEHDR));
 GlobalUnlock(hBuf); GlobalFree(hBuf);
 hBuf = NULL;
 }
inline void WAVEPLAYER::iCloseoutSampleBuffer()
 { // flush buffer for sample mode...
 if (!hBufS || !lpBufS) return;

 if (countS) iPlay(hBufS, lpBufS, countS); // no longer own old buffer.
 lpBufS = NULL; // so will get new buffer.
 hBufS = NULL;
 }
inline void WAVEPLAYER::iPlay(MMIOINFO* _pInfo)
 {
 if (!hwaveout) return;
 if (!_pInfo) return;
 if (!lpHdr) return;
 DWORD len = _pInfo->pchNext - _pInfo->pchBuffer;
 HANDLE hNewBuf = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, len);
 LPSTR lpNewBuf = GlobalLock(hNewBuf);
 lmemcpy(lpNewBuf, (LPSTR)_pInfo->pchBuffer, len);
 iPlay(hNewBuf, lpNewBuf, len);
 }
inline void WAVEPLAYER::iPlay(HANDLE _hbuf, LPSTR _lpBuf, int _len)
 { // the passed buffer will be freed later, not by caller...
 iCloseoutBuffer(); // finish with previous.
 // Queue this buffer's worth...
 hBuf = _hbuf;
 lpHdr->lpData = _lpBuf;
 lpHdr->dwBufferLength = _len;
 lpHdr->dwLoops = 0L;
 lpHdr->dwFlags = 0L; // MPG, p5-28
 if (0 == waveOutPrepareHeader(hwaveout, lpHdr, sizeof(WAVEHDR)))
 waveOutWrite(hwaveout, lpHdr, sizeof(WAVEHDR));
 else MessageBeep(0);
 }
//------------------------------------------------------------------------------
BOOL WAVEPLAYER::mOpen(LPWAVEFORMAT _pFmt)
 {
 hwaveout = NULL; hHdr = NULL; lpHdr = NULL;
 hBufS = NULL; lpBufS = NULL; countS = 0;
 if (!_pFmt) return FALSE;
 if (0 != waveOutOpen(&hwaveout, WAVE_MAPPER, _pFmt, NULL, NULL, 0))
 return FALSE;
 // allocate the buffer header, but no buffer yet...
 hBuf = NULL;
 hHdr = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, sizeof(WAVEHDR));
 lpHdr = (LPWAVEHDR)GlobalLock(hHdr);
 lpHdr->dwFlags = 0;
 return TRUE;
 }
void WAVEPLAYER::mClose()
 {
 iCloseoutSampleBuffer();
 iCloseoutBuffer();
 if (hwaveout) waveOutClose(hwaveout);
 if (hHdr) { GlobalUnlock(hHdr); GlobalFree(hHdr); }
 hwaveout = NULL;
 hHdr = NULL; lpHdr = NULL;

 if (hBufS) {
 GlobalUnlock(hBufS); GlobalFree(hBufS);
 hBufS = NULL; lpBufS = NULL;
 }
 }
void WAVEPLAYER::mPlaySample(WORD _sample)
 {

 if (!lpBufS) { // need a new buffer...
 hBufS = GlobalAlloc(GMEM_FIXED, BUFSIZE+1);
 if (!hBufS) return;
 lpBufS = GlobalLock(hBufS);
 countS = 0;
 }
 lpBufS[countS] = HIBYTE(_sample);
 if (++countS >= BUFSIZE)
 iCloseoutSampleBuffer();
 }

[LISTING THREE]

//******************************** NYBLIB.H ***********************************
// Main header file for NYB1 compressed waveform utility.
// Copyright (c) 1993 by Neil G. Rowland, Jr. 24-JUN-93

#include "wavelib.h"

//******************************************************************************
// Access to NYB1 files. Makes it seem like .WAV files
// A group has this many nybbles, preceded by a magnitude nybble...
#define SAMPLESPERGROUP 7
class NYBUSR : public RIFFUSR
 { // User for a .NYB file.
 public:
 PCMWAVEFORMAT Fmt; // format of would-be WAVE file.
 WORD sample; // 16-bit .WAV style sample.
 NYBUSR();
 protected:
 int groupcount;
 WORD encprevsample; // used by iEncode.
 WORD decprevsample; // used by iDecode.
 BYTE shrite; // magnitude of current group (shift-right amount)
 inline void iDecode(WORD& _sample);
 inline void iEncode(WORD& _sample);
 };
class NYBRDR : public NYBUSR
 {
 public:
 unsigned samplespersec;
 NYBRDR();
 BOOL mOpenRead(char* _pszInFile);
 BOOL mNextSample();
 void mClose();
 protected:
 BYTE inbuf; // first nybble is high, second is low.
 BOOL nyb2; // true if second nybble
 inline BOOL iReadNybble(BYTE& _nybble);
 };
class NYBWRT : public NYBUSR
 {
 public:
 NYBWRT();
 BOOL mOpenWrite(char* _pszOutFile, const PCMWAVEFORMAT* _pFmt);
 BOOL mWriteSample(WORD _sample, WAVEPLAYER* _pOut = NULL);
 void mClose();
 protected:
 BYTE outbuf; // first nybble is high, second is low.

 BOOL nyb2w;
 inline BOOL iWriteNybble(BYTE _nybble);
 WORD outsamp, intsamp; // output and intermediate samples
 signed diff; // compensation, to prevent cumulative error.
 BOOL iNegFeedbackStage();
 // internal to mWriteSample (dealing with groups)...
 WORD groupbuf[SAMPLESPERGROUP];
 WORD* pgroup;
 WORD diffmin, diffmax; // for determining magnitude.
 WORD prevsample;
 };

[LISTING FOUR]

//******************************** NYBLIB.CPP **********************************
// Routines to deal with nybble format compressed waveforms.
// Copyright (c) 1993 by Neil G. Rowland, Jr. 24-JUN-93
extern "C"
 {
 #include <math.h>
 #include <stdlib.h>
 #include <string.h>
 }
#include "nyblib.h"
//******************************************************************************
static WORD PreAdj[12+1] = {
 // bias table, indexed by shrite: subtracted before shifting so that
 // a zero delta (32768) encodes as the middle nybble value...
 32768-8,
 32768-16, 32768-32, 32768-64, 32768-128,
 32768-256, 32768-512, 32768-1024, 32768-2048,
 32768-4096, 32768-8192, 32768-16384, 0};
static BOOL fNagged = FALSE;
inline void Shrite(WORD& _sample, BYTE _shrite)
 { // get delta down to a nybble...
 WORD orig = _sample; // helps in clipping.
 if (_shrite == 0) { _sample = 8; return; }
 _sample -= PreAdj[_shrite];
 _sample >>= _shrite;
 if (_sample > 15) { // overflow/underflow.
 //D if (!fNagged)
 //D { MessageBox((HWND)NULL, "Shrite:overflow", "NYBLIB",
 //D MB_OK); fNagged = TRUE; }
 _sample = (orig & 0x8000)? 15:0;
 }
 }
inline void UnShrite(WORD& _sample, BYTE _shrite)
 { // convert a nybble to a delta...
 if (_shrite == 0) { _sample = 32768; return; }
 _sample <<= _shrite;
 _sample += PreAdj[_shrite];
 }
inline void Diff(WORD& _sample, WORD& prevsample)
 { // called once per sample.
 _sample >>=1; _sample+= 16384; // halve magnitude to avoid overflow.
 WORD sample = _sample;
 _sample = (sample-prevsample) + 32768;
 prevsample = sample;
 };
inline void Integ(WORD& _sample, WORD& prevsample)

 { // called once per sample.
 unsigned intl;
 intl = prevsample; intl +=_sample-32768;
 _sample = intl;
 prevsample = _sample;
 // undo halving and catch overflow...
 _sample -= 16384;
 _sample = (_sample&0x8000)? (prevsample-16384)<<1 : _sample<<1;
 };
//****************************************************************************
NYBUSR::NYBUSR()
 { hmmio = NULL; groupcount = 0; encprevsample = decprevsample = 32768; }
inline void NYBUSR::iEncode(WORD& _sample)
 { // encode the data. does not include cumulative error prevention.
 if (shrite < 12) // otherwise no gain, maybe even loss
 Diff(_sample, encprevsample);
 else
 encprevsample = _sample; // in case last in group
 Shrite(_sample, shrite);
 if (_sample>15) MessageBox((HWND)NULL, "Bad result", "NYBUSR", MB_OK);
 }
inline void NYBUSR::iDecode(WORD& _sample)
 { // decode the encoded data. doubles as cumulative error prevention.
 //D if (_sample > 15)
 //D { MessageBox((HWND)NULL, "bad _sample", "NYBUSR::mDecode",
 //D MB_OK); return; }
 UnShrite(_sample, shrite);
 if (shrite < 12) // otherwise no gain, maybe even loss
 Integ(_sample, decprevsample);
 else
 decprevsample = _sample; // in case last in group
 }

//****************************************************************************
NYBRDR::NYBRDR()
 { nyb2 = FALSE; }
BOOL NYBRDR::mOpenRead(char* _pszInFile)
 {
 if (!RIFFUSR::iOpenRead(_pszInFile, (char*)&Fmt, sizeof(PCMWAVEFORMAT),
 'N', 'Y', 'B', '1'))
 return FALSE;
 // Make sure the input file is a mono PCM WAVE file.
 if ((Fmt.wf.wFormatTag != WAVE_FORMAT_PCM)) goto ERROR_BAD_FORMAT;
 if (Fmt.wf.nChannels != 1) { // stereo not supported yet.
 MessageBox((HWND)NULL, "This is a STEREO wave file header",
 "NYBRDR", MB_ICONEXCLAMATION | MB_OK);
 goto ERROR_BAD_FORMAT1; // bad input file format
 }
 samplespersec = Fmt.wf.nSamplesPerSec;
 groupcount = 0;
 return TRUE;
 ERROR_BAD_FORMAT:
 MessageBox(NULL, "Input file must be a NYB1 file",
 "NYBRDR::mOpenRead", MB_ICONEXCLAMATION | MB_OK);
 ERROR_BAD_FORMAT1:
 mClose();
 return FALSE;
 }
inline BOOL NYBRDR::iReadNybble(BYTE& _nybble)

 {
 if (nyb2) _nybble = inbuf & 0xf;
 else {
 if (!iReadByte(inbuf)) return FALSE;
 _nybble = inbuf >> 4;
 }
 nyb2 = !nyb2;
 return TRUE;
 }
BOOL NYBRDR::mNextSample()
 {
 BYTE nybble;
 if (groupcount == 0)
 if (!iReadNybble(shrite)) return FALSE;
 if (shrite)
 { if (!iReadNybble(nybble)) return FALSE; }
 else nybble = 8; // we don't store empty groups.
 sample = nybble; iDecode(sample);
 groupcount++; if (groupcount == SAMPLESPERGROUP) groupcount = 0;
 return TRUE;
 }
void NYBRDR::mClose()
 { RIFFUSR::iCloseRead(); }
//*****************************************************************************
NYBWRT::NYBWRT() {
 nyb2w = FALSE; diff = 0;
 pgroup = groupbuf;
 diffmin = diffmax = prevsample = 32768;
 }
BOOL NYBWRT::mOpenWrite(char* _pszOutFile, const PCMWAVEFORMAT* _pFmt)
 {
 if (!RIFFUSR::iOpenWrite(_pszOutFile, (const char*)_pFmt,
 sizeof(PCMWAVEFORMAT), 'N', 'Y', 'B', '1'))
 return FALSE;
 groupcount = 0;
 return TRUE;
 cantwrite:
 mClose();
 return FALSE;
 }
void NYBWRT::mClose()
 { RIFFUSR::iCloseWrite(); }
inline BOOL NYBWRT::iWriteNybble(BYTE _nybble)
 { // write out to file:
 if (_nybble > 15) { MessageBox((HWND)NULL, "Bad nybble",
 "NYBWRT", MB_OK); return FALSE; }
 if (!nyb2w) outbuf = _nybble << 4;
 else {
 outbuf = _nybble;
 if (!iWriteByte(outbuf)) return FALSE; // to the file.
 }
 nyb2w = !nyb2w;
 return TRUE;
 }
BOOL NYBWRT::iNegFeedbackStage()
 { // encode and fight cumulative error
 // called once per sample.
 intsamp = sample;
 iEncode(intsamp);

 // fight cumulative error by applying a bias to intsamp
 if (diff < 0) { if (intsamp < 15) intsamp ++; }
 if (diff > 0) { if (intsamp > 0) intsamp --; }
 if (!iWriteNybble(intsamp)) return FALSE;
 // see what the output would be, and figure compensation...
 outsamp = intsamp;
 iDecode(outsamp);
 if (outsamp > sample) diff = 1; // overshot
 else if (outsamp < sample) diff = -1; // undershot
 else diff = 0;
 sample = outsamp; // in case caller is interested.
 return TRUE;
 }
BOOL NYBWRT::mWriteSample(WORD _sample, WAVEPLAYER* _pOut)
 { // takes a stream of samples, divides it into groups, and encodes each group.
 WORD diff1;
 WORD samp16 = _sample;
 WORD samp15 = samp16 >> 1;
 diff1 = samp16; Diff(diff1, prevsample);
 if (diff1 > diffmax) diffmax = diff1;
 if (diff1 < diffmin) diffmin = diff1;
 if (++groupcount >= SAMPLESPERGROUP) { // end-of-group processing...
 // first determine the magnitude of the group:
 WORD diffrange = diffmax-diffmin;
 if (diffrange < 0xF000) diffrange += diffrange>>4;
 if (diffrange == 0) shrite = 0; // signifies silence
 else { // shrite is bits delta shifted right
 shrite = 12;
 while (shrite>0 && !(diffrange&0x8000))
 { shrite--; diffrange <<= 1; }
 }
 iWriteNybble(shrite);
 if (shrite) { // process the group...
 pgroup = groupbuf;
 for (groupcount=0; groupcount<SAMPLESPERGROUP;
 groupcount++) { // 2nd pass.
 sample = *(pgroup++);
 if (_pOut) _pOut->mPlaySample(sample);
 if (!iNegFeedbackStage()) return FALSE;
 //D if (_pOut) _pOut->mPlaySample(sample);
 }
 }
 // reset for next group:
 diffmin = diffmax = 32768;
 groupcount = 0;
 pgroup = groupbuf;
 }
 *(pgroup++) = samp16;
 return TRUE;
 }

[LISTING FIVE]

//******************************** NYB1.CPP ***********************************
// Main app for NYB1 compressed waveform utility.
// Copyright (c) 1993 by Neil G. Rowland, Jr. 04-JUN-93
extern "C" {
 #include <math.h>
 #include <stdlib.h>

 #include <string.h>
 }
#include "nyblib.h"
#define IDM_ABOUT 11 // menu items
#define ID_INPUTFILEEDIT 101 // input file name edit box
#define ID_OUTPUTFILEEDIT 102 // output file name edit box
//*****************************************************************************
int PASCAL WinMain(HANDLE hInst, HANDLE hPrev, LPSTR lpszCmdLine, int iCmdShow);
BOOL FAR PASCAL AboutDlgProc(HWND hwnd, unsigned wMsg, WORD wParam, LONG lParam);
BOOL FAR PASCAL CvtDlgProc(HWND hwnd, unsigned wMsg, WORD wParam, LONG lParam);
//*****************************************************************************
char gszAppName[] = "Waver"; // for title bar, etc.
HANDLE ghInst; // app's instance handle
void PlayIt(char* _pszNybFile)
 {
 WAVEPLAYER Play; // audio output.
 PCMWAVEFORMAT pcmWaveFormat; // contents of 'fmt ' chunks
 NYBRDR NybIn; // for playback of nybble file
 if (!NybIn.mOpenRead(_pszNybFile)) {
 MessageBox(NULL, "Cannot read input NYB1 file", _pszNybFile,
 MB_ICONEXCLAMATION | MB_OK);
 return;
 }
 Play.mOpen(&NybIn.Fmt.wf);
 for (;;) {
 if (!NybIn.mNextSample()) break;
 Play.mPlaySample(NybIn.sample);
 }
 NybIn.mClose();
 Play.mClose();
 }
void DoIt(char* _pszInFile, char* _pszNybFile)
 {
 WAVRDR In;
 NYBWRT Out;
 WAVEPLAYER Play; // audio output.
 long lSamples; // number of samples to filter
 unsigned char cThis = 0;
 signed char olddelta = 0;
 BOOL lastwasendpoint = TRUE;
 if (_pszInFile && *_pszInFile) {
 // provide a default:
 if (!_pszNybFile || !lstrlen(_pszNybFile)) _pszNybFile = "\x.nyb";
 // Open the input file for reading using buffered I/O.
 if (!In.mOpenRead(_pszInFile))
 goto ERROR_CANNOT_READ;
 if (!Out.mOpenWrite(_pszNybFile, &In.Fmt))
 goto ERROR_CANNOT_WRITE;
 Play.mOpen(&In.Fmt.wf);
 for (lSamples = In.ck.cksize; lSamples > 0; lSamples--) {
 WORD sample;
 if (!In.mNextSample()) goto ERROR_CANNOT_READ;
 sample = In.sample; // raw input
 Out.mWriteSample(sample, &Play);
 }
 Out.mClose();
 Play.mClose();
 }
 // now play it back...

 PlayIt(_pszNybFile);
 goto EXIT_FUNCTION;
 ERROR_CANNOT_READ:
 MessageBox(NULL, "Cannot read input file",
 gszAppName, MB_ICONEXCLAMATION | MB_OK);
 goto EXIT_FUNCTION;
 ERROR_CANNOT_WRITE:
 MessageBox(NULL, "Cannot write output NYB1 file", gszAppName,
 MB_ICONEXCLAMATION | MB_OK);
 goto EXIT_FUNCTION;
 EXIT_FUNCTION:
 // Close the files (unless they weren't opened successfully).
 In.mClose();
 Out.mClose();
 Play.mClose();
 }

//*****************************************************************************
int PASCAL WinMain(HANDLE hInst, HANDLE hPrev, LPSTR lpszCmdLine, int iCmdShow)
 {
 FARPROC fpfn;
 HWND hwd;
 MSG msg;
 // Save instance handle for dialog boxes.
 ghInst = hInst;
 if (lpszCmdLine && *lpszCmdLine)
 DoIt(NULL, lpszCmdLine);
 // Display our dialog box.
 fpfn = MakeProcInstance((FARPROC) CvtDlgProc, ghInst);
 if (!fpfn) goto erret;
 hwd = CreateDialog(ghInst, "LOWPASSBOX", NULL, fpfn);
 if (!hwd) goto erret;
 ShowWindow(hwd, TRUE); UpdateWindow(hwd);
 while (GetMessage(&msg, NULL, 0, 0))
 if (!IsDialogMessage(hwd, &msg))
 DispatchMessage(&msg);
 DestroyWindow(hwd);
 FreeProcInstance(fpfn);
 return TRUE;
 erret:
 MessageBeep(0);
 return FALSE;
 }
// AboutDlgProc - Dialog procedure function for ABOUTBOX dialog box.
BOOL FAR PASCAL AboutDlgProc(HWND hWnd, unsigned wMsg,WORD wParam,LONG lParam)
 {
 switch (wMsg) {
 case WM_INITDIALOG:
 return TRUE;
 case WM_COMMAND:
 if (wParam == IDOK)
 EndDialog(hWnd, TRUE);
 break;
 }
 return FALSE;
 }
// CvtDlgProc - Dialog procedure function for conversion dialog box.
BOOL FAR PASCAL CvtDlgProc(HWND hWnd, unsigned wMsg, WORD wParam, LONG lParam)
 {

 FARPROC fpfn;
 HMENU hmenuSystem; // system menu
 HCURSOR ghcurSave; // previous cursor
 switch (wMsg) {
 case WM_INITDIALOG:
 // Append "About" menu item to system menu.
 hmenuSystem = GetSystemMenu(hWnd, FALSE);
 AppendMenu(hmenuSystem, MF_SEPARATOR, 0, NULL);
 AppendMenu(hmenuSystem, MF_STRING, IDM_ABOUT,
 "&About LowPass...");
 return TRUE;
 case WM_SYSCOMMAND:
 switch (wParam) {
 case IDM_ABOUT:
 // Display "About" dialog box.
 fpfn = MakeProcInstance((FARPROC) AboutDlgProc, ghInst);
 DialogBox(ghInst, "ABOUTBOX", hWnd, fpfn);
 FreeProcInstance(fpfn);
 break;
 }
 break;
 case WM_COMMAND:
 switch (wParam) {
 case IDOK: // "Begin"
 // Set "busy" cursor, filter input file, restore cursor.
 char szInFile[200]; // name of input file
 char szOutFile[200]; // name of output file
 // Read filenames from dialog box fields.
 szInFile[0] = 0;
 GetDlgItemText(hWnd, ID_INPUTFILEEDIT, szInFile,
 sizeof(szInFile));
 szOutFile[0] = 0;
 GetDlgItemText(hWnd, ID_OUTPUTFILEEDIT, szOutFile,
 sizeof(szOutFile));
 ghcurSave = SetCursor(LoadCursor(NULL, IDC_WAIT));
 DoIt(szInFile, szOutFile);
 SetCursor(ghcurSave);
 break;
 case IDCANCEL: // "Done"
 PostQuitMessage(0);
 break;
 }
 break;
 }
 return FALSE;
}
End Listings















Special Issue, 1994
Multimedia Audio Systems


Will General MIDI lead the way in interactive music?




John Ratcliff


John is a graphic artist, designer, and programmer living in St. Louis. His
entertainment products include 688 Attack Sub and Seawolf from Electronic Arts
and KaleidoSonics from Masque publishing. He can be contacted at 747 Napa
Lane, St. Charles, MO 63303.


If any single segment of the multimedia-hardware market has boomed over the
last two years, it's sound. Spurred on by the success of Creative Labs'
SoundBlaster and Windows 3.1, most vendors (hardware and software) are
supporting multimedia sound in one way or another. From Turtle Beach's
Multisound and Roland's Sound Canvas at the high-end, to the SoundBlaster and
MediaVision's Thunderboard at the low-end, users can take their pick,
depending on price, features, and performance.
What does all of this activity mean? For the user it means chaos and
confusion; for the hardware vendors, bitter rivalries and vicious competition.
Still, at the center of this is a fascination with producing music and sound
effects inside software, a captivation that goes beyond that of even hardcore
audiophiles. People are passionate about music, and thus the intense interest
in this burgeoning technology.
The sound industry is currently in a great state of flux. While most of the
two million or so installed sound cards use reliable, albeit basic, digital
sound and the familiar FM (kazoo-like) synthesis, the current debate involves
the type of audio device yet to come. What are the features of the
next-generation sound system? Who is going to provide these systems, in what
package, with what features, and at what price point? And the $64,000 question
is, "Who needs next-generation sound and why?"
As a longtime observer of and participant in this industry, I'll address some
of the questions about where we're heading. As for the "why" we're heading in
this direction, I'm not entirely clear. Multimedia technology is, to a great
extent, a solution without a problem. It's a collection of gee-whiz toys that
are fun, but not yet particularly useful. Certainly many aspects of multimedia
greatly enhance the enjoyment and appeal of applications software. Maybe
that's enough.
What most people accept as a definition of multimedia is what computer-game
developers have been trying to provide since Space War first popped up on a
university mainframe. Computer games are pushing the envelope and defining the
nature of how sound and music are used in an interactive environment. This is
evidenced by products such as Virgin/Trilobyte's The 7th Guest. This
interactive film on CD-ROM has an original score by the Fat Man (aka George
Sanger, an Austin, Texas-based composer of music for computer games like The
7th Guest and Wing Commander; for more on the Fat Man, see my article,
"Examining PC Audio," Dr. Dobb's Journal, March 1993) and a fully orchestrated
"Red Book," or CD-DA, audio sound track featuring vocals and music by three
Grammy winners. Another example is LucasArts' X-Wing, which uses the John
Williams score from the Star Wars film as arranged for General MIDI and
adapted for interactivity by Michael Land, Peter McConnel, and Clint Bajakian.
In addition to the Fat Man, other interactive-music composers include Rob
Wallace of Wallace Music and Sound (Glendale, Arizona), Donald Griffin of
Computer Music Consulting (San Francisco, California), and Bobby Prince
(Venice, Florida).
The way in which sound and music are used in an interactive environment is
critical to the success of multimedia. As the Fat Man said in the March 1993
issue of Dr. Dobb's Journal:
Sound and music can increase throughput, enhance the pleasantness of the
computer experience, and increase the entertainment value of a program.
When should music and sound be used? Not just in games.
Although computers and film are very different media, the relative maturity of
the latter makes it useful for developers to look at film as a model for the
future of some aspects of computer software. Especially with the importance of
multimedia, we can look for examples not only in feature films but in all
kinds of video applications: educational and industrial films, training,
advertising, and news programs--anything that's been on film or videotape.
When do they use music and sound? To enhance emotion where it already exists.
Some folks like happy faces or other graphics when their computers start up--a
happy tune can triple the happy effect.
Sound can also manipulate or change an emotion that already exists. If a
child-safety multimedia presentation shows a picture of a cute baby near a
swimming pool, something like the Jaws theme might keep the user from focusing
on the cuteness of the baby. A short, simple title tune might make a database
program seem less complex and frightening. In the case of the "bad news"
dialog box, consider how much more palatable a warning a pleasant "ping" is
than an explosive sound (or a picture of a bomb, for that matter). With audio,
the developer, like a film director, is able to control the degree of emotion
the user feels.
How should music and sound be used in computer programs? Like graphics, they
have to be done right. Filling up space with spectacular graphics done
by an "artist friend" (everybody's got one) is simply not the way to create an
effective program, and the same applies to music. There's as much an art to
using music and sound in software as there is in film. And of course it's too
big a subject to address here.
Regardless of the extent to which music and sound will be used in multimedia
applications, it's the computer-game industry that's on the leading edge,
helping define standards and pushing the limits of the technology. I've worked
with more than a hundred game publishers to incorporate music and sound in
over 200 commercial products. From this vantage point, it seems that "general
MIDI" is the best way to deal with interactive music in applications software.
The Musical Instrument Digital Interface (or "MIDI") specification is an
internationally supported de facto standard that defines a serial interface
for connections between music synthesizers, musical instruments, and
computers. MIDI, which is maintained by the MIDI Manufacturers Association
(Los Angeles, California), is based both on hardware (I/O channels, cables,
and the like) and software (encoded messages defining device, pitch, volume,
and so forth). Under the specification, the receiving device in a MIDI
system interprets the musical data, even though the sending device has no way
of knowing what the receiver can do. This becomes a problem when the
receiving device lacks the capability to interpret the data correctly.
General MIDI addresses this problem by identifying hardware capabilities in
advance.
All general-MIDI devices have 128 sounds, spanning musical instruments,
percussion, and sound effects. General-MIDI systems support simultaneous use
of all 16 MIDI channels with a minimum of 24 voices of polyphony, and they
have a specified set of musical controllers. This means that with general
MIDI, the sender knows
what to expect of the receiver. Consequently, a file created with one
general-MIDI device is recognizable when played on any other--without losing
notes or changing instrumental balance.
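The channel-message layout behind these guarantees is simple: a status byte with the high bit set (message type in the top nibble, channel in the bottom nibble), followed by data bytes in the 0-127 range. A minimal sketch in C (the function names are mine, not from any vendor SDK):

```c
#include <assert.h>
#include <stddef.h>

/* Build a 3-byte MIDI Note On message: status 0x9n (n = channel 0-15),
   then key number and velocity, each 0-127 (high bit clear). */
size_t midi_note_on(unsigned char *buf, int channel, int key, int velocity)
{
    buf[0] = (unsigned char)(0x90 | (channel & 0x0F));
    buf[1] = (unsigned char)(key & 0x7F);
    buf[2] = (unsigned char)(velocity & 0x7F);
    return 3;
}

/* Build a 2-byte Program Change: status 0xCn, then a general-MIDI
   program number 0-127 (for example, 0 selects the acoustic piano). */
size_t midi_program_change(unsigned char *buf, int channel, int program)
{
    buf[0] = (unsigned char)(0xC0 | (channel & 0x0F));
    buf[1] = (unsigned char)(program & 0x7F);
    return 2;
}
```

Because general MIDI pins down the program numbering and channel conventions (percussion lives on channel 10), the same two-byte Program Change selects the same kind of instrument on every conforming device.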
General MIDI, a platform-independent subset of MIDI that's being supported by
a wide range of hardware and software vendors, has been endorsed by more than
100 companies, including Microsoft, Apple, IBM, WordPerfect, and the like. The
specification is supported on Windows, DOS, Macintosh, Atari, and Amiga. In
fact, general MIDI is the authoring format for MIDI required by Windows and
MPC. Currently, the MIDI Manufacturers Association is defining general-MIDI
file-format standards.
Sound boards that provide general-MIDI support include MediaVision's Pro
AudioStudio 16 and Creative Labs' SoundBlaster 16 with general MIDI
wave-sample synthesis daughterboard. Furthermore, general-MIDI chipsets are
being developed by a number of companies. I'll touch on some of these boards
and chipsets later in this article.
From a practical viewpoint, general MIDI is widely acknowledged as being
defined by Roland Corporation's SoundCanvas (SC55 and SCC1). The Roland
SoundCanvas is a wave-table synthesis MIDI device that supports the
general-MIDI specification and implements chorus and reverb effects that give
the music a rich timbre. (Unlike FM synthesis, which begins with a generic
sound wave, wave-table synthesis uses recordings of actual instruments to
generate music.) The result is that when a user plays X-Wing with a Roland
SoundCanvas, and the familiar John Williams score kicks in, the music is
comparable to the actual film score. The Roland SoundCanvas and the
general-MIDI specification both support rich symphonic sound. Using wave-table
synthesis that incorporates extremely high-resolution, digitally sampled
instruments gives you rich string sections, unbelievable piano, and percussion
that rocks your socks off.
Another recent development that's helped bridge the transition to general MIDI is
Yamaha's license of the Fat Man's general-MIDI patch set for the OPL2 and OPL3
FM synthesizers--the same FM synthesizers found in sound cards from Adlib,
Creative Labs, MediaVision, and Microsoft. The Fat Man developed a set of
tones that allow fully orchestrated general-MIDI music to be played on sound
cards such as the SoundBlaster. These tones, available to all developers, are
being incorporated into computer games and Windows drivers. Though FM
synthesis chips like the OPL2 or OPL3 can never play music that sounds like a
wave-table device, their tones convey a general-MIDI score with a high level
of musicality.
While it's beyond the scope of this article to go into any detail about the
myriad of general-MIDI sound cards crowding the multimedia market, I'll
nonetheless briefly mention each. Table 1(a) provides further details. The
cream of the crop is the Roland SoundCanvas (SC55 and SCC1). Roland is now
shipping the RAP-10, a lower-cost version with digital sound support. Turtle
Beach has one of the most advanced multimedia sound cards, Multisound, which
not only supports general MIDI, but features a full Proteus 1XR and the
highest quality digital sound support. Turtle Beach has a new low-cost
general-MIDI synthesizer in development called the "Maui" card. Creative Labs'
entry into the general-MIDI market is their WaveBlaster card, a daughterboard
for the already-popular SoundBlaster 16.
Sierra Semiconductor has developed a general-MIDI hardware specification
called "Aria" which provides general MIDI, digital audio, and SoundBlaster
compatibility in a low-cost package. Sierra has also licensed Archer
Communications' QSound Virtual Audio technology for integration into the
DSP-based Aria chipsets. QSound Virtual Audio is a multidimensional
sound-localization technology that allows sounds to appear to be coming from
locations unreferenced to the speakers. Programmers can use QSound to generate
"soundscapes" that exceed the physical bounds of stereo-speaker geometry. The
Aria chipset consists of a controller, DSP, and memory. It has been adopted by
a number of vendors and is used in MidiMaestro from Computer Peripherals and
SonicSound from Diamond Computer Systems.
Ensoniq's entry into the general-MIDI market is their SQ1000 card, which also
features SoundBlaster-compatibility. The Media FX sound card from Video Seven,
also based on Ensoniq's Soundscape chipset, provides 32 voices in four Mbytes
of ROM. It records and plays back at 44.1 kHz. Advanced Gravis has added
general-MIDI support to their popular Gravis Ultrasound card. Yamaha is
developing a chipset called the "OPL4" that's likely to be adopted by a number
of vendors soon. The OPL4 features full general-MIDI support, as well as
backward compatibility with OPL2 and OPL3 FM synthesis. Yamaha has also
released its own sound card, the Hello!Music!, that's based on the chipset. At
the heart of the board, which is fully MIDI compatible, is the CBX-T3 Tone
Generator that provides 192 instrument sounds, 10 drum kits, and digital
reverb. The CBX-T3 can also accept audio input from audio devices such as
tape/radio players, CD players, or microphones. Logitech's SoundMan 16, which
uses MediaVision's Spectrum chipset, is a latecomer to the 16-bit sound-board
arena. The SoundBlaster-compatible card uses the Yamaha OPL3 synthesizer chip
to provide 20-voice MIDI support. As you can see, there are a lot of vendors
fighting over a market that has yet to even be defined.
How do you provide general-MIDI support on such a wide array of hardware
platforms? Under Windows, the multimedia interface will do a lot of the work,
but it's an incomplete solution. With Windows, you can ask to play a MIDI file
from disk, with no interactive control and the accompanying disk thrashing, or
you can feed it MIDI by hand. In other words, there's a high-level and
low-level interface, but no medium-level interface. And that's a problem.
Under DOS there's currently only one complete MIDI solution--the Audio
Interface Library from Miles Design. This is a set of medium-level interface
libraries for DOS real mode, DOS protected mode, and Windows. These drivers
are currently used by about 70 percent of the DOS-based game-development
community. Miles Design licenses these MIDI drivers to professional developers
and publishers with full source code disclosure and unlimited distribution
rights with a license fee. The VESA committee is currently working on a
standardized interface specification for wave audio and MIDI music. (For more
information, see "The VESA BIOS Extension/Audio Interface" on page 58 of this
issue.) Though VESA's efforts are noble, it will be some time before all of the
various vendors agree to and implement the new standard. Once this is
achieved, things will definitely improve. For the moment, however, there
appear to be just two de facto standards: SoundBlaster compatibility
(available through a license from Creative Labs) and general MIDI utilizing
the MPU401 UART in "dumb" mode (à la Roland MT32, LAPC, SC55, RAP-10, and so
on).
Adding music and sound to your application software can be both fun and
rewarding. It can also be frustrating. To make it more rewarding,
here's my advice: Make the investment in general MIDI and find a good
professional source for sound and music, either from an experienced
interactive-media composer, or from quality off-the-shelf music and sound
clips.
 Table 1(a): Audio Hardware Manufacturers and Specifications
 Table 1(b): Audio Hardware Manufacturers and Specifications
 Table 1(c): Audio Hardware Manufacturers and Specifications


For More Information


Activision
P.O. Box 67001
11440 San Vincent Blvd., #310
Los Angeles, CA 90049
310-207-4500
Adlib Corp.
20020 Grande Allee East, #850
Quebec City, PQ
Canada G1R 2J1
418-529-9676

Advanced Gravis
101-3750 N. Fraser Way
Burnaby, BC
Canada V5J 5E9
604-431-5020
Advanced Strategis Corp.
60 Cutter Mill Road, #502
Great Neck, NY 11021
516-482-0088
Artisoft Inc.
691 East River Rd.
Tucson, AZ 85704
800-846-9726
ASC Computer Systems
P.O. Box 566
26401 Harper Ave.
St. Clair Shores, MI 48080
313-882-1133
ATI Technologies Inc.
33 Commerce Valley Dr. East
Thornhill, ON
Canada L3T 7N6
416-882-2600
Covox Inc.
675 Conger St.
Eugene, OR 97402
503-342-1271
Creative Labs Inc.
1901 McCarthy Blvd.
Milpitas, CA 95035
408-426-6600
Computer Peripherals Inc.
67 Rancho Conejo
Newbury Park, CA 91320
800-854-7600
Diamond Computer Systems
1130 E. Arques Ave.
Sunnyvale, CA 94086
408-736-2000
DSP Solutions
2464 Embarcadero Way
Palo Alto, CA 94303
415-494-8086
Ensoniq
155 Great Valley Parkway
Malvern, PA 19355
215-647-3930
The Fat Man
7611 Shoal Creek Blvd.
Austin, TX 78757
512-454-5775
Logitech
6505 Kaiser Dr.
Fremont, CA 94555
510-795-8500
MediaVision
47300 Bayside Parkway
Fremont, CA 94538
510-770-8600

Miles Design Inc.
10926 Jollyville, #308
Austin, TX 78759
512-345-2642
Bobby Prince
P.O. Box 1436
Venice, FL 34284
813-484-4969
Roland Corp.
7200 Dominion Circle
Los Angeles, CA 90040-3647
213-685-5141
Computer Music Consulting
Donald S. Griffin
239 Richland Avenue
San Francisco, CA 94110
415-285-3852
Sequoia Systems Inc.
400 Nickerson Rd.
Marlboro, MA 01752
800-562-4593
Echo Speech Corp.
6460 Via Real
Carpinteria, CA 93013
805-684-4593
The Audio Solution
P.O. Box 11688
Clayton, MO 63105
314-567-0267
Turtle Beach Systems
52 Grumbacher Road, #6
York, PA 17402
717-767-0200
Voyetra Technologies
333 Fifth Ave.
Pelham, NY 10803
914-738-4500
Wallace Music & Sound Inc.
Rob Wallace, Executive Producer
6210 West Pershing Avenue
Glendale, Arizona 85304-1141
602-979-6201
Walt Disney Software
P.O. Box 290
Buffalo, NY 14207-0290
818-841-3326
Yamaha Corp. of America
Consumer Products Division
P.O. Box 6600
Buena Park, CA 90622-6600
714-522-9240











Special Issue, 1994
Inside OS/2 Software Motion Video


Using threads to synchronize audio and video data




Les Wilson


Les is a senior programmer in IBM's OS/2 Multimedia Software group. He was the
project leader of the team that invented and developed the software
motion-video support in OS/2 2.1. The synchronization algorithm described here
was invented by Steve Hancock and Bill Lawton. Les can be reached at IBM
Corp., 1000 NW 51st St., Boca Raton, FL 33431.


Until recently, digital video's huge demands for processing power and data
storage were major hurdles for PC developers. To a great extent, recent
advances in hardware, CD-ROM, and image-compression technologies have enabled
us to make gains in the race for realistic digital video. Fully synchronizing
audio and visual data, however, is one of the technical challenges yet to be
solved. You know the problem: "Out-of-sync" audio and video in foreign monster
films where English-speaking voices are dubbed over a non-English-speaking
actor's moving lips. For low-budget entertainment, we've been generally
tolerant of this lack of synchronization. However, system providers must
address the timing and synchronization problems to ensure the serious use and
acceptance of their systems.
IBM's Multimedia Presentation Manager/2, Apple's QuickTime, and Microsoft's
Video for Windows all provide users with the ability to create and manipulate
digital video and audio data. To do software motion-video playback, such a
system must first locate the data for presentation and transport it from its
current location to the playback system. Depending on the way the data was
created, the task of locating and transporting the data can be simple or
complex. In the simple case, the system opens the file and reads a buffer of
data. In the complex case, the system traverses data structures and retrieves
the required data using indirect pointers to other files.
Next, the system segregates and moves each type of data to the appropriate
processor for that data. Audio data goes to the audio subsystem, video goes to
the software decompressor, and so on. Synchronization of the data presentation
occurs at the target end of the pipeline; see Figure 1.


Types of Synchronization


There are two types of synchronization: free-running and monitored. The
free-running technique cues up the audio and video data at the target
processor, kicks them off, crosses its fingers, and hopes for the best.
Sometimes it works, sometimes it doesn't. Systems that use this technique
alone often exhibit inconsistent audio and video sync. This is especially
common when processing high-motion scenes in the video that cause the target
video processor to lag behind the audio. Other interference from device
contention can also cause a given target processor to lose synchronization.
Once a target processor is out of sync in a free-running system, nothing other
than chance will bring it back in sync.
Monitored systems add a policing activity to free-running target processors.
In these systems, target processors detect when they're out of sync and employ
appropriate techniques to resynchronize the processed data. Timing
compensation occurs whenever a target processor is either ahead of or behind
the desired location in the data. To compensate, target processors adjust
their processing speed depending on the complexity of the data, the processing
power available on the system, and interference from outside activities.
Monitored systems react to the complex interactions occurring in the system.
For audio/visual data, this type of system constantly resynchronizes what is
seen with what is heard. This doesn't help the foreign actor with the wrong
lip movements for the English-language sound track, but it does ensure that
the "thud" is heard when the monster hits the ground.


Using Interleaved Data


Both free-running and monitored systems are affected by how the data is
organized at the source. When audio and video data is evenly distributed, or
interleaved, it flows easily into the system with minimal overhead. That is,
as long as there's sufficient data for the target processors, the source data
can be read in a single sequential stream. Free-running systems work best when
audio and video data are interleaved. Interleaving is also very good when data
is on slow devices such as CD-ROM.
While interleaving can help synchronization, it shouldn't be mandatory. Some
file formats allow audio and video data to be "clumped" at the beginning or
the end of a file. Other file formats allow the data to be distributed in
other local and remote files. Either way, multimedia systems must ensure
adequate processing time is allocated to prefetch the required data to effect
on-time delivery to the target processors.
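The routing job described above can be illustrated with a toy demultiplexer. The chunk layout here (a one-byte type tag plus a one-byte payload length) is hypothetical, invented for this example; real file formats use richer headers, but the shape of the loop is the same: one sequential read, with each chunk handed to its target processor.

```c
#include <assert.h>
#include <stddef.h>

enum { CHUNK_AUDIO = 'A', CHUNK_VIDEO = 'V' };

typedef struct {
    size_t audio_bytes;   /* running total routed to the audio subsystem */
    size_t video_bytes;   /* running total routed to the decompressor    */
} StreamTargets;

/* Walk the interleaved buffer once, dispatching each chunk; returns
   the number of chunks routed, or -1 on a malformed stream. */
int demux(const unsigned char *buf, size_t len, StreamTargets *t)
{
    size_t pos = 0;
    int chunks = 0;
    while (pos + 2 <= len) {
        unsigned char type = buf[pos];
        size_t size = buf[pos + 1];
        if (pos + 2 + size > len)
            return -1;                 /* truncated chunk */
        if (type == CHUNK_AUDIO)
            t->audio_bytes += size;    /* hand to audio subsystem */
        else if (type == CHUNK_VIDEO)
            t->video_bytes += size;    /* hand to video decompressor */
        else
            return -1;                 /* unknown tag */
        pos += 2 + size;
        chunks++;
    }
    return chunks;
}
```

With well-interleaved data, this loop never has to seek, which is exactly why interleaving matters on slow devices such as CD-ROM.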


Using Multiple Threads for Synchronization


The OS/2 data-streaming model employs the concept of source and target stream
handlers. Chains of stream handlers process and move data at discrete points
in a stream of data. As the stream handler moves data from point to point, it
performs the required data-specific timing and processing operations. In this
way, the stream handler becomes a convenient place to encapsulate
data-dependent and timing-dependent operations.
OS/2's Multimedia Presentation Manager implements this streaming model with
several independently dispatched threads. Each stream handler is a thread that
controls the processing of data through a certain point in the stream.
Additionally, a centralized buffer-management and timing thread called the
"synchronization stream manager" (SSM) provides services to each stream
handler for handling buffers and monitoring its own processing against that of
other stream handlers. Together, these threads join to move, filter, and
present data with monitored synchronization. This is illustrated in the data
and control flows diagramed in Figure 2.
Here's what happens during playback of a software motion-video file:
1. The application initiates the playback operation using the media-control
interface (MCI) API. These operations identify the source file and the
operation(s) to be performed (play, pause, seek, and so on).
2. The system loads the movie and starts the play.
3. The digital-video media-control driver (MCD) uses the Multimedia I/O
services (MMIO) to find and open the file. The MCD is thus isolated from
whether the file is local or remote.
4. MMIO identifies the file format and uses one of its pluggable I/O
procedures (IOPROC) to handle any file-format dependent operations. This
allows support of additional file formats without modification of the MCD.
5. The IOPROC opens the file, identifies it, examines the contents, and
determines the type of video it contains.
6. The IOPROC loads and initializes the appropriate software decompressor.
7. The MCD initiates the required stream handlers, and allocates the buffer
management, timing services, and hardware resources required by the contents
of the file.
8. When ready, the multitrack stream handler (MTSH) reads the audio and video
data from the file.
9. The MTSH identifies and splits the data into its output streams.
10. The system cues the streams.
11. The MCD starts target stream handlers and controls the playback as
requested by the application.
12. As the video stream handler runs, it tells the pluggable video
decompressor to reconstruct and display the frame.
13. On systems that allow direct access to the display (such as OS/2), the
video decompressor reconstructs the image directly into its window. Otherwise,
the image is reconstructed in system memory and displayed by the video stream
handler.
As the system runs, it dispatches threads according to priorities and
scheduling algorithms. Each point in the stream performs its part of the
entire task to move data from the source to the target. As each stream handler
does its work, it records its progress so that the SSM can monitor the data
stream and provide synchronization services.
Depending on a hardware platform's display hardware, the time required to
display a frame can vary greatly. The less efficient the display subsystem,
the less processing power is available for other activities in the system. To
solve the problems of inefficient display subsystems, OS/2 2.1 allows its
video decompressors to bypass the graphics subsystem and access the display
adapter directly. When implemented by the adapter's OS/2 display driver, this
bypass dramatically improves the performance of software motion video. (See
the accompanying text box "About IBM's Ultimotion" for an example of the
performance levels achieved by this technology.) Given such varying
video-display performance, the synchronization system must be built to
compensate for the variable video-stream performance and still deliver
synchronized video and audio data.



Synchronizing Stream Handlers


One of a stream handler's responsibilities is to report its progress to the
SSM. In turn, the SSM monitors each stream and provides tolerance checks of a
given "slave" stream against a "master" stream. In OS/2, the video stream is a
slave stream and the audio stream is the master stream.
When there's sufficient processing power to handle a movie's frame rate, the
video stream handler displays a frame, calculates the time to the next frame,
and sets a timer for that duration. As long as the system and audio times
remain in tolerance, the system behaves like a free-running system. However,
this rarely lasts for long, and before you know it, the video output timing
needs adjustment. To achieve synchronization between the two
streams, the slave stream adjusts itself to the master stream. For video, this
adjustment is made in the calculation of when to display the next frame.
Figure 3 shows a flowchart of the algorithm. However, the details of the
algorithm may be better illustrated with an analogy. Consider two postal
workers with delivery routes containing the same number of mailboxes. Each
postal worker tries to deliver the mail to each box at the same time the other
worker delivers the mail to the corresponding box. Postal-worker A (audio)
delivers mail at a large apartment complex. The mailboxes have a central
location and an efficient and predictable means for mail delivery. Worker A
calls in his progress to the main post office (SSM) on a regular basis.
Postal-worker V (video) delivers mail in a nearby suburb. This route has rural
mailboxes, and the worker must drive from mailbox to mailbox. At the end of
each block, postal-worker V calls in (to SSM) and reports his current box
number.
In general, each stop along worker V's route is predictable. However, as in
real life, he may deliver mail too quickly and get ahead of worker A by the
end of a block. When worker V realizes he shot ahead or lagged behind worker
A, worker V adjusts his delivery rate so that he delivers to the next box at
the same time that worker A is expected to deliver to the corresponding box.
Unless something keeps worker A from his "appointed rounds," this behavior
ensures synchronization of V to A at the end of each block.
Conversely, if worker V lags behind worker A, worker V adjusts his rate of
delivery until he catches up. If the difference is small, worker V attempts to
catch up by eliminating any unneeded waiting at each box. If worker V is
already delivering at the fastest possible rate (that is, he has already
eliminated waiting between boxes) and chronically lags behind, a more drastic
change in delivery is required. In this analogy, we let postal worker V race
down the block and simply toss the mail out the window at each mailbox. Worse
yet, we may forget delivery altogether and just drive to the end of the block.
(For video, the actual technique used depends on the capabilities of the video
decompressor. In any case, the basic idea is to drop frames so that the video
correlates with the audio.)
Listing One, page 41, is a C implementation of this algorithm using the OS/2
system and synchronization stream manager (SSM) APIs. The routine that
calculates the next video-frame decompression time (that is, the next time for
worker V to deliver at a mailbox) is called CalcNextFrameIval. The input
parameters to this routine are pointers to instance structures. The pointer
psib points to the SSM timing information for this thread. The pointer pMovie
points to the video stream handler's instance data for the movie being played.
Local variables exist for calculating the various error values used by the
algorithm as well as flags used for controlling the path through the
algorithm.
First, the code gets the current time using the system timer. Based on the
movie's authored frame rate, the variable TimeNextFrame is incremented to
reflect when the next frame should be displayed. Using these two pieces of
information, the fVideoTooSlow Boolean and the VideoTimeError variables are
set. These reflect how far off, and in which direction (ahead or behind), the
video is relative to the system timer. Next, the algorithm tests if SSM is
reporting that the slave stream (video) is out of tolerance. When this test
fails, the algorithm drops directly into the last section of code. If the
video is not too slow, the thread sleeps until the calculated TimeNextFrame. If
the value of TimeNextFrame is less than the system-timer granularity, a quick
yield is done to let higher priority threads execute. This helps prevent the
file-system threads from starving.
Returning to where the SSM "out-of-tolerance" check is successful (video and
audio are out of tolerance), the algorithm goes on to set the error values and
synchronization flags.
Using the information about the relative position of the master (audio) and
slave (video) streams, the variable TimeNextFrame is calculated. It's
important to note that at this point, the algorithm works to force the video
in sync with the audio and not in sync with where it computes it "should" be.
The comments in the code detail the precise conditions tested and the way
TimeNextFrame is calculated. However, regardless of which stream lags behind
the other, the algorithm uses the same scheme. When video lags behind audio,
TimeNextFrame is set behind the current time. This has the effect of making
the video play faster. Conversely, when the audio lags behind video,
TimeNextFrame is set ahead of the current time, which has the effect of
slowing down the video. The amount by which TimeNextFrame is set ahead or
behind the current time is always calculated to be the difference between the
audio time and the system timer. In addition to forcing the video to
synchronize to the audio, this cross check also adjusts for possible
differences in the timing sources. (For example, SSM and the system timer use
different physical timers, and the two clocks drift.)
Once TimeNextFrame is recalculated, the algorithm drops into the same code
discussed earlier that takes action based on the local flags and error values.
If it isn't time to display the next frame, the thread sleeps for the
calculated interval. If the time is past, the algorithm stores this
information in the movie-instance structure for use by the frame-dropping
algorithm.
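A hedged, simplified version of this timing scheme follows; it is not the actual Listing One code, and the names (calc_next_frame, time_next_frame) and the millisecond timebase are mine. It captures the two paths: in tolerance, step by the authored frame period; out of tolerance, offset the next frame time from the current time by the audio-versus-system-timer difference, which lands the target behind "now" when video must speed up and ahead of it when video must slow down.

```c
#include <assert.h>

typedef struct {
    long time_next_frame;  /* when the next frame should be shown (ms) */
    long frame_period;     /* 1000 / authored frame rate (ms)          */
} VideoClock;

/* Called once per frame. now = system timer; audio_time = master
   stream position; in_tolerance = SSM's check of slave vs. master. */
void calc_next_frame(VideoClock *vc, long now, long audio_time,
                     int in_tolerance)
{
    if (in_tolerance) {
        /* Free-running: just step by the authored frame period. */
        vc->time_next_frame += vc->frame_period;
    } else {
        /* Resync to the master: the offset from the current time is
           the difference between the audio time and the system timer,
           which also absorbs drift between the two clocks. */
        vc->time_next_frame = now + (audio_time - now);
    }
}

/* Sleep budget until the next frame; a negative result means the
   frame is already late and frame dropping should be considered. */
long frame_wait(const VideoClock *vc, long now)
{
    return vc->time_next_frame - now;
}
```

In the real system the sleep is also clamped against the system-timer granularity (yielding instead of sleeping for very short waits), and a chronically negative frame_wait feeds the frame-dropping logic described next.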


Dropping Frames to Catch Up


At first glance, dropping frames to make video "catch up" seems simple and
straightforward. However, most software algorithms use temporal compression
and only store "delta frames"--those portions of a frame that have changed
since the last frame. If these frames are dropped, portions of the movie that
should have changed aren't changed from frame to frame and unpleasant visual
artifacts are displayed. To counter this, the compression algorithm frequently
compresses the entire frame and inserts it into the video stream. This
intraframe compressed frame is called an "I-frame." When displayed, I-frames
repaint the entire video frame and repair any artifacts.
Since frame dropping depends on the compression algorithm, the frame-dropping
logic in the stream handler defers the actual drop processing to the
decompressor. If the compressed data stream cannot tolerate frame dropping,
the decompressor simply ignores the information and the system continues as
best it can.
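One possible drop policy consistent with this description (a sketch, not IBM's actual decompressor logic) is to jump forward to the next I-frame: every delta frame in between is droppable, because the I-frame repaints the entire image and repairs any artifacts.

```c
#include <assert.h>

/* Given per-frame I-frame flags, pick where decoding should resume
   when the stream handler asks the decompressor to catch up. Returns
   the index of the next I-frame after `cur`, or `cur` itself if none
   lies ahead (in which case we must keep decoding every frame). */
int catch_up_target(const int *is_iframe, int nframes, int cur)
{
    int i;
    for (i = cur + 1; i < nframes; i++)
        if (is_iframe[i])
            return i;   /* safe resume point: full-frame repaint */
    return cur;         /* no I-frame ahead: cannot drop safely */
}
```

How often this helps depends on how frequently the compressor inserts I-frames; a stream with sparse I-frames gives the policy few safe places to land, which is one reason the drop decision belongs in the decompressor rather than the stream handler.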


Summary


A fully threaded system provides efficient tools to process multimedia data.
These tools effectively help address multimedia-related problems in areas such
as buffer management, data movement, data filtering, and synchronization. As a
platform for multimedia applications, OS/2 provides a rich set of
synchronization and video-output services. Through the use of threads,
system-timing services, and OS/2's multimedia synchronization services, the
synchronization algorithm presented here can provide an independent
video-timing mechanism for when an audio track is not present. It can also
correct for the following: clock drifts in system and audio timers,
dynamically and statically varying display subsystem efficiency, chronic
video-synchronization loss, and catastrophic audio failure. It is an example
of how multiple timing services can be combined to provide a solution to an
age-old problem.
 Figure 1: Typical data flow in multimedia systems.


Digital Video Compression


Digital-video compression is the act of taking a raw digitized image and
reducing the amount of storage required to represent the image. There are two
compression domains: spatial (or intraframe) compression, which tries to
eliminate redundancies within a given frame; and temporal compression, which
tries to eliminate redundancies over intervals of time (frames). An
algorithm's resulting compression ratio is determined by the degree to which
it is able to exploit redundancies and irrelevancies in each of these domains.
There are also two types of compression: lossy and lossless. As its name
implies, lossless video compression compacts the source data without losing
any of the information it contains. For video data, this means that the
compression retains all the image detail. When the image is decompressed, the
result is identical to the original. Lossless algorithms are well suited for
compressing computer-generated images and are commonly used for storing video
animations. To date, however, lossless video algorithms are still too
computationally complex for software playback and have limited applicability
when the objective is to achieve high compression and frame rates.
Lossy algorithms compact the source data by discarding information that
contributes little to the perceived image. Examination of a reconstructed frame
from any of today's software algorithms reveals the use of this technique. The
advantage of this technique is that a respectable representation of the
original is retained and it satisfies the requirement for high compression
ratios and low computational complexity.
When examining software-compression algorithms, one must consider several
characteristics; see Table 1. Because these characteristics are interrelated,
you should avoid stressing one at the expense of another. Consider the frame
size, which is determined at movie-creation time. Frame size is the width and
height (in pixels) that the compression algorithm stores in the movie. In a
video digitizer that uses 16-bit color (65,536 colors), the raw data for each
frame of a 320x240 movie is 153,600 bytes. At 15 frames per second, the raw
data is 15 times that, or 2,304,000 bytes. Assuming 22 Kbytes per second for
an audio track, a video-compression algorithm must achieve an average
compression ratio of 18:1 to play back at single-speed CD-ROM rates.
The frame rate, determined at movie-creation time, is usually expressed in
frames per second. The frame rate determines how smoothly the motion is
perceived by the viewer. Studies show that most people dislike frame rates
below 12 frames per second. Most movies made for software motion video are 15
frames per second and higher.
Data rate expresses how much bandwidth is required for a movie. A given
movie's frame size, frame rate, and compression ratio determine how much space
is required for a given interval of time. Usually expressed in bytes per
second, the data rate simply expresses the average number of bytes used for
one second of movie. A movie's data rate must not exceed the storage device's
data rate. If it does, the device won't deliver the data fast enough, and the
movie will break up.
A 16-bit color, 320x240 movie running at 15 frames per second takes 2,304,000
bytes (320x240x2x15) for one second of uncompressed (raw) data. Using a
compressor that averages 18:1 compression, one second of video data can be
reduced to 128,000 bytes. Adding a 22,000-byte audio track, the resulting
movie has a data rate of 150,000 bytes per second.
The compression algorithm determines a movie's computational complexity, which
is a measure of how much processing power is required to decompress and
display a movie's frames. The density at which an algorithm encodes the data,
the technique required to reconstruct each image, and the volume of data per
frame determine an algorithm's computational complexity. As an algorithm's
playback computational complexity increases, a given processor's ability to
decompress it is diminished. To bring it back in balance, the volume of data
must be reduced using a smaller frame size, lower frame rate, or both. In this
way, a given algorithm can be evaluated on the basis of how large a frame rate
and size can be achieved without compromising the appearance of the image.
Using the previous example, we can calculate that the average number of bytes
for each compressed frame is 10,000 (150,000 bytes/15 frames per second).
Therefore, the decompression algorithm has 1/15 of a second to read the 10,000
bytes, reconstruct the frame, and display it. The higher a movie's frame rate,
the less time there is to do this. The larger the frame size, the more pixels
have to be displayed. The more dense or complex the compressed data, the more
time is required to reconstruct the frame.
Compression complexity determines how an algorithm is used to make a movie. An
algorithm that takes longer to compress a frame than it does to decompress a
frame is called an "asymmetric," or "off-line" algorithm. Conversely, one that
takes equally long to compress as decompress is called a "symmetric," or
"real-time" algorithm.
Off-line algorithms first get raw data from a file or frame-stepped device and
then compress it. By their nature, these algorithms usually have the best
image quality and compression ratios. Real-time algorithms are useful on live
video sources that are not controlled one frame at a time. These algorithms
compress the data on the fly as it is digitized. The advantage of these
algorithms is that they eliminate the need for vast disk storage of raw data.
Real-time algorithms usually trade off frame size and compression to achieve
reasonable frame rates.
A movie's author determines frame rate, frame size, and data rate at
movie-creation time. If played on a system that lacks sufficient power to
handle the movie (for example, if the frame rate is too high), the scalability
of an algorithm determines what the playback system can do to compensate. The
degree to which a movie's characteristics (frame rate, size, resolution, or
color depth) can be scaled down at playback time is called its "playback
scalability." Table 2 shows how each characteristic of a movie can scale.
Image quality is largely a subjective measure of how well the algorithm
retains the details of a movie. As lossy algorithms, today's software
motion-video algorithms are perceived differently by different people.
However, in general, as image detail increases, one or more of the other
characteristics (such as computational complexity) are affected. Again, the
overall quality of an algorithm is a function of how well it balances all of
these characteristics.
Each video-compression algorithm tries to balance these characteristics to
deliver the best video possible in its environment. Which algorithm is best
for you depends on which characteristics and environments are most important
to you. To some degree, most algorithms let you trade off one characteristic
to improve another (for example, reduce the frame size so that you can
increase the frame or data rate). Ultimately, before drawing any conclusions,
make sure you see a good representation of movies for yourself.
--L.W.
 Figure 2: Architecture of OS/2 software motion-video playback.
Table 1: Characteristics that must be considered in digital-video compression.
Characteristic Description
Frame size Width and height (in pixels) of each frame.
Frame rate Number of frames over a certain interval (usually seconds).
Data rate Average number of bytes in a second of video.
Computational complexity Amount of processing power required to deliver the
video at its authored size and rate.
Compression complexity Amount of time required to compress a second of video
vs. its decompression time.
Playback scalability Degree to which video playback can be degraded when the
video is too complex for the system on which it's played.
Image quality How well the original detail of the frame is retained.
Table 2: Characteristics of a movie and how it can scale during playback.

Characteristic How it Scales
Frame Rate Frames can be dropped to keep up with audio track.
Frame Size The output window can be reduced so less processing is required.
Resolution Image detail can be skipped so that less processing is required.
Color Depth Movies with more colors than are available can be mapped to
displays with fewer colors.
Table 3: (a) Ultimotion's characteristics on a 150-Kbyte-per-second CD-ROM
drive for 320x240 frame size; (b) Ultimotion's characteristics on a
150-Kbyte-per-second CD-ROM drive for 640x480 frame size; (c) Ultimotion's
characteristics on a 300-Kbyte-per-second CD-ROM drive for 320x240 frame size.
Characteristic Value
(a)
Frame Size 320x240
Frame Rate 15 frames per second
Data Rate 150 Kbytes per second
Computational Complexity 25-MHz 386
Compression Complexity Both off-line (8 seconds per frame) and real-time
Scalability Scales from 65,535 to 16 colors; frame rate: up to the authored
rate; frame size: half, normal, and double size
(b)
Frame Size 640x480
Frame Rate 5 frames per second
Data Rate 150 Kbytes per second
Computational Complexity 25-MHz 486
Compression Complexity Off-line (5 seconds per frame)
Scalability Scales from 65,535 to 16 colors; frame rate: up to the authored
rate; frame size: half, normal, and double size
(c)
Frame Size 320x240
Frame Rate 30 frames per second
Data Rate 300 Kbytes per second
Computational Complexity 50-MHz 486DX
Compression Complexity Off-line (approximately 5 seconds per frame) and
real-time
Scalability Scales from 65,535 to 16 colors; frame rate: up to the authored
rate; frame size: half, normal, and double size


About IBM's Ultimotion


Ultimotion is a video-compression algorithm optimized for software playback on
a general-purpose microprocessor. From its inception, Ultimotion was designed
to break through the "small-video-window" barrier (160x120 pixel window size)
and deliver video at four times that size. The resulting algorithm delivers
320x240 movies playable from 150-Kbyte-per-second CD-ROM drives. Typical
Ultimotion movies need only a 25-MHz 386 processor and an SVGA or XGA display
adapter.
One of the factors that help achieve these levels of performance is that OS/2
uses a direct video-access technique now beginning to emerge in other systems.
When supported by a display adapter's device driver, the video decompressor is
given direct access to the display adapter. The system automatically detects
the presence of the support and uses it without the knowledge of the
decompressor. Using this high-speed access, 486-based machines are able to
play Ultimotion's larger 320x240 movies at 30 frames per second.
Ultimotion is a software-only, video-compression algorithm that averages 18:1
compression. It uses both spatial and temporal compression and was designed to
run on processors as low as a 25-MHz 386. On a single-spin CD-ROM drive (150
Kbytes per second), it exhibits the characteristics summarized in Table 3(a).
Table 3(c) shows the characteristics on a double-spin CD-ROM drive (300 Kbytes
per second).
Ultimotion delivers a respectable frame size and frame rate at relatively low
data rates and processing power. Its organization of the compressed data
enables efficient output display, clipping, doubling, and halving of the
movie's frame size. Ultimotion movies can be created using either off-line or
real-time algorithms across a wide range of frame sizes and frame rates.
--L.W.
 Figure 3: Logic flow of audio/video synchronization algorithm.
[LISTING ONE] (Text begins on page 34.)

RC CalcNewFrameIval ( PSIB psib,
 PMOVIE_STR pMovie )
{
 LONG AudioSynchError;
 LONG VideoTimeError;
 LONG lmsTimeError;
 BOOL fVideoTooSlow = FALSE;
 BOOL fVideoBehindAudio = FALSE;
 BOOL fSynchPulse = psib->syncEvcb.ulSyncFlags & (SYNCPOLLING | SYNCOVERRUN);
 ULONG CurrentTime;
 MMTIME mmtimeMaster = psib->syncEvcb.mmtimeMaster;

 // get current time
 DosQuerySysInfo (QSV_MS_COUNT,QSV_MS_COUNT,&CurrentTime,sizeof(ULONG));

 // Update frame time
 pMovie->TimeNextFrame += pMovie->FrameInterval;



 //*********************************************
 // Determine if the video is ahead or behind
 // the frame rate specified for this stream.
 //*********************************************
 if (CurrentTime <= pMovie->TimeNextFrame) {

 //*****************************************************
 // Video is ahead according to system clock
 // Compute Video Error
 //*****************************************************
 VideoTimeError = (pMovie->TimeNextFrame - CurrentTime);
 fVideoTooSlow = FALSE;

 } else {
 //***************************************************
 // Video is behind according to system clock
 // Compute Video Error
 //***************************************************
 VideoTimeError = (CurrentTime - pMovie->TimeNextFrame);
 fVideoTooSlow = TRUE;

 } /* endif - who is ahead? */


 //*************************************
 // Is SSM reporting "Out of Tolerance"
 // Check for a Synch Pulse
 //*************************************
 if (psib->fSyncFlag == SYNC_ENABLED) {

 if (fSynchPulse && mmtimeMaster) {
 pMovie->ulSynchPulseCount++; /* Accumulate count of sync pulses */


 /***********************************************************/
 /* Is SSM reporting Video Behind Audio?                    */
 /***********************************************************/
 if (mmtimeMaster >
 (psib->syncEvcb.mmtimeStart + psib->syncEvcb.mmtimeSlave))
 { // Video is behind audio


 fVideoBehindAudio = TRUE;

 // Compute Audio error
 AudioSynchError = mmtimeMaster -
 (psib->syncEvcb.mmtimeStart + psib->syncEvcb.mmtimeSlave);

 // Is Video behind according to system clock?
 //
 if (fVideoTooSlow) {
 if ( AudioSynchError > VideoTimeError ) {
 // Set the next frame time behind the current time
 // so that the delta to the cur time = SSM Audio Error
 // This will cause the video to speed up

 pMovie->TimeNextFrame -=
 ( AudioSynchError - VideoTimeError );



 } else {
 // Set the next frame time ahead of the current time
 // so that the delta to the cur time = SSM Audio Error
 // This will cause the video to slow down
 pMovie->TimeNextFrame +=
 ( VideoTimeError - AudioSynchError );

 }

 } else { // Video OK by System Clock but SSM reports otherwise
 // Set the next frame time behind the current time so
 // that the delta to the current time = SSM Audio Error
 // This will cause the video to speed up
 pMovie->TimeNextFrame -= (AudioSynchError + VideoTimeError);

 } /* endif */

 } else { // SSM reports Video ahead of Audio

 fVideoBehindAudio = FALSE;
 AudioSynchError = (psib->syncEvcb.mmtimeStart +
 psib->syncEvcb.mmtimeSlave) -
 mmtimeMaster;

 // Is video behind according to system clock?
 if (fVideoTooSlow) {
 //*********************************************************
 // Video is behind according to system time, but video is
 // running ahead of the audio (the audio must have started
 // late or somehow broken up, fallen behind and can't get up
 //*********************************************************
 // Set the next frame time ahead of the current time
 // so that the delta to the cur time = SSM Audio Error
 // This will cause the video to slow down
 pMovie->TimeNextFrame += AudioSynchError + VideoTimeError;

 } else { // Video ahead according to system clock and SSM
 //*********************************************************
 // Video is keeping up or is ahead according to system time
 // AND video is running ahead of audio.
 //*********************************************************

 if ( AudioSynchError > VideoTimeError ) {
 // Video is further ahead than system clock indicated
 // Set the next frame time ahead of the current time
 // so that the delta to the cur time = SSM Audio Error
 // This will cause the video to slow down
 pMovie->TimeNextFrame += AudioSynchError-
 VideoTimeError;
 } else {
 // Video not as far ahead as system clock indicated
 // Set the next frame time behind the current time
 // so that the delta to the cur time = SSM Audio Error
 // This will cause the video to speed up
 pMovie->TimeNextFrame -= VideoTimeError-
 AudioSynchError;
 }



 } /* endif */

 } /* endif */

 psib->syncEvcb.ulSyncFlags = 0;

 //*************************************************************
 // Recompute video time error based on updated TimeNextFrame
 //*************************************************************
 if (CurrentTime <= pMovie->TimeNextFrame) {
 VideoTimeError = (pMovie->TimeNextFrame -
 CurrentTime);
 fVideoTooSlow = FALSE;

 } else {
 VideoTimeError = (CurrentTime -
 pMovie->TimeNextFrame);
 fVideoTooSlow = TRUE;

 } /* endif - time exceeded. */

 } /* endif - SSM reporting "Out of Tolerance" */
 } /* endif - Listening to SSM */

 /************************************/
 /* Take action based on whether the */
 /* video is running ahead or behind */
 /************************************/
 if (!fVideoTooSlow) {

 lmsTimeError = VideoTimeError / 3; //Convert error to milliseconds
 if (lmsTimeError > 32L) {
 // Block till next time to display a frame
 DosSleep(lmsTimeError);
 pMovie->ulLastBlockTime = CurrentTime;
 } else {
 // Too close to next frame time for system clock to be used
 // Be good and yield to higher priority thread if one around
 DosSleep(0);
 pMovie->ulLastBlockTime = CurrentTime;
 }

 } else {

 /********************************/
 /* Drop some frames if behind! */
 /********************************/
 :
 :
 } /* endif - video too slow */

 // Update Frame Count
 pMovie->ulFrameNumber++;


 return (NO_ERROR);
}

End Listing





Special Issue, 1994
Programming the QUANTUMdsp


Downloadable microcode makes softcoding a reality




Charles A. Mirho


Charles is a consultant specializing in multimedia and telephony. He can be
reached on CompuServe at 70563,2671.


Many conventional multimedia boards are hard-function in that their
capabilities are defined by ROM chips (or ROM embedded in microcontrollers)
planted on the board at the factory. The problem with the hard-function
approach is that technology, particularly in the areas of audio compression
and telecommunications, marches ruthlessly on. How state-of-the-art is that
8-bit SoundBlaster board you purchased two years ago, or that three-year-old
modem? New compressions offering improved quality and higher ratios are
constantly being invented. Standards evolve. Modem and fax technology advances
toward higher bit rates. As a result, hard-function hardware becomes less than
state-of-the-art within months of its manufacture. The QUANTUMdsp board from
Communication Automation & Control offers a solution to this dilemma. The
microcode that other multimedia and telephone boards hardcode into ROM is
encapsulated into disk files. These disk files are downloaded to the board's
local RAM as needed. The QUANTUMdsp's capability at any given time is defined
by the contents of this RAM. Therefore, the board can be decoding a JPEG image
while simultaneously playing MPEG-coded audio and rotating multiple 3-D
wireframes. Or it can be synthesizing MIDI while answering the telephone,
simply by replacing the contents of RAM with a new set of functions. The great
advantage of this approach is that pieces of the microcode can be combined and
configured on the fly. What's more, updating the microcode disk files updates
the board's capabilities. For instance, the initial beta version I had
contained a disk file with microcode to emulate a 9600-bps modem. Within
weeks I received a floppy disk with new microcode to emulate a 14.4-Kbps
modem. I copied the new file over the old one, and I had a working 14.4-Kbps
modem running from the Windows Terminal program.
The QUANTUMdsp board supports advanced audio-compression formats such as MPEG,
G.722, G.728, and subband coding, as well as the better-known µ-law, A-law,
and ADPCM compressions. The board includes a general-MIDI synthesizer,
baseline 24-bit JPEG decoder, 14.4-Kbps analog modem, Class I fax,
speaker-independent voice recognition, and a fast, graceful means of rotating
3-D wireframes. And, as just mentioned, many of these are available
simultaneously.


Sound Familiar?


Historically, this isn't a new approach. Desktop printers followed a similar
evolution during their early stages. Over the last 15 years, printers evolved
away from hard function. Early printers had only a built-in set of fonts
(still true of low-end printers). When an application used the printer, it was
stuck with these internal fonts. If, later on, the application required better
fonts, the user had to purchase a new printer, or possibly upgrade the
printer's ROM chip to include the new fonts. Then laser printers came along,
and everything changed. If an application required a font not included in the
printer's internal set, it simply downloaded a suitable "soft font" to the
printer. This flexibility was a strong influence in the rise of desktop
publishing.


How it Works


A library of microcode files (the manual calls them "modules") is stored in a
separate directory. These modules can be combined sequentially using a simple
scripting language. Example 1, for instance, shows a script file for playing
linear, 16-bit audio at 8000 samples per second. Linear data exists in the
raw, 16-bit native format of the A/D converter that sampled it. No special
processing is required to play linear data; it's moved directly from the
source to the speaker. This is rarely the case--audio data is usually coded,
which means each sample has been compressed. (Without compression, audio files
can consume 2 to 50 times more space on a hard drive.) The script in Example 1
offers some insight into the board's inner workings. The first two lines
declare a flow-control buffer. A Windows application can get a pointer to this
buffer and fill it with audio data from a disk file. (We will see how this is
done in a moment.) The flow-control buffer is needed for two reasons. First,
the Windows application runs on the PC (Intel) processor under cooperative
multitasking, which means its behavior is nondeterministic. That is, it is
subject to unpredictable delays. The module (fta in the script) is responsible
for feeding data to the speaker at exactly 8000 samples per second. That means
having a steady supply of audio data on hand at all times; delays are
unacceptable, unless you enjoy gaps and hiccups in your audio. The module
cannot be left to the mercy of the Windows program, since the Windows program
is at the mercy of other Windows programs, ISRs, the Windows memory manager,
and so forth.
The second reason for the flow-control buffer is a bit more complex and
involves signal theory. Essentially, the problem is that it is most efficient
for programs to read large chunks of data at a time from disk files. The
Windows program, for the sake of efficiency, would like to read between 4
Kbytes and 32 Kbytes at a time from the audio file. The DSP module, on the
other hand, must execute in real time. (The playing of audio is inherently
real-time in nature, since we hear sound in real time.) For this reason the
DSP module can only deal with tiny chunks of data at any one time (typically
between 50 and 3000 bytes). The flow-control buffer lets both the Windows
application and the DSP module have it their way--the app feeds large chunks
of data into one end of the buffer at a rate it is comfortable with, while the
DSP module takes data in small blocks from the other end at its own rate. For
this to work, the buffer needs two ends: a head and a tail. Thus the
flow-control buffer is implemented as a FIFO data queue.
The FIFO is a circular buffer with associated read and write pointers. The
read pointer marks the spot in the FIFO where the next read operation will
find data. The write pointer marks the spot where the next write operation
will put data. In Figure 1, the area indicated in light green contains data.
Reading this data increments the read pointer. Writing more data increments
the write pointer. The FIFO is full when the write pointer is one position to
the left of the read pointer. The FIFO is empty when the read and write
pointers are equal.
The third line in the script declares a channel--a connection to an external
device (such as a speaker, microphone, or telephone line). Channels have two
other characteristics besides the device type: the sampling rate and the
device number. The sampling rate is simply the number of samples per second
that will pass through the channel. The device number specifies which device
of the specified type you wish to connect to. If two speakers are connected to
the board, you could direct the audio to the first speaker by setting the
device number to 1. A channel declaration of the form
devicetype_samplerate_devicenumber defines all of these characteristics.
Therefore, AudOut_8_1 in Example 1 specifies an audio-output device (a
speaker) carrying 8000 samples per second, connected to the first speaker
(there may be more than one).
Finally, the module itself is declared. The module fta is nothing more than a
piece of DSP microcode. The definition of modules in the script is object
oriented; each module is treated as a "black box" with inputs, outputs, and
possibly, controls. The fta module is simply designed to take "bites" of its
input and move them to its output. The input of fta is connected to the flow
control buffer, and the output is connected to the channel. fta bites off 80
samples at a time from the flow-control buffer and moves them out over the
channel where they reach the speaker to be perceived as sound. The DSP always
reads 100 blocks of data per second for a total throughput of 8000 samples per
second.
Taken in its entirety, the script defines the flow of audio data from the
Windows application to the first speaker connected to the board. Graphically,
the data flow looks like Figure 2.
The module library is full of simple, useful modules like fta. The modules can
be combined sequentially in a script to form more-complex multimedia
functions. However, such examples are beyond the scope of this article.


Change is Easy


You may be wondering if all of this scripting is worth the trouble. After all,
playing 16-bit uncompressed audio at 8000 samples per second is hardly a
monumental feat. Example 2, however, shows how simple it is to modify the
script for playing audio at 44,100 samples per second. The only difference is
in the channel declaration, AudOut_44_1. The number 44 is shorthand for
44,100, just as 8 is shorthand for 8000.
Suppose you wanted to record audio at 44,100 samples per second instead of
playing it. You simply modify the script to reverse the flow of data; see
Example 3. There are two changes to the script in Example 3. The first is the
channel declaration, which changes the device type from AudOut (specifying a
speaker) to AudIn (specifying a microphone). The second difference is that the
fta module has been replaced with the module atf, which works like fta but in
reverse; it accepts blocks of data from the channel and moves them to the
flow-control buffer. Graphically, the flow of data looks like Figure 3. To add
or remove compression from the audio stream, a coder/decoder module could be
added in series with either the atf or fta module. The module library is full
of useful coders and decoders.


Some C Required


Returning to the example of playing 16-bit, linear audio at 8000 samples per
second, on the Windows side, a C program is required to read the data file and
move the audio data into the flow-control buffer so that it can be sent out
over the channel. Listing One (page 46) shows the complete program. The program
has the structure of a DOS program and is compiled and linked for QuickWin.
This isn't a requirement, but I use it for the benefit of DOS programmers
making the transition to Windows. QuickWin programs are much easier to follow
than conventional Windows programs, while still illustrating all the important
concepts.
The program begins with a list of necessary header files. In addition to the
standard C header files, the file vclib.h is included. This header defines the
API to the multimedia board. You can easily spot board-specific functions in
the example; they are all prefixed by the letters vc.
The first call is to function vcAddTask, which loads the script file. The
microcode from any disk modules in the script are downloaded to the board at
this point. The 40,000-byte flow-control buffer defined in the script is also
allocated. The first parameter to vcAddTask is the DSP number on which the
script will execute. Each board contains a single floating-point 32-bit DSP
running at 55 MHz. This should be enough for all but the most intensive
applications. If you need more power, however, up to four boards can be added
in a single machine. The second parameter is the name of the script file to
load. The function returns a handle which will be used to identify this script
in future function calls. (A single program can load many scripts
simultaneously.)
At this point the script is loaded and idle. The example calls the functions
vcGetFifoHandle and vcInitFifo. The function vcGetFifoHandle returns a handle
to the FIFO buffer, just as the Windows memory-allocation functions return
handles to memory blocks. This handle will be dereferenced shortly when it is
time to move data from the file into the FIFO. The call to vcInitFifo simply
sets the FIFO read pointer equal to the value of the FIFO write pointer,
indicating that no data is available in the FIFO (the empty state).
Next the data file is opened. This data file contains audio data sampled at
8000 samples per second. After opening the data file, Listing One calls the
function DiskToFifo. This isn't a board function but rather a local function
for moving data from the audio file to the FIFO. We will see how the function
does this in a moment; for now, suffice it to say that when the function
returns, the FIFO is loaded with data and ready to go. The script is loaded
and sitting idle with data in the buffer. All that remains is to call
vcStartTask to get things started. vcStartTask takes a single parameter, the
script handle returned by vcAddTask.


Inside DiskToFifo


Function DiskToFifo moves data from the audio file to the FIFO. It calls
vcGetFifoWritePtr, which dereferences the FIFO handle returned by the earlier
call to vcGetFifoHandle and returns both a pointer to the FIFO's current
write position and the number of bytes available for writing. The write count
is returned in the variable lWriteCount and the write pointer in the variable
lpWrite. The next line of code limits
the number of bytes to move to less than 32 Kbytes. This isn't a limitation of
the board software but rather a choice in this example to avoid huge data
moves that can cause problems in the Intel segmented architecture.

Data is read directly from the data file into the FIFO using the standard C
language read function. Since data is read directly into the FIFO this way,
the write count returned by vcGetFifoWritePtr must be the number of
consecutive bytes available for writing in the FIFO (remember, FIFOs are
circular buffers). Obviously, a function like read is "unaware" of the
circular nature of the FIFO buffer, and so it is up to the program to supply
the number of consecutive bytes.
If an end-of-file condition is reached, indicating that all the data in the
file has been moved to the FIFO, the read count is adjusted using the formula
lReadCount &= ~0x3L;
which rounds down to the nearest multiple of four bytes. This is necessary
because the last read may have reached end-of-file, so the actual read count
is something less than the number of bytes requested. For example, if 0x7fff
bytes were requested but only 1001 bytes remained in the file, then the read
count will be 1001. This number must be rounded down to the nearest multiple
of four bytes: 1001 & ~0x3L becomes 1000. (An annoying anomaly of the
FIFOs--they must be a multiple of four bytes in size and must be read and
written in multiples of four bytes.) Unfortunately, this discards the last
byte in the file, but that's not usually a problem when playing audio files.
If it is essential to preserve the last byte, pad bytes can be added, making
the total read a multiple of four.
Even after the entire file is read into the FIFO, it isn't yet safe to shut
down and exit the program. The entire file has not been played until the FIFO
is empty. The flag donewriting is set to indicate that the file has been read
into the FIFO and we are now waiting for the FIFO to play out. An if statement
at the top of the function checks if the FIFO has played out by calling the
function vcGetFifoReadCount. The FIFO read count will be 0 when the FIFO is
empty, indicating that all data has been played.
The last statement in DiskToFifo is a call to the function
vcUpdateFifoWritePtr. This function updates the read and write pointers for
the FIFO. The FIFO pointers are not updated automatically when the FIFO is
read or written, because there is no way for the board software to know how
many bytes the standard C read function moved into the FIFO.
The main function calls DiskToFifo from a tight loop until the entire file has
been played. When DiskToFifo returns 0, indicating that all data has been
played, main calls vcDeleteTask with the handle to the script file. This
unloads the script and frees the memory allocated for the FIFO.


Support for Standards


Developers who prefer standards will be happy to know that the board supports
all MPC functions for the playing and recording of wave audio, as well as for
MIDI (synthesis only). Technically, the QUANTUMdsp is not MPC compliant
because it does not include a MIDI port. MIDI output messages are directed to
the board's general-MIDI synthesizer (also a downloadable microcode file). I
tested the board with both Sound Recorder and Media Player and both worked
well. Table 1 shows the linear wave-audio formats supported by the board.
Most of the audio formats can mix with one another in any combination. That
means two or more audio files of different sampling rates and sample size can
play simultaneously. Try that with a Soundblaster! Sample rate-converter
modules in the microcode library allow two or more audio streams with
different underlying sampling rates to mix on the same output channel. (The
sample rate-converter modules are inserted automatically when the board
software detects a format conflict.) Supported audio coder/decoders are G.722,
G.728, MPEG Layer Two, ADPCM, µ-law, A-law, and subband.
Analog modem and fax (Class I) capabilities are available by replacing the
standard Windows COMM driver with the board's COMM driver. Once the driver is
installed, the Windows Terminal program can be used to dial from 14.4 kbps
right on down to 300 bps. Any Windows communication program that uses the
standard COMM driver will work as well. Fax programs which use the Windows
COMM driver (such as BitFax from BIT Software) also work at 9600 baud. Demo
programs for 3-D wireframes, 24-bit (16-million-color) JPEG still-image
decompression, and an audio jukebox are included.


Conclusion


While the microcode library supports an impressive set of audio functions,
there's relatively little in the way of image compressions (only JPEG). The
lack of MIDI and joystick ports is certainly a drawback for the
music-composition and game markets. Motion video isn't supported, but a video
daughterboard is planned for the future.
Still, the high-powered set of audio compressions is well suited to voice
mail and teleconferencing applications. The fast JPEG decoder and MPEG audio
compression should be very useful in top-end presentation packages. The
telephone and fax features rival similarly priced, dedicated communication
boards. But probably the most important feature of the board is the fact that
the buyer is not committing to today's multimedia standards. As standards
evolve, as compression formats improve, and as modem bit rates move up, you can
expect updates in the form of disk files (courtesy of AT&T and third parties).
This should double or possibly triple the useful life of the board over
more-conventional hard-function approaches.
Example 1: Script to play linear, 16-bit audio at 8000 samples per second.
FifoSize: 40000 /* size of flow-control buffer */
Local: FlowBuffer FifoSize /* declare flow control buffer */
Extern: AudOut_8_1 /* output channel */
fta( ILevel)
{
 fin FlowBuffer
 aout AudOut_8_1
}

 Figure 1: The flow-control buffer.

 Figure 2: Flow of audio data from a Windows application to the first speaker.
Example 2: Modifying the script in Example 1 to play linear, 16-bit audio at
44,100 samples per second.
FifoSize: 40000 /* size of flow-control buffer */
Local: FlowBuffer FifoSize /* declare flow control buffer */
Extern: AudOut_44_1 /* output channel */
fta( ILevel)
{
 fin FlowBuffer
 aout AudOut_44_1
}


Example 3: Script to record linear, 16-bit audio at 44,100 samples per second.
FifoSize: 40000 /* size of flow-control buffer */
Local: FlowBuffer FifoSize /* declare flow control buffer */
Extern: AudIn_44_1 /* input channel */
atf( ILevel)
{
 ain AudIn_44_1
 fout FlowBuffer

}


 Figure 3: Flow of data when recording audio at 44,100 samples per second.
Table 1: Linear wave audio formats supported (almost all formats can mix).

                   PLAY                        RECORD
Sample       Mono        Stereo          Mono        Stereo
Rate       8-bit 16-bit 8-bit 16-bit   8-bit 16-bit 8-bit 16-bit
 8000 x x x x x x x x
11025 x x x x x x x x
16000 x x x x x x x x
22050 x x x x x x x x
24000 x x x x x x x x
32000 x x x x x x x x
44100 x x x x x x x x
48000 x x x x x x x x
For More Information

Communication Automation & Control
1642 Union Blvd., Suite 200
Allentown, PA 18103
800-367-6735


[LISTING ONE] (Text begins on page 42.)

/* QuickWin Audio Player */
#include <stdlib.h>
#include <stdio.h>
#include <conio.h>
#include <io.h>
#include <errno.h>
#include <sys\types.h>
#include <sys\stat.h>
#include <fcntl.h>
#include <vclib.h> /* VCAS function prototypes */

int DiskToFifo(long hf, int fd);
static int fd, tidPLAY, tidADA;

main()
{
static long hf;
long hparam;

 if( vcAddTask( 1, "PLAY",&tidPLAY) < 0) /* load script */
 return -1;
 if( vcGetFifoHandle( tidPLAY, "FlowBuffer", &hf) < 0)
 /* get handle to flow control buffer */
 return -1;
 if( vcInitFifo( hf) < 0) /* initialize (zero out) flow control buffer*/
 return -1;
 fd=open( "HELLO.L8", O_BINARY | O_RDONLY); /* open the audio data file */
 if(fd == -1) printf("Cannot open HELLO.L8\n");

 printf ("Hit return...\n");
 getchar();

 DiskToFifo( hf, fd); /* put some data in the FIFO */
 if( vcStartTask( tidPLAY) < 0) /* start playing the audio file */
 return -1;
 while(DiskToFifo(hf,fd)==1); /* play until done */
 vcDeleteTask (tidPLAY);
 close(fd); /* close input file */
}


int DiskToFifo(long hf, int fd)
{
long lReadCount, lWriteCount, *lpWrite, li;
static int donewriting = 0;

 if (donewriting)
 {
 vcGetFifoReadCount( hf, &li); /* if DSP has emptied the FIFO... */
 if(li==0L) return(0); /* then quit */
 return 1;
 }
 printf("."); /* print something to indicate activity */
 vcGetFifoWritePtr( hf, &lWriteCount, &lpWrite);
 /* get FIFO write pointer */
 if(lWriteCount > 0x7fff) lWriteCount=0x7fff;
 /* limit data moves to 32K */
 /* read the disk directly into the FIFO */
 lReadCount= read( fd, (void*)lpWrite, (unsigned int)lWriteCount);
 if(lReadCount < lWriteCount) /* if disk is getting empty... */
 {
 lReadCount &= ~0x3L; /* ensure 32-bit transfer */
 donewriting = 1;
 }
 vcUpdateFifoWritePtr( hf, lReadCount); /* update FIFO indices */
 return(1);
}
End Listing





Special Issue, 1994
Animation with the Windows GDI


Saving and restoring bitmapped images within a window


Joe Sam is the owner of Autumn Software, which provides software and services
for the AS/400, DOS, and Windows. He has been involved in consulting and
commercial software for over 15 years and currently specializes in C, C++, and
RPG. He can be contacted at P.O. Box 261646, Tampa, FL 33685.


Animation with Windows presents some special challenges, as I discovered in a
recent project that involved an animation sequence in a multiprogram-database
application. The animation was similar to the classic bouncing ball, moving a
nonrectangular bitmap over a previously painted background. A requirement of
the project was that the animation should work even if the background changed,
and it had to be independent and reusable enough to integrate easily into other
programs. I soon found that my standard library let me down, and I was off on
one of those deductive journeys for which Windows so freely offers a ticket to
ride.
This article describes some of the quirks--er, features in Windows that, if
ignored, can cause graphical graffiti to erupt all over your beautiful
displays. Some of the hurdles include performing bit-block transfers (bitblts)
in a window and overcoming Windows' optimization procedures when repainting
the screen. I'll also address smoothness in larger animations and present a
program, MOVBMP, that shows you how to put it all together.


Animating Bitmaps


Most paint programs as well as the BitBlt function, which is our primary tool
for manipulating images, require rectangular bitmaps. In general, to display
nonrectangular images, a program combines a portion of the screen, a bit mask,
and the image itself using AND and XOR operations. This allows background
graphics to appear in the areas of the rectangle outside of the actual image
("transparent bitmaps" in Microsoft lingo). For this to work, the bit mask
must be a duplicate of the original, except that the image is totally black on
a white background. The image bitmap must be enclosed in black. This can be
done in code at run time, but it is both more convenient and faster to use
predrawn images, usually as BITMAPs in the resource file. The animated image
in MOVBMP (see Listing One, page 50, and Listing Two, page 51) is an iron
cross with three layers of color on each arm and a mostly transparent center.
Figure 1 shows the six bitmaps that are used. Background graphics will appear
in the white areas of the mask. The top, right, bottom, and left bitmaps are
used for rotating the animation sample. The "true image" bitmap shows the iron
cross as it appears on screen.
In any graphics system, this type of animation is accomplished by saving the
display area where the image will appear, ANDing the mask with the screen, and
XORing the image with the ANDed result. New positions are then calculated, the
saved image is restored, and the process is repeated. Depending on the bitmap
size, the preceding steps may be enough. Windows is no speed demon, however,
and display memory is relatively slow, so most programs require the use of
memory device contexts (DCs). Memory DCs created with CreateCompatibleDC can
be treated exactly like a display or other DC, depending on which type of DC
is passed as an argument to this function. For larger bitmaps, additional
efficiency measures are necessary.


The Trouble with BitBlt


The BitBlt function provides a means to quickly get and put predrawn screen
images. Unfortunately, when copying a displayed image to memory, BitBlt is
more interested in what appears on the screen, and not necessarily what
appears in your window. This means that if another window overlaps your client
area, or even if one of your own menus is displayed over the animation, BitBlt
copies the image from the overlapping display or menu. When your window
regains focus, you may suddenly have a piece of another program's screen
embedded in your window, which is generally unacceptable in a professional
application. Also, due to the low priority and imprecision of timer messages,
which are used to determine when to move the images, this state of events
occurs more frequently than you might expect. This is a known problem that has
occasionally popped up in question-and-answer columns. Microsoft is surely
aware of the problem, but who knows when, or if, they will fix it, and, more
importantly, if your users will have the fix. Fortunately, BitBlts to the
screen are confined to a window's client area.


An Optimization that Creates Work


The second feature to overcome involves a Windows 3.1 optimization. When
BeginPaint is called, a structure is returned that contains, among other
things, what Windows considers the invalid portion of your client area
expressed as a rectangle. For the example in Figure 2, your program will get
coordinates corresponding to rect A, C, G, F. However, Windows appears to clip
to the region F, D, E, B, C; and the area A, B, E, D will not be repainted.
This is good for Windows since less work and less time is necessary, but bad
for you under certain circumstances. There's no fast, reliable method to
determine if another window covers the image or if Windows will clean up the
screen for you. In the end, you're pretty much on your own since most
animations operate outside of the paint routine.
And there you have it: BitBlt may end up saving and restoring pieces of a menu
or another program's display on your client area. The paint routine would seem
to be a good place to get a clean copy of the background to avoid that
situation, but part or all of the image can be inside the invalid rectangle
yet outside of the clip region. In that case, if you resave the screen area,
your image gets embedded in the screen in interesting ways.


Getting a Clean Image


My solution appears in DoEraseBkgnd (see Listing Two) which is called in
response to the WM_ERASEBKGND message. The bypass timer flag is set (reset in
DoPaint) to temporarily stop the animation, and the window DC is obtained to
avoid clipping problems. The currently saved image is then restored to the
screen. At that point, the DC is released and DefWindowProc is called to
perform normal processing. If the image was in the invalid region, the screen
is cleaned up. If not, the proper background is restored. Either way, you're
then able to get a certifiably clean background image in DoPaint. This may
seem inefficient, but the available alternatives take more code and more time,
and are still imprecise (rectangles vs. regions again).
To automate the animation, a timer is used to send messages at approximately
correct intervals and to allow the animation to proceed even when other
windows have the focus. GetNewTimer illustrates obtaining a timer and allows
the user, via a message box, to retry or cancel on failure (courtesy Charles
Petzold).
Now that you can ensure clean images and know when to display them, how can
you do so smoothly? Figure 3 illustrates a typical situation: The image is
moving to the southeast. Rectangle 1 has just been displayed. Rectangle 2 will
be displayed after restoring the screen. There is a large area of overlap and,
if possible, you'd prefer to paint this area only once. This is actually what
happens in MOVBMP. When the animation begins, a clean image of the screen is
saved in the global HBITMAP hbmOrg (see SelectBmp) and maintained throughout
the animation. The Animate routine handles all save, display, and restore
operations using two additional bitmaps, the window DC, and two memory DCs.
In Animate, the old positions are saved and new ones calculated. hbmOrg is
copied to hbmSave. The old and new positions are passed to DiffORects, which
returns the client coordinates of the incremental area to be covered by the
next display (the L-shaped area marked "2" in Figure 3) as two rectangles.
These clean areas are copied from the screen to the proper positions in
hbmOrg, as is the overlap portion from hbmSave. hbmOrg now has a complete
clean image of the area to be used by the next display. hbmOrg is then copied
to hbmDest. The bit mask is ANDed with hbmDest and the selected image is XORed
with hbmDest. BitBlt accomplishes these tasks using the raster operation (ROP)
codes SRCAND and SRCINVERT instead of the usual SRCCOPY. At this point, the
new "transparent" image is complete. DiffORects is called again to determine
the area to be restored. The same two rectangles are used, but the arguments
are reversed, since we are now concerned with the L-shaped area marked "1" in
Figure 3. With the values for these two differential rectangles in hand, the
restore area is BitBlted to the screen from hbmSave and the new image is put
to the screen from hbmDest. While there is a lot going on here, the work is
done with very fast memory bitmaps and DCs. No redundant data is written to
the display. DiffORects is the mainstay of this differential approach. The
routine was developed empirically and has been (very) thoroughly tested.
Because it works with client coordinates, the returned values are immediately
usable. For smaller images, routines like DiffORects can be eliminated, but
after about 20x20 pixels, flash and jerkiness become increasingly noticeable.
The iron-cross images are 66x66 pixels.
Unless the Single Image menu option has been chosen, a different image is
selected every other cycle. The bitmaps were drawn and loaded as top, right,
bottom, and left images. By displaying the images in order, the illusion of
rotation is generated in addition to the diagonal movement of the image. These
could have been different colors, sunbursts, gradual fills, and so on, and are
intended to demonstrate that animation does not have to involve moving to
another area of the display. MOVBMP supports zero displacements so you can see
pure rotation and even what appears to be a single stationary image.
Lastly, Animate cleans up the working DCs and bitmaps. Pay close attention to
the reselection of the original or old bitmaps before deleting the DCs and
work bitmaps. If this is omitted or done incorrectly, Windows will experience
severe technical difficulties, as may your reputation, because the resulting
loss in system resources continues for the duration of the session. Animation
routines may be invoked thousands of times during a program run, so any
leakage can totally deplete available system resources.


Additions for Your Toolbox


MOVBMP also illustrates both unusual and commonly needed functionality for
your toolbox. Probably the most obvious is the elimination of the switch
statement in WndProc via an array of message id/function pointer pairs
(courtesy Ray Duncan). This is efficient for programmers because, among other
benefits, new messages are easy to add, each message is related to a function
(which helps avoid side effects and increases modularity), size boundaries can
be calculated for you at compile time, and the basic skeleton can be used in
every program. An extra function call is generated, but in practice this has
been a small price to pay. Priority messages or those requiring special
operations can be handled outside the array loop, as are WM_TIMER and
WM_DESTROY. The message-associated functions take the standard hwnd, msg,
wParam, and lParam arguments, which are #defined as STDWINARGS in the header
file for use in function declarations and definitions. I've defined STDWINARGS
in a base header that is included in every program.
Illustrations of font common-dialog usage are available, but little has been
written about nondefault font initialization, as shown in Listing Two.
Probably the best way to determine appropriate values is to write a small
program using the font common dialog and use your debugger to view the
returned LOGFONT structure. The font common dialog is also demonstrated in
MOVBMP and used as a device to allow the background to be changed in order to
verify that the animation is working properly.
Partial invalidation of the client area is used when displacement or timer
values are changed. Although all of the code in DoPaint is invoked, only the
affected area of the screen is redrawn. I cheated in calculating the area. You
should determine the longest string using GetTextExtent for actual sizes.


Conclusion



I've tried to avoid "exercises for the reader," but any program can be
improved. You might prefer to use a larger bitmap and one BitBlt to blast both
the restore area and new image in one operation. I chose lesser memory usage
and fewer calculations, and on my 386 20-MHz test machine, that is good
enough. Modifying SelectBmp to allow loading any bitmap would be a more
rewarding improvement. MOVBMP could then be used as a tool to determine
optimal timer and displacement values for a given animation. The remainder of
the program is not tied to specific bitmaps and may be used as is. MOVBMP is
written as a single C-code module for simplicity; however, the animation
functionality could easily be placed in a DLL or encapsulated in an animator
class. The complete project, including resource, definition, and executable
files is available electronically; see "Availability," page 2.


References


Duncan, Ray. "The Hazards of Exploring Evolving Environments." PC Magazine
(April 13, 1993).
Petzold, Charles. Programming Windows. Redmond, WA: Microsoft Press, 1990.
 Figure 1: Bitmaps used in the MOVBMP program. The animated image is an iron
cross with three layers of color on each arm. The center is transparent.
 Figure 2: The invalid portion of your client area expressed as a rectangle.
Given the coordinates corresponding to rect A, C, G, F, Windows clips to the
region F, D, E, B, C, and the area A, B, E, D is not repainted.
 Figure 3: Typical situation where the image is moving to the southeast,
rectangle 1 has just been displayed, and rectangle 2 will be displayed after
restoring the screen.
[LISTING ONE] (Text begins on page 48.)

// movbmp.h

#ifndef MOVBMP_H
#define MOVBMP_H

#define STRICT

// INCLUDES
// --------
#include <windows.h>
#include <windowsx.h>
#include <commdlg.h>

// DEFINES
// -------
// define standard windows arguments
#define STDWINARGS HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam

#define IDM_EXIT 501

#define IDM_FONT 505
#define IDM_SELECTBMP 510
#define IDM_UNSELECTBMP 511
#define IDM_DDEC 515
#define IDM_DINC 520
#define IDM_TDEC 525
#define IDM_TINC 530
#define IDM_SINGLEIMAGE 535
// Timer ID
#define IMAGE_TIMER 1

// VARIABLE/USER-DEFINED TYPES
// ---------------------------
typedef struct
 { UINT msgid; // message id
 // pointer to a function taking std winproc parms
 LONG (PASCAL *pfnProc)( STDWINARGS );
 } WINMSGFN;


// FUNCTION PROTOTYPES
// -------------------
// Window and Exported procedures
extern "C" {
LRESULT WINAPI WndProc( STDWINARGS );

 } // end extern statement

// message function procs
LONG PASCAL DoCmd( STDWINARGS );
LONG PASCAL DoEraseBkgnd( STDWINARGS );
LONG PASCAL DoPaint ( STDWINARGS );
LONG PASCAL DoSize ( STDWINARGS );

// Command message function procs
LONG PASCAL ChangeDisplacement( STDWINARGS );
LONG PASCAL ChangeTimer ( STDWINARGS );
LONG PASCAL DoFont ( STDWINARGS );
LONG PASCAL SelectBmp ( STDWINARGS );
LONG PASCAL SingleImage ( STDWINARGS );
LONG PASCAL UnSelectBmp ( STDWINARGS );

// other functions
void Animate(HWND hwnd);
void DiffORects(RECT *rBmp, RECT *rRst, RECT *rUpper, RECT *rLower);
BOOL GetNewTimer(HWND hwnd);

#endif /* movbmp.h */

[LISTING TWO]

// MOVBMP Main Module

#include "movbmp.h"

// pgm globals ---------------------------
WINMSGFN msgfn[] =
 {
 WM_COMMAND, DoCmd,
 WM_ERASEBKGND, DoEraseBkgnd,
 WM_PAINT, DoPaint,
 WM_SIZE, DoSize,
 };
WINMSGFN cmdmsgfn[] =
 {
 IDM_DDEC, ChangeDisplacement,
 IDM_DINC, ChangeDisplacement,
 IDM_TDEC, ChangeTimer,
 IDM_TINC, ChangeTimer,
 IDM_FONT, DoFont,
 IDM_SELECTBMP, SelectBmp,
 IDM_SINGLEIMAGE, SingleImage,
 IDM_UNSELECTBMP, UnSelectBmp,
 };

// keep number of msgfn and cmdmsgfn elements
int imsgfndim = (sizeof(msgfn) / sizeof(msgfn[0]));
int icmdmsgfndim = (sizeof(cmdmsgfn) / sizeof(cmdmsgfn[0]));

HINSTANCE hInstGBL; // this instance
COLORREF crrgbColors = RGB(0,0,255); // blue
RECT rWndGBL = {0, 0, 0, 0}, rWndInv = {5, 0, 255, 0};
HDC hdcOrg;
HBITMAP hbmOrg, hbmOldOrg;
HBITMAP hbm[5]; // top, right, bottom, left, background


// init the font to TrueType Arial, 72 points
LOGFONT lfFontSel = { -96, // Height
 0, // Width
 250, // Escapement
 0, // Orientation
 400, // Weight
 '\0', // Italic
 '\0', // Underline
 '\0', // StrikeOut
 '\0', // CharSet
 '\x03', // OutPrecision
 '\x02', // ClipPrecision
 '\x01', // Quality
 '\x22', // PitchAndFamily
 "Arial" // FaceName
 };

// flag variables
BYTE btrblFlag = 0; // top/right/bottom/left flag
BYTE bBypassTimer = 255; // 255 indicates first pass
BYTE bBmpIsSelected = 0, bSingleImage = 0;

char szAppName[] = "Move Bitmap", szClassName[] = "MOVBMP";

int ndxGBL; // work index
int imageWidth, imageHeight, nTimerValue = 50;
int nXDsp = 1, nYDsp = 1; // displacement values
int nbmpX = 1, nDeltaX = 1, nbmpY = 1, nDeltaY = 1;

// begin -------------------------------------
#pragma argsused
int pascal WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
 LPSTR lpszCmdLine, int cmdShow)
 { HWND hWnd;
 MSG msg;
 WNDCLASS wndclass;
 char szString[] = "Move Bitmap is ending...";

 hInstGBL = hInstance; // save instance info
 // handle class registration and window creation
 if( !hPrevInstance )
 { wndclass.lpszClassName = szClassName;
 wndclass.hInstance = hInstance;
 wndclass.lpfnWndProc = WndProc;
 wndclass.hCursor = LoadCursor(NULL, IDC_ARROW);
 wndclass.hIcon = LoadIcon(NULL, IDI_APPLICATION);
 wndclass.lpszMenuName = "Main";
 wndclass.hbrBackground = (HBRUSH)(COLOR_WINDOW + 1);
 wndclass.style = NULL;
 wndclass.cbClsExtra = 0;
 wndclass.cbWndExtra = 0;

 if(!RegisterClass( &wndclass )) //error
 { MessageBox(NULL, "Can't create window class.", szString,
 MB_ICONEXCLAMATION | MB_OK);
 return 1;
 }
 }

 hWnd = CreateWindow( szClassName,
 szAppName,
 WS_OVERLAPPEDWINDOW | WS_CLIPCHILDREN,
 CW_USEDEFAULT,
 0,
 CW_USEDEFAULT,
 CW_USEDEFAULT,
 NULL,
 NULL,
 hInstance,
 NULL
 );
 if(hWnd == NULL)
 { MessageBox(NULL, "Can't create window.", szString,
 MB_ICONEXCLAMATION | MB_OK);
 return 1;
 }
 ShowWindow (hWnd, cmdShow);
 UpdateWindow(hWnd);

 // pgm loop
 while ( GetMessage(&msg, NULL, 0, 0) )
 {
 TranslateMessage (&msg);
 DispatchMessage (&msg);
 }
 return 0;
 } // end WinMain



LRESULT WINAPI WndProc( STDWINARGS )
 { // handle Windows messages
 if( msg == WM_TIMER ) // here for faster handling
 { // animate if appropriate timer;
 if(wParam == IMAGE_TIMER && !bBypassTimer) Animate(hwnd);
 return(0L);
 }
 for(ndxGBL = 0; ndxGBL < imsgfndim; ndxGBL++)
 { // walk messages array
 if(msg == msgfn[ndxGBL].msgid) // if match, call Fn
 return ((*msgfn[ndxGBL].pfnProc) (hwnd, msg, wParam, lParam));
 }
 if( msg == WM_DESTROY ) // if not in array
 { // happens once; don't waste searches or fn
 if(bBmpIsSelected) // clean up if bitmap was moving
 UnSelectBmp(hwnd, msg, wParam, lParam);
 PostQuitMessage (0);
 return(0L);
 }
 else return ( DefWindowProc(hwnd, msg, wParam, lParam) );
 } // end WndProc



void Animate(HWND hwnd) // Handle Bitmap Movement
 { static int ncounter = 0; // when to inc btrblFlag
 int nToX = 0, nToY = 0, nFromX = 0, nFromY = 0,
 nPrevX = nbmpX, nPrevY = nbmpY;

 RECT rBmp, rRst, rUpper, rLower;
 HDC hdcClient, hdcDest, hdcSave;
 HBITMAP hbmDest, hbmOldDest, hbmSave, hbmOldSave;

 hdcClient = GetDC( hwnd);
 hdcDest = CreateCompatibleDC( hdcClient);
 hdcSave = CreateCompatibleDC( hdcClient);

 hbmSave = CreateCompatibleBitmap(hdcClient,
 imageWidth, imageHeight);
 hbmOldSave = (HBITMAP)SelectObject(hdcSave, hbmSave);

 // create destination bitmap
 hbmDest = CreateCompatibleBitmap(hdcClient,
 imageWidth, imageHeight);
 hbmOldDest = (HBITMAP)SelectObject(hdcDest, hbmDest);

 // determine border/direction changes, if any
 if( nbmpX < 1) nDeltaX = nXDsp;
 else if( nbmpX > (rWndGBL.right-imageWidth) )
 nDeltaX = -(nXDsp);

 if( nbmpY < 1) nDeltaY = nYDsp;
 else if( nbmpY > (rWndGBL.bottom-imageHeight) )
 nDeltaY = -(nYDsp);

 nbmpX += nDeltaX; nbmpY += nDeltaY; // get new positions

 // copy clean image of saved screen
 BitBlt(hdcSave, 0, 0, imageWidth, imageHeight,
 hdcOrg, 0, 0, SRCCOPY);

 // set up incremental new screen image
 rBmp.left = nbmpX; rBmp.top = nbmpY;
 rBmp.right = nbmpX + imageWidth - 1;
 rBmp.bottom = nbmpY + imageHeight - 1;

 rRst.left = nPrevX; rRst.top = nPrevY;
 rRst.right = nPrevX + imageWidth - 1;
 rRst.bottom = nPrevY + imageHeight - 1;
 // get overlapping rect differentials
 DiffORects(&rRst, &rBmp, &rUpper, &rLower);
 // get clean incremental rects from screen
 BitBlt(hdcOrg, rUpper.left - nbmpX, 0,
 (rUpper.right - rUpper.left + 1),
 (rUpper.bottom - rUpper.top + 1),
 hdcClient, rUpper.left, rUpper.top, SRCCOPY);

 BitBlt(hdcOrg, rLower.left - nbmpX, rLower.top - nbmpY,
 (rLower.right - rLower.left + 1),
 (rLower.bottom - rLower.top + 1),
 hdcClient, rLower.left, rLower.top, SRCCOPY);

 // copy overlapping portion from previously saved screen image
 if(nbmpX < nPrevX) nToX = nXDsp; // if moving left
 else nFromX = nXDsp;

 if(nbmpY < nPrevY) nToY = nYDsp; // if moving up
 else nFromY = nYDsp;


 BitBlt(hdcOrg, nToX, nToY,
 imageWidth - nXDsp, imageHeight - nYDsp,
 hdcSave, nFromX, nFromY, SRCCOPY);

 // set up initial destination image
 BitBlt(hdcDest, 0, 0, imageWidth, imageHeight,
 hdcOrg, 0, 0, SRCCOPY);
 // set up mask reusing hdcSave
 SelectObject(hdcSave, hbm[4]);
 // AND the mask
 BitBlt(hdcDest, 0, 0, imageWidth, imageHeight,
 hdcSave, 0, 0, SRCAND);

 // set up to put the bitmap
 SelectObject(hdcSave, hbm[btrblFlag]);
 // XOR the bitmap
 BitBlt(hdcDest, 0, 0, imageWidth, imageHeight,
 hdcSave, 0, 0, SRCINVERT);

 // restore incremental rects from old screen image
 // use same rects with reverse order in fn call
 DiffORects(&rBmp, &rRst, &rUpper, &rLower);
 // put clean rects to screen
 SelectObject(hdcSave, hbmSave); // reselect hbmSave
 BitBlt(hdcClient, rUpper.left, rUpper.top,
 (rUpper.right - rUpper.left + 1),
 (rUpper.bottom - rUpper.top + 1),
 hdcSave, rUpper.left - nPrevX, 0, SRCCOPY);

 BitBlt(hdcClient, rLower.left, rLower.top,
 (rLower.right - rLower.left + 1),
 (rLower.bottom - rLower.top + 1),
 hdcSave, rLower.left - nPrevX,
 (rLower.top - nPrevY), SRCCOPY);

 // PUT image to the screen
 BitBlt(hdcClient, nbmpX, nbmpY, imageWidth, imageHeight,
 hdcDest, 0, 0, SRCCOPY);

 if(!bSingleImage)
 { // change top, right, bottom, left every other display
 if(++ncounter >= 2)
 { // set bitmap index
 btrblFlag++; ncounter = 0;
 if(btrblFlag > 3) btrblFlag = 0; // reset flag/index
 }
 }
 // clean up memory DC's, work bitmaps
 SelectObject(hdcDest, hbmOldDest);
 DeleteDC(hdcDest);
 DeleteObject(hbmDest);

 SelectObject( hdcSave, hbmOldSave);
 DeleteDC( hdcSave);
 DeleteObject( hbmSave );

 ReleaseDC( hwnd, hdcClient);
 } // end Animate




#pragma argsused
LONG PASCAL ChangeDisplacement( STDWINARGS )
 { // inc/dec x & y displacement by 1 to min/max
 if(wParam == IDM_DDEC) // decrement rqs
 {
 if(nXDsp < 1 || nYDsp < 1) // 0 pix min
 nXDsp = nYDsp = 1;
 else nXDsp -= 1, nYDsp -= 1;
 }
 else // must have been increment rqs
 {
 if(nXDsp > 49 || nYDsp > 49) // 50 pix max
 nXDsp = nYDsp = 50;
 else nXDsp += 1, nYDsp += 1;
 }
 // reset delta values
 nDeltaX = (nDeltaX < 1) ? -nXDsp : nXDsp;
 nDeltaY = (nDeltaY < 1) ? -nYDsp : nYDsp;

 InvalidateRect(hwnd, &rWndInv, TRUE); // redraw new values
 return (0L);
 } // end ChangeDisplacement



#pragma argsused
LONG PASCAL ChangeTimer( STDWINARGS )
 { // inc or dec timer interval by 50
 if(bBmpIsSelected)
 {
 KillTimer(hwnd, IMAGE_TIMER);
 bBypassTimer = 1; // set bypass flag
 }
 if(wParam == IDM_TDEC) // decrement rqs
 { // 50 ms min
 if(nTimerValue < 100) nTimerValue = 50;
 else nTimerValue -= 50;
 }
 else // must have been increment rqs
 { // 1000 ms max
 if(nTimerValue > 950) nTimerValue = 1000;
 else nTimerValue += 50;
 }
 InvalidateRect(hwnd, &rWndInv, TRUE); // redraw new values
 if(bBmpIsSelected)
 {
 if( !GetNewTimer(hwnd) ) // if can't get timer, end movement
 UnSelectBmp(hwnd, msg, wParam, lParam);
 bBypassTimer = 0; // reset bypass flag
 }
 return (0L);
 } // end ChangeTimer



void DiffORects(RECT *rBmp, RECT *rRst, RECT *rUpper, RECT *rLower)

 { // finds the 2 rects of the portion of rRst outside the overlap, in
 // client coords; returns the rects in rUpper and rLower
 rUpper->top = rRst->top; rLower->bottom = rRst->bottom; // always

 if(rBmp->top > rRst->top)
 rUpper->bottom = rBmp->top - 1, rLower->top = rBmp->top;
 else
 rUpper->bottom = rBmp->bottom, rLower->top = rBmp->bottom + 1;

 if(rBmp->left > rRst->left)
 {
 rUpper->left = rLower->left = rRst->left;
 if(rBmp->top > rRst->top)
 rUpper->right = rRst->right, rLower->right = rBmp->left - 1;
 else // rBmp->top < rRst->top
 rUpper->right = rBmp->left - 1, rLower->right = rRst->right;
 }
 else // rBmp->left is same or less than rRst->left
 {
 rUpper->right = rLower->right = rRst->right;
 if(rBmp->top > rRst->top)
 rUpper->left = rRst->left, rLower->left = rBmp->right + 1;
 else
 rUpper->left = rBmp->right + 1, rLower->left = rRst->left;
 }
 return;
 } // end DiffORects



#pragma argsused
LONG PASCAL DoCmd( STDWINARGS )
 { // Handle WM_COMMAND msg, check wParam contents
 for(ndxGBL = 0; ndxGBL < icmdmsgfndim; ndxGBL++)
 {
 if(wParam == cmdmsgfn[ndxGBL].msgid) // if match, call Fn
 return ((*cmdmsgfn[ndxGBL].pfnProc)(hwnd, msg, wParam, lParam));
 }
 if( wParam == IDM_EXIT ) // if not in array
 { // happens once; don't waste searches or fn
 SendMessage(hwnd, WM_CLOSE, 0, 0L);
 return(0L);
 }
 else return ( DefWindowProc(hwnd, msg, wParam, lParam) );
 } // end DoCmd



#pragma argsused
LONG PASCAL // Handle WM_ERASEBKGND message
 DoEraseBkgnd( STDWINARGS )
 { // chk bypass - SOMETIMES get this msg before AND during
 // WM_PAINT/BeginPaint. Set flag here, DoPaint clears it.
 if(bBmpIsSelected && !bBypassTimer)
 { HDC hdc;
 // WGO-- BitBlt captures image from overlapping screen
 // when bmp is underneath. So work around... If this
 // restore is erroneous, ERASE gets it in invalidated
 // region. If valid, the restoration remains.

 // Done here due to timing and because BeginPaint
 // CLIPS to the invalid rectangle (or smaller)
 // so our work would be for nothing in that routine.
 bBypassTimer = 1; // set bypass timer routine flag.
 hdc = GetDC( hwnd ); // restore saved screen
 BitBlt(hdc, nbmpX, nbmpY, imageWidth, imageHeight,
 hdcOrg, 0, 0, SRCCOPY);
 ReleaseDC( hwnd, hdc);
 }
 return ( DefWindowProc(hwnd, msg, wParam, lParam) );
 } // end DoEraseBkgnd



 #pragma argsused
LONG PASCAL // Handle IDM_FONT Menu message
 DoFont( STDWINARGS )
 { CHOOSEFONT cfStruct;
 LOGFONT lfStruct = lfFontSel;

 cfStruct.lStructSize = sizeof(CHOOSEFONT);
 cfStruct.hwndOwner = hwnd;
 cfStruct.lpLogFont = &lfStruct;
 cfStruct.Flags = CF_SCREENFONTS | CF_EFFECTS |
 CF_INITTOLOGFONTSTRUCT;
 cfStruct.rgbColors = crrgbColors;

 if( ChooseFont(&cfStruct) )
 {
 lfFontSel = lfStruct;
 lfFontSel.lfEscapement = 250;
 crrgbColors = cfStruct.rgbColors; // set selected color
 // redraw with selected font
 InvalidateRect(hwnd, &rWndGBL, TRUE);
 }
 return (0L);
 } // end DoFont



#pragma argsused
LONG PASCAL // Handle WM_PAINT message
 DoPaint( STDWINARGS )
 { HDC hdc;
 HFONT hFont = NULL, hFontPrev = NULL;
 PAINTSTRUCT ps;
 char szOutString[40];
 int nlength;

 hdc = BeginPaint(hwnd, &ps);
 // draw timer and displacement values
 nlength = wsprintf(szOutString, "Timer Value is %d.", nTimerValue);
 ExtTextOut(hdc, 5, rWndGBL.bottom - 60, 0, NULL, szOutString,
 nlength, NULL);

 nlength = wsprintf(szOutString, "X Displacement is %d.", nXDsp);
 ExtTextOut(hdc, 5, rWndGBL.bottom - 40, 0, NULL,
 szOutString, nlength, NULL);


 nlength = wsprintf(szOutString, "Y Displacement is %d.", nYDsp);
 ExtTextOut(hdc, 5, rWndGBL.bottom - 20, 0, NULL,
 szOutString, nlength, NULL);

 SetBkMode(hdc, TRANSPARENT);
 // create and select the font
 hFont = CreateFontIndirect(&lfFontSel);
 hFontPrev = (HFONT)SelectObject(hdc, hFont);

 SetTextAlign(hdc, TA_BASELINE | TA_CENTER);
 // create shadow by drawing in black first, one over
 ExtTextOut(hdc, (rWndGBL.right >> 1) + 1,
 (rWndGBL.bottom >> 1) + 1,
 0, NULL, "U S A", 5, NULL);
 SetTextColor(hdc, crrgbColors); // now in selected color
 ExtTextOut(hdc, rWndGBL.right >> 1,
 (rWndGBL.bottom >> 1),
 0, NULL, "U S A", 5, NULL);

 if(hFontPrev) // delete the font
 { SelectObject(hdc, hFontPrev);
 DeleteObject(hFont);
 }
 if(bBmpIsSelected) // get fresh screen image
 BitBlt(hdcOrg, 0, 0, imageWidth, imageHeight,
 hdc, nbmpX, nbmpY, SRCCOPY);

 EndPaint( hwnd, &ps);
 bBypassTimer = 0; // reset bypass flag
 return(0L);
 } // end DoPaint



#pragma argsused
LONG PASCAL DoSize( STDWINARGS )
 { // Handle WM_SIZE message
 if( rWndGBL.right == LOWORD(lParam) &&
 rWndGBL.bottom == HIWORD(lParam) )
 return 0L; // if same size, exit

 // otherwise get new size
 rWndGBL.right = LOWORD(lParam);
 rWndGBL.bottom = HIWORD(lParam);
 // update invalid text rect
 rWndInv.top = rWndGBL.bottom - 60;
 rWndInv.bottom = rWndGBL.bottom;

 if(bBypassTimer == 255)
 { // if first pass, exit after setting size
 bBypassTimer = 0;
 return 0L;
 }
 if(wParam == SIZE_MAXIMIZED || wParam == SIZE_RESTORED)
 { // redraw on new size
 if(bBmpIsSelected != 2)
 InvalidateRect(hwnd, &rWndGBL, TRUE);
 else // restart if was icon and movement sel
 SelectBmp(hwnd, msg, wParam, lParam);


 return 0L;
 }
 if(wParam == SIZE_MINIMIZED && bBmpIsSelected == 1)
 { // If going to icon, stop movement, set sel to 2 for restore.
 UnSelectBmp(hwnd, msg, wParam, lParam);
 bBmpIsSelected = 2; // set Flag for size return
 }
 return (0L);
 } // end DoSize



BOOL GetNewTimer(HWND hwnd)
 { // set up timer
 while( !SetTimer(hwnd, IMAGE_TIMER, nTimerValue, NULL) )
 {
 if( IDCANCEL == MessageBox( hwnd,
 "Can't get timer for animation.",
 "Move Bitmap Information",
 MB_ICONEXCLAMATION | MB_RETRYCANCEL) )
 return (FALSE);
 }
 return (TRUE); // got timer
 } // end GetNewTimer



#pragma argsused
LONG PASCAL // Handle IDM_SELECTBMP Menu message
 SelectBmp( STDWINARGS )
 { HDC hdcClient;
 BITMAP bm; // Bitmap info
 HMENU hmenuMain = GetMenu(hwnd);

 if( !GetNewTimer(hwnd) )
 return 0L; // return if can't obtain timer

 if(wParam == IDM_SELECTBMP) // chg menu select to stop
 ModifyMenu(hmenuMain, IDM_SELECTBMP, MF_BYCOMMAND,
 IDM_UNSELECTBMP, "Stop Movement");
 // load bitmaps from resource file
 hbm[0] = LoadBitmap(hInstGBL, "image1"); // top
 hbm[1] = LoadBitmap(hInstGBL, "image2"); // right
 hbm[2] = LoadBitmap(hInstGBL, "image3"); // bottom
 hbm[3] = LoadBitmap(hInstGBL, "image4"); // left
 hbm[4] = LoadBitmap(hInstGBL, "imagebkg"); // mask

 GetObject(hbm[0], sizeof(bm), (LPSTR)&bm);
 imageWidth = bm.bmWidth; imageHeight = bm.bmHeight;
 btrblFlag = 1; // set top as first image

 hdcClient = GetDC( hwnd);
 hdcOrg = CreateCompatibleDC( hdcClient);
 hbmOrg = CreateCompatibleBitmap(hdcClient,
 imageWidth, imageHeight);
 hbmOldOrg = (HBITMAP)SelectObject(hdcOrg, hbmOrg);
 // save initial screen image
 BitBlt(hdcOrg, 0, 0, imageWidth, imageHeight,

 hdcClient, nbmpX, nbmpY, SRCCOPY);

 // clean up
 ReleaseDC(hwnd, hdcClient);
 bBmpIsSelected = 1;
 return (0L);
 } // end SelectBmp



#pragma argsused
LONG PASCAL // Handle IDM_SINGLEIMAGE command
 SingleImage( STDWINARGS )
 { // Toggle Single Image/Multiple image switch
 int nChkType;
 HMENU hmenu = GetMenu(hwnd);

 bSingleImage = !bSingleImage;
 nChkType = (bSingleImage) ? MF_CHECKED : MF_UNCHECKED;
 CheckMenuItem(hmenu, IDM_SINGLEIMAGE, MF_BYCOMMAND | nChkType);
 return(0L);
 } // end SingleImage



#pragma argsused
LONG PASCAL // Handle IDM_UNSELECTBMP
 UnSelectBmp( STDWINARGS )
 { HMENU hmenu = GetMenu(hwnd);
 HDC hdcClient;

 KillTimer(hwnd, IMAGE_TIMER);
 bBmpIsSelected = 0;
 // restore the screen
 hdcClient = GetDC( hwnd);
 BitBlt(hdcClient, nbmpX, nbmpY, imageWidth, imageHeight,
 hdcOrg, 0, 0, SRCCOPY);
 ReleaseDC( hwnd, hdcClient);

 if(wParam == IDM_UNSELECTBMP)
 { // if menu call, redraw: image may be under menu
 InvalidateRect(hwnd, &rWndGBL, TRUE);
 // chg menu select to start
 ModifyMenu(hmenu, IDM_UNSELECTBMP, MF_BYCOMMAND,
 IDM_SELECTBMP, "Start Movement");
 }
 for(ndxGBL = 0; ndxGBL < 5; ndxGBL++) // delete the bitmaps
 DeleteObject( hbm[ndxGBL] );

 SelectObject(hdcOrg, hbmOldOrg); // deselects hbmOrg
 DeleteDC(hdcOrg);
 DeleteObject(hbmOrg);

 return (0L);
 } // end UnSelectBmp







Special Issue, 1994
Writing a Multimedia App in Liana


Dynamic behavior requires a dynamic language


Jack is president of Base Technology, where he develops and markets Liana and
consults on development of Windows applications. He can be reached at
303-440-4558, by fax at 303-444-4186, on CompuServe at 70642,2662, or at 1320
Pearl St., Suite 110-B, Boulder, CO 80302.


Three fashionable topics that get a lot of talk are multimedia, Windows, and
object-oriented programming (OOP). But sometimes it's hard to find instances
in which these technologies are applied to real-world problems. In this
article, I'll talk about a multimedia application that's currently in
day-to-day use at a number of corporations and describe how it was developed.
I developed Avenue as a consulting project for VIS Development Corp. (Waltham,
MA). In implementing Avenue, I used the Liana object-oriented programming
language and class library. Liana is a development tool I created that lets
you build small to medium-size Windows applications very quickly. (See Ray
Valdés's "The Liana Programming Language," DDJ, October 1993.)


Enhancing Training Videos


Although businesses have a great deal of money invested in training videos,
managers may wonder whether these videos are effective at imparting knowledge,
given the passive nature of video watching. To make the learning experience more interactive,
VIS Development offers a service that digitizes existing training video tapes,
places them on CD-ROM, and structures them via an interactive graphical front
end: Avenue. The result is a package that is more engaging to users. The
package can also gather statistics on usage, as well as conduct tests on
comprehension.
Last year, VIS staff prototyped Avenue using a DOS-based multimedia authoring
system. Then they decided to move it to Windows using Asymetrix ToolBook.
However, progress soon stalled because the staff lacked programming
experience. When I was brought on board to implement the application, I
proposed Liana as a way of obtaining a quality result as soon as possible.
Liana's interpretive nature and high level of abstraction free you from much
of the details of C/C++ programming and the Windows API. Although Liana did
not have any multimedia support, a quick review of the Media Control Interface
(MCI) led me to the conclusion that a Liana video class would be very easy to
implement. I was able to get a preliminary version of the class running within
a few hours. But before discussing my Liana implementation, I'll describe
Avenue further.


Overview of Avenue


Avenue allows the user to walk up to a PC and quickly select and start a video
training course. The hierarchical structure of the front end allows the user
to visually navigate to the desired training video; see Figure 1.
Avenue is really a meta-application, akin to an authoring tool. It adds menus,
which can be full-screen bitmap images, and buttons, which can be of arbitrary
size. It operates via keyboard or mouse, but is also well suited for use with
touch screens. The images may be as simple as text menus or as sophisticated
as scanned photographs or diagrams that allow users to visually select an item
of interest. A selected item might immediately bring up a training-video
segment, or, alternately, a more-detailed image might be shown so that the
user can make a more precise selection. The author of the training application
can also specify an initial logon dialog, so that the material can be tailored
to each user's skill level. Progress through the video can also be a result of
passing tests in previous training sessions.
Course authors, managers, and system administrators can customize a particular
training video via a set of scripts and configuration files. In addition to
controlling the order of presentation of video segments, the scripts define
the hierarchical structure of the presentation and can fine-tune the
appearance of the screen.
The users of Avenue-produced training materials vary widely in skill level,
depending on the course material; they can include factory workers and sales
personnel, as well as corporate managers. Some users are comfortable with a
dense set of choices and controls; others will want a friendlier, sparser
presentation of choices with fewer controls. Avenue can accommodate changes
without requiring expensive video editing or custom software development.
Avenue has a facility for logging who watched what, and for how long. It also
has a testing capability with features that can be used to survey users, as
well as test for comprehension.


Books and Chapters


In Avenue parlance, each of the top-level menus is known as a "book," while an
individual training video is a "chapter." The user selects a chapter from a
book; a chapter is, of course, organized as a sequence of pages. Each page
consists of a video clip and optional text and graphics.
The chapter's author decides whether a given page should auto-advance to the
next page upon completion of that segment or whether to give the user complete
control. A standard complement of VCR-like buttons is available, including
previous page, rewind, play/pause, forward, and next page. The Pages button
brings up a list box of all page titles so the user can randomly go to any
page. The Index button brings up an alphabetical index for the chapter to also
allow random selection. The author can decide to share an index between
multiple chapters; the index supports branching across chapters. The Mark
button allows the user to set bookmarks and return to them later. The More
button allows the user to request more detail in the form of a nested
chapter-within-a-chapter.
The course author can associate a test with a given chapter. A test is really
just a specialized chapter in which each question is a page. A test page often
has a full-screen bitmap that displays the question and multiple-choice
answers using a mix of fonts, colors, and graphics. A test question may also
be posed by a video clip associated with the page. Another video clip can be
shown if the user gets an incorrect answer. The Avenue script file offers the
course author a lot of flexibility in designing the test and in how to respond
to user responses.
Avenue supports various ways of logging user activity. These include recording
what chapters have been viewed, which pages have been seen, the time spent in
each chapter or page, and the results of any tests taken. An instructor can
find out not just a user's test score, but also how each question was
answered, in how many attempts. There is more to Avenue than I have the space
for, but the description here should give you an idea of the implementation
task I faced.


Application Code


Due to space limits and proprietary considerations, I'll show only an abridged
version of Avenue's source code. Hopefully, the excerpts in Listings One
through Five will give you a good idea of what's involved in writing a
nontrivial Liana application. There is additional code in the electronic
version of this article; see "Availability," page 2.
The application consists of a main function (Listing One, page 56) and a
number of classes. Similar to C, the Liana interpreter calls main after
initializing the environment. Unlike C, a Liana main function initializes the
application and then quickly returns. The Liana interpreter then waits until
the application's windows have been closed before cleaning up and exiting.
Liana automatically dispatches Windows events by calling member functions for
the Liana object associated with each window.
The hierarchy of open books is stored in a Liana stack object created in the
main function. The stack class (Listing Two, page 56) is derived from the
Liana dynamic-array class and is a good example of how easy it is to
manipulate objects in Liana. Since Liana is a typeless language, there's no
need to have separate classes or templates for every type of object that you
wish to store on a stack or in an array. The same class is used for the
chapter stack.
The two primary classes in Avenue are bookwin (Listing Three, page 56), which
manages a book window, and chapwin (Listing Four, page 56), which manages a
chapter window. The touch_menu class (available electronically) is common to
both bookwin and chapwin and manages the display of a bitmap and the selection
of soft buttons. A Liana displaylist is used to represent the soft buttons.
Other classes in Avenue include a book_entry class, which is based on the
Liana figure class and adds a user (skill) level, a path string, and a label
string. The figure class itself represents the rectangular area defined for
each soft button used to bring up books. Avenue's touch_menu class is general
purpose and could be reused in other applications.
The chapfile class manages a single chapter file, which consists of a text
file with a bunch of header lines followed by one line per page. Each page
line is stored in a page object; the page objects are stored in a Liana
dynamic array. The Liana range class is used to store the range of video
frames for a chapter page. It is another good example of the flexibility of
Liana, since the start and end of the range can be of any type: integer, real,
string, or a user-defined class.
Avenue's chapter_state class is used to record the state of a chapter,
including the path of the chapter file, the information about the current page
of the chapter, the current video-frame number, and whether video is actually
playing. The index_entry class is based on the chapter_state class because an
index entry needs much of the same information, plus the text to be displayed
in the index. Avenue supports index entries that branch across chapters and
even to a specific frame within a page.
Finally, the Liana video class (Listing Five, page 57) encapsulates the
Windows Media Control Interface (MCI) API. This class also provides useful
features such as the option of displaying a bitmap over the video window. At
the end of Listing Five, you can see how a DLL entry point is declared in
Liana. The Avenue implementation does not use the video class directly, but
instead derives class chvideo from it. This is for several reasons:
To detect frame transitions so Avenue can update bitmaps when the end of a
chapter page is reached.
To customize error handling.
To get control when the video finishes playing a range of frames.
To get control whenever a seek is performed on the video so the bitmaps can be
synchronized in case the page number changes.
The frame, mci_error, succeeded, and position_set member functions are called
for those events, so they are redefined by the chvideo class.


Project Assessment



Unlike many commercial projects, the implementation effort on Avenue was
rather modest. As the only developer, I started the project in late
September 1992, working less than half-time. The first product shipment
occurred barely four months later in early February 1993.
Avenue consists of about 3000 lines of Liana code. Based on my experience
implementing Windows apps in C using the Windows SDK (Liana is one such
instance), I estimate that a native C implementation of Avenue would require
about 30,000 lines of code and would have taken perhaps ten times longer.
Estimating a C++ implementation is a bit trickier, given the various class
libraries and application frameworks, but I suspect it would require 15,000
lines of C++ and take perhaps five times longer.
Would an application generator have helped? Not really. Avenue's user
interface is dynamic in nature and is not something that can be statically
defined with a dialog editor, resource toolkit, or window builder.
The only real downside to using Liana is that it's a proprietary language from
a one-person company (however, full source code is available). Also, some
programmers prefer a more comprehensive interactive development environment
(IDE). Even so, Liana's prototyping speed merits your consideration.
 Figure 1: The Avenue interface.
[LISTING ONE] (Text begins on page 54.)

// Main entry point for Avenue --- by Jack Krupansky.
// Copyright 1993 VIS Development Corporation

void main ()
{ // Access the private profile (.INI) file for AVENUE.
 env = new envfile (path (getcwd (), apname, ".ini"));

 // Determine whether status line is enabled for books.
 if ((string s = env ["enable_book_status_line"]) == "")
 enable_book_status_line = true;
 else
 enable_book_status_line = bool (int (s));

 // Create initial book window.
 window w = new bookwin (path (argv [1], apname));
 if (! w.path)
 exit (); // Exit if we can't open the book.
 books = new stack; // Create a stack for the book hierarchy.
 books << w; // Put the initial book on the stack.
 w.show; // Make the book window visible.
 // main() then returns to Liana to wait until application is
 // terminated by the user.
}

[LISTING TWO]

// Implementation of class Stack --- by Jack Krupansky
// Copyright 1993 John W. Krupansky d/b/a Base Technology

class stack: array {
 bool isempty () { return this.size == 0; }
 any pop () {
 if (int i = this.size) {
 any v = this [i - 1];
 trim (1);
 }
 return v;
 }
 any push (any v) { this << v; return v; }
 void stack () { array (); } // constructor calls base class
 any top () {
 if (int i = this.size) return this [i - 1];
 else return null;
 }
};

[LISTING THREE]

// Implementation of class Bookwin --- by Jack Krupansky
// Copyright 1993 VIS Development Corporation


class bookwin: touch_menu {
 publicread:
 string path;
 void bookwin (string path);
 void clicked (int i) {
 if (i == -1)
 this.status = "Nothing selected, try again...";
 else {
 string path = this [i].path;
 if (path == "*exit") exit_book ();
 else new_chapter (i);
 }
 }
 bool close ();
 void destroy ();
 void exit_book ();
 void new_chapter (int course) {
 any c = this [course];
 if (! access (string path = c.path)) {
 warn ("Unable to access map/chapter file: "+path);
 this.status = "Select something...";
 return;
 }
 this.status = "Loading "+c.text; // Let user know what we're doing
 this.cursor = "wait"; // Display the hour-glass cursor.
 window cur_map = books.top; // Get top book from the stack.
 // See if we're going to another
 if (ext (path) == ".CHP") { // book or to a chapter.
 if (chapwin // See if we're going to the same chapter.
 && path == chapwin.path) {

 this.status = "Resuming chapter...";
 } else {
 if (! chapwin) chapwin = new chapwin;
 }
 chapwin.path = path; // Open the new chapter.
 if (chapwin.path.size)
 chapwin.show_maximized; // Make the chapter visible.
 } else {
 window new_map = new bookwin (path); // Open a new book window.
 if (new_map.path) {
 books << new_map; // Push the book on the stack.
 new_map.show; // Make the new book window visible
 this.hide; // Hide the current book.
 }
 }
 this.cursor = "normal"; // Restore the cursor.
 }
 void path_set (string path);
 int user_level_set (int user_level);
};

[LISTING FOUR]

// Implementation of class Chapwin -- by Jack Krupansky.
// Copyright 1993 VIS Development Corporation

class chapwin: touch_menu {
 chapter_state incorrect_video_saved_state;

 publicread:
 chapfile chapfile;
 pushbutton play_button;
 chvideo tv;
 string path;
 int page;
 page p;
 int correct_answers;
 public:
 bool page_ended;

 void chapwin () {
 touch_menu (); // Initialize object for parent class.
 chapters = new stack; // Create stack for nested chapters.
 // Create the buttons at top of window.
 this
 << (prev_page_button = new pushbutton ("<<-", "prev_page"))
 << (play_button = new pushbutton ("Play","toggle_play_video",8))
 // ...more buttons...;
 }
 void chose (int i);
 void clear_touch_area_definitions ();
 void clicked (int i);
 void close ();
 answer_dialog get_answer_dialog (string code);
 void force_store_playing_time (bool finished);
 void go_forward ();

 void load_touch_area_definitions (string path);
 void moved (x, y);
 void next_page () {this.page++;}
 int page_finished ();
 int page_set (int page);
 void paint (x1, y1, x2, y2);
 void paint_status ();
 void paint_msg_status ();
 void paint_page_status ();
 void paint_title_status ();
 string path_set (string path);
 void pause_video () {
 if (tv.playing) {
 tv.pause; // Stop the video.
 // Toggle label of Play btn
 play_button.text = "Play"; // from Pause back to Play.
 // Display status for user.
 this.status = "Pausing video at frame "+tv.position;
 force_store_playing_time (false); // Record user statistics.
 }
 }
 range play_range ();
 void play_video () {
 if (! tv.playing) toggle_play_video ();
 }
 void position (x, y);
 int position_set (int position);
 void prev_page () {this.page--;}
 void push_to_chapter ();
 void refresh ();

 void return_from_chapter ();
 void rewind_video ();
 void set_bookmark ();
 void set_button_visibility (pushbutton button, bool visible);
 void set_control_visibility (control c, bool visible);
 void show_help ();
 void show_index ();
 void show_more_detail ();
 void show_pages ();
 chapter_state state ();
 chapter_state state_set (chapter_state s);
 string status_set (string status);
 bool status_line_enabled_set (bool v) {return v;}
 void store_playing_time (bool finished);
 void toggle_play_video () {
 if (tv.playing) pause_video ();
 else {
 if (page_ended) // If user presses play at end of a page
 this.page++; // Then go on to the next page.
 tv.play; // Start the video playing.
 started_play = time (); // Record the time spent playing video.
 play_button.text = "Pause"; // Change label of Play btn to Pause
 this.status = "Playing frames " // Display status for user.
 + tv.frames.start

 + " to " + tv.frames.end;
 }
 }
 void video_finished ();
};

[LISTING FIVE]

// Implementation of class Video, which encapsulates the
// Windows Media Control Interface (MCI)
// Copyright 1993 John W. Krupansky d/b/a Base Technology

class video: custombutton {
 static int vid_gen;
 string vid;
 timer frame_timer;
 range frames;
 bool video_visible;
 publicread:
 string path;
 bitmap bitmap;
 bitmap bitmap_set (bitmap bitmap);
 int clean_position (int pos) {
 return max (1, min (this.size, pos));
 }
 void close ();
 void destroy ();
 void frame (int frame) {}
 range frames () {return frames.copy;}
 range frames_set (range r);
 long handle_message (
 unsigned hWnd,
 unsigned message,
 unsigned wParam,

 long lParam);
 bool mci_execute (string command);
 void mci_error (string command, string msg);
 void show_mci_error (string command, string msg);
 bool mci_execute (string command, int hWnd);
 void paint (int x1, int y1, int x2, int y2) {paint ();}

 void paint ();
 string path_set (string path);
 void pause () {
 if (path && ! ready () && ! stopped ()) {
 mci_execute ("pause "+vid);
 frames.start = this.position;
 }
 }
 void play () {
 pause ();
 if (path) {
 if (frames.start != null) play (frames);
 else {
 mci_execute ("play "+vid+" notify", handle);
 this.video_visible = true;
 start_timer ();
 }
 }
 }
 void play (range range) {play (range.start, range.end);}
 void play (int start, int end) {
 if (path) {
 this.frames.range (start, end);
 pause ();
 start = clean_position (start);
 end = clean_position (end);
 string cmd = "play "+vid;
 if (this.position != start)
 cmd += " from "+start;
 cmd += " to "+end+" notify";
 mci_execute (cmd, handle);
 this.video_visible = true;
 start_timer ();
 }
 }
 bool playing () {return this.status == "playing";}
 int position () {return int (status ("position"));}
 int position_set (int position)
 {
 position = clean_position (position);
 if (this.position != position) {
 if (path) {
 if (bool not_ready = ! ready ())
 pause ();
 mci_execute ("seek "+vid+" to "+position);
 frames.start = position;
 if (not_ready)
 play ();
 }
 } else
 frames.start = position;
 return position;

 }
 bool ready () {return status ("ready") == "true";}
 void rewind ();

 int size () {return int (status ("length"));}
 void start_timer ();
 string status () {return status ("mode");}
 string status (string item)
 {
 if (path) {
 string result = "";
 memory buf = malloc (256);
 string command = "status "+vid+" "+item;
 // call the MCI DLL entry points
 if (int rv = mciSendString (command, buf, buf.size, handle)) {
 mciGetErrorString (rv, buf, buf.size); // another DLL entry point
 warn ("MCI Error", command+"\n\n"+buf);
 } else {
 result = ""+buf;
 }
 free (buf);
 return result;
 }
 }
 void step ();
 void stop ();
 bool stopped () {return status ("mode") == "stopped";}
 void succeeded ();
 void video ();
 void video (string path);
 bool video_visible () {return status ("video") == "on";}
 bool video_visible_set (bool video_visible);
 };
unsigned long mciSendString (
 char *lpstrCommand,
 char *lpstrReturnString,
 unsigned wReturnLength,
 unsigned hCallback) "mmsystem.dll:mciSendString"
bool mciGetErrorString (
 unsigned long wError,
 char *lpstrBuffer,
 unsigned uLength) "mmsystem.dll:mciGetErrorString"

End Listings


















Special Issue, 1994
The VESA BIOS Extension/Audio Interface


A standard software interface for audio




Doug Cody


Doug is a software engineer at MediaVision and member of the VBE/AI committee.
He can be contacted at MediaVision, 47300 Bayside Parkway, Fremont, CA 94538.


The Video Electronics Standards Association (VESA) is an international,
nonprofit standards organization responsible for creating the Super-VGA BIOS
standards. It has developed and published the VESA BIOS Extension/Audio
Interface (VBE/AI) for AT-class PCs. The main goal of VBE/AI is to provide a
standard software interface for audio, similar to the ROM BIOS for video. Just
as the lack of Super-VGA standards has hindered application development in the
video world, the lack of sound standards has limited the sound-board industry.
Currently, support for individual audio boards falls on the shoulders of
third-party developers. In addressing this void, VBE/AI has three main goals:
Provide a hardware-independent approach to controlling digital (WAVE) audio,
MIDI, and volume control.
Hide implementation details, so the code can be implemented as a ROM BIOS, DOS
device driver, or TSR.
Give the interface comparable functionality with Windows, NT, and OS/2, so
these OSs can support the interface through virtual services.
This article briefly describes various aspects of the VBE Audio
Interface.


VESA Audio Services


The video BIOS in PCs provides a set of functions to set video modes,
read/write characters to the screen, scroll the screen, and so on. Figure 1
shows the original services. The VESA Super-VGA BIOS extensions started higher
numerically to coexist with the standard BIOS. In the first release, the
extensions appeared similar to the standard set; see Figure 2. In the next
couple of releases, a provision was made to allow extensions for other types
of devices, starting at function number 0x4F10. All new VESA standard devices
will be allocated a particular ID, as in the case of the Audio Interface,
which was assigned 0x13. Figure 3 lists the complete set of VBE/AI Int 10H
functions. The primary purpose of the Int 10 services is to allow the
application to query, open, and close the device. Access to the device API is
performed through 32-bit far calls using Pascal calling convention.
Since there can be multiple audio devices in one system, VBE/AI allows for
multiple software APIs, one per device. A handle gives the application access
to one device API. To use a device, the application locates a device handle,
queries the specific device via the handle, then opens and closes the device
via the handle.
Before jumping into specific device services, I just want to skim over the
query process. Within VBE/AI, each device provides two data structures. One of
these contains information about the device, and the other contains the
specific entry points of the device services. The query process allows the
application to retrieve a copy of the information structure. This structure
contains information such as hardware ID, configuration, and capabilities. The
format of this structure is not yet complete at the time of this writing, but
it is worth mentioning because it is an integral part of the VBE/AI interface.


WAVE Audio Interface


The Wave device class supports two common metaphors for handling the data:
single-block and continuous streaming. The single-block operation performs the
operation on the data block, issues an interrupt when done, then returns to an
idle state. The streaming mode attempts to keep data moving to and from the
hardware and doesn't stop until explicitly told to by the application. In this
mode, completion interrupts are presented to the application at fixed points
during the stream's I/O. These approaches model the real-world hardware
operation of the 8237 DMA controller, of which there are two in every AT-class
machine. Most audio cards make use of a DMA channel; some use two. Other
devices, such as the Turtle Beach Multisound card, make use of memory mapping
with on-board memory, using the CPU to move the data instead of a DMA channel.
Since the interface characterizes only the single-block and
continuous-streaming metaphors, these non-DMA devices can easily be supported.
When a VBE/AI Wave device is opened, the application is given a copy of its
services structure. Listing One, page 62 illustrates the Wave Services
structure. The data fields in the structure are for identification purposes or
reserved for future updates of the VBE/AI release. The remaining portion of
the services structure presents subroutine calls for the application and two
32-bit far callbacks for the application to fill out. These callbacks allow
the VBE/AI code to call the application on certain events.
The services can be lumped into three groups: housekeeping, process control,
and block I/O. Each of these services is described in Figure 4. Note
wsTimerTick. In the VBE/AI definition, the application "owns" the system-clock
timer. The VBE/AI device specifies how many callbacks per second are needed to
keep running. The application is responsible for calling this subroutine at
the requested rate.
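As a rough sketch of this timer contract (the names and the simulated clock here are assumptions for illustration, not the specification's actual code), the application-owned timer pump amounts to calling the device's tick entry point at its requested rate:

```c
/* Illustrative sketch only: models the VBE/AI arrangement in which the
   application "owns" the system clock and must call the device's timer
   tick at the rate the device requests. */

static int ticks_delivered = 0;

/* Stand-in for the device's wsTimerTick service. */
static void device_timer_tick(void) { ticks_delivered++; }

/* Pump the device tick for `ms` simulated milliseconds, given the
   device's requested callback rate in ticks per second. */
int run_timer_pump(int ticks_per_second, int ms)
{
    int interval = 1000 / ticks_per_second;  /* ms between ticks */
    int t;
    ticks_delivered = 0;
    for (t = interval; t <= ms; t += interval)
        device_timer_tick();                 /* app-driven callback */
    return ticks_delivered;
}
```

In a real DOS application the pump would be driven from the reprogrammed system-timer interrupt rather than a loop.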
The two callbacks from the device to the application provide the application
with real-time notification that certain events have occurred. These two
callbacks indicate the completion of block I/O performed by the device. There
is one callback for completion of PCM playback, and one for PCM record; see
Figure 5. The two-callback approach was adopted so that full duplex (that is,
simultaneous play and record) could be performed using the VBE Audio
Interface.


MIDI Interface


The MIDI device class in its simplest form uses the metaphor of a MIDI
transmitter for playing notes. This is similar to the Windows approach that
views all musical devices, such as an FM chip, wave-synthesis card, or MIDI
port, as a common device type. The VBE/AI MIDI device is defined as a
general-MIDI device, so its channel and patch (instrument) assignments are
consistent with the International MIDI Association's definition of general
MIDI. Unlike Windows, though, there is no differentiation between low- and
high-end synthesizers. This scheme, within Windows, was implemented by
allocating channels 1-10 for the high-end synthesizer, and channels 11-16 for
a low-end synthesizer. In the general-MIDI specification (and therefore in the
VBE/AI interface), only channel 10 is recognized as a special channel. All
others are free to be used any way the application desires. (Incidentally,
channel 10 in the general-MIDI specification is the percussive instrument
channel.)
When a VBE/AI MIDI device is opened, a copy of its services structure is given
to the application; Listing Two, page 62 illustrates this structure. The
structure, like the wave-services structure, contains a few data variables,
several subroutine calls, and a couple of callbacks to the application.
The only data variable presently worth looking at is the mspatches bit field.
This array of 256 bits indicates which general-MIDI patches are currently
preloaded, and thus available for use. For simple MIDI transmitters and
receivers, this bit field could have all bits set, since it can only "assume"
that the external device contains the required patches. For devices like an FM
synthesizer that have no memory, this table may have all bits set to 0, thus
indicating it will need patch data from the application before the instrument
is played. Later on, I'll discuss VBE/AI's unique approach to loading patch
data into the device.
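Testing and setting bits in such a table is straightforward. The helpers below assume the low bit of the first 16-bit word corresponds to patch 0; the actual word/bit ordering used by mspatches is not spelled out here, so treat this as an illustration of the idea:

```c
/* Sketch of reading the mspatches bit field from the MIDI services
   structure: 256 bits (16 x 16-bit words, matching int mspatches[16]
   in Listing Two) marking which general-MIDI patches are preloaded.
   The word/bit ordering is an assumption for illustration. */
static int patch_loaded(const unsigned short patches[16], int patch)
{
    return (patches[patch / 16] >> (patch % 16)) & 1;
}

static void mark_patch_loaded(unsigned short patches[16], int patch)
{
    patches[patch / 16] |= (unsigned short)(1u << (patch % 16));
}
```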
Again, the services can be lumped into three groups: housekeeping, patch
handling, and MIDI I/O; see Figure 6. Additionally, there are two callbacks
from the device to the application to provide the application with real-time
notification that certain events have occurred. The first callback allows
downloaded patch memory blocks to be returned to the application. The second
callback lets VBE/AI MIDI receivers send inbound MIDI data to the application
in real time; see Figure 7.
One issue addressed by the VBE/AI interface is that most PC MIDI hardware
doesn't contain full general-MIDI patch sets. For devices, such as the OPL2 FM
chip (as found on the original Adlib FM card), there's no memory within the
chip to hold all the general-MIDI instruments. So, either the driver takes up
an exorbitant amount of memory to cache the instrument bank, or the
application will have to provide the patches from an external source, such as
a disk-resident library. The latter approach, storing the patches on disk,
has been adopted for the VBE audio interface. To do this, the
architecture has been divided in two parts. The first part is the
hardware-specific driver to translate the MIDI stream into audible sound, as
described earlier. The second part is an optional disk-resident,
vendor-specific, library that holds the individual instrument settings, known
as patches.
The role of the patch library is to allow the application to present a given
patch to the hardware before the instrument is played. To isolate hardware
dependencies in this approach, the application need only "understand" the
skeletal structure of the library. In other words, the app must know how to
get to the patch data, but doesn't have to understand the contents of the
patch. When a program change is required, the application will search the
library, extract the patch, and pass this to the device using the
msPreLoadPatch subroutine. For patches larger than available memory, a
protocol has been defined for sending patch data in smaller chunks.
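The specification's actual chunking protocol is not reproduced in this article, but the shape of the operation is a simple loop. In this sketch, send_chunk is a hypothetical stand-in for a call to msPreLoadPatch, and the chunk size is arbitrary:

```c
/* Illustrative only: feeding a patch larger than the transfer buffer
   to the device in pieces. send_chunk models a msPreLoadPatch-style
   call; the real VBE/AI chunking protocol is defined in the spec. */

static long bytes_sent;

static void send_chunk(const char *data, long len)
{
    (void)data;          /* a real driver would copy this to the device */
    bytes_sent += len;
}

long preload_patch_chunked(const char *patch, long size, long chunk)
{
    long off;
    bytes_sent = 0;
    for (off = 0; off < size; off += chunk) {
        long n = size - off < chunk ? size - off : chunk;
        send_chunk(patch + off, n);
    }
    return bytes_sent;
}
```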
One benefit to this architecture is that applications can substitute their own
patches for the ones in the vendor's library. There's no rule saying the
patches must come from the vendor's own library. Of course, this does mean the
application is delving into hardware dependencies.


Volume Control


The VBE/AI Volume service is an API for controlling both volume and mixer
channels. Within the API are setting controls for stereo volume, sound
positioning within a 3-D space using a spherical-coordinate system, parametric
equalizer, and filter control. The 3-D sound positioning, which is derived
from the virtual-reality field, is the most exciting aspect of this interface.
In new products coming to market this year, 3-D audio will become an important
selling feature. Two-dimensional schemes such as Qsound and Surround Sound can
also work with this interface.
The approach to the volume/mixer class is different than the MIDI and Wave
device classes. When opened, the MIDI and Wave devices are considered owned
until closed, whereas a volume device can be acquired by any number of
applications, and is thus not exclusively owned. This way, the user may have
volume control through such things as a TSR, even if the application does not
provide one.
An interesting aspect is that the volume-service APIs are acquired without
performing an Int 10h Device Open call; these services are acquired through
the query functions rather than the device-open functions.
When a VBE/AI Volume device is acquired, a copy of its services structure is
given to the application; see Listing Three, page 62. The structure (like the
wave-services structure) contains few data variables, and several subroutine
calls. The services can be lumped into two groups: housekeeping, and device
control; see Figure 8.



Next Up


In future releases of the VBE/AI specification, you can expect to see a 32-bit
audio interface and sound-effects processing control. In addition, VESA has
active working committees developing open standards in the areas of video
interfacing (VAVI), feature connector (VAFC), and multimedia data channels
(VM-Channel).


DOS and Software Standards for Audio


At the Game Developer's Conference held earlier this year in Santa Clara,
California, Tom Rettig of Broderbund Software held a round-table discussion on
the sound issues facing software developers. A list of topics was fielded from
the audience and rated by importance. By an overwhelming majority, the
number-one topic was the need for a DOS audio software standard.
You may be wondering why there is such concern when the world seems to be
moving to platforms like Windows and OS/2 that already support an audio
interface. Indications are that the games industry is leaning towards staying
in DOS via 32-bit DOS extenders. DOS is still the optimum game environment due
to having direct control over the machine. So, if the games don't move soon to
these new OSs, PC-audio industry revenue will be tied to DOS for quite a
while. There's still no compelling business audio application that creates
revenue like the DOS entertainment market.
It may seem ironic that games developers are asking for a software interface,
yet want to have direct control over the hardware. In fact, most DOS software
developers demand, as an inalienable right, the knowledge to directly program
the hardware. Up until 1990, software vendors had few audio-hardware
selections, so support was a relatively easy endeavor. For music, there was
the Adlib card and Roland MPU-401. For voice, there was Creative Labs'
SoundBlaster and Covox Speech Thing. But when all else failed, the lowly PC
speaker could still be used to play tones and rudimentary sounds.
Since 1990, an explosion of audio hardware with varying architectures has
entered the PC sound market. Each software company that has maintained
in-house support for sound has had to carry the burden of adding support for
these new cards. As a result, third-party software libraries for audio became
very popular in 1991 as the number of cards grew.
But due to the plethora of architectures, the industry is swaying under the
burden of trying to support too much with too few resources. There are well
over a dozen hardware interfaces a developer would need to support in order
to support all PC audio-board products. The show of hands at the Game
Developer's Conference suggests that software developers are ready to take one
step back from the hardware, but first need a uniform approach to programming
audio. The trade-off of using a hardware-independent software interface,
versus creating, maintaining, and growing a complete library, now makes the
interface the prudent use of engineering time.
Not everyone has come to this conclusion, however. A few die-hard game
developers have retreated to a position that they will only support a few
audio cards. Unfortunately, this doesn't serve the customer, who might have an
unsupported audio board, nor does it foster development of advanced
technologies. From a competitive standpoint, these developers will ultimately
suffer, since competitors will gain a technical one-up by taking advantage of
the new whiz-bang features found on the newer cards. The old lesson is,
standing still only allows your competitors to pass you by.
--D.C.
Figure 1: Standard Int 10H functions
Figure 2: VESA Int 10h extensions
Figure 3: VESA Int 10h extensions for the audio interface
Figure 4: Wave-device services (a) housekeeping, (b) block I/O, (c) process
control
Figure 5: Callback functions
Figure 6: MIDI services: (a) Housekeeping, (b) for MIDI I/O; (c) patch
handling
Figure 7: MIDI callback functions
Figure 8: Volume and mixer services: (a) housekeeping; (b) device control
[LISTING ONE]

typedef struct {

 // housekeeping

 char wsname[4]; // name of the structure
 long wslength; // structure length

 char wsfuture[16]; // 16 bytes for future expansion

 // device driver functions

 long (pascal far *wsDeviceCheck ) (int, long );
 int (pascal far *wsPCMInfo ) (int, long, int, int );
 int (pascal far *wsPlayBlock ) (void far *, long );
 int (pascal far *wsPlayCont ) (void far *, long, long);
 int (pascal far *wsRecordBlock ) ();
 int (pascal far *wsRecordCont ) (void far *, long, long);
 int (pascal far *wsPauseIO ) ();
 int (pascal far *wsResumeIO ) ();
 int (pascal far *wsStopIO ) ();
 void (pascal far *wsTimerTick ) ();
 int (pascal far *wsGetLastError) ();

 // callback filled in by the application

 void (pascal far *wsApplPSyncCB ) (int, void far *, long ); //play
 void (pascal far *wsApplRSyncCB ) (int, void far *, long ); //rec

} WAVEService, far *fpWAVServ;



[LISTING TWO]

typedef struct {

 // housekeeping

 char msname[4]; // name of the structure
 long mslength; // structure length

 // runtime data

 int mspatches[16]; // patches loaded table bit field
 char msfuture[16]; // 16 bytes for future expansion

 // device driver functions

 long ( pascal far *msDeviceCheck ) (int, long );
 int ( pascal far *msGlobalReset ) ();
 int ( pascal far *msMIDImsg ) (char far *, int );
 int ( pascal far *msPollMIDI ) (int );
 int ( pascal far *msPreLoadPatch) (int, int, void far *, long );
 int ( pascal far *msUnloadPatch ) (int, int );
 void ( pascal far *msTimerTick ) ();
 int ( pascal far *msGetLastError) ();

 // callbacks filled in by the application

 void ( pascal far *msApplFreeCB ) ( int, int, void far * );
 void ( pascal far *msApplMIDIIn ) ( int, int, char );

} MIDIServ, far *fpMIDServ;


[LISTING THREE]

typedef struct {

 // housekeeping

 char vsname[4]; // name of the structure
 long vslength; // structure length

 char vsfuture[16]; // 16 bytes for future expansion

 long (pascal far *vsDeviceCheck ) (int , long );
 void (pascal far *vsSetVolume ) (int , int , int );
 long (pascal far *vsSetFieldVol ) (int , int , int );
 int (pascal far *vsToneControl ) (int , int , int );
 long (pascal far *vsFilterControl) (long );
 void (pascal far *vsOutputPath ) (int );
 void (pascal far *vsResetChannel ) ();
 int (pascal far *vsGetLastError ) ();

} VolumeService, far *fpVolServ;








Special Issue, 1994
DSP and Audio Compression


Enhancing audio compression performance


Jay, a senior member of the technical staff at Texas Instruments, has been
involved in every generation of TMS320 digital-signal processors. He has also
been involved in the development of a number of audio and communications
algorithms and systems. Jay can be contacted at Texas Instruments, P.O. Box
1443, M/S 701, Houston, TX 77251-1443.


By its very nature, audio compression is a real-time process. You simply
cannot afford to have gaps during the processing of data that lead to dropouts
in the output audio. Historically, the architecture of the PC's operating
environment--that is, its operating system and hardware--wasn't designed to
optimally support real-time operation. With increasing demands placed on the
PC's host processor to support graphics and provide access to file systems,
audio processing is relegated to whatever MIPS are left over. This means that
as the system designer, you have two choices: shoehorn the audio processing
into the leftover MIPS, or have a dedicated coprocessor that can meet the
real-time requirements of audio processing.
This is where the benefits of digital-signal processors (DSPs) become
apparent. The architecture of DSPs has, from its inception, been optimized to
support the needs of real-time systems. The DSP is designed to do a great deal
of work in each clock cycle. This unit of work is consistent and includes the
ability to perform a multiply-accumulate calculation (MAC) in the same time
the processor can perform an add calculation or other ALU operation. Many
microprocessors still require multiple cycles for multiply operations,
although RISCs only require one. Since the essence of many of the
audio-compression techniques (LPC, ADPCM, MPEG) is a "sum-of-products" type
calculation, DSPs provide perfect support for them. This article explores the
various audio-compression techniques and how DSPs can be used to improve
performance.
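Stripped to its essentials, that "sum of products" is a dot product. In the sketch below, each iteration of the loop corresponds to the single-cycle MAC a DSP performs while fetching the next operands in parallel:

```c
/* The sum-of-products kernel at the heart of LPC, ADPCM, and MPEG
   coding. On a DSP, each iteration maps to one multiply-accumulate
   (MAC) cycle, with the operand fetches overlapped in hardware. */
long dot_product(const int *a, const int *b, int n)
{
    long acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc += (long)a[i] * b[i];   /* one MAC per tap */
    return acc;
}
```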


Audio-compression Techniques


To reconstruct an analog signal, basic sampling theory dictates that the
sample rate be based on the signal's bandwidth. For audio signals, the
sampling frequency for selected inputs is shown in Figure 1. The second major
attribute of a sampled signal is the resolution, or number of bits, used to
digitize the input waveform. Most PC sound systems offer 8 bits of
resolution, while music or CD-quality systems require 16 bits of resolution.
These two attributes--sampling rate and resolution--determine the overall data
rate of the system and the total amount of memory required to store a
digitized signal.
A basic technique to capture digitized audio and preserve it is called linear
pulse code modulation (PCM). PCM samples an audio signal at a given frequency
with a selected resolution to generate a digital representation of an audio
input. PCM requires a large amount of memory to store a digital representation
of a sampled signal. One minute of uncompressed, stereo, CD-quality audio can
consume close to ten Mbytes of storage.
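Working the numbers behind that "close to ten Mbytes" figure: CD-quality audio is 44,100 samples per second, 16 bits (2 bytes) per sample, two channels, for 60 seconds.

```c
/* Bytes needed to store uncompressed PCM audio. For one minute of
   stereo CD-quality audio: 44,100 * 2 * 2 * 60 = 10,584,000 bytes. */
long pcm_bytes(long sample_rate, int bytes_per_sample, int channels,
               long seconds)
{
    return sample_rate * bytes_per_sample * channels * seconds;
}
```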
Therefore, one of the biggest challenges associated with the use of audio
information is the implementation of efficient compression algorithms that
minimize the amount of storage space required for a sampled signal. By
selecting the most effective audio-compression technique, you can minimize the
amount of memory required to store a sampled audio signal while producing an
output audio signal of acceptable quality. Some of the more popular
audio-compression techniques are shown in Figure 2.
One popular type of audio compression, pioneered by the telecommunications
industry, is LogPCM. This compression technique is based on a form of
compression that is logarithmic instead of linear. One of two logarithmic
functions is generally used: the µ-law function in Example 1(a), or the A-law
function in Example 1(b). The logarithmic nature of the compression algorithm
provides nearly 2:1 compression by allowing the system to have 14 bits of
dynamic range contained in a sample that only requires 8 bits to physically
store. There are many devices currently available to implement this
compression technique, and for the storage of voice-only signals, LogPCM is
adequate to the task.
Another variation of PCM encoding, called "adaptive differential PCM" (ADPCM),
gives a 4:1 compression ratio. It uses the history of the input audio signal
as a method of predicting the changes in that signal. Instead of encoding the
actual input audio signal's value, the ADPCM technique encodes the difference
between the value of the input audio sample and the same signal's predicted
value. If the prediction, which is typically done with a filter process
through a feedback loop, is good and the difference is small, the difference
can easily be encoded with fewer bits. The quality of audio compressed
with the ADPCM technique is largely determined by the amount the signal
is oversampled and the length of the signal's stored history.
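The following is a deliberately simplified sketch of the ADPCM idea, not any standard coder: predict each sample as the previous reconstructed sample, encode only the quantized difference, and adapt the quantizer step to the signal. Real ADPCM coders (IMA/DVI, G.726) use step-size tables; this toy version merely doubles or halves the step.

```c
/* Toy ADPCM: 4-bit codes (-8..7), previous-sample predictor, and a
   crude step adaptation. Encoder updates its state exactly as the
   decoder will, so the two stay in sync. */
typedef struct { int predicted; int step; } AdpcmState;

int adpcm_encode(AdpcmState *s, int sample)
{
    int diff = sample - s->predicted;
    int code = diff / s->step;              /* quantized difference */
    if (code > 7)  code = 7;
    if (code < -8) code = -8;
    s->predicted += code * s->step;         /* reconstruct like decoder */
    if (code == 7 || code == -8) s->step *= 2;        /* adapt step */
    else if (code == 0 && s->step > 1) s->step /= 2;
    return code;
}

int adpcm_decode(AdpcmState *s, int code)
{
    s->predicted += code * s->step;
    if (code == 7 || code == -8) s->step *= 2;
    else if (code == 0 && s->step > 1) s->step /= 2;
    return s->predicted;
}
```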
The subband coder (SBC) can provide additional compression by combining
elements of ADPCM with a technique for segmenting the signal into frequency
bands. The significance of each frequency band is either predetermined for the
coder or determined by additional analysis at the time of compression. Those
frequency bands deemed to be more important to the overall quality of the
signal can then be allocated a proportionally larger number of the bits used
in coding. This process does not waste bandwidth on unused or less-significant
frequency bands.
MPEG audio, a method of compression which is being widely considered for a
large number of high-quality audio applications, uses a subband coder
technique. Figure 3 provides a block diagram of the MPEG encoder and decoder.
To maintain quality with increased compression, MPEG relies on a masking
model. In this model, processing noise for one frequency band is covered up or
masked by the signal in an adjacent frequency band. The use of multiple
frequency bands and the masking model enables MPEG to be used for effective
audio compression of signals with a 20-kHz audio bandwidth. Compression ratios
as high as 24:1 can be realized with MPEG compression.
Linear predictive coding (LPC) is a technique which provides even greater
levels of compression, but is applicable to voice rather than general audio.
LPC was designed for human speech and uses a production model that takes
advantage of speech-production characteristics--that is, the human vocal tract
and the assumption of a single sound source. Using this model, compression
ratios of 32:1 and greater are possible. One of the more recent variations of
LPC, codebook-excited linear prediction (CELP), offers superior quality over
previous versions. A block diagram of CELP is shown in Figure 4.


Audio-compression Quality


While the quality of each compression technique shown in Figure 2 is
subjective, there's general agreement that, at a given sample frequency, the
quality decreases as the compression ratio increases. While this statement is
generally true, there are certain exceptions based on the compression
technique. In other words, linear PCM only maintains its superiority over
LogPCM up to a point. Below that point (at fewer bits per sample), the noise
introduced by reducing the number of bits in the PCM data is more noticeable
than that introduced by the compression technique used by logPCM.
Basically, every compression technique has a limit on quality for different
levels of compression. This means no single compression technique is a panacea
for all potential applications. The selection of the proper audio-compression
technique is a key element in the design of any enhanced audio application.


Selecting an Audio-compression Technique


The selection of the appropriate audio-compression model for a particular
application is a complex process. None of the compression techniques is
perfect for all applications, and the designer or system user must always make
trade-offs when selecting an audio-compression algorithm. Figure 5 provides a
means of visualizing the trade-offs between audio-compression techniques. The
three-dimensional space shown is defined to be the complexity of the
compression algorithm, the bit rate of the compressed data, and the audio, or
acoustic, quality of the signal. In all three cases, the further the attribute
is from the origin, the better it is. Although other attributes can be
assigned to each of the axes, most are dependently linked to each other.
When a point on each axis is chosen, a plane in the three-dimensional space is
defined. This plane remains relatively constant within the constraints of
system cost, where system cost is defined in terms of processing power (MIPS)
and memory requirements, both for processing and for storage or transmission.
Therefore, to maintain a given level of audio quality, the system has to bear
the additional cost of a higher bit rate or a more-complex algorithm. If the
system cost cannot be increased, some trade-offs will have to be made between
the algorithm's complexity and bit rate.
There are also several other dependent factors that you must consider when
selecting a compression technique. The additional factors that affect the axes
attributes are listed under their respective axes.


DSP-based Audio Compression


Although DSP techniques have been used in many fields for decades, it has only
been since the early '80s, with the introduction of TI's TMS320 series of
single-chip DSP devices, that these devices have had enough power to perform
the calculations required to implement real-time audio-compression techniques
for compression ratios greater than 2:1, while maintaining high levels of
quality. Much of the early demand for DSP audio compression was driven by the
telecommunications industry, where various techniques such as ADPCM have been
in use for the last ten years.
A compelling example of why DSP-based (or, for that matter, any other type of)
audio compression is important can be made using this article as a test case.
For the sake of argument, let's make the following assumptions: This
article contains about 3000 words. On average, a word consists of about six
ASCII characters, including spaces. A typical speaking rate is 100 words per
minute. In order to retain good voice quality, I'll use a sample rate of 8000
samples per second and a resolution of 16 bits per sample.
Making these assumptions means the ASCII representation of this article
requires nearly 20 Kbytes to store. If it takes 30 minutes to orate this
article, a quick calculation shows that using basic PCM encoding means the
audio representation of this article would require about 28.8 Mbytes to store!
If I were to use the ADPCM compression technique, the storage requirements for
this same example would shrink to 7.2 Mbytes. If the more-advanced CELP
audio-compression technique were used, the audio representation of this
article would become 900 Kbytes.
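The arithmetic above is easy to check: 30 minutes of speech at 8,000 16-bit samples per second, then the 4:1 (ADPCM) and 32:1 (CELP) ratios quoted.

```c
/* Storage for N minutes of 8-kHz, 16-bit speech:
   minutes * 60 s * 8000 samples/s * 2 bytes/sample. */
long speech_bytes(long minutes)
{
    return minutes * 60L * 8000L * 2L;
}
```

Thirty minutes gives 28,800,000 bytes of PCM, 7,200,000 bytes at 4:1, and 900,000 bytes at 32:1, matching the figures in the text.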
As Figure 6 demonstrates, while audio-compression techniques can vastly
minimize the storage needed for a given sample, they can't reach the density
of pure ASCII. The question you must ask is whether or not the advantages of
audio outweigh the disadvantages, such as the associated overhead. Given the
advantages of incorporating audio into an application, choosing the proper
audio-compression technique makes this decision much easier.


DSP and Audio Compression


An important aspect of the DSP-based system, especially as it relates to the
various requirements of multimedia, is its ability to process and schedule
multiple tasks simultaneously. The design of a flexible DSP-based system
guarantees that each and every real-time task will have the processing
resources it needs. As compression algorithms become more complex and the host
processor becomes more overloaded, the need for a dedicated multitasking
DSP-based solution becomes more pronounced.
Another important aspect of a DSP-based approach is the efficiency of its I/O
system. By incorporating a powerful, direct memory access (DMA) capability, a
DSP-based system is able to use most of its processing power for actual
computation, not for moving data in and out of the system.
As previously mentioned, virtually all audio-compression techniques are
typified by the MAC cycle. Since the architecture of the DSP has been defined
to allow MAC cycles to occur efficiently while doing other tasks in
parallel--such as fetching data from memory or storing results back to
memory--audio-compression techniques map well onto DSP-based hardware. This
"hand-in-glove" fit of a DSP-based architecture to implement audio-compression
algorithms is certainly more than coincidence. The latest techniques for audio
compression, including MPEG and LPC, all require some sort of input filtering,
such as infinite-impulse response (IIR), or finite-impulse response (FIR)
filtering, in order to operate. This filtering process, which is very MAC
intensive, is a perfect application for the DSP.
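A direct-form FIR filter makes the point concrete: each output sample is a sum of products of coefficients and recent inputs, exactly one MAC per tap on a DSP. This is a generic textbook sketch, not TI-specific code:

```c
/* Direct-form FIR filter. `history` is the delay line of the last
   `taps` input samples; each call shifts the line, inserts the new
   input, and runs the MAC loop to produce one output sample. */
double fir_sample(const double *coef, double *history, int taps, double in)
{
    double acc = 0.0;
    int i;
    for (i = taps - 1; i > 0; i--)   /* shift the delay line */
        history[i] = history[i - 1];
    history[0] = in;
    for (i = 0; i < taps; i++)       /* MAC loop: one tap per cycle */
        acc += coef[i] * history[i];
    return acc;
}
```

On a DSP the delay-line shift is typically free as well, handled by circular addressing rather than an explicit copy loop.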
MPEG uses another key DSP technique, the Fast Fourier Transform (FFT), as an
important part of its audio-compression algorithm. MPEG uses the FFT to
determine where most of the energy in the input signal is and can therefore
allocate the distribution of masking bits appropriately. Since the DSP
supports real-time FFT processing, this distribution of masking bits can occur
in real time, thus improving the overall compression algorithm and the output
signal.



Software to Simplify System Development


The use of audio compression in a software application can seem like an
overwhelming task, but in reality, a flexible, DSP-based solution simplifies
this task significantly. Choosing the right system allows you to include
audio-compression techniques without a corresponding increase in overall code
complexity. This means you will be able to focus on the high-level metrics
associated with audio compression; for example, the number of disk-storage
bytes one minute of recorded speech or audio will require, or which sound
quality is appropriate for a particular application. In order for this to take
place, a simple interface to the DSP is a must.
A typical DSP-based system block diagram is shown in Figure 7. This block
diagram, which is based on TI's TMS320 system, is typical of the standard
DSP-based, multimedia solutions available today. A system such as this allows
you to support many different types of audio compression without the overhead
associated with the development of a new software driver for each one. Since
the audio files are stored with information about the sample rate, the number
of bits per sample, and the encoding algorithm, a DSP-based system allows a
high-level API to be used. As new compression methods become available, the
flexible nature of this type of system will accommodate them with a minimal
impact on the system design.
As shown in Figure 7, there are two main software interfaces in a DSP-based
system that are of interest to a system designer: the applications interface
and the task developer's interface. The software support required for each of
these, while very different, must be comprehensive to provide the system
designer with the tools needed for a complete software-development
environment.


Applications Interface


For most software developers, the applications interface--the interface
between user applications, the operating system, and the DSP-based system--is
most important. The key attribute for the applications interface is to provide
a simple, standardized interface between the application and the DSP-based
system. This allows you to easily incorporate advanced audio-compression
techniques in new software applications and to make the use of different types
of audio compression transparent to the application itself.
One method of accomplishing this is through the use of APIs to extend the
base-level Windows Media Control Interface (MCI) device drivers. These
high-level APIs would, for example, allow the application to simply call a
driver that would play a compressed audio file. The driver itself recognizes
the type of compression algorithm used and plays the file back accordingly.
All of these operations would take place transparently to the user because of
the advanced programmable nature of the DSP-based system. This means an
application can easily incorporate audio compression to enhance its
functionality and marketability.


Task Developer's Interface


The task developer's interface is the low-level software interface to the
DSP-based system that allows you to produce new applications-interface drivers
and write new tasks (such as compression algorithms) for the DSP device
itself. This level of software interface requires a full set of
software-development tools, including debuggers, compilers, assemblers, and
linkers as well as access to the DSP operating system.
The key to accessing this level of the system is the ability to incorporate
the latest advancements in audio compression in your application without
changing the hardware. By writing the underlying code associated with a new
audio-compression algorithm, you can be assured that upgradability and
functionality will follow your systems, increasing their value to the end
user.
One example of isolating the functionality of a new process from the
underlying hardware is shown in the code segment in Listing One, page 69. This
task developer's model is relatively simple and provides a standard
methodology for handling data I/O and process control. Data I/O handled in
this manner minimizes the amount of overhead associated with the task of
interfacing the system to data from a continuous source, or sink, such as A/D
or D/A converters, by automatically keeping track of where the data is at all
times.
Listing One contains two basic, key forms of task communications that are
important to the task developer. The first is interfacing a stream of
continuous data, in either compressed or uncompressed form, to the
application via general-purpose connectors (GPCs). GPCs are circular buffers
that sit between the audio-compression algorithm (one GPC for the input, and
one GPC for the output) and the I/O data streams. These
circular buffers allow the algorithm to have free access to the data on an "as
needed" basis without being burdened with the additional overhead and
peculiarities associated with controlling the data transfers in and out of
physical I/O channels.
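A minimal circular buffer captures the spirit of a GPC, with the compression task on one end and the I/O side on the other. The sizes and names below are illustrative, not TI's actual API:

```c
/* A toy GPC: fixed-size circular buffer with independent producer
   (gpc_put) and consumer (gpc_get) ends. Return value of 0 signals
   a full or empty connector. */
#define GPC_SIZE 8

typedef struct {
    int data[GPC_SIZE];
    int head, tail, count;
} Gpc;

int gpc_put(Gpc *g, int v)                 /* producer side */
{
    if (g->count == GPC_SIZE) return 0;    /* full */
    g->data[g->head] = v;
    g->head = (g->head + 1) % GPC_SIZE;
    g->count++;
    return 1;
}

int gpc_get(Gpc *g, int *v)                /* consumer side */
{
    if (g->count == 0) return 0;           /* empty */
    *v = g->data[g->tail];
    g->tail = (g->tail + 1) % GPC_SIZE;
    g->count--;
    return 1;
}
```

Because each end touches only its own index, a task can be spliced between two GPCs, or removed, without either neighbor changing its code.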
The GPCs also provide an excellent means for inserting or removing additional
tasks from a processing flow without requiring changes in each task to account
for the presence or absence of the additional tasks. As an example, a µ-law
compression task could be added following an LPC synthesis task if a µ-law
CODEC is being used to convert the digital signals to analog in one system. In
another system, where a linear CODEC is used, the µ-law compression task would
be removed. In either case, the LPC synthesis task would process data in
exactly the same way.
The second key concept involves the use of an intertask communications block
(ITCB), which is a form of state control for system communications. The
purpose of the ITCB is to pass pertinent information, such as task status and
process commands, between the host processor and the DSP or between two DSP
tasks. A typical application for an ITCB would be to store the volume-control
information associated with the compressed-audio data. In this application,
the ITCB would hold the audio-gain information for a particular data sample,
which both the host processor and the DSP subsystem have direct, shared access
to.
Listing One includes two functions, Echo and ModifySample, which demonstrate
the processing that may be done on typical sample data. The Echo function
simply demonstrates how to read data from the input GPC and write data to the
output GPC. The purpose of ModifySample is to show where the data-compression
algorithm would be placed in the code to implement the selected compression
functionality.


Conclusion


The potential for audio compression in today's multimedia market is
practically limitless. As more applications demand audio, practical system
limitations, such as cost and system size, will dictate the need for audio
compression. And while the current usage of enhanced audio in multimedia
systems is impressive, it pales in comparison to the potential for future
applications. Although the applications for enhanced audio compression
continue to evolve, compression techniques are all based on fundamental
concepts that will remain applicable well into the future. The selection of a
programmable DSP-based system can help you implement the next generation of
multimedia applications.


References


Lynch, Thomas J. Data Compression: Techniques and Applications. New York,
N.Y.: Van Nostrand Reinhold, 1985.
 Figure 1: Sampling frequency. The amount of data that will need to be stored
for an audio signal is directly related to the quality of the output you wish
to produce. Higher-quality outputs require more data to be stored and directly
affect the type of compression technique that you should use.
 Example 1: Logarithmic functions used in LogPCM compression: (a) the µ-law
function; (b) the A-law function.
 Figure 2: Common algorithms and compression ratios.
 Figure 3: A single MPEG audio channel. MPEG audio is a subband algorithm with
adaptive quantization.
 Figure 4: CELP is a relatively new, enhanced version of LPC. This algorithm
uses the specific qualities of human speech to provide maximum audio
compression.
 Figure 5: The trade-offs between the basic elements of audio-compression
algorithms modeled in three dimensions. If more than one of the dimensions is
fixed at some upper bound, the attributes of the other dimensions must be
varied and a compromise made.
 Figure 6: Results of compressing a 3000-word article using various
audio-compression techniques. PCM, ADPCM, and LPC assume 8K samples per
second, 16 bits per sample.
 Figure 7: Block diagram for a fully programmable, DSP-based audio-compression
system based on Texas Instruments' TMS320 system.
[LISTING ONE] (Text begins on page 63.)

typedef struct gpc_t {
 void *gpc_putp; /* Get/Put Pointer */
 unsigned short gpc_size; /* Size of GPC, in bytes */
 unsigned short gpc_mwpf; /* Max. words to be written in 1 frame */
 void **gpc_aput; /* Address of owner's put/user's get pointer */
 unsigned short gpc_prot; /* protocol to be used */
} gpc_t;

#define gpc_aget gpc_aput /* Aliases for owner/user */
#define gpc_getp gpc_putp

 .
 .
 .
extern gpc_t vio_input, vio_output;
 .
 .
 .
extern struct vioitcb_t {
 int hioctl; /* 0=>tel, -1=>handset connected to processor */
 int inputctl; /* 0=>handset, -1=>microphone input */
 int shactive; /* 0=>on-hook, -1=>off-hook */
} *vioitcb2;

 /*
 * prototypes for routines declared later
 */
void Echo(gpc_t * src, gpc_t * dst, short n);
int ModifySample (short s);
 .
 .
 .
short oh;
void main(void)
{
 vioitcb2->hioctl = actl;
 vioitcb2->inputctl = amic;
 oh = vioitcb2->shactive;
 if (oh) { /* Are we off-hook? */
 Echo(&vio_output, &vio_input, VIO_SPF);
 } else {
 GPCFillM128(&vio_input, VIO_SPF, 0);
 GPCAdvance32M128(&vio_output);
 }
}
 /*
 * Echo
 *
 * src is a pointer to the input GPC
 * dst is a pointer to the output GPC
 * n is the number of words (16-bit) to move
 */

 void Echo(gpc_t * src, gpc_t * dst, short n)
{
 short *s;
 short *d;
 short sample;
 s = src->gpc_getp;
 d = dst->gpc_putp;
 while (n-- > 0) {
 s = GPCIncM128(s);
 sample = *s;
 /* NOW MODIFY THE SAMPLE */
 sample = ModifySample (sample);
 d = GPCIncM128(d);
 *d = sample;
 }
 src->gpc_getp = s;
 dst->gpc_putp = d;

}
 .
 .
 .

End Listing
























































Special Issue, 1994
Build Your Own RS-232 Sound System


Digital audio from your RS-232 port




Dennis Cronin


Dennis almost completed an EE degree before being lured into the sordid world
of fast computers, easy money, and loose connections. He currently specializes
in UNIX driver development for Solaris and HP-UX operating systems and can be
contacted at denny@cd.com.


Have you ever thought about using the extra RS-232 port on your computer as an
audio output? Probably not, right? However, it can be done. This article
describes a rather unusual method of coding data that will produce an audio
signal when streamed to a speaker attached to the RS-232 port. Granted, the
sound isn't CD quality, but it's still quite intelligible. I'll start by
examining how to get an audio signal from what was only designed to be a
serial data port. Then I'll look at a program that outputs a Microsoft Windows
.WAV audio file to the comm port of your PC. Finally, I'll provide the wiring
diagrams necessary to build your own speaker to attach to the comm port of
your PC.


From ASCII to Audio


To understand how to turn a mundane ASCII character stream into music, you
need to look beneath the characters, at the underlying serial bit stream
coming from the UART. Although a complete primer on asynchronous
communications is beyond the scope of this article, I'll cover some basics of
where the audio comes from.
An RS-232 asynchronous communications line idles at a state defined as being
the "1" state of the line. This is also referred to as a "marking" state. When
you shove, say, the letter A into the transmit register of the UART, it
generates a start bit. A start bit is always a 0, and the transition from the
marking state of 1 to that first start bit of 0 is the critical point of
synchronization whence all the rest of the bit stream is referenced.
After the start bit, the UART proceeds to send the bits of the character,
starting with the least-significant bit (LSB). An A is 0x41, so the LSB is a
1. Assuming that the line settings are for 8 bits and no parity, you'll see
the UART crank out all 8 bits of the character, immediately followed by a stop
bit, which is always a 1. The stop bit is also a critical point, since the
receiving UART uses it as a simple check to make sure synchronization was
maintained throughout the reception of the character. If it doesn't see a 1 in
the stop-bit slot, it reports the familiar "framing error" status. Figure 1
shows what the letter A looks like as it emerges from the UART, all framed up
with start and stop bits.
While the letter A is interesting enough, the character U is really
interesting. As you can see in Figure 2(a), U (defined as 0x55) happens to
generate a completely alternating bit pattern, including the start and stop
bits. When a steady stream of characters is transmitted, the stop bits are
immediately followed by the start bit of the next character. Figure 2(b) shows
that a steady stream of U characters is actually just a continuous square
wave, the frequency of which is simply half the baud rate programmed at the
UART.
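You can verify the U-makes-a-square-wave claim by constructing the 10-bit frame (start bit, eight data bits LSB first, stop bit) in code. The helper names below are invented for this sketch:

```c
/* Build the 10-bit async frame for a character as it leaves the UART:
   one start bit (0), eight data bits LSB first, one stop bit (1).
   Bit 0 of the result is the first bit on the wire. */
unsigned frame_8n1(unsigned char c)
{
    unsigned f = 0;                 /* start bit: 0 in bit position 0 */
    for (int i = 0; i < 8; i++)
        if (c & (1u << i))
            f |= 1u << (i + 1);     /* data bits, LSB first */
    f |= 1u << 9;                   /* stop bit: always 1 */
    return f;
}

/* Does a frame alternate 0101... from start bit through stop bit? */
int alternates(unsigned f)
{
    for (int i = 0; i < 10; i++)
        if (((f >> i) & 1u) != (unsigned)(i & 1))
            return 0;
    return 1;
}
```

For U (0x55) the frame comes out as the alternating pattern 1010101010; for A (0x41) it does not.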
If you were to hook up your speaker to a UART while it generated a steady
stream of Us, you'd hear a high-pitched tone as this square wave wiggled the
speaker back and forth. But suppose you set the baud rate up very high, say at
115,200 baud, which is the fastest baud rate commonly available on PCs. Well,
the speaker can't wiggle that fast, nor can we humans hear a frequency that
high. Instead, the speaker just sees the effective average value of the
waveform, in this case, 0 volts, since there is an equal distribution of 1s
and 0s.
Now, suppose you pop a character with an extra bit turned on into the stream
of Us every once in a while. Your trusty Us contain four 1s and four 0s, so
pick a character that has five 1s and three 0s--a W (0x57), for instance. When
you shove the W in, the speaker briefly sees an imbalance in the quick,
alternating tugs the stream of Us provides. The response to this brief
imbalance manifests itself as an exciting "click" every time you shift out a
W. But this isn't quite music yet.
What happens when you turn on six 1s, and only do two 0s? The speaker will be
a little more imbalanced and will move a little farther. How about seven 1s
and one 0? It turns out that you can "unbalance" that speaker by four distinct
values on each side of the center value created by the steady stream of Us.
Thus, you have nine "positions" you can ask the speaker to assume depending on
the mix of 1s and 0s you send it. It's a little DAC (digital-to-analog
converter), albeit a humble, 3-bit one (plus the zero position). By picking
the right character, you can control the movement of the speaker.
So how many "samples" per second can your DAC do? With 8 bits per character,
plus the start and the stop bit, you have a total of 10 bits for each
character. At the 115,200-baud rate, this works out to 11,520 characters (or
samples) per second. So your DAC is capable of sustaining a sample rate of
just over 11K samples per second, which isn't quite CD quality, but is still
good enough for voice-grade audio.
In most digital-audio systems, the DAC is immediately followed by a
sharp-cutoff lowpass filter. This prevents aliasing and provides the necessary
smoothing function to turn the discrete steps back into smooth analog. You're
going to have to forgo this luxury and settle for the simple lowpass action
resulting from the speaker's mechanical inertia. It doesn't matter that much.
The 3-bits-plus-zero output resolution is still the biggest limiting factor
in terms of overall "fidelity" (and I use that term very loosely).
Now that you have an understanding of how to coax some audio out of the RS-232
port, let's look at what you have to do to actually play the Microsoft Windows
WAV files.


Playing WAV Files Over RS-232


The most common WAV file format contains 8-bit mono audio data in linear pulse
code modulation (PCM) format using a sample rate of 11,025 samples per second.
Other formats are possible, but so far this is the most common. The 11.025-kHz
sample rate happens to be close enough to our own 11,520 DAC sample rate that
a sample-rate conversion isn't even necessary. Sounds will play back slightly
less than a semitone too high, but that's close enough for rock 'n' roll.
Information about the actual file format is contained in a RIFF header at the
front of the file. For simplicity's sake, you're just going to ignore that
header and assume the previously stated sound-file parameters. Well, you're
not going to completely ignore the header; in fact, you're going to "play" it.
The header is so short (in terms of audio data) that the click it adds to the
beginning of the sound is almost imperceptible. Since the program is, by
definition, not particularly hi-fi, it's just not worth the extra effort to
parse that header, so we're going to make a KISS design decision right off
the bat. (Header? I don't see no header.)
Linear PCM is probably the simplest audio format to deal with, as each
instantaneous value of the audio waveform is directly represented by a number.
In this case, with 8 bits of resolution, the numbers vary from 0 to 255, with
128 used as the zero reference. As these numbers swing above and below the
zero reference, you need to assign ranges to the nine possible output values
provided by your cheesy DAC.
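The mapping can be sketched as an equal-width split of the 0--255 range into nine zones. This simplified illustration ignores the volume scaling COMAUDIO.C performs; pcm_to_zone is an invented name, not a routine from the listing:

```c
/* Map an 8-bit linear PCM sample (0..255, 128 = zero reference)
   onto one of the nine DAC positions (0..8, 4 = rest position).
   A simple equal-width split; COMAUDIO.C's convert() additionally
   applies a volume-dependent bias and scale. */
int pcm_to_zone(int sample)
{
    int zone = (sample * 9) / 256;  /* nine equal-width zones */
    if (zone < 0) zone = 0;         /* clip negative peaks */
    if (zone > 8) zone = 8;         /* clip positive peaks */
    return zone;
}
```

The zero reference of 128 lands in the middle zone, so a silent input leaves the speaker at rest.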


COMAUDIO.C


The program that makes it possible for you to blast sound out of your RS-232
port is COMAUDIO.C (Listing One, page 73). The area of COMAUDIO.C that's of
interest is in the conversion of the 8-bit PCM value to the character for
output to the serial chip. As you read the audio file in, the convert()
subroutine is called to map each input byte of the audio file onto a value
which will ultimately be used to index the nine-slot array of actual output
characters. The convert() routine applies a bias and a scaling factor to cause
each input value to land in one of these nine possible output zones. If
necessary, the signal will be clipped to guarantee that it stays within the
range of 0--8. Later during output, this index will select one of the output
characters from the dac[9] array.
The characters in the dac[9] array are arranged in order of increasing numbers
of 1s. Except for the values for all 0s (0x00) and all 1s (0xff), there are
several possible candidates for each position. The choice was based on
minimizing the low-frequency content of the character, so as to reduce
subharmonic squeal from the 115,200 carrier as much as possible.
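The stated property of the dac[] table, that entry i contains exactly i one-bits, is easy to check mechanically. The helper names below are invented for this sketch:

```c
/* Count the set bits in a byte. */
int popcount8(unsigned char b)
{
    int n = 0;
    while (b) { n += b & 1; b >>= 1; }
    return n;
}

/* The article's table: entry i must contain exactly i one-bits,
   so adjacent entries pull the speaker one step apart. */
unsigned char dac_tbl[9] = {0x00,0x08,0x12,0x29,0x55,0x6b,0xb7,0xef,0xff};

int dac_is_monotonic(void)
{
    for (int i = 0; i < 9; i++)
        if (popcount8(dac_tbl[i]) != i)
            return 0;
    return 1;
}
```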
Prior to commencing actual playback, interrupts are disabled. This is
necessary, since even the brief disturbance of a timer tick will pose enough
interruption that the CPU can fall behind the serial chip. If this happens,
the line drops into the marking state, yielding a very audible clicking sound.
The COMAUDIO.C program was compiled and tested under Borland Turbo C 2.0 and
Borland C++ 1.0. Although the program was coded on a 486/33 under DOS and
Windows 3.1, I also tested it successfully on 286-class PCs. Similar versions
of COMAUDIO.C have been tested on a SPARC 1+ under SunOS 4.1.3. If you have a
slower machine, you might need to do some optimizing of the output function.
(Or better yet, you might just buy a new computer. Jeez, get into the '90s.)
After you compile the program, the only thing left to do is hook up a speaker.
Sample data files in .WAV, .C3P, and .AU (Sun) format are available
electronically; see "Availability," page 2.


Attaching the Speaker


Figure 3 is a wiring diagram for the speaker. While just about any speaker
will work, a cheap replacement 8-ohm speaker (Radio Shack 40-1208 or
equivalent) is ideal. You actually want the frequency response to be somewhat
limited in order to help filter out some of the nasty, high-frequency digital
hash.
A capacitor of at least 100 µF, with a voltage rating of at least 16 WVDC, is
put in series with the speaker to block DC. Polarity is important; make sure
you get the negative terminal of the capacitor attached to the speaker and the
positive terminal attached to pin 7 of the DB-25 connector.
The speaker will sound somewhat better if you enclose it in something. A
simple cardboard box works well enough; see Figure 4. Plus, this keeps your
friends guessing about what complex circuitry, that's capable of magically
turning the RS-232 into audio, lurks inside.
Hook the speaker up to one of the comm ports on your PC and give it a whirl.
Unless you've been living on a deserted isle, you can probably locate a copy
of TADA.WAV, a small sound file that ships with Windows 3.1, to use as a test.
Run COMAUDIO with TADA.WAV as an argument, specifying the proper comm port if
necessary, and you should hear delightful, although brief, strains of music
emanating from your proud little speaker.



ARPEGGIO.C


In addition to the WAV files that are available electronically, there are
plenty of places you can download more. But if you want something a little
more melodious to play right away, ARPEGGIO.C (Listing Two, page 74) can be
used to generate a sample audio file for playback through COMAUDIO. Simply
invoke it with a destination filename, and it will write out an audio file
containing a catchy little ditty. If your machine doesn't have a math
coprocessor, it can take a couple of minutes or more to generate the output
file, so be patient. ARPEGGIO.C is easy to modify, and you can change it to
generate electronic compositions of your own.
 Figure 1: The letter A, as seen emerging from the UART, all framed up with
start and stop bits.
 Figure 2: (a) The letter U (defined as 0x55) generates a completely
alternating bit pattern, including the start and stop bits; (b) a steady
stream of U characters is a continuous square wave, the frequency of which is
half the baud rate programmed at the UART.
 Figure 3: Wiring diagram for the speaker.
 Figure 4: New highs in low-fi: a roll-your-own speaker.
[LISTING ONE] (Text begins on page 70.)

/*
 COMAUDIO.C - uses PC com port to generate audio
*/
#include <stdio.h>
#include <stdlib.h>
#include <alloc.h>
#include <sys/stat.h>
#include <dos.h>

/* defs for low level access to serial chip */
#define SCC_DATA 0
#define SCC_INTCTRL 1
#define SCC_CTRL 3
#define SCC_STATUS 5
#define TXRDY 0x20
#define comout(scc_base,c) \
{ \
 while((inportb(scc_base + SCC_STATUS) & TXRDY) == 0); \
 outportb(scc_base + SCC_DATA,c); \
}

/* farinc - macro to increment far ptr */
#define farinc(p) { \
 p++; \
 if(FP_OFF(p) == 0) \
 p = MK_FP(FP_SEG(p) + 0x1000,0); \
 }

/* digital to ASCII analog conversion table */
unsigned char dac[9] = {0x00,0x08,0x12,0x29,0x55,0x6b,0xb7,0xef,0xff};

/* protos */
void main(int argc, char **argv);
int line_setup(int linenum);
void set_vol(int volume);
unsigned char convert(int c);

/*
 main
*/
void
main(int argc,char **argv)
{
 FILE *fp;
 unsigned char far *p, far *bufp;
 long i;
 int port = 1, volume = 5;
 register int c, scc_base;

 struct stat statbuf;

 /* check arg cnt for sanity */
 if(argc < 2 || argc > 4) {
 printf("Usage: comaudio [wavfile] [[port]] [[volume]]\n");
 exit(1);
 }

 /* if com port spec'd */
 if(argc > 2) {
 port = atoi(argv[2]);
 if(port < 0 || port > 1) {
 printf("Use 0 or 1 to select com1 or com2 respectively.\n");
 exit(1);
 }
 }

 /* see if volume is spec'd */
 if(argc > 3) {
 volume = atoi(argv[3]);
 if(volume < 1 || volume > 9) {
 printf("Volume should be in range 1-9.\n");
 exit(1);
 }
 }
 set_vol(volume);

 /* get length of sound file */
 if(stat(argv[1],&statbuf) != 0) {
 printf("Cannot stat sound file '%s'\n",argv[1]);
 exit(1);
 }

 /* try to alloc mem to hold entire (nibble packed) file */
 bufp = farmalloc(statbuf.st_size / 2);
 if(bufp == NULL) {
 printf("Cannot allocate %lu bytes of memory for sound file\n",
 statbuf.st_size / 2);
 exit(1);
 }

 /* open sound file */
 fp = fopen(argv[1],"rb");
 if(fp == NULL) {
 printf("Cannot open sound file '%s'\n",argv[1]);
 exit(1);
 }

 /* read entire file into mem */
 for(i = statbuf.st_size / 2, p = bufp; i--; ) {
 /* pack 2 converted vals per byte */
 c = convert(fgetc(fp));
 *p = c | (convert(fgetc(fp)) << 4);
 farinc(p);
 }

 /* set up port */
 scc_base = line_setup(port);


 /* grab from buf and shove out com port */
 disable(); /* ints must be off for full "fidelity" */
 for(i = statbuf.st_size / 2, p = bufp; i--; ) {
 /* unpack 2 vals per byte */
 c = *p;
 comout(scc_base,dac[c & 0xf]);
 comout(scc_base,dac[c >> 4]);
 farinc(p);
 }
 enable(); /* turn interrupts back on */
 exit(0);
}

/*
 line_setup

 Sets up spec'd line for 115200, returns ptr to
 assoc'd chip channel.
*/
int
line_setup(int linenum)
{
 union REGS regs;
 int scc_base;

 scc_base = linenum ? 0x2f8 : 0x3f8;

 /* BIOS call does most of it */
 regs.h.ah = 0;
 regs.h.al = 0xe3; /* 9600,N,8,1 */
 regs.x.dx = linenum;
 int86(0x14,&regs,&regs);

 /* now talk nasty to the chip, write baud div = 1 for 115200 */
 outportb(scc_base + SCC_CTRL,inportb(scc_base + SCC_CTRL) | 0x80);
 outportb(scc_base + SCC_DATA,1); /* write least sig */
 outportb(scc_base + SCC_INTCTRL,0); /* write most sig */
 outportb(scc_base + SCC_CTRL,inportb(scc_base + SCC_CTRL) & 0x7f);
 return(scc_base);
}

static atten,bias;

/*
 set_vol

 Sets conversion factors for spec'd volume.
*/
void
set_vol(int volume)
{
 atten = (10 - volume) * 3;
 bias = (256 - (9 * atten)) / 2;
}

/*
 convert

 Converts 8 bit PCM value to index into ASCII lookup table

 by attenuating and clipping as necessary.

*/
unsigned char
convert(int c)
{
 c -= bias;
 if(c < 0) c = 0; /* clip negative peaks */
 c /= atten;
 if(c > 8) c = 8; /* clip positive peaks */
 return(c);
}

[LISTING TWO]

/*
 ARPEGGIO.C - generates test sound file
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define SAMPRATE 11520 /* for 115.2 kbaud */
#define DURATION (SAMPRATE/15) /* 1/15 sec units */
#define MAX_NOTES 73 /* six octaves */

FILE *fp; /* output file */

/* protos */
void init_notes(void);
void note(int note_num, int duration, int decay);
void arpeggio(int basenote, int step);

/*
 main
*/
main(int argc, char **argv)
{
 if(argc != 2) {
 printf("Usage: arpeggio [outfile]\n");
 exit(1);
 }
 if((fp = fopen(argv[1],"wb")) == NULL) {
 printf("Can't open output file '%s'\n",argv[1]);
 exit(1);
 }
 init_notes();

 arpeggio(24,0);
 arpeggio(15,12);
 arpeggio(22,12);
 arpeggio(19,0);
 arpeggio(24,0);
 note(24,10,8);
 exit(0);
}

/*
 arpeggio


 Recursively generates a pretty little arpeggio.
*/
void
arpeggio(int basenote, int step)
{
 note(basenote + step,1,2); /* plinky going up */
 if(step == 24) /* 2 octave arpeggio */
 return;
 else if(step % 12 == 4 || step % 12 == 9)
 arpeggio(basenote,step + 3);
 else
 arpeggio(basenote,step + 2);
 note(basenote + step,1,8); /* legato coming down */
}

static double notetab[MAX_NOTES], rad_per_samp;

/*
 init_notes

 Builds note frequency table and calcs radians/sample.
*/
void
init_notes(void)
{
 double twlfth_root2,freq;
 int i;

 twlfth_root2 = pow(2.0,1.0 / 12.0); /* compute semitone interval */
 for(i = 0, freq = 110.0 ; i < MAX_NOTES; i++) {
 notetab[i] = freq;
 freq *= twlfth_root2; /* up a semitone */
 }
 rad_per_samp = 2.0 * M_PI / SAMPRATE;
}

/*
 note

 Looks up note in frequency table and performs.
*/
void
note(int note_num, int duration, int decay)
{
 int c;
 long i,cnt;
 double freq,val,env,vol = 50.0;

 freq = notetab[note_num]; /* look up note frequency */
 cnt = duration * DURATION; /* calc count for duration */
 env = 0.999 + decay * 0.0001; /* calc envelope decay factor */
 for(i = 0; i < cnt; i++) {
 val = sin(rad_per_samp * freq * i); /* compute sine wave val */
 c = (int)(vol * val) + 128; /* convert to 8 bit PCM */
 fputc(c,fp); /* write to output file */
 vol *= env; /* make note decay */
 }
}






































































































































































































































































































































































































































































































































































































































July, 1994
EDITORIAL


Yesterday's News, Today's Digital Fishwrap


A popular government without popular information, or the means of acquiring
it, is but a prelude to a farce or a tragedy, or perhaps both. Knowledge will
forever govern ignorance, and a people who mean to be their own governors must
arm themselves with the power which knowledge gives.
--James Madison
For James Madison, access to information and individual freedom went hand in
catcher's mitt. Of course, that was more than 200 years ago, when weekly
newspapers were the information highway of their day. By the time Madison died
in the late 1830s, weeklies had given way to dailies, with nearly 400 of them
in the U.S. alone. By 1900, that number had grown to almost 2000. But
number-wise, the turn of the century was also a high-water mark, as the number
of daily newspapers has since abated to about 1500, with weeklies holding
steady at around 7300. 
The decline in the number of newspapers--30 years ago, New York had eight
major dailies; today, three--along with the inability to consistently attract
new readers has newspaper publishers running scared. Some have even gone so
far as to say that paper-and-ink newspapers are on the verge of extinction,
pointing fingers at a generation reared on MTV noise, "Headline News"
newsbytes, and USA Today Perot-like charts. And as you might expect, computer
technology is at the eye of the storm.
For many, electronic distribution is the key to newspaper survival. In
prototypes developed by the Knight-Ridder Information Design Lab (Boulder,
CO), electronic versions of the morning paper are delivered on clipboard-size
computers that have flat-panel displays. The start-up display looks like
today's hardcopy front page, with familiar headlines, photos, nameplates, and
the like. Select a story headline, and you get the complete article, ringed by
small ads. Select an ad, and you get a full-sized, detailed version of what's
being hawked. Select a photo, and a few seconds of video pop up. In some
applications, the tablet might even "read" the news to you (something
appreciated by those of us who've witnessed fellow rush-hour commuters reading
a newspaper at 55 mph). Getting all that information into the computer
involves more than the paperboy flinging a flat-panel display onto your front
stoop. Wireless networks will search out subscribers, then automatically
download the most up-to-date information to them, assuming they've paid their
bill.
In a more sedentary scenario, Tom Pardum, president of US West's multimedia
group, envisions electronic newspapers delivered to your television set via
fiber-optic networks. "You probably will be able to read in a few years [the
newspaper] on your television sets," Pardum says. "The morning newspaper is a
small amount of data when converted to digits." US West has recently joined
the fiber-optic ranks, undertaking an ambitious $750 million project to rewire
its 14-state network. 
Computers and newspapers are crossing pens on the editorial front too. In a
move prompted more by bottom-line worries than journalistic excellence, small
weeklies have been snapping up a $100, buzzword-laden program called
"Sportswriter" for covering local sporting events. Prior to a game, the
newspaper sends out blank forms for collecting period-by-period scores, game
highlights, team records, individual performances, and other relevant data,
including quotes from the coach. After the game, a coach completes the form
and faxes (or phones) it back to the newspaper, and the data is input into
Sportswriter, which automatically spits out an "article." Of course,
Sportswriter isn't in a league of its own. In the financial arena,
computer-generated stock quotes and similar raw data have been output as
"news" for years.
Granted, electronic distribution of information makes possible end-around runs
of some of the biggest problems confronting newspapers--the high costs
associated with ink, paper, printing presses, unions, paperboys, and the like.
In addition to making it easier for existing newspapers to survive, electronic
publishing also lowers entry barriers for burgeoning information providers.
Forget about printing presses and newsstands--all you need is a PC and modem.
What's missing in all these high-tech remedies for newspaper woes is mention
of content--the nature of the information itself. Newspapers that serve the
public well aren't measured by the volume of information they provide or by
how inexpensively they can produce it. Newspapers are more than shovelware for
data. They need to probe and challenge, fuel thought and spark debate. You may
not agree with someone's analysis, but if that person gets you thinking,
that's enough. Letting coaches (or politicians or marketing specialists, for
that matter) choose what will and will not appear in print does nothing to
safeguard the public welfare. 
As they have over the last few hundred years, newspapers will adapt to
changing times, technology, and events. What we can't forget, however, is that
great newspapers can be printed on recycled brown paper bags--and that the
flashiest multimedia presentation can still be nothing more than electronic
junk mail.
Jonathan Erickson, editor-in-chief











































July, 1994
LETTERS


Securing Secure Algorithms


Dear DDJ,
Bruce Schneier's article, "The Cambridge Algorithms Workshop" (DDJ, April
1994) lists three conditions under which an encryption algorithm should be
secure. There is one class of encryption algorithms that not only satisfies
these conditions, but has been mathematically proven to be impossible to
break--those algorithms that allow the use of a key that is longer than the
plaintext to be encrypted. In addition to being absolutely secure, these
algorithms are the simplest to implement and the fastest to execute. The proof
of unbreakability is quite simple and based on the following theorem.
Theorem: Any binary number N, when XORed with a random binary number R,
produces a random number.
Proof: 
1. Let K be a random sequence of X number of bytes.
2. Let P be plaintext of length X.
3. Let C be the ciphertext produced by K xor P.
4. From the theorem we know that the ciphertext C is a sequence of totally
random bytes and is therefore unbreakable.
Example 1 shows P1 and P2 as plaintext to be encrypted. K1 is a random series
of 8-bit numbers used to encrypt P1. C1 is the ciphertext produced by XORing
P1 with K1. C2 is the ciphertext produced by using C1 as the key when
encrypting P2. 
If ciphertext C1 is decrypted by XORing it with the random sequence K1, we get
plaintext P1. If ciphertext C1 is decrypted by XORing it with ciphertext C2,
we get plaintext P2! Thus we see that, given any plaintext P of
length ten, there exists a key that will produce P when the key is XORed with
C1. That is, C1 could be unencrypted into any ten-character message, given the
right random key.
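What the letter describes is the classic one-time pad. A minimal sketch of the scheme (otp_xor is an invented name):

```c
#include <stddef.h>

/* One-time-pad encryption as the letter describes it: XOR each
   plaintext byte with a key byte that is never reused. The same
   call decrypts, since (p ^ k) ^ k == p. */
void otp_xor(unsigned char *out, const unsigned char *in,
             const unsigned char *key, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ key[i];
}
```

The security rests entirely on the key being truly random, at least as long as the plaintext, and used only once; reusing a pad leaks exactly the kind of cross-relationship the letter's Example 1 demonstrates.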
One criticism of this type of data-encryption algorithm is that to encrypt ten
megabytes of plaintext you need a random-number key that is ten megabytes
long. In this age of CDs and laser disks, however, distributing large keys to
those with whom you correspond is no more difficult than distributing short
keys.
William D. Hause
Bryan, Texas 


Rx for Phone Woes


Dear DDJ,
After reading about Al Stevens's telephone experience (DDJ, March 1994), I can
say only one thing: Switch to MCI! All the instructions I ever need are
printed on the back of my MCI credit card, and they work. Access to MCI is
(almost) always a local call, and when it's not, the 800 number works. No
distance is too long or too short. My wife claims that she once had to use the
800 number because she got a private party on the other one, but I think that
either she or the local telephone company screwed up and she got the wrong
number.
AT&T has called me several times to offer me $50.00 to switch. I have refused
because of a feeling that AT&T is the IBM of the telephone world. You know: Do
it our way or don't do it at all. Reading the instructions on a lot of motel
phones made me suspicious. Reading your article has confirmed them.
Long-distance telephone rates are lower than they have ever been, and they
keep falling. Back in the early '80s, when I was living in Germany, it cost
three times as much to call the States from Germany as it cost to call Germany
from the States. Don't knock Judge Green. The best system in the world has
gotten better, but it has changed. Switch to a company that understands the
new situation.
Clair J. Robinson
Minneapolis, Minnesota


Software Testing


Dear DDJ,
In his article "Software Testing Cycles" (DDJ, February 1994), Scott Bradley
prefaced his approach to automate testing with the phrase, "QA engineers who
are not constrained by time...." In doing so, he announces that he is not
dealing with the real world. Time constraints are often the reason people buy
automated testing tools. It's time to admit that writing programs to test
programs doesn't work and never has. Test- scripting tools are not news,
they've been around for decades--yet the majority of testing is still manual.
Why?
Scott also asserts that end users are consigned to using recording for their
testing. The flaw in this conclusion is that users must be closely involved
with test design from the earliest stages, yet recording is not possible until
the application exists. Further, as the application moves into production and
then maintenance, users bear the greatest responsibility for testing; with
unstructured, sequential recordings, maintenance is almost impossible.
Linda G. Hayes
AutoTester
Dallas, Texas


Modeless Dialog Boxes


Dear DDJ,
The May 1993 article by Joseph Newcomer, "Modeless Dialog Boxes for Windows,"
was particularly interesting to me. I have recently come across exactly the
same requirements--namely, the need to simplify the main Windows message loop
for multiple modeless dialogs, and to iconize modeless dialogs to reduce
screen clutter.
While the method of keeping a linked list of dialog handles is fine, it is
totally unnecessary. I also thought that you needed to keep global variables
for each modeless dialog box that your application had "active." This is not
the case. If you read between the lines in the SDK documentation on
IsDialogMessage (with hindsight) you can see that IsDialogMessage seems only
to work on keystrokes to the (presumably) active dialog.
The method I use is very simple and will fit easily into the default dialog
processing in Joseph's code. I can't lay claim to this method, as I first saw
it in the Microsoft Knowledge Base. All you need is a single global variable
for the window handle of the active dialog box. In Joseph's default-dialog
processing routine, you process the WM_ACTIVATE message; if the dialog is
becoming active, set the ghActiveDlg variable to that dialog handle; if it is
being deactivated, set ghActiveDlg to NULL.
The keystrokes to move between the application window and modeless dialogs
(ALT+F6, ALT+SHIFT+F6) still work as before. Incidentally, has anyone else
noticed that if you have multiple modeless dialogs, the behavior of these
keystrokes is not quite what you may expect? 
The simplicity of this method, shown in Example 2, highlights deficiencies in
the SDK documentation (which is improving slowly). The only use of this method
I could find was in the C7 CDDEMO sample program.
On the topic of minimizing modeless dialog boxes, I find that Joseph's method
seems to have some redundant processing--notably the setting of the
dialog-icon word to NULL. In my experiments, the dialog-icon word is always NULL. The
only reason I can see that he might need this is if some other application
running has changed the dialog-icon word. This is a bit naughty--since all
applications share the dialog class, no application should be changing the
dialog-class word. Joe's approach then removes the need to process the
WM_NCPAINT message and the other places where he sets the dialog-class word to
NULL.
Minimizing the dialog works fine if the icon is outside the application
window, but not if the icon is over the application's client area. This is
because the icon's background is always the desktop bitmap, rather than the
windows underlying the icon. We need to get windows beneath the icon to
repaint before using DrawIcon, rather than the simple use of
WM_ICONERASEBKGND. I can't help feeling that Windows should have already
repainted any underlying windows, and the application should not be forcing
the background to repaint anyway! Since modeless dialogs with icons are a
requirement, maybe we need to ask Microsoft to provide these in a future
version of Windows.
David Lowndes
Stoke-on-Trent, England



Far Be It From Us


Dear DDJ,
Multimedia World and PC Magazine have decided to no longer accept advertising
for erotic, pornographic "adult" multimedia or CD-ROM products.
Degrading and dehumanizing women into sex objects both contribute to physical
violence against women and have been invaluable assets in the war against
social, political, and economic equality for women.
The Constitution protects the right of free speech. It does not guarantee the
right to make a profit. I am writing to ask you to take a positive step toward
promoting social, political, and economic equality for women by refusing to
accept "adult software" advertising in your magazine. For equality and safety.
Lisa Gray
Bothell, Washington
DDJ responds: Thanks for writing; we appreciate hearing from you. While we
agree in principle with your statements, we're curious about what led you to
believe that Dr. Dobb's Journal accepts advertising for erotic products? 
By the way, the Constitution does, in fact, "guarantee the right to make a
profit," unless we missed some important news recently.


Trying Tries


Dear DDJ,
I was happy to see Tom Swan's column on tries (DDJ, April 1994). Though I've
been using tries to do many things (for at least ten years) I find that they
are misunderstood in general by people who otherwise use binary trees and the
like. And so they don't get much press (or use).
I find that for dealing with words, concordance programs and things like that,
I almost always put one character in a node with downward links for the next
character in the word, and also with sideways links for a similar word but
with a different character in this location.
The diagram in Figure 1 shows a trie, which includes the words and, arts, and
ant. Each leaf is a null, signifying end of word. This makes for a compact and
easily searchable dictionary. I also add an additional item on each leaf, a
number, which can be used as a counter. Thus, when a word is inserted into the
trie, a count of the number of occurrences can be easily kept.
I also want to thank Tom for using pseudocode in his column. Far too often
nowadays people will write a program and call it an "algorithm." They don't
understand the difference between an algorithm and an implementation, which is
something that I found disturbing, since I have been a Modula-2 user for many
years and that distinction is so important to the language (and to Professor
Wirth himself).
I've read of a time when Algol was the pseudocode of choice. I remember a time
when pseudocode really was a readable nonimplementation (those were the days),
and now it seems that C has become the pseudocode (which is a sad statement on
the state of computer science).
John Andrea
Antigonish, Nova Scotia 
Figure 1: Trying tries.
Example 1: Securing secure algorithms.
Example 2: Modeless dialog boxes.
/* Global variable - the active dialog handle */
HWND ghActiveDlg = NULL;
case WM_ACTIVATE:
 if ( wParam == WA_INACTIVE )
 {
 /* Becoming inactive */
 ghActiveDlg = NULL;
 }
 else
 {
 /* Becoming active */
 ghActiveDlg = hDlg;
 }
 return( FALSE );
In the main message loop the modeless dialog processing is simply:
/* If one of the modeless dialogs is active process the message */
if ( ( ghActiveDlg == NULL ) || !IsDialogMessage( ghActiveDlg, &msg ) )
{
 /* Otherwise process it as normal */
 TranslateMessage( &msg );
 DispatchMessage( &msg );
}












July, 1994
Morphing 3-D Objects in C++


There's more than one way to "melt" a cat 




Glenn M. Lewis


In the movie Terminator 2: Judgment Day, there's a well-known scene in which
the T1000 terminator has become a puddle of liquid metal on a checkerboard
floor. The puddle suddenly begins oozing upward, defying gravity, and forming
the general outline of a man. The liquid metal then solidifies to a humanoid
shape, and "grows" skin to become the clearly defined image of a man.
In this article, I present a C++ program that simulates this effect on certain
three-dimensional objects, such as the melting chess piece in Figure 1. For
lack of a better term, I call this the "melt effect," even though the actual
film sequence is the reverse of melting. Of course, once this program has
created all the intermediate objects, you can run the morph in whichever
direction you choose.
There are two kinds of morphing: 2-D and 3-D. In a 2-D morph, control points
are placed in two images at locations that correlate between the two, usually
in the form of a mesh. The 2-D morph algorithm calculates intermediate images
that, when viewed in succession, smoothly change the first image into the
second. A 3-D morph starts with a representation of two 3-D objects, creating
new 3-D models that, when rendered and viewed in succession, smoothly change
the first object into the second. (For a backgrounder on morphing, see
"Morphing in 2-D and 3-D," by Valerie Hall, DDJ, July 1993.)
The C++ program presented here, which compiles and runs on UNIX boxes, PCs,
and the Amiga, processes objects represented using the IFF format generated by
Impulse Inc.'s (Minneapolis, MN) Imagine rendering package. As with other
rendering programs, Imagine takes a description of an abstract 3-D world
consisting of objects, textures, light sources, a camera, and the like, and
creates a 2-D image of that world as seen from a camera's perspective.
My "melt" program extends the Imagine program's limited built-in morphing
facility. One major restriction of Imagine's facility is that two morphable
objects must have the same topology: Each object of a morphable pair must have
the same number of points, edges, and faces, and each of these elements must
have a one-to-one correspondence with its counterpart, including the same
hierarchical order within the object. Although a severe limitation, this still
greatly facilitates Imagine's implementation of morphing--all that's necessary
is a simple linear interpolation between the corresponding points in each
object. However, this kind of interpolation is not enough to achieve the
puddle effect, which is a highly nonlinear transformation. Consequently, my
program generates objects that are roughly analogous to key frames in an
animation sequence and asks Imagine to interpolate between those "key-frame
objects." You can request that Imagine put any number of frames in between,
but the more intermediate objects you create with my melt program, the better
the resulting effect.
To get true Hollywood-class special effects, it's often necessary to combine
algorithmic transformation and "manual" touchup (using image-editing tools).
Plus, you need a lot of experience and artistic talent to achieve the most
natural-looking effects. All this is beyond the scope of my effort, which is
restricted to the algorithmic approach only. One drawback to the purely
algorithmic approach is that not all objects are well suited to melting. I've
found that the melt effect works best on objects that are "star-shaped," where
at least one point in the interior of the object can be selected such that all
rays cast from this point in any direction intersect the object once and only
once. It also helps if the shape is generally cylindrical or box-like. I have
tried this effect on complex objects (such as a model of the NCC-1701-D
Enterprise, by Carmen Rizzolo), and the result is more a curiosity than a
usable effect.
Running the melt program requires two arguments: the original object's
filename, and the base filename to use when creating the new "in-between"
morphed objects (which defaults to root_). A third optional parameter is the
number of frames over which the user wishes to morph; this argument's default
is 10, which means that root_000.iob through root_009.iob are generated
automatically. The original object can be considered the final object in the
sequence. Note that the effect is much more convincing when the sequence is
viewed in reverse numerical order--that is, when you start with the original
object and morph toward the puddle (the "000" object). This, of course, is why
I named the program "melt."


Objects and Formats


IFF (short for "Interchange File Format") is a tagged file format that's
relatively straightforward to read. IFF allows programs to parse a file
without having to recognize all data fields within the file; unfamiliar
sections can simply be skipped. IFF shares this characteristic with formats
such as TIFF and RIFF.
The IFF file structure is very simple. The first four bytes must be the
characters FORM, followed by a 4-byte unsigned long that represents the length
of the remaining IFF file. All numeric data in IFF files are stored in
Big-endian (Motorola) format, with most significant byte (MSB) first and least
significant byte (LSB) last. Following the file size is a field specifying the
type of form. The type of form used by my program and by Imagine to store
objects in is called FORM TDDD. The rest of the file is made up of IFF
"chunks" that comprise the TDDD form. Each chunk consists of a 4-byte
identification (ID) string labeling the chunk, followed by its size as an
unsigned long. All IFF chunks are required to start on even-byte boundaries
within the file; therefore, odd-sized chunks are padded with a 0 byte. 
ReadWrite, a program that's part of my T3DLIB package (available via anonymous
ftp from ftp.wustl.edu or ftp.luth.se in the pub/aminet/gfx/3dobj directory),
dumps out the TDDD chunks present in an Imagine object. The program creates an
ASCII representation of the object, called a "T3D" or "Textual Three-Dimension
Data Description" file. If you run it with the --v command-line flag, the
program displays chunks that Impulse has added which ReadWrite does not yet
understand (because at the time of this writing, Impulse has not released the
full spec for the TDDD format). 


The Melt Code


Melt.cc (Listing One) and point.cc (Listing Two) are the two principal modules
of the melt program. The complete system, including list.cc and object.cc, is
available electronically; see "Availability." 
One of the reasons I implemented this program in C++ was that earlier I had
written T3DLIB, a C shareware library of object-conversion and manipulation
routines for Imagine objects. Although I thought about building upon this
package, I decided against it because I wanted a program designed to focus on
object manipulation that would simply ignore (yet retain) TDDD chunks that it
did not care about and thus maintain compatibility with future changes to the
TDDD file format. T3DLIB is not well suited for this because it loads an
object entirely into an internal database that represents the full TDDD file
format and later traverses that hierarchy to write the object out. So any time
new TDDD chunks are added, T3DLIB must be updated to support them.
Also, I considered what would be involved in simply creating a copy of an
object within T3DLIB by duplicating the full internal object hierarchy, and I
got a headache thinking of the time involved. I knew about C++ constructors
and destructors, and decided that copying an object would be a perfect
application for these great features.


The Principal Classes


The three principal classes in the melt program are Point, List, and Object.
The design of these classes offloads a lot of bookkeeping chores from the main
algorithm. A Point object is simple: It consists only of the x, y, and z
values of a point, plus routines to initialize, load, and write from/to a
file, and get and put individual FRACTs from/to a file. A FRACT is a 4-byte
fixed-decimal real-value format that Imagine uses in its TDDD files. To
convert it to a double, simply cast the 4-byte long to a double and divide
by 65,536. In other words, a FRACT represents a real number via a 16-bit
integer portion and a 16-bit fractional portion. Obviously, this constrains
the range of FRACT values from -32768 to +32767. Care was taken to implement
the low-level get_fract() and put_fract() routines in a CPU-independent form,
so that this code runs on both Big-endian and Little-endian CPU architectures
without modification.
The List class provides a simple mechanism to keep track of a list of Point
arrays. Note that since we are only interested in the PNTS subchunks from the
TDDD file, you can simply ignore all other IFF subchunks and store the PNTS
arrays in the correct order. Then, to perform any kind of object manipulation,
the first step is to read in all the PNTS arrays. When creating modified
versions of the object, you must read the original at the same time as writing
the modified version, filling in all the original data skipped over when
searching for PNTS. The beauty of the List class is its constructors. Given a
file pointer, the List class will read in the current PNTS subchunk being
pointed to in the TDDD file. Given another List reference, however, the List
class will make a new copy of the given list! This makes the programmer's task
much easier when you start dealing with entire objects in the Object class.
The Object class includes the remainder of the TDDD-handling "smarts," plus
constructors and destructors that make manipulating Imagine object PNTS a
breeze for the programmer. Given a filename, the Object constructor reads in
an entire Imagine object file, setting up the lists to point to the PNTS
arrays properly, and keeping them in the correct order, as found in the file.
It also saves the name of the original file for later use when writing out a
modified copy of an Object. Given a reference to another Object class, Object
copies the entire hierarchy very simply, taking advantage of the List
constructor. 
The most complex part of the Object class is the Output routine. When an
Object is to be written to a file, the filename of the original object has
already been copied with the constructor, so the original file is simply
opened for read operations while the new object is opened for write. Instead
of simply skipping over IFF chunks as was done during the reading of the file,
bytes are now copied from the input to the output until the first PNTS
subchunk is encountered. At this time, instead of copying the original file,
the new internal point data from the first List object is written out. Then
the original file is copied until the next PNTS subchunk is encountered, and
so on, until the end of the file has been reached. This paradigm can easily be
extended to keep track of any TDDD subchunks of interest, but in this case,
only the PNTS subchunk is needed for modification. 
One particularly useful feature of both the List and Object classes is that,
while they load an object, they continuously update the MBB (minimum bounding
box) information for the object. So when the object has been completely
loaded, the Object class already "knows" the MBB of the object, and can easily
calculate the center of this MBB. This information is then used by the melt
algorithm. 


The Melt Algorithm


The melt program first parses its arguments, then loads in the object to be
morphed with the Object constructor. The next step is to loop over the number
of frames desired, keeping track of the percentage of completion of the morph
as it moves from the puddle back to the original object shape. For each
iteration, we make a duplicate of the original object, again with an Object
constructor, call the Tweak_Object function, output the modified object, and
then delete it.
Tweak_Object() is the heart of the melt program. It iterates over all points
in the object and modifies them based on the percent variable, which specifies
how far this object is into the morph. If 0 percent is specified, the object
is the puddle; 100 percent (percent variable equals 1.0) means that the object
is in its original, untouched form. Of course, there is no need to call
Tweak_Object with a percent value of 100 percent, so you just generate objects
up to, but not including, this value. The Object class provides the
first_point() and next_point() functions; these work in cooperation with the
List class to easily iterate over all points without concerning the programmer
with the start and end boundaries of each PNTS List.
I experimented with the actual algorithm used to move each point. I started
out with a simple hermite curve to modify the radius of a given point based
upon its vertical (z) height, which created a smooth transition with zero
slopes both at the top and bottom of the object. This looked good closest to
the "puddle" portion of the morph, but as the object grew vertically, its
volume seemed to grow instead of remaining constant.
I then experimented with the function of an ellipse to give tangents to the
z-axis at the top portion of the object and tangents to the x-y plane at the
bottom of the object. This looked best near the higher-numbered objects. I
calculated on paper how to keep the area constant under the curve such that
the volume under the swept curve is also constant. I then made a smooth
transition from the hermite function in lower-numbered objects to the ellipse
function in higher-ordered objects, making sure to match up the top and bottom
radii of the two functions. This was the effect I was looking for. The last
step is to gradually "fade" (interpolate) from the swept-curve functions to
the original object's real x- and y-coordinates. This nicely completes the
effect.
This algorithm works best, as mentioned earlier, with star-shaped objects. You
can get two examples of this on the Aminet file servers via anonymous ftp from
ftp.wustl.edu or ftp.luth.se in the pub/aminet/gfx/3dobj directory. This
directory contains four files: ChessMelt.readme, ChessMelt.lha,
CowMelt.readme, and CowMelt.lha. ChessMelt contains a full set of chess pieces
that have been melted into puddles with this program. CowMelt contains the Cow
object that comes with Imagine 2.0. The effect on the chess pieces worked as
intended, but the effect on the cow is rather bizarre. I will let you judge
for yourselves. (My ReadWrite program is also available in the
pub/aminet/gfx/3d directory as part of the T3DLIB package.)


Conclusion



The three classes--Point, List, and Object--create an excellent starting point
for experimenting with IFF objects. In their present form, points can be
manipulated in many different ways. However, a wealth of other information
available in the TDDD files is begging to be processed as well. You could
extend the program to alter not just points, but to also add, delete, and
modify edges and faces. Also, you could work with color, reflectance, and
transparency. You could even try experimenting with particle simulations,
inverse kinematics, and dynamic constraints.
Figure 1: The "melt" effect in action. (a) (b) (c) (d)

Listing One 

//----------------------------------------------------------------------
// melt.cc - T2-like melt effect (excerpted listing) By Glenn M. Lewis 
//----------------------------------------------------------------------
#include <iostream.h>
#include <fstream.h>
#include <assert.h>
#include <ctype.h>
#include "t3dxx.h"
//----------------------------------------------------------------------
int num_frames = 0;
void Tweak_Object(int frame, Object &obj);
Object *mainobj=0;
extern "C" { int atoi(char*); }
char in_filename[100], out_root[100];
//----------------------------------------------------------------------
int main(int argc, char *argv[])
{
 int frame;

 Object *TmpObject;
 char filename[100];

 in_filename[0] = '\0';
 out_root[0] = '\0';
 num_frames = 10;
 for (int i=1; i<argc; i++) {
 if (isdigit(argv[i][0])) num_frames = atoi(argv[i]);
 else if (argv[i][0] == '-')
 cerr << "Unknown option: '" << argv[i] << "' ignored.\n";
 else if (!in_filename[0]) strcpy(in_filename, argv[i]);
 else if (!out_root[0]) strcpy(out_root, argv[i]);
 else {
 Usage:
 cerr << "Usage: " << argv[0] 
 << " [<num_frames>] infile outfile_root" << endl;
 return (-1);
 }
 }
 if (num_frames < 1 || !in_filename[0] || !out_root[0]) goto Usage;
 
 mainobj = new Object(in_filename);

 for (frame=0; frame<num_frames; frame++) {
 cerr << "Creating frame #" << frame << "...\n";
 TmpObject = new Object(*mainobj); // Copy loaded Object
 assert(TmpObject != 0);
 Tweak_Object(frame, *TmpObject);
 sprintf(filename, "%s%03d.iob", out_root, frame);
 TmpObject->Output(filename);
 delete TmpObject; // Destroy it
 }
 delete mainobj;

 return(0);
}
//----------------------------------------------------------------------
inline double interpolate(double percent, double zero_val, double one_val)
{
 return(percent*one_val + (1.0-percent)*zero_val);
}
//----------------------------------------------------------------------
inline double mini_hermite(double r, double top_r)
{
 if (r<top_r) return(0.0);
 assert(top_r != 1.0);
 r = (r-top_r)/(1.0-top_r); // Renormalize
 return((3-r-r)*r*r);
}
//----------------------------------------------------------------------
void project_point_to_radius(double newr, Point& newp1)
{
 double r;
 if (newp1.x==0.0 && newp1.y==0.0) 
 return; // It *is* the center!
 // Normalize the direction vector...

 r = newr/sqrt(newp1.x*newp1.x + newp1.y*newp1.y);
 newp1.x *= r;
 newp1.y *= r;
 // z is unchanged
}
//----------------------------------------------------------------------
inline double calc_ellipse_r(double percent, double r, double a)
{
 return((0.9*percent + 1.1*(1.0-r)*a)*mainobj->max_radius());
}
//----------------------------------------------------------------------
inline double calc_ellipse_z(double r)
{ // The following was derived to keep the area under an ellipse constant.
 return(1.0-sqrt(1.0 - r*r));
}
//----------------------------------------------------------------------
inline double calc_hermite_r(double BOT_RAD, double TOP_RAD, double z_percent)
{ double r = 1.0 - z_percent;
 // the following was (TOP_RAD + BOT_RAD*r);
 return (BOT_RAD*r < TOP_RAD ? TOP_RAD : BOT_RAD*r); 
}
//----------------------------------------------------------------------
inline double calc_hermite_z(double BOT_RAD, double TOP_RAD, double z_percent)
{
 double r = 1.0 - z_percent;
 return(1.0 - mini_hermite(r, TOP_RAD/BOT_RAD));
}
//----------------------------------------------------------------------
const double K=2.0;
double last_z_percent;
//----------------------------------------------------------------------
void Tweak_Object(int frame, Object &obj)
{
 ofstream out, oute, outh;
 Point *point, newp1;
 double ellipse_r, ellipse_z;

 double hermite_r, hermite_z;
 double newr, z_percent;
 double percent = (double)frame/(double)num_frames; // from 0...1
 double ZSIZE = percent*mainobj->height(); // Original mainobj!
 // Note that TOP_RAD must NOT ever equal BOT_RAD!
 double TOP_RAD = 0.9*percent*mainobj->max_radius(); // Original mainobj!
 double fade = percent;
 double BOT_RAD = interpolate(percent, K*mainobj->max_radius(),
 mainobj->max_radius());
 double a = (1.0 - percent*percent)/(percent + (1.0/K));
 assert(TOP_RAD != BOT_RAD);
 assert(mainobj->height() != 0);
 assert(mainobj->max_radius() != 0);
 last_z_percent = -1.0;
 
 for (obj.first_point(point); point; obj.next_point(point)) {
 newp1.x = point->x - mainobj->xcenter(); // translate point to origin
 newp1.y = point->y - mainobj->ycenter();
 newp1.z = point->z - mainobj->zmin();


 z_percent = newp1.z/mainobj->height();
 
 ellipse_r = calc_ellipse_r(percent, z_percent, a);
 ellipse_z = calc_ellipse_z(z_percent); // 0..1
 hermite_r = calc_hermite_r(BOT_RAD, TOP_RAD, z_percent);
 hermite_z = calc_hermite_z(BOT_RAD, TOP_RAD, z_percent); // 0..1

 newr = interpolate(fade, hermite_r, ellipse_r);
 newp1.z = ZSIZE * interpolate(fade, hermite_z, ellipse_z);

 project_point_to_radius(newr, newp1);

 newp1.x += mainobj->xcenter();
 newp1.y += mainobj->ycenter();
 newp1.z += mainobj->zmin();
 // We now have the new point location... fade to object.
 point->x = interpolate(fade, newp1.x, point->x);
 point->y = interpolate(fade, newp1.y, point->y);
 point->z = interpolate(fade, newp1.z, z_percent*ZSIZE+mainobj->zmin());
 }
}



Listing Two

//--------------------------------------------------------------------
// point.cc - the Point class for "melt" program. By Glenn Lewis
//--------------------------------------------------------------------
#include <iostream.h>
#include <fstream.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
#include "t3dxx.h"
//----------------------------------------------------------------
double Point::get_fract(ifstream& inFile)
{

 unsigned char ch;
 unsigned long l;
 long l2;
 inFile.get(ch); l = (unsigned long)ch;
 inFile.get(ch); l = (l<<8) | (unsigned long)ch;
 inFile.get(ch); l = (l<<8) | (unsigned long)ch;
 inFile.get(ch); l = (l<<8) | (unsigned long)ch;
 l2 = (long)l;
 return (ROUNDIT((double)l2*(1.0/65536.0)));
}
//----------------------------------------------------------------
void Point::put_fract(double num, ofstream& outFile)
{
 long l;
 l = (long)(num*65536.0);
 outFile.put((unsigned char)((l>>24)&0xFF));
 outFile.put((unsigned char)((l>>16)&0xFF));
 outFile.put((unsigned char)((l>> 8)&0xFF));
 outFile.put((unsigned char)((l )&0xFF));
}
//----------------------------------------------------------------
void Point::load(ifstream& inFile)
{
 x = get_fract(inFile);
 y = get_fract(inFile);
 z = get_fract(inFile);
}
//----------------------------------------------------------------
void Point::write(ofstream& outFile)
{
 put_fract(x, outFile);
 put_fract(y, outFile);
 put_fract(z, outFile);
}



























July, 1994
Generating Realistic Terrain


Simulating wind and erosion




Robert Krten


Robert is a contract programmer in Kanata, Ontario. You can contact him at
rk@parse.ocunix.on.ca.


Realistic landscapes are the bread and butter of the computer graphics used in
movies, video games, multimedia applications, and similar simulations.
Unfortunately, generating landscapes and terrain that look real isn't always a
straightforward process. However, fault generation, the technique I present
here, is easy to grasp and implement--and fast. Fault generation is a rough
simulation of the way some mountains and other geological features are formed
in nature.
For instance, imagine that the landscape that you wish to generate is
represented by a two-dimensional array in the computer's memory. The value at
any given x, y position within that array represents the height at that point
in the landscape. To make the landscape look interesting and realistic, each
height value should bear some relationship to its neighbor's height.
To visualize how this can be accomplished, start with a flat terrain (all
height values set to 0). Next, draw an imaginary straight line through any
given part of the array, such that it passes entirely through the array. Then,
change all of the values on one side of that line to be lower by a certain
amount, and all of the values on the other side of that line to be higher by a
certain amount. This results in a landscape with a big fault running through
some part of it. While this may be interesting as a first step, it certainly
doesn't offer much variety or realism. 
To enhance realism, you need to repeat the process several times (using a
different imaginary line each time), decreasing the amount by which the
landscape changes (the height) with each iteration. This causes the landscape
to have a large shift in one place (corresponding to the first fault), with
smaller "detailing" shifts throughout. You can get creative with the function
that you use for the decrease in height, but I've used a 1/x-height reduction
with good results. This way, the first fault line changes the height by as
many units as there are iterations, the next fault line by one unit less, and
so on, until the last fault line changes the height by a single unit.
This results in a rugged terrain, with sharp corners and sudden changes in
certain places. This terrain is usable for some applications without further
modification, but could be made more realistic by smoothing out the rough
edges.


Smoothing It Out


There are a number of ways to smooth out the terrain. By controlling the
fault-decay factor, you can have one large feature, and relatively small
subsequent features. Applying many such small features to the landscape makes
it statistically likely that the large feature will become "broken down" over
time, with fewer sharp edges. This approach may require many iterations,
however, consuming a fair amount of CPU time.
An alternative is to apply a digital filter over the generated landscape,
smoothing out the rough spots. This allows the entire landscape to be
generated and smoothed in one final pass. Even though the digital-filter
algorithm is somewhat expensive in CPU time, it is still a good solution
because it happens only once. 
A more realistic filtering approach is simulation of erosion. Imagine that
after the first major land shift, an innocuous filter is passed over the
terrain. It is just enough to smooth out a few of the rough spots in the
terrain in the neighborhood of the first fault. When the second land shift
occurs, the land being shifted is already smoother than it would have been.
Again, a filter is run over the newly shifted land. This produces the
equivalent of geological wind erosion. 
The "low-pass" filter (a single-constant FIR filter) in Listing Two operates
by propagating a certain small portion of the previous sample into the current
sample. This is repeated for all samples. The two filter types (the two-pass
and four-pass, both in Listing Two) differ only in that the two-pass sweeps
along the x-axis once and then the y-axis once, whereas the four-pass sweeps
along the x-axis in one direction and then the other direction (two passes)
and performs the same operation for the y-axis (two more passes, for a total
of four). With the two-pass filter, a landscape shift is introduced into the
array, whereas with the four-pass this shift is not apparent. 
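The heart of that filter is a one-line recurrence. Stripped down to a single sweep along one row (my names, assuming the same constant-k scheme as Listing Two), it looks like this:

```c
/* One sweep of the single-constant low-pass filter: a running
 * accumulator keeps a fraction k of its previous value, so sharp
 * steps in the input decay exponentially in the output. */
void smooth_row(const int *in, int *out, int n, double k)
{
    double acc = in[0] / (1.0 - k); /* prime with the steady-state value */
    int i;

    for (i = 0; i < n; i++) {
        out[i] = (int) (acc * (1.0 - k));
        acc = acc * k + in[i];      /* propagate part of the past forward */
    }
}
```

A constant input passes through unchanged, while steps smear in the sweep direction--which is exactly why the four-pass version sweeps each axis in both directions.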
The two-pass filter can be thought of as simulating constant wind erosion,
where the wind is always coming from the same direction. The four-pass filter
simulates rain erosion, where the rain is (obviously) coming from the top, and
eroding particles away in all directions.


Allocating the Landscape Memory


Memory was an interesting side issue during the development of the fault
generator. I have put together a 2-D calloc library call (see Listing Two)
that allows the landscape size to be determined at run time, rather than
compile time. This should be especially helpful for systems where memory is
tight and you want to get the biggest array that will fit. The technique used
also ensures that the 64-Kbyte segmentation barrier will not be exceeded
(unless your array is bigger than roughly 16Kx16K elements, in which case you
will more than likely first run out of physical memory and processor time). An advantage
of allowing the landscape size to be determined at run time is that you can
batch up a large number of different-sized landscapes to be processed, then go
home for the evening, without having to recompile each time.
There are a number of ways of declaring (and allocating storage for) 2-D
arrays in C. For instance, a statement such as Example 1(a) results in a
different memory layout than Example 1(b). However, both can be addressed as
in Example 1(c). The first statement declares an array-of-array and allocates
all of the required storage in a contiguous chunk of memory. This can easily
exceed 64 Kbytes. On systems without the 64-Kbyte barrier, it can still be a
problem, as it may exceed the largest chunk of contiguous free memory
available. The second statement declares a pointer-to-a-pointer and typically
allocates four bytes. The terrain generator presented here uses the second
statement, in conjunction with a library, for allocating and freeing the
structure.
The approach used to allocate memory is a two-stage allocation. Assume that a
200*400*sizeof(int) array is being allocated. In the first stage, the
"backbone" is allocated with space for 200 pointers to integers. (This
typically consumes 200*sizeof (int*), or 800 bytes.) In the second stage, one
400-integer array is allocated for each backbone pointer and stored in that
pointer.
The end result is an array of 200 pointers, each of which points to a
different 400-element array of integers. In terms of overhead, this introduces
the 800 extra bytes for the backbone array.
C allows the land [x][y] addressing style to work because the compiler is
aware of the details of the base type of "land" (that is, it is a
pointer-to-a-pointer). The compiler looks at land [x] and generates code to
reference the xth pointer (of the 200). It then generates code that indexes
into the yth location of that pointer, thus referencing the given array
element. By allocating just a little more memory than I actually need, I can
store some information at the beginning of the allocated-memory array, then
return a pointer to just after that header.
The extra information in the header (the x- and y-array size allocated and the
size of the individual element) is stored by the ECalloc2d routine when the
array is created, and it is especially useful in EFree2d, ERead2d, and
EWrite2d. Without this header, I'd have to pass the information around all of
the time or maintain it as a global structure.
Another feature of the pointer-to-a-pointer approach is that the entire array
can be assigned to a variable using the C assignment operator, rather than a
series of nested for loops. For example, after calling the digital filter in
the main loop, you need to swap the input and output arrays. This happens in
the procedure fault using just three assignments.


Storing the Landscape to Disk


The easiest way to store the landscape to disk is to write out all of the
elements, using two nested for loops. With this approach, the whole first row
will be written out (in column order), then the second row, and so on.
Agreeing on a common-output format makes it easy to have other utilities
process (or even generate) the terrain data. For example, you can write an
alternate filtering or display program. This underscores another advantage of
dynamically allocated array sizes. By writing out the array size (x by y) as the
first few bytes of the file, you can read in differently sized files for
processing in other utilities, without having to recompile all of the
utilities for the new size.
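As a sketch of what such a utility's input side might look like (the function name is hypothetical; it assumes the dimensions-first layout just described):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a terrain file laid out as described above: two ints giving
 * the x and y sizes, followed by x rows of y ints each.  Returns a
 * freshly allocated pointer-to-pointer array, or NULL on error. */
int **read_terrain(FILE *fp, int *xp, int *yp)
{
    int dim[2];
    int **land;
    int i;

    if (fread(dim, sizeof(int), 2, fp) != 2)
        return NULL;
    if ((land = malloc(dim[0] * sizeof(int *))) == NULL)
        return NULL;
    for (i = 0; i < dim[0]; i++) {
        land[i] = malloc(dim[1] * sizeof(int));
        fread(land[i], sizeof(int), dim[1], fp);
    }
    *xp = dim[0];
    *yp = dim[1];
    return land;
}
```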


The Source Code


The fault-generation source code consists of the makefile (Listing One),
common.c (Listing Two), common.h (Listing Three), and fault.c (Listing Four).
I've developed this code under QNX 4.2 with the Watcom 9.5 C compiler.
The makefile contains C-compiler flags (that you may change or ignore), linker
flags, and dependencies. As defined, fault.c is the main module, with common.c
containing the filtering and 2-D manipulation routines. The ANSI C prototypes
for common.c are contained in common.h.
The fault.c module contains the main routine, which calls the command-line
option parser (optproc), allocates the landscape arrays (ECalloc2d), and calls
the fault generator (fault) with the number of iterations to be performed.
The fault routine calls the single fault-line generator generate_line (with
the height of the land shift) and the optional wind-erosion filter. At the end
of fault, the optional final filtering is performed, and the file is written
to disk.

To use the fault generator, type the name of the executable followed by the
name of the output file you wish to generate. For example, fault terrain.xy
will invoke the fault generator with the defaults and generate an output file
called terrain.xy.


Possible Enhancements


There are a number of enhancements you can add to the characteristics of the
terrain generated by this program.
For one thing, you can change how the fault heights are determined by
establishing a number of discrete sizes and choosing one of those each time,
perhaps with a weighted probability. For example, if you were generating a
200-iteration terrain, you would take 10 100-unit heights, 150 20-unit
heights, and the rest (40) in 2-unit heights.
The land movement on each side of the fault need not be constant. You can
scale the fault height by the distance from the fault line; for example, the
further away a point is from the fault line, the less it is affected. This is
also closer to the way that things happen geologically--an earthquake in San
Francisco usually has no effect in Pocatello, Idaho, 1000 kilometers away.
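A sketch of such a distance-scaled displacement, using a 1/(1+d) falloff curve as one arbitrary choice (the names are mine):

```c
/* Distance-scaled fault displacement: points on the fault line move
 * the full amount; points "falloff" units away move half as much,
 * and the effect keeps shrinking from there.  The 1/(1+d/falloff)
 * curve is just one plausible choice among many. */
int scaled_shift(int height, double dist, double falloff)
{
    return (int) (height / (1.0 + dist / falloff));
}
```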
As previously mentioned, digital filters, such as FIR or IIR filters, can lead
to some quite spectacular effects. For another effect, run the landscape
through an FFT one row at a time, chop out a portion of the frequency domain,
and then reconstitute it. Regardless of what you do to improve the algorithm,
I'd like to hear about it. I can be reached on the Internet at
rk@parse.ocunix.on.ca. 
Example 1: Declaring and allocating storage for 2-D arrays.
(a) int land [200][400];
(b) int **land;
(c) land [x][y] = stuff;

Listing One 

# Makefile for fractal fault routine
# For QNX 4.2, but should be reasonably portable
# 1993 11 25 R. Krten released for publication / public use

CFLAGS = -4 -Oxr -w9 -mf

fault: common.o fault.o
 cc $(CFLAGS) -o fault fault.o common.o
fault.o : Makefile fault.c common.h
 cc $(CFLAGS) -c fault.c
common.o : Makefile common.c common.h
 cc $(CFLAGS) -c common.c



Listing Two

/* common.c
 * QNX 4
 * (C) Copyright 1993 by Robert Krten, all rights reserved.
 * This module contains common utilities for the X * Y fault programs.
 * 1993 10 26 R. Krten created
 * 1993 11 25 R. Krten released for publication / public use
*/

#include <stdio.h>
#include <stdlib.h>

typedef struct {
 int xSize; /* number of backbone entries */
 int ySize; /* number of entries in each backbone entry */
 int eSize; /* size of each entry */
} MHead;

extern char *progname;
extern int **land;
extern int dimensions [2];

/* Two-dimensional support routines:
 * ECalloc2d (x size, y size, size of each member)
 * EFree2d (pointer to allocated array to free)
 * ERead2d (FILE pointer, pointer to array to read)
 * EWrite2d (FILE pointer, pointer to array to write)
 * The above routines operate on two dimensional arrays, based upon the
 * "pointer to pointer" type. The allocation routine first allocates
 * a backbone (consisting of "x size" number of pointers plus a header), and
 * then fills in the backbone with "y sized" members. The free routine
 * frees each backbone member in turn, and then the whole backbone itself,
 * including the header. The read and write routines read and write the
 * arrays from and to disk.
*/

void **
ECalloc2d (x, y, esize)
int x;
int y;
int esize;
{
 void **ptr; /* pointer to allocated memory */
 MHead *mptr; /* pointer to memory header */
 int i;

 if ((ptr = calloc (1, sizeof (MHead) + x * sizeof (void *))) == NULL) {
 fprintf (stderr, "%s: out of memory on first allocation\n", progname);
 exit (1);
 }
 mptr = (MHead *) ptr;
 mptr -> xSize = x;
 mptr -> ySize = y;
 mptr -> eSize = esize;

 ptr = (void *) (mptr + 1);

 for (i = 0; i < x; i++) {
 if ((ptr [i] = calloc (y, esize)) == NULL) {
 fprintf (stderr, "%s: out of memory (at [%d])!\n", progname, i);
 exit (1);
 }
 }
 return (ptr);
}
void
EFree2d (ptr)
void **ptr;
{
 MHead *mptr; /* pointer to memory header */
 int x, y;
 int i;

 mptr = (MHead *) ptr - 1;
 x = mptr -> xSize;
 y = mptr -> ySize;

 for (i = 0; i < x; i++) {
 free (ptr [i]);
 }
 free (ptr);
}
void
ERead2d (fp, l)
FILE *fp;
void **l;
{
 MHead *mptr; /* pointer to memory header */
 int x, y, esize;
 int i;

 mptr = (MHead *) l - 1;
 x = mptr -> xSize;
 y = mptr -> ySize;
 esize = mptr -> eSize;

 for (i = 0; i < x; i++) {
 fread (l [i], esize, y, fp);
 }
}
void
EWrite2d (fp, l)
FILE *fp;
void **l;
{
 MHead *mptr; /* pointer to memory header */
 int x, y, esize;
 int i;
 int dim [2];

 mptr = (MHead *) l - 1;
 dim [0] = x = mptr -> xSize;
 dim [1] = y = mptr -> ySize;
 esize = mptr -> eSize;

 fwrite (dim, sizeof (int), 2, fp);
 for (i = 0; i < x; i++) {
 fwrite (l [i], esize, y, fp);
 }

}
/* The filter algorithm -- This implements a single-constant FIR filter. */
void
filter (input, output, kx, ky, flag)
int **input;
int **output;
double kx, ky;
int flag;
{
 MHead *mptr; /* pointer to memory header */
 double acc;
 double acckx, accky;
 register x, y;

 /* we assume that dimensions of "input" == dimensions of "output" */
 mptr = (MHead *) input - 1;

 /* first pass X direction */
 printf ("1"); fflush (stdout);
 accky = 1. / (1. - ky);
 for (y = 0; y < mptr -> ySize; y++) {
 acc = input [0][y] * accky;
 for (x = 0; x < mptr -> xSize; x++) {
 output [x][y] = acc / accky;
 acc = acc * ky + input [x][y];
 }
 }

 /* second pass X direction */
 printf ("2"); fflush (stdout);
 if (flag == '4') {
 for (y = 0; y < mptr -> ySize; y++) {
 acc = input [mptr -> xSize - 1][y] * accky;
 for (x = mptr -> xSize - 1; x >= 0; x--) {
 output [x][y] += acc / accky;
 acc = acc * ky + input [x][y];
 }
 }
 }
 /* first pass Y direction */
 printf ("3"); fflush (stdout);
 acckx = 1. / (1. - kx);
 for (x = 0; x < mptr -> xSize; x++) {
 acc = input [x][0] * acckx;
 for (y = 0; y < mptr -> ySize; y++) {
 output [x][y] += acc / acckx;
 acc = acc * kx + input [x][y];
 }
 }
 /* second pass Y direction */
 printf ("4"); fflush (stdout);
 if (flag == '4') {
 for (x = 0; x < mptr -> xSize; x++) {
 acc = input [x][mptr -> ySize - 1] * acckx;
 for (y = mptr -> ySize - 1; y >= 0; y--) {
 output [x][y] += acc / acckx;
 acc = acc * kx + input [x][y];
 }
 }
 }
 /* averaging for 2 or 4 passes */
 printf ("A"); fflush (stdout);
 for (x = 0; x < mptr -> xSize; x++) {
 for (y = 0; y < mptr -> ySize; y++) {
 output [x][y] /= flag - '0';
 }
 }
 printf ("\r"); fflush (stdout);
}



Listing Three

/* common.h
 * QNX 4
 * (C) Copyright 1993 by Robert Krten, all rights reserved.
 * This module contains the common utility functions for the fault
 * handlers.
 * 1993 10 26 R. Krten created
 * 1993 11 25 R. Krten released for publication / public use
*/

/* prototypes */
void **ECalloc2d (int, int, int);
void ERead2d (FILE *, void **);
void EWrite2d (FILE *, void **);

void EFree2d (void **);
void filter (int **, int **, double, double, int);


Listing Four 

/* fault.c
 * QNX 4
 * (C) Copyright 1988 by Robert Krten, all rights reserved.
 * 1988 05 16 R. Krten created
 * 1991 08 29 R. Krten ported to QNX 2
 * 1993 02 20 R. Krten ported to QNX 4/Windows
 * 1993 10 16 R. Krten expanded array (novelty of 32bit)
 * 1993 10 31 R. Krten allow filtering with generator (wind erosion)
 * 1993 11 25 R. Krten released for publication / public use
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <math.h>

#include "common.h"

static void optproc (int, char **);

char *progname = "fault"; /* for diagnostics */
char outfile [256]; /* for output of fault program */
int **land; /* terrain */
int **outland; /* for filtering */
int dimensions [2]; /* holds size of terrain */
int nit; /* Number of ITerations */
double kx, ky; /* FIR filter constants */
int postFilter; /* flag indicating post-filtering */
int windFilter; /* flag indicating wind-erosion filtering */
int filtering; /* flag indicating any filtering */
int filterType; /* flag indicating type of filter (2 or 4 pass) */

main (argc, argv)
int argc;
char **argv;
{
 optproc (argc, argv);
 srand (time (NULL));
 land = ECalloc2d (dimensions [0], dimensions [1], sizeof (int));
 if (filtering || postFilter) {
 outland = ECalloc2d (dimensions [0], dimensions [1], sizeof (int));
 }
 fault (nit);
}
fault (n)
int n;
{
 int i;
 FILE *fp;
 int **tmp;


 if ((fp = fopen (outfile, "w")) == NULL) {
 fprintf (stderr, "%s: couldn't open %s for w\n", progname, outfile);
 exit (1);
 } 
 printf ("Fault /%d [%d x %d]\r", nit, dimensions [0], dimensions [1]);
 fflush (stdout);
 for (i = 0; i < n; i++) {
 printf ("Fault %5d\r", i); fflush (stdout);
 generate_line (i + 1);
 if (windFilter || (postFilter && i == n - 1)) {
 filter (land, outland, kx, ky, filterType);
 /* swap input and output arrays */
 tmp = land;
 land = outland;
 outland = tmp;
 }
 }
 printf (" \n");
 EWrite2d (fp, land);
 EFree2d (land);
 fclose (fp);
}
generate_line (height)
int height;
{
 int x1, y1, x2, y2;
 register x3, y3;
 int xintercept;
 int sign;
 double slope;
 int landset;

 do {
 y1 = random (dimensions [1]);
 y2 = random (dimensions [1]);
 } while (abs (y2 - y1) <= 2);

 do {
 x1 = random (dimensions [0]);
 x2 = random (dimensions [0]);
 } while (abs (x2 - x1) <= 2);
 slope = (double) (y2 - y1) / (double) (x2 - x1);
 sign = (random (10) > 5) ? -1 : 1;

 landset = (int) ((double) nit / (double) height);
 for (y3 = 0; y3 < dimensions [1]; y3++) {
 xintercept = (int) ((double) (y3 - y1) / slope) + x1;
 if (xintercept < 0) {
 xintercept = 0;
 }
 if (xintercept > dimensions [0]) {
 xintercept = dimensions [0];
 }
 for (x3 = 0; x3 < xintercept; x3++) {
 land [x3][y3] += sign * landset;
 }
 for (x3 = xintercept; x3 < dimensions [0]; x3++) {
 land [x3][y3] -= sign * landset;
 }

 }
}
random (n)
int n;
{
 return (rand () % n);
}
void
usageError ()
{
 fprintf (stderr, "%s: error in command line.\n", progname);

 fprintf (stderr,
 "\n"
 "use: fault [options] outFile\n"
 "\n"
 "where [options] are optional arguments from:\n"
 " -n NIT specify number of iterations (default 100)\n"
 " -x xSize specify X size (default 256)\n"
 " -y ySize specify Y size (default 256)\n"
 " -f const X and Y filtering constant (default 0.5)\n"
 " -F type specify filter type (2 or 4 pass, default 4)\n"
 " -p filter once at end of run\n"
 " -w filter each time through\n"
 "\n");

 exit (1);
}
void
optproc (argc, argv)
int argc;
char **argv;
{
 int opt;

 if (!argc) {
 usageError ();
 }

 dimensions [0] = dimensions [1] = 256;
 nit = 100;
 kx = ky = 0.5;
 filterType = '4';
 postFilter = 0;
 windFilter = 0;

 while ((opt = getopt (argc, argv, "f:x:y:n:F:pw")) != -1) {
 switch (opt) {
 case 'f':
 kx = ky = atof (optarg);
 break;
 case 'x':
 dimensions [0] = atoi (optarg);
 break;
 case 'y':
 dimensions [1] = atoi (optarg);
 break;
 case 'n':
 nit = atoi (optarg);

 break;
 case 'F':
 if (*optarg == '4') {
 filterType = '4';
 } else if (*optarg == '2') {
 filterType = '2';
 } else {
 usageError ();
 }
 break;
 case 'p':
 postFilter = 1;
 break;
 case 'w':
 windFilter = 1;
 break;
 default:
 usageError ();
 break;
 }
 }
 for (; optind < argc; optind++) {
 strcpy (outfile, argv [optind]);
 }
 filtering = windFilter || postFilter;
}






July, 1994
3-D Texture Mapping


Mapping 2-D images onto a 3-D surface




Jeremy Spiller


Jeremy graduated from Fitchburg State College in 1993 and works for SKY-SKAN
Inc. (Nashua, NH). He can be contacted at P.O. Box 1094, Townsend, MA 01469.


Texture mapping allows you to project a two-dimensional image, or texture map,
onto a flat polygon placed on a three-dimensional surface. Drawing 3-D
graphics involves two steps: constructing and manipulating a model, then
projecting that model onto a 2-D flat screen. The program presented in this
article draws a rotating cube (the model) with a different picture painted on
each visible side. The pictures I've chosen are simple mathematical formulas;
however, you can place any picture imaginable on the cube--even play a movie
on each of the six sides as it rotates!


Manipulating a Model


Constructing a model can be a complex task. For the moment, assume that the
model has already been constructed out of an array of points in 3-D space.
While there are many ways to transform the model, the three basic operations
discussed here are translating (or moving), scaling, and rotating. Matrix
multiplication makes these transformations possible. For example, multiplying
an appropriate matrix by each of the points in the model rotates it by 30
degrees around the z-axis, while multiplying by a different matrix moves the
model to a
new location. A matrix that contains information to translate, scale, or
rotate a model is referred to as a "transformation matrix." 
Scaling the model moves all the points inwards or outwards from the origin,
0,0,0 (where the viewer is located). Rotation can be broken down into three
separate operations called "pitch" (rotation around the x-axis), "yaw"
(rotation around the y-axis), and "roll" (rotation around the z-axis). With
scaling and rotation, it's important to have the proper origin; otherwise, the
center of the model will also be moved to a new location (see Figure 1).
Listing One includes the implementation of the texture-mapping algorithm. The
functions Translate, Scale, RotateX, RotateY, and RotateZ can be used
individually or in any combination to create matrices that perform these
operations upon a model; see Figures 2, 3, and 4. For instance, if you wanted
to move the center of a model to the origin, rotate it by 30 degrees around
the x-axis, rescale the model to twice its size, and move it 100 units back
from the viewer, you would perform the operation in Figure 5(a) on each of the
points in the model. Notice that the effects of the operations are from right
to left and cannot be reversed without reversing the effect. This is because
matrix multiplication, unlike real multiplication, is not commutative--A*B
does not equal B*A. On the other hand, matrix multiplication is
associative--A*(B*C) equals (A*B)*C. Therefore, you can assign a single
transformation matrix with the combined effect; see Figure 5(b). Whenever you
want to perform all of these operations on the model, you need only multiply
the transformation matrix by each point NewPoint=T*OldPoint. 
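The NewPoint=T*OldPoint step amounts to a small fixed-size matrix multiply. Here's a sketch using 4x4 homogeneous coordinates (row-major layout is my assumption; Listing One's representation may differ):

```c
/* Apply a 4x4 homogeneous transformation matrix to a point:
 * out = t * in, where in[3] is the homogeneous coordinate (1 for
 * ordinary points, which is what makes translation possible). */
void transform_point(const double t[4][4], const double in[4], double out[4])
{
    int r, c;

    for (r = 0; r < 4; r++) {
        out[r] = 0.0;
        for (c = 0; c < 4; c++)
            out[r] += t[r][c] * in[c];
    }
}
```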
The main() function in Listing One constructs a cube (actually, a 3-D
rectangle) and demonstrates how to construct a simple model. The six sides of
the cube are constructed and stored in the variables S1 through S6. Once in
the proper spot, they need never be recalculated. This saves seven matrix
multiplications, not to mention the sine and cosine calculations, each time
the cube is rotated. Once it has been constructed, the model is manipulated
with the transformation matrix Cube. The cube is translated to its center,
rotated about the origin (which is now at the cube's center), then translated
away from the viewer. 


Projecting a Model


In simple terms, as a model moves away from the viewing screen, it gets
smaller. To project a point onto the screen, the x- and y-coordinates are
divided by the z-coordinate. After the point has been projected onto an
imaginary viewing screen, it must be scaled and translated for a real monitor.
This is done in main() with Cube=ScaleXYZ (230,200,1)*Cube, which scales for
the unit pixel. The x- and y-coordinates are scaled differently because the
pixels are about 15 percent taller (in this video mode) than they are wide.
The actual values in this example are somewhat arbitrary. (The object can be
moved back and forth on the z-axis to make it appear larger and smaller.)
However, values that are too large will make the model appear flat and not
three dimensional, while values that are too small will make the object appear
distorted. After the points have been projected, they are translated (through
the global variables MX and MY) onto the center of the screen.
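Put together, the projection step for one point might be sketched as follows (the 230/200 scale factors and the MX/MY centering follow the description above; the helper name is mine):

```c
/* Project a 3-D point onto the screen: divide x and y by z for
 * perspective, scale unequally to correct the pixel aspect ratio,
 * then translate to the screen center (mx, my). */
void project(double x, double y, double z, int mx, int my,
             int *sx, int *sy)
{
    *sx = mx + (int) (230.0 * x / z);
    *sy = my + (int) (200.0 * y / z);
}
```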
The Find_Outline function calculates where the four corners of the rectangle
will land on the screen when transformed through the matrix M. The Tran
function maps each of them through the matrix M into its new x-, y-, and
z-coordinates. They are then projected and translated onto the screen. The
do_line function clips and traces the outline of the rectangle into the arrays
MinX and MaxX, so that the area inside the rectangle can be painted. 


Texture Mapping


Often a model consists of a few points connected by lines. In these cases, the
points are projected onto the screen, and the computer simply connects the
dots. If the model consists of opaque polygons, the computer connects the dots
to form a polygon, which is then painted a solid color. Texture mapping,
however, allows you to place an image in each of the polygons. The image (or
texture map) must be projected onto the 2-D screen in the proper perspective
to look like it is a flat surface in 3-D space that has been translated,
scaled, and rotated into the polygon.
One approach is to treat the 2-D texture map as a model and translate, scale,
and rotate it into the proper place. It would then be a simple matter to
project each point from the texture map onto the screen. While this works,
it's inefficient, and the results can possibly be quite ugly. Consider that
squeezing a 100x100 texture map into a 10x10 square on the screen means
drawing 10,000 pixels when only 100 pixels are required. The solution is to
map the points from the screen onto the texture map. This is what
Project_Plane does. To accomplish this, apply the inverse of the projection
formula to each point on the screen. Any point on a 2-D texture map can be
projected onto the screen using the formula in Figure 6. The numerator of the
equations is the x- and y-coordinates of the point in 3-D space, while the
denominator is the z-coordinate. Likewise, points can be projected from the
screen onto the texture map with the same equations, but with different
constants.
After finding the outline, Project_Plane calculates the constants necessary to
map points from the screen onto the texture map. Fixed-point variables are
used to speed up calculations inside the loops. The value I've chosen for
FIXED_POINT scales the constants to be as large as possible without
overflowing. If you use a larger texture map (or a higher-resolution screen),
you may need to use a smaller constant. The for loop scans down the outline
map looking for places on the screen to paint. Once located, the constants
previously derived are used to calculate the inverse x-, y-, and
z-coordinates. The z-coordinate is forced to stay odd to prevent an accidental
divide by zero. The final texture-map coordinates are calculated and stored in
GridX and GridY. 
The innermost while loop scans across the screen until the end of the area to
be filled is found. Two optimizations are performed inside this loop. The
first takes advantage of the fact that y is a constant. Therefore, x=(Ax+By+C)
can be rewritten as x=(Ax+j), where the constant j=By+C. Similar optimizations
are performed for y and z, saving a total of three multiplications per pixel.
The other optimization takes advantage of the fact that the x screen
coordinate is always incremented by 1 on each iteration of the while loop. The
difference in x is given by the equation Dx=(A(x+1)+j)-(Ax+j)=A. Therefore,
finding the next x-, y-, and z-coordinates is a simple matter of adding dx,
dy, and dz. This eliminates the other three multiplies, leaving only two
divides and three adds per pixel.
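The resulting inner loop can be sketched like this (names are illustrative; the real loop in Project_Plane also clamps z and writes pixels):

```c
/* Forward-differencing span loop: because the screen x advances by
 * exactly 1 per pixel, the mapped coordinates advance by constant
 * deltas, leaving only the two perspective divides per pixel. */
void scan_span(long x, long y, long z, long dx, long dy, long dz,
               int count, int *gx, int *gy)
{
    int i;

    for (i = 0; i < count; i++) {
        gx[i] = (int) (x / z);  /* texture-map column */
        gy[i] = (int) (y / z);  /* texture-map row    */
        x += dx;                /* three adds...      */
        y += dy;
        z += dz;                /* ...replace the multiplies */
    }
}
```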


Visibility Test


Even though a cube has six sides, no more than three are visible at any one
time. This means that some sides of the cube must cover other sides. The
simplest method of removing the unseen sides (that is, performing
hidden-surface removal) is to remove all sides facing away from the viewer.
The plane-equation method can be used to determine which side of a plane the
viewer is on. Given the coefficients A, B, C, and D of a plane and any point
(x,y,z), the formula Ax+By+Cz+D will be 0 if the point is on the plane,
positive if the point is in front of the plane, and negative if the point is
behind the plane. The coefficients of the plane can be derived from any three
points on the plane by using the equations shown in Figure 7. Since you are
interested only in whether or not a plane is visible from the origin, the x-,
y-, and z-coordinates will be 0, and you only need to find the single
coefficient D. The easiest three points to use to calculate the coefficient D
are (0,0,0), (0,1,0), and (1,0,0), which are the points used by the Visible
function. The plane-equation method successfully removes about 50 percent of
the polygons on any object that has a definite inside and a definite outside.
Remember, if you are on the inside of the object, the test must be reversed;
otherwise, you won't see the object.
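With the viewer fixed at the origin, the test reduces to checking the sign of D. Here's a sketch of the idea (this is my rendering of the plane-equation method, not the article's Visible function verbatim):

```c
/* Backface test with the viewer at the origin: take the polygon's
 * normal from the cross product of two edges, compute the plane
 * coefficient D = -(normal . p0), and check its sign.  D > 0 means
 * the origin is in front of the plane, i.e., the face is visible. */
int visible(const double p0[3], const double p1[3], const double p2[3])
{
    double ux = p1[0] - p0[0], uy = p1[1] - p0[1], uz = p1[2] - p0[2];
    double vx = p2[0] - p0[0], vy = p2[1] - p0[1], vz = p2[2] - p0[2];
    double nx = uy * vz - uz * vy;   /* normal = u cross v */
    double ny = uz * vx - ux * vz;
    double nz = ux * vy - uy * vx;
    double d  = -(nx * p0[0] + ny * p0[1] + nz * p0[2]);

    return d > 0.0;
}
```

The winding order of the three points determines which way the normal faces, which is exactly why the test must be reversed when you are inside the object.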
Unfortunately, this method can't remove all of the hidden surfaces of a 3-D
object. Other methods of hidden-surface removal include the z-buffering
scheme, which stores the z-coordinate (or the distance from the viewer) of
every pixel on the screen in a buffer. Only the pixels closest to the viewer
are kept. Another method, known as the "painter's method," is to draw all of
the polygons in the correct order, back to front, so that the closer polygons
cover up polygons farther from the viewer. For in-depth discussions of these
and other hidden-surface removal techniques, see Christopher Lampton's Flights
of Fantasy (The Waite Group Press, 1993) and Lee Adams's High Performance CAD
Graphics in C (McGraw-Hill, 1989). 


Optimizations


If it takes more than about one-tenth of a second to draw a side of the cube
(not including floating-point calculations), the culprit is probably the two
long-divide instructions that calculate GridX and GridY in the innermost loop
of Project_Plane. This is because an 80286 CPU lacks the instruction to
perform a full 32-bit divide, so a subroutine must be used instead. The
subroutine performs the divide by shifting and subtracting the numbers, which
can take hundreds of times longer than the equivalent 32-bit 80386 divide
instruction. What the compiler does not know (but we do) is that the results
need only be accurate to one texture-mapped pixel, which will always be in the
range of 0 to 256 (or greater for larger texture maps). Therefore, you can
perform a 32-by-16-bit divide to produce a 16-bit result, something the 80286
CPU can do easily. The assembly-language code to do this follows the
Project_Plane function.
On a PC without a coprocessor, most of the time will be spent performing
matrix operations. Try optimizing the matrix functions to use fixed-point
calculations. The same optimizations can be performed in do_line. Because
matrix multiplication is so basic to 3-D graphics, it is worthwhile to unroll
the loops in the matrix-multiplication function.
In many applications, only 2-D graphics are needed. A 2-D image can be
translated and scaled on the x- and y-axes and rotated about the z-axis
without becoming three dimensional. Because projecting 2-D graphics onto the
screen is a linear function and does not involve perspective, the z-coordinate
will stay constant (DeltaZ will always be 0) while drawing the polygon. The
formula GridX=X/Z can be rewritten as GridX=CX, where the constant C is the
inverse of Z. The constant C can be removed from the loop by multiplying it by
DeltaX. Performing these optimizations for 2-D graphics will remove both
divides in the innermost loop, speeding up drawing immensely.



Gotchas


This program has two "gotchas." Some of the pixels at the extreme edge of the
rectangle will be projected out of the range of the texture map. This is
because of the inaccuracies of fixed-point arithmetic and because the screen
coordinates are integers. The easiest solution to this is to not print any
pixels that are projected out of range of the texture map. 
Secondly, the program can only project rectangular texture maps instead of
true polygons. The easiest solution to this is to have a pixel color that is
clear and not print clear pixels. Another solution is to project all of the
points in the polygon onto the screen and paint only within its outline.


Projects


Using the ScaleXYZ function, you can stretch the model and create reflections.
For instance, ScaleXYZ (2,-1,1) will make the model stretch to twice its
width on the x-axis and make a mirror reflection on the y-axis. You can also
build matrices that tilt, shear, and distort the model in strange ways. For
example, try removing one or more of the commented lines from
Distort_Function. If you don't see the cube when experimenting with your own
distort functions, you've probably moved it off the screen. If you see that
the image is mapped into the polygon more than once, it's likely that the
fixed-point arithmetic has overflowed. Most of the time, this problem can be
corrected by making the value for FIXED_POINT smaller.
For another interesting effect, try drawing two cubes on the screen, each with
slightly different positions and angles. If you cross your eyes, you can
create a stereoscopic effect. Remember, the right eye views the left image;
the left eye, the right image. If you switch them, or get the viewing angle or
position wrong, you won't get the stereoscopic effect; you'll just get eye
strain.
Figure 1 (a) Rotation of a triangle around the z-axis--the center of the
triangle moves during the rotation (to prevent this, first translate the model
so its center is at the origin); (b) rotation around the z-axis with origin at
the center; (c) scaling a triangle not at the origin to twice its size (the
model appears to move because its center is not initially at the origin; (d)
scaling with origin at the center.
Figure 2 Translation matrix.
Figure 3 Scaling matrix.
Figure 4 Transformation matrices to rotate by θ radians: (a) rotation around
the x-axis; (b) rotation around the y-axis; (c) rotation around the z-axis.
Figure 5: (a) Calculation to perform translation, scaling, and rotation. The
calculation must be performed on each point in the model; (b) using the
associative property of matrix multiplication to build a single
transformation matrix with the combined effect.
(a)
 NewPoint = Translate(0,0,100)*Scale(2)*RotateX(30 Deg)
 *Translate(DCX,DCY,DCZ)*OldPoint
(b)
 T = Translate(0,0,100)*Scale(2)*RotateX(30 Degrees)
 *Translate(DCX,DCY,DCZ)
Figure 6: Projecting a point from a 2-D texture map onto the screen. The
constants A, B, C, D, E, F, G, H, and I are derived from the transformation
matrix. Variables x and y are the x- and y-coordinates on the texture map. 
Xscreen = (AX+BY+C)/(GX+HY+I)
Yscreen = (DX+EY+F)/(GX+HY+I)
Figure 7: Coefficients of the plane can be derived from any three points on
the plane with these equations.
A = Y1(Z2-Z3) + Y2(Z3-Z1) + Y3(Z1-Z2)
B = Z1(X2-X3) + Z2(X3-X1) + Z3(X1-X2)
C = X1(Y2-Y3) + X2(Y3-Y1) + X3(Y1-Y2)
D = -X1(Y2*Z3-Y3*Z2) - X2(Y3*Z1-Y1*Z3) - X3(Y1*Z2-Y2*Z1)
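The Figure 7 equations translate directly into code; this hypothetical helper (not part of the listing) computes the plane coefficients from three points:

```cpp
#include <cassert>

struct Plane { double A, B, C, D; };

// Derive the coefficients of the plane Ax + By + Cz + D = 0 from any
// three non-collinear points on it, per the Figure 7 equations.
Plane plane_from_points(double x1, double y1, double z1,
                        double x2, double y2, double z2,
                        double x3, double y3, double z3)
{
    Plane p;
    p.A = y1*(z2 - z3) + y2*(z3 - z1) + y3*(z1 - z2);
    p.B = z1*(x2 - x3) + z2*(x3 - x1) + z3*(x1 - x2);
    p.C = x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2);
    p.D = -x1*(y2*z3 - y3*z2) - x2*(y3*z1 - y1*z3) - x3*(y1*z2 - y2*z1);
    return p;
}
```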

Listing One 

// *************************************************************************
// Texture mapping, copyright (C) 02/01/1993 by Jeremy Spiller
// This file contains the implementation for the texture mapping algorithm.
// *************************************************************************

#include <stdlib.h>
#include <string.h>   // --- memset, memcpy ---
#include <iostream.h>
#include <conio.h>
#include <dos.h>
#include <math.h>
unsigned _stklen = 64000U; // --- 64K of stack space ---

// --- Global variables ---
#define max(X,Y) ( ((X) > (Y)) ? (X) : (Y) )
#define min(X,Y) ( ((X) < (Y)) ? (X) : (Y) )
#define SCR_MAX_X 319
#define SCR_MAX_Y 199
int MX = 160, MY = 100;
int MinY, MaxY, MinX [SCR_MAX_Y+1], MaxX [SCR_MAX_Y+1];
char *Map1, *Map2, *Map3, *PhysicalScreen;

// ===== Graphics functions =====
char *Pixel_Pointer, *ScreenBuffer;
// --- Set the (X, Y) position to draw future pixels ---

#define Set_Pixel_Position(X,Y) \
 (Pixel_Pointer = ScreenBuffer + 320*(Y) + (X))
// --- Draw a pixel, then move the position to the right ---
#define Plot_Pixel_Across(Color) \
 (*Pixel_Pointer++ = (Color))
// --- Call a bios function to set the screen to graphics mode ---
void ibm_graphics_mode ()
 {
 _AH =0x00; // --- Set video mode ---
 _AL =0x13; // --- Mode = VGA (320x200) ---
 geninterrupt (0x10); // --- Perform mode change ---
 }
// --- Calls a bios function to set the screen to text mode ---
void ibm_text_mode ()
 {
 _AH =0x00; // --- Set video mode ---
 _AL =0x03; // --- Mode = text ---
 geninterrupt (0x10); // --- Perform mode change ---
 }
// ===== MAT3D - Matrix functions for 3D matrix operations. This is a standard
// mathematical matrix, accessed as M [Row][Column] (or M [Y][X]) where
// M[0][0] is the first element.
class MAT_COLUMN
 {
 double Column [4];
 public:
 double &operator [] (int Index)
 {
 return Column [Index];
 }
 };
// --- 3D matrix class ---
class MAT3D
 {
 MAT_COLUMN Matrix [4];
 public:
 MAT_COLUMN &operator [] (int Index)
 {
 return Matrix [Index];
 }
 friend MAT3D operator * (MAT3D &A, MAT3D &B);
 };
// --- Multiply two matrices, returns A*B ---
MAT3D operator * (MAT3D &A, MAT3D &B)
 {
 int I, J, K;
 MAT3D Temp;
 for (I = 0; I < 4; I++)
 for (J = 0; J < 4; J++)
 {
 double Sum = 0;
 for (K = 0; K < 4; K++)
 Sum += A [I][K] * B [K][J];
 Temp [I][J] = Sum;
 }
 return Temp;
 }
// --- Produce an identity matrix ---
MAT3D Identity ()

 {
 int X, Y;
 MAT3D Temp;
 for (Y = 0; Y < 4; Y++)
 for (X = 0; X < 4; X++)
 if (X == Y)
 Temp [Y][X] = 1;
 else
 Temp [Y][X] = 0;

 return Temp;
 }
// --- Calculate pitch - rotation around the X axis ---
MAT3D RotateX (double Angle)
 {
 MAT3D Temp = Identity ();
 Temp [1][1] = Temp [2][2] = cos (Angle);
 Temp [2][1] = - ( Temp [1][2] = sin (Angle) );
 return Temp;
 }
// --- Calculate yaw - rotation around the Y axis ---
MAT3D RotateY (double Angle)
 {
 MAT3D Temp = Identity ();
 Temp [0][0] = Temp [2][2] = cos (Angle);
 Temp [0][2] = - ( Temp [2][0] = sin (Angle) );
 return Temp;
 }
// --- Calculate roll - rotation around the Z axis ---
MAT3D RotateZ (double Angle)
 {
 MAT3D Temp = Identity ();
 Temp [0][0] = Temp [1][1] = cos (Angle);
 Temp [1][0] = - ( Temp [0][1] = sin (Angle) );
 return Temp;
 }
// --- Calculate pitch, yaw, and roll ---
MAT3D RotateXYZ (double AngleX, double AngleY, double AngleZ)
 {
 return RotateZ (AngleZ) * RotateY (AngleY) * RotateX (AngleX);
 }
// --- Calculate Scale - enlarge or shrink model (about origin) ---
MAT3D Scale (double Scale)
 {
 MAT3D Temp = Identity ();
 Temp [0][0] = Temp [1][1] = Temp [2][2] = Scale;
 return Temp;
 }
// --- Calculate Scale for each axis (can cause distortion) ---
MAT3D ScaleXYZ (double SX, double SY, double SZ)
 {
 MAT3D Temp = Identity ();
 Temp [0][0] = SX;
 Temp [1][1] = SY;
 Temp [2][2] = SZ;
 return Temp;
 }
// --- Translate - move model ---
MAT3D Translate (double X, double Y, double Z)

 {
 MAT3D Temp = Identity ();
 Temp [0][3] = X;
 Temp [1][3] = Y;
 Temp [2][3] = Z;
 return Temp;
 }
// --- Translate a point through a matrix, returns M * [X, Y, Z, 1]t ---
void Translate_Point (MAT3D &M, double &X, double &Y, double &Z)
 {
 double TX = X*M[0][0] + Y*M[0][1] + Z*M[0][2] + M[0][3];
 double TY = X*M[1][0] + Y*M[1][1] + Z*M[1][2] + M[1][3];
 double TZ = X*M[2][0] + Y*M[2][1] + Z*M[2][2] + M[2][3];
 X = TX;
 Y = TY;
 Z = TZ;
 }
// -- Draw a line, clipped on the Y axis, with one point per scan line. --
void do_line (void Plot(long X,long Y),double X1,double Y1,double X2,double
Y2)
 {
 double YL = 0, YH = 200, Temp, DeltaX;
 if ((Y1 < YL && Y2 < YL) || (Y1 > YH && Y2 > YH))
 return;
 // --- if Y1 > Y2, swap (X1, Y1) with (X2, Y2) ---
 if (Y1 > Y2)
 {
 Temp = Y1;
 Y1 = Y2;
 Y2 = Temp;
 Temp = X1;
 X1 = X2;
 X2 = Temp;
 }
 // --- Is this a horizontal line? ---
 if (Y2 - Y1 == 0)
 {
 Plot (X1, Y1);
 Plot (X2, Y2);
 return;
 }
 DeltaX = (X2 - X1) / (Y2 - Y1);
 // --- Clip points ---
 if (Y1 < YL)
 {
 X1 = X1 + (YL - Y1) * DeltaX;
 Y1 = YL;
 }
 if (Y2 > YH)
 {
 X2 = X2 + (YH - Y2) * DeltaX;
 Y2 = YH;
 }
 // --- Draw line ---
 while (Y1 <= Y2)
 {
 Plot (X1, Y1);
 X1 += DeltaX;
 Y1 += 1;
 }

 }
// --- Find the minimum and maximum bounds of the plane ---
void MinMaxPixel (long X, long Y)
 {
 if (Y >= 0 && Y < SCR_MAX_Y)
 {
 MinX [Y] = min (MinX [Y], X);
 MaxX [Y] = max (MaxX [Y], X);
 }
 }
// --- Fill up the min/max outline ---
void Find_Outline (MAT3D &M, double Xp, double Yp)
 {
 double P1x, P1y, P2x, P2y, P3x, P3y, P4x, P4y;
 double Z1 = 0, Z2=0, Z3=0, Z4=0;
 int CountY;
 MaxY = 0;
 MinY = SCR_MAX_Y;
 // --- Initialize array values ---
 for (CountY = 0; CountY < SCR_MAX_Y; CountY++)
 {
 MinX [CountY] = 32000;
 MaxX [CountY] = -32000;
 }
 // --- Project four corners ---
 P1x = 0; P1y = 0;
 P2x = Xp; P2y = 0;
 P3x = Xp; P3y = Yp;
 P4x = 0; P4y = Yp;
 Translate_Point (M, P1x, P1y, Z1);
 Translate_Point (M, P2x, P2y, Z2);
 Translate_Point (M, P3x, P3y, Z3);
 Translate_Point (M, P4x, P4y, Z4);
 // --- Projection formula ---
 P1x = P1x/fabs(Z1) + MX;
 P1y = P1y/fabs(Z1) + MY;
 P2x = P2x/fabs(Z2) + MX;
 P2y = P2y/fabs(Z2) + MY;
 P3x = P3x/fabs(Z3) + MX;
 P3y = P3y/fabs(Z3) + MY;
 P4x = P4x/fabs(Z4) + MX;
 P4y = P4y/fabs(Z4) + MY;
 // --- Calculate min and max values of outline ---
 do_line (MinMaxPixel, P1x, P1y, P2x, P2y);
 do_line (MinMaxPixel, P2x, P2y, P3x, P3y);
 do_line (MinMaxPixel, P3x, P3y, P4x, P4y);
 do_line (MinMaxPixel, P4x, P4y, P1x, P1y);
 }

// ==== Project a texture map onto the screen. The size of the map is
// [0..Xp] [0..Yp]. It will be transformed through the matrix M,
// translated by the variables MX and MY, and drawn on the screen.

void Project_Plane (MAT3D &M, char *Map, int Xp, int Yp)
 {
 int GridX, GridY;
 long X, Y, Z, DeltaX, DeltaY, DeltaZ;
 int Ypos, Xpos, MaxXpos, LineLength;
 Find_Outline (M, Xp, Yp);

 // --- Calculate inverse of M * [X, Y, 0, 1]t ---
 double Sx1 = M [0][0], Sy1 = M [0][1], T1 = M [0][3];
 double Sx2 = M [1][0], Sy2 = M [1][1], T2 = M [1][3];
 double Sx3 = M [2][0], Sy3 = M [2][1], T3 = M [2][3];
 const long FIXED_POINT = 64;
 // --- Calculate X axis (Scale X) ---
 long SXx = (T2*Sy3 - T3*Sy2) * FIXED_POINT; // Scale X
 long SXy = (T3*Sy1 - T1*Sy3) * FIXED_POINT; // Scale Y
 long SXt = (T1*Sy2 - T2*Sy1) * FIXED_POINT; // Translate
 // --- Calculate Y axis (Scale Y) ---
 long SYx = (T3*Sx2 - T2*Sx3) * FIXED_POINT; // Scale X
 long SYy = (T1*Sx3 - T3*Sx1) * FIXED_POINT; // Scale Y
 long SYt = (T2*Sx1 - T1*Sx2) * FIXED_POINT; // Translate
 // --- Calculate Z axis (Scale Z) ---
 long SZx = (Sx3*Sy2 - Sx2*Sy3) * FIXED_POINT; // Scale X
 long SZy = (Sx1*Sy3 - Sx3*Sy1) * FIXED_POINT; // Scale Y
 long SZt = (Sx2*Sy1 - Sx1*Sy2) * FIXED_POINT; // Translate
 for (Ypos = 0; Ypos < SCR_MAX_Y; Ypos += 1)
 {
 Xpos = max (0, MinX [Ypos]);
 MaxXpos = min (SCR_MAX_X, MaxX [Ypos]);
 LineLength = MaxXpos - Xpos + 1;
 if (Xpos <= MaxXpos)
 {
 X = ((Xpos-MX)*SXx + (Ypos-MY)*SXy + SXt);
 DeltaX = SXx;
 Y = ((Xpos-MX)*SYx + (Ypos-MY)*SYy + SYt);
 DeltaY = SYx;
 Z = ((Xpos-MX)*SZx + (Ypos-MY)*SZy + SZt) | 1; // --- Make Z odd ---
 DeltaZ = SZx & ~1; // --- Force Z to stay odd ---
 Set_Pixel_Position (Xpos, Ypos);
 while (--LineLength >= 0)
 {
 GridX = X / Z;
 GridY = Y / Z;
 X += DeltaX;
 Y += DeltaY;
 Z += DeltaZ;
 Plot_Pixel_Across (Map [256*GridY + GridX]);
 }
 }
 }
 }
// -- If you are compiling this for a 286 machine, or with a 286 compiler, 
// replace the two lines (GridX = X/Z) and (GridY = Y/Z) with the following
// assembly code. This will speed up the process by using a machine 
// language divide instruction instead of a subroutine. 
// asm mov al, byte ptr Y+3
// asm cbw
// asm mov dx, ax
// asm mov cx, word ptr Z+1
// asm mov ax, word ptr Y+1
// asm idiv cx
// asm mov word ptr GridY, ax;
//
// asm mov al, byte ptr X+3
// asm cbw
// asm mov dx, ax
// asm mov cx, word ptr Z+1

// asm mov ax, word ptr X+1
// asm idiv cx
// asm mov word ptr GridX, ax

// --- Is the viewer on the visible (or invisible) side of the plane? ---
int Visible (MAT3D &M)
 {
 // --- Point1 = M * [0, 0, 0, 1]t ---
 double X1 = M [0][3];
 double Y1 = M [1][3];
 double Z1 = M [2][3];
 // --- Point2 = M * [0, 1, 0, 1]t ---
 double X2 = M [0][1] + M [0][3];
 double Y2 = M [1][1] + M [1][3];
 double Z2 = M [2][1] + M [2][3];
 // --- Point3 = M * [1, 0, 0, 1]t ---
 double X3 = M [0][0] + M [0][3];
 double Y3 = M [1][0] + M [1][3];
 double Z3 = M [2][0] + M [2][3];
 double D = -X1*(Y2*Z3-Y3*Z2) - X2*(Y3*Z1-Y1*Z3) - X3*(Y1*Z2-Y2*Z1);
 // --- If the viewer is on the positive side, the plane is visible ---
 return D > 0;
 }
// --- Return a matrix that distorts the model ---
MAT3D Distort_Function ()
 {
 MAT3D Distort = Identity ();
 // Distort [0][0] = 2; // Stretch on X axis
 // Distort [1][0] = Distort [0][1] = .4; // Parallelogram
 // Distort [3][1] = .005; // Pyramid
 return Distort;
 }
// --- rotate a cube ---
main ()
 {
 int I;
 long Fx, Fy;
 // --- Set up memory ---
 PhysicalScreen = (char *)MK_FP (0xA000, 0x0000);
 ScreenBuffer = new char [320U*200];
 Map1 = new char [256U*200];
 if (!( Map2 = new char [256U*200] )) Map2 = Map1;
 if (!( Map3 = new char [256U*200] )) Map3 = Map1;
 if (ScreenBuffer == 0 || Map1 == 0)
 {
 cout << "Sorry, not enough memory to run this program!\n";
 exit (1);
 }
 // --- Draw a picture of crosses (X*Y) ---
 for (Fx = 0; Fx < 256; Fx++)
 for (Fy = 0; Fy < 199; Fy++)
 Map1 [256*Fy + Fx] = (Fx - 128) * (Fy - 100) / 256 + 128;
 // --- Draw a picture of circles (X*X + Y*Y) ---
 for (Fx = 0; Fx < 256; Fx++)
 for (Fy = 0; Fy < 199; Fy++)
 {
 long Dx = (Fx - 128);
 long Dy = (Fy - 100);
 Map2 [256*Fy + Fx] = (Dx*Dx + Dy*Dy) / 128 + 16;

 }
 // --- Draw some lines (X) ---
 for (Fx = 0; Fx < 256; Fx++)
 for (Fy = 0; Fy < 199; Fy++)
 Map3 [256*Fy + Fx] = Fx + 17;
 // --- Build 6 sides of a cube relative to the origin ---
 MAT3D Flip, S1, S2, S3, S4, S5, S6;
 Flip = RotateY (180 * 3.141592654 / 180);
 Flip = Translate (256, 0, 0) * Flip;
 S1 = Translate (0, 0, 0);
 S2 = Flip;
 S2 = RotateX (-90 * 3.141592654 / 180) * S2;
 S3 = Flip;
 S3 = Translate (-56, 0, 0) * S3;
 S3 = RotateY (90 * 3.141592654 / 180) * S3;
 S4 = Flip;
 S4 = Translate (0, 0, 200) * S4;
 S5 = RotateX (-90 * 3.141592654 / 180);
 S5 = Translate (0, 200, 0) * S5;
 S6 = RotateY (90 * 3.141592654 / 180);
 S6 = Translate (256, 0, 0) * S6;
 ibm_graphics_mode ();
 for (I = 0; I < 20000; I++)
 {
 // --- Move center of cube to origin, rotate cube, move ---
 // --- cube back (away from viewer), and scale for screen ---
 MAT3D Cube = Translate (-128, -100, -100);
 Cube = Distort_Function () * Cube;
 Cube = RotateXYZ (I*0.051, I*0.01, I*0.037) * Cube;
 Cube = Translate (0, 0, sin (-I*0.0121) * 350 + 350 + 400) * Cube;
 Cube = ScaleXYZ (230, 200, 1) * Cube; // --- scale for screen ---
 // --- Transform each side of the cube ---
 MAT3D Side1 = Cube*S1;
 MAT3D Side2 = Cube*S2;
 MAT3D Side3 = Cube*S3;
 MAT3D Side4 = Cube*S4;
 MAT3D Side5 = Cube*S5;
 MAT3D Side6 = Cube*S6;
 // --- Draw each side of the cube (if visible) ---
 memset (ScreenBuffer, 0, 320U*200); // Clear screen buffer
 if (Visible (Side1))
 Project_Plane (Side1, Map1, 255, 199);
 if (Visible (Side2))
 Project_Plane (Side2, Map2, 255, 199);
 if (Visible (Side3))
 Project_Plane (Side3, Map3, 199, 199);
 if (Visible (Side4))
 Project_Plane (Side4, Map1, 255, 199);
 if (Visible (Side5))
 Project_Plane (Side5, Map2, 255, 199);
 if (Visible (Side6))
 Project_Plane (Side6, Map3, 199, 199);
 memcpy (PhysicalScreen, ScreenBuffer, 320U*200); // Copy to screen
 if (kbhit ()) if (getch () == 27) break; // Break at user's request
 }
 ibm_text_mode ();
 cout << "Texture mapped cube copyright (C) 1993 by Jeremy Spiller.\n";
 }
































































July, 1994
RAY: A Ray-Tracing Program in C++


The better the light model, the more realistic the image




Alain Mangen


Alain can be reached at CSC Computer Sciences, Avenue Lloyd George 7, B-1050
Brussels, Belgium.


Ray tracing is a computer-graphics technique for generating realistic
three-dimensional images. It is based on the premise that a "scene" is created
when light rays from multiple sources strike individual objects. Depending on
surface characteristics of the objects, the rays can either change direction
or color, reflect or refract into a finite number of rays, or diffuse into an
infinite number of rays.
In this article, I present RAY, a ray-tracing program written in C++. While
I've limited the objects illuminated to spheres and planes, you can easily add
cones, cubes, cylinders, and other objects. The light model I implement
consists of diffuse, specular, and reflected components. RAY also performs
hidden-surface removal and simulates shadow and semishadow effects to produce
images of dazzling realism like that in Figure 1. Images the program generates
can be saved in true color (16.7 million colors) in standard TARGA files
supported by most commercial image-manipulation software. Since RAY uses an
associative palette of colors, it can display 256 colors in real time. It also
supports the VESA standard (see the accompanying text box entitled,
"Encapsulating VESA Services"), so the program is compatible with most Super
VGA cards. RAY also displays images in 256 levels of gray scale and stores
those images in TARGA files with color maps.


Ray Tracing


There are two basic approaches to ray tracing--forward and backward. As Figure
2 illustrates, forward ray tracing involves sending simulated rays from light
sources, computing their interaction with encountered objects, and determining
which light rays actually reach the eye of the observer. In the real world,
few rays actually reach the observer, so backward ray tracing operates in the
reverse--from the observer to the objects and light sources; see Figure 3.
This involves scanning pixel by pixel, sending a ray from the observer to a
pixel, and computing the corresponding RGB color based on the interactions of
the ray with all objects, starting with the nearest.
While many factors affect the image's realism--how you model objects, colors,
vector operations, and the like--I'll focus on the light model because its
quality ultimately determines the degree of realism. Assume, for instance,
that you want to compute the color of a point P on an object in the scene.
This point is met by ray V, issued from the observer, and is illuminated by a
light source following the direction S. The normal of the object at point P
is aligned following the direction N, and the reflection direction, which is
the mirror image of S relative to N, is represented by the vector R; see
Figure 4. Light variations at point P depend on the roughness and material of
the object surface.
Ambient light is constant for the entire scene. It represents the sum of the
light not taken into account from other contributions.
Diffuse light slightly penetrates the surface of the encountered object and is
reemitted uniformly in all directions. Interaction between the light and the
solid is high, and the color of the ray is changed by the color of the
surface. Diffuse light is modeled using Lambert's law: I = Σ(IN rd (US.UN)),
where IN is the color of the object with components RGB, rd is the diffuse
reflection factor for its surface, US is the vector S normalized, UN is the
vector N normalized, and the sum Σ is computed for all light sources. The
scalar product (US.UN) is proportional to the cosine of the angle that
separates S from N, and goes from 0 for a grazing illumination to 1 for light
arriving along the normal direction.
Specular light represents the mirror effect produced by light that does not
penetrate the shiny surface of an object, but is reflected directly off its
outer surface in a manner that reveals the source's location. The color of
the specular light comes from the color of the source that illuminates the
object: I = Σ(IS rs (UR.UV)^f), where IS is the color of the light source with
components RGB, rs is the specular reflection factor associated with the
surface of the object, UR is the vector R normalized, UV is the vector V
normalized, f is an empirical exponent that determines the level of specular
reflection (depending on the physical characteristics of the surface), and
the sum Σ is computed for all light sources. The scalar product (UR.UV) is
proportional to the cosine of the angle that separates R from V, and is thus
maximal when the reflected ray R is aligned along the direction V coming from
the observer. The exponent f makes this effect fall off rapidly as the
alignment disappears.
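The diffuse and specular terms can be sketched for a single light source and a single intensity channel (illustrative code; the names follow the text and Figure 4, not the RAY sources):

```cpp
#include <cassert>
#include <cmath>
#include <algorithm>

struct Vec { double x, y, z; };

double dot(const Vec &a, const Vec &b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

Vec normalize(Vec v)
{
    double len = std::sqrt(dot(v, v));
    Vec r = { v.x / len, v.y / len, v.z / len };
    return r;
}

// IN: object intensity, IS: source intensity, rd/rs: diffuse and specular
// reflection factors, f: specular exponent; S, N, R, V as in Figure 4.
double shade(double IN, double IS, double rd, double rs, double f,
             Vec S, Vec N, Vec R, Vec V)
{
    Vec US = normalize(S), UN = normalize(N);
    Vec UR = normalize(R), UV = normalize(V);
    double diffuse  = IN * rd * std::max(0.0, dot(US, UN));            // Lambert
    double specular = IS * rs * std::pow(std::max(0.0, dot(UR, UV)), f);
    return diffuse + specular;
}
```

Clamping the scalar products to zero keeps surfaces facing away from the light (or the viewer) from contributing negative intensity.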
The presence of several objects in a scene usually means that one object will
be in the illumination path of another. For the object that is then in a
position of shadow relative to the light source, the diffuse and specular
contributions of the light source are zeroed, leaving ambient light and
reflected rays.
To implement shadow and semishadow effects, you have to draw secondary rays
from the point to the source (following S) and check whether any other objects
are in the path. If any are, the diffuse and specular components do not have
to be computed for that source.
Some shiny surfaces reflect incoming light rays. Consequently, a ray can
bounce from one object to another before reaching the observer's eye. The
chromatic information carried by the ray will depend on the multiple
interactions of the ray with the different surfaces encountered: I = IR rr,
where rr is the reflection factor of the surface of the object. The intensity
IR is computed by sending a new ray from the point P, aligned along the
direction W, which is the mirror image of V relative to N. This means that IR
will be evaluated by means of a recursive call to the module that computes
the illumination of a point.
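The recursion can be sketched with a scalar toy model (illustrative, not the RAY API): each bounce contributes the reflected intensity attenuated by rr, and a depth limit (RAY's MaxRef) bounds the recursion.

```cpp
#include <cassert>
#include <cmath>

// Toy model of the recursive reflection: "local" stands for the ambient,
// diffuse, and specular contributions at the hit point; rr is the
// reflection factor; MaxRef bounds the recursion depth.
double illuminate(double local, double rr, int depth, int MaxRef)
{
    double I = local;
    if (depth < MaxRef && rr > 0.0)
    {
        // Trace the reflected ray and attenuate its intensity by rr.
        double IR = illuminate(local, rr, depth + 1, MaxRef);
        I += rr * IR;
    }
    return I;
}
```

Because each level is attenuated by rr, the contributions form a rapidly shrinking series, which is why a MaxRef of 1 already looks convincing.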


Illuminating Objects


RAY currently models geometric objects such as points and spheres, along with
rays, planes, colors, and vector operations. Since the point is the basic
object (both geometrically and programmatically), I'll focus on it. To
characterize the behavior of an object, you must compute both the intersection
of a light ray (a 3-D vector) with the surface of the object and the
orientation of the normal at the point of intersection; see the accompanying
text box entitled, "Modeling Objects." Because I use C++, you have only to
program these two rules to add new objects in the software.
In object-oriented (as opposed to geometric) terms, the point is the base
object that holds the basic characteristics common to all other objects,
which inherit those characteristics from it.
A point with the coordinates [x, y, z] has a position in 3-D space. The
constructor of a point initializes it and automatically places it in a linked
list with two sentinels, Head and Trail. The destructor of a point removes it
from the linked list and if necessary resets the values of Head and Trail.
This has two consequences: First, you only have to declare an object of type
point to automatically store it in a linked list in memory and make it
automatically disappear when going out of scope; second, since all other
objects inherit from the point, all objects of the list will be placed in the
same linked list and will have the same behavior.
A C++ function scans this linked list, calling a method for each object. If
the method has been declared as virtual, the compiler will automatically call
the method corresponding to the actual type of the object found (late
binding).
For instance, with a linked list of one plane and one sphere, you can call the
method Intersect() for each object in the list. The compiler will then
automatically call the method Intersect() of the Plane object in the first
case, and Intersect() of the Sphere object in the second.
As each ray is cast from the observer's point of view, the object list is
scanned and the corresponding Intersect() method called. The object with the
smallest time value is identified as closest to the observer, and the
interaction of the ray with the corresponding surface is then computed.
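The dispatch scheme might look like this in outline (the class and method names mirror the article; the canned t values and the hand-rolled list are placeholders for the real geometry and sentinels):

```cpp
#include <cassert>
#include <cfloat>

// Each object overrides the virtual Intersect(); scanning the list and
// keeping the smallest positive t finds the object nearest the observer.
struct Object
{
    Object *next;
    Object() : next(0) {}
    virtual double Intersect() = 0;  // returns t, or a negative value on miss
    virtual ~Object() {}
};

struct Sphere : Object
{
    double t;                        // canned result for this sketch
    double Intersect() { return t; }
};

struct Plane : Object
{
    double t;
    double Intersect() { return t; }
};

Object *closest(Object *head)
{
    Object *best = 0;
    double best_t = DBL_MAX;
    for (Object *o = head; o; o = o->next)
    {
        double t = o->Intersect();   // late binding picks the right method
        if (t > 0 && t < best_t) { best_t = t; best = o; }
    }
    return best;
}
```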
The two methods attached to the object Point are the two routines needed to
characterize the behavior of an object as previously defined: Intersect()
computes the intersection of the object with a 3-D vector V, and GetNormal()
computes the normal of the object at a given point P.


The RAY Ray-Tracing Program


Although powerful, the basic RAY ray-tracing program is concise, due largely
to the object-oriented nature of C++. For example, I use operator overloading
to simplify vector operations; see Table 1. To generate an image, you simply
declare an object of type scene, create several objects of type light vector,
and attach the light vector objects to the scene object. You then declare the
objects to be represented (spheres or planes), call the RayTrace() method for
the scene, and display the image on the screen or store it to disk. 
The RAY ray-tracing program includes the following modules: 
RAY, which implements the scene to be represented using the objects defined in
other modules.
RAYOBJ, the definition of the predefined objects that defines their behavior
relative to the light model.
RAYSVGA, the interface routines for the VESA standard for displaying, in real
time, an approximation of the actual true-color picture computed in memory.
RAYTGA, the TARGA save routines needed to export images to other software.
The complete system, including demos, executables, C++ source code, sample
images (in GIF and TARGA format), and TSR versions of the VESA BIOS extensions
for most Super VGA cards are available electronically; see "Availability,"
page 3. 
Listing One is the include file RAY.H. The main parameters that allow you to
alter the behavior of the program are A, the size of the associative palette
array (normally 31 or 63) and A0, a threshold of visibility. Below this value,
all colors are considered black in order to spare palette entries for
more-significant colors. 
The #define constants Diffuse, Specular, Shadow, and Reflections allow you to
selectively activate or deactivate the different contributions in the light
model. Object-specific parameters are given in the objects' constructors (with
appropriate default values): Rd (the diffuse reflection factor), Rs (the
specular reflection factor), and Rr (the reflection factor), all of which
depend on the characteristics of the surface. MaxRef is the maximum number of
reflections that a light ray can have, which limits the depth of recursion in
the computations. The program allows any value here, but empirically, since
each additional reflection is attenuated by the factor rr, a value of 1
already gives astonishing results.
Finally, parameters are given to RayTrace(). Spread is the radius used for the
associative palette algorithm. Values can range from 0 (for a simple image, no
specular reflection or reflection) to 7 (complex images, full light model with
multiple reflections). Search is used by the associative palette algorithm to
look for an approximate color when the palette is full.

Listing Two, RAY1.CPP, generated Figure 1--a full light model that produces
multiple reflections. Listing Three is RAYOBJ.H, and Listing Four is
RAYOBJ.CPP, which provides the various classes for the objects to be
illuminated (points, spheres, planes, scenes, and the like), along with base
classes for color. 


Conclusion


Most RAY computations are done using floating-point numbers. Consequently,
high performance depends on the presence of a math coprocessor in the target
PC. For instance, Figure 1 (available electronically, along with the sample
image files) was computed in one to three minutes on a 33-MHz 486DX PC. On a
25-MHz 486SX, however, this can take up to an hour. 
Finally, I would like to thank my brother, Jean-Michel Mangen, for his
assistance on this project.
Encapsulating VESA Services
The VESA standard defines the behavior of video screens and graphical cards.
Most graphics cards are now bundled with either hardware or software that
emulates this standard, accounting for differences in video modes and
performing video bank switching for the SVGA modes.
In RAY, I've encapsulated VESA calls in the RAYSVGA module. While Table 2
lists all VESA modes, RAY only supports the VESA 256-color ones. Table 3 lists
the RAY routines that provide VESA services.
Since the video palette is dynamically filled in real time during program
execution, activating the video palette in two individual steps allows you to
buffer palette changes. This prevents flickering on the video screen during
the repeated changes of the colors in the palette. Also, loading the video
palette in the VGA-card hardware can be slow on some boards, so buffering
these changes globally optimizes program performance.
--A.M.
The TARGA Interface
Synthesized images are almost always postprocessed for video integration or
conversion to another file format. Thus, you only have to implement two
uncompressed TARGA formats to keep the TARGA interface modules simple--the
size of the images on disk isn't an issue since they're only temporary files.
Consequently, the TARGA interface can be implemented in a single page of C++
code. I use two different TARGA formats: type 1 (uncompressed, color-mapped
images) for the gray-scale images (see Table 4) and type 2 (uncompressed, RGB
images) for true-color images (Table 5). I chose type 1 for gray-scale images
instead of type 3 (uncompressed, black-and-white images) because type 3 is
less likely to be supported in commercial applications.
TARGA files are converted by the C++ object TGAFile. The TGAFile constructor
opens the file in Write mode and writes the file header. The destructor closes
the TARGA file. 
The method WritePixel() copies an array of uncompressed pixels into the TARGA
file. This function is optimized using fwrite() to perform the disk operation
in one step. Also, the file is buffered using a dynamically allocated area of
32 Kbytes in memory, which optimizes the overall speed of the application.
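The buffering described above can be sketched in a few lines (illustrative; the real TGAFile class also writes a TARGA header from its constructor and manages the buffer itself):

```cpp
#include <assert.h>
#include <stdio.h>

const size_t TGA_BUF_SIZE = 32 * 1024;   // 32-Kbyte file buffer

// Attach a large, fully buffered area to the stream so that repeated
// writes hit memory rather than the disk.
FILE *open_buffered(const char *name, char *buf)
{
    FILE *fp = fopen(name, "wb");
    if (fp)
        setvbuf(fp, buf, _IOFBF, TGA_BUF_SIZE);
    return fp;
}

// Write a whole scan line of pixels in one fwrite() call instead of
// one putc() per pixel.
size_t write_pixels(FILE *fp, const unsigned char *pixels, size_t n)
{
    return fwrite(pixels, 1, n, fp);
}
```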
--A.M.
Modeling Objects
One feature of the ray-tracing rendering algorithm presented here is its
ability to handle geometric objects using their exact mathematical
expressions. To characterize the behavior of an object, you must compute the
intersection of a light ray (a 3-D vector) with the surface of the object and
determine the orientation of the normal at the point of intersection.
When modeling the ray, if V is a 3-D vector defined by its origin V0=[x0,
y0, z0] and its direction Vd=[xd, yd, zd], the equation of the ray as a
function of time t is V(t) = V0 + Vd*t, where t > 0 and Vd is normalized. As
t increases, the ray point moves farther from the observer's eye. If the ray
encounters several distinct objects in its path, the object closest to the
observer will correspond to the lowest value of t. This means you can "depth
sort" the objects for hidden-surface removal by comparing the time values of
the intersections. Objects with a negative value of t are located behind the
observer and can be ignored. Miscellaneous objects composing the scene can
interpenetrate each other: The intersection of the ray will be automatically
computed with the resulting outermost surface.
When modeling the sphere, consider a sphere S with its center Sc=[xc, yc, zc]
and its radius Sr. Replacing a point of the light ray V(t) in the equation of
the sphere in Figure 5(a) yields the equations in Figures 5(b) and 5(c). If
the discriminant (B^2 - 4AC) is negative, the equation has no solution and
the ray misses the sphere. Otherwise, the lowest positive value between t0
and t1 represents the nearest intersection, and the intersection point P=[xP,
yP, zP]=[x0+xd*t, y0+yd*t, z0+zd*t]. The normal N at the point P is then
given by the equation in Figure 5(d). If both t0 and t1 are negative, then
the object is located behind the observer and need not be represented on the
screen.
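Written out with explicit coordinates, the test reads as follows (a sketch consistent with the equations above, not the actual RAY code):

```cpp
#include <cassert>
#include <cmath>

// Returns the smallest positive t where the ray V0 + Vd*t hits the sphere
// of center Sc and radius Sr, or -1.0 if it misses.  A = 1 because Vd is
// assumed normalized.
double intersect_sphere(double x0, double y0, double z0,   // origin V0
                        double xd, double yd, double zd,   // direction Vd
                        double xc, double yc, double zc,   // center Sc
                        double Sr)                         // radius
{
    double B = 2.0 * (xd*(x0 - xc) + yd*(y0 - yc) + zd*(z0 - zc));
    double C = (x0 - xc)*(x0 - xc) + (y0 - yc)*(y0 - yc)
             + (z0 - zc)*(z0 - zc) - Sr*Sr;
    double disc = B*B - 4.0*C;            // the discriminant
    if (disc < 0.0) return -1.0;          // no solution: the ray misses
    double root = std::sqrt(disc);
    double t0 = (-B - root) / 2.0;
    double t1 = (-B + root) / 2.0;
    if (t0 > 0.0) return t0;              // nearest intersection
    if (t1 > 0.0) return t1;              // origin inside the sphere
    return -1.0;                          // sphere behind the observer
}
```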
To optimize the sphere, you can rewrite the equations in Figure 5 by computing
the vector --O--C=--Sc----V0, going from the origin of the light ray to the
center of the sphere. Then compute the distance of the point of the ray
nearest to the center of the sphere (Figure 6). This approach requires fewer
computations and, by considering the sign of t2dc, more quickly determines
whether or not there is an intersection 
The equation for a plane (considered an infinite surface in all directions) is
Ax+By+Cz+D=0, where A²+B²+C²=1. Substituting a point of the light ray yields
the equation in Figure 7. If the denominator is 0, the scalar product of the
ray with the normal [A, B, C] to the plane is 0, and the ray is parallel to
the plane. If t is greater than 0, the point of intersection is P=[xP, yP,
zP]=[x0+xdt, y0+ydt, z0+zdt]. The normal --N at P is then --N=[A, B, C],
possibly with a reversed sign.
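As a standalone sketch of the plane test in Figure 7 (plain structs, normalized ray direction assumed; Plane::Intersect in Listing Four is the equivalent using the article's Vector class):

```cpp
struct Vec3 { double x, y, z; };

// Intersect ray V0 + t*Vd with the plane Ax + By + Cz + D = 0,
// where [A,B,C] is the unit normal. Returns -1 for a parallel ray
// or an intersection behind the observer.
double PlaneIntersect(Vec3 V0, Vec3 Vd, double A, double B, double C, double D)
{
    double denom = A*Vd.x + B*Vd.y + C*Vd.z;   // ray direction . normal
    if (denom == 0) return -1;                 // ray parallel to the plane
    double t = -(A*V0.x + B*V0.y + C*V0.z + D) / denom;
    return (t > 0) ? t : -1;                   // keep only points in front
}
```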
--A.M.
Figure 1 A sample image generated by the RAY ray-tracing program (see Listing
One).
Figure 2 Forward ray tracing.
Figure 3 Backward ray tracing.
Figure 4 The light model.
Figure 5 (a) General equation for a sphere; (b) replacing a point of the light
ray --V(t) in the equation in (a) and solving for t0; (c) solving for t1; (d)
the normal --N in the point P.
Figure 6 Computing the distance of the point nearest to the center of the
sphere.
Figure 7 Replacing a point of the light ray in the plane equation.
Table 1: Overloaded vector operations.
Table 2: VESA modes.
 Mode   Resolution (GetMaxX x GetMaxY)   Colors
 0x100  640x400                          256
 0x101  640x480                          256
 0x102  800x600                          16
 0x103  800x600                          256
 0x104  1024x768                         16
 0x105  1024x768                         256
 0x106  1280x1024                        16
 0x107  1280x1024                        256
Table 3: RAY VESA routines.
 Routine                            Description
 int VESA_Close()                   Closes VESA mode and returns to text mode.
 int VESA_GetMode()                 Returns current VESA mode.
 void VESA_PutPixel(int x,          Displays a pixel of color color at
   int y, int color)                the position (x,y).
 void VESA_WritePixel(int x,        Displays a line of n pixels starting at
   int y, int n, char *color)       (x,y). Optimized to copy pixel values in
                                    one or two operations using direct memory
                                    addressing to video memory.
 void VESA_ShowPalette()            Displays palette contents by drawing one
                                    horizontal line using each available color.
 void VESA_LoadBlackPalette()       Initializes memory palette so that all
                                    displayed colors are black.
 void VESA_LoadBWPalette()          Initializes memory palette to hold
                                    graduated gray-scale shading.
 void VESA_LoadColor(int i,         Loads a color into memory-palette entry i,
   int R, int G, int B)             with the given RGB components.
 void VESA_ActivatePalette()        Activates memory palette and copies all
                                    predefined entries to the video palette
                                    in the VGA card.
 int VESA_SetMode(int Mode,         Initializes a given VESA mode. See Table 2.
   int GetMaxX, int GetMaxY)
Table 4: TARGA type-1 format.
 Field Bytes Uncompressed 
 Color-mapped Images 
 1 1 Number of bytes in field 6
 2 1 Color-map type: 1
 3 1 Image-type code: 1
 4 5 Color-map specification
 2 4.1. Color-map origin: 0
 2 4.2. Color-map length: 256
 1 4.3. Color-map entry size: 24
 5 10 Image specification
 2 5.1. X-Origin: 0
 2 5.2. Y-Origin: 0
 2 5.3. Width of the image
 2 5.4. Height of the image
 1 5.5. Image pixel size: 8
 1 5.6. Image descriptor: 0x20
 6 Variable Identification field
 7 Variable Color-map data (RGB):
 3 bytes/entry in map
 8 Variable Image data: 1 byte/pixel
Table 5: TARGA type-2 format.
 Field Bytes Uncompressed RGB Images 
 1 1 Number of bytes in field 6
 2 1 Color-map type: 0
 3 1 Image-type code: 2
 4 5 Color-map specification
 2 4.1. Color-map origin: 0
 2 4.2. Color-map length: 0
 1 4.3. Color-map entry size: 0
 5 10 Image specification
 2 5.1. X-Origin: 0
 2 5.2. Y-Origin: 0
 2 5.3. Width of the image
 2 5.4. Height of the image
 1 5.5. Image pixel size: 24
 1 5.6. Image descriptor: 0x20
 6 Variable Identification field
 7 0 Color-map data (RGB): empty
 8 Variable Image data: 3 bytes RGB/pixel

Listing One 

/***********************************************************************/
/*** (C) Copyright A.MANGEN 1994 ***/

/*** PROJECT : RAY TRACING PROGRAM ***/
/*** PROGRAM NAME: RAY.H 1.1 ***/
/*** DESCRIPTION : Ray Tracing - Main Include File ***/
/***********************************************************************/

/* Constants for associative palette */
const A=63; /* Size of the associative palette */
const A0=10; /* Minimum level of visibility */

/* Constants for Ray Tracing */
#define Diffuse /* Implement diffuse light */
#define Specular /* Implement specular light */
#define Shadow /* Implement objects shadows */
#define Reflections /* Implement ray reflections */



Listing Two 

/***********************************************************************/
/*** (C) Copyright A.MANGEN 1994 ***/
/*** PROJECT : RAY TRACING PROGRAM ***/
/*** PROGRAM NAME: RAY1.CPP 1.1 ***/
/*** DESCRIPTION : Ray Tracing Main Program ***/
/*** Generate one single image on screen ***/
/***********************************************************************/

#include <stdio.h>
#include "raysvga.h"
#include "rayobj.h"

#define RD 0.9 // Diffuse reflection factor
#define RS 0.9 // Specular reflection factor
#define RR 0.9 // Reflection factor
#define MAXREF 1 // Maximum number of reflections

extern unsigned _stklen = 16384U;

void main()
{ Color Ambient(40,40,40);

 Scene MyScene(320,-150,240,Ambient);

 Color CWhite(200,200,200);
 Color CGreen(0,200,0);
 Color CRed(200,0,0);
 Color CBlue(0,0,200);

 Vector MyLight(700,-400,800,-0.5,-1,0);
 MyScene.AddLight(!MyLight,CWhite);

 Sphere S0(-1900,2700,240,1300,CRed,RD,RS,RR,MAXREF);
 Sphere S1(2400,2700,240,1300,CBlue,RD,RS,RR,MAXREF);
 Sphere S2(320,1800,300,300,CGreen,RD,RS,RR,MAXREF);
 Plane P0(0,0,1,1500,CGreen,RD,RS,RR,MAXREF);

 MyScene.RayTrace(VESA_640x480x256,640,480,0,NULL,7,3);
}




Listing Three

/***********************************************************************/
/*** (C) Copyright A.MANGEN 1994 ***/
/*** PROJECT : RAY TRACING PROGRAM ***/
/*** PROGRAM NAME: RAYOBJ.H 1.1 ***/
/*** DESCRIPTION : Ray Tracing Objects - Include File ***/
/***********************************************************************/

#include <math.h>
#define min(a,b) ((a)<(b)?(a):(b))
#define max(a,b) ((a)>(b)?(a):(b))
typedef float Coord; // Point Coordinate

/*** Color: Base class for color. Overloaded operators are declared inline 
in order to optimize speed. ***/
typedef float Col; // Color Component
class Color
{
public:
 Col R,G,B; // Red,Green,Blue Colors
 Color(Col r=0,Col g=0,Col b=0) // Inline Constructor
 { R=r; G=g; B=b; }
 Color operator +(Color & C) // Overloaded + operator
 { return Color(R+C.R,G+C.G,B+C.B); }
 Color operator *(Col f) // Overloaded * operator
 { return Color(f*R,f*G,f*B); }
 Col GrayScale() // GrayScale function
 { Col W=0.39*R+0.50*G+0.11*B;
 return min(255,W);
 }
 Col Associate(int Spread,int Search); // Associate function
};
/**** Vector: Base class for vector. Overloaded operators are declared inline 
in order to optimize speed. ***/
class Vector
{
public:
 Coord X,Y,Z,dX,dY,dZ; // Vector Direction
 Vector(Coord x=0,Coord y=0,Coord z=0, // Inline Constructor
 Coord dx=0,Coord dy=0,Coord dz=0)
 { X=x; Y=y; Z=z; dX=dx; dY=dy; dZ=dz; }
 Vector operator *(Coord f) // Overloaded * : fV
 { return Vector(X,Y,Z,f*dX,f*dY,f*dZ); }
 Vector operator ^(Coord f) // Overloaded ^ : V(t)
 { Coord N=sqrt(dX*dX+dY*dY+dZ*dZ);
 return Vector(X+f*dX/N,Y+f*dY/N,Z+f*dZ/N,dX,dY,dZ); }
 Coord operator *(Vector V) // Overloaded * : Scalar product
 { return(dX*V.dX+dY*V.dY+dZ*V.dZ); }
 Vector operator +(Vector V) // Overloaded + : Vector addition
 { return Vector(X,Y,Z,dX+V.dX,dY+V.dY,dZ+V.dZ); }
 Vector operator -(Vector V) // Overloaded - : Vector subtraction
 { return Vector(X,Y,Z,dX-V.dX,dY-V.dY,dZ-V.dZ); }
 Vector operator !() // Overloaded ! : Norm
 { Coord N=sqrt(dX*dX+dY*dY+dZ*dZ);
 return Vector(X,Y,Z,dX/N,dY/N,dZ/N);
 }

 Vector operator -() // Overloaded - : Unary -
 { return Vector(X,Y,Z,-dX,-dY,-dZ); }
 Coord Length() // Length of vector
 { return sqrt(dX*dX+dY*dY+dZ*dZ); }
};
/*** Point : Generic base class with linked list ***/
#define NO_INTERSECTION 1e99
#define DEFAULT_RD 0.9 // Diffuse reflection factor
#define DEFAULT_RS 0.0 // Specular reflection factor
#define DEFAULT_RR 0.0 // Reflection factor
#define DEFAULT_MAXREF 0 // Maximum number of reflections
class Point
{
protected:
 Point *Prev,*Next; // Integrated linked list
 Coord X,Y,Z; // Point Coordinates
public:
 Color C; // Point Color
 Col Rd; // Diffuse reflection factor
 Col Rs; // Specular reflection factor
 Col Rr; // Reflection factor
 int MaxRef; // Maximum number of reflections
 int MakeShadow; // Gives shadow or not
 Point(Coord x,Coord y,Coord z, // Constructor
 Color c,
 Col rd=DEFAULT_RD,
 Col rs=DEFAULT_RS,
 Col rr=DEFAULT_RR,
 int maxref=DEFAULT_MAXREF,
 int makeshadow=TRUE);
 ~Point(); // Destructor
 Point *GetNext() // Get Next Point in linked list
 { return Next; };
 virtual Coord Intersect(Vector V) = 0; // Intersection method - designed
 // to be overridden by descendants
 virtual Vector GetNormal(Vector P) = 0; // Get normal
};
/*** Sphere : Inherited Point for spheres ***/
class Sphere:public Point
{
 Coord R; // Radius of the sphere
public:
 Sphere(Coord x,Coord y,Coord z, // Inline Constructor
 Coord r,Color c,
 Col rd=DEFAULT_RD,
 Col rs=DEFAULT_RS,
 Col rr=DEFAULT_RR,
 int maxref=DEFAULT_MAXREF,
 int makeshadow=TRUE)
 :Point(x,y,z,c,rd,rs,rr,maxref,makeshadow) { R=r; };
 virtual Coord Intersect(Vector V); // Intersection method
 virtual Vector GetNormal(Vector P) // Get normal
 { return Vector(P.X,P.Y,P.Z,P.X-X,P.Y-Y,P.Z-Z); }
};
/*** Plane : Inherited Point for planes ***/
class Plane:public Point
{
 Coord D; // Position of the plane
public:

 Plane(Coord x,Coord y,Coord z, // Inline Constructor
 Coord d,Color c,
 Col rd=DEFAULT_RD,
 Col rs=DEFAULT_RS,
 Col rr=DEFAULT_RR,
 int maxref=DEFAULT_MAXREF,
 int makeshadow=TRUE)
 :Point(x,y,z,c,rd,rs,rr,maxref,makeshadow) { D=d; };
 virtual Coord Intersect(Vector V); // Intersection method
 virtual Vector GetNormal(Vector P) // Get normal
 { return Vector(P.X,P.Y,P.Z,X,Y,Z); }
};
/*** Scene : Scene object ***/
const MAX_LIGHT=10;
class Scene
{ Coord CameraX,CameraY,CameraZ; // Position of the camera
 Color Ambient; // Ambient luminosity
 int NLight; // Number of lights
 Vector LightV[MAX_LIGHT]; // Lights orientation
 Color LightColor[MAX_LIGHT]; // Lights color
 char ScreenLine[VESA_Max_Width]; // Screen buffer
 char TGALine[3*VESA_Max_Width]; // TARGA File buffer
public:
 Scene(Coord camerax,Coord cameray,Coord cameraz,Color ambient);
 int AddLight(Vector V,Color I);
 Coord ClosestIntersect(Vector V,
 Point **ObjectMin,int ShouldMakeShadow=FALSE);
 void Ray(int N,Vector V,Coord f,Color & I);
 void RayTrace(int Mode,int GetMaxX,int GetMaxY,int BW,
 char *FileName=NULL,int Spread=0,int Search=1);
};



Listing Four

/***********************************************************************/
/*** (C) Copyright A.MANGEN 1994 ***/
/*** PROJECT : RAY TRACING PROGRAM ***/
/*** PROGRAM NAME: RAYOBJ.CPP 1.1 ***/
/*** DESCRIPTION : Ray Tracing Objects ***/
/***********************************************************************/
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#include "ray.h"
#include "raysvga.h"
#include "raytga.h"
#include "rayobj.h"

/*** Error : Standard Error Routine ***/
#define Here __FILE__,__LINE__
void Error(char *File,int Line,char *Format,...)
{ va_list arg_ptr;
 va_start(arg_ptr,Format);
 printf("(%s,%d) ",File,Line);

 vprintf(Format,arg_ptr);
 va_end(arg_ptr);
 exit(1);
}
/*** Color : Base class for color ***/
int AssociativeAllocated; // Allocation index
char huge AssociativePalette[A+1][A+1][A+1]; // Associative Palette
void Init_AssociativePalette() // Initialize palette
{ AssociativeAllocated=1;
 _fmemset(AssociativePalette,0,sizeof(AssociativePalette));
}
Col Color::Associate(int Spread,int Search) // Search a color in palette
{ int C;
 int R0=(int) R*(A+1)/256; R0=min(A,R0); R0=max(A0,R0);
 int G0=(int) G*(A+1)/256; G0=min(A,G0); G0=max(A0,G0);
 int B0=(int) B*(A+1)/256; B0=min(A,B0); B0=max(A0,B0);
 if((C=AssociativePalette[R0][G0][B0])!=0) // Found an existing color
 return(C);
 else if(AssociativeAllocated==255) // Palette is full
 { for(int R1=R0;R1<=min(A,R0+Search);R1++)
 for(int G1=G0;G1<=min(A,G0+Search);G1++)
 for(int B1=B0;B1<=min(A,B0+Search);B1++)
 if((C=AssociativePalette[R1][G1][B1])!=0)
 return(C); // Look for an approximate
 for(R1=max(0,R0-Search);R1<=R0;R1++)
 for(int G1=max(0,G0-Search);G1<=G0;G1++)
 for(int B1=max(0,B0-Search);B1<=B0;B1++)
 if((C=AssociativePalette[R1][G1][B1])!=0)
 return(C); // Look for an approximate
 return(0); // Found nothing
 }
 else // Allocate a new one
 { VESA_LoadColor(AssociativeAllocated,
 min(63,R/4),min(63,G/4),min(63,B/4));
 if(AssociativeAllocated%10==1) // Avoid video flickering
 VESA_ActivatePalette(); // Activate VGA palette
 for(int R1=R0;R1<=min(A,R0+Spread);R1++) // Spread color
 for(int G1=G0;G1<=min(A,G0+Spread);G1++)
 for(int B1=B0;B1<=min(A,B0+Spread);B1++)
 if(AssociativePalette[R1][G1][B1]==0)
 AssociativePalette[R1][G1][B1]=AssociativeAllocated;
 return(AssociativeAllocated++); // Return new index
 }
}
/*** Point : Generic base class with linked list ***/
Point *Head=NULL; // Linked List Header
Point *Trail=NULL; // Linked List Trailer
 // Point Constructor
Point::Point(Coord x,Coord y,Coord z,
 Color c,Col rd,Col rs,Col rr,int maxref,int makeshadow)
{ X=x; Y=y; Z=z; C=c; // Data Encapsulation
 Rd=rd; Rs=rs; Rr=rr; MaxRef=maxref; MakeShadow=makeshadow;
 Next=NULL; // Build up linked list
 if(Head==NULL) Head=this;
 if(Trail!=NULL) Trail->Next=this;
 Prev=Trail; Trail=this;
};
Point::~Point()
{ if(Head==this) // Clean up linked list

 Head=Next;
 if(Trail==this)
 Trail=Prev;
 if(Prev!=NULL)
 Prev->Next=Next;
 if(Next!=NULL)
 Next->Prev=Prev;
}
/*** Sphere : Inherited Point for spheres ***/
Coord Sphere::Intersect(Vector V)
{ Vector OC(X,Y,Z,X-V.X,Y-V.Y,Z-V.Z);
 Coord tpp=OC*V;
 Coord tdc2=R*R-(OC*OC-tpp*tpp);
 if(tdc2>0) // Is there an intersection
 { tdc2=sqrt(tdc2);
 Coord t=min(tpp-tdc2,tpp+tdc2);
 if(t>0) return(t); // If in front of observer
 else return(NO_INTERSECTION); // Else behind observer
 }
 else return(NO_INTERSECTION); // No intersection found
};
/*** Plane : Inherited Point for planes ***/
Coord Plane::Intersect(Vector V)
{ Coord Denom=X*V.dX+Y*V.dY+Z*V.dZ;
 if(Denom==0) return(NO_INTERSECTION); // No intersection found
 else
 { Coord t=-(X*V.X+Y*V.Y+Z*V.Z+D)/Denom;
 if(t>0) return(t); // If in front of observer
 else return(NO_INTERSECTION); // Else behind observer
 }
};
/*** Scene : Scene object ***/
Scene::Scene(Coord camerax,Coord cameray,Coord cameraz,Color ambient)
{ CameraX=camerax; CameraY=cameray; CameraZ=cameraz; // Data Encapsulation
 Ambient=ambient; NLight=0;
};
int Scene::AddLight(Vector V,Color I)
{ if(NLight<MAX_LIGHT)
 { LightV[NLight]=V; // Add a new light in scene
 LightColor[NLight]=I;
 NLight++;
 return(TRUE);
 }
 else return(FALSE); // Too many lights
} // See constant MAX_LIGHT
Coord Scene::ClosestIntersect(Vector V, // Search for the
 Point **ObjectMin,int ShouldMakeShadow) // closest intersection
{ Coord t,tmin=NO_INTERSECTION;
 Point *Object=Head;
 *ObjectMin=NULL;
 do
 { if(!ShouldMakeShadow || Object->MakeShadow)
 { t=Object->Intersect(V);
 if((t<tmin)&&(t>0))
 { tmin=t; *ObjectMin=Object; }
 }
 Object=Object->GetNext();
 } while (Object!=NULL);
 return(tmin);

}
void Scene::Ray(int N,Vector V,Coord f,Color & I)
{ Point *ObjectMin=NULL,*ObjectShadowMin=NULL;
 Coord t=ClosestIntersect(V,&ObjectMin); // Compute intersection tmin
 if(ObjectMin!=NULL) // with closest ObjectMin
 {
#ifdef Diffuse
 Vector P=V^t; // Point of intersection
 Vector Un=!ObjectMin->GetNormal(P);
 for(int i=0;i<NLight;i++) // Now compute the color
 // for each light source
 { Vector Us(P.X,P.Y,P.Z,LightV[i].X-P.X,LightV[i].Y-P.Y,LightV[i].Z-P.Z);
#ifdef Shadow
 Coord tshadow=ClosestIntersect((!Us)^0.01,&ObjectShadowMin,TRUE);
 Coord Dist=Us.Length();
 if((ObjectShadowMin==NULL)||(tshadow>Dist))
#endif // No other object found
 { Coord Diff=(!Us)*Un; // on lightening path
 I=I+ObjectMin->C*f*ObjectMin->Rd*max(0,Diff);
#ifdef Specular
 if((N==0)&&(ObjectMin->Rs>0))
 { Vector Ur=Un*2*(Us*Un)-Us; // Specular reflection
 Coord Spec=(!Ur)*(-V);
 I=I+LightColor[i]*f*ObjectMin->Rs*pow(max(0,Spec),20);
 }
#endif
 }
 }
#ifdef Reflections // Check maximum number of reflections/refractions
 if((N<ObjectMin->MaxRef)&&(ObjectMin->Rr>0))
 { Vector Uw=-Un*2*(V*Un)+V;
 Uw=Uw^0.01; // Add some distance along the vector to
 // avoid intersection with the same object
 Ray(N+1,Uw,f*ObjectMin->Rr,I);
 }
#endif
#endif
 }
}
void Scene::RayTrace(int Mode,int GetMaxX,int GetMaxY,int BW,
 char *FileName,int Spread,int Search)
{ if(!VESA_SetMode(Mode,GetMaxX,GetMaxY))
 Error(Here,"Cannot set VESA mode 0x%x (%dx%d)",Mode,GetMaxX,GetMaxY);
 switch(BW)
 { case 0:VESA_LoadBlackPalette(); break;
 case 1:VESA_LoadBWPalette(); break;
 }
 VESA_ActivatePalette(); // Initialize screen palette
 Init_AssociativePalette(); // Initialize memory palette
 TGAFile TGA(FileName,GetMaxX,GetMaxY,BW); // Create TARGA file
 for(int Z=GetMaxY;Z>0;Z--) // Scan all pixels
 { for(int X=0;X<GetMaxX;X++)
 { // Build vector from camera to (X,0,Z)
 Vector V(CameraX,CameraY,CameraZ,X-CameraX,0-CameraY,Z-CameraZ);
 Color I=Ambient;
 Ray(0,!V,1.0,I); // Cast a ray
 switch(BW)
 { case 0:
 ScreenLine[X] =I.Associate(Spread,Search);

 TGALine[3*X] =min(255,(int) I.B);
 TGALine[3*X+1]=min(255,(int) I.G);
 TGALine[3*X+2]=min(255,(int) I.R);
 break;
 case 1:
 TGALine[X]=ScreenLine[X]=I.GrayScale();
 break;
 }
 }
 VESA_WritePixel(0,GetMaxY-Z,GetMaxX,ScreenLine);
 TGA.WritePixel(GetMaxX,TGALine);
 }
 VESA_ActivatePalette(); // Freshen screen palette
 Beep();
 if(FileName==NULL)
 { (void) getchar();
 VESA_ShowPalette();
 }
 VESA_Close();
 printf("Colors used : %d\n",AssociativeAllocated);
};









































July, 1994
Lotfi Visions Part 1


An interview with Lotfi Zadeh, the father of fuzzy logic 




Jack Woehr


Jack is a frequent contributor to DDJ and can be contacted at jax@cygnus.com.


Even at 73 years of age, Lotfi Zadeh, the father of fuzzy logic, has an
energetic stage presence. Nowhere is this more evident than in the rapt
attention he commands when presenting his paper, "Fuzzy Logic: Issues,
Contentions and Perspectives," at the 22nd Annual ACM Computer Science
Conference in Phoenix, Arizona on March 8, 1994. 
It is during this session that, once again, Zadeh's colleague and friendly
personal gadfly, Professor William Kahan, rises to challenge, for three full
minutes, Zadeh's lifework as an assault of illogic upon the scientific
foundation of control engineering. "A scientific idea is one which contains
within it the germ of a refutation," says Kahan. "A test can then be posited
whereby the hypothetical refutation can be proven or disproven. Fuzzy logic
has no scientific content because it doesn't assert anything upon which we can
model such a hypothetical refutation."
Zadeh's lips tighten into a tolerant smile. He toys with the lapels of his
suit coat, he lowers his eyes and rocks back and forth slightly at the podium
as he again hears the familiar arguments.
Afterwards, Zadeh (referred to as "LZ" in the following interview) and I chat
on subjects ranging from the theory and economics of fuzzy logic, to AI,
fractals, philosophy, and Zadeh's boyhood in Stalin's Soviet Union. We are
joined by William Kahan (WK) who, like Zadeh, is a professor at the University
of California, Berkeley, and John Osmundsen (JO), associate director, public
affairs, of the ACM.
In the first installment of this two-part article, Zadeh examines the
philosophical underpinnings of fuzzy logic, how it relates to disciplines such
as fractals and AI, and his youth in the USSR and Iran. Next month, we'll
discuss fuzzy applications, such as Japan's Sendai train, and hear in detail
what Professor Kahan thinks of fuzzy logic. 
DDJ: What you said in your lecture today that struck me the most is that fuzzy
logic is a means of presenting problems to computers in a way akin to the way
humans solve them.
LZ: There is one way of expressing that, which I use sometimes: The role model
for fuzzy logic is the human mind. If you examine the way the human mind
functions, you find that the human mind has this remarkable capability to deal
with information which is incomplete, imprecise, uncertain, and so forth.
Computers do not have that capability to any significant extent. 
Classical logic is normative. Classical logic in effect tells you, "That's the
way you should be reasoning." There is a big difference in that sense between
the spirit of classical logic, which is prescriptive, and the spirit of fuzzy
logic, which is descriptive. That is, it merely asks the question "How do you
reason about this or that?" It's like translation. The translator does not
take responsibility for what he or she translates.
DDJ: And the analogy of translation to fuzzy logic is that fuzzy logic is
simply a translation mechanism?
LZ: Well, there are many facets to fuzzy logic, so you cannot summarize the
whole thing in one sentence. I'm talking here about one particular facet of
fuzzy logic. That facet has to do with most practical applications today in
the realm of consumer products and in many other fields, where what you do is
you use the language of fuzzy rules. You start with a human solution and you
translate it into that language. But it doesn't mean that that is all there is
to fuzzy logic, because there are many other things that fall within the
province of fuzzy logic that would not fit this description. The essence of
fuzzy logic is that everything is a matter of degree, including the notion of
subsethood.
DDJ: Is this a philosophical point of view that finds itself translated into
computer logic--the notion that everything is a matter of shades, and
varyings, and degrees? Is there a personal philosophical viewpoint that is
finding its expression in computer science in your work?
LZ: Not yet, but consider the following: The real world that we live in is
very fuzzy, very imprecise, very uncertain. The theories that we have
constructed are, on the other hand, very precise. We have mathematics, we have
all kinds of things which are very precise in nature.
Now, these theories have proved to be very successful in many respects, but
their ability to come to grips with the analysis of complex systems--I mean
complex not just in terms of number of components, for example, a chip that
has two million resistors_. When I say "complex," I mean "complex economic
systems" and things of that kind, systems with many components, with
relationships that are not well defined. The successes of classical
techniques, in connection with certain kinds of systems, have led us to
believe they can be successful also in dealing with the other types of
systems, such as economic systems.
There is no question about it, classical mathematics has proved to be very
successful in astronomy, where you compute the orbits of stars and planets.
But people then conclude from that that you can apply mathematics equally
successfully to laws of economic systems. And that's where I question this
thing.
I say "No, these systems don't fit. You need a concept of classes which don't
have well-defined boundaries." And if you do, as fuzzy logic attempts to do,
construct such a framework, then you enhance your ability to model economic
systems and other systems of that kind. It doesn't mean you'll be able to
solve all the problems, but at least you'll be able to do much more.
This applies, for example, to natural languages. Notice that we didn't make
that much headway in machine translation, and essentially nothing in machine
summarization. If I ask you to write a program that will look at a book and
summarize it, I think you'll say that we cannot do that. Not only can't we do
it today, there's no way that we can conceive of doing that in the foreseeable
future.
The situation is this, that there is this tradition of believing that
conventional, traditional techniques have the power within them to solve some
of these problems. My position is that this is not the case.
DDJ: It sounds like you are making a point analogous to that of Benoit
Mandelbrot in The Fractal Geometry of Nature, in which he pointed out that
mathematicians at the turn of the twentieth century, including his father...
LZ: His uncle...
DDJ: ...his uncle, had examined fractal forms and had been roundly criticized in
the math world for studying these "monstrosities," and asked why they did not
go back to studying genuine geometrical forms such as the sphere. Mandelbrot
points out that there are no spheres in nature, no squares, nothing which has
perfect form to it. It's just a question of to what degree our measurement
instruments are able to penetrate the form and discover the imperfections that
nature has placed there--the "sacred error," as it were. 
LZ: I know Mandelbrot quite well, and hold great respect and admiration for
him. Mandelbrot stays within the traditional framework. He does consider
objects, fractals, and he did raise questions of the kind that other people
somehow did not raise, for example: What is the length of the coastline of
Brittany? To me, that's a very good question, because somehow people took it
for granted that there is an answer to that question. But he pointed out that
there isn't an answer to that question, because it depends on the degree of
resolution.
DDJ: It ends up being a question about your measuring instruments, and not
about the problem you're trying to solve.
LZ: That's right. So I think it was a very incisive observation that questions
like that cannot be answered within the traditional framework. But what
Mandelbrot tried to do--and I think it's a very significant accomplishment,
but different from what you do in fuzzy logic--he attempted to come up with a
reasonably precise theory of this sort of thing. So he talks about fractional
dimension. Basically, Mandelbrot is a mathematician by training, and he has
not abandoned his home, so to speak.
So to me, the theory of fractals is an important theory, and it helped to
focus attention on issues that were not really properly formulated before. But
by itself, it stays within the traditional paradigm. In other words, you're
still committed to the goal of mathematicians: to come up with theorems. I'm
not saying that that goal is not a worthwhile goal, I'm merely saying that in
many cases it is unattainable.
DDJ: When you come down to the field of practical engineering, especially in
problems of embedded control, the problem of making machines that can in real
time make, if not perfect decisions, then reasonable decisions...
LZ: Certainly.
DDJ: My father is 74 years old, and he doesn't know which end of the computer
you hook up the airhose to, but he knows what fuzzy logic is because he has
been an amateur photographer for 60 years, and his Japanese camera can
determine the illumination of dim objects against bright background
light--using fuzzy logic. "Fuzzy logic" is also part of the advertising.
LZ: Minolta uses fuzzy logic very extensively.
DDJ: Do you derive any royalties from this?
LZ: Zero. The thought of applying for a patent did not even occur to me at the
time I did the work.
DDJ: Is this an oversight which you regret?
LZ: No, not at all. Perhaps I would be a rich man, but so long as I can live
in reasonable comfort, that's enough.
DDJ: Is it possible that if you had patented this technology it would not have
been so readily adopted?
LZ: It's difficult to predict what could have happened, because sometimes
things evolve in [an] unpredictable fashion. Just to give you an example, the
first consumer product [to use fuzzy logic] was a Panasonic showerhead which
came out in 1987. If someone had asked me in 1986 what sort of applications
would you expect, it wouldn't have occurred to me that there would be all
these applications in the realm of consumer goods.
DDJ: What did you envision?
LZ: Industrial applications, yes. Industrial control, traffic control,
applications in linguistics, yes. But not washing machines, not microwave
ovens, none of those things would have occurred to me. So it shows that even a
reasonably well-informed person in that field may find it very difficult to
predict how things will evolve.
I could see that control was going to be an application, and in 1972 I wrote a
paper called "A Rationale for Fuzzy Control." But my colleagues in the area of
control didn't share that feeling, and to this day, most of the
control-systems community is very antagonistic towards fuzzy logic, as is the
AI community.
DDJ: I can understand the control-engineering community's wariness, which
possibly is based on unfamiliarity, since in the classic control theory,
proportional-integral derivative, and the like, so much time has been
committed to working out these theories, and so much of the training of the
people involved has dealt with this. You don't want to teach an old dog, in
the form of a well-established school of engineering, new tricks.
In the AI field, I wonder if there is professional jealousy that AI didn't
quite take off as a commercial proposition, whereas fuzzy logic has done so.
LZ: The issue is somewhat complex. Incidentally, many of the people within the
AI community were very hostile toward fuzzy logic. Among them are good
friends, so there is nothing personal about it.
I think that what happened is that, for historical reasons, AI embraced
classical logic, classical predicate logic, symbolic logic. And so, the
intellectual leaders of AI, who were deeply committed to that kind of logic,
took the position of prescriptive logic, telling you "that is the way you
should be reasoning." Since computers are symbol-manipulation machines, it
appeared that computers could implement that kind of reasoning and go beyond
what humans can do.
AI has hitched itself to symbol manipulation. As a result, most people in AI
dislike numerical computation. They dislike not just fuzzy logic, but they
dislike probability theory, neural networks, anything that involves numerical
computations. Things are changing, and I think that gradually you'll find the
number of people within AI who use numeric computations, [and] the number of
papers at AI conferences in which numerical computations are used in one form
or another, are increasing. It's a gradual process. If you go back five or ten
years you will find that AI was almost entirely symbol-manipulation oriented.
Fuzzy logic is not symbol-manipulation oriented. It's computationally
oriented. Because of that, it simply did not sit too well with the AI
community.

Also, it's a human thing: You sit at a table and the pie is sliced, and the
more people who sit at it, the smaller your slice of pie gets. It has happened
in a number of fields. There is resistance to something that may result in a
smaller piece of pie for yourself and perhaps, more importantly, may
depreciate the value of your knowledge. This is what is happening in AI now,
because many people will use neural networks; they use, to a lesser extent,
fuzzy logic; they use things that were not in the mainstream of AI. Many
people in AI see that as a threat. They see that as an intrusion of people
whose thinking is different from their own.
In that respect, it's not very different from many of the sociological
phenomena that we observe. Take the way people dress. People who are committed
to a classical-logicalesque reasoning are people who dress very properly. They
have a tie and a shirt and the colors match, shiny shoes, starched shirt, and
so forth.
The fuzzy-logic people dress informally. Their shirt may have colors that
don't match perfectly, they don't worry too much if the shoes are this kind or
that kind. These people do have more traditional clothes somewhere in their
closet, so that if they have to go to a party where that's what's expected,
they have something to put on. But not the other way around! The traditional
people would not have the other kinds of clothes in their closet, because they
would never stoop to wearing such a thing.
People who dress very conservatively, when they look at people who dress very
informally, they don't like it. They feel that "these people are not my
people." And the people who dress very informally, when they look at people
who dress formally, say, "these people are old-fashioned, they are not my
people."
DDJ: So you subscribe to the theories advanced by Thomas Carlyle's fictional
Herr Teufelsdröckh, who said in Sartor Resartus that clothes really do make
the man.
LZ: (Laughs) So you do really have different philosophies. There is a defense
mechanism that I have observed which is part of all of us, that if there is
something that is unfamiliar to you, you convince yourself that it is not
worth learning, because if that were not the case, then you would have to
learn it. But if you convince yourself that it is garbage, or uninteresting,
then that absolves you from the need.
DDJ: As long as we are on the subject of predisposition to certain viewpoints,
may I ask you a personal question?
LZ: Sure.
DDJ: Are you a Moslem?
LZ: I am not practicing. Let me explain something about my background. I am of
Iranian descent, but I was born in the Soviet Union, not in Iran. My father
was a correspondent for Iranian newspapers. So I was born there but was not a
Soviet citizen. When I was ten, my family moved back to Iran.
I went through the first three grades of elementary school in the Soviet
Union. That was at the height of antireligious propaganda. I was born in 1921,
and we left for Iran in 1931. At that point in the Soviet Union, no one would
dare admit that he or she believed in God. That would have been sort of, sort
of...
DDJ: Suicide?
LZ: Suicide. No one would talk to you, no one would associate with you. That
was the sort of environment in which I grew up in those particular years.
Then when my parents returned to Iran, they placed me in an American
Presbyterian missionary school where we had chapel every day at 10 o'clock, if
you can imagine the transition from the one environment to the other
environment!
Then, after studying there for a few months, I had to leave the school because
the Iranian government at that time was nationalistic, and a law was passed to
the effect that you couldn't go to a school run by foreigners without first
completing an elementary school [education] in Iran. So I had to move from
that school to an Iranian school.
DDJ: This was in the time of Mohammed Reza's father?
LZ: Exactly. Now, at that time Iran was anticlerical. That's what many
Americans don't realize because of the Iranian Revolution. But before the
revolution, Iran was a highly anticlerical country--many of the imams were in
prison--but not a secular one: In Iran, Islam was even then the state
religion.
It was assumed that you were Moslem unless you declared otherwise. So, in that
Iranian school, if a Christian were to visit the school, all the walls had to
be washed. I experienced these extremes of fanaticism.
When I was at the University of Tehran later, there was only one professor who
was known to go to a mosque. That was Bazargan, who later became premier. If
you said that you were going to a mosque, people would laugh at you. That was
essentially the spirit of the times.
As a result of being subjected to these different influences, I became
tolerant of different points of view. Because these people believed
passionately one thing, and these people believed passionately another, and
they couldn't all be correct. I began to realize that people have these
passionate beliefs in something because they are insulated from people who
believe in something else, so it sort of feeds on itself.
DDJ: My question had been aimed at this: It seems that what you have been
teaching for the last two decades, and then some, has been more accessible to
people from certain backgrounds than from others. It doesn't seem to be merely
a question of what programming discipline one comes from. Fuzzy logic seems to
be more accessible to people of different cultures, certainly more accessible
to the Japanese, and to the coterie of Iranian students and former Iranians
that has formed around you in this country, who seem to find these ideas
easily accessible. Do you think that there is anything in cultural makeup that
would make fuzzy logic acceptable?
LZ: In part. But some people try to explain the whole thing from that
particular angle. I would not go that far, but there is some truth to it, in
the following sense: When you talk about such cultures, you are talking about
old cultures. When you talk about western cultures, you are talking about
young cultures. Young people tend to be more dogmatic in their views than old
people, because when you grow older, you become more keenly aware of the fact
that, well, this is true, and that is true. You become more willing to concede
that truth is not a monopoly of some particular way of thinking.
The thing to remember is that all traditions have a certain amount of
validity, and beyond that point, they may become wrong or counterproductive.
In this quest for precision, which is very characteristic of western culture,
that's where fuzzy logic enters the picture. I say "Stop, maybe you should get
off the train. Maybe it's not taking you to the solution to your problem. You
have to reexamine things." And that is why some people become upset about
this.
DDJ: You're introducing a degree of uncertainty into their life.
LZ: That's right. You're saying their goals may not be realizable. These views
were expressed by Professor Kahan during my lecture: "We need more precision,
more logic." He's a mathematician. I have very great admiration for him. You
can see a forceful expression of that kind of thinking in him, and we need
that expression, because it is precisely there that I disagree. I say that
that kind of thinking has its place, but it also has its limits.
DDJ: It seems to presuppose that man has a godlike power to determine,
incontrovertibly, the truth or error of a proposition.
LZ: What I'm very conscious of is that there are many things people accept
without question, even though those things don't make any sense. They accept
it because it's part of the tradition. I have many transparencies that I use
in my lectures that illustrate how people come up [with] very precise
conclusions on the basis of data which are completely unreliable.
Just to give you an example, there was a study made on the effect of removing
lead from gasoline. The conclusion: "The removal of lead from gasoline will
result in 1223 fewer cases of high blood pressure and 237 fewer cases of
this...." It doesn't make any sense. How can you determine these things with
such accuracy? Or predictions that by 1998 there will be exactly so many cases
of AIDS. How can you say this? People accept these things, because that is the
tradition. You're supposed to swallow such assertions.
DDJ: Our society respects statistics as the ancient Babylonians respected the
stars.
LZ: Given that it is respected, that's what people accept. Even if it doesn't
make any sense.
JO: Scientists agree with Professor Zadeh. It's journalists who publish these
numbers.
LZ: By the way, one of the interesting things we have looked at lately is
establishing an "MIQ," a machine intelligence quotient.
DDJ: IQ is a measure that is determined by statistically analyzing those facts
which are the common property of the civilization, and ranking people on the
basis of what percentage [of] those cultural ideas the examinee is in
possession of. How would one begin to establish a quotient on machines?
LZ: You have a standards committee. This standards committee considers various
products and establishes certain criteria. So there are dimensions to
intelligence.
DDJ: Chess computers are rated these days.
LZ: So we would have to come up with a rating system agreed upon by a
standards committee which can put its imprimatur upon the system. Many things
are rated. Consumer Reports rates things.
DDJ: Once again you are proposing something that has tremendous commercial
implication. Advertisers would love to say "Our washing machine has an MIQ of
190."
LZ: That is why I registered MIQ as a trademark! I hope to make some money out
of this! I'm pretty sure that at some time in the future when you open
Consumer Reports and read a report on washing machines, you'll see a column
labeled, "MIQ." It's basically a measure of user-friendliness.
DDJ: You are reeducating America and Japan's young control engineers to a
whole new paradigm of control engineering, yet you are in your seventies and
show no sign of diminishing your activity.
LZ: Thank you. It's one of those situations where today I'm okay, and tomorrow
I may be dead! Professor Kahan gave me this, which reads, "I hope you enjoy
good health long enough to be invited to the White House to receive from the
President a medal for so successfully distracting the Japanese for so long."
DDJ: Does Professor Kahan truly believe that you are leading the control
industry astray?
LZ: I think so. I think he's very serious.
DDJ: Yet, in view of the vast commercial success of these products, how can an
American argue with money?
LZ: We're very good friends. He is very sincere in his feelings that this
whole thing is pernicious. Even if it became so ubiquitous that every product
in Japan and the United States used fuzzy logic, he would still hold onto the
opinion that it is wrong.
JO: Perhaps if you reassured him you would make no fuzzy-controlled nuclear
bombs....
LZ: (Laughs)
JO: But what if you proved that fuzzy logic works?
LZ: That's not good enough, because some people, not Professor Kahan, say that
people aren't really using fuzzy logic, that it's just an advertising gimmick.
Professor [John] McCarthy at Stanford University makes that argument. Other
people say that, well, yes they do use fuzzy logic, but they have not shown
that they could not have attained this same result using standard methodology.
My response is "The Japanese, for instance, are not that stupid. If they could
have achieved these things using standard techniques, why would they go into
fuzzy logic?" It's not just a merchandising gimmick. You can't explain the
whole thing by a merchandising gimmick.
The fuzzy-logic Sendai train started running in 1987. It's very successful.
Now they're going to use it in Tokyo. A merchandising gimmick?


Next Month


In the next installment, Professor Zadeh discusses the Sendai train in detail
and the Japanese approach to fuzzy logic. Then Professor Kahan arrives on the
scene.






July, 1994
A C++ Class for Generating Bar Codes


Outputting bar codes on PCL laser printers




Douglas Reilly


Doug owns Access Microsystems, a software-development house specializing in
C/C++ software development. He is also the author of the BTFILER and BTVIEWER
Btrieve file utilities. Doug can be contacted at 404 Midstreams Road, Brick,
NJ 08724, or on CompuServe at 74040,607.


The application of bar-code technology extends far beyond scanning cans of
peas in your local grocery store. In many industrial applications, bar codes
have become essential for tracking items during production. Bar-code readers
can now scan items as they quickly move down production lines.
With the advent of relatively inexpensive lasers that can scan from some
distance, many areas of the transportation industry use bar codes to track
trucks, trailers, and containers. Federal Express and United Parcel Service
are using both stationary and mobile bar-code readers to track packages. In
short, any application that depends upon accurate entry of information with
little or no user intervention is a candidate for bar codes.
Identity-verification applications can also use bar codes, although magnetic
striping and other technologies are more secure: An identification bar code
can be photocopied. To thwart this, such bar codes can be printed on a red
background, which allows the bar code to be read but not copied.


Bar-Code Properties


Roger Palmer's The Bar Code Book, second edition (Helmers Publishing, 1991)
lists no fewer than 17 different types, or "symbologies," of bar codes, all of
which have several common properties. Each has a character set that's usually
a subset of the ASCII character set. Some are strictly numeric, and others
allow alphabetic and punctuation symbols. Some use characters separated by
white space, while others are continuous, with no white space between
characters. Continuous symbologies are more common.
Some symbologies use only two element widths: narrow and wide. Others use a
variety of widths. Overall symbol width can be fixed or variable. Version A of
the Universal Product Code (UPC) always encodes a fixed number of digits.
Another symbology, Code 128, can encode any number of characters. Many
symbologies have a check character. If this check character is limited to 0
through 9, it's called a "check digit." Most (though not all) symbologies are
bidirectional; that is, they can be scanned left to right or right to left.
The hardware used to read a bar code can also impact the type of bar-code
symbology used. Most bar-code readers are laser scanners, which have the
advantage of being able to read a bar code from some distance. However, laser
scanners are relatively expensive and not very portable. Relatively low-cost
contact scanners are available, but the scanner must come in contact with the
bar code during scanning. This can be difficult when the object being scanned
is not flat (a can, for instance) or when contact requires the cooperation of
a living thing (a cow in a cattle yard or a hospital patient). For contact
scanners especially, a compact bar-code symbology is preferred.


Types of Bar Codes


The most familiar bar-code symbology is UPC version A. UPC is different from
many symbologies in that it is a coding system as well as a symbology: The
data encoded with UPC symbology in itself conveys some meaning.
Each UPC encodes 12 digits (UPC encodes digits only), divided into two
sections of six digits each, surrounded by left, center, and right guard
patterns. The first digit is a "number-system digit." Table 1 defines this
digit. The next five digits are a manufacturer's code, the next five are the
actual product code, and the last is a check digit. Check digits are also
useful for applications where users enter numeric codes into any device that
lacks the ability to validate the value directly. Check digits are designed to
catch common data-entry errors, such as digit transpositions. Figure 1 shows
how to calculate the check digit for UPC numbers. 
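The arithmetic of Figure 1 folds neatly into a single routine. This is a
minimal sketch (the function name is mine, not part of the article's
listings): digits in odd positions, counted from the right, are weighted by
3, and the check digit brings the weighted sum up to a multiple of 10.

```cpp
#include <cassert>
#include <cstring>

// Compute the UPC-A check digit for the 11 data digits (number-system
// digit, 5-digit manufacturer code, 5-digit product code). Odd positions,
// counted from the right, carry a weight of 3.
int upcCheckDigit(const char *digits)
{
    int len = (int)strlen(digits);
    int sum = 0;
    for (int i = 0; i < len; i++)
    {
        int posFromRight = len - i; // rightmost data digit is position 1
        int d = digits[i] - '0';
        sum += (posFromRight % 2 == 1) ? 3 * d : d;
    }
    return (10 - sum % 10) % 10;
}
```

For the example worked in Figure 1, upcCheckDigit("04920005675") returns 2.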
Although it is considered a symbology, postnet (those long and short lines on
the bottom of mail) does not meet the strictest definition of a bar code,
since information is not encoded by the width of bars. Postnet is a "clocked"
technology: The bottom section of the bars helps the scanner track the code.
The postnet symbology includes single tall bars as start and stop codes and
often a check digit calculated such that the sum of all data digits and the
check digit mod 10 is 0.
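Under that rule, computing the postnet check digit is a single pass over the
data digits; a sketch (the name is my own):

```cpp
// Postnet check digit: the digit that makes the sum of all data digits
// plus the check digit a multiple of 10.
int postnetCheckDigit(const char *digits)
{
    int sum = 0;
    for (const char *p = digits; *p; p++)
        sum += *p - '0';
    return (10 - sum % 10) % 10;
}
```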
Another common symbology is Code 128, which has an advantage over the
previously discussed codes: It encodes alphabetic and punctuation characters
as well as digits. Code 128 can also encode two digits into a single
character, in the same fashion that binary-coded decimal (BCD) can pack two
digits into a single 8-bit byte. Code 128 uses three distinct character sets
and "shift" characters to switch from one to the other. The most useful
character sets are set A (digits, alphabetic characters, and most punctuation)
and set C (which includes 100 codes to encode "00" through "99" in a single
character).
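The set-C packing is exactly the BCD analogy: each pair of digits maps
directly to one of the 100 symbol values. A one-line illustration (the
helper name is hypothetical):

```cpp
// In character set C, a pair of ASCII digits packs into a single Code 128
// symbol value from 0 ("00") through 99 ("99").
int code128SetCValue(char hi, char lo)
{
    return (hi - '0') * 10 + (lo - '0');
}
```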


Building Code-128 Bar Codes


Figure 2 is a Code-128 bar code that, when scanned by the appropriate reader,
decodes as the text "CODE 128." Each character consists of three bars and
three spaces, with an overall length of 11 modules. There are no absolute
tolerances for the widths of the bars and spaces, but rather tolerances
relative to the width of a single module.
The width of a single module (call it "x") must be at least 7.5 mils. The
minimum bar height is 15 percent of the total length of the symbol or 0.25
inches, whichever is greater. Code 128 requires a "quiet zone" of white space on either
side of the actual code that is ten times the value of x. The width between
the beginning of one black area to the next, or the beginning of one white
area to the next, must be within 0.20x of the expected starting point.
For the purposes of this discussion, keep in mind that using an x of three
dots on a 300 DPI laser printer produces Code 128 symbols that I've always
found readable, even when laminated in not-entirely-clear plastic.


C++ Classes for PCL Printer Access


To create Code-128 symbols on a PCL 5-compatible laser printer, I wrote the
PCL class declared in PCL.H (Listing One). Listing Two is PCL.CPP, the
implementation of the class. As with all my C++ classes, my goal was to make
the operation of the class as automatic and self-contained as possible. To
that end, the constructor and destructor handle virtually all the
housekeeping for the class.
PCL is the page-description language all HP laser printers (and many laser
printers from other vendors) use to control the output. Unlike controlling a
dot-matrix printer, a page-description language allows access to anywhere on
the page, much like screen handling under MS-DOS. Unlike screen handling,
however, there's little feedback from the printer. Simple questions like,
"Where is the cursor?" cannot be answered using PCL alone. For this reason,
several variables in the class are used as state variables. These record the
current x and y position; the current font; and font size, symbol set, and
intensity settings.
Note that the stored x position is not always accurate, because the class as
it exists has no way of determining the width of a character in a
proportional font. A possible enhancement to the
class would be inclusion of a font-metrics call that would allow the class to
always know the current cursor position, as well as allowing proportional
fonts to be centered.
The constructor's first task is to open a file (with a stdio.h C-style FILE *)
using the filename passed to the constructor, or LPT1 by default. Production
code should also add a test for device availability if the "filename" passed
is actually LPTn, where n is a digit from 1 through 3. Next, the constructor
uses the moveto() member function to position the cursor (the position where
actions, by default, occur). As with all C++ classes, the constructor does
not return a value.
After construction and before the first use, the object can be tested by
casting the object to a FILE * and testing for NULL.
All actual output to the file, including the codes required to move cursor
positions and draw lines, comes through the outtext() member function. This
function would be the ideal place for centralized printer-status testing. The
lineto() member function is central to the business of drawing bar-code
symbols, so it deserves some explanation. PCL has no native support for
drawing lines. Horizontal and vertical lines can be drawn using the
rectangle-drawing functions. All lines are simply one-dot-wide or one-dot-high
rectangles. The lineto() function has no support for lines other than
horizontal or vertical.
The pushPos() and popPos() member functions can be used to save and restore
cursor position before and after performing some action. These functions use a
PCL command to save the position, not relying upon x and y stored by the
class. Other member functions set font values and manipulate the various state
variables.
The destructor simply calls reset() (which issues a PCL reset command) and
closes the output file.


Using the PCL Class to Print Code-128 Bar Codes



The code to print the Code-128 bar code is CODE128.CPP (which, along with
executables and sample data, is available electronically, see "Availability,"
page 3). The bulk of the code initializes an array that controls the lines and
spaces that make up the bar code. The CODE128 structure is made up of an
integer value that holds the actual character value for each code for
character-set A, and a character array that holds 1 for a line and 0 for a
space for each of the 11 modules of a Code-128 bar code. Note that each code
ends with a 0 as the 11th module in the code. The character values for
"unprintable" characters (FF, CR, and so on) are represented as single spaces,
since for any application I've had, these characters haven't been required.
The printCode128Str() function takes a string; x, y, and height values; and a
reference to a PCL object. First, a start character is printed using
printCode128Ch(). Each of the 11 modules is made up of a three-dot-wide line
or space. While three dots is the minimum, the maximum width is up to you,
since Code 128 scales up nicely. For each data character, I loop through the
array of CODE-128 structs to find the correct value and then print it using
printCode128Ch(). Finally, I calculate a check digit. The Code 128 check
character is computed differently than the UPC check digit mentioned
previously: Each data-character value is multiplied by its 1-based position
in the symbol, the products are summed together with the start-character
value (103 for set A), and the total mod 103 is the check character.
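Per the Code 128 specification, that calculation over symbol values (0
through 102) can be sketched as follows; the function name is my own:

```cpp
// Code 128 check character: start-code value (103 for set A, 104 for
// set B, 105 for set C) plus each data symbol value multiplied by its
// 1-based position, all taken mod 103.
int code128Check(int startValue, const int *values, int count)
{
    int sum = startValue;
    for (int i = 0; i < count; i++)
        sum += (i + 1) * values[i];
    return sum % 103;
}
```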
CODE128.CPP uses the PCL class to create bar codes from a list of text lines
in a file whose name is passed as argument 1. To display the text of the code
under the actual bar-code symbol, I use a non-proportional font (compressed
or lineprinter, in this case) and calculate the width of the printed text so
it can be centered within the bar code.


Conclusion


Production code should have error handling of cases where the printer is not
available. A DOS critical-error handler will handle most such errors, but a
BIOS level check for printer status can also be useful if outputting directly
to a printer.
Another useful addition to the PCL class would be the previously mentioned
font-metrics member function that would allow you to determine the width of a
string of proportional-font characters. The ability to draw other elements
besides horizontal and vertical boxes would also be useful. The
printCode128Str() function could be enhanced to take advantage of
character-set C of Code 128, so strings of four or more characters could be
compressed as two digits per 11 module symbol.
Table 1: UPC system digits. (Source: The Bar Code Book, second edition, Roger
C. Palmer, Helmers Publishing, 1991.)
 Digit Meaning 
 0 92,000 manufacturer numbers, 8,000 locally assigned numbers
 1 Reserved
 2 Random-weight consumer packages
 3 Drug products
 4 In-store marking without format
 5 UPC coupons
 6 Manufacturer ID numbers
 7 Manufacturer ID numbers
 8 Reserved
 9 Reserved
Figure 1: Calculating the check digit for UPC numbers. Use the number-system
digit (in this example, 0), the five-digit manufacturer's code (49200), and
the product code (05675). Thanks to Paul Pazniokas at Posse Inc.
Step 1 Start at the right, sum up digits in the odd position:
 0 4 9 2 0 0 0 5 6 7 5
 0 + 9 + 0 + 0 + 6 + 5 = 20
Step 2 Multiply the value from Step 1 by 3:
 20 * 3 = 60
Step 3 Sum up the digits in the even position:
 0 4 9 2 0 0 0 5 6 7 5
 4 + 2 + 0 + 5 + 7 = 18
Step 4 Add the results of Steps #2 and #3:
 60 + 18 = 78
Step 5 The check digit is the smallest number that produces a multiple of 10
when added to the result of Step #4 (in this case, 2).
Figure 2: A Code-128 bar code that decodes as "CODE 128." 

Listing One 


#ifndef PCL_H
#define PCL_H

// just two of MANY symbol sets
#define SYMSET_ROMAN8 8
#define SYMSET_PC8 10

// Just 4 of MANY typefaces.
#define TYPEFACE_CGTIMES 4101
#define TYPEFACE_UNIVERS 4148
#define TYPEFACE_LINEPRT 0
#define TYPEFACE_COURIER 3

class PCL {
 int x;
 int y;
 int curPoints;
 int curTypeface;
 int curBold;

 int curSymset;
 int curProp;
 int curPitch;
 char dest[120];
 FILE *out;
public:
 // dest is destination of printout...
 PCL(char *tdest);
 // destructor
 ~PCL();
 // The next two functions look lots like Borland's BGI interface.
 // This is an intentional modeling.
 int moveto(int newX,int newY);
 int lineto(int destX,int destY);
 void outtextxy(int newX,int newY,char *text);
 void outtext(char *text);
 void setFont(int proportional,int symSet,int points,int bold,int typeface);
 void reset()
 {
 char temp[10];
 if ( out )
 {
 sprintf(temp,"%cE",27);
 outtext(temp);
 }
 }
 void bold(int boldNum=3)
 {
 setFont(curProp,curSymset,curPoints,boldNum,curTypeface);
 }
 void noBold()
 {
 setFont(curProp,curSymset,curPoints,0,curTypeface);
 }
 int getBold()
 {
 return(curBold);
 }
 void compressed()
 {
 char temp[40];
 setFont(0,SYMSET_PC8,10,0,TYPEFACE_LINEPRT);
 if ( out )
 {
 sprintf(temp,"%c&k2S",27);
 outtext(temp);
 }
 }
 void normal()
 {
 char temp[40];
 setFont(0,SYMSET_PC8,12,0,TYPEFACE_COURIER);
 if ( out )
 {
 sprintf(temp,"%c&k0S",27);
 outtext(temp);
 }
 }
 int rectangle(int left,int right,int top,int bottom,int shade);

 int box(int left,int right,int top,int bottom);
 void popPos()
 {
 char temp[40];
 if ( out )
 {
 sprintf(temp,"%c&f1S",27);
 outtext(temp);
 }
 }
 void pushPos()
 {
 char temp[40];
 if ( out )
 {
 sprintf(temp,"%c&f0S",27);
 outtext(temp);
 }
 }
 int fontHeight();
 void setLPI(int lpi)
 {
 char temp[40];
 if ( out )
 {
 sprintf(temp,"%c&l%dD",27,lpi);
 outtext(temp);
 }
 }
 void setVMI(int vmi)
 {
 char temp[40];
 if ( out )
 {
 sprintf(temp,"%c&l%dC",27,vmi);
 outtext(temp);
 }
 }
 void setPitch(int pitch)
 {
 char temp[40];
 if ( out!=0 )
 {
 curPitch=pitch;
 if ( pitch==12 )
 {
 sprintf(temp,"%c&k2S",27);
 }
 else if ( pitch==10 )
 {
 sprintf(temp,"%c&k0S",27);
 }
 else if ( pitch==16 ) // really 16.5 to 16.7, but use an int.
 {
 sprintf(temp,"%c&k4S",27);
 }
 outtext(temp);
 sprintf(temp,"%c(s%dH",27,pitch);
 outtext(temp);

 }
 }
 // just in case you really do want to violate encapsulation...
 operator FILE *()
 {
 return(out);
 }
};
#endif



Listing Two

#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include "pcl.h" // my header that describes the PCL class

PCL::PCL(char *tdest)
{
 if ( tdest==0 )
 {
 strcpy(dest,"LPT1");
 }
 else
 {
 strcpy(dest,tdest);
 }
 // out will be 0 if dest can't be opened.
 out=fopen(dest,"a+b");
 if ( out!=0 )
 {
 char temp[20];
 moveto(0,0);
 // For these and all control codes, refer to the Owner's Manual of a
 // PCL 5 compatible laser printer.
 sprintf(temp,"%c&l0O",27);
 outtext(temp);
 }
}
PCL::~PCL()
{
 if ( out!=0 )
 {
 reset();
 fclose(out);
 out=0;
 }
}
void PCL::outtextxy(int newX,int newY,char *text)
 {
 moveto(newX,newY);
 outtext(text);
 }
// EVERYTHING goes through here. Even member functions that could access the
// FILE * directly use outtext() so that changes to underlying implementation
// (use the streams IO, etc.) are transparent to all external and internal
// functions, except the constructor, destructor, and outtext.

void PCL::outtext(char *text)
 {
 if ( out!=0 )
 {
 fprintf(out,"%s",text);
 }
 }
void PCL::setFont(int proportional,int symSet,int points,int bold,int
typeface)
{
 if ( out )
 {
 char temp[40];
 sprintf(temp,"%c(s%dU",27,symSet);
 outtext(temp);
 sprintf(temp,"%c(s%dP",27,proportional);
 outtext(temp);
 sprintf(temp,"%c(s%dV",27,points);
 outtext(temp);
 sprintf(temp,"%c(s%dB",27,bold);
 outtext(temp);
 sprintf(temp,"%c(s%dT",27,typeface);
 outtext(temp);
 // set the state variables...
 curPoints=points;
 curSymset=symSet;
 curTypeface=typeface;
 curBold=bold;
 curProp=proportional;
 }
}
int PCL::box(int left,int right,int top,int bottom)
{
 if ( left>right || top>bottom )
 {
 return(0);
 }
 moveto(left,top);
 lineto(left,bottom);
 lineto(right,bottom);
 moveto(right,top);
 lineto(right,bottom);
 moveto(left,top);
 lineto(right,top);
 x=right;
 y=top;
 return(1);
}
int PCL::lineto(int destX,int destY)
 {
 int top,bottom,left,right;

 top=destY;
 bottom=y;
 left=destX;
 right=x;
 if ( top>bottom )
 {
 int t;
 t=top;

 top=bottom;
 bottom=t;
 }
 if ( left>right )
 {
 int t;
 t=left;
 left=right;
 right=t;
 }
 if ( out!=0 )
 {
 char temp[40];
 moveto(left,top);
 if ( left==right )
 {
 sprintf(temp,"%c*c1a%db0P",27,bottom-top);
 }
 else
 {
 sprintf(temp,"%c*c%da1b0P",27,right-left);
 }
 outtext(temp);
 moveto(destX,destY);
 x=destX;
 y=destY;
 }
 return(0);
 }
int PCL::moveto(int newX,int newY)
 {
 if ( out!=NULL )
 {
 char temp[40];
 sprintf(temp,"%c*p%dx%dY",27,newX,newY);
 outtext(temp);
 x=newX;
 y=newY;
 }
 return(0);
 }

int PCL::rectangle(int left,int right,int top,int bottom,int shade)
{
 moveto(left,top);
 if ( out )
 {
 char temp[40];
 sprintf(temp,"%c*c%da%dB%c*c%dg2P",27,(right-left),(bottom-top),27,shade);
 outtext(temp);
 }
 moveto(left,top);
 return(0);
}
int PCL::fontHeight()
{
 double tFloat;
 tFloat=(double)curPoints/72.0;
 tFloat*=300.0;

 return((int)tFloat);
}




























































July, 1994
Postman: A Bridge to the UNIX Mail System


Automating the mail process




Zongnan H. Lu


Henry is a system analyst with the University of Michigan School of Business
Administration. He can be contacted at henry_lu@ccmail.bus.umich.edu.


Postman is an interface program that sits between PIS, our in-house UNIX-based
personal-information system (which currently supports more than 3000
registered users) and the UNIX sendmail program, which allows access to the
outside world. Postman checks incoming mail delivered by sendmail, putting it
into the PIS, and checks outgoing mail in the PIS, sending it to sendmail. In
short, postman provides a way to exchange mail between user-application
programs and the outside world through the existing UNIX Mail System. 
Communication between sendmail and postman is handled by the UNIX Mail
command. If a user has an Internet mailing address, any mail coming from
outside will not only go into the PIS, but will also be forwarded to the
Internet mailing address; see Figure 1.
Before running postman or PIS, your application program must be running on a
UNIX computer that is set up either as a mail master or to use DNS. 
A special user group must be created for the application users. The
system-specific mail-spooling directory is used in the postman program. Each
user has his own mailbox bearing the user's name. In our case, /usr/spool/mail
is used as a mail-spooling directory. Postman is owned by a
superuser--root--and can handle only mail formatted as ASCII text. 
Communication between postman and the PIS is flexible. For example, I put mail
read from a mailbox directly into a database accessed by PIS. Conversely, the
notes in the database are read by the postman program and then sent out.


Adding a User


Whenever a new user is registered in PIS, the user's name should be added into
the UNIX-system password file /etc/passwd, which will be checked by the
sendmail program when mail comes in. All users in PIS are set to be in the
same group; that is, they have the same group ID and an entry in the
/etc/passwd file as follows:
henry_smith:ABC123XYZ:101:66:henry_smith:/pis_mailuser:/bin/csh

Here, henry_smith is the user's mailing address known to outside senders,
ABC123XYZ is the user's password for logging onto the UNIX system, and 101 is
a unique user ID assigned by the postman program. (Under UNIX, a user ID can
be as large as 32767 or 65534, depending on the system.) The group ID
registered in the /etc/group file is 66, /pis_mailuser is the user's home
directory used by the postman program, and /bin/csh is the login shell. For
simplicity and to save disk space, the group ID, home directory, login
shell, and even the password are shared by all PIS users.
When postman is alerted that a user has been denied access to PIS, the user's
entry in /etc/passwd is deleted. 
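The entry shown above consists of seven colon-separated fields (name,
password, user ID, group ID, comment, home directory, login shell), and a
program maintaining such entries needs to split them apart. A hypothetical
sketch of such a parse, not taken from the postman source:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a /etc/passwd line into its colon-separated fields:
// name, password, uid, gid, comment, home directory, login shell.
std::vector<std::string> splitPasswdLine(const std::string &line)
{
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ':'))
        fields.push_back(field);
    return fields;
}
```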
If a user has an Internet address set in PIS, then the user's name and
Internet address are added to the UNIX alias file /etc/aliases. The entry
looks like this:
henry_smith: pis_user, henry_smith@ccmail.bus.umich.edu
where the first part of the line is the user's name registered in /etc/passwd.
The third part is an Internet mailing address; this can be any number of
forwarding addresses separated by commas. The second part, pis_user, is shared
by all PIS users in the /etc/aliases file. It forces incoming mail both to go
into PIS and to be forwarded to the user's Internet address. Depending on how
sendmail searches the file /etc/aliases, if henry_smith were placed in the
position of pis_user, the search could go into an infinite loop. To sidestep
this problem, I created another user in /etc/passwd, pis_user. This causes all
incoming mail to be put into pis_user's mailbox instead of the mailboxes
belonging to those users who have set their Internet addresses. The postman
opens pis_user's mailbox, finds the names of the sender and recipient, and
files the mail into PIS properly. The system command newaliases must be run
every time /etc/aliases is updated.
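The alias entry can be assembled the same way. Again, this is an illustrative sketch rather than code from the listings, and the helper name make_alias_entry is hypothetical:

```cpp
#include <string>

// Hypothetical sketch: format the /etc/aliases line for a PIS user who
// has a forwarding Internet address. pis_user is the shared local
// recipient described in the text; the forwarding address follows it.
std::string make_alias_entry(const std::string &user,
                             const std::string &internet_addr)
{
    return user + ": pis_user, " + internet_addr;
}

// After appending the entry to /etc/aliases, the alias database must be
// rebuilt, for example with: system("newaliases");
```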


Reading Incoming Mail


Before running the postman program, I create a mail-reading command file,
/pis_mailuser/d, that looks like this:
p 1
d 1
q
The first line tells the Mail command to print the first piece of mail in a
given mailbox, d 1 deletes that piece after it is read, and q exits the Mail
command.
When mail is delivered by sendmail, it is appended to its recipient's mailbox
(a text-format file) in the /usr/spool/mail directory. The postman program
scans all mailboxes in the mail-spooling directory. To read one piece of mail
for each user, the postman calls: system("Mail -N >/pis_mailuser/a -u
henry_smith </pis_mailuser/d"), where -N tells the Mail command to read mail
without the initial header summary; /pis_mailuser/a is a temporary file for
Mail to put mail into and for postman to read later; the name following -u
(henry_smith, in this case) is a user's name; and d is the command file for
Mail to use. Before calling system(), /pis_mailuser/a must be deleted. After
system() returns, postman calls another function to read /pis_mailuser/a and
put its contents into PIS. The whole process runs in two loops and continues
until all mailboxes are empty. If a recipient has a forwarding address in
/etc/aliases, then the sendmail program forwards the incoming mail to its
Internet address and delivers it into the pis_user mailbox.


Sending Mail


In sending mail, the postman program reads the mail sender's name, recipient's
address, and mail text from PIS and puts the mail itself into /pis_mailuser/a.
Postman, running as root, then changes the owner of /pis_mailuser/a to the
sender and its mode to 0777. So that the mail goes out under the original
sender's name, the postman program creates a one-line shell-command file,
/pis_mailuser/m:
Mail -s "this is a test" john_smith@um.umich.edu </pis_mailuser/a
where "this is a test" is the mail's subject, john_smith@um.umich.edu is the
destination address, and /pis_mailuser/a is the mail file. The owner and mode
of /pis_mailuser/m must likewise be changed to the sender and 0777. Finally,
postman issues the function call system("su - henry_smith -c /pis_mailuser/m")
to post the mail.


The Postman Program 


The postman program (see Listing One and Listing Two) was written in Sun C++
2.1 under SunOS 4.1.3. Letter is a base class that includes five members: mail
title, date, sender, recipient, and the name of the file in which mail text is
stored. Two pure virtual functions, send() and read(), are declared to send
mail out and read mail in. Users must supply their own versions of send() and read()
based on different applications and environments. Class myletter is derived
from letter. As mentioned earlier, if a user has a forwarding address set in
the /etc/aliases file, then his or her mail will be put into pis_user's
mailbox. When myletter::read() reads the pis_user mailbox, it may see mail for
different users each time. To find the recipient's name, read() scans mail
text, reads the line with the first substring "To: ", and calls _getusrname()
to separate the user name from its mailing address for later use.
A simple error-handling class, Errors, reports error messages when a critical
error occurs and terminates the program. Class dirf is used for reading the
mailbox in the mail-spool directory. Postman resides in the default directory.
The member function resetdir() should be called before starting another search
if dirf is declared as static or global.
The main body of the program is in a loop that checks incoming mail and passes
it to PIS, checks outgoing mail from PIS and sends it out to the sendmail, and
goes to sleep for a certain period of time until the alarm wakes it up.
Figure 1 Postman sits between the personal-information system and sendmail.

Listing One 

//////////////////////////////////////////
// ml.h - header file for postman //

//////////////////////////////////////////

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/dir.h>
#include <signal.h>
#include <pwd.h>    /* getpwent() and struct passwd, used in postman.c */
#include <errno.h>  /* errno, used in dirf::rddir() */

#include <sysent.h>
#include <iostream.h>
#include <fstream.h>
#include <stdarg.h>

const int WAITTIME = 60;
const int MAXDAT = 24;
const int MAXNAM = 128;
const int MAXFNM = 32;
const int MAXID = 12;

const int FRSTUSRID = 200; // PIS users' ID start from 200

const char* A_FILE = "a";
const char* M_FILE = "m";
const char* D_FILE = "d";

#define MAILDIR "/usr/spool/mail"
#define ERRLOG "/tmp/postman.err"
#define HOSTADDR "@my_pis.my_corp.com"

class Errors {
 enum {bsize = 256};
 char fname[bsize]; // error log file
 public:
 Errors(char *fn) {strcpy(fname, fn);};
 void seterr(const char *,...);
};
class letter {
 public:
 char title[MAXNAM];
 char date[MAXDAT];
 char sender[MAXNAM];
 char recipient[MAXNAM];
 char textf[MAXFNM];
 public:
 virtual int send() = 0;
 virtual int read(char *) = 0;
};
class myletter : public letter {
 char *hostaddr;
 char usrdir[MAXFNM];
 int usrid;
 int grpid;
 public:
 int letterId;
 public:
 myletter(char *hst) : letterId(-1), hostaddr(hst) {
 usrdir[0]=0;
 }
 int isinPIS(char *);
 int send();
 int read(char *);
};
class dirf {
 DIR *dirp;
 char *dirname;
 struct direct *ds;
 public:
 dirf (char *drnm=".") { dirname=drnm; dirp=NULL;}
 ~dirf() { if (dirp) closedir(dirp); dirp=NULL; }
 void resetdir(char *drnm=".") {
 if (dirp)
 closedir(dirp);
 dirp=NULL;
 dirname=drnm;
 }
 int rddir(char *);
};
extern Errors _syserr;
extern "C" {
 int _getusrname(char *, char *, char *);
 void check_inmail();
 void check_outmail();
 int _read_outmail(myletter *);
 void _readmail(char *);
 void wakeup();
 void _mail2note(myletter *);
}



Listing Two

//////////////////////////////////////////
// postman.c //
//////////////////////////////////////////

#include "ml.h"
//---------------------------------------//
// dirf::rddir() - class member function //
//---------------------------------------//
int dirf::rddir(char *fname)
{
 if (dirp == NULL) {
 if ((dirp = opendir(dirname)) == NULL)
 _syserr.seterr("open %s failed, errno = %d\n", dirname, errno);
 }
 errno = 0;
 if ((ds = readdir(dirp)) == NULL) {
 if (errno) {
 _syserr.seterr("readdir failed, errno = %d\n", errno);
 }
 return 0;
 }
 strcpy(fname, ds->d_name);
 return 1;

}
//------------------------------------------//
// Errors::seterr() - class member function //
//------------------------------------------//
void Errors::seterr(const char *format_str,...)
{
 FILE *fp;
 va_list arg_ptr;

 va_start(arg_ptr, format_str);
 if ((fp=fopen(fname,"w"))!=NULL) {
 vfprintf(fp, format_str, arg_ptr);
 fclose(fp);
 }
 va_end(arg_ptr);
 exit (-1);
}
//--------------------------------------------------------------------//
// myletter::isinPIS() - find a user's login ID in /etc/passwd. //
//--------------------------------------------------------------------//
int myletter::isinPIS(char *usrnam)
{
 struct passwd *p;
 int found = 0;

 setpwent();
 while (p=getpwent()) {
 if (!strcmp(usrnam,p->pw_name)) {
 if (p->pw_uid >= FRSTUSRID) {
 strcpy (usrdir, p->pw_dir);
 usrid = p->pw_uid;
 grpid = p->pw_gid;
 found = 1;
 }
 break;
 }
 }
 endpwent(); // always close the passwd file
 return found;
}
//------------------------------------------//
// myletter::send() - class member function //
//------------------------------------------//
int myletter::send()
{
 char mfile[256], buf[256];
 FILE *fp;
 int ret, forward=0;

 if (!isinPIS(sender)) {
 _syserr.seterr("send:isinPIS:[%s]",sender);
 }
 sprintf(mfile,"%s/%s", usrdir, M_FILE);
 if ((fp=fopen(mfile,"w"))== NULL) {
 return -1;
 }
 fprintf(fp,
 "Mail -s \"%s\" %s <%s\n", title, recipient, textf);
 fclose(fp);

 if (chmod(mfile, 0777))

 _syserr.seterr("send:chmod:%s",mfile);
 if (chmod(textf, 0777))
 _syserr.seterr("send:chmod:%s",textf);
 if (chown(mfile, usrid, grpid))
 _syserr.seterr("send:chown:%s",mfile);
 if (chown(textf, usrid, grpid))
 _syserr.seterr("send:chown:%s",textf);

 sprintf(buf,"su - %s -c %s", sender, mfile);
 ret=system(buf); 
 if (ret)
 _syserr.seterr("send:system:%s(%d)\n",buf,ret);

 return 1;
}
//------------------------------------------//
// myletter::read() - class member function //
//------------------------------------------//
int myletter::read(char *usr)
{
 short frnm=0, tonm=0, subj=0, datm=0, ret;
 char buf[256];
 ifstream inf;

 sprintf(buf, "%s/%s", usrdir, A_FILE);
 unlink(buf);
 sprintf(buf,"Mail -N >%s/%s -u %s <%s/%s",
 usrdir, A_FILE, usr, usrdir, D_FILE);
 ret = system(buf);
 if (ret) {
 _syserr.seterr("read:system:%s:ret=%d\n",buf, ret);
 }
 sprintf(textf, "%s/%s", usrdir, A_FILE);
 inf.open(textf);
 while (inf.getline(buf,256)) {
 if (!datm && !strncmp(buf,"Date:",5)){
 strcpy(date,(char *)&(buf[5]));
 datm++;
 } else if (!frnm && !strncmp(buf,"From",4)) {
 strcpy(sender,strtok((char *)&(buf[5])," "));
 frnm=1;
 } else if (!tonm && !strncmp(buf,"To: ",4)) {
 if (strstr(buf,usr)) {
 // mail is in the user's mailbox //
 strcpy(recipient, usr);
 } else {
 // mail is in the pis_user mailbox //
 if (!_getusrname((char *)&(buf[4]),
 recipient, hostaddr))
 _syserr.seterr("_getusrname:[%s]\n",buf);
 }
 tonm++;
 } else if (!subj && !strncmp(buf,"Subject: ",9)) {
 strcpy(title, (char *)&(buf[9]));
 subj++;
 } else if (datm && frnm && tonm && subj &&
 !strncmp(buf,"Status:",7)) {
 break;
 }

 }
 inf.close();
 return 1;
}
//-------------------------------------//
// wakeup() - signal handling function //
//-------------------------------------//
void wakeup()
{
 alarm(0);
 signal(SIGALRM, (SIG_PF)wakeup);
}
Errors _syserr(ERRLOG);
int main()
{
 signal(SIGALRM, (SIG_PF)wakeup);
 alarm(1);

 do {
 pause();

 check_inmail();

 check_outmail();

 alarm(WAITTIME);
 } while (1);
}
//-------------------------------------------------------//
// check_outmail() - check outgoing mail and send it out //
//-------------------------------------------------------//
void check_outmail()
{
 myletter ml(HOSTADDR);
 do {
 if (!_read_outmail(&ml)) {
 break;
 }
 ml.send();
 } while (1);
}
//---------------------------------------------------------//
// check_inmail() - check incoming mail and pass it to PIS //
//---------------------------------------------------------//
void check_inmail()
{
 char fname[256];
 dirf df(MAILDIR);

 while (df.rddir(fname)) {
 if (strcmp(fname, ".") && strcmp(fname, "..") &&
 strcmp(fname,"root")) {
 _readmail(fname);
 }
 }
}
//------------------------------------------------------------------//
// _readmail() - read title, sender, recipient, and date from mail. //
//------------------------------------------------------------------//

void _readmail(char *usr)
{
 char tmp[256];
 myletter ml(HOSTADDR);
 int ret;

 if (!ml.isinPIS(usr))
 return;

 do {
 ml.read(usr);

 _mail2note(&ml);

 sprintf(tmp,"%s/%s", MAILDIR, usr);
 ret= access(tmp, F_OK);
 } while (!ret);
}
//-------------------------------------------------------------//
// _getusrname() - find a user's name from a recipient string. //
//-------------------------------------------------------------//
int _getusrname(char *s, char *usr, char *hst)
{
 char tmp[256], *p;

 usr[0]=0;
 strcpy(tmp, s);
 if (strchr(s,'@')==NULL) {
 p = strtok(tmp," \"<>");
 p = strtok(NULL," \"<>");
 if (p == NULL)
 return 0;
 strcpy(usr,p);
 return (strlen(usr));
 }
 p = strstr(tmp,hst);
 int i = strlen(s)-strlen(p);
 tmp[i]=0;
 for (i=strlen(tmp)-1;i>=0;i--) {
 if (strchr(" <,[\"",tmp[i]))
 break;
 }
 strcpy(usr,(char *)&(tmp[i+1]));
 return (strlen(usr));
}
//---------------------------------------//
// _read_outmail() - read mail from PIS. //
//---------------------------------------//
int _read_outmail(myletter *ml)
{
 strcpy(ml->textf,"");
 //
 // user application function getone_inpis()
 // assigns mail title, date, sender, recipient,
 // and text file name to ml.
 //
 // if(getone_inpis(ml->title, ml->date, ml->sender,
 // ml->recipient, ml->textf)) {
 // return 1;

 // else
 // return 0;
 //
 return 0;
}
//---------------------------------//
// _mail2note() - pass mail to PIS //
//---------------------------------//
int _getletterId(void)
{
 static int i = 0;

 return (++i);
}
void _mail2note(myletter *ml)
{
 ml->letterId = _getletterId();
 //
 // pass2pis(ml->title, ml->date,
 // ml->sender, ml->recipient, ml->textf);
 //
}








































July, 1994
Ray Tracing and the POV-Ray Toolkit


A powerful multiplatform graphics package that comes with source code




Craig A. Lindley


Craig is a founder of Enhanced Data Technology (Colorado Springs, CO),
developers of the imaging-database tool EnhancedView. He is the author of
Practical Image Processing in C and Practical Ray Tracing in C, both published
by John Wiley & Sons. Craig can be contacted at Enhanced Data or on CompuServe
at 73552,3375.


Once, only graphics gurus with access to supercomputers had the compute power
to master black arts such as ray tracing. Today's powerful PCs, combined with
readily available ray-tracing software, have changed this, making it possible
for anyone with a PC to render almost anything. Even if you can't draw a
straight line with a ruler, you can produce spectacular images. You only have
to be able to visualize objects and their relationships in an imaginary
three-dimensional setting.
The software I use for ray tracing is POV-Ray ("Persistence of Vision Ray
Tracer"), a powerful, multiplatform package available free of charge in
source-code and executable form. POV-Ray, which is written in C, is available
for PCs (running under DOS, Windows, NT, or OS/2), Macintosh, Amiga, UNIX
(including X Windows), and VMS workstations. POV-Ray (based on the DKBTrace
2.12 ray-tracer written by David Buck and Aaron Collins) is supported and
maintained by the POV-Team, a group of volunteer programmers, designers,
animators, and artists. 
POV-Ray is not public-domain software. It is copyrighted by the POV-Team and
used under the conditions set forth in its documentation. The POV-Ray
package--documentation, executables, source code, and sample image files--is
available on CompuServe's GRAPHDEV Forum, the PC Graphics area on America
On-Line (jump keyword "PCGRAPHICS"), the Internet via anonymous ftp from
alfred.ccs.carleton.ca (134.117.1.1), the "You Can Call Me Ray" BBS
(708-358-5611), "The Graphics Alternative" BBS (510-524-2780), or any number
of freeware/shareware disk houses. In addition, many of these sources have
other tools designed for use with POV-Ray, including those for scene design
(POVCAD), font generation, animation, automatic scene generation, and the
like. Questions about POV-Ray should be directed to the POV-Team leader, Drew
Wells (CompuServe, 73767,1244; AOL: Drew Wells).
POV-Ray provides a scene-description language, supports images of up to
4096x4096, and uses standard include files for predefined shapes, colors, and
textures. POV-Ray can produce 24-bit color image files, fractal landscapes
(using height fields), spotlights for sophisticated lighting, Phong and
specular highlighting for realistic-looking surfaces, and several image-file
output formats. It can render a wide range of shapes, including spheres,
boxes, ellipsoids, cylinders, cones, triangles, planes, tori, hyperboloids,
paraboloids, Bézier patches, height fields, blobs, quartics, and smooth
Phong-shaded triangles. POV-Ray also supports constructive solid geometry
(CSG) operations--unions, intersections, and differences--for combining
primitive shapes into more complex ones. Finally, POV-Ray has built-in
textures: marble, checkerboard, wood, bumpy, agate, clouds, granite, ripples,
waves, leopard, wrinkled, mirror, chrome, brass, gold, silver, blue sky with
clouds, sunset with clouds, sapphire agate, jade, shiny, brown agate,
apocalypse, blood marble, glass, brown onion, pine wood, cherry wood, and
more.


Ray-Tracing Fundamentals


Ray tracing produces photorealistic images--those in which the interplay of
light and shadows with 3-D objects closely resembles that found in nature. An
advantage ray tracing enjoys over other 3-D rendering techniques is the
simplicity with which you can incorporate effects such as shadows,
reflections, refractions, and transparency. The reason ray tracing can produce
photorealistic images with these visual effects is because the tracing of rays
simulates light. In other words, ray tracing models the interaction of light
rays and objects using simple optical principles. [Editor's note: For details,
see "Ray: A Ray Tracing Program in C++" elsewhere in this issue.] The rules
used for light-ray and object interaction are simplified from those of nature
but still provide realistic results. Most ray-tracing programs simplify nature
by assuming that:
Light rays travel only in straight lines through homogeneous media.
Light interacts with objects only at their surfaces.
Properties of light such as diffraction, phase, polarization, wavelength, and
attenuation over distance aren't generally taken into consideration.
Of course, the more complete a ray-tracing program is, the closer to reality
are its optical models. The trade-off is computation time--the more accurate
the model, the slower the program runs. 
Ray tracing is an example of a class of graphics algorithms referred to as
"point-sampling algorithms" which determine the visibility of a surface
(object) using a finite number of sample points, making assumptions about the
points in between. Ray tracing is thus an approximation of reality. (Other
algorithms in this class are the Z-buffer, painter's, and scanline algorithms,
all of which are used in hidden-surface determination and removal. A related
class of algorithms, referred to as "continuous algorithms," tries to
determine visibility continuously over entire surfaces.)
Performance limitations and aliasing problems arise as a result of sampling.
As the number of points evaluated increases, so does the apparent realism of
the image. Unfortunately, the computation time increases as well, so
performance is an issue. The use of multiple light sources within a scene also
increases realism, again at the cost of performance. 
Aliasing is a problem which must be addressed whenever sampling is used.
Aliasing arises from the inability to reconstruct the original signal from its
discrete samples. The severity of the problem is directly related to the
frequency of the signal being sampled and the sampling rate. If the sampling
rate or frequency is high enough, aliasing is not a problem. However, as the
sampling frequency nears that of the signal being sampled, undesirable
low-frequency signal components called "aliases" are created.
When ray tracing is used to produce detailed images, some details are masked
by aliases. This results in various types of image distortion, including
"jaggies," which must be dealt with to produce photorealism. Aliasing
artifacts in ray-traced still images are bad enough, but when animation of
still images is required, temporal aliasing compounds the problem. Rendering
an image with higher resolution (higher-frequency sampling) helps, but it
doesn't solve the problem completely. You'll need to use antialiasing
techniques such as those built into POV-Ray to mask the effects of aliasing.
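One common antialiasing technique is supersampling: trace several rays per pixel and average the results. The toy sketch below illustrates the idea only; it is not POV-Ray's code, and the sampler edge() is a made-up stand-in for a full ray trace:

```cpp
// Toy sketch of supersampled antialiasing: shoot an n x n grid of rays
// through each pixel and average their colors. sample(x, y) stands in
// for tracing one ray and returning its intensity.
double supersample(double px, double py, int n,
                   double (*sample)(double, double))
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            sum += sample(px + (i + 0.5) / n,   // subpixel positions,
                          py + (j + 0.5) / n);  // centered in each cell
    return sum / (n * n);
}

// A made-up "scene": a vertical edge at x = 0.5 inside the pixel.
double edge(double x, double /*y*/)
{
    return x < 0.5 ? 0.0 : 1.0;
}
```

A pixel straddling the edge averages to a gray value instead of snapping to black or white, which is exactly how supersampling softens jaggies.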
To construct a ray-traced image, you begin by defining a 3-D scene of light
source(s) and object(s) in relation to the position of the viewer. Each object
is assigned a location and attributes that define its shape and surface
properties (color, reflectivity, texture, and so on). With the scene defined,
the view geometry specifies where in 3-D space the viewer's camera or eye will
be located (where the scene will be viewed from), the direction the camera
will be looking, the orientation of the camera, and the camera's field of
view. The view of the 3-D scene is through a hypothetical window called the
"view plane," which is mapped to the computer's monitor. No matter where the
camera is located or how it is positioned, the view plane always lies between
the camera and the objects in the 3-D scene. 
At first, it may seem intuitive to trace light rays which begin at a light
source, enter the scene, and--after bouncing through the scene--pierce the
view plane and contact the eye. This approach, sometimes referred to as
"forward ray tracing," is workable but computationally expensive. Accurately
rendering a scene in this way would require billions of light rays to be
generated and traced, and only a small percentage of these would have the
correct trajectory to ultimately pass through to the eye. Such an image would
require months of PC-computation time. 
However, "ray casting" (also known as "backward" or "reverse" ray tracing) is
much more efficient than forward ray tracing. Ray casting reverses the tracing
of rays by starting at the eye location and generating a ray that passes
through each pixel of the view plane on its way to the 3-D scene. As these
rays strike objects in the scene, the pixel through which the ray passed takes
on the color of the closest intersected object. If the point of intersection
between the ray and the closest object has an unobscured view of a light
source, the point of intersection is fully illuminated by the light source.
If, however, any objects in the scene lie between the point of intersection
and the light source, the object at that point is in shadow, and its
corresponding pixel's intensity diminishes accordingly. Example 1 is a
pseudocode description of this ray-tracing process. 
Determining an object's color at the point of intersection with the eye ray
would be easy if not for an object's reflective and/or refractive nature and
texture. The color at the point of intersection is a function of the object's
color, the light source's color, the distance from the light source, and any
reflection/refraction. The calculation of color is a recursive process called
"shading." Every time a reflective/refractive object is intersected by the eye
ray, an additional set of rays must be generated. Each of the additional rays
must then be traced back to each light source to determine its contribution to
the object's color. Because the process is recursive, the resulting ray tree
describes the components that contribute to the object's color. The color
contributions at the leaves of the tree are evaluated and passed up the tree.
Each contribution figures into the final color of the object and therefore
into the color of the pixel with which it is associated. 
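The recursion just described can be reduced to a toy one-dimensional model. This sketch is illustrative only, not real ray-tracer code: each reflective bounce adds a fraction of the color returned by tracing a secondary ray, and a depth limit prunes the ray tree:

```cpp
// Toy sketch of recursive shading: the color at an intersection is the
// local color plus the reflectivity-weighted color found by tracing one
// more level of the ray tree, until a depth limit stops the recursion.
const int MAX_DEPTH = 5;

double shade(double local, double reflectivity, int depth)
{
    if (depth >= MAX_DEPTH)        // prune the ray tree
        return 0.0;
    // contribution passed up from the next level of the tree
    double reflected = shade(local, reflectivity, depth + 1);
    return local + reflectivity * reflected;
}
```

With reflectivity below 1.0, each level contributes geometrically less, which is why a modest depth limit costs little visible accuracy.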


The Image-Development Process


The image-development process is one of continual refinement, closely
resembling the typical software-development process; see Figure 1. During the
image-design stage, ideas are translated into scene-description language
statements and entered into an ASCII image-model file. The image file is then
read by the ray-tracer program, and a low-resolution image--which requires
less time to generate--is produced. Next, the output file produced by the ray
tracer is color quantized (if necessary) and displayed to see what it really
looks like. Modifications are usually needed, a process that continues until
the image meets expectations. At this point, I render the image in high
resolution with antialiasing enabled and save it as a graphics file. This
final image, which will probably be of much higher resolution, takes longer to
produce, but only needs to be done once.
Once a ray-traced image is put into a standard, graphical file format (the PC
version of POV-Ray produces TARGA), you can manipulate it, incorporate it into
an animation sequence, or produce hard copy. 


Using POV-Ray 


Because ray tracing is computationally expensive, the more powerful your
computer, the faster you'll get results. Minimum hardware for the PC is a 386
with 2 Mbytes of RAM. However, for complex imagery you'll want a math
coprocessor if you want to see the results in your lifetime. 
POV-Ray uses a scene-description language to implement ray-tracing concepts.
Syntactically correct language statements are coded into an ASCII text file
(with a .POV extension) that's read by the POV-Ray program for rendering. As
you can see in Listing One , the general process of defining a ray-traced
scene consists of:
Defining the camera (position and orientation). This describes where the
camera is in the 3-D universe, which direction it is pointing, and which
direction is up and to the right from the camera's perspective. Camera setup
determines the view that appears in the resulting image.
Defining the light source(s) (type, location, and color). This provides
illumination for seeing any objects in the scene.
Defining the simple objects within the scene (floor, ocean, backdrop, and so
on).
Combining simple objects into complex ones using composite objects and CSG.
Translating, rotating, and scaling objects, light sources, and camera.
Assigning colors, textures, and finishes for the objects in the scene.
Listing One is a basic scene-description file which generates the image in
Figure 2. The visible portion of the scene consists of a red sphere with
specular highlight casting a shadow on a blue floor-like plane. The language
is block structured and, for the most part, generally understandable. Items in
angled brackets (<>) are vectors used to describe location and direction to
the ray tracer. Curly braces ({}) delineate object definitions, and double
slashes (//) denote comments. Listing Two is the code which describes the
starry black sky, checkered floor (including reflection), infinite-path sky,
and other aspects of the complete ray-traced image in Figure 3. The complete
source code that describes the image as well as the image itself are available
electronically; see "Availability," page 3. 

After using POV-Ray on projects ranging from book covers to greeting cards,
I've come up with a number of hints and tips for speeding up the
image-development process. For instance, one technique that saves time is to
render the smallest practical image during image development. Small images are
fine for establishing object and/or view-point placement; larger images are
rendered when you need to see detail. Likewise, you can lower the POV-Ray
rendering-quality factor (command-line option q) for less image-generation
time. Faster rendering times are traded for lower-quality imagery, but
lowering the quality during the image-development process can speed things up
substantially. 
You can also save time when developing complex scenes by rendering only the
new portions of the scene. This is accomplished by commenting out previously
verified portions of the image file which aren't needed to verify additions to
the image. As the new parts are verified, they, too, can be commented out so
that their rendering time does not contribute to image turn-around time. When
all component parts of the image are verified, all comments can be removed and
the complete image rendered. 
Another way to save image-development time is to use antialiasing only when
generating the final image. While antialiasing usually produces visually
superior images, it does so at the expense of processing time. Not using it
during image debugging or refinement saves a substantial amount of time. 
Finally, you should make use of POV-Ray's DECLARE statement to define
quantities used repetitively. For example, if an elaborate texture is to be
used over and over, it should be declared once and referenced throughout the
file. 
Facts to keep in mind concerning POV-Ray include the following:
Positive x-axis is horizontal and to the right, positive y-axis is vertical
and upward, and positive z-axis is into the scene. 
The files shapes.dat, textures.dat, and colors.dat are usually included in
each image file. They contain a basic set of definitions to make using POV-Ray
easier. 
Always declare light sources centered at the origin <0 0 0> and translate them
into their final position. 
Constructive solid geometry makes possible a union (a logical OR of the
surfaces), an intersection (a logical AND), and a difference (the subtraction
of the second surface from the first surface). 
The optimized sphere shape cannot be scaled nonuniformly; scaling factors must
be the same in the x-, y-, and z-dimensions. The exception to this is the
quadric sphere. 
All quadric shapes (except spheres and ellipsoids) are infinite in at least
one direction. Scaling cannot be used to fit them into an image. They must be
constrained using CSG to fit entirely within an image. 
All shapes (except triangles) have an inside and an outside. A plane's outside
is the side with the surface normal. The side of a plane opposite the normal
is the inside. The concept of inside and outside is important when using CSG. 
Composite objects are collections of simple objects that can be manipulated as
a single object. Transformations applied to a composite object affect all
simple objects contained within it. 
The controllable surface-attribute parameters within POV-Ray are ambient,
diffuse, brilliance, reflection, refraction, index of refraction (IOR), Phong,
Phong size, specular, and roughness. 
The sum of the ambient and diffuse parameters should never exceed 1.0.
A color map converts a number into a color using linear interpolation. 
The color of an object must be specified within a texture block when rendering
high-quality images. 
A color with an alpha value of 1.0 is transparent; values approaching 0.0 are
more opaque. 
The use of bounding objects is recommended, as they can drastically decrease
the image-generation time. 
Transformations are always performed in relation to the origin. A sphere
located at the origin will show no visible displacement when rotated. A sphere
first translated away from the origin and then rotated will spin like a planet
in orbit around the origin. 
The "left-hand rule" is used to define the direction of rotation. When the
thumb of the left hand is pointed in the positive direction of the axis being
rotated around, the fingers will curl in the direction of positive rotation. 
Use a right vector of <1.333 0 0> when rendering images in 320x200 256-color
VGA mode. This compensates for the nonsquare pixels produced in this mode. The
right vector can be changed to <1.0 0 0> when rendering in resolutions which
have square pixels. For Super VGA, this means all 256-color modes above
640x400. 
Two different types of textures exist within POV-Ray: coloration and
surface-perturbation. Coloration textures include bozo, spotted, marble, wood,
checker, checker_texture, granite, gradient, agate, and imagemap.
Surface-perturbation textures include waves, ripples, dents, bumps, and
wrinkles. 
Textures produce approximately one feature change (color transition) across a
sphere of radius one. Textures can be scaled to provide the appropriate number
of feature changes needed within an image. 
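The color-map point above amounts to simple linear interpolation. As a sketch under the assumption of just two endpoint colors (real POV-Ray color maps interpolate across many bands):

```cpp
// Sketch of the linear interpolation behind a color map: a value t in
// [0,1] selects a color between two endpoints, component by component.
struct Color { double r, g, b; };

Color color_map(Color a, Color b, double t)
{
    Color c;
    c.r = a.r + t * (b.r - a.r);
    c.g = a.g + t * (b.g - a.g);
    c.b = a.b + t * (b.b - a.b);
    return c;
}
```

A full color map simply holds a list of (value, color) bands and applies this interpolation within whichever band the input value falls.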


Conclusions


Ray tracing with POV-Ray is addictive. It can cause you to spend many hours at
your computer that you could otherwise spend painting your house or mowing
your lawn. After a short learning curve, however, you'll be producing unique
and spectacular imagery. 


References


Angell, Ian O. High Resolution Computer Graphics using C. New York, NY: John
Wiley, 1990.
Glassner, Andrew S., ed. An Introduction to Ray Tracing. San Diego, CA:
Academic Press, 1989.
Glassner, Andrew S. "Ray Tracing for Realism." BYTE (December 1990).
Joy, Kenneth I., Charles W. Grant, Nelson L. Max, and Lansing Hatfield, eds.
Computer Graphics: Image Synthesis. Los Alamitos, CA: IEEE Computer Society
Press, 1988.
Lindley, Craig A. Practical Ray Tracing in C. New York, NY: John Wiley, 1992.
------. Practical Image Processing in C. New York, NY: John Wiley, 1991.
Lyke, Daniel. "Ray Tracing." Dr. Dobb's Journal (September 1990).
van Dam, A. and J.D. Foley. Fundamentals of Interactive Computer Graphics.
Reading, MA: Addison-Wesley, 1983.
Example 1: Pseudocode description of backward ray tracing.
Procedure Ray Trace
Begin
  For each row of the image {
    For each column of the image {
      Generate a ray from the eye's location through the pixel at (column,row)
      For each object in 3D universe {
        Calculate the intersection between the ray and object
        If intersection occurred {
          If intersection distance is the smallest yet (object is closest to eye) {
            Save this intersection
          }
        }
      }
      If an intersection did occur {
        Calculate point of intersection of ray with closest object
        Calculate normal to object's surface at the point of intersection
        For each light source in 3D universe {
          Generate a ray from the point of intersection to the light source
          For each object in 3D universe except the one initially intersected {
            Calculate light ray/object intersection
            If no intersection occurred, point of intersection is not in shadow {
              Calculate pixel's color value and intensity as a function of this
              light source
            }
          }
        }
      } else { // No intersections with objects occurred
        Set pixel's color and intensity to that of the background
      }
      Color pixel on screen and/or save pixel data to a file
    } // column
  } // row
End
Figure 1 The image-development process. 
Figure 2 A basic image generated by the scene-description file in Listing One.
Figure 3 A complex image generated by a POV-Ray scene-description file.

Listing One 

// A basic scene with sphere and floor. See Figure 2.

#include "colors.inc"

// Define the camera location and orientation
camera {
    location <0.0 5.0 -20.0>
    direction <0.0 0.0 1.0>
    up <0.0 1.0 0.0>
    right <1.0 0.0 0.0>
}

// Define a simple point light source
object {
    light_source {
        <30 30 -30>
        color White
    }
}

// Floor plane
object {
    plane { <0.0 1.0 0.0> 0.0 }
    texture {
        color Blue
    }
}

// Sphere object
object {
    sphere { <0 3 0> 3 }
    texture {
        color Red
        phong 5
    }
}




Listing Two

// Define a starry black sky
#declare Sky = object {
 plane { < 0 0 1> 5000 }
 color Black
 texture {
 bozo
 turbulence 0.3
 color_map {
 [0.0 0.9 color Black color Black ]
 [0.9 1.001 color Black color White ]
 }
 scale < 50 50 50 >
 }
 }
// Define the checkered floor. NOTE: the floor is partially reflective.
#declare Floor = object {
 intersection {
 plane { <-1 0 0 > 100 }
 plane { < 1 0 0 > 100 }
 plane { < 0 -1 0 > 1 }
 plane { < 0 1 0 > 0 }
 plane { < 0 0 -1 > 100 }
 plane { < 0 0 1 > 6 }
 }
 texture { checker
 color Blue
 color MidnightBlue
 scale < 12 12 12 >
 ambient 0.4
 diffuse 0.6
 reflection 0.4
 }
 }
// Define the infinite path. It is reflective and checkered also.
#declare PathWay = object {
 intersection {
 plane { <-1 0 0 > 18 }
 plane { < 1 0 0 > 18 }
 plane { < 0 -1 0 > 1 }
 plane { < 0 1 0 > 0 }
 plane { < 0 0 -1 > -6 }
 plane { < 0 0 1 > 5000 }
 }
 texture { checker
 color Blue
 color MidnightBlue
 scale < 12 12 12 >
 ambient 0.3
 diffuse 0.7
 reflection 0.4
 }
 }
// Define the wall with window and door cutouts.
#declare Wall = object {
 difference {
 intersection { // The solid wall
 plane { <-1 0 0 > 100 }
 plane { < 1 0 0 > 100 }
 plane { < 0 -1 0 > 0 }
 plane { < 0 1 0 > 96 }
 plane { < 0 0 -1 > 0 }
 plane { < 0 0 1 > 6 }
 }
 intersection { // Door Cutout
 plane { <-1 0 0 > 22 }
 plane { < 1 0 0 > 22 }
 plane { < 0 -1 0 > 0 }
 plane { < 0 1 0 > 80 }
 plane { < 0 0 -1 > 0.1 }
 plane { < 0 0 1 > 7 }
 }
 intersection { // Left window cutout
 plane { <-1 0 0 > 100 }
 plane { < 1 0 0 > -36 }
 plane { < 0 -1 0 > -12 }
 plane { < 0 1 0 > 76 }
 plane { < 0 0 -1 > 0.1 }
 plane { < 0 0 1 > 5 }
 }
 intersection { // Right window cutout
 plane { <-1 0 0 > -36 }
 plane { < 1 0 0 > 100 }
 plane { < 0 -1 0 > -12 }
 plane { < 0 1 0 > 76 }
 plane { < 0 0 -1 > 0.1 }
 plane { < 0 0 1 > 5 }
 }
 }
 // Wall is bounded for speed
 bounded_by {
 intersection { // The complete wall
 plane { <-1 0 0 > 100 }
 plane { < 1 0 0 > 100 }
 plane { < 0 -1 0 > 0 }
 plane { < 0 1 0 > 96 }
 plane { < 0 0 -1 > 0 }
 plane { < 0 0 1 > 6 }
 }
 }
 // The wall has a near white color
 texture {
 ambient 0.4
 diffuse 0.6
 color VLightGrey
 }
 }
// Define the view which is the cloudy blue sky seen through the windows.
#declare View = composite {
 object {
 // View through the left window
 intersection {
 plane { <-1 0 0 > 100 }
 plane { < 1 0 0 > -36 }
 plane { < 0 -1 0 > -12 }
 plane { < 0 1 0 > 76 }
 plane { < 0 0 -1 > -4.98 }
 plane { < 0 0 1 > 4.99 }
 }
 // This is the cloudy blue sky.
 texture {
 ambient 0.8
 diffuse 0.0
 turbulence 0.5
 bozo
 color_map {
 [0.0 0.6 color red 0.4 green 0.4 blue 1.0
 color red 0.4 green 0.4 blue 1.0]
 [0.6 0.8 color red 0.4 green 0.4 blue 1.0
 color red 1.0 green 1.0 blue 1.0]
 [0.8 1.001 color red 1.0 green 1.0 blue 1.0
 color red 0.85 green 0.85 blue 0.85]
 }
 scale < 75 15 75 >
 }
 }
 // View through right window
 object {
 intersection {
 plane { <-1 0 0 > -36 }
 plane { < 1 0 0 > 100 }
 plane { < 0 -1 0 > -12 }
 plane { < 0 1 0 > 76 }
 plane { < 0 0 -1 > -4.98 }
 plane { < 0 0 1 > 4.99 }
 }
 // This is the cloudy blue sky.
 texture {
 ambient 0.8
 diffuse 0.0
 turbulence 0.5
 bozo
 color_map {
 [0.0 0.6 color red 0.4 green 0.4 blue 1.0
 color red 0.4 green 0.4 blue 1.0]
 [0.6 0.8 color red 0.4 green 0.4 blue 1.0
 color red 1.0 green 1.0 blue 1.0]
 [0.8 1.001 color red 1.0 green 1.0 blue 1.0
 color red 0.85 green 0.85 blue 0.85]
 }
 translate < 25 10 0 >
 scale < 50 15 75 >
 }
 }
 }










July, 1994
Examining Audio DSP Algorithms


Algorithms for DSP-based audio effects




Dennis Cronin


Dennis is an engineer specializing in UNIX driver development for Solaris and
HP-UX operating systems. He can be contacted at denny@cd.com.


Digital-signal processing (DSP) configurations span the range of complexity
and cost from complete solutions (such as attached array processors) to add-in
PC cards with embedded DSP controllers. In this article, I will demonstrate
some basic audio DSP algorithms for creating real-time audio effects--pitch
change, echo, flanging, and phase shifting--for the Microsoft Windows Sound
System (WSS), which sells for well under $150.00. The only development tools
you'll need to write software for the WSS are your regular C compiler and
linker. The WSS sound card consists of an Analog Devices AD1848
analog-to-digital/digital-to-analog converter (ADC/DAC) chip, plus a Yamaha
FM-synthesis chip and some glue logic. Since this card does not have a
dedicated DSP onboard, the host CPU does all the processing. It turns out that
a 486 can do quite a respectable job running some of the DSP algorithms, even
in C.
I compiled and tested FX.C with Borland Turbo C 2.0 and Borland C++ 2.0 on a
33-MHz 486. You'll need a 486 or a fast 386 with a math coprocessor to run
this program, as it uses floating point extensively to avoid obscuring the
algorithms with the fussy bit shiftings and normalization characteristic of
integer DSP.
All patches work at the default sample rate of 16K. The version compiled under
Turbo 2.0 will, in fact, run most patches at a sample rate of 27K on a 33-MHz
486; the Borland C++ 2.0 version is somewhat slower. Your mileage may vary
with other compilers, of course. I also developed many of the same algorithms
for the $99.00 Texas Instruments DSP Starter Kit (DSK). These are in its
native 320C2x assembly language and use the .ASM suffix. (This code is
available electronically; see "Availability," page 3.) The TI DSK comes
complete with PC-based assembler, debugger, and manuals and can be ordered
from any TI distributor.


Echo


The first audio effect I'll look at is an "echo," which is achieved using a
single, fixed-delay element and produces the well-known "Hello hello hello
hello...". An echo is simply an identical copy of the original audio signal,
delayed by a fixed amount of time. This fixed delay is easy to create
digitally.
As you read samples from the analog-to-digital converter, you store them in a
circular buffer. When the buffer is filled, the store pointer wraps back
around to the beginning of the buffer.
The delay comes from a single read pointer, which is placed N slots "behind"
the store pointer and marches along in step with it. As each new sample is
stored, a sample is read from N samples behind it, creating a static delay of
N/Fs seconds, where Fs is the sampling rate.
This delayed signal is then mixed in with the original signal, usually at a
somewhat reduced volume level. While this gives you a nice echo, so far it's
only Hello hello. 
To get the decaying repeats alluded to earlier, we need to supply a feedback
path around the delay element. Figure 1 shows the block diagram of an echo
effect that can provide decaying repeats. The more feedback, the longer the
repeats take to fade away.
The first patch provided by FX.C (available electronically) for WSS
demonstrates this decaying-echo effect. No similar effect is provided for the
TI module since it doesn't provide quite enough memory to get suitable delays
for echo-type effects.


Flanging


Flanging is a simple, delay-based effect that produces the whooshing sound
heard on numerous rock records. (A classic example of this is in the
break-down section of "Life in the Fast Lane" by the Eagles.) Flanging uses
extremely short delay lengths which are not discernible to the ear as discrete
echoes.
The origins of the term "flanging" are somewhat uncertain. Some credit George
Martin, the producer for the Beatles, with coining the term in jest. Others
suggest a practical origin. In any case, the effect was originally produced by
running two tape machines with identical tapes closely in sync. Then the speed
of one machine was slightly varied, possibly by a recording engineer's thumb
on the flange of the tape reel. The resultant short varying delay creates the
characteristic striking whooshing sound.
Why the whoosh? When a signal is mixed with a very short delay of itself,
there will be certain frequencies at which the signal is 180 degrees out of
phase with itself and near-total cancellation will occur. For instance, with a
delay of one millisecond, dips (or notches) will occur at 500, 1500, 2500, and
3500 Hz, and so on.
This frequency-response shape is commonly called a "comb" filter since the
notches resemble the teeth of a comb. As the delay is varied from a fraction
of a millisecond to 5 milliseconds or so, the notches sweep dramatically up
and down in frequency. Your ears hear this as sounding "whooshy." 
The digital implementation of flanging is similar to that of echo except that
the delay time must be very short and continuously variable. Figure 2(a) shows
a block diagram of the signal path for flanging, while Figure 2(b) shows the
"shape" of delay variation we use to create flanging. The key element is the
implementation of the varying delay element.
Now, it might seem like you could vary the delay by taking the fixed-delay
element implementation described previously for echo and simply moving the
read pointer in relation to the store pointer by a notch every now and then.
Unfortunately, this approach creates a little click every time the delay tap
is changed. Any steady movement of the delay tap results in "zipper noise," an
objectionable, gritty modulation noise mixed with the varying delay signal.
To avoid this, you need to implement a method of achieving noninteger delays,
thereby sweeping the delay more smoothly, varying it by just a little bit with
each sample. This problem is akin to the general problem of sample-rate
conversion, which is discussed at length in many texts on DSP. Unfortunately,
most proper methods for noninteger ratio sample-rate conversion tend to be a
bit computationally intensive and not always well suited for real-time work.
Luckily, there's a simple but inexact method that yields subjectively low
audible distortion, yet is computationally efficient. An averaged linear
interpolation between two sample points gives good bang-for-the-buck; see
Figure 3. The file FLANGE.ASM (available electronically) and the module
flange_chorus() (Listing One excerpted from FX.C) provide implementation
details of this linear-interpolation technique. 
To create the basic flanging sound, this fine-grained variable-delay element
is cyclically "swept" between a very short delay value of less than a
millisecond to a longer delay value of 5--10 milliseconds. The rate and range
of this sweep can be adjusted to achieve radically different characters of the
basic flange.
Variations on the basic flanging effect can be achieved by providing a
feedback path (like that used to create decaying echoes) and recirculating
some of the delayed signal. This can dramatically intensify the flanging
effect and impart a strong harmonic nature to the sound as the feedback
creates a more-resonant filtering action.
Note also that the delayed signal and the feedback can be inverted before
being summed by simply reversing the sign of the gain stages. This creates
some interesting variants often overlooked in commercial effects devices.
Figure 2(a) shows the paths for feedback and the gain stages providing for
inversion prior to summing.
For another variant, the sweep is disabled entirely, reverting back to the
basic fixed-length delay effect but using short delay times. The robot-voice
patch in FX.C uses a fixed short delay with a lot of feedback. This creates
the static, metallic, resonant filter sound used to make mechanical voices in
old sci-fi movies.
It's also a short hop from flanging to "chorusing," an effect that uses the
exact same processing as flanging except that the delay value is increased to
somewhere around 20--40 milliseconds. This delay is long enough that the
exaggerated comb-filtering action decreases but short enough that the delayed
signal is not quite heard as a distinct echo. Instead, the gently undulating
pitch change resulting from the varying delay just adds a subjective richness
to the sound, much like a second voice singing unison--hence the term
"chorusing."


Pitch Change


As you play around with flanging and chorusing effects, you'll probably notice
how faster rates of delay change result in funny, warbling pitch variations in
the signal. As the delay goes from longer to shorter, you'll hear a sort of
Doppler shift up in pitch. When the sweep reverses, you'll hear the reverse
Doppler shift as the delay "moves away from you." It proves to be fairly easy
to harness this effect of a varying pitch from a varying delay into a decent,
real-time, pitch change algorithm.
Pitch changers allow the pitch of a sound to be dramatically altered in real
time. A downward pitch change can make your voice sound like Darth Vader from
Star Wars (although the actual Darth effect is a little more complex). An
upward pitch change will make you sound like you've been inhaling helium to
get ready for your Alvin and the Chipmunks tryouts.
To create an upward pitch change, you could start with a 30-millisecond delay
and steadily decrease it at a rate that yields the desired pitch change. As
the delay approaches 0, you would start a second delay channel at 30
milliseconds and sweep it as well. Then you do a quick crossfade from the
first to the second channel, making sure the first channel is completely faded
out before its delay reaches 0. Repeat this process, going back and forth
between the delay channels. Figure 4 diagrams the changing delays and the
crossfading pattern.
A downward pitch change is achieved in a similar manner, only the delay
channels are started with a near-zero initial delay, and the delay is
increased out to around 30 milliseconds, at which time the alternate channel
is started and the crossfade performed.
The subjective result of this approach is good for small to medium amounts of
pitch change. As the interval becomes greater, however, the frequency of the
splicing (or crossfading) increases until a "singing through the fan" effect
is created in the pitch-changed signal. In spite of these limitations,
however, this approach is effective, and many commercial units are built along
these lines.
PITCH.ASM (available electronically) and the pitch_change module of FX.C
illustrate implementation details. A smooth crossfade is essential to
good-quality blending of the two delay channels. The FX.C version uses sine
cosine lookup tables to generate ideal crossfade blends, whereas the
memory-limited PITCH.ASM version uses a two-piece, linear approximation of the
sine and cosine functions to perform the crossfade. Note that the crossfade
time can be tinkered with to provide less-noticeable splicing at certain
pitch-change rates and for different signal types.

You will definitely want to plug in a microphone and talk through some of
these pitch-change effects. (And you'll probably want to change your
answering-machine message once you hear how cool you sound, talking through a
serious downward pitch change!)
Some really wild effects can be created by combining pitch change with
echo-length delays and/or providing feedback paths around the pitch-change
element. Some of the patches provided with the FX.C program demonstrate these
tricks.


Phase Shifting


Phase shifting is not unlike flanging in that its frequency-response
characteristic is one or more notches sweeping up and down. Like flanging, the
notches in the frequency spectrum result from phase cancellation between the
unaffected signal and the processed signal. You can hear phase shifting all
over Pink Floyd's Dark Side of the Moon and numerous other records from the
early '70s. This effect is based on a curious type of filter called
"all-pass." As the name implies, this type of filter passes all frequencies,
but "filters" the phase of the signal. While its frequency response is a
straight line, its phase response varies by 180 degrees, with a 90-degree
phase shift at what would traditionally be considered the cutoff frequency of
a normal filter.
Example 1(a) is the normalized transfer function of a first-order all-pass
filter. Using a bilinear z-transform (BZT) method, you arrive at the
(difference) equation in Example 1(b), where the coefficient A is described by
Examples 1(c) and 1(d).
The phase-shifting effect is then implemented by cascading several such
all-pass filter sections and sweeping their cutoff frequencies in unison.
Mixing this processed signal with the original signal results in the notching
effect, as the total phase delay through the filter sections causes certain
frequencies to cancel. Like flanging, the effect can be varied by providing a
feedback path around the filter sections, and by providing for inversion of
the processed signal and the feedback.
A smooth sweep function is important; the frequencies of the filters should be
changed exponentially over time. This is easily accomplished using floating
point in C, but the assembly version uses another linear approximation of the
desired function. See PHASER.ASM and the phase_shift module of FX.C for
implementation details.


Conclusion


Recording studios and audio-for-video facilities are just now starting to move
rapidly toward more digital implementations. As studios set aside clunky tape
storage as their primary medium and adopt disk-based systems, the
possibilities for increased digital processing abound. We'll effectively be
able to have access to the audio signal before it happens! This opens up whole
new realms of playback-audio processing possibilities. Extremely complex audio
processing can be "rendered" on hard disk and then auditioned.
Whatever your interests--voice recognition, music, communications, or
games--audio DSP is sure to play an ever-increasing part in your future.


References


Ifeachor, Emmanuel and Barrie Jervis. Digital Signal Processing: A Practical
Approach. Reading, MA: Addison-Wesley, 1993.
Pohlmann, Ken C. Principles of Digital Audio, 2nd ed. Carmel, IN: SAMS, 1992.
Strawn, John. Digital Audio Signal Processing: An Anthology. Los Altos, CA:
William Kaufmann, 1985.


For More Information


TMS320C2X 
Digital Signal Processing Starter Kit
Part #TMDS3200026
Available from any TI distributor
(Hamilton/Hallmark, 800-325-1021)
TI DSP hotline: 713-274-2320
TI DSP BBS: 713-274-2323
Figure 1 Echo effect.
Figure 2 (a) Flanging effect; (b) sweep for flanging effect.
Figure 3 Linear interpolation.
Figure 4 Delay/crossfade scheme for pitch change.
Example 1 Phase shifting: (a) Normalized transfer function of a first order
all-pass filter; (b) using a bilinear z-transform (BZT) method, you arrive at
this equation, where the coefficient A is described by (c) and (d).
 s-1
(a) H(s) = ---
 s+1

(b) y(n)=A*x(n)+A*y(n-1)-x(n-1)

 1-wp
(c) A = ----
 1+wp

 (PI*freq)
(d) wp = ----------
 Fs
 Fs = sampling rate

Listing One 


/* flange_chorus. Does flanging/chorusing family of effects based on a single
   varying delay.

   dry_mix   mix of unaffected signal (-0.999 to 0.999)
   wet_mix   mix of affected signal (-0.999 to 0.999)
   feedback  amount of recirculation (-0.9 to 0.9)
   rate      rate of delay change in millisecs per sec
   sweep     sweep range in millisecs
   delay     fixed additional delay in millisecs
*/
void
flange_chorus(struct program *p)
{
    int fp, ep1, ep2;
    int step, depth, delay, min_sweep, max_sweep;
    double inval, outval, ifac = 65536.0;
    long scan = 0;
    bw data;
    wl sweep;

    /* fetch params */
    step = (int)(p->rate * 65.536);
    depth = (int)(p->depth * (double)SampleRate / 1000.0);
    delay = (int)(p->delay * (double)SampleRate / 1000.0);
    /* init/calc some stuff */
    max_sweep = BFSZ - 2 - delay;
    min_sweep = max_sweep - depth;
    if (min_sweep < 0) {
        printf("Can't do that much delay or depth at this sample rate.\n");
        exit(1);
    }
    sweep.w[1] = (min_sweep + max_sweep) / 2;
    sweep.w[0] = 0;
    /* init store and read ptrs to known value */
    fp = ep1 = ep2 = 0;
    /* disable interrupts, go to it */
    disable();
    while (1) {
        while ((inp(SR) & 0x20) == 0)   /* wait for input ready */
            ;
        data.b[0] = inp(PDR);           /* read input from chip */
        data.b[1] = inp(PDR);
        /* interpolate from the 2 read values */
        outval =
            (Buf[ep1] * sweep.w[0] + Buf[ep2] * (ifac - sweep.w[0])) / ifac;
        /* store finished input plus feedback */
        Buf[fp] = (inval = (double)data.w) + outval * p->feedback;
        /* develop final output mix */
        outval = outval * p->wet_mix + inval * p->dry_mix;
        if (outval > 32767.0)
            data.w = 32767;
        else if (outval < -32768.0)
            data.w = -32768;
        else
            data.w = (int)outval;
        while ((inp(SR) & 0x2) == 0)    /* wait for output ready */
            ;
        outp(PDR, data.b[0]);           /* write output to chip */
        outp(PDR, data.b[1]);
        /* update ptrs */
        fp = (fp + 1) & (BFSZ - 1);
        sweep.l += step;
        ep1 = (fp + sweep.w[1]) & (BFSZ - 1);
        ep2 = (ep1 - 1) & (BFSZ - 1);
        /* check for sweep reversal */
        if (sweep.w[1] > max_sweep)         /* see if we hit top of sweep */
            step = -step;                   /* reverse */
        else if (sweep.w[1] < min_sweep)    /* or if we hit bottom of sweep */
            step = -step;                   /* reverse */

        check_human();  /* check on human every so often */
    }
}


















































July, 1994
PROGRAMMING PARADIGMS


Mushroom Programming, the Sequel




Michael Swaine


A generation or two ago, schoolmarms with their hair in buns taught that good
handwriting really mattered. Who knew then that these teachers were
anticipating the arrival of the Newton MessagePad?
Hmm. That's not going to work. I can blame the MessagePad's unsatisfactory
handwriting recognition on bad writing skills, but then how do I alibi its
year-long lack of cellular communications or its sluggish sales?
My view is that Newton is not really about handwriting or wireless
communication and that ultimately the sales of Newton devices will be out of
Apple's hands, as other hardware manufacturers build Newton devices.
Newton is just not the same kind of platform as a PC, and it isn't subject to
the same old marketing homilies. In particular, I don't think that the size of
the installed base matters very much. When the hardware platform costs less
than a lot of PC software packages, I think it's clear that a different model
applies. In my view, Newton is mainly about vertical applications of certain
types, and the right vertical application can be sold to its target market
directly; the hardware can be bundled with the application.
We used to talk about VisiCalc or Lotus 1-2-3 selling computers; in the case
of Newton I think that the idea of the software selling the hardware may
become the norm.
Anyway, that's my theory. Thus this non-handwriting-dependent,
non-communications-oriented, vertical-application project. It's more an
elucidation of my theory than a demonstration of my programming skills. 


Our Friend, the Fungus


Last month, I presented part of a NewtonScript programming project I'm
working on: a field guide to identifying mushrooms. I should emphasize two
terms: "project" and "field guide." This endeavor is for my own education;
definite identification of mushrooms can require a microscope, while
indefinite identification can be dangerous, to say the least. This program
(see Figure 1) is intended, at most, as a guide to deciding which mushrooms to
throw in the basket to take back to the house for later, more accurate
identification. That, anyway, is the disclaimer I'd use if it were a real
product.
What I supplied last month was the user interface. That column was really an
exercise in using the Newton Toolkit (NTK), since a whole slew of
user-interface templates are built into the ROM, and the NTK lets you build a
UI by dragging these around on screen and setting their attributes via menu
selections. If you think that sounds like using custom controls in Visual
Basic, you've got the idea, although the NextStep environment is probably an
even better analogy.
I did do some coding in building the interface, but not much. This month's
effort was much more of an exercise in NewtonScript programming, since I
wrote the program's internals, for which there are no handy templates in ROM.
Last month, in a flight of rhetoric, I referred to these internals as an
"expert system." What I've written is really just a database and a matching
routine, but it arguably follows the basic structure of an expert system as
laid down in The Handbook of Artificial Intelligence, Volume IV by Avron Barr,
Paul R. Cohen, and Edward A. Feigenbaum (Addison-Wesley, 1989). That is, it
has a knowledge base of facts about the domain of knowledge and a relatively
simple program, called an "inference engine," for reasoning about that
knowledge base.
And in fact, I started to design a somewhat less-minimal expert system, only
to be redirected in my efforts by a principle of object-oriented design. Some
of you may be interested in that process, so I'm reporting it here.
Many of you, though, will find the programming insights obvious and the design
trivial. They are, and it is, but perhaps my efforts will demonstrate some
interesting features of this fascinating new object-oriented language.


There are a Lot of Mushrooms


The so-called "knowledge base" of this application is intended to hold all the
knowledge necessary for identifying mushrooms on the basis of their
attributes. In a conventional expert system, this knowledge might be stored as
rules, as in Example 1.
Last month's user interface lets the user enter attributes of the found
mushroom (the exemplar) such as color (white, buff, grey, yellow, brown,
black, and so on); size (in centimeters); gill presence, appearance, and mode
of attachment to the stem; and five other attributes.
This list is way too short. There are thousands of mushroom species. To tell
one species from another with any reasonable certainty might require knowledge
of attributes such as: cap_presence, cap_color, cap_color_change, cap_surface,
cap_shape, cap_size, cap_texture, plus a couple dozen others; the stem, flesh,
and gills; where the mushroom was found (what part of the country, presence of
trees nearby, whether the mushroom was growing on the ground or on wood); and
the time of year.
In addition to the many species, attributes, and possible values (color, for
example), other complications afflict this data. First, some attributes are
contingent on others: Mushrooms that lack gills have no meaningful
gill-appearance or gill-attachment attribute values, for example.
Then there is the question of the appropriate level at which to make the
identification. Identifying the species may not be precise enough, as some
species (Boletus edulis, for example) have distinct varieties. You might want
to identify the variety, or you might just want to use certain information to
shorten your search by eliminating, for example, an East Coast USA variety if
you're on the West Coast.
But species can also be too precise an identification. Out in the field, you
might be satisfied to know simply that the mushroom is of the genus Boletus;
species identification can be postponed until you get home.
All these considerations affect the structure of the knowledge base, of
course, but they also affect the inference engine. In expert systems, the
inference engine needs to crank through the knowledge base efficiently.


What do You Mean by Identify?


The scientific name of a mushroom is one kind of identification, but there are
others. It might be enough to know if it is poisonous, good to eat, or
hallucinogenic.
Knowledge bases are usually structured to facilitate answering useful
questions. Other criteria that could influence the knowledge-base structure
include:
The desire to identify the most-common mushrooms easily and quickly. This
would argue for emphasizing those attributes most useful for distinguishing
these common mushrooms, rather than generally distinctive or very salient
attributes (like size and color).
The desire to make things natural and easy for the user. The user will
recognize and enter salient attributes like color whether or not they happen
to be the most useful attributes for identifying the current mushroom.
As it happens, I didn't take any of these things into account in building the
knowledge base. I just implemented the scientific hierarchy of variety,
species, genus, and the like, and let that dictate the order in which the
knowledge base was accessed. I did this for two reasons:
"The expert system is a model of the expert's model of the domain," according
to Bruce Buchanan in The Handbook of AI. The real experts are the botanists
who sort these fungi into Linnaean categories.
"A good plan is to study the physical system you are trying to model and
create the classes of objects it has," is Actor author Chuck Duff's advice on
deciding what objects to create in an object-oriented system.
Well, the classes that mushrooms have, in the model of reality that botanists
use, are: kingdom (Fungi), division, subdivision, class, order (Agaricales),
family (Boletaceae), genus (Boletus), species (edulis).
This structure submits willingly to object-oriented exploitation. Each rule is
simply a description of a species (or variety or genus or whatever) of
mushroom in terms of defining attributes. Inheritance comes into play
naturally: Species inherit from genus, and so on. Family Boletaceae is defined
by the lack of gills, so the gill_type slot in all its genera and species
inherits this value unless explicitly overridden.
This structure imposes a hierarchy on the search, so the inference engine
doesn't have to examine the entire knowledge base and can be much more
efficient. Another nice feature is that the inference engine could be designed
to stop at the level of family or genus or keep cranking down to species or
variety, although I haven't implemented this yet.



The Knowledge Base and the Inference Engine


The knowledge base consists of frames, each defining a mushroom species, or
genus, or order, or whatever. The inference engine always starts with the same
frame, corresponding to the highest level in this hierarchy of fungi; see
Example 2(a). 
It compares the values of this frame's slots with the values of corresponding
slots in the frame named Observations, which gets built as the user enters
data about the mushroom to be identified; see Example 2(b). In this version of
the program, only one slot in the agaricales frame, namely stem_position, has
its value compared with that of the corresponding slot in Observations. If
there is a match, the inference engine then recurses over the children of this
frame. The single quote in front of the names in the array children indicates
that these are symbols, which is what they need to be to appear as names of
frames; see Example 2(c).
The inference engine then does its comparison with each of these children. If
it finds a match, it continues down the tree further, examining that frame's
children. The frame above is a child of agaricales, and it has children
suillus, boletus, and so on. Each of these children is a genus, and each genus
has child frames that represent species; that's where the recursion stops and
an identification is returned. As Example 2(d) shows, the inference engine
hands back to the user the genus, species, and possibly other information
picked up along the way (poisonous, delicious, attracts maggots, or whatever).
The key to the inference engine is its matching algorithm. I may seem to be
glossing over its detail here, but I'm not: As currently implemented, it
really does no more than what I've described. Well, it does shift gears when
looking at the size slot, using the size value to create a range and checking
to see if the observation's size value is in that range. But there's a serious
flaw in the model I've used.
Although the inheritance mechanism in NewtonScript does allow a child to
inherit from a parent (for example, all frames inherit the slot stem_position
with value "central" from frame agaricales), and it allows the child to
override that value (you could give boletus a stem_position of "eccentric" if
you wished), the overriding does no good, because this inference engine will
never compare an eccentric-stemmed sample with the frame for boletus. It will
have bailed out with the very first comparison.
The problem is that nature is not neat. Agaricales consists of mushrooms that
generally, but not always, have centrally placed stems, while the boletacaea,
on the other hand, always have pores rather than gills. Some of the characteristics
actually do allow cutting off entire branches of the search tree, while others
merely indicate which branches are less promising than others.
What I apparently need are two things: 1. a flag attached to each slot
indicating whether it is a required, typical, rare, or prohibited
characteristic for this particular species, genus, and the like; and 2. a
matching algorithm that uses this information to manage the search tree,
cutting off a branch whenever a required characteristic is missing or a
prohibited one is found, and reordering branches to follow promising ones
first whenever typical or rare characteristics are found or missed. I'm
working on it.
That's probably enough of this. My real point was to give an example of what I
consider an appropriate Newton application: a vertical,
handwriting-independent application that requires transportability and gives
quick answers.
How many people will be asking the questions that my program is prepared to
answer is an issue, of course, and uncertainty about the size of the market
does give me pause. There is already at least one mushroom-identification
program on the market, though. It runs on, er, um, the Amiga.
Figure 1 The mushroom-identification program in action.
Example 1: Storing knowledge as rules.
if stem_position = "central"
and gill_type = "absent"
then return "boletus"
Example 2: (a) The knowledge base starts at the highest level; (b) comparing
the values of this frame's slots with the values of corresponding slots in the
frame; (c) the single quote in front of the names in the array children
indicates that these are symbols; (d) the inference engine hands back to the
user the genus, species, and other information.
(a)
 agaricales : {
 // Agaricales is an order of fungi,
 // usually characterized by a centrally placed stem.
 // It contains several families of fungi.
 level:"order",
 children:['boletacaea,...],
 stem_position:"central"}
(b)
 Observations : {
 color : "brown",
 size : "15",
 cap_shape : "",
 cap_surface : "dry",
 gill_type : "absent",
 gill_attachment : "",
 stem_position : "central",
 stem_surface : "smooth",
 veils : ""}
(c)
 boletacaea : {
 // Boletacaea is a family of fungi,
 // characterized by pores rather than gills.
 // It contains several genera of fungi.
 level:"family",
 children:
 ['suillus, 'boletus, 'boletellus, 'gyroporus, 'pulveroboletus],
 gill_type:"absent"}
(d)
 suillus : {
 // Suillus is a genus of fungi,
 // typically characterized by a viscid cap surface.
 // It contains several species of fungi.
 level:"genus",
 children:[],
 cap_surface:"viscid"}
boletus : {
 // Boletus is a genus of fungi,
 // typically characterized by a dry cap surface.
 // It contains several species of fungi.
 level:"genus",
 children:['flaviporus,'satanas,'pulcherrimus,'edulis],

 cap_surface:"dry"}
flaviporus : {
 // Flaviporus is a species of fungi,
 // usually yellow with a smooth stem.
 level:"species",
 children:[],
 color:"yellow",
 stem_surface:"smooth"}
edulis : {
 // Edulis is a species of fungi,
 // usually large and brown, with a scaly stem.
 level:"species",
 children:[],
 color:"brown",
 size:15,
 stem_surface:"scaly"}














































July, 1994
C PROGRAMMING


The Quincy Preprocessor (Continued)




Al Stevens


I should be grouchy writing this column. It's overdue as usual, the lawn needs
mowing, the old car needs a new idler arm, I need a haircut, the deadline for
my new book looms near, papers for the Borland conference are due next week,
and federal tax returns must be filed by Friday. When things start to pile up
like that, I react swiftly and with purpose. I take the day off.
Yesterday Judy and I drove over to Lakeland to the annual Sun-and-Fun fly-in.
Her cousin-in-law, Mike Horvath, is an owner of Kolb Aircraft, a company that
manufactures ultralight airplanes, and they were displaying their wares at the
show. As a former pilot of real airplanes, I have always viewed those noisy
little toys with the same distant regard usually reserved for venomous snakes,
earthquakes, pitbulls, and women scorned.
Former pilot, indeed. Some time ago the FAA in their infinite wisdom decided
that due to a malfunctioning pancreas I was no longer fit to pilot airplanes.
That ruling being nonnegotiable and my condition being irreversible, I set
aside my doubts to take another look at ultralights. You don't need to be a
licensed pilot to fly one. And with good reason, I always thought. No sane
pilot would want to. When they first came out, they were little more than tube
and nylon hang gliders with patio chairs and lawn-mower engines strapped on.
They've come a long way since then, and if I ever get all of these other
things taken care of, I might just slap one together and take it out for some
bumps and circuits.
I looked at Mike's product and a bunch of others. If I didn't say here and now
that the Kolb FireStar was the clear, standout best on the field, I'd have
trouble at home. Fortunately, it turns out to be true. One company was selling
fan-driven parachute machines similar to the one that landed in the middle of
a heavyweight title fight and later on the roof of Buckingham Palace. The
birdman who flew it was naked and painted green when the Bobbies took him into
custody. See, that's what I mean. You have to have a few bulbs burned out to
get into some of these hobbies. I can't believe that I'm seriously considering
it. The salesman didn't want to talk about the naked, green fan-man. He said
it gave his product a bad name. As part of his sales approach he asked about
my profession. Then all he wanted to talk about was his PC and whether Visual
C++ would have templates soon. It's everywhere.


Preprocessing


This month continues the Quincy C-interpreter project. Last month, I began
describing Quincy's preprocessor. Since writing that column, I've made a
number of changes to the preprocessor. I'm building Quincy at the same time
that I'm writing the introductory C book that Quincy supports. Developing the
book's exercises reveals Quincy's bugs and shortcomings. For example, last
month I said that Quincy tends to reflect the subset of the language that I
use, and among the things that I rarely use are the # and ## preprocessing
macro tokens. To add Standard C's assert.h functions, which I use a lot, I
found that I needed the # "stringizing" token. Since I was adding that, adding
the ## argument-pasting token was not much more trouble, so I did. Perhaps
I'll find a use for it later. As I predicted last month, adding those tokens
meant that I had to change the internal representation of argument-replacement
tokens in macros. They are now 8-bit integers with the most significant bit
set instead of #<digit> pairs, as described last month.
Last month's column deferred the parts of the preprocessor that resolve
#define macros and evaluate #if expressions. We continue that discussion now.


Resolving Macros


When the preprocessor scan in preproc.c (from last month) finds an identifier,
it calls the ResolveMacro function in preexpr.c (Listing One), which tests to see
if the identifier has been redefined in a macro. If not, the function returns
the identifier itself. If so, the function returns the string that represents
the resolution of the macro. Macros can invoke other macros, so the
ResolveMacro process is recursive.
ResolveMacro calls FindMacro to see if the identifier has been redefined. If
not, the function returns the original identifier copied into the space
specified by the caller. If the identifier has been redefined, the function
must resolve the macro. Some macros are simple identifier substitutions in
that they have no parameters and therefore accept no arguments; #define ESC
27, for example.
In the case of such substitutions, the function makes the substitution and
returns to the top to test the substituted value to see if it, too, invokes a
macro substitution.
When a macro has parameters, ResolveMacro calls CompileMacro to substitute the
macro definition replacing the parameters in the macro definition with the
macro call's arguments. Since that substitution can result in a completely
different code sequence, ResolveMacro scans the result and calls itself to
resolve any identifiers in the new substituted string.
The CompileMacro function expands a macro call, including argument
substitution for parameters. First, it builds an array of pointers to the
macro call's arguments. Then it scans the macro definition, copying ASCII
characters to the result. When it hits a parameter token, which is an 8-bit
integer with the most significant bit set, the function copies the
corresponding argument from the array to the result. The tokens are integer
values that represent the parameter number. The most significant bit
identifies the integer as a parameter token rather than an ASCII value to be
copied.
If the macro definition includes the # character outside of a literal, the
substitution surrounds the argument that immediately follows with double
quotes, turning it into a string. A ## pair causes the immediately preceding
and following arguments to be concatenated.


Expression Evaluation


The #if and #elif preprocessing directives test a constant expression and
include or exclude the lines of code that follow it, depending on whether the
expression evaluates to a true or false value. Evaluating expressions uses a
recursive-descent expression analyzer that starts with a call to
MacroExpression.
A recursive-descent parser calls the function that processes operators with
the lowest precedence first. That function starts by calling the function for
operators with the next-higher precedence. Then it loops, looking for the
operators that it manages, processing them when it finds them, and breaking
out of the loop when it does not. If the operator is binary, the function
saves the first operand and again calls the next-higher precedence function to
get the second operand. Then it computes its result by applying the two
operands and the operator. Each of the operator functions returns the value
that it evaluates.
The highest-precedence function extracts a numerical value to contribute to
the expression; this function is at the bottom of the recursive descent.
First, it looks to see if there is a left parenthesis. If so, it calls the top
of the descent to evaluate the expression in parentheses, making sure that a
right parenthesis terminates the expression. Then it returns the result of the
evaluation.
If there is no left parenthesis, the highest-precedence function checks for
unary operators. If it finds one, the function calls itself to get an
expression value against which to apply the operator.
The highest-precedence function parses out numerical constants and returns
their value. It uses the interpreter's tokenize function to convert the C
constant into a number. 
If the highest-precedence function finds an identifier, the function calls
ResolveMacro to convert the identifier to a value that the function can use in
a call to itself to get a number. If there is no macro, the identifier is
translated into a 0 value.


Piles of Books


My desk is piled high with reference books used in research for the C book I'm
writing and in verifying C-language features as I build Quincy. I'll discuss
some of those books here.
The ANSI-standard document is at the top of the pile. Its official name is The
American National Standard for Programming Languages-C. My copy is the
December 1988 draft, which is close to, but not, the final copy. They added
some sections at the beginning, so the paragraph numbers are off, but
otherwise the draft seems to be complete. The ANSI-draft document package
includes a rationale document that "summarizes the deliberations of X3J11."
Every C programmer should have a copy of the ANSI document. You can order the
official version by calling 212-642-4900 and pressing 1 twice. Ask for
ANSI/ISO 9899-1990. The cost is $130.00, plus shipping and handling. Rather
than pay that much, you could buy a cheaper book that has it all. The
Annotated ANSI C Standard (Osborne/McGraw-Hill), reproduces the complete ANSI
specification, with annotations by Herb Schildt. The book was released in 1993,
but the copyright notice cites only the 1990 copyright date of the original
ISO/ANSI standard document. The purpose of the book is to translate the
standard document's terse technical specification into a more readable form. A
side benefit is that it publishes the complete ANSI C standard at a
substantially lower cost ($39.95 list) than ANSI's. The book devotes two
facing pages with identical page numbers to each page from the standard
document--the standard-document page on the left and Herb's comments on the
right. The standard pages were printed from plates provided to the publisher
by ANSI, so you can be sure that you are getting the real McCoy. Herb's
comments are relevant and to the point, making some of the more arcane parts
of the document understandable. 
One time, I turned to this book when I doubted what I read in the draft. The
fprintf formatting string includes a %n token that I have never used, and the
ANSI language that discusses it was less than clear. At first glance it looked
like a typo. Here was a situation where Herb's comments might shed light, or,
if not, maybe the final document would be clearer than the draft. Wouldn't you
know it? That was a page that the publisher inadvertently left out. I called
Herb, and he said that subsequent printings corrected the error. I have the
third printing. When you get the book, if pages 131 and 132 are the same, ask
for a more recent printing. I was able to clear up my question about fprintf
on my own. A more careful reading of the terse language in the ANSI draft
proved that it was correct.
Next in my pile is P.J. Plauger's The Standard C Library (Prentice-Hall,
1992), a book that I discussed in an earlier column. Quincy implements a
subset of the standard C-library functions, mostly by passing calls to them
through to the Borland C++ run-time library. Sometimes, however, as with
stdarg.h macros and setjmp.h functions, that is not possible. Other times, as
with assert.h macros and functions, the compiler's implementation might work,
but not well enough. An assert call can abort the interpreted program, but it
shouldn't abort Quincy, too. Plauger's book explains how all of these
libraries are implemented.
Both editions of The C Programming Language (1978 and 1988), by Brian
Kernighan and Dennis Ritchie are prominent in my pile: the first, to provide a
historical reference of how C used to be; the second, for Ritchie's occasional
reflections on the effects that an international committee had on his
creation. For example, "The rules for nested uses of ## are arcane...." I'll
say.
Then there are the books about compilers and interpreters. Although Quincy's
K&R interpreter was mostly written when I started this project, adding the
ANSI constructs and improving its performance required considerable
modifications to the interpreter code. Three books that have helped me better
understand the architecture of language translation are: Principles of
Compiler Design, by Alfred Aho and Jeffrey Ullman (Addison-Wesley, 1977), the
well-known "dragon book;" Writing Compilers & Interpreters, by Ronald Mak
(John Wiley & Sons, 1991); and Writing Interactive Compilers and Interpreters,
by P.J. Brown (John Wiley & Sons, 1979). The Brown book is my favorite. It
doesn't take itself too seriously and explains things in ways that we poor,
humble, country programmers can understand.



Variable Arguments: stdarg.h


Having mentioned stdarg.h, I'll use this column to discuss what it does and
how Quincy implements it. It's one of those oddities that sets C apart from
other languages and complicates the business of writing a language translator.
The ability for a C function to have a variable number of arguments of
variable types is an enigma of sorts. On one hand, the prototype mechanism
that ANSI borrowed from C++ wants all of a function's arguments to be defined
at compile time. That way, the compiler can validate the function definition
and all calls of the function. On the other hand, functions such as printf
need to have variable arguments whose number and types can be determined at
run time, based on format specifiers in a fixed argument whose type is
declared. The prototype for printf is shown in Example 1(a).
The ellipsis token (...) tells the compiler not to check the types or presence
of any of the arguments past the first one, and that's all that a C compiler
has to do to support variable arguments. There may be any number of fixed
arguments before the ellipsis.
Writing a function that accepts variable arguments involves using some macros
defined in stdarg.h. At first glance, it might seem that implementing a
variable argument list would involve some trickery in the compiler, but, in
fact, the whole thing is done with macros. The real trick is in knowing how
arguments get passed. Example 1(b) is the Quincy implementation of stdarg.h.
It works very much like the one found in most compilers that pass adjacent
arguments on a stack in increasing address order.
To understand how the macros work, you need to look at how you use them.
Example 1(c) is a simple function with variable arguments. It assumes that the
first argument is an int containing the number of arguments that follow, and
those following arguments are pairs of one int and one char* each. It doesn't
have to be that rigid. The printf function infers the types of arguments from
the formatting tokens in the first argument. That's why you see such strange
stuff when your formatting string and your arguments don't match.
Here's how the macros work. The va_list variable becomes a reference point for
the scan of the variable argument list. The va_start macro call uses that
variable and the function's last fixed argument as its own arguments. The
va_list variable is a pointer to a char type by virtue of the typedef in
stdarg.h. The va_start macro assigns to the variable the address of whatever
follows its second argument, which should be the last fixed argument to the
function. This technique assumes that arguments are passed on the stack, that
they are adjacent, and that the first listed argument has a lower address than
the second, the second a lower address than the third, and so on. The
va_start macro takes the address of its second argument, adds to that the size
of the second argument, casts the resulting address to type va_list, and
assigns the address to the va_list variable named by its first argument.
After this, it is the responsibility of the using function to know what types
are in the argument list. The va_arg macro takes the address of its va_list
first argument, casts it to an address of the type specified by its second
argument, dereferences the object pointed to by that address, and increments
the va_list variable by the type size named as its second argument. By using
the va_arg macro in an expression, you can dereference the argument itself
where the macro call occurs.
With most macro definitions, the macro calls resemble function calls with
valid C expressions as arguments. The va_arg macro is an exception. Its second
argument has to be a type, which is why va_arg must be a macro and cannot be
implemented as a function. The macro expansion inserts the second argument
into a cast and two sizeof expressions. The va_start macro cannot be a
function, either. It needs to know the size of its second argument, which can
be of any valid type, and a C function cannot work that way.
In Quincy, the va_end macro is what assembly-language programmers call a NOP
(pronounced "no op," also used to describe nonproductive programmers). It does
nothing. Some C implementations use it to clean up the variable-argument
environment.


"C Programming" Column Source Code


Quincy, D-Flat, and D-Flat++ are available to download from the DDJ Forum on
CompuServe, DDJ Online, and the Internet by anonymous ftp; see "Availability,"
page 3. If you cannot get to one of the online sources, send a diskette and a
stamped, addressed mailer to me at Dr. Dobb's Journal, 411 Borel, San Mateo,
CA 94402. I'll send you a copy of the source code. It's free, but if you want
to support my Careware charity, include a dollar for the Brevard County Food
Bank.
Example 1: (a) Prototype for printf; (b) Quincy implementation of stdarg.h;
(c) a function with variable arguments.
(a)
 int printf(const char *fmt, ...);
(b)
typedef char *va_list;
#define va_start(ap,pn) (void)((ap)=(va_list)&(pn)+sizeof(pn))
#define va_arg(ap,ty) (*(ty*)(((ap)+=sizeof(ty))-sizeof(ty)))
#define va_end(ap) (void)0
(c)
#include <stdio.h>
#include <stdarg.h>
void foo(int n,...)
{
 va_list ap;
 va_start(ap,n);
 while (n--) {
 printf("\n%d:",va_arg(ap,int)); /* int argument */
 puts(va_arg(ap,char*)); /* char* argument */
 }
 va_end(ap);
}

Listing One 

/* ---------- preexpr.c ---------- */
#include <string.h>
#include <stdlib.h>
#include "qnc.h"
#include "preproc.h"

void bypassWhite(unsigned char **cp);

/* ------- compile a #define macro ------- */
static void CompileMacro(unsigned char *wd, MACRO *mp,unsigned char **cp)
{
 char *args[MAXPARMS];
 int i, argno = 0;
 char *val = mp->val;
 int inString = 0;

 if (**cp != '(')

 error(SYNTAXERR);
 /* ---- pull the arguments out of the macro call ---- */
 (*cp)++;
 while (**cp && **cp != ')') {
 char *ca = getmem(80);
 int parens = 0, cs = 0;
 args[argno] = ca;
 bypassWhite(cp);
 while (**cp) {
 if (**cp == ',' && parens == 0)
 break;
 if (**cp == '(')
 parens++;
 if (**cp == ')') {
 if (parens == 0)
 break;
 --parens;
 }
 if (cs++ == 80)
 error(SYNTAXERR);
 *ca++ = *((*cp)++);
 }
 *ca = '\0';
 argno++;
 if (**cp == ',')
 (*cp)++;
 }
 /* -- build the statement substituting arguments for the parameters -- */
 while (*val) {
 if (*val & 0x80 || (*val == '#' && !inString)) {
 char *arg;
 int stringizing = 0;
 if (*val == '#') {
 val++;
 if (*val == '#') {
 val++;
 continue;
 }
 else {
 *wd++ = '"';
 stringizing = 1;
 }
 }
 arg = args[*val & 0x3f];
 while (isspace(*arg))
 arg++;
 while (*arg != '\0')
 *wd++ = *arg++;
 if (stringizing) {
 while (isspace(*(wd-1)))
 --wd;
 *wd++ = '"';
 }
 val++;
 }
 else if ((*wd++ = *val++) == '"')
 inString ^= 1;
 }
 *wd = '\0';

 for (i = 0; i < argno; i++)
 free(args[i]);
 if (argno != mp->parms)
 error(ARGERR);
 if (**cp != ')')
 error(SYNTAXERR);
 (*cp)++;
}
/* ---- resolve a macro to its #defined value ----- */
int ResolveMacro(unsigned char *wd, unsigned char **cp)
{
 unsigned char *mywd = getmem(MAXMACROLENGTH);
 MACRO *mp;
 int sct = 0;
 ExtractWord(wd, cp, "_");
 while (alphanum(*wd) && (mp = FindMacro(wd)) != NULL &&
 sct != MacroCount) {
 if (mp->val == NULL)
 break;
 if (mp->isMacro) {
 unsigned char *mw = mywd;
 int inString = 0;
 CompileMacro(mywd, mp, cp);
 while (*mw) {
 if (*mw == '"' && (mw == mywd *(mw-1) != '\\'))
 inString ^= 1;
 if (!inString && alphanum(*mw)) {
 ResolveMacro(wd, &mw);
 wd += strlen(wd);
 }
 else
 *wd++ = *mw++;
 }
 *wd = '\0';
 }
 else
 strcpy(wd, mp->val);
 sct++;
 }
 free(mywd);
 return sct;
}
/* --- recursive descent expression evaluation for #if --- */
static int MacroPrimary(unsigned char **cp)
{
 /* ---- primary:
 highest precedence;
 bottom of the descent ---- */
 int result = 0, tok;
 if (**cp == '(') {
 /* ---- parenthetical expression ---- */
 (*cp)++;
 result = MacroExpression(cp);
 if (**cp != ')')
 error(IFERR);
 (*cp)++;
 }
 else if (isdigit(**cp) || **cp == '\'') {
 /* --- numerical constant expression ---- */

 char num[80];
 char con[80];
 char *ch = *cp, *cc = con;
 while (isdigit(*ch) || strchr(".Ee'xX", *ch))
 *cc++ = *ch++;
 *cc = '\0';
 tokenize(num, con);
 *cp += cc - con;
 switch (*num) {
 case T_CHRCONST:
 result = *(unsigned char*) (num+1);
 break;
 case T_INTCONST:
 result = *(int*) (num+1);
 break;
 case T_LNGCONST:
 result = (int) *(long*) (num+1);
 break;
 default:
 error(IFERR);
 }
 }
 else if (alphanum(**cp)) {
 /* ----- macro identifier expression ----- */
 unsigned char *np = getmem(MAXMACROLENGTH);
 unsigned char *npp = np;
 result = (ResolveMacro(np, cp) == 0) ? 0 : MacroPrimary(&npp);
 free(np);
 }
 else {
 /* ----- unary operators ----- */
 tok = **cp;
 (*cp)++;
 result = MacroPrimary(cp);
 switch (tok) {
 case '+':
 break;
 case '-':
 result = -result;
 break;
 case '!':
 result = !result;
 break;
 case '~':
 result = ~result;
 break;
 default:
 error(IFERR);
 }
 }
 bypassWhite(cp);
 return result;
}
/* ----- * and / operators ----- */
static int MacroMultiplyDivide(unsigned char **cp)
{
 int result = MacroPrimary(cp);
 int iresult, op;
 for (;;) {

 if (**cp == '*')
 op = 0;
 else if (**cp == '/')
 op = 1;
 else if (**cp == '%')
 op = 2;
 else
 break;
 (*cp)++;
 iresult = MacroPrimary(cp);
 result = op == 0 ? (result * iresult) :
 op == 1 ? (result / iresult) :
 (result % iresult);
 }
 return result;
}
/* ------ + and - binary operators ------- */
static int MacroAddSubtract(unsigned char **cp)
{
 int result = MacroMultiplyDivide(cp);
 int iresult, ad;
 while (**cp == '+' || **cp == '-') {
 ad = **cp == '+';
 (*cp)++;
 iresult = MacroMultiplyDivide(cp);
 result = ad ? (result+iresult) : (result-iresult);
 }
 return result;
}
/* -------- <, >, <=, and >= operators ------- */
static int MacroRelational(unsigned char **cp)
{
 int result = MacroAddSubtract(cp);
 int iresult, lt;
 while (**cp == '<' || **cp == '>') {
 lt = **cp == '<';
 (*cp)++;
 if (**cp == '=') {
 (*cp)++;
 iresult = MacroAddSubtract(cp);
 result = lt ? (result <= iresult) : (result >= iresult);
 }
 else {
 iresult = MacroAddSubtract(cp);
 result = lt ? (result < iresult) : (result > iresult);
 }
 }
 return result;
}
/* -------- == and != operators -------- */
static int MacroEquality(unsigned char **cp)
{
 int result = MacroRelational(cp);
 int iresult, eq;
 while ((**cp == '=' || **cp == '!') && *(*cp+1) == '=') {
 eq = **cp == '=';
 (*cp) += 2;
 iresult = MacroRelational(cp);
 result = eq ? (result==iresult) : (result!=iresult);

 }
 return result;
}
/* ---------- & binary operator ---------- */
static int MacroBoolAND(unsigned char **cp)
{
 int result = MacroEquality(cp);
 while (**cp == '&' && *(*cp+1) != '&') {
 (*cp)++;
 result &= MacroEquality(cp);
 }
 return result;
}
/* ----------- ^ operator ------------- */
static int MacroBoolXOR(unsigned char **cp)
{
 int result = MacroBoolAND(cp);
 while (**cp == '^') {
 (*cp)++;
 result ^= MacroBoolAND(cp);
 }
 return result;
}
/* ---------- | operator -------- */
static int MacroBoolOR(unsigned char **cp)
{
 int result = MacroBoolXOR(cp);
 while (**cp == '|' && *(*cp+1) != '|') {
 (*cp)++;
 result |= MacroBoolXOR(cp);
 }
 return result;
}
/* ---------- && operator ---------- */
static int MacroLogicalAND(unsigned char **cp)
{
 int result = MacroBoolOR(cp);
 while (**cp == '&' && *(*cp+1) == '&') {
 (*cp) += 2;
 result = MacroBoolOR(cp) && result;
 }
 return result;
}
/* ---------- || operator ---------- */
static int MacroLogicalOR(unsigned char **cp)
{
 int result = MacroLogicalAND(cp);
 while (**cp == '|' && *(*cp+1) == '|') {
 (*cp) += 2;
 result = MacroLogicalAND(cp) || result;
 }
 return result;
}
/* -------- top of the descent ----------- */
int MacroExpression(unsigned char **cp)
{
 bypassWhite(cp);
 return MacroLogicalOR(cp);
}































































July, 1994
ALGORITHM ALLEY


Rendering Circles and Ellipses




Tim Kientzle


Tim, who has a PhD in mathematics from the University of California, Berkeley,
is the author of numerous commercial, shareware, and public-domain programs
for graphics and serial communication. Tim can be contacted at
kientzle@netcom.com.


In the July 1990 issue of Dr. Dobb's Journal, Tim Paterson presented the
article "Circles and the Digital Differential Analyzer." While Paterson's
algorithm does not accumulate error, his explanation involves a method for
solving the differential equation dy/dx = -x/y, which does accumulate error.
In a subsequent letter to the editor ("Letters," DDJ, July 1991), V.
Venkataraman pointed out that Paterson's algorithm plots points on or just
within the ideal circle, and suggested the desirability of a method that plots
the points closest to the circle, even if they are outside it. Venkataraman
presented a minor change to Paterson's algorithm, but still failed to
consistently choose the optimal point.
Based on this exchange, I developed a circle algorithm which satisfies both
Paterson's interest in speed and Venkataraman's interest in plotting the
closest points to the circle. The corresponding algorithm for drawing ellipses
is faster than the line algorithm that inspired it.


Lining Up


The digital differential analyzer (DDA), also known as "Bresenham's
algorithm," is based on a simple idea: Instead of picking points on the ideal
line, you should pick points on the screen that are close to the line. To do
this, increment one coordinate while updating the other so that you pick the
integer closest to the line. Since the endpoints of the lines have integer
coordinates, the slope is always rational, so you can obtain a fast, exact
algorithm by using rational arithmetic.
For example, assume you're drawing a line from (0,0) to (3,2). Whenever you
increment the x-coordinate, you need to increase the y-coordinate by 2/3. If
you let the variable y hold the integral part of the y-coordinate and add a
new variable to hold the numerator of the fractional part, you have the
algorithm in Listing One.
The problem with this approach is that by simply using y to plot the point,
you are truncating the exact value rather than rounding. Since rounding
involves adding 1/2 and then truncating, in this algorithm it suffices to
initialize the exact y-coordinate to 1/2 before you start, as in Listing Two .
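As a quick sanity check (the function name check_dda is mine, not from the article), this sketch runs the rounded DDA loop for a line from (0,0) to (x1,y1) and compares every plotted y against the exact value y1*x/x1 rounded to the nearest integer:

```c
#include <assert.h>

/* Run the rounded DDA for the line (0,0)-(x1,y1), with x1 > 0 and
   0 <= y1 <= x1, and verify that every plotted y equals the exact
   y1*x/x1 rounded to the nearest integer. Returns 1 if all match. */
int check_dda(int x1, int y1)
{
    int deltaX = 2 * x1;          /* doubled, as in Listing Two */
    int deltaY = 2 * y1;
    int numerator = deltaX / 2;   /* start the exact y-coordinate at 1/2 */
    int y = 0;

    for (int x = 0; x <= x1; x++) {
        /* nearest integer to y1*x/x1, computed in integer arithmetic */
        int rounded = (2 * y1 * x + x1) / (2 * x1);
        if (y != rounded)
            return 0;

        numerator += deltaY;
        if (numerator >= deltaX) {
            numerator -= deltaX;  /* reduce the fraction by one whole unit */
            y += 1;
        }
    }
    return 1;
}
```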


Going in Circles


Drawing circles using the DDA algorithm requires more-involved geometry than
drawing lines, but it uses the same approach: Step the x-coordinate through
successive values and update the approximate y-coordinate based on some
additional information. The trick is to figure out what that additional
information should be.
You need the following simplifications: First, deal with points on the circle
as "offsets" from the center of the circle (this is required by the algebra);
and second, only compute the pixels in 1/8 of the circle, taking advantage of
symmetry. Reflection to the remaining octants of the circle is handled by the
routines in Listings Three and Four .
To get a first cut at a circle algorithm, you compute y from ````r2--x2 (see
Listing Five). Although it may not look like it, the algorithm in Listing Five
is almost the same as Paterson's. The difference is that his approach replaced
an expensive, repeated square-root calculation with a series of additions and
subtractions. This is similar to the "strength reduction" performed by many
optimizing compilers that replace an expensive operation within a loop with a
series of less-expensive operations, usually adding a new variable in the
process. Of course, strength reduction of a square-root operation is
considerably beyond what can be handled automatically by today's compilers.
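Strength reduction itself is easy to see in a toy C example (illustrative names, not from the article): the product i*step inside the loop is replaced by a variable updated with a single addition per iteration.

```c
#include <assert.h>

/* Sum of step*i for i = 0..n-1, with the multiply inside the loop. */
long sum_with_multiply(int n, int step)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += (long)i * step;
    return total;
}

/* The same loop after strength reduction: the product i*step is carried
   in 'product', a new variable updated by adding 'step' each iteration. */
long sum_reduced(int n, int step)
{
    long total = 0;
    long product = 0;             /* invariant: product == i*step */
    for (int i = 0; i < n; i++) {
        total += product;
        product += step;
    }
    return total;
}
```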
Then, reduce the square-root calculation to calculation of squares. In the
octant you're computing (the part of the circle from the top to the top-left),
the y-coordinate of the next point is always the same or one less than the
previous y-coordinate. This means that you can simply decrement y whenever
y^2 > radius^2 - x^2, or equivalently, whenever x^2 + y^2 - radius^2 > 0.
Next, remove the calculation of the squares. Keep the square of the radius in
a temporary variable. You can then use the algebraic simplification
(x+1)^2 = x^2 + 2x + 1 to keep track of x^2 and y^2 as you change x and y. This
involves adding new variables xSquared and ySquared, adding 2x+1 to xSquared
before incrementing x, and subtracting 2y-1 from ySquared before decrementing y.
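A short check of this bookkeeping (variable names follow the text's xSquared and ySquared; the checking function is mine):

```c
#include <assert.h>

/* Verify that a running square tracks x*x using only additions, both
   counting up (add 2x+1, then x += 1) and counting down (subtract
   2y-1, then y -= 1). Returns 1 if the invariant held at every step. */
int check_incremental_squares(int limit)
{
    int x = 0, xSquared = 0;
    for (int i = 0; i < limit; i++) {
        if (xSquared != x * x)
            return 0;
        xSquared += 2 * x + 1;    /* (x+1)^2 = x^2 + 2x + 1 */
        x += 1;
    }

    int y = limit, ySquared = limit * limit;
    while (y > 0) {
        if (ySquared != y * y)
            return 0;
        ySquared -= 2 * y - 1;    /* (y-1)^2 = y^2 - (2y - 1) */
        y -= 1;
    }
    return 1;
}
```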
Finally, because the only place you need all these squares is in the condition
to decrement y, you can combine them into a single error variable kept equal
to xSquared + ySquared - radiusSquared. The earlier condition then simplifies to
error>0, which gives exactly the algorithm Paterson describes; see Listing Six
. This explanation of the variable error should make clear why this algorithm
works: Whenever the current y gives a point outside the circle (whenever
x^2 + y^2 > radius^2), you reduce y by 1 to remain within the circle. As Venkataraman
pointed out, you'd get a smoother result if you could instead choose the
points closest to the circle; that is, if you could only reduce y when the
lower point would be closer to the circle.
To accomplish this, take a careful look at how the algorithm decides when to
decrement y. You want a condition that will tell you whether (x,y) or (x,y-1)
is closer to the circle. Algebraically, you want to decrement y whenever the
inequality in Figure 1(a) is true. Changing the absolute values to squares
gives something equivalent, but easier to simplify. The simplification is
still tedious, but you eventually end up with something useful. The left side
of Figure 1(b) is simply the value error from earlier attempts. The value in
brackets is always positive and less than 1. This means that the right side of
the inequality is less than y but greater than y-1. Since everything else is
an integer, the inequality simplifies to error >= y.
This results in Listing Seven , which is close to Venkataraman's suggestion,
but always gives the point closest to the circle. To test this code, replace
the plot8 routine with one that computes the radius of the points being
plotted and the radius of points just above and below them. You'll see that
the algorithm always picks the y-coordinate that results in a radius closest
to the circle. You may be able to speed up Listing Seven slightly by changing
error to be x^2 + y^2 - r^2 - y, which allows the condition to simply test
error >= 0.
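That test can be automated without a plotting routine at all; this sketch inlines the one-octant loop of Listing Seven and asserts that each chosen y is at least as close to the ideal circle, by the x^2 + y^2 - r^2 measure, as both of its neighbors (check_circle and err2 are my names):

```c
#include <assert.h>
#include <stdlib.h>

/* Squared-radius error of the point (x,y) against radius r. */
static long err2(long x, long y, long r)
{
    return labs(x * x + y * y - r * r);
}

/* Run the Listing Seven loop for one octant (the plot8 call omitted)
   and check that every chosen y beats y-1 and y+1 in closeness to the
   ideal circle. Returns 1 on success. */
int check_circle(long radius)
{
    long x = 0, y = radius;
    long error = 0;                /* x^2 + y^2 - r^2 */

    while (x <= y) {
        if (err2(x, y, radius) > err2(x, y - 1, radius))
            return 0;
        if (err2(x, y, radius) > err2(x, y + 1, radius))
            return 0;
        error += 2 * x + 1;
        x += 1;
        if (error >= y) {
            error -= 2 * y - 1;
            y -= 1;
        }
    }
    return 1;
}
```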


Handling Ellipses and Nonsquare Pixels


If you're only interested in drawing perfect circles on screens with square
pixels, then Listing Seven serves the purpose quite nicely. However, if the
pixels are not square, you have to draw an ellipse in order to get a circular
result. 
The biggest problem in drawing ellipses is that you lose the eight-fold
symmetry used to simplify the circle routine. Where before you had eight
identical octants, you now have two sets of octants: two each at the top and
bottom of the circle that are identical to each other, and four at the left
and right that are identical to each other. So, you need to run the earlier
algorithm twice, once for each set. 
The formula for the ellipse that fits within a rectangle of width 2a and
height 2b is b^2 x^2 + a^2 y^2 = a^2 b^2. Adjusting the calculation of error to
match this
gives us the algorithm in Listing Eight for drawing the ellipse. The
least-obvious part of this is how to determine where each octant stops.
Remember that the algorithm depends on y either staying the same or decreasing
by 1. This means that each loop has to continue until you reach a place in the
ellipse where the slope is 1 or -1. This happens exactly when b^2 x == a^2 y; see
Listing Eight.
Clearly, a few more things could be done to speed up this routine. By adding
variables to keep track of b^2 x and a^2 y, you could replace all of the
multiplications with additions. You could also alter the definition of error
as discussed earlier to simplify the condition test. Listing Nine incorporates
those optimizations, plus a few others. In assembly language, you should be
able to optimize this to a maximum of eight addition/subtraction/comparison
operations per loop iteration, not counting the operations in the plot4
routine. Counting those, you get a maximum of 16 such operations for four
points on the ellipse, or a mere four additions per point, which compares
quite favorably to the three additions per point for the less-general circle
routine developed earlier, and the six additions per point for the line
algorithm. In fact, this shows that drawing the ellipse directly is faster
than computing several points on the ellipse and using DDA lines to connect
them!
Of course, it might be faster to undo some of these optimizations to reduce
the number of variables required. By doing that, you may be able to fit all
the variables in registers, thus speeding up the algorithm. As is usual for
such low-level graphics operations, optimizing for the specific processor and
video hardware is critical.
Figure 1 (a) You need to decrement y whenever the inequality is true; (b) the
left side is simply the value error from earlier attempts. 

Listing One 

void line_1(int x0, int y0, int x1, int y1 )
{
 int x = x0,y = y0; /* Current point on line */
 int deltaX = x1-x0; /* Change in x from x0 to x1 */
 int deltaY = y1-y0; /* Change in y from y0 to y1 */

 int numerator = 0; /* Numerator of fractional part of y */

 while ( x <= x1 )
 {
 plot( x,y );
 x += 1;

 numerator += deltaY; /* Increase fractional part of y */
 if (numerator >= deltaX) /* If fraction is 1 or more */
 {
 numerator -= deltaX; /* Reduce fraction */
 y += 1; /* Increase whole part of y */
 }
 }
}





Listing Two

#define abs(x) (((x)>=0)?(x):-(x)) /* Absolute value */
void line_2(int x0, int y0, int x1, int y1 )
{
 int x = x0,y = y0;
 int deltaX = 2*abs(x1-x0);
 int deltaY = 2*abs(y1-y0);
 int numerator = deltaX/2; /* Initialize y-coordinate to 1/2 */
 while ( x <= x1 )
 {
 plot( x,y );
 x += 1;

 numerator += deltaY;
 if (numerator >= deltaX)
 {
 numerator -= deltaX;
 y += 1;
 }
 }
}






Listing Three

void plot4( int xOrigin, int yOrigin, int xOffset, int yOffset)
{
 plot( xOrigin + xOffset, yOrigin + yOffset );
 plot( xOrigin + xOffset, yOrigin - yOffset );
 plot( xOrigin - xOffset, yOrigin + yOffset );
 plot( xOrigin - xOffset, yOrigin - yOffset );
}




Listing Four

void plot8( int xOrigin, int yOrigin, int xOffset, int yOffset)
{
 plot4( xOrigin, yOrigin, xOffset, yOffset );
 plot4( xOrigin, yOrigin, yOffset, xOffset );
}



Listing Five

#include <math.h>   /* for sqrt */

void circle_1(int xOrigin, int yOrigin, int radius)
{
 int x = 0; /* Exact x coordinate */
 int y = radius; /* Approximate y coordinate */

 while (x <= y) /* Just one octant */
 {
 plot8( xOrigin, yOrigin, x, y );
 x += 1;
 y = (int)sqrt((double)(radius*radius - x*x));
 }
}



Listing Six

void circle_2(int xOrigin, int yOrigin, int radius)
{
 int x = 0; /* Exact x coordinate */
 int y = radius; /* Approximate y coordinate */
 int error = 0; /* x^2 + y^2 - r^2 */

 while (x <= y)
 {
 plot8( xOrigin, yOrigin, x, y );

 error += 2 * x + 1;
 x += 1;

 if (error > 0)
 {
 error -= 2 * y - 1;

 y -= 1;
 }
 }
}



Listing Seven

void circle_3(int xOrigin, int yOrigin, int radius)
{
 int x = 0; /* Exact x coordinate */

 int y = radius; /* Approximate y coordinate */
 int error = 0; /* x^2 + y^2 - r^2 */

 while (x <= y)
 {
 plot8( xOrigin, yOrigin, x, y );

 error += 2 * x + 1;
 x += 1;

 if (error >= y)
 {
 error -= 2 * y - 1;
 y -= 1;
 }
 }
}



Listing Eight

void ellipse_1(int xOrigin, int yOrigin, int a, int b)
{
 { /* Plot the octant from the top to the top-right */
 int x = 0;
 int y = b;
 int error = 0;/* b^2 x^2 + a^2 y^2 - a^2 b^2 */

 
 while (x * b *b <= y * a * a)
 {
 plot4( xOrigin, yOrigin, x, y );
 
 error += 2 * b*b * x + b*b;
 x += 1;

 if (error >= y * a*a)
 {
 error -= 2 * a*a * y - a*a;
 y -= 1;
 }
 }
 }

 { /* Plot the octant from right to top-right */
 int x = a;
 int y = 0;
 int error = 0;/* b^2 x^2 + a^2 y^2 - a^2 b^2 */
 
 while (x * b * b > y * a * a)
 {
 plot4( xOrigin, yOrigin, x, y );
 
 error += 2 * a*a * y + a*a;
 y += 1;

 if (error >= x * b*b)
 {

 error -= 2 * b*b * x - b*b;
 x -= 1;
 }
 }
 }
}



Listing Nine

void ellipse_2(int xOrigin, int yOrigin, int a, int b)
{
 int aSquared = a*a;
 int bSquared = b*b;
 int twoASquared = 2 * aSquared;
 int twoBSquared = 2 * bSquared;

 { /* Plot the octant from the top to the top-right */
 int x = 0;
 int y = b;
 int twoXTimesBSquared = 0;
 int twoYTimesASquared = y * twoASquared;

 int error = -y* aSquared; /* b^2 x^2 + a^2 y^2 - a^2 b^2 - a^2y */
 
 while (twoXTimesBSquared <= twoYTimesASquared )
 {
 plot4( xOrigin, yOrigin, x, y );
 x += 1;
 twoXTimesBSquared += twoBSquared;
 error += twoXTimesBSquared - bSquared;
 if (error >= 0)
 {
 y -= 1;
 twoYTimesASquared -= twoASquared;
 error -= twoYTimesASquared;
 }
 }
 }
 { /* Plot the octant from right to top-right */
 int x = a;
 int y = 0;
 int twoXTimesBSquared = x * twoBSquared;
 int twoYTimesASquared = 0;
 int error = -x* bSquared; /* b^2 x^2 + a^2 y^2 - a^2 b^2 - b^2x */

 while (twoXTimesBSquared > twoYTimesASquared)
 {
 plot4( xOrigin, yOrigin, x, y );
 y += 1;
 twoYTimesASquared += twoASquared;
 error += twoYTimesASquared - aSquared;
 if (error >= 0)
 {
 x -= 1;
 twoXTimesBSquared -= twoBSquared;
 error -= twoXTimesBSquared;
 }

 }
 }
}






July, 1994
UNDOCUMENTED CORNER


QPI: The QEMM-386 Programming Interface




Ralf Brown


Ralf maintains the MS-DOS Interrupt List, a free collection of information
about interrupt calls. He coauthored Undocumented DOS, PC Interrupts, and
Network Interrupts (Addison-Wesley, 1994) and is currently a postdoctoral
fellow at Carnegie Mellon University's Center for Machine Translation. Ralf
can be contacted at ralf@telerama.lm.com. 


Introduction 
by Andrew Schulman 
In this month's "Undocumented Corner," Ralf Brown examines the private
programming interface provided by Quarterdeck's 386 memory manager, QEMM.
Questions remain concerning the longevity of third-party memory managers such
as Quarterdeck's QEMM and Qualitas's 386MAX. Why should you develop
third-party software when DOS (and often Windows) provides it for free? Because, as
we've seen with DOS extenders and even disk compressors, there's often room
for third-party alternatives.
Whatever the future of third-party memory managers, Ralf's description of the
QEMM programming interface remains fascinating. For example, take a look at
Figure 1, the output from Ralf's QEMMINFO program. The three maps displayed by
this program show the arrangement of the first megabyte (plus a smidgen) of
memory. Even though 386s and Virtual-8086 (V86) mode have been around for
years, many PC programmers are still surprised to hear that the first megabyte
of memory on a PC even has an "arrangement." If you're sitting at the DOS C:\>
prompt, the first megabyte is the first megabyte, right? No, not if (like most
users today) you're using a 386 memory manager such as QEMM or 386MAX, and/or
are running in a DOS box under Windows Enhanced mode. For example, bits of the
first megabyte of linear memory in the third column of Figure 1 actually
belong to the fourth megabyte of physical memory.
Of course, 386 memory managers try to make V86 mode as invisible as possible.
However, it's often necessary (or at least helpful) for programmers to view
the V86-mode reality behind the "it looks like real-mode DOS" facade. Ralf
shows how to do this for QEMM. Some of these QEMM APIs are also partially
implemented (or "spoofed," as Ralf puts it) by other memory managers, so this
information is widely applicable. 
Besides describing how to use the QEMM interface, Ralf also presents some
fascinating background information. For example, he shows how Compaq's
original 386 memory manager, CEMM, is the basis for today's EMM386 and QEMM.
Ralf also touches on how QEMM patches Windows. I enjoyed his explanation of
how V86 managers can hook interrupts and establish interfaces (by hooking I/O
ports, for example) in ways likely to surprise those who still think of DOS as
a real-mode operating system. Make no mistake: when something like QEMM is
loaded, DOS isn't running in real mode, and V86 mode is hardly like real mode,
so anything is possible.
Send your comments and suggestions to me via the Undocumented Corner area in
the Dr. Dobb's CompuServe forum (GO DDJFORUM), where my ID is 76320,302.
Memory managers support a variety of industry-standard interfaces developed
over the years--EMS, XMS, VCPI, DPMI, and VDS. But any programmer who has used
the utility programs included with memory managers such as Quarterdeck's
QEMM-386 or Qualitas's 386MAX knows that there must be another way to control
and retrieve information from these managers beyond the method available
through interfaces such as EMS and XMS. How else, for example, could QEMM.COM
or Manifest determine how much memory QEMM is using for its own code or mapped
ROM?
The answer is that QEMM, 386MAX, and Helix Software Netroom's RM386 support
private APIs; Compaq's CEMM and Microsoft's EMM386 also have smaller APIs.
These memory managers use quite different methods of invoking their private
functions. RM386 provides direct, interrupt-based calls; QEMM, CEMM, and
EMM386 have a FAR CALL entry point whose address may be determined in a number
of ways, including interrupt calls; and 386MAX uses the 386's ability to trap
access to a "magic" I/O port and transfer control to the V86-mode supervisor,
386MAX.SYS. Given the sizes of the APIs involved and the fact that the Netroom
API is almost entirely documented (rather a rarity these days), I'll focus
here on QEMM.
Through a FAR CALL entry point, QEMM provides functions to change its state,
provide statistics, control memory mapping and video virtualization, and
support coexistence with Microsoft Windows. In June 1993, Quarterdeck released
official documentation on what it calls the QEMM-386 Programming Interface
(QPI); however, the majority of this interface is still undocumented. 


Finding the QPI Entry Point


Since QPI is based on a FAR CALL entry point, you first have to determine that
entry point. There are at least four methods for determining QEMM's private
entry point in recent versions: scanning for a signature string; using INT 67h
AH=3Fh; using an IOCTL call; and using Quarterdeck's RPCI (Resident Program
Communication Interface). The last two were officially, though obscurely,
documented in two June 1993 Quarterdeck files called QDMEM.DOC and QPI.DOC.
These four methods have accumulated over numerous revisions of QEMM.
Scanning for a signature involves looking for the string "QUARTERDECK EXPANDED
MEMORY MANAGER 386" located at offset 14h in the EMMXXXX0 (expanded memory
manager) device driver's segment. This is preceded by a WORD containing the
entry point's offset in the driver's segment. Prior to QEMM 7.0, this
device-driver code--and thus the signature string--was always located in low
memory. Beginning with 7.00, this code can be relocated into upper memory,
though in 7.01, copies are present in both low memory and, under some
circumstances, in an upper-memory block. This method is probably used only to
verify the entry point returned by the next method, since software scanning
memory for the signature would likely not have known it would wind up in high
memory with QEMM 7, and would thus have been "broken" by the new version.
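The scanning logic itself is easy to sketch against a copy of a candidate segment (illustrative C run over a simulated buffer, not real-mode code; the function names are mine):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QEMM_SIG "QUARTERDECK EXPANDED MEMORY MANAGER 386"

/* Given a copy of a candidate driver segment, check for the QEMM
   signature at offset 14h and, if present, return the entry point's
   offset from the little-endian WORD stored just before it (offset
   12h). Returns -1 if the signature does not match. */
long qemm_entry_offset(const uint8_t *seg, size_t len)
{
    if (len < 0x14 + sizeof(QEMM_SIG) - 1)
        return -1;
    if (memcmp(seg + 0x14, QEMM_SIG, sizeof(QEMM_SIG) - 1) != 0)
        return -1;
    return seg[0x12] | ((long)seg[0x13] << 8);
}

/* Build a fake driver segment for testing: signature at offset 14h,
   entry-point offset WORD at offset 12h. */
void fake_segment(uint8_t *seg, size_t len, unsigned entry)
{
    memset(seg, 0, len);
    seg[0x12] = entry & 0xFF;
    seg[0x13] = (entry >> 8) & 0xFF;
    memcpy(seg + 0x14, QEMM_SIG, sizeof(QEMM_SIG) - 1);
}
```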
Both Compaq's CEMM and Microsoft's EMM386 support the signature method of
getting their own private entry points, using the signatures "COMPAQ EXPANDED
MEMORY MANAGER 386" and "MICROSOFT EXPANDED MEMORY MANAGER 386". The
similarity between the CEMM and EMM386 private APIs indicates that EMM386 is
likely a direct descendant of CEMM. There are also sufficient similarities
between CEMM and QEMM to hint that QEMM is derived from Compaq's memory
manager as well. All three memory managers have identical functions 00h and
01h in their private APIs.
The INT 67h AH=3Fh method is the simplest, but has also been (incompletely and
not entirely correctly) copied by at least two other memory managers. To check
for QEMM's presence and simultaneously retrieve the entry point, load AH with
3Fh, CX with 5145h ('QE'), and DX with 4D4Dh ('MM'); then invoke INT 67h. On
return, AH will be 00h if the call was successful (QEMM or one of the
"spoofing" managers is installed), and ES:DI will contain the address of the
entry point.
However, both Micronics' MICEMM and 386MAX provide only a few of the functions
on this entry point that QEMM does. One way to distinguish between the real
QEMM and other memory managers is to test for the signature string at offset
14h in the entry point's segment; however, MICEMM will provide this same
signature if it has been given the DV command-line switch.
Beginning with version 5.0, Quarterdeck introduced a new interface shared by a
number of its resident programs, including QEMM, QRAM, VIDRAM, and
resident-mode Manifest. The RPCI uses INT 2Fh with a dynamically set function
number between C0h and FFh, defaulting to D2h. All RPCI programs share this
same multiplex number. 
To find the RPCI multiplex number, scan AH values from D2h to FFh and then C0h
to D1h, calling INT 2Fh with AX=XX00h, BX=5144h ('QD'), CX=4D45h ('ME'), and
DX=4D30h ('M0'). On return, AL will be FFh if the multiplex number is in use.
If it is the RPCI rather than some other program, the call also returns
BX=4D45h ('ME'), CX=4D44h ('MD'), and DX=5652h ('VR'). Armed with the
multiplex number, check
for QEMM by calling INT 2Fh with AH=multiplex number, AL=01h, BX=5145h ('QE'),
CX=4D4Dh ('MM'), and DX=3432h ('42'); if QEMM is present, it returns BX=4F4Bh
('OK') and sets ES:DI to the address of the entry point.
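The scan order is the only subtle part; this sketch (rpci_scan_order is an illustrative name) enumerates the 64 candidate multiplex numbers in the order just described:

```c
#include <assert.h>

/* Fill order[] with the 64 multiplex numbers in the scan order the
   text describes: D2h up through FFh, then wrapping around to C0h
   through D1h. */
void rpci_scan_order(int order[64])
{
    int i = 0;
    for (int ah = 0xD2; ah <= 0xFF; ah++)   /* default D2h first */
        order[i++] = ah;
    for (int ah = 0xC0; ah <= 0xD1; ah++)   /* then the wrap-around */
        order[i++] = ah;
}
```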
The final detection method was also added in 5.0 (though Quarterdeck's QPI.DOC
claims that it is first available in 6.0). If you open the character device
QEMM386$ with INT 21h AX=3D00h and perform an IOCTL INPUT (INT 21h AX=4402h)
of four bytes, the four returned bytes are a FAR pointer to the QPI entry
point.
Since both the RPCI and IOCTL methods have been officially (if obscurely)
documented, they are the preferred methods for retrieving the QPI entry point,
rather than the older and still-undocumented signature and INT 67h methods. 


The QPI Functions


After finding the private entry point, QEMM's private functions may be invoked
by loading the function number into AH, the subfunction (if any) into AL,
setting any other required registers, and calling the far entry point. On
return, CF indicates if the function was successful.
Interestingly, using a debugger to examine the actual entry point for QEMM
prior to 7.0 reveals nothing more than an INT 2Ch instruction followed by an
IRET (in 7.x, there is some additional indirection before the INT 2Ch). But
looking at INT 2Ch reveals nothing of QEMM--by default, MS-DOS points it at an
IRET instruction. 
How then could this possibly be QEMM's interface? On any hardware or software
interrupt (such as this INT 2Ch) in V86 mode, the CPU switches to protected
mode and calls a protected-mode interrupt handler rather than the handler
pointed at by the real-mode interrupt vector table at 0000:0000. So QEMM gets
to see all interrupts before applications do, and can filter out those meant
for it instead of passing them down to the real-mode handler. 
If issued from the QEMM386$ driver's segment, INT 2Ch and most of the other
interrupts in the range 22h to 30h are meant for QEMM and will provide various
functions to QEMM's real-mode stub; if issued from any other segment, they are
passed back down to the "real-mode" (actually V86 mode) handler. So you can't
call QPI simply by putting an INT 2Ch in your own code. INT 2Ch happens to be
the QPI provider and has remained stable over many versions of QEMM; various
other interrupt numbers have been used for EMS, XMS, VDS, and INT 15h AH=87h
services, as well as some internal calls.
Table 1 provides an overview of the QPI functions, with the few officially
documented calls indicated. Except for function 1Dh, each new version of QEMM
has supported all functions supported by all previous versions since 4.23 (the
earliest about which information is available). The QPI functions may be
roughly classified as follows: 
Changing QEMM's state: 00h, 01h, 04h, and 05h.
Statistics: 11h, 16h, and 17h.
Memory mapping: 06h to 0Bh, 0Fh, 18h, and 1Fh.
Stealth: 1Dh, 1Eh, 21h, 24h.
Video virtualization: 0Dh, 0Eh, 13h.
Coexistence with Windows: 1Bh.
Desqview support: 1306h, 14h, 1Ch, and 22h.
VCPI functionality: 0Ch and 10h.
Miscellaneous: 02h, 03h, 12h, 15h, 1Ah, and 20h.

Although the ordering is different, function 0Ch and the various subfunctions
of function 10h provide exactly the same calls as the Virtual Control Program
Interface (VCPI) and are clearly the precursor of the public specification. In
fact, QEMM implements both the VCPI calls on INT 67h AH=DEh and QPI functions
10xxh with calls to the same underlying subroutines. This situation is
analogous to the development of the DPMI specification, which in many ways is
merely a description of preexisting functionality in the Windows VMM.
Function 1Ah provides access to I/O ports, bypassing any protection or
virtualization QEMM may have imposed on the port a program wishes to use.
Other functions, such as some of the 13xxh calls and function 15h, can
affect which I/O ports are virtualized by QEMM. Ports 60h, 64h, 92h, and
various VGA ports are normally virtualized by QEMM.
Functions 1Dh, 1Eh, 21h, and 24h support QEMM's patented Stealth feature,
which provides more upper memory by hiding the system's ROMs. Stealth remaps
memory so that the ROMs appear in the first megabyte only when they are
actually required, namely during an interrupt call which reaches a handler in
ROM. When QEMM starts up with Stealth enabled, it hooks all interrupts that
point into ROM to intercept calls just before they are chained to a ROM. This
technique allows Stealth to work with the existing ROMs, in contrast to Helix
Software's Cloaking or Novell's DPMS (DOS Protected Mode Services), which
require software written specifically to their interface (which involves a
small stub in the first megabyte and the true handler running in protected
mode). In exchange, Cloaking and DPMS offer the ability to move arbitrary
resident programs out of the "real-mode" first megabyte.
QEMM provides some functions specifically for use by Desqview (in combination
with which it creates the Desqview/386 multitasker). The close interaction
between QEMM and Desqview can be seen in function 14h, which supports
Desqview's "protection level" feature. A nonzero protection level for a
program enables additional checks that catch many errant programs before they
cause a system-wide crash. These functions are not usable by other
applications because QEMM makes various Desqview API calls (INT 15h
AH=10h-12h) when nonzero protection levels are in effect; in particular, QEMM
assumes that it can pop up a Desqview error-message window when it detects a
protection violation.
Naturally, the conventional-memory stub of QEMM386.SYS also uses QPI calls.
QEMM uses functions 00h and 01h to either temporarily or permanently change
its state, such as turning itself off when a laptop goes into sleep mode and
then returning to its former state when the laptop resumes. The code
supporting the DISKBUF (DB) switch uses function 18h to determine whether
there is any need to copy the data being transferred to or from the disk
through a fixed buffer allocated by QEMM; if the logical address is identical
to the physical address for every byte in the buffer being used by the
application, there is no need for the temporary buffer. QEMM v6.0x used
functions 1D00h and 1D01h in supporting the suspend/resume interrupt feature
of many laptops.
All the 1Bxxh functions are used in some way while operating with Windows:
Function 1B00h returns the address of the Global EMM Import Structure.
Functions 1B01h and 1B02h implement the Windows V86-mode enable/disable
callback provided through INT 2Fh AX=1605h.
Functions 1B03h and 1B04h are used by QEMM's conventional-memory stub to
notify QEMM's protected-mode code that Windows is starting or terminating.
Functions 1B05h and 1B06h are used in patching some of Windows' drivers as
they are loaded into memory (in particular, QEMM versions 6 and 7 patch
Windows 3.0 Standard mode).
The previous section mentioned that two other memory managers provide the INT
67h AH=3Fh call to get the QPI entry point, but provide only a subset of the
QPI functions. MICEMM provides only functions 00h, 02h, and 03h; 386MAX 6
provides only function 0Ch and the various subfunctions of function 10h.
(Interestingly, these are precisely the functions which later became the VCPI
specification on INT 67h AH=DEh.) The problem with 386MAX's implementation is
that the few supported functions use a nonzero return value in AH instead of
the carry flag to signal an error or unsupported function. 


Other Undocumented Functions


The QPI just described is not the full extent of QEMM's private API. QEMM
provides an additional (documented) RPCI function beyond the two already
shown. Similar to QPI function 12h, calling INT 2Fh with AH=multiplex number,
AL=01h, BX=4849h ('HI'), CX=5241h ('RA'), and DX=4D30h ('M0') will return
BX=4F4Bh ('OK') if high memory is present and will set both CX and DX. CX
contains the segment of the first memory-control block in the high-memory
chain, and DX contains the segment of the owner of any locked-out memory
blocks (video or ROMs between the regions of upper memory). In existing
versions of QEMM, the value in DX is always the segment of the QEMM386$
device-driver code. Unlike QPI function 12h, this call is also supported by
Quarterdeck's QRAM, a memory manager for sub-386 PCs which can use shadow RAM
as upper-memory blocks. Quarterdeck's high-memory chain is identical to the
DOS 4.x (and greater) memory chain in low memory, with the owner field in the
memory-control block set to the string "UMB" for XMS upper-memory blocks and
the program name for programs and their environments loaded with LOADHI. Just
as with DOS's memory chain, the first byte of each memory-control block except
for the last one is 4Dh ('M'). The first byte of the last one is 5Ah ('Z').
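The chain walk can be sketched over a simulated flat buffer (illustrative C, not real-mode code; walk_mcb_chain is my name, while the 16-byte paragraph size and the size field at offset 3 in each control block are standard DOS layout):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Walk a DOS-style memory-control-block chain laid out in a flat
   buffer, 16 bytes per paragraph. Each MCB: byte 0 is 'M' (more blocks
   follow) or 'Z' (last block), and the WORD at bytes 3-4 is the
   block's size in paragraphs; the next MCB starts size+1 paragraphs
   later. Returns the number of blocks, or -1 if the chain is
   malformed. */
int walk_mcb_chain(const uint8_t *mem, int paragraphs)
{
    int seg = 0, count = 0;
    for (;;) {
        if (seg >= paragraphs)
            return -1;                     /* ran off the end */
        const uint8_t *mcb = mem + 16 * seg;
        if (mcb[0] != 'M' && mcb[0] != 'Z')
            return -1;                     /* bad signature byte */
        count++;
        if (mcb[0] == 'Z')
            return count;                  /* last block in the chain */
        int size = mcb[3] | (mcb[4] << 8);
        seg += size + 1;                   /* skip block plus its MCB */
    }
}
```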
Another function provided for Windows compatibility (and supported by EMM386,
CEMM, and probably other memory managers) is an IOCTL call on the character
device EMMXXXX0, the actual EMS driver. EMM386 and CEMM support multiple
subfunctions, but QEMM only supports the one needed to coexist with Windows:
subfunction 01h, "Get EMM Import Structure Address." To use this function,
Windows opens the device "EMMXXXX0" to get a file handle, then calls INT 21h
with AX=4402h, BX=file handle, CX=0006h, and DS:DX pointing at a 6-byte buffer
whose first byte has been set to 01h. On return, CF will be clear if the call
was successful and the buffer will have been filled as in Table 2. This will
be covered in detail in a future DDJ article by Taku Okazaki.


Bugs


Various versions of QEMM contain errors in range checks on function numbers.
These cause attempted calls to some unimplemented functions to jump to random
locations, generally causing a system crash. Versions 5.11 and 6.00, for
instance, will accept INT 4Bh Virtual DMA Specification (VDS) calls with
AX=810Dh, even though the highest supported subfunction is 0Ch.


Some Useful Undocumented Functions


Not surprisingly, the officially documented functions are those that are most
critical for proper coexistence with QEMM's advanced features, such as
Stealth. Even so, a number of other functions also come in handy.
Function 18h (already mentioned in the context of the DISKBUF switch), for
example, can tell a program whether it is safe to use DMA directly to a
particular buffer. If this function indicates that the specified region of the
program's address space is entirely in conventional memory, then the physical
addresses needed for DMA are the same as the logical linear addresses the
program sees, and the DMA controller can be used without going through VDS to
allocate a buffer and copy data to and from it.
The memory allocated to an EMS handle may be made visible in the program's
address space using functions 0Bh and 0Fh. A program might thus make 128K of
EMS visible at a time, with the limitation that no single EMS handle can be
allocated more memory than the size of the address range into which the memory
is mapped. This is possibly how Desqview virtualizes CGA graphics: by
allocating some EMS and mapping it into the video-memory space. 
Either the aforementioned two functions, function 0Ah, or function 1F01h (both
of which change the mapping for a single 4K page) could map in the bulk of the
memory required by a TSR. This allows a very small stub in the 1-megabyte 8086
address space which maps in the remainder of the TSR, as needed. The main TSR
code is then physically located in extended memory, which can be made visible
anywhere--on top of video memory, for example. Take care, however, to properly
preserve the prior page mappings; this is particularly problematic when using
functions 0Bh and 0Fh, since function 0Fh will undo any mappings that might
have existed in the affected area before function 0Bh was used.


A Sample Program


To show how to use QPI, I've written QEMMINFO, an information-reporting
utility. Like Quarterdeck's own QEMM.COM and Manifest, QEMMINFO displays maps
of the memory types, which pages of memory have been accessed, and other
information.
The QEMMINFO display (see Figure 1) consists of four columns. The first
contains a map of the memory type for each 4K page in the first megabyte. This
map, like those generated by QEMM.COM or Manifest, indicates which pages are
conventional memory, mappable, high RAM, video memory, excluded, and so on. 
The second column displays which pages have been accessed or modified. This
map is an extension of the one displayed by QEMM.COM or Manifest, since it
also shows the access status of the 16 pages making up the high-memory area
(HMA). The display of the first megabyte uses the QPI function 1600h provided
for that purpose, but the HMA display extracts the access bits from the
page-table entries for the HMA pages.
The third QEMMINFO column displays a map which neither QEMM.COM nor Manifest
can generate--the translations between the V86-mode addresses and
physical-memory addresses. For each 4K page in the first 1088K (one megabyte
plus HMA) of linear address space, QEMMINFO shows which megabyte of physical
memory actually appears in that page. This display is created by reading the
page number for each of the first 272 pages (1088K) in the current V86 address
space and converting the page number into a multiple of one megabyte. A value
of 0 indicates that the page shows memory from the first
megabyte--conventional memory. (Except in very unusual cases, the physical
address is the same as the logical address.)
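The arithmetic behind the third column is trivial but worth pinning down; a minimal sketch (the function name is mine, not QEMMINFO's):

```c
#include <assert.h>

/* A 4K page is 4096 bytes and a megabyte holds 256 of them, so the
 * megabyte in which a physical page lives is simply page >> 8. */
static unsigned page_to_megabyte(unsigned long page_number)
{
    return (unsigned)(page_number >> 8);
}
```

QEMMINFO performs the equivalent calculation for each of the 272 page-table entries it reads from the current V86 address space.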
Figure 1 was generated from a 512K DOS window under Desqview and clearly shows
how segments 0400h--87FFh have been mapped to a block of EMS memory in
megabytes 3 and 4, while the 96K from 8800h--9FFFh, which are not part of the
DOS window, have not been remapped. My PC has 384K of "top" memory just below
the 16M mark (as do many Compaq systems), and this memory is used to provide
UMBs and shadow RAM, appearing as "F" in the QEMMINFO display.
Additional information includes the VHDIRQ setting, which affects background
disk accesses by many advanced disk caches with delayed writes; this item will
typically report that the setting is ignored when Stealth is disabled and
respected when Stealth is active. 
QEMMINFO's memory-mapping display can be used to show that QEMM doesn't
actually enable or disable the A20 line, but merely remaps memory to simulate
the address wrapping due to A20. When A20 is open (which it will always be
when DOS=HIGH), QEMMINFO will show that the HMA is mapped to megabyte 1; when
it is closed, QEMMINFO shows that the HMA is mapped to megabyte 0.
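The wrapping QEMM simulates is easy to express in code; here is a sketch of how a real-mode segment:offset pair becomes a linear address with and without A20 (function name mine):

```c
#include <assert.h>

/* With A20 closed, the 8086-style address wrap discards bit 20 of the
 * 21-bit sum, so FFFFh:0010h lands at 000000h; with A20 open the same
 * pair reaches 100000h, the start of the HMA. */
static unsigned long linear_address(unsigned seg, unsigned off, int a20_open)
{
    unsigned long linear = ((unsigned long)seg << 4) + off;
    return a20_open ? linear : (linear & 0xFFFFFUL);
}
```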
One of QEMMINFO's options is to clear the memory-access flags, just like QEMM
RESET. The QEMMINFO RESET option also resets the access flags for the HMA,
which QEMM RESET won't do. QEMMINFO also allows you to selectively clear
either read or write flags as well as both flags for each page; QEMM RESET
always clears both flags.
Listing One, QPICALL.ASM, forms the core of QEMMINFO. This module exports the
C-callable function QPIcall, which invokes QEMM's private API in the same way
the int86 function permits C code to call software interrupts. QEMM.C builds
more than three dozen "glue" functions around QPIcall to provide access to
most of the QPI; QEMMINFO.C, in turn, builds upon the functions provided by
QEMM.C. The combination of QPICALL.ASM and QEMM.C can be used as a generic
function library for calling QEMM functions, and it is independent of the
sample program QEMMINFO. The full listings are available electronically (see
"Availability," page 3), as is the complete calling information known for the
private API functions (in QEMMINTS.LST).
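The layout of the QEMMREG structure passed to QPIcall can be read directly off the REGS STRUC at the top of Listing One; a C rendering might look like the following (the struct tag is mine, and the structure must be byte-packed, since the assembly version has no padding):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)           /* match the unpadded REGS STRUC */
struct qemmreg {
    uint32_t reg_eax;           /* offset  0 */
    uint32_t reg_ebx;           /* offset  4 */
    uint32_t reg_ecx;           /* offset  8 */
    uint32_t reg_edx;           /* offset 12 */
    uint32_t reg_ebp;           /* offset 16 */
    uint32_t reg_esi;           /* offset 20: loaded by lds esi,... */
    uint16_t reg_ds;            /* offset 24: ...which also loads DS */
    uint32_t reg_edi;           /* offset 26: loaded by les edi,... */
    uint16_t reg_es;            /* offset 30: ...which also loads ES */
    uint16_t reg_flags;         /* offset 32 */
};
#pragma pack(pop)
```

The reg_esi/reg_ds and reg_edi/reg_es pairs are adjacent on purpose: QPIcall fills each pair with a single lds or les of a 48-bit pointer.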


Wrapping Up


Though the makers of memory managers have tried hard to make the V86 mode
behave just like true real mode, it is not possible (and in many cases not
practical) to operate exactly as in real mode. For example, VDS was created to
deal with the problem that logical and physical addresses are no longer the
same when running under a memory manager. Although the memory manager could
virtualize the DMA controller, that would not help bus-mastering cards such as
many SCSI host adapters; a set of services which allow aware software to
interact with the memory manager is far superior because it can be applied to
any hardware, not just that to which the memory manager's programmers have
ready access. QEMM's private calls similarly allow QEMM-aware programs to
accomplish things that would be possible in real mode but are not otherwise
possible in V86 mode under QEMM.


References


Brown, Ralf, ed. INTER40x.ZIP, "MS-DOS Interrupt List," Release 40, April 3,
1994. 

Brown, Ralf and Jim Kyle. PC Interrupts, 2nd ed. Reading, MA: Addison-Wesley,
1994. 
Quarterdeck Office Systems, Technical Note QDMEM.DOC, "Quarterdeck Memory
Driver Interface," and QPI.DOC, "QEMM-386 Programming Interface," June 15,
1993. Available on the Quarterdeck BBS (310-314-3227) in QPI.ZIP.
Figure 1: Sample QEMMINFO display, showing the status of each 4K page in the
first 1088K of linear memory: In the first column, M=mapped ROM, period
(.)=mappable RAM, H=high RAM, X=excluded memory, V=video, R=ROM,
A=adapter, \=split ROM (2K ROM/2K RAM), f=page frame, r=RAMable,
C=conventional.
 Memory Types Memory Accesses Memory Mappings QEMM v7.03
 ------------------- ------------------- ------------------- state: ON
 01234567 89ABCDEF 01234567 89ABCDEF 01234567 89ABCDEF HiRAM from: B100
0 XXXX.... ........ 0 WWWWWWWW WWWWWWWW 0 00003333 33333333
1 ........ ........ 1 WWWWWWWW WWWWWWWW 1 44444444 44444444 QEMM uses:
2 ........ ........ 2 WWW.W... ........ 2 44444444 44444444 768 low
3 ........ ........ 3 ........ ........ 3 44444444 44444444 75626 code
4 ........ ........ 4 ........ ........ 4 44444444 44444444 39916 data
5 ........ ........ 5 ........ ........ 5 44444444 44444444 18568 TASKS=
6 ........ ........ 6 ........ .....WWW 6 44444444 44444444 20480 MAPS=
7 ........ ........ 7 WWWWWWWW WWWWWWWW 7 44444444 44444444 196608 HiRAM
8 ........ ........ 8 WWWWWWW. WWWWWWWW 8 44444444 00000000 32768 DMA buf
9 ........ ........ 9 WWWWWWWW WWWWWWWW 9 00000000 00000000 16384 ROMs
A VVVVVVVV VVVVVVVV A WW...... ........ A 00000000 00000000 Unavailable:
B VHHHHHHH VVVVVVVV B .WW.WWWW WWWWWWWW B 0FFFFFFF 00000000 0 conv
C ffffffff ffffffff C R.R..... ........ C 00000000 00000000 0 ext
D HHHHHHHH HHHHHHHH D WWWWWWWW WWWWWWWW D FFFFFFFF FFFFFFFF 0 EMS
E HHHHHHHH HHHHHHHH E WWWWWWWW W.WWWWWW E FFFFFFFF FFFFFFFF 0 top/shdw
F HHHHHHHH RMRRRHMM F WWWWWWWW R.RRRWRR F FFFFFFFF 0F000FFF Stealth:M
 H R.RRRRRR R....... H 11111111 11111111 (2 ROMs)
VCPI: 876 of 1951 pages available Mapping context: 0141
VHDIRQ setting respected (enabled)
Global EMM Import Structure v1.00 is at physical address 00480758
Table 1: QEMM-386 programming interface functions: (a) General functions; (b)
QEMM v5.0+; (c) QEMM v5.1+; (d) QEMM v6.00+; (e) QEMM v6.03+; (f) QEMM v6.04+;
(g) QEMM v7.00+.
Function Description 
(a)
00h Get QEMM state (documented)
01h Set QEMM state (documented)
02h Get segment of unknown data structure
03h Get QEMM version (documented)
04h Activate QEMM when in AUTO mode
05h Deactivate QEMM when in AUTO mode
06h Make new mapping context
07h Get mapping context
08h Set mapping context
09h Get linear page number for page table entry
0Ah Set linear page number for page table entry
0Bh Map 4K pages into memory
0Ch Get available memory
0Dh Select CRT controller I/O ports to be trapped
0Eh Set cursor virtualization callbacks
0Fh Unmap 4K pages
10h VCPI-precursor interface
00h Get protected-mode interface
01h Get CPU debug registers
02h Set CPU debug registers
03h Get machine status word CR0
04h Allocate a 4K page
05h Free 4K page
06h Null function
07h Get maximum physical memory address
08h Get physical address of page in first megabyte
09h Switch to protected mode
0Ah Switch back to virtual-86 mode
11h Get memory type map
12h Get HIRAM chain
13h Video-related
00h May be VIDRAMEGA
01h May be check for modified video memory
02h Unknown
03h Initialize EGA graphics virtualization
04h Shutdown EGA graphics virtualization
05h Select portion of EGA graphics to virtualize?
06h Set DESQview critical section counter address
07h Unknown
08h Start/reset CRT controller I/O trapping
09h Hercules Graphics Card mode-change support
0Ah Virtualize EGA/VGA DAC registers (I/O ports 03C8h/03C9h)
0Bh Unknown
0Ch Set interrupts to mask during certain Function 13h subfunctions
0Dh Map EGA memory at A0000h
0Eh Unknown
0Fh Reset unknown data
10h Copy modified pages to physical video RAM?
11h Set unknown flag
12h Apparently null function
14h Desqview "protection level" support
00h Initialize
01h Shutdown
02h Set protection level?
03h Add item to unknown list
04h NOP
05h Remove item from unknown list
06h Unknown
07h Unknown
08h Unprotect?
09h Abort program causing protection violation?
0Ah Unknown
0Bh Unknown
15h Set timer channel 0 virtualization
Function Description 
(b)
16h Get/Set memory access status
00h get
01h set
17h Get memory usage statistics
(c)
18h Check whether conventional memory mapped in address range
19h Null function
1Ah Non-virtualized I/O port access
00h Read byte
01h Write byte
02h Write byte, read byte from following port
03h Write word
1Bh MS Windows 3.x support
00h Get EMM Import Structure address (see Table 2)
01h Disable V86 mode (shutdown EMS and initialize EMM Import record)
02h Enable V86 mode (restart EMS and free EMM Import record)
03h MS Windows initializing
04h MS Windows terminating
05h Determine whether program is a driver
06h Patch driver
07h Bug (fencepost error)
1Ch Hardware interrupt V86-mode calldowns
00h Disable IRQ0-7 calldowns
01h Set V86-mode IRQ0-7 handlers
02h Disable IRQ8-15 calldowns
03h Set V86-mode IRQ8-15 handlers
(d)
1Dh Stealth interrupts (QEMM 6.x only)
1Eh Stealth information (documented)
00h Get Stealth configuration
01h Get number of Stealth'ed ROMs
02h Get list of Stealth'ed ROMs
1Fh Page-table manipulation (documented)
00h Get page-table entry
01h Set page-table entry
20h Asynchronous disk access support (documented)
00h Get VirtualHDIRQ information
01h Set VirtualHDIRQ state
21h Stealth support (documented)
00h Copy data from Stealth'ed addresses
(e)
22h Desqview/X support
00h Get unknown data
01h Set unknown value
(f)
23h Unknown (subfunctions 00h, 01h, 02h, and FFh)
(g)
24h ST-DBL support (subfunctions 00h and 01h)
Table 2: EMM import-structure address record.
 Offset Size Description 
 00h DWORD Physical address of EMM import
 structure
 04h BYTE Major version of EMM import
 structure (01h)
 05h BYTE Minor version of EMM import
 structure (00h for Windows 3.0,
 0Bh for Windows 3.1)
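A C program receiving the record in Table 2 might declare it as follows (struct and field names are mine); note that the record is six bytes with no padding:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
struct emm_import_addr {
    uint32_t phys_addr;     /* 00h: physical address of EMM import structure */
    uint8_t  ver_major;     /* 04h: major version (01h) */
    uint8_t  ver_minor;     /* 05h: 00h for Windows 3.0, 0Bh for Windows 3.1 */
};
#pragma pack(pop)
```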

Listing One 

;************************************************************************
;* QPIcall.ASM High-level function to call QEMM-386 API *
;* (c) Copyright 1994 Ralf Brown *
;************************************************************************
;LastEdit: 2/24/94

 .386

REGS STRUC
 reg_eax dd ?
 reg_ebx dd ?
 reg_ecx dd ?
 reg_edx dd ?
 reg_ebp dd ?
 reg_esi dd ?
 reg_ds dw ?
 reg_edi dd ?
 reg_es dw ?
 reg_flags dw ?
REGS ENDS


;========================================================================
_TEXT SEGMENT BYTE PUBLIC 'CODE' USE16 ; forward declaration to ensure
_TEXT ENDS ; proper segment ordering

_DATA SEGMENT WORD PUBLIC 'DATA' USE16

QEMM_name db "QEMM386$",0
public QEMM_version
initialized db 0
QEMM_version dw 0
QPI_entrypt dd ?

_DATA ENDS

;========================================================================
_TEXT SEGMENT BYTE PUBLIC 'CODE' USE16
 ASSUME CS:_TEXT

;------------------------------------------------------------------------
; int QPIinit(void) ;
; Returns QEMM version (256*major+minor) or 0 if QEMM not loaded
; Destroys AX, BX, CX, DX, and flags

public _QPIinit
_QPIinit proc far
IFDEF __HUGE__
 push ds
 mov ax,_DATA ; in huge model, every module gets its
 mov ds,ax ; own data segment
ENDIF
 ASSUME DS:_DATA
 push es
 push di
 mov ax,QEMM_version
 cmp initialized,1
 je short init_done
;; first, try to use QEMM v5+ interface to get entry point
 lea dx,QEMM_name
 mov ax,3D00h ; try to open QEMM386$ for reading
 int 21h
 jc instchk_2 ; if open failed, not QEMM v5+
 mov bx,ax
 lea dx,QPI_entrypt
 mov cx,4
 mov ax,4402h ; IOCTL Input
 int 21h
 pushf
 mov ah,3Eh ; close the file handle
 int 21h
 popf
 jnc short got_entrypoint
;; if that fails, try the older installation check (which gets spoofed by
;; some other memory managers nowadays)
instchk_2:
 mov ah,3Fh
 mov cx,5145h ; QE
 mov dx,4D4Dh ; MM
 int 67h
 cmp ah,0
 mov ax,0 ; assume QEMM not installed
 jne short init_done ; abort initialization if wrong return
 mov word ptr QPI_entrypt,di
 mov word ptr QPI_entrypt+2,es
got_entrypoint:
 mov initialized,1 ; QPI pointer successfully initialized
 mov ah,3 ; func = get version
 call QPI_entrypt
 mov ax,0 ; was get-version call successful?
 jc short init_done ; if not, this isn't really QEMM
 mov ax,bx
 mov QEMM_version,ax ; remember QEMM version 
init_done:
 pop di
 pop es
IFDEF __HUGE__
 pop ds
 ASSUME DS:NOTHING
ENDIF
 ret
_QPIinit endp

;------------------------------------------------------------------------
; int QPIcall(QEMMREG far *inregs, QEMMREG far *outregs) ;
; Returns 1 if successful, 0 if QEMM call failed, and -1 if QEMM not loaded
; Destroys AX, BX, CX, DX, and flags
;
public _QPIcall
_QPIcall proc far
@inregs = dword ptr [bp+18]
@outregs_ofs = 14
@outregs = dword ptr [bp+@outregs_ofs]
 push es
 push di
 push ds
 push si
 push bp
 mov bp,sp
IFDEF __HUGE__
 mov ax,_DATA ; in huge model, every module gets
 mov ds,ax ; its own data segment
ENDIF
 ASSUME DS:_DATA
 cmp initialized,1 ; have we been called before?
 je short do_call ; if yes, don't re-initialize
 push cs
 call near ptr _QPIinit ; get QPI entry point
 mov ax,-1
 cmp initialized,1 ; was initialization successful?
 jne short QPIcall_done ; return AX=-1 (error) if not init'ed
do_call:
 lea ax,call_done ; build a fake call frame with the
 push cs ; address to which we want to return
 push ax ; after the QPI call
 push dword ptr QPI_entrypt ; also store QPI call address on stack
 lds si,@inregs ; load up the CPU registers from the
 mov eax,[si].reg_eax ; input registers structure
 mov ebx,[si].reg_ebx
 mov ecx,[si].reg_ecx
 mov edx,[si].reg_edx
 mov ebp,[si].reg_ebp
 les edi,pword ptr [si].reg_edi
 lds esi,pword ptr [si].reg_esi
 ret ; invoke the QPI call
call_done:
 push ebp
 push ds ; preserve the registers which get
 push esi ; clobbered in setting up addressing
 pushf ; to the output registers structure
 mov bp,sp ; restore BP to pre-call value
 add bp,12
 lds si,@outregs ; set up addressing to results buffer
 pop [si].reg_flags ; store returned register values into
 pop [si].reg_esi ; the output registers structure
 pop [si].reg_ds
 pop [si].reg_ebp
 mov [si].reg_eax,eax
 mov [si].reg_ebx,ebx
 mov [si].reg_ecx,ecx
 mov [si].reg_edx,edx
 mov [si].reg_edi,edi
 mov [si].reg_es,es
 cmp ah,84h ; 386MAX error return?
 mov ax,0
 je short QPIcall_done ; if yes, return 0 ("failed")
 test byte ptr [si].reg_flags,1 ; CF set to indicate error?
 jnz short QPIcall_done ; if yes, return 0
 inc ax ; AX <- 1 ("OK")
QPIcall_done:
 pop bp
 pop si
 pop ds
 pop di
 pop es
 ret
_QPIcall endp

_TEXT ENDS

 END





















July, 1994
PROGRAMMER'S BOOKSHELF


Examining Graphics File Formats




Jonathan Erickson


From our 1988 unraveling of TIFF to the 1993 examination of FLI, articles on
graphics file formats have consistently been among the most popular DDJ has
published. An admitted shortcoming, however, is that overall, the information
is both inconsistent in presentation and incomplete in coverage. That is to
say, Kent Quirk's August 1989 "Translating PCX Files" takes a much different
approach than, say, Tom Swan's August 1993 "Diving into Windows
Bitmaps"--although both ultimately reveal details about graphics file formats.
And while we've covered many different file formats over the years, there are
many more we haven't. I've often thought that what's needed is a book that
examines, in detail, the various file formats available to programmers. The
Encyclopedia of Graphics File Formats, Bitmapped Graphics Programming in C++,
and Programming for Graphics Files in C and C++ are recently published books
that do just this. As you might expect, there is overlap among them. Each
begins with an overview of graphics file formats--color, compression, hardware
and operating-system dependencies, bitmap vs. vector files, metafiles,
conversion issues, and the like. Nor is it a surprise that the books generally
cover the same file formats--TIFF, PCX, GIF, BMP, and so on. Likewise, all
three are written for the same audience--programmers who need to import or
export graphics files into or out of their applications. Nonetheless, each
author does take a different approach to the task.
As its title suggests, the Encyclopedia of Graphics File Formats, by James
Murray and William vanRyper, is a reference book. As such, it provides a
consistent presentation of nearly 100 graphics file formats ranging from
Adobe's Photoshop to Zenographics' ZGM. In between, you'll find the familiar
(PCX, TIFF, EPS, QuickTime) alongside the uncommon (FITS, TDDD, VICAR2, SGI
YAODL). Of particular relevance to this issue of DDJ, for instance, is a
discussion of the POV and DKB file formats (see "Ray Tracing and the POV-Ray
Toolkit," by Craig Lindley, on page 68) and the IFF format (see "Morphing 3-D
Objects in C++," by Glenn Lewis, on page 18). 
Each file format is presented in capsule form (name, type, colors, image size,
and other details), followed by a brief overview, file organization (including
source-code examples), file details (usually with more code), and pointers to
additional information (it's gratifying that DDJ is often cited). More than a
dozen pages, for example, are dedicated to the FLI (Flic) file format Jim Kent
wrote about in the March 1993 issue of DDJ. Although the Encyclopedia doesn't
publish a working program in hardcopy, as did Jim's DDJ article, it does
provide much more detailed information. 
Included with the Encyclopedia of Graphics File Formats is a CD-ROM containing
code examples to read, write, and display files in most (but not all) of the
formats covered in the book. Additionally, the CD-ROM details format
specifications, sample images, and utilities for manipulating and converting
graphic files. Note that the authors did not write all of the tools and
utilities on the CD themselves, opting instead to provide publicly available
programs for image conversion, manipulation, and the like. For browsing the
CD-ROM, they provide Mosaic, and in most cases, the CD includes original
format specifications (when the vendors allowed publication).
Marv Luse's Bitmapped Graphics Programming in C++ and John Levine's
Programming for Graphics Files in C and C++ take a less structured approach.
Both provide page after page of published source code, and both come with DOS
diskettes containing the published code and more. It's interesting, however,
that Luse based his book on code he wrote (not surprising, as he is founder of
Autumn Hill Software, which specializes in DOS/Windows graphics development
tools), while Levine for the most part wrote his book around publicly
available source code from Jef Poskanzer, Sam Leffler, Tom Lane, and others. 
Although acknowledging its importance, Luse doesn't waste much time with
theory. In his "theoretical" overview of computer graphics, Luse jumps right
into the complete source code for a C++ class that implements and converts
color models. He quickly follows this with classes for palette modification,
color mapping, dithering, and compression. In particular, it was interesting
to learn that CompuServe's GIF format is based on Unisys's patented LZW
compression. Although all three books pointed this out, Luse alone (in his
introduction) cautioned against blindly developing GIF-based commercial
software.
Although the subject isn't formally addressed until page 250 in the book, file
formats are at the heart of Bitmapped Graphics Programming in C++. For the
most part, Luse zeros in on those most commonly associated with PC
(DOS/Windows) software: BMP, EPS, GIF, MSP, PCX, TGA, TIFF, and WPG. However,
he also wanders into the non-DOS arenas of XBM (X Window X-11 bitmaps), RAS
(Sun Raster Format), PNTG (Apple's MacPaint), and IMG (Digital Research's GEM
format). 
Luse categorizes bit-mapped files into simple and complex formats. In the
simple-format section, each file format is approached in a consistent,
code-dependent manner. In each chapter, Luse briefly introduces the file
format, discusses its structure and header, then launches into C++ code that
presents class and structure definitions for the format. This is followed by a
query program that prints a formatted listing of the format's data structures.
Luse then presents a file-viewing program, another program that outputs a file
from a VGA display or memory bitmap, and finally, a demo program that writes a
file. Understandably, there are exceptions to this pattern. For instance, when
discussing the GEM IMG format, Luse throws in the source code for an
IMG-to-PCX conversion program (which, curiously, includes comments relating to
the nearby IMG viewer file instead of the IMG2PCX program). Other variations
on the pattern include a program that colorizes monochrome Sun raster files,
as well as both Windows and DOS versions of a BMP file viewer.
Luse changes pace when approaching the complex formats, spending more time on
explanation. In addition to the expected overview, he details the base format
structures, any extensions, and image and data types. The author then jumps
back to the familiar pattern of programs for file querying and viewing,
outputting files from a VGA display or memory bitmap, and demos to write
files. The included disk contains all the source-code listings from the book,
plus additional files for building libraries, testing programs, and more. 
Like Luse, John Levine also takes his time before launching into the heart of
graphics file formats, devoting the first 100 pages or so of the book to
background information, and to a greater extent, his "framework" for
discussing bitmap files. In particular, Levine writes his book around the PBM
utilities (created primarily by Jef Poskanzer) which enable you to read,
write, and manipulate a number of bitmap file formats. In Levine's words, "PBM
defines three extremely simple image formats called PBM, PGM, and PPM, as a
'lingua franca' into which all the other formats can be translated." (The
source-code diskette contains, among other files, PBMPLUS, a toolkit for
converting various image formats to and from portable formats, and therefore
to and from each other. PBMPLUS consists of: PBM, for bitmaps; PGM, for
grayscale images; PPM, for full-color images; and PNM, which performs
content-independent manipulations on the three internal formats.)
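Those internal formats really are "extremely simple": a plain-text PPM file, for example, is nothing more than a magic number, the image dimensions, a maximum sample value, and the samples themselves. A sketch of emitting the header (function name mine):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emit a plain-text (P3) PPM header: magic number, width and height,
 * then the maximum sample value.  RGB triples follow in the body. */
static int ppm_header(char *buf, size_t buflen, int width, int height, int maxval)
{
    return snprintf(buf, buflen, "P3\n%d %d\n%d\n", width, height, maxval);
}
```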
After spending 70 or so pages explaining the PBM utilities, Levine dives into
graphics file formats themselves. Instead of simply lumping file formats into
simple and complex, he divides them into run-length compressed (MacPaint, PCX,
IMG, and IFF), uncompressed (BMP and TGA), dictionary compressed (GIF, TIFF,
JPEG), vector (HP-GL, Windows Metafiles), and printer (HP-PCL, PostScript).
Levine presents the requisite overview before discussing how you read and
write a particular file format. When it comes to PCX, for example, he
publishes a short utility that converts a PCX file to a PPM file by reading
the PCX header, uncompressing the image data, reading the color map, and
unscrambling the data before applying the color map. Levine then reverses the
process by writing a PCX file and briefly discusses possible extensions. This
pattern is generally applied throughout the book.
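The decompression step in that PCX-to-PPM utility is classic run-length encoding: any byte with its top two bits set is a repeat count for the byte that follows, and everything else is a literal. A minimal sketch of the decoder (function name mine):

```c
#include <assert.h>
#include <stddef.h>

/* Decode PCX run-length data: bytes >= 0C0h carry a repeat count in
 * their low six bits for the byte that follows; other bytes are
 * literals.  Returns the number of output bytes produced. */
static size_t pcx_rle_decode(const unsigned char *src, size_t srclen,
                             unsigned char *dst, size_t dstlen)
{
    size_t in = 0, out = 0;
    while (in < srclen && out < dstlen) {
        unsigned char b = src[in++];
        if ((b & 0xC0) == 0xC0) {           /* run marker */
            size_t count = b & 0x3F;
            if (in >= srclen)
                break;                      /* truncated run */
            unsigned char value = src[in++];
            while (count-- > 0 && out < dstlen)
                dst[out++] = value;
        } else {
            dst[out++] = b;                 /* literal byte */
        }
    }
    return out;
}
```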
One unique feature of Programming for Graphics Files in C and C++ is its
thorough discussion of JPEG, which Luse doesn't cover and Murray/vanRyper only
touch upon. The disk includes version 4 of the complete JPEG read/write
libraries; routines for reading/writing GIF, TGA, PBMPLUS, and JIFF; and
driver code. The disk also contains the Leffler TIFF Library 3.0, which
implements the complete TIFF 6.0 specification.
At about $40.00 and up, these aren't inexpensive books, and unless you're a
full-time graphics programmer, you probably won't need all three. Considering
my general needs--quick reference, as opposed to in-depth analysis--the
Encyclopedia of Graphics File Formats is most useful. For more depth, I find
myself picking up Marv Luse's Bitmapped Graphics Programming in C++, which is
clearly written and straightforward in its presentation. 
That said, it is interesting that, when putting together this issue of DDJ and
needing background information on the IFF file format discussed by Glenn
Lewis, I ended up referring to Levine's Programming for Graphics Files in C
and C++. As you'd expect of a reference book, Murray and vanRyper give IFF
cursory coverage, while Luse, none at all. Levine, however, provides useful,
in-depth information on IFF bitmap images and the ILBM subformat.
Encyclopedia of Graphics File Formats
James D. Murray & William vanRyper
O'Reilly & Associates, 650 pp., $59.95
ISBN 1-56592-058-9
Bitmapped Graphics Programming in C++
Marv Luse
Addison-Wesley, 705 pp., $37.95
ISBN 0-201-63209-8
Programming for Graphics Files in C and C++
John Levine
John Wiley & Sons, 494 pp., $49.95
ISBN 0-471-59856-9























July, 1994
SWAINE'S FLAMES


Fussy Logic


An author who sued over a negative review of one of his books demands to know
why publishers are now shunning him.
A leading business publication asks in a headline, "If mainframe computers are
dinosaurs, why is IBM creating a new generation?"
An astronomer threatens to sue a computer company whose engineers have
code-named a machine after him, and then is puzzled when they rename it BHA,
for "butt-head astronomer."
Are these dumb questions, or what? The answers, which should be obvious even
to a BHA, are:
1. Because they hate you.
2. Because they're stupid.
3. Because you are one.
If award-winning astronomers, editors of leading business publications, and
formerly publishable authors can ask such dumb questions, is it possible that
the overall quality of questions in circulation today is declining? Is there a
query crisis?
These were the thoughts running through my mind that morning in April as I
sorted press releases on the kitchen floor. Just then my cousin Corbett strode
in.
"Well, they ripped me off again," he announced, angrily tearing open a bag of
Doritos I'd been saving for lunch.
"Who ripped you off this time?" I sighed.
"Those physicists at Fermilab. They've found the top quark."
"I give up. How could the discovery of the top quark possibly have anything to
do with ripping you off, Corbett?"
"Because I predicted it," he mumbled, his mouth full of Doritos. "The minute
they discovered the bottom quark, I predicted that someone would someday find
a top quark. I had a team in Montpellier working on it." He frowned at my bins
of press releases. "You just put a Glossy Brochure in the White Office Paper
bin."
"Oops. Thanks. Listen, it's bad to use food to ease your depression. Put away
the Doritos and I'll tell you something that'll take your mind off your loss."
I filled him in on my theory about the decline in query quality. As I
expected, he claimed that he'd already been thinking about it.
"It's absolutely true," he said. "We're doing some research in that area at my
R&D company, Smoke&Mirrors. It's a very serious issue in software development,
with the current movement toward client-server models and distributed
computing and interapplication communication and the information highway and
so on."
"How's that?"
"Well, all these technologies involve queries, whether they're called that or
not. And our research shows that most people today don't know how to form
simple queries. SQL is beyond the reach of most people. They get AND and OR
mixed up."
"They don't know how to ask questions, eh? So what are you doing about the
situation? Educating people?"
"No, no. We're exploiting their ignorance. We're dumbing down software. For
example, we're developing a new kind of language to be embedded in electronic
agents that search for information on the Internet and in databases."
"How is that different from what General Magic is doing with TeleScript?"
"TeleScript creates smart agents. Our language creates dumb ones."
I snorted. "That's the stupidest answer I've ever heard."
"Well it's not my fault," he said, tossing the empty Doritos bag in my
Cellophane bin. "It was a dumb question."
Michael Swaine
editor-at-large





























July, 1994
OF INTEREST
The Gamelon file-access library from Menai is designed to enable programmers
to store program data as objects. The tool lets you create multidimensional
file structures, enabling object-based and cross-platform file access. It does
so by providing object nesting; "logical navigation," in which you navigate by
object rather than by manipulating file pointers and offsets; and "auto-object
tracking," which automatically tracks the location of data regardless of
changes in the structure of a file.
Data objects created with the Gamelon library hold a single data value.
Aggregate objects then associate a number of data objects. These aggregate
objects can then be acted on as a single unit. Aggregate objects can also be
placed within other aggregate objects, allowing you to create and connect
nested objects.
In addition to the library and API, Gamelon includes a browser, a compiler
module that creates a Gamelon file from a text specification, a decompiler
module that creates a text spec from a Gamelon file, and a journal-recovery
application that rebuilds a Gamelon file from its journal. 
The first release of Gamelon is for OS/2 ($495.00) and Windows ($395.00).
Releases for UNIX, Macintosh, and NT will follow. Reader service no. 20.
Menai Corporation
1010 El Camino Real, Suite 370
Menlo Park, CA 94025-4335
415-853-6450
The WinRT Toolkit from BlueWater Systems is a software-development kit for
building Win32 real-time applications for Windows NT. The SDK allows you to
write programs to directly access port I/O, memory I/O, and interrupts without
having to deal with Microsoft's Windows NT Device Driver Kit (the NT DDK).
With the WinRT kit, you only need to deal with three API calls to write device
drivers, instead of coping with the 250+ calls required by the DDK. 
WinRT includes templates for developing real-time threads and connecting them
to Visual Basic, Visual C++, and other programming languages. It also includes
the WinRT Device Driver with royalty-free run time, the WinRT preprocessor,
DOS simulator, the NT Registry editor, and samples. It sells for $595.00 and
includes six months of technical support. Reader service no. 21.
BlueWater Systems
144 Railroad Ave., Suite 207
Edmonds, WA 98020
206-771-3610
DataFocus has introduced NuTCRACKER, a family of tools for developers who want
to port UNIX applications to Win32/Windows NT. NuTCRACKER enables developers
to recompile UNIX source code and link it to NuTCRACKER's DLLs, resulting in
native Win32 applications. NuTCRACKER, which includes a collection of UNIX
calls based on SVR4 and POSIX, supports both C and C++. After the application
has been ported, end users can run their UNIX applications in the Windows
environment without having to learn complex UNIX commands.
The NuTCRACKER family supports Solaris, HP-UX, AIX, SVR4, Ultrix, XPG4/POSIX,
and other source platforms. It includes the NuTCRACKER SDK, which, in turn,
includes the NuTCRACKER API, based on UNIX SVR4 and POSIX, the MKS Toolkit,
and utilities such as KornShell, make, vi, awk, and over 100 more. Also
included are the NuTCRACKER X/Operating Environment, an X server that, combined
with the NuTCRACKER DLLs, runs X Window applications; and NuTCRACKER X/SDK, which
combines the SDK and X/OE products with libraries for porting X/Motif
applications. The NuTCRACKER SDK sells for $995.00, including demo programs
and online documentation. Reader service no. 22.
DataFocus Inc.
12450 Fair Lakes Circle
Fairfax, VA 22303-3831
703-631-6770
The PARTS visual development environment from Digitalk integrates three
existing Digitalk tools: Smalltalk/V, the PARTS Workbench, and Team/V.
Smalltalk/V is Digitalk's implementation of the Smalltalk programming
language, and the PARTS Workbench is a framework for integrating objects
(created with Smalltalk, C++, Cobol, or other languages) with SQL databases or
other apps. Team/V is a version-control and work-group
configuration-management system. PARTS, short for "Parts Assembly and Reuse
Tool Set," is an object-oriented technology for rapidly creating graphic
client/server applications. The PARTS environment allows programmers to move
between visual programming and lower-level component building with changes
managed by Team/V. The PARTS package sells for $5000.00. Reader service no.
23.
Digitalk Inc.
5 Hutton Center Drive, 11th Floor
Santa Ana, CA 92707
714-513-3000
Version 3 of the Network C Library for NetWare from ASCI is a collection of
almost 400 functions for writing NetWare-based programs and applications. The
Network C Library gives programs access to NetWare accounting services,
bindery management, connection services, console services, directory and file
management, file-server stats, locking services, message and IPX services, and
the like. The functions support NetWare 2.x/3.x/4.x. Version 3 includes DLLs
for writing Windows applications.
Included with the library are over 100 sample programs, various utilities,
peer-to-peer chat programs, and documentation. The Network C Library, which
supports Microsoft C/C++, QuickC, Visual C++, and Borland C/C++, sells for
$395.00. Source code is available for an additional $275.00. Reader service
no. 24.
ASCI
1150 Forest Run Drive
Batavia, OH 45103
513-753-6327
QA C, designed for software development and quality-assurance organizations,
has been released by ASTA. QA C analyzes source code and checks for over 800
different types of potential problems, including deviation from ANSI or
company-specific programming standards, use of nonportable coding practices,
and incompatibilities with C++. QA C then produces a variety of reports that
describe and rank the problems, highlighting the function and line in which
the problem was detected. In particular, QA C lets you check for ISO 9001
compliance, calculating and reporting on more than 40 different metrics. 
QA C is available for most UNIX implementations, including IBM, Sun, HP, and
DEC, as well as 386/486/Pentium-based PCs running SCO UNIX. The tool sells for
$9200.00. Reader service no. 25.
ASTA Inc.
1 Chestnut Street
Nashua, NH 03060
603-889-2230
FrontRunner from Phar Lap Software is a Windows desktop that adds Windows
features to your DOS shell and an enhanced DOS box to Windows. FrontRunner
includes an intuitive DOS work environment integrated into a Windows shell so
that you can run Windows programs directly from the DOS prompt, scroll and
view your entire DOS screen history, and copy/paste/print any part of your DOS
session. 
The environment also provides an alternative to Program Manager that lets you
run programs from a customizable Launch bar or convenient Run menu. Additional
features are a powerful, programmable real-time Status Bar, new GUI Visual
Batch Language Extensions for DOS, and other utilities. The Visual Batch
Language Extensions for DOS let users create visual front ends for DOS batch
files. FrontRunner sells for $139.00. Reader service no. 26.
Phar Lap Software
60 Aberdeen Ave.
Cambridge, MA 02138
617-661-1510
NetManage, a vendor of TCP/IP for Windows tools, has announced the ONC RPC SDK
for creating RPC client/server applications on Windows NT. The SDK is based on
the ONC (Open Network Computing) RPC/XDR (Remote Procedure Call/External Data
Representation) industry standard. 
The NetManage ONC RPC SDK provides an RPCGEN protocol compiler, sample code,
and client and server support for NT. Applications developed with the RPC SDK
are compatible with the RPC DLL in NetManage's Chameleon32NFS, an NFS
client/server package for Windows NT.
The NetManage RPC DLL supports the Winsock interface so it can run on the
Microsoft TCP/IP stack that comes as a standard component of NT. Sample
applications are included in the package.
Chameleon32NFS is a suite of TCP/IP applications including Telnet terminal
emulation (VT100, VT220, TN3270), FTP, NewsReader, TFTP, Ping, Bind, Finger,
and WhoIs. NFS client/server functionality allows for file sharing and
transferring of data between Windows desktops and other network devices.
The ONC RPC/XDR development kit is priced at $500.00. Reader service no. 27.
NetManage Inc.
10725 North De Anza Blvd.
Cupertino, CA 95014
408-973-7171
Targeting its Indigo Magic user environment, Silicon Graphics has announced
the IRIS ViewKit, a C++ class library that provides an application framework
for program development. The framework is used in conjunction with C++ and the
OSF/Motif UI toolkit. The ViewKit class library, bundled with SGI's C++
compiler, sells for $1195.00. Reader service no. 28.
Silicon Graphics
2011 N. Shoreline Blvd.
Mountain View, CA 94043
415-390-1980
A hypertext development kit called HDK has been released by DEK Software. HDK
is designed to work in conjunction with Word for Windows to create hypertext
documents for Windows applications. It enhances the WinHelp display engine by
making it easier for users to locate information in very large documents. This
is accomplished by graphically representing chapters and subchapters, granting
users point-and-click access to information. Additionally, you can add pop-up
glossaries, animations, audio and graphical buttons, and the like. The
royalty-free tool sells for $345.00. Reader service no. 29.
DEK Software International
1843 The Woods II
Cherry Hill, NJ 08003
609-424-6565
ThesDB, a thesaurus database with over 100,000 synonyms and antonyms for over
20,000 words, has been released by Wintertree Software. Developers can
incorporate the database and run-time kernel into applications royalty free.
ThesDB is available in source or binary and with an SDK.
The Source package, which sells for $585.00, includes ANSI C source code, the
ThesDB software, the SDK in source form, source code for a sample app, and
documentation. The Binary package, selling for $299.00, is designed to be
incorporated into MS-DOS or Windows apps. It includes the ThesDB in binary
(compressed) form, a DOS object library and Windows DLL (interfaces for C/C++
and Visual Basic are provided), a reference for the ThesDB API in Windows Help
format, and sample apps in C, C++, and Visual Basic. (Since the API uses only
simple data types, it is suitable for macro and scripting languages that
support DLL access under Windows.) Finally, the Thesaurus Construction Kit
SDK, which sells for $299.00, supplements the Binary package, enabling you to
modify, enhance, or replace the thesaurus database. The SDK includes the
database in source form and a DOS utility for compiling the thesaurus
database.
Wintertree also offers a 100,000-word spell checker in similar packages.
Reader service no. 30.
Wintertree Software Inc. 
43 Rueter St.
Nepean, ON
Canada K2J 3Z9
613-825-6271
Virtuoso Nano, a microkernel for DSP systems, has been released by Intelligent
Systems International. The core nanokernel of the single-processor (SP)
implementation of Virtuoso Nano uses less than 200 instructions and provides
for sub-microsecond context switching, true multitasking, time events, and
interprocess communication via semaphore-based, stack-based,
linked-list-based, or FIFO-list-based channels. The multiprocessor extensions
(MP) add about 300 more instructions, while the virtual single-processor (VSP)
implementation provides what amounts to parallel processing in the system. The
VSP does this by implementing the communication as part of the kernel
services. The VSP uses less than 1000 instructions. 
Virtuoso Nano is available for the Texas Instruments 32040 and Motorola 96002
processors. Ports to Analog Devices' 21020 and 21060 are underway.
Development-kit prices start at $2000.00. Reader service no. 31.
Intelligent Systems International
Lindestraat 9
B-3210 Linden
Belgium
+32-16-62-15-85
Bronson, a binary file editor for UNIX, has been released by Cactus
International. The tool allows you to view and display any binary file or
hard-disk partition, then edit it in either ASCII or hexadecimal mode. Changes
are highlighted for easy reference. Additionally, there is a browse-only mode,
and the program can be run either from the command line or via menus. Bronson,
available for SCO, SVR4, and AIX 3.2, sells for $289.00. Reader service no.
32.
Cactus International Software
13987 W. Annapolis Ct.
Mt. Airy, MD 21771
301-829-1622
The CD-R Personal Archiver from Marcan is a recorder that lets you record and
store information on CD-ROM. According to Marcan, the
CD-ROM recorder (CD-R) is the only currently available system that lets you
record and play back increments of data before finalizing the CD-ROM. As each
session is recorded, it is labeled. When all sessions are ready to be burned
into the CD-ROM, the Personal Archiver creates a table of contents. This
process addresses the current lack of ISO multisession standards for CD-ROMs.
The recorded data can be read by any standard CD-ROM drive.
The Personal Archiver supports track-at-once (that is, regular CD-ROMs),
incremental, and multisession recording and conforms to Hi-Sierra and ISO 9660
format standards. A half-height internal 5.25-inch CD-R sells for $4100.00 and
includes recording software and SCSI adapter. External versions are also
available. Versions of Personal Archiver are available for Windows, DOS,
Macintosh, and UNIX. Reader service no. 33.
Marcan Inc.
1020 108th Avenue NE, Suite 209
Bellevue, WA 98004
206-635-7477


August, 1994
EDITORIAL


The Tax Man at Your (Information) Service


In a Jekyll-and-Hyde approach to small-business development, the state of
California is on one hand actively courting high-tech startups; on the other,
it's doing what it can to run them out of the state, if not into the ground.
And if you don't think so, ask Ruth Koolis. 
Koolis runs Information Sources, a small, Berkeley-based,
information-retrieval business that collects descriptive and evaluative
information about software tools and applications, then distributes this
database to Dialog, an online information provider. Abstracts of the data are
accessible through Knowledge Index, CompuServe, Easylink, and other services
via gateways to the Dialog system. In return, Information Sources receives
royalties commensurate with how many people access the database.
Clearly, how much Koolis receives in royalties depends on the quality of the
information she delivers. The quality of Koolis's information must be high, or
else she wouldn't have been able to keep the doors open for the past decade.
All of this means little to the California Board of Equalization--aka the tax
board--which sees Information Sources (IS) less as an up-and-coming roadstop
on the information highway, and more as a cash cow ready for milking. At
least, that's what Koolis discovered when state auditors came knocking at her
office door, calculators in hand. By the time the bean counters left, Koolis
was facing the prospect of coughing up sales taxes, retroactive to January
1991, on royalty income from California-based Dialog. With the logic we've
come to expect from out-of-touch bureaucracies, one auditor told me that if IS
had delivered the data to Dialog over the phone lines via modem, no sales tax
would be required. But since Koolis handed over computer tapes, sales taxes
are applicable--as if the delivery vehicle, not the data, is what has value. 
With their myriad of overlapping, redundant, and sometimes contradictory
regulations, state sales-tax laws are akin to a briar patch--and a tangled one
at that. In wandering through the thicket, you learn that the auditors have a
number of shoes available and they're looking for one that fits. For instance,
one auditor told me that there's no property transfer when data is exchanged
online--it's the physical handing over of the tape that generates the
assessment. Another auditor contradicted this, saying that the distinction is
really between services performed and the sale of tangible personal
property--services aren't taxed, property sales are. In short, the transfer
medium really doesn't have much to do with taxability. 
To explain the difference between services and property transfer, California
tax auditors use the analogy of custom versus off-the-shelf software. If
someone writes a custom spreadsheet specifically for you (and only you),
you're not obligated to pay sales taxes. If you buy an off-the-shelf
spreadsheet package that's also available to other users, sales tax is in
order. In other words, said the auditor, if Information Sources had given the
data to only one provider, IS would have been providing a service. However, in
the tax board's view, since IS data is available over multiple providers via
electronic gateways, a sale occurred and IS has to pay up--even though IS
turned over one, and only one, customized version of the database to a single
client, Dialog.
But wait. Information Sources really didn't "sell" anything in the sense that
one company (Dialog) paid another (IS) when the data was actually transferred.
No problem, says the state, there's a regulation for that, too. The tax board
lumps royalty arrangements such as that between IS and Dialog under a
regulation covering the "leasing" of tangible property. 
One of the more curious aspects of the Information Sources case is that this
isn't the first time the state has tried to put the squeeze on data providers.
Back in the early '80s, California tried to pump up tax revenues by placing
"data" and "programs" in the same category. The logic then was that since both
data and programs could be delivered on the same medium (tape), data and
programs must ultimately be one and the same. Information providers such as
Dialog balked, devising instead a means of shipping data tapes in and out of
state, thereby circumventing the confusing tax situation and eliminating jobs
in California. The situation was resolved in 1987, when the tax board waived
its right to levy state royalty taxes on federally copyrighted material. Now,
however, the state has done an about-face, deciding that federally copyrighted
royalty income has always been taxable. The state is also trying to revive the
old chestnut that data is the same thing as a software application.
Of course, this distinction will likely disappear when (not if) California
catches up with states which already levy a sales tax on services. If nothing
else, the service sector is a growth industry, tax-wise. Services account for
51 percent of the U.S. gross domestic product, while goods are down to 40
percent. Employment figures reflect this. Nearly one million manufacturing
jobs disappeared in the 1978--82 timeframe, and more than 60 percent of the
jobs created in 1993 were service oriented, primarily in the temporary-help,
food-service, and health-care industries. To revenue-starved states, services
are an untapped resource, and, as regular as rumblings on the San Andreas
fault, California legislators routinely reintroduce the notion of applying
sales tax to services.
With all they've said about services, taxes, and royalties, there are several
things the auditors aren't saying. They aren't, for instance, telling Koolis
what her tax liability is--just that she'd better be ready to pay up. Nor will
the auditors say why Information Sources was chosen to be audited in the first
place. Finally, the auditors sidestep questions about whether or not this
signals a push by the state to cultivate a new electronic-data tax base.
One thing is apparent, however. The state's bullyboy tactics involve picking
on the smallest businesses with the fewest resources to resist the challenge.
Although California is loaded with deep-pocket, lawyer-laden data collectors
and software companies, the auditors have opted to blindside a small company
that can't afford a protracted fight. It's probably not the money that the
state wants from Information Sources; there isn't that much at stake. What the
state apparently hopes to gain is precedent--ammunition for taking aim at the
big boys further on down the road. 
Join us in welcoming Bruce Schneier to DDJ as a contributing editor. Bruce
will be editing and writing the "Algorithm Alley" column, taking over the helm
from Tom Swan who's opted for more time sailing, less time writing. If you'd
like to work with Bruce in sharing your algorithms with your fellow
programmers, contact him here at the magazine or at schneier@chinet.com.
Jonathan Erickson, editor-in-chief


August, 1994
LETTERS


More on Memory Management


Dear DDJ,
In "Rethinking Memory Management" by Arthur Applegate (DDJ, June 1994), there
was a discussion and example of overloading operator new on a per-class basis.
Applegate's purpose was to call a fixed-size memory allocator that would
efficiently allocate memory of size sizeof(MyClass). Using a fixed-size memory
allocator where the memory pool is only for memory allocations of a particular
size is faster than using a general-purpose allocator which may be called with
any size.
What the example did not show was that an overloaded operator new in a class
that calls a fixed-size allocator should test that the size being requested is
indeed equal to sizeof(MyClass). Why? Because if someone inherits from that
class, they could be reserving a fixed size of memory that is too small for
the inherited class.
If all the inherited classes also overload operator new, you can avoid the
problem. But if any inherited class forgets to include an operator new for
that size class, then you have a problem. Without it, the operator new of the
base class will be called, and the wrong amount of memory will be allocated.
Fixing the code in the base class is easy; see Example 1(a).
Even if no one is inheriting from MyClass now, someone might inherit from it
in the future, and memory corruption would occur. It's safest to include a
check of the requested size before doing the actual memory allocation, even if
inherited classes have their own fixed size new or if you don't intend to
inherit from the class.
Since operator new can call either a fixed-size memory allocator or a general-
purpose allocator, the delete routine must reflect that change. The operator
delete function has an optional, two-argument style, as in Example 1(b). Only
one operator delete may be declared for a single class. The proper operator
delete for MyClass becomes Example 1(c).
Steve Simpson
N. Billerica, Massachusetts 


Processor Perspective


Dear DDJ,
The article "Computer Science and the Microprocessor" by Nick Tredennick
(DDJ, June 1993) was one of the best big-picture, see-the-forest-and-the-trees
articles I have had the pleasure of reading. In the text box, "CPU
Performance: Where Are We Headed?" Tredennick suggests that a
"reconfigurable-logic-unit" (RLU) built into future microprocessors might be
what's needed to maintain CPU speed increases after clock rates, parallel
pipes, and so on "hit the wall."
Why wait? The September 1991 Byte magazine carried an article describing a PC
add-on that carries reconfigurable logic chips that can be programmed (that
is, configured) by the host CPU, then run in the manner of a coprocessor. Just
as Nick describes, but with off-chip RLU. These cards are made by the Scottish
company Algotronix Ltd.
But why use reconfigurable logic just as a coprocessor? In the February 1991
issue of Electronics World + Wireless World, I proposed a computer
architecture, which I dubbed the "Impulse Matrix," based entirely on such
reconfigurable hardware. (Unfortunately, the newly appointed editor of that
magazine so butchered the article that much of the argument was lost--my
letter in the May 1991 issue of EW+WW attempts to clarify some of the
distortions.)
Briefly, the Impulse Matrix consists of a rectangular grid of cells, each
containing writable configuration bits together with some "active" logic
functions and "passive" wiring functions. Writing to the configuration bits of
a cell sets that cell to some logic function (for example, the North output is
the XOR of the East and West inputs) or, more simply, the cell is configured as a
wire (all inputs and outputs are disconnected--the simplest wire). Set the
configuration bits of many adjacent cells, and the computer takes on the
function of any conceivable digital circuit. If you have enough cells (even
1980s technology would allow millions of cells), you could have, for example,
an 80x86, a 680x0, a DSP, and a graphical engine in the four corners of the
matrix and the OS of your choice tying them together in the middle.
Some of the advantages of such an architecture are as follows:
1. The architecture resembles a normal computer with the CPU removed. There
would be a vestigial, extremely simple (1-bit?) CPU and ROM whose only
function is to boot the system on start-up/reset and provide some I/O buffers,
mass storage, power supplies, and such. But mostly, the computer would be row
upon row of simple, cheap "memory" chips and no complex, expensive CPU. The
economies of scale and scalability of required computer power are obvious.
2. Only about 0.1 percent of the silicon in today's computers does any
computing. A typical post-1960 computer is 99 percent idle memory and 1
percent CPU. But 90 percent of the CPU, the registers, microcode, and, these
days, the ever-larger caches, are also just idle memory. It is only the ALU
(or integer and FP pipes) that actually does any active data processing; hence
an efficiency of 0.1 percent. It is my guess that the matrix architecture
would be about 10 percent efficient. That is, 90 percent of the silicon would
be devoted to the configuration circuitry and 10 percent to the execution
circuit.
3. The matrix architecture is ideally suited to multitasking or parallel
computing. Rather than time slicing on a serial CPU, the matrix can be "space"
sliced into different functional blocks that operate together in real time.
There are other advantages, such as testability and fault tolerance;
unfortunately, there is not the space to cover these here.
There is also a catch. The catch is that you would be wasting your time if you
tried to program the matrix with a language (though many people will try). Low
level, high level, procedural, nonprocedural, object oriented, whatever:
Languages are just not up to the job of designing digital circuits. The
correct and tried and tested method of "programming" these circuits is via
circuit schematics.
The language-versus-schematic debate is the crux of the reconfigurable logic
debate. I must stress that this new architecture cannot be introduced simply
as new hardware. If the hardware is to be used successfully, then it must come
with programming tools, and to be at all workable, those tools must use some
sort of a schematic programming system.
For a long time I have been of the opinion that languages are not even well
suited to programming von Neumann computers. It has only been for the last 40
years, and only in the field of computer software, that any engineering design
has been done exclusively with languages as the medium. Every other engineer
uses either two-dimensional drawings or three-dimensional models.
The problem with languages is that they are essentially one-dimensional. That
is, you can only say or do one word or instruction at a time. The instructions
in today's programs are usually arranged one-dimensionally, one after another,
in an untidy column down the page--the ragged left margin supposedly giving
the program a "structure." A circuit schematic is 2-D, and it is understood
that you can read (or look at) any part of the schematic in any order. For a
language to make sense, you must start at the beginning and work steadily
through to the end (clearly, programs don't run steadily from BEGIN to
END--the main reason why languages are not suited to programming). The 2-D
"all-at-once" schematic has an inherent parallelism, making it suited to
parallel execution. One-dimensional languages are inherently sequential and,
as a result, have been a great hindrance to creating parallel applications.
Okay, but what about all those structured, modular, object-oriented,
high-level languages? Well, if you think about the adjectives used in the last
sentence, or those used to describe most modern languages, they can all be
used more aptly to describe schematics. A high-level, modular, schematic
object is a box with "686 Engine" written on it.
So far, I have compared languages to old-fashioned paper schematics. On a VDU,
a schematic can be made to move, and the impulses can be shown propagating
along the wires. The schematic becomes 3-D, making it even more powerful. The
map becomes even more like the terrain.
I could go on and on but I think the best argument is history itself.
Take PCs, for instance. Over about 20 years, the hardware (designed using
schematics) has gone from 8-bit through 16-bit and 32-bit to 64-bit CPUs. Some
of these CPUs had bugs in them when first released, but invariably they were
bug free within a year.
So where is today's 64-bit software? Where is the 32-bit software? Well, there
is some 32-bit software around but mostly we still use mid-1980s, 16-bit
software. It's a slow business, this software business.
And bugs! Even the 16-bit software that has been through the code/test/debug
cycle for the past ten years is bug ridden. This just does not happen in the
hardware industry. You would lose your job if you couldn't debug your circuit
after the first few iterations.
The software-bug situation is so bad some programmers boast that their code is
so big and complex that it will inevitably have hundreds of bugs. Bugs have
become a sort of programming badge of merit. As an engineer, I find this
disgraceful. 
Erik Zapletal
Beaconsfield, Australia


Microkernels and Operating Systems 


Dear DDJ,
Congratulations on Peter Varhol's article (DDJ, May 1994) surveying the state
of operating-system technology. Although the overview of QNX provided some
very good detail, we'd like to expand on a few of his points.
First of all, Varhol argues that message passing represents more overhead than
a "function call" to invoke an OS service and therefore can slow down a
microkernel OS. The function call he's referring to here is really the
monolithic OS kernel call, which is much more complex than a simple function
call. In order to provide reentrancy, a monolithic kernel must implement
semaphores and preemption points, ultimately adding complexity. In contrast, a
microkernel OS moves system services off into other processes, so it can
provide a very clean and fast code path for implementing the message pass.
Once message passing becomes sufficiently fast, the overhead issue is
irrelevant, since the work requested dwarfs the time it took to make the
request (see "The Increasing Irrelevance of IPC Performance for
Microkernel-Based Operating Systems," by Brian N. Bershad, Proceedings of the
Usenix Workshop on Micro-Kernels & Other Kernel Architectures, Seattle, WA,
April 1992). Having implemented greater concurrency and reentrancy, a
microkernel OS can provide better overall performance.
A fast, message-passing OS also benefits client/server computing, where the
ability to perform fast IPC between communicating processes is essential.
We've taken the microkernel concept a step further and used QNX's fast message
passing to implement the first "microkernel GUI" for handheld and embedded
windowing environments.
Varhol states that "QNX claims to perform at least as well as traditional
architectures." Actually, we make an even bolder claim: QNX outperforms most
traditionally architected operating systems. To back this up, we invite
interested readers to ftp the file /pub/papers/qnx-paper.ps.z from
quics.qnx.com (198.231.78.1), which contains further details on the
microkernel/monolithic-kernel performance issue, including benchmark results.
Incidentally, the mention of Windows NT as a "microkernel operating system"
is surprising, given that Microsoft's Dave Cutler stated emphatically at the
1992 Usenix Workshop on Micro-Kernels & Other Kernel Architectures in Seattle
that Windows NT is not a microkernel OS!
Dan Hildebrand
QNX Software Systems Ltd.
Kanata, Ontario
Example 1: More on memory management.

(a) void* MyClass::operator new(size_t size)
    {
        if (size == sizeof(MyClass))
        {
            // allocate from the fixed-size memory pool and return the pointer
        }
        else
        {
            // call global new for inherited class of any size
            return ::operator new(size);
        }
    }
(b) void operator delete(void*, size_t);
(c) void MyClass::operator delete(void *p, size_t size)
    {
        if (size == sizeof(MyClass))
        {
            // delete from the fixed-size memory pool
        }
        else
        {
            // call global delete for inherited class of any size
            ::operator delete(p, size);
        }
    }


August, 1994
Numerical C and DSP


Numeric extensions for C programmers




Kevin Leary


Kevin is systems-engineering manager at Analog Devices. He can be contacted at
kevin.leary@analog.com.


Digital-signal processors have traditionally been difficult to program in
high-level languages such as C. In part, this is due to the fact that the
constructs for performing operations such as dot products (which are the same
as FIR filters) were not necessarily appropriate for DSP architectures. For
example, most programmers code loops in C by counting up like this: for
(i=0;i<N;i++). The index i is often used as an induction variable in the loop.
DSP chips, however, often have looping hardware that counts down, so the count
register is not readily available as an induction variable. This results in
source code that can adversely affect the compiler's optimization in loops.
This is significant since DSPs are intended to perform loops well. (In fact,
performance in loops is one of the key differentiators between DSP and RISC
processors.) 
Numerical C, a new high-level language built on the Free Software Foundation's
GNU C compiler (gcc), makes it easier to program DSP applications and related
mathematical algorithms. It is easy to use, and it also makes it easier to
produce better code for the target. 
Numerical C is a superset of ANSI-standard C. It differs from Standard C in
that the additional language constructs are geared to mathematical-programming
paradigms. These constructs enable the compiler to generate more-efficient
code by giving the compiler more information about the algorithm and by
enforcing a canonical form on the input program. 
The effort to define Numerical C began with the formation of the Numerical C
Extensions Group (NCEG) in March of 1989. This group quickly became part of
the ongoing C standardization efforts of ANSI X3J11 and officially operated as
ANSI X3J11.1. NCEG has been chartered by ANSI to standardize the math library
and suggest any changes to X3J11 in the area of high-speed numerical computing
for the next draft of the C language. 
In particular, NCEG is focusing on math libraries, floating-point math,
complex numbers, variable-length arrays (VLAs), and language extensions to
handle parallel and vector architectures. (For more information, see
"Numerical Extensions to C" by Robert Jervis, DDJ, August 1992.) The most
important extensions that have been added to gcc include operators, data
types, and iterators. In this article, I'll focus on complex numbers and
iterators using Analog Devices' implementation of Numerical C for the
ADSP-21060 SHARC DSP as an example. For more details on this processor, see
the accompanying text box, entitled "The ADSP-21060 SHARC DSP." Note, however,
that as part of the GNU releases, Numerical C is supported on SPARC, 386
(32-bit), and 486 (32-bit) platforms. As for other DSPs, Texas Instruments has
publicly committed to supporting Numerical C on its floating-point DSPs.


Complex Numbers


Numerical algorithms in general, and DSP applications in particular, often
rely on complex numbers and complex arithmetic as a basis for analysis.
Numerical C supports complex numbers implicitly in order to allow better
support for DSP applications. 
In Numerical C, complex numbers are supported as an aggregated pair of values:
One value represents the real component of the complex number, and the other
value represents the imaginary part of the number--the so-called "Cartesian
representation." Two primitives are used to access the real and imaginary
values of a complex number--creal and cimag, both of which follow the same
syntax and semantics as the sizeof operator in C. creal is used to extract the
real part of the complex number, and cimag, the imaginary part.
The type qualifier complex is used to construct a complex data type. By
combining complex with int, float, or double, you can construct a complex data
type with different base types. For example, complex float x declares x to be
a complex number in which its real and imaginary parts are of type float.
Likewise, the construct complex int ca[10] declares an array of complex
integers. 
In Numerical C, all operators behave as you would expect, regardless of
whether the operands are complex or real. The Numerical C compiler is
responsible for generating the best possible code for implementing the complex
operation on the target architecture.
Complex constants are constructed with the same syntax found in mathematical
notation. The i suffix notation is used to build complex lexical constants;
for example, 5+6i, 1i, and 3-.01i. The complex data type has been integrated
directly into the compiler's type system, and complex numbers coexist
cleanly with the normal, basic types in C. If you add a float to a complex
number, the float first gets promoted to a complex float, then added. For
instance, adding 5 to 6+4i would result in promoting 5 to 5+0i, and then
adding it to 6+4i, resulting in 11+4i. Example 1 illustrates this by computing
a "twiddle factor" for a discrete Fourier transform.


Iterators


In Standard C, arrays are declared and operated on explicitly--exact array
sizes must be known at declaration time, and every element of an array that is
to be operated on must be explicitly accessed. In Numerical C, on the other
hand, arrays are declared explicitly but operated on implicitly. The tool that
enables this implicit operation on arrays is called an "iterator."
Iterators are represented syntactically as declarations. To declare an
iterator, you use the new iter keyword. Iterators need to be initialized when
they are declared. The initialization value of the iterator is the iterator's
upper bound. When an iterator is used in an expression, it causes the
expression's evaluation to be based on all integer values in [0,N) for the
iterator. That is, the iterator itself takes on all the values in the set
[0,N). Note that [0,N) means {0, 1, 2, ..., N-1}. 
Example 2(a), for instance, loads the first ten elements of the vector A with
a ramp, resulting in A[0]=0, A[1]=1, ..., A[8]=8, A[9]=9. As you can see in this
example, all elements of the array A were explicitly declared and (with the
help of the iterator I) implicitly operated upon. This example demonstrates
two important benefits of Numerical C: 
Coding of an algorithm is much more succinct since the algorithm can be
abstracted above the element-wise level of analysis.
The compiler has more information about the actual function of the algorithm.
Consequently, the compiler can more reliably generate the best code for the
algorithm. In this particular case, the compiler can utilize the do loop
capability and automated address generation of the ADSP-21060.
For the sake of comparison, Example 2(b) is Standard C code that is
semantically equivalent to Example 2(a). 
When an expression contains an iterator, the expression becomes a vector
expression. Vector expressions can include piece-wise operations and reduction
operations.
Piecewise operations are the simplest operations. They typically involve use
of the Standard C operators to perform standard arithmetic operations on
arrays of data. Example 3(a), for instance, results in {Z[0]=X[0]*Y[0],
Z[1]=X[1]*Y[1], ..., Z[9]=X[9]*Y[9]}. All Standard C operators work in this
piece-wise manner on vector expressions. 
Reductions are used to reduce a vector into a scalar quantity. The most common
operation is the sum reduction, which works the same as Σ (sigma) does in
standard mathematical notation. sum adds up all the elements of the expression,
resulting in a scalar value that is the sum of the vector expression. To take
the dot product of X and Y, you reduce the vector expression X[I]*Y[I] to
scalar by summation, as in Example 3(b).


Order of Iteration and Multidimensions


Up to this point, the iteration space has been one-dimensional. Still,
Numerical C is flexible enough to support a three-dimensional iteration space.
Given the matrices C, A, and B, where A, B, and C are m x n, p x n, and m x p
matrices respectively, the algorithm in Example 4(a) would be coded with
iterators in Numerical C as Example 4(b).
Expressions that involve more-complicated iterators and require explicit
control of the order of iteration use the for(<iter>) construct. Example 5(a)
is the grammar extension for this. For instance, Example 5(b) is semantically
identical to Example 5(c), while Example 5(d) is semantically identical to
Example 5(e).


Other Extensions


Numerical C extensions included in the Analog Devices implementation provide
support for dual data-storage architectures. Most DSP chips are architected
with a Harvard or modified Harvard architecture. These architectures support
two or three memory spaces for the simultaneous fetching of up to two data
operands and an instruction. Numerical C allows you to specify in which of two
memory spaces data is to be located. In the case of the ADSP-21060, these
spaces are known as pm and dm; as such, the ADI implementation of Numerical C
supports pm and dm type qualifiers, allowing you full control over which
memory data resides in. 
Since Numerical C is based on the gcc core, it has inherited many other
extensions to Standard C. These include support for variable-length arrays and
run-time initializers for aggregates.



A Neural-Net Example


One Numerical C application we've written is a backprop neural-network system
based on Neural Networks: Algorithms, Applications, and Programming
Techniques, by James Freeman and David Skapura. If you're unfamiliar with
neural-net concepts, refer to Freeman and Skapura's book, as well as any
number of DDJ articles, among them "Neural Networks and Character
Recognition," by Ken Karnofsky (June 1993), "A Neural-Network Audio
Synthesizer," by Thorson et al. (February 1993), and "Bidirectional
Associative Memory Systems in C++," by Adam Blum (April 1990).
The network in Listing One is a feed-forward network, where all signals move
forward from the input to the outputs to simulate a back-propagation network.
In simple terms, you'll find in Listing One two arrays of coefficients
modified by a training algorithm based on minimizing the LMS error. Training
is the crux of the entire algorithm (and where the network gets the name,
"back propagation") because we compute the error at the output layer and then
adjust the weights on the output layer. Then we take some weighted multiple of
the error and propagate the error back to the hidden layer. Since the code is
heavily commented, I won't go into detail about how the network works. Suffice
it to say that this neural net, implemented for a DSP chip, is an example of
how you can best use Numerical C.


Performance


The most important reason for using Numerical C is performance. To examine
this issue, I'll briefly present an assembly-language code fragment generated
by the ADSP-21060 implementation of Numerical C for a dot product. 
Example 6(a) is the Numerical C source code used to generate the ADSP-21060
assembly-language code in Example 6(b). In this example, the most important
features are the use of the chip's do loop instruction, the single-cycle inner
loop, and the dual-memory fetch capability. Note that all ADSP-21060
instructions execute in a single cycle at 25 ns. Also note the algebraic
syntax of the ADSP-21060 assembly language.


Conclusions


Numerical C makes writing application code simpler, and the code produced
using the Numerical C compiler is more efficient than the code produced by
standard C compilers.
Numerical C is now available as a part of gcc as a publicly available compiler
covered by the GNU software license agreement, which encourages the
dissemination of source code. Analog Devices encourages the use of Numerical C
on any platform and specifically encourages the porting of Numerical C to
other DSP chips.
The ADSP-21060 SHARC DSP
The ADSP-21060 Super Harvard Architecture (SHARC) digital-signal processor is
a 32/40-bit IEEE floating-point system on a chip offering 40-MIPS performance
with a 25-ns instruction rate and single-cycle instruction execution. In
particular, the processor and its instruction set are designed to support
efficient C code generation. Through the Harvard architecture and three
independent computation units, the processor can achieve 120-MFLOPS peak
performance without an arithmetic pipeline. The 4-Mbit on-chip SRAM memory
allows real-time signal processing without the off-chip data bottlenecks
typically found in data-intensive systems. Dual data-address generators with
indirect, immediate, modulo, and bit-reverse addressing and efficient program
sequencing with zero-overhead looping contribute to sustaining the
ADSP-21060's throughput. 
As Figure 1 shows, the core of the ADSP-21060 includes:
Independent, parallel computation units. The ADSP-21060 contains three
independent, parallel computation units--the ALU, multiplier, and
shifter--that perform single-cycle instructions. The units are arranged in
parallel, maximizing computational throughput. Single, multifunction
instructions execute parallel ALU and multiplier operations. These computation
units support the following data formats: IEEE 32-bit, single-precision
floating point; extended-precision, 40-bit floating point; and 32-bit fixed
point. 
Data-register file. This is a general-purpose register file used for
transferring data between the computation units and the data buses, and for
storing intermediate results. This 10-port, 32-register (16 primary, 16
secondary) register file, combined with the on-chip Harvard architecture,
allows unconstrained data flow between computation units and internal memory.
Single-cycle fetch of instruction and two operands. The data memory (DM) bus
transfers data and the program memory (PM) bus transfers both instructions and
data. 
High-performance instruction cache. This enables three-bus operation for
fetching one instruction and two data values. The cache is selective: Only the
instructions whose fetches conflict with PM bus data accesses are cached. This
allows full-speed execution of core, looped operations such as digital-filter
multiply accumulates and FFT butterfly processing.
Two data-address generators (DAGs) with hardware circular buffers. Circular
buffers allow efficient implementation of delay lines and other data
structures required in digital-signal processing. They are commonly used in
digital filters and Fourier transforms. The two DAGs contain sufficient
registers to allow the creation of up to 32 circular buffers (16 primary
register sets, 16 secondary). The DAGs automatically handle address-pointer
wraparound, thus reducing overhead, increasing performance, and simplifying
implementation. Circular buffers can start and end at any memory location.
Flexible instruction set. The 48-bit instruction word accommodates a variety
of parallel operations, for concise programming. For example, the ADSP-21060
can conditionally execute a multiply, add, subtract, and branch in a single
instruction.
The ADSP-21060 takes advantage of its Super Harvard Architecture to yield high
performance for signal processing applications. Table 1 provides a sampling of
benchmarks. 
The processor contains 4 Mbits of dual-ported internal SRAM, organized as two
blocks of 2 Mbits. Each block is dual-ported for single-cycle, independent
accesses by the core processor and I/O processor or DMA controller (the 21060
achieves a 240-Mbyte/second DMA transfer rate). The dual-ported memory and separate
on-chip buses allow two data transfers from the core and one from I/O
processor, all in a single cycle. The memory can be configured as a maximum of
128K words of 32-bit data, 256K words of 16-bit data, 80K words of 48-bit
instructions (and 40-bit data), or combinations of different word sizes up to
4 Mbits. For example, each block can be organized into 64K 32-bit words for
data or into 40K 48-bit words to support the ADSP-21060's 48-bit instruction
word. These two word types can coexist in the same memory block. All memory
can be accessed as 16-, 32-, or 48-bit words. 
While each 2-Mbit block can store combinations of code and data, on-chip
memory accesses are most efficient when one block stores data using the DM bus
for transfers and the other block stores instructions and data, using the PM
bus for transfers. Using the DM bus and PM bus with one dedicated to each
memory block assures single-cycle execution with two data transfers. In this
case, the instruction must be available in the cache. Single-cycle execution
is also maintained when one of the data operands is transferred to or from
off-chip, via the ADSP-21060's external port.
--K.L.
Figure 1 ADSP-21060 block diagram.
Example 1: Computation for Fourier-transform twiddle factor.
complex float Wn;
Wn = cexp (2.0*PI*1.0i/N);
Example 2: (a) is the semantic equivalent of (b).
(a) int A[10];
 iter I = 10;
 A[I] = I;

(b) int A[10];
 int i;
 for (i=0; i<10; i++)
 A[i] = i;
Example 3: (a) Piece-wise operations; (b) reducing vector expressions.
(a) float Z[10], X[10], Y[10];
 iter I = 10;
 Z[I] = X[I]*Y[I];

(b) sum(X[I]*Y[I]);
Example 4 (a) would be coded with iterators in Numerical C as (b).
Example 5: (a) is the grammar extension for the for(<iter>) construct; (b) is
semantically identical to (c); (d) is semantically identical to (e).
(a) for (<iterator-variable>)
 statement;

(b) float A[m][m], B[m];

 iter I=m, J=m;
 for (I) B[J] = B[I] + A[J][I]*I;

(c) float A[m][m], B[m];
 int i, j;
 for (i=0; i<m; i++)
 for (j=0; j<m; j++)
 B[j] = B[i] + A[j][i]*i;

(d) float A[m][m], B[m];
 iter I=m, J=m;
 for (J) B[J] = B[I] + A[J][I]*I;

(e) float A[m][m], B[m];
 int i, j;
 for (j=0; j<m; j++)
 for (i=0; i<m; i++)
 B[j] = B[i] + A[j][i]*i;
Example 6: (a) is the Numerical C source code used to generate the ADSP-21060
assembly-language code in (b).
(a) int fir, A[10];
 pm int B[10];
 iter I=10;
 fir = sum (A[I]*B[I]);

(b) r4=dm(i4,m6), r2=pm(i12,m14);
 lcntr = 9, do (pc,1) until lce;
 mrf=mrf+r4*r2(ssi), r4=dm(i4,m6), r2=pm(i12,m14);
 mrf=mrf+r4*r2(ssi);
Table 1: ADSP-21060 benchmarks (@ 40 MHz).
Operation Time Cycles 
1024-Pt. Complex FFT 0.46 msec 18,221
FIR Filter (per tap) 25 ns 1
IIR Filter (per biquad) 100 ns 4
Divide (y/x) 150 ns 6
Inverse Square Root (1/sqrt(x)) 225 ns 9

Listing One 

#include <stdio.h>
#include <math.h>
#include <dspc.h>

static inline float sigmoid (float x){ return 1.f/(1.f+expf(-x));}
/* Simulating a backpropagation network: a feed-forward network. All signals
move forward from the inputs to the outputs. In xp, ip, op, the 'p' denotes
the pth exemplar. Multiply xp, our input vector, by the weights on the
hidden layer.
     * * * * * * *     input layer
          \ | /
      + + + + + +      hidden layer
          / | \
     o o o o o o o     output layer
ip is the output of the hidden layer; op is the output of the output layer.
The matrix wts_h maps the input layer to the hidden layer: each node in the
input layer has a weight from that node to every node in the hidden layer.
A zero in the weight matrix implies there is no connection. So ip, the
output of the hidden layer, and op, the output of the output layer, are:
    ip[j] = SUM_i xp[i] * wts_h[j][i]
    op[k] = SUM_j ip[j] * wts_o[k][j]
The sigmoid function is used as a thresholding function on the output
signal of each layer. */
void
bpn_simulate(int nni, int nho, int nno, float xp[nni], float ip[nho], 
 float op[nno], float pm wts_h[nho][nni], float pm wts_o[nno][nho])
{
 iter i = nni, j = nho, k = nno;
 ip[j] = sigmoid (sum (xp[i]*wts_h[j][i]));
 op[k] = sigmoid (sum (ip[j]*wts_o[k][j]));
}
/* Training is the crux of the algorithm, and where the network got its name:
backpropagation. Compute the error at the output layer and adjust the weights
on the output layer. Take some weighted multiple of the error and propagate the
error back to the hidden layer. Given xp, the input vector, yp, the expected
output vector, and a set of weights wts_h and wts_o, the control variable eta
controls how fast the algorithm learns. The error, yp - op, is the difference
between what we expected and what we got. Now we find the gradient of the error
surface and build delta_o, and delta_h from delta_o. Modify the weights with
delta_h and delta_o. Return half of the gradient squared, to be used by a
control algorithm to determine how much more we need to train. */
float
bpn_train(int nni, int nho, int nno, float xp[nni], float yp[nno],
 float pm wts_h[nho][nni], float pm wts_o[nno][nho], float eta)
{
 iter i = nni, j = nho, k = nno;
 float s;
 float ip[nho], op[nno], delta_h[nho], delta_o[nno];
 /* simulate xp to get ip, op */
 ip[j] = sigmoid (sum (xp[i]*wts_h[j][i]));
 op[k] = sigmoid (sum (ip[j]*wts_o[k][j]));
 /* error * derivative of 1/(1+exp(op)) {simplified} */
 delta_o[k] = (yp[k] - op[k]) * op[k] * (1 - op[k]);
 delta_h[j] = ip[j] * (1 - ip[j]) * sum (delta_o[k] * wts_o[k][j]);
 /* adjust the weights on the output layer. */
 wts_o[k][j] += eta * delta_o[k] * ip[j];
 /* adjust the weights on the hidden layer */
 wts_h[j][i] += eta * delta_h[j] * xp[i];
 /* return half of the gradient squared d2 */
 return .5f * sum (delta_o[k] * delta_o[k]);
}
/* Simulate the network over a set of exemplars. (Weight initialization, which
   just seeds the weights with random values between 0 and 1, is not critical
   and is not shown.) */
void
bpn_simulate_set(int nni, int nho, int nno, int nex,
 float exemplars[nex][nni], float desired[nex][nno],
 float pm wts_h[nho][nni], float pm wts_o[nno][nho])
{
 float op[nno], ip[nho];
 iter p = nex;
 for (p) { 
 bpn_simulate (nni,nho,nno,exemplars[p],ip,op,wts_h,wts_o);
#if 0
 printf("{"); printf (" %f,",exemplars[p][i]); printf("}\n"); 
 printf("{"); printf (" %f,",desired[p][j]); printf("}\n"); 
 printf("{"); printf (" %f,",op[j]); printf("}\n"); 
 printf ("------------\n"); 
#endif
 }
}

void
bpn_training_loop(int nni, int nho, int nno, int nex, float exemplars[nex][nni],
 float outputs[nex][nno], float pm wts_h[nho][nni], float pm wts_o[nno][nho],
 float eta,float epsilon,int printp)
{
 float error = 100;
 int iteration=0;
 iter p = nex;
 while (error > epsilon)
 {
 error = 0;
 for (p) 
 error += bpn_train (nni, nho, nno,exemplars[p], outputs[p], wts_h, 
 wts_o, eta);
 error /= nex;
 if (printp)
 printf ("%5d: err=%10lf\n",iteration,error);
 iteration++;
 if (printp && (iteration % printp) == 0)
 bpn_simulate_set (nni, nho, nno, nex, exemplars, outputs, wts_h, wts_o);
 }
}








































August, 1994
Migrating C Code to Unicode


One code base, two character-encoding schemes




Timothy D. Nestved


Tim is the principal architect of Unicode and National Language support issues
for software applications and third-party libraries. He can be reached at P.O.
Box 540754, Orlando, FL 21854-0754 or at ind00126@pegasus.cc.ucf.edu.


To compete in the global software market, application software must provide
national language support (NLS), thereby accommodating any country's locale
conventions, culture, and written language. Of the three, language support is
the most costly and time-consuming to implement. One encoding scheme that
supports all modern written languages of the world would be ideal--and that's
precisely what Unicode provides. What this means to you is that almost any
written language can be supported using Unicode, eliminating the need for
specialized algorithms to process various character-encoding schemes, such as
double-byte character sets (DBCS). In addition to providing characters and
symbols--including East Asian ideographic characters--Unicode also provides
abundant space for future expansion. 
In this article, I'll describe the process of migrating existing C source code
from ANSI to Unicode, independent of any operating system, compiler, or API.
This process is based on work I was involved in when migrating Windows NT's
built-in tape-backup program (NTBACKUP.EXE) for Conner Software. Because of
limited encoding space, the NT WIN32 OEM character set doesn't support all the
written languages of the Americas, Europe, Asia, and the Middle East.
Consequently, Microsoft adopted Unicode as NT's character-encoding scheme,
requiring that programs such as NTBACKUP.EXE also support Unicode. What this
ultimately means is that one code base must support two character-encoding
schemes: ANSI and Unicode. Both ANSI and Unicode applications can be created
from the same source code by simply setting a compilation variable. 


Character or Wide Character?


To computers, a character is a unit of encoding with an associated code or
value. Therefore, the encoding size of a character is determined by the size
of the character's data type, and the data type is determined by the number of
unique codes required to represent all the characters for a particular
encoding scheme. For ANSI, a character's size is one byte (2^8 = 256 characters),
for Unicode it's two bytes (2^16 = 65,536 characters), and for other encoding schemes,
anywhere from one to four bytes. A wide character is still a character, but
it's a character with an encoding size greater than one byte. In the case of
Unicode, a wide character's encoding size is two bytes. 


Mapping at the Boundary


Mapping is the process of converting a non-Unicode character-code point (a
specific character's code or value) to its Unicode equivalent, and vice versa.
A boundary exists wherever mapping is required. Input, output, and displaying
(rendering) are examples of mapping boundaries; see Figure 1. Mapping at the
boundary is useful for resolving Unicode implementation issues, such as
rendering and processing. For example, Novell's NetWare 4.0 uses Unicode
because of the benefits of a universal data repertoire. User information for
NetWare is entered and rendered using the default code page (cp) for the
workstation (cp437 is the United States/English code page). Once entered, the
information is mapped to and stored as Unicode in NetWare's new Directory
Services component. 
By mapping at the boundary, NetWare resolves the issue of displaying Unicode
text and providing Unicode I/O routines for workstations that don't support
Unicode. That is, information entered using the default code page is mapped to
Unicode; before rendering any Unicode data, it is mapped back to the default
code page. 
The following steps explain how to migrate existing code to Unicode without
creating separate code bases. The migration process is divided into two
stages: Stage one involves header-file modifications, while stage two consists
of source-code modifications. For this article, all ANSI examples use cp437
and precomposed characters. That is, an accented letter such as 'ü' is a single
character, not composed of multiple characters such as 'u' plus a combining
accent mark. 


Stage One: Header-File Modifications


Stage one consists entirely of header-file modifications. Proper migration
allows for both ANSI and Unicode support using the same source-code base by
defining generic text-data types, macros, and prototypes in header files for
use in the source code. Use the compile-time, command-line symbolic constant
/DUNICODE (for example, cl /DUNICODE ... for Microsoft C) to force a Unicode
compilation. If the header files are properly constructed, the compilation
mode is completely transparent when viewing the source code. 
Step 1. The first step in stage one is to create explicit data types. Explicit
text-data types are typedefed as CHAR (one byte) and UNICHAR (two bytes) for
ANSI and Unicode, respectively. For completeness, explicit pointer data types
P_CHAR and P_UNICHAR are also typedefed. (You may use the standard C data type
wchar_t in place of unsigned short for UNICHAR, but verify that wchar_t's type
is properly defined by your compiler. Earlier versions of some compilers
defined wchar_t to be of type unsigned char.) 
Step 2. The second step involves the creation of generic text-data types TCHAR
and P_TCHAR, using the explicit typedefs created in Step 1. To support both
ANSI and Unicode in a single code base, generic text-data types are essential.
The compile-time identifier UNICODE determines the explicit data type to
typedef as TCHAR and P_TCHAR (for example, CHAR for ANSI or UNICHAR for
Unicode). If UNICODE is not defined, TCHAR is typedefed as CHAR; if UNICODE is
defined, TCHAR is typedefed as UNICHAR. Converting explicit data types into
generic types at compile time makes support for multiple fixed-width encoding
schemes transparent and easy to configure. 
Use an explicit data type whenever a specific text-data type is required,
independent of the compilation mode. For example, mapUnic2Ansi() maps Unicode
characters to ANSI using two explicit parameters independent of the
compilation mode: the Unicode source parameter type P_UNICHAR and the ANSI
destination parameter type P_CHAR. If the parameter P_CHAR is defined as any
other type, a data-type inconsistency is generated. On the other hand,
encryptString() accepts a string using the generic-parameter data type
P_TCHAR, because the string type is based on the compilation mode. (Listing
One includes the function prototypes mapUnic2Ansi() and encryptString().) 
It is important not to confuse or interchange byte data (data buffers) with
text or character data. Declare nontext data (or byte data) using the byte
data types BYTE and P_BYTE. As a rule, byte data should never be processed by
text routines. If strcpy() is used on byte data, replace it with either
memcpy() or memmove(). The proper use of data types greatly enhances
source-code readability and eliminates potential semantic errors. 
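Steps 1 and 2 boil down to a short block of header code. A sketch using the article's names (the specific underlying types are illustrative; as noted above, wchar_t may also serve for UNICHAR):

```c
/* Step 1: explicit text-data types. */
typedef char            CHAR;       /* ANSI character, one byte      */
typedef unsigned short  UNICHAR;    /* Unicode character, two bytes  */
typedef CHAR           *P_CHAR;
typedef UNICHAR        *P_UNICHAR;

/* Step 2: generic text-data types, resolved by the UNICODE symbol
   at compile time. */
#ifdef UNICODE
typedef UNICHAR  TCHAR;
typedef UNICHAR *P_TCHAR;
#else
typedef CHAR     TCHAR;
typedef CHAR    *P_TCHAR;
#endif
```

Compiled without /DUNICODE, TCHAR is one byte; with it, two. Nothing else in the source code needs to change.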
Step 3. Next, a text macro is created to convert character constants and
string literals into their proper types based on the compilation mode. Most C
compilers use the syntax L'c' for a wide-text character and L"string" for a
wide-text string. The text macro TEXT() substitutes the proper generic literal
constant syntax during compilation (for instance, TEXT("string") becomes
"string" for ANSI and L"string" for Unicode). Remember, use explicit literal
constants whenever the text-data type is independent of the compilation mode. 
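The TEXT() macro of Step 3 amounts to a two-line conditional definition; a sketch (the token-pasting implementation with ## is the standard technique, though the article doesn't spell it out):

```c
#ifdef UNICODE
#define TEXT(s) L##s        /* L"string" or L'c' for Unicode builds */
#else
#define TEXT(s) s           /* plain "string" or 'c' for ANSI builds */
#endif

/* TEXT("string") expands to "string" for ANSI and L"string" for Unicode;
   TEXT('c') works the same way for character constants. */
```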
Step 4. The final step in stage one is to create explicit and generic function
prototypes. Each text function requires an ANSI and Unicode explicit prototype
and a generic prototype. The explicit function-naming convention ends ANSI
function names with an A and Unicode function names with a U (Microsoft uses W
for wide). For example, strlenA() is ANSI and strlenU() is Unicode. The
generic function-naming convention replaces str with txt (Microsoft uses wcs
for wide-character string), so strlen() would become txtlen(), strncpy() would
become txtncpy(), and so on. Using the string-length example again, if UNICODE
is not defined, txtlen() is defined as strlenA(); if UNICODE is defined,
txtlen() is defined as strlenU(). strlenA() is a redefinition of the
standard-library function strlen(). (See Listing One for examples of defining
ANSI, Unicode, and generic prototypes for several string functions.) All
generic prototypes are resolved and defined at compile time. 
When compiling for ANSI, do not inadvertently create duplicate standard
C-library functions. Duplicates are generally created by improperly defining
the data types for function declarations that have standard library
equivalents. For example, a Unicode string-length routine uniStrLen(P_TCHAR
pStr) is created with one text-string parameter using the generic data type
P_TCHAR. When compiling for ANSI, P_TCHAR becomes CHAR; therefore, uniStrLen()
and the standard-library function string length strlen() are duplicates,
although their specific characteristics may differ slightly. To avoid creating
duplicate functions, never use generic data types for explicit functions and
don't create generic prototypes using explicit data types. I also recommend
that you redefine all standard-library text functions that you use with the
ANSI-explicit function names (that is, strlenA()). This should eliminate all
standard-library text function names from your source code. Again, this is for
readability and completeness. 
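A sketch of Step 4's naming scheme for the string-length function (strlenA() here simply wraps the standard library; the strlenU() body is an illustrative implementation, not the article's):

```c
#include <string.h>

typedef char            CHAR;    /* explicit types from Step 1, repeated */
typedef unsigned short  UNICHAR; /* here so the example is self-contained */

/* Explicit ANSI version: a thin redefinition of strlen(). */
static size_t strlenA(const CHAR *s) { return strlen(s); }

/* Explicit Unicode version: counts UNICHARs up to the 16-bit terminator. */
static size_t strlenU(const UNICHAR *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Generic name, resolved at compile time by the UNICODE symbol. */
#ifdef UNICODE
#define txtlen strlenU
#else
#define txtlen strlenA
#endif
```

Source code calls only txtlen(); which explicit routine runs is decided entirely by the compilation mode.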


Stage Two: Source-Code Modifications


Stage two involves modifications to the source code and represents the major
effort in the migration process. The conversion of data types, function names,
and text-pointer arithmetic--plus the creation of special routines--is the
major task of stage two. Avoid the temptation of adding conditional
compilation directives for Unicode (such as #ifdef UNICODE) in your source
code. The creation of additional macros, such as the TEXT() macro, may be
helpful in eliminating unnecessary #ifdefs. (Listing Seven shows how the
TEXT() macro eliminates conditional directives.)
(Asmus Freytag, Microsoft's Unicode-implementation architect, provided some
conversion statistics for migrating over one million lines of Windows code to
support Unicode. He found that approximately 10 percent of the code could be
modified using a search-and-replace mechanism, 5 percent required an
inspection of the code's intent before modifications could be made, and less
than 1 percent required changes in algorithms. Freytag's statistics should be
a good sample population for determining the amount of work your application
may require.) 
Step 1. The first step in stage two is to compile all of your source code and
save the error/warning messages generated by your compiler. The errors and
warnings are a result of changing the function prototypes in stage one, and
they assist in completing the tasks in stage two. Compiling the source code
also validates that your header files are free of any syntax errors. Most
modifications required in stage two are identified using the compiler's
error/warning reports.
Step 2. Next, all text function definitions and function calls in the source
files must be updated with the appropriate explicit/generic function name,
return data type, and parameter data types. Use the compiler's error/warning
reports and the function prototypes declared during the function-prototype
step in stage one to complete this step. Remember, duplicate routines may be
created if text function definitions are not converted properly. 
Step 3. Once a function's definition is updated, various data types within the
function's body may require modification. For example, local variables may
need to be modified because the function's parameter-data types have changed.
Another possibility is that the return value for the routine has changed, or
maybe function calls within the body of a routine require different return or
parameter types. Regardless of the type of changes, function-body
modifications can be made later or when a function's definition is modified. I
recommend that you first make all the function-definition modifications, then
recompile the source code, saving the error/warning messages that are
generated. The new error/warning report will include the function body
modifications that resulted from modifying the function definitions. 
Step 4. Another important aspect of migrating ANSI-based source code to
Unicode is pointer arithmetic. Whenever the number of actual bytes in a text
string is required, simple pointer arithmetic is no longer valid. For an ANSI
string, every character is one byte, so pStrPos2-pStrPos1 is the number of
bytes from pStrPos1 to pStrPos2. Since Unicode consists of two bytes per
character, Unicode pointer arithmetic will calculate characters based on the
data type UNICHAR, not the actual number of bytes. Therefore, pointer
arithmetic may need some minor modifications, so avoid using a
search-and-replace mechanism for making pointer-arithmetic modifications. 
If two generic text variables (of type P_TCHAR) contain valid addresses within
a string, the total number of bytes (byte count) is calculated using the
formula ((existing calculation)*sizeof(TCHAR)). Likewise, to determine the
actual number of characters given a byte count, use the formula ((existing
calculation)/sizeof(TCHAR)). It is important that you never have an odd byte
count when calculating the number of Unicode characters. By using
sizeof(TCHAR) in the calculations, the generic text arithmetic instructions
are correct, regardless of the compilation mode. It may be useful to create
two new generic macros, CALC_BYTE2CHARS(exp) and CALC_CHAR2BYTES(exp) (as in
Listing One), for readability and consistency. 
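Under those definitions, the conversions look like this (byteSpan() is a hypothetical helper; the macro bodies are copied from Listing One, shown here in ANSI mode):

```c
#include <stddef.h>

typedef char TCHAR ; /* ANSI mode; with /DUNICODE this would be UNICHAR */
typedef TCHAR * P_TCHAR ;

#define CALC_CHAR2BYTES(exp) ( (exp) * sizeof(TCHAR) ) // n chars -> n bytes
#define CALC_BYTE2CHARS(exp) ( (exp) / sizeof(TCHAR) ) // n bytes -> n chars

/* Byte count between two generic positions within one string: the
   pointer difference is a character count, which the macro scales to
   an actual byte count regardless of the compilation mode. */
size_t byteSpan( P_TCHAR pStrPos1, P_TCHAR pStrPos2 )
{
    return( CALC_CHAR2BYTES( (size_t)( pStrPos2 - pStrPos1 ) ) ) ;
}
```

In ANSI mode byteSpan() equals the character count; compiled with /DUNICODE it would be twice that, which is why a blind search-and-replace over pointer arithmetic is unsafe.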
Step 5. The final step in stage two is the creation of any mapping routines
that may be required at I/O and rendering boundaries. These routines are
generally new code and deal strictly with converting Unicode to and from
various code pages. The mapAnsi2Unic() function (Listing Two) is a basic
algorithm for mapping ANSI cp437 data to Unicode. 
To demonstrate all of the modifications described in this article, I created a
before (Listing Three) and after (Listing Four) set of listings. The code was
created simply to give as many examples as possible. The source file UCS.C
(Listing Two) contains several Unicode string functions and a basic mapping
function to map ANSI cp437 data to Unicode. Because Unicode's first 128
characters (0x00 through 0x7F) are the same as ASCII, the mapping of standard
ASCII characters is
simply a data-type cast from CHAR to UNICHAR. The extended ASCII
characters--those from the base value 0x80 (ANSIEXTCHBASE) to 0xFF--have
different Unicode code values and require more than a simple data-type cast.
Therefore, the function mapExtCh2Unic() contains an array extChArray[] for
mapping ANSI-extended characters to Unicode. The ANSI-to-Unicode values were
taken from the "Latin PC Code Page Mappings" table supplied with the Unicode
standard. 

Listing Three provides source code before migration, and Listing Four shows
the same code after migration. For completeness, Listing Five displays the
ANSI output, and Listing Six displays Unicode. You will notice the Unicode
output displays an empty string for the token. This is because the string is
Unicode and must be converted to ANSI for the standard C-library functions to
work properly. This is an example of where a boundary-mapping function is
required. The token's length of 14 is correct because a Unicode version of
string length was used to calculate its length. I added an explicit Unicode
function to Listing Four named ExplicitUnicodeText().
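One way to sketch such a boundary conversion, following the mapUnic2Ansi() prototype from Listing One (the replacement-character policy and the return value here are assumptions; a full version would also consult an inverse of the extChArray[] table for the extended cp437 range):

```c
typedef char CHAR ;
typedef CHAR * P_CHAR ;
typedef unsigned short UNICHAR ;
typedef UNICHAR * P_UNICHAR ;

/* Map a Unicode string back to ANSI at an output boundary. Code points
   in the ASCII range narrow with a simple cast; anything else becomes
   the caller-supplied replacement character rplCh. Returns the number
   of characters that were replaced. */
int mapUnic2Ansi( P_UNICHAR pUniStr, P_CHAR pAnsiStr, CHAR rplCh )
{
    int replaced = 0 ;
    for ( ; *pUniStr; pUniStr++ )
    {
        if ( *pUniStr < 0x80 )
            *pAnsiStr++ = (CHAR)*pUniStr ;
        else
        {
            *pAnsiStr++ = rplCh ; // no single-byte equivalent
            replaced++ ;
        }
    }
    *pAnsiStr = '\0' ; // NULL terminate the ANSI string
    return( replaced ) ;
}
```

With a routine like this in place, the Unicode token in Listing Six could be converted before the printf() call and would display normally.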


Conclusion


ANSI-based applications can be migrated to Unicode following the guidelines
presented here. Once the migration is complete, you should be able to create
both ANSI- and Unicode-based applications from the same source code.
Initially, a Unicode application doesn't have to read, write, or render
Unicode data. Routines that map at the boundary may be added to supplement any
of the I/O and rendering shortcomings associated with implementing Unicode on
operating systems that do not completely support Unicode--yet. Similarly,
third-party libraries may also be used with Unicode-compliant applications if
mapping routines are used. 
Figure 1 Application mapping boundaries for varying character sets.

Listing One
/** Name: UNICODE.h
 Desc: Contains both explicit/generic data types, macros, and function
 prototypes. Stage 1 modifications. ANSI compilation: default
 UNICODE compilation: use /DUNICODE. Hungarian notation is not used
**/
#ifndef _unicode_h_
#define _unicode_h_
/* data type definitions */
/** CHAR and BYTE may already be #define'd or typedef'd in the compiler's 
standard include files, so a conditional check may need to be added to avoid 
redefinitions, errors and warnings. **/
// explicit types
typedef char CHAR ; // standard char 
typedef CHAR * P_CHAR ;
typedef unsigned short UNICHAR ; // UNICODE explicit data types
typedef UNICHAR * P_UNICHAR ;
typedef unsigned char BYTE ; // data buffer data types
typedef BYTE * P_BYTE ;
// text character generic types 
#if defined( UNICODE )
 typedef UNICHAR TCHAR ; // generic data types (really Unicode)
 typedef TCHAR * P_TCHAR ;
#else
 typedef CHAR TCHAR ; // generic data types (really ANSI)
 typedef TCHAR * P_TCHAR ;
#endif
/* macros */
#if defined( UNICODE )
# define TEXT(literal) L##literal // wide literal constant L'c' L"str"
#else
# define TEXT(literal) literal // literal constant 'c' "str"
#endif
#define CALC_CHAR2BYTES(exp) ( (exp) * sizeof(TCHAR) ) // n chars -> n bytes 
#define CALC_BYTE2CHARS(exp) ( (exp) / sizeof(TCHAR) ) // n bytes -> n chars
/* function prototypes */
#if defined( UNICODE )
void mapAnsi2Unic( P_CHAR pAnsiStr, P_UNICHAR pUnicStr ) ;
P_UNICHAR ucschr( P_UNICHAR pUCS, UNICHAR token ) ;
P_UNICHAR ucscpy( P_UNICHAR pDst, P_UNICHAR pSrc ) ;
int ucslen( P_UNICHAR pUCS ) ;
UNICHAR mapExtCh2Unic( CHAR ansiChar ) ;
#endif
/** The function prototypes should be placed in the appropriately named header
file. Always define the ANSI and Unicode explicit set first, then the generic
set using the explicit names previously defined. **/
#define strcmpA strcmp // ANSI explicit string compare
#define strcpyA strcpy // ANSI explicit string copy
#define strlenA strlen // ANSI explicit string length

#define strchrA strchr // ANSI explicit
#define strcmpU ucscmp // UNICODE explicit string compare
#define strcpyU ucscpy // UNICODE explicit string copy
#define strlenU ucslen // UNICODE explicit string length
#define strchrU ucschr // UNICODE explicit
#if defined( UNICODE )
# define txtcmp strcmpU // generic string compare (really UNICODE)
# define txtcpy strcpyU // generic string copy (really UNICODE)
# define txtlen strlenU // generic string length (really UNICODE)
# define txtchr strchrU // generic string (really UNICODE)
#else
# define txtcmp strcmpA // generic string compare (really ANSI)
# define txtcpy strcpyA // generic string copy (really ANSI)
# define txtlen strlenA // generic string length (really ANSI)
# define txtchr strchrA // generic string (really ANSI)
#endif
/** Prototype parameter examples. It is important not to declare functions, 
parameters and return values incorrectly. Follow these two basic rules:
1) Never use generic data types for an explicit prototype
2) Avoid creating a generic prototype using explicit data types **/
#if defined( INCORRECT )
int mapUnic2Ansi( P_TCHAR pUniStr, P_TCHAR pAnsiStr, TCHAR rplCh ) ;
P_CHAR encryptString( P_CHAR pStr ) ;
int uniStrLen( P_TCHAR pUniStr ) ;
#else
int mapUnic2Ansi( P_UNICHAR pUniStr, P_CHAR pAnsiStr, CHAR rplCh ) ;
P_TCHAR encryptString( P_TCHAR pStr ) ;
int uniStrLen( P_UNICHAR pUniStr ) ;
#endif
#endif // end include file


Listing Two
/** Name: UCS.c
 Desc: Unicode character string functions. Hungarian 
notation is not used.
**/
#if defined( UNICODE )
#include <stdio.h>
#include "unicode.h"
#define ANSIEXTCHBASE 0x80 // start of ANSI extended characters
UNICHAR mapExtCh2Unic( CHAR ansiChar ) ; // not static: prototyped in unicode.h
P_UNICHAR
ucschr( P_UNICHAR pUCS, UNICHAR token )
{
P_UNICHAR pStr = pUCS ;
 for ( ; *pStr && *pStr != token; pStr++ ) ; // stop at terminator
 return( ( *pStr == token ) ? pStr : NULL ) ; // NULL if token not found
}
P_UNICHAR
ucscpy( P_UNICHAR pDst, P_UNICHAR pSrc )
{
P_UNICHAR pStr = pDst ;
 while ( *pDst++ = *pSrc++ ) ;
 return( pStr ) ;
}
int
ucslen( P_UNICHAR pUCS )

{
P_UNICHAR pStr = pUCS ;
 for ( ; *pStr; pStr++ ) ;
 return( (int)( pStr - pUCS ) ) ;
}
void
mapAnsi2Unic( P_CHAR pAnsiStr, P_UNICHAR pUnicStr )
{
P_CHAR pSrc = pAnsiStr ;
P_UNICHAR pDst = pUnicStr ;
 for ( ; *pSrc; pSrc++ )
 {
 *pDst++ = ( (BYTE)*pSrc >= ANSIEXTCHBASE ) ? // CHAR may be signed: compare as BYTE
 mapExtCh2Unic( *pSrc ) : (UNICHAR)(BYTE)*pSrc ;
 }
 *pDst = L'\0' ; // NULL terminate the Unicode string
}
UNICHAR
mapExtCh2Unic( CHAR ansiChar )
{
static UNICHAR extChArray[] = {
 /* 80 */ 0x00C7, 0x00FC, 0x00E9, 0x00E2, 0x00E4, 0x00E0, 0x00E5, 0x00E7,
          0x00EA, 0x00EB, 0x00E8, 0x00EF, 0x00EE, 0x00EC, 0x00C4, 0x00C5,
 /* 90 */ 0x00C9, 0x00E6, 0x00C6, 0x00F4, 0x00F6, 0x00F2, 0x00FB, 0x00F9,
          0x00FF, 0x00D6, 0x00DC, 0x00A2, 0x00A3, 0x00A5, 0x20A7, 0x0192,
 /* A0 */ 0x00E1, 0x00ED, 0x00F3, 0x00FA, 0x00F1, 0x00D1, 0x00AA, 0x00BA,
          0x00BF, 0x2310, 0x00AC, 0x00BD, 0x00BC, 0x00A1, 0x00AB, 0x00BB,
 /* B0 */ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
          0x2555, 0x2563, 0x2551, 0x2557, 0x255D, 0x255C, 0x255B, 0x2510,
 /* C0 */ 0x2514, 0x2534, 0x252C, 0x251C, 0x2500, 0x253C, 0x255E, 0x255F,
          0x255A, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256C, 0x2567,
 /* D0 */ 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256B,
          0x256A, 0x2518, 0x250C, 0x2588, 0x2584, 0x258C, 0x2590, 0x2580,
 /* E0 */ 0x03B1, 0x00DF, 0x0393, 0x03C0, 0x03A3, 0x03C3, 0x00B5, 0x03C4,
          0x03A6, 0x0398, 0x03A9, 0x03B4, 0x221E, 0x03C6, 0x03B5, 0x2229,
 /* F0 */ 0x2261, 0x00B1, 0x2265, 0x2264, 0x2320, 0x2321, 0x00F7, 0x2248,
          0x00B0, 0x2219, 0x00B7, 0x221A, 0x207F, 0x00B2, 0x25A0, 0x00A0
} ;
 return( extChArray[ (int)( (BYTE)ansiChar - ANSIEXTCHBASE ) ] ) ; // index as unsigned
}
#endif // defined( UNICODE )


Listing Three

/** Name: BEFORE.c (convert)
 Desc: Shows functions before migration to allow you to 
 apply the steps discussed in the article and compare 
 your results. The functions are strictly examples. 
 Hungarian notation is not used.
**/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "unicode.h"
P_CHAR GenericText( P_CHAR text, int *bytes ) ;
P_CHAR ExplicitText( P_CHAR pStr ) ;
P_CHAR DataStream( P_CHAR pDst, P_CHAR pSrc, int count ) ;
P_CHAR
GenericText( P_CHAR text, int *bytes )
{
P_CHAR pStr1 ;
P_CHAR pStr2 ;
 pStr1 = strchr( text, 'A' ) ; // locate the token char
 pStr2 = strchr( pStr1, '@' ) ; // locate the delimiting char
 *pStr2 = '\0' ; // terminate string at delimiter
 *bytes = pStr2 - pStr1 ; // calc number of bytes
 // number chars == number of bytes
 return( pStr1 ) ; // return start of token found
}
P_CHAR
ExplicitText( P_CHAR pStr )
{
P_CHAR pAnsiStr ;
 // search a string that is explicitly an ANSI string
 pAnsiStr = strchr( pStr, 'W' ) ;
 *pAnsiStr = '\0' ;
 return( pStr ) ;
}
P_CHAR
DataStream( P_CHAR pDst, P_CHAR pSrc, int count )
{
 strncpy( pDst, pSrc, count ) ; // strncpy used purely as an example
 *( pDst + 13 ) = 0 ; // just to randomly truncate the stream
 return( pDst ) ;
}
int main( void )
{
CHAR text[ 78 ] ; // text string - conversion
P_CHAR pToken ;
P_CHAR pStr1 ;
CHAR specialThanksTo[] = "Dawn Woods for editing my article" ;
P_CHAR pStr2 ;
CHAR data[ 78 ] = "Imagine this is byte data and not text!" ;
CHAR temp[ 78 ] ; // temp data stream buffer
P_CHAR pStr3 ;
int nBytes ;
 strcpy( text, "Text with A Token@ in the stream." ) ;
 /* text string that should be converted to a generic string */
 pStr1 = GenericText( text, &nBytes ) ; // get a token from the string
 pToken = (P_CHAR)malloc( nBytes + 1 ) ; // alloc space for token only
 strcpy( pToken, pStr1 ) ; // cpy str to alloc'd space
 printf( "Token '%s' (A Token)\n", pToken ) ;
 printf( "Bytes: %d\nCharacters: %d\n\n",strlen( pToken ),strlen( pToken));

 free( pToken ) ;
 /* explicit text string that should not be generic */
 pStr2 = ExplicitText( specialThanksTo ) ;
 printf( "Thanks '%s'\nBytes: %d\nCharacters: %d\n\n",
 pStr2, strlen( pStr2 ), strlen( pStr2 ) ) ;
 /* processing data in a stream */
 pStr3 = DataStream( temp, data, 78 ) ;
 printf( "Stream '%s'\nCopy '%s'\n", data, pStr3 ) ;
 return( 0 ) ;
}


Listing Four
/** Name: AFTER.c (convert)
 Desc: Shows functions after migration to allow you to 
 apply the steps discussed in the article and 
 compare your results. The functions are strictly 
 examples.
 Hungarian notation is not used
**/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "unicode.h"
P_TCHAR GenericText( P_TCHAR text, int *bytes ) ;
P_CHAR ExplicitText( P_CHAR pStr ) ;
P_BYTE DataStream( P_BYTE pDst, P_BYTE pSrc, int count ) ;
P_TCHAR
GenericText( P_TCHAR text, int *bytes )
{
P_TCHAR pStr1 ;
P_TCHAR pStr2 ;
 pStr1 = txtchr( text, TEXT('A') ) ; // locate the token char
 pStr2 = txtchr( pStr1, TEXT('@') ) ; // locate the delimiting char
 *pStr2 = TEXT('\0') ; // terminate string at delimiter
 *bytes = CALC_CHAR2BYTES( pStr2 - pStr1 ) ; // pointer difference is chars;
 // scale to an actual byte count
 return( pStr1 ) ; // return start of token found
}
P_CHAR
ExplicitText( P_CHAR pStr )
{
P_CHAR pAnsiStr ;
 // search a string that is explicitly an ANSI string
 pAnsiStr = strchr( pStr, 'W' ) ;
 *pAnsiStr = '\0' ;
 return( pStr ) ;
}
#if defined( UNICODE ) // compiles only in Unicode mode
P_UNICHAR
ExplicitUnicodeText( P_UNICHAR pStr )
{
P_UNICHAR pUnicStr ; // purely an example, code not called
 // search a string that is explicitly a Unicode string
 pUnicStr = strchrU( pStr, L'W' ) ;
 *pUnicStr = L'\0' ;
 return( pStr ) ;
}
#endif
P_BYTE
DataStream( P_BYTE pDst, P_BYTE pSrc, int count )
{
 memmove( pDst, pSrc, count ) ;
 *( pDst + 13 ) = 0 ; // just to randomly truncate the stream
 return( pDst ) ;
}
int main( void )
{
TCHAR text[ 78 ] ; // text string - conversion
P_TCHAR pToken ;
P_TCHAR pStr1 ;
CHAR specialThanksTo[] = "Dawn Woods for editing my article" ;
P_CHAR pStr2 ;
BYTE data[ 78 ] = "Imagine this is byte data and not text!" ;
BYTE temp[ 78 ] ; // temp data stream buffer
P_BYTE pStr3 ;
int nBytes ;
 txtcpy( text, TEXT("Text with A Token@ in the stream.") ) ;
 /* text string that should be converted to a generic string */
 pStr1 = GenericText( text, &nBytes ) ; // get a token from the string
 pToken = (P_TCHAR)malloc( nBytes + sizeof(TCHAR) ) ; // token + terminator
 txtcpy( pToken, pStr1 ) ; // cpy str to alloc'd space
 printf( "Token '%s' (A Token)\n", pToken ) ; // need mapUnic2Ansi()
 printf( "Bytes: %d\nCharacters: %d\n\n",
 CALC_CHAR2BYTES(txtlen( pToken )), txtlen( pToken ) ) ;
 free( pToken ) ;
 /* explicit text string that should not be generic */
 pStr2 = ExplicitText( specialThanksTo ) ;
 printf( "Thanks '%s'\nBytes: %d\nCharacters: %d\n\n",
 pStr2, strlenA( pStr2 ), strlenA( pStr2 ) ) ;
 /* processing data in a stream */
 pStr3 = DataStream( temp, data, 78 ) ;
 printf( "Stream '%s'\nCopy '%s'\n", data, pStr3 ) ;
 return( 0 ) ;
}


Listing Five
Token 'A Token' (A Token)
Bytes: 7
Characters: 7

Thanks 'Dawn'
Bytes: 5
Characters: 5


Stream 'Imagine this is byte data and not text!'
Copy 'Imagine this '


Listing Six
Token '' (A Token)
Bytes: 14
Characters: 7
Thanks 'Dawn '
Bytes: 5
Characters: 5
Stream 'Imagine this is byte data and not text!'
Copy 'Imagine this '


Listing Seven
/** Name: TRANSPAR.c
 Desc: Example of how to use macros (such as the TEXT() macro) to
 eliminate unnecessary conditional directives and potential semantic
 errors, and to improve readability.
**/
// Migration is NOT transparent. Requires a conditional directive in source.
void notTransparent( void )
{
 // ...
#if defined( UNICODE )
 ucscpy( text, L"Text with A Token@ in the stream." ) ;
 // ...
#else
 strcpy( text, "Text with A Token@ in the stream." ) ;
 // ...
#endif
 // ...
}
// Migration is transparent. No conditional directive required.
void transparent( void )
{
 // ...
 txtcpy( text, TEXT( "Text with A Token@ in the stream." ) ) ;
 // ...
}















August, 1994
Polymorphic C


An extended C interpreter




Greg Voss


Greg is a C++ and Smalltalk consultant and the "OOP Alley" columnist for
Windows Tech Journal. He can be contacted at gmv@cruzio.com.


Failure is crucial to learning and productivity. The faster you find out what
is wrong, the faster you can make it right. If I can fail six times a minute,
I'll acquire knowledge six times faster than a programmer who fails only once
every minute. To facilitate this sort of failure, I wrote an interpreter that
lets me combine the benefits of incremental compilation with those of using a
mainstream language.
My Polymorphic C interpreter (PCC) has proven useful in developing and
debugging C routines used in Windows applications. The environment comes close
to approximating the feel of developing programs in Lisp and Smalltalk. Once
subroutines are debugged and stable, they can be passed off to standard
commercial C compilers and incorporated into DLLs, which in turn can be called
from PCC statements and functions still undergoing development.
PCC does not interpret C; it interprets a variant I call "Polymorphic C."
Polymorphic C adds to C features traditionally associated with Smalltalk,
Actor, and Lisp: flexible (polymorphic) typing, incremental compilation, and
direct, interactive access to the symbol table and to all objects in the
run-time image of the program being developed. The program is designed around
a direct-manipulation GUI that lets you select individual expressions,
statements, or blocks for immediate compilation and execution. 


The Implementation Language


One of the first steps in developing PCC was choosing the implementation
language. Both Smalltalk and Lisp offer advantages over C or C++: In addition
to incremental compilation, they have built-in support for processing lists
and ordered collections, which can dramatically change the nature of
development. Lisp's strong support for macros and symbolic processing makes it
a better choice for language processing than Smalltalk. In contrast, Smalltalk
has better support for complex windowing interfaces. Lisp's run-time type
checking virtually eliminates program and system crashes associated with
typical Windows development tools. Ultimately, I chose Franz's Allegro Common
Lisp, which enabled me to develop PCC in about six months, spending an average
of ten to fifteen hours a week. 
The grammar for Polymorphic C is a nearly complete implementation of the
ANSI-compatible C grammar specified in the appendix of Kernighan and Ritchie's
The C Programming Language, second edition. Preprocessor directives, such as
#define, #include, and #if, are not supported. It is a relatively trivial
exercise to add preprocessor directives to Polymorphic C, but these directives
do not play the same central role in incremental compilation that they play in
batch processing of C source files. All grammar productions other than
preprocessor directives are recognized.
Polymorphic C provides no library, instead allowing direct, interactive calls
to any Windows DLL, including the entire Windows API. In addition, Polymorphic
C allows subroutines to be implemented in either C or Lisp, thereby giving C
programmers access to all Common Lisp, CLOS (Common Lisp Object System), and
Common Graphics functions and objects. C's standard-library function, printf,
for example, is much easier to implement in Lisp than in C. A complete
implementation of printf is provided with Polymorphic C as an example of a
nontrivial C subroutine; see Listing Three, page 95. Compilation of
subroutines of up to several hundred lines of C source takes one or two
seconds. With small subroutines, compilation is instantaneous. Run-time type
checking adds a degree of safety not currently available in popular commercial
compilers for C.


C Interpreters 


There have been incremental compilers and interpreters for C on the PC
platform ever since its inception, from Tiny C, cTerp, and Instant-C, to Al
Stevens's latest endeavor, Quincy. Some of the early DOS interpreters suffered
from the awkward disadvantage of shuffling code back and forth between an
incremental development tool and a tool for producing the final application.
The awkwardness of this shuffling between development code and production code
had much more to do with the speed limitations of the 8086 and the 640K memory
limitations of DOS than with the basic idea of incremental compilation. 
As Windows has grown in popularity, Windows programs have increased in size
and complexity so that, even with the fastest compilers, you can rarely run
through a minor edit-compile-link cycle in less than 30 seconds. A minute is
more typical, and several minutes is not unusual. With the large amount of
memory and the fast processor speed of most machines used with Windows,
incremental compilation and run-time development images make more sense than
ever. More than 640K of memory allows the interpreter to cache more objects in
memory and dramatically reduce the amount of processing necessary to modify a
program image.


Using Polymorphic C 


One difference between standard C programming and programming with incremental
C compilers is the granularity of the compilation unit or module. What is the
minimum program quantum that can effectively be processed by the compiler,
linker, and loader? In standard C, the file is the minimum quantum for the
compiler. But linking and loading are still separate, time-consuming steps. In
Smalltalk and Lisp, the expression is the minimum processing quantum. Loading
of the compiled expression is instantaneous, and linkage occurs dynamically
while the expression is being evaluated. In Polymorphic C, the statement is
the minimum processing quantum for the compiler/linker/loader. For this
reason, the main interface for program development in Polymorphic C is the C
Statement Editor (see Figure 1), the primary interface used in developing C
statements and functions. The C Statement Editor assumes you are compiling,
running, or translating a single C statement. This is not a limitation of the
PCC compiler, but rather an intentional restriction designed to help catch
errors such as missing semicolons, commas, and curly braces. The compiler can
be smarter if it can make assumptions about the type of construct it is
parsing.
While it may be adequate to move back and forth between compilation and
execution of several related statements, compound statement blocks provide the
most convenient way of executing several statements at once. For convenience,
variables need not be declared as long as they are assigned a value before
they are used in expressions. Declaring local variables, however, does help
keep the symbol table clean. Local variables are added to the symbol table on
block entry and removed from the symbol table on block exit.
A typical scenario for usage is one in which you build statements
incrementally, making liberal use of the polymorphic echo() function to output
the values of variables and check for unexpected behavior. For more control
over the format of output, you call printf() instead of echo().
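A fragment of that style, sketched as ordinary C (sumOfSquares() is hypothetical, and the echo() macro is only a stand-in so the sketch compiles under a standard C compiler; PCC's real echo() is polymorphic and needs no format string):

```c
#include <stdio.h>

/* Stand-in for PCC's built-in polymorphic echo(); shown for ints only. */
#define echo(x) printf( "%d\n", (x) )

/* A compound statement block of the kind you would highlight in the
   C Statement Editor. The locals are declared to keep the symbol
   table clean; they enter it on block entry and leave on block exit. */
int sumOfSquares( int n )
{
    int i, total = 0 ;
    for ( i = 1; i <= n; i++ )
    {
        total += i * i ;
        echo( total ) ; /* watch intermediate values as you go */
    }
    return total ;
}
```

In a PCC session you would highlight the block, compile it, inspect the echoed values, and edit again, all without leaving the run-time image.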


Program Architecture


From a structural point of view, it's best to think of PCC as four
interconnected objects: a parser, a scanner, a symbol table, and a reader
(input system). There are other, secondary objects, but the main action takes
place between these four primary objects. Table 1 shows the primary objects
and their roles in PCC.
Implicitly, there is another major component of PCC: the run-time interpreter.
In fact, the interpreter is Allegro's Lisp interpreter, so no special global
object called interpreter exists. However, to properly understand the
support mechanism of the run-time environment for programs generated by PCC
(or any C compiler), it's important to think of the interpreter as an entity
in itself. For typical C compilers, the run-time interpreter is the CPU, which
interprets the meaning of sequences of numbers assumed to be machine
instructions. Run-time languages can also be defined at a higher level of
abstraction; for example, the various byte-code sequences used by Basic,
Pascal, and Smalltalk interpreters, and the p-code used by Microsoft C. Such
higher-level program representations require support code above and beyond the
CPU to implement language behavior at run time.
PCC has a global object, *c-environment*, which maintains state data relevant
to the most recently compiled program or statement. If a special run-time
interpreter were required by PCC, it would be placed inside *c-environment*,
which should be thought of as the image in memory that would be set up by a
program loader when a C program is invoked from the command line. The
environment sets up the code (text) and data segments (both initialized and
uninitialized) as well as the stack for automatic variables and the heap used
by the translated program.
As a convention, global variables in Lisp are preceded and followed by an
asterisk. Thus, the four primary objects in PCC have the names
*c-char-reader*, *c-scanner*, *c-parser*, and *symbol-table*.
In PCC, the parser takes legal C token sequences and turns them into Lisp
code. The actual objects generated by the parser are lists--Lisp symbolic
expressions which can be evaluated by the Lisp interpreter. At one point, I
considered generating code for a stack machine--very much like Forth--but
decided that there were advantages to using Lisp itself as the target
language. The primary advantages include the outstanding debugging support
provided by Lisp browsers and inspectors, steppers, and tracing mechanisms.
Listing One shows the internal instance variables of the four primary objects
in PCC. Careful study of the class definitions in Listing One shows the
natural hierarchical relationship between a parser, scanner, and reader. The
parser has a pointer (reference) to the scanner, which in turn has a pointer
to the character reader. The parser can ask the scanner for the next input
token without knowing anything about the reader. Similarly, the scanner asks
the reader for input characters to build up tokens.
While it is possible to access the scanner and the reader through the global
variables *c-scanner* and *c-char-reader*, you don't need to do this--except
while testing methods for the scanner and reader. Thus, the primary interface
to the compiler is through *c-parser*, which is mediated through the local
menu of the C Statement Editor. However, it can also be accessed directly
through Lisp symbolic expressions. It's convenient to have *symbol-table* as a
global so that you can dump and inspect the contents of the table after a
statement or program is compiled or when program execution is suspended at a
break point while debugging. 


PCC Objects and Methods 



The PCC reader has 7 methods, the scanner has 27, and the symbol table has 8.
About 60 methods in the parser class are central to compilation. Of these, 50
methods are direct implementations of the productions for the ANSI-C grammar
specified in Kernighan and Ritchie. The remaining 10 parser methods provide
support for managing input strings, retrieving tokens and line numbers,
matching tokens against input, and reporting errors. Error recovery is enabled
by exception handling (catch and throw) to unwind the interpreter's stack when
errors are encountered in the middle of a parse.
The most revealing function in PCC is test-xlate-statement in Listing Two.
This is a function rather than a method of c-parser-class, so it can be
invoked even if instances of c-parser-class, c-scanner, and c-input-class have
not yet been created. The only required argument to this function is the
identifier string; this argument contains the text source for a single C
statement, which may be a compound statement and therefore arbitrarily
complex.
The code for test-xlate-statement shows all the major types of transactions
you are likely to ask the parser to carry out if you are not working through a
GUI interface like the C Statement Editor. In fact, the C Statement Editor
calls test-xlate-statement directly whenever you issue the compile command
from the local menu. The C Statement Editor passes whatever text is
highlighted as the string argument to test-xlate-statement.
First, the run-time environment and error handlers are initialized. Next, the
echo run-time library subroutine is installed in *symbol-table*. This could be
done once globally rather than each time a statement is translated, but
leaving it here shows the interface for installing functions in the symbol
table and also reveals the simplicity of the definition for echo. 
Now for the heart of the compiler: An instance of c-parser-class called
*c-parser* is created by the Lisp expression: (setf *c-parser* (make-instance
'c-parser-class)). 
A lot goes on behind the scenes here. The constructor for c-parser-class
actually creates an instance of, and a link to, a c-scanner-class object. The
scanner in turn creates a character-reader object and initializes a pointer to
it. This design lets the user focus all interaction on *c-parser*. Setting the
input string for translation is handled in the expression: (source-string
*c-parser* string).
Normally the string is retrieved from the text highlighted in the C Statement
Editor window. This expression sends the message source-string to the object
*c-parser* with the argument string. Compared to Smalltalk and C++, Lisp
inverts the order of the object/message syntax: In Lisp the message comes
first. For a variety of reasons, CLOS makes message sends use syntax identical
to that used for function invocation. The first argument is the object
receiving the message.
To get the translation started, the token pointer must be advanced to point to
the first input token. Then the statement translator is invoked. If
translation is successful, a Lisp expression is left on top of the parse
stack. To run the resulting program, the Lisp expression can be evaluated as
follows: (eval (car (stack *c-parser*))). 
The function car returns the first element of its list argument--in this case,
the parse stack, implemented as a list. At any time, you can print the Lisp
expression in Allegro's TopLoop window, which is similar to Smalltalk's
Transcript window; for example, (pprint (car (stack *c-parser*))). 
The translate menu command in the C Statement Editor invokes a similar
expression to show the Lisp translation of a C program in a newly opened edit
window.
When interacting directly with the C Statement Editor, the aforementioned
mechanisms are entirely hidden. You have a sense of compiling and running
programs directly by highlighting text and issuing applicable menu commands.
Output normally directed to C's stdout device appears in Allegro Lisp's
TopLoop window. Windows applications would probably use dialog boxes for input
and output. Standard dialogs are readily accessible through Allegro's Common
Graphics package or by calls to the Windows API.


Conclusion 


I have found PCC to be a practical alternative to popular, commercial C
development environments, although PCC is best used as an enhancement to
commercial compilers rather than a replacement. Hopefully PCC will provide
insight into the use of incremental compilation in off-the-shelf tools.
Figure 1 The C Statement Editor window is the primary interface for writing
code.
Table 1: PCC primary components.
Component Description 
reader Remembers input string for C source.
 Keeps track of current position in input stream.
 Remembers current character.
 Keeps track of current line number (for errors and 
 warnings reported by scanner or parser).
 Maintains pushback stack for characters.
scanner Converts input to token stream.
 Allows putback of one token.
 Remembers current token and last-read token.
 Strips comments from input.
 Installs identifiers in symbol table.
parser Recognizes and validates legal token sequences 
 (sentences). 
 Translates token stream into expressions 
 in target language (Lisp). 
 Parses type declarations and generates type 
 structures. 
 Recognizes and reports syntax and semantic errors 
 (some semantic errors are recognized and reported 
 by the run-time interpreter).
symbol Stores values and types associated with identifiers 
table (strings). Implemented as hash table for fast lookup.
 Identifiers used as lookup key.
 Stores values for constants, variables, and code for 
 functions.
 Scoping: Allows shadowing of variables redefined in 
 local blocks.

Listing One 

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; CLASS c-input-class
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defclass c-input-class ()
 ((source-stream :accessor source-stream
 :initform nil
 :initarg :source-stream)
 (line :accessor line :initform nil)
 (line-size :accessor line-size :initform 0)

 (line-number :initform 0 :accessor line-number)
 (index :accessor index :initform 0)
 (putback-stack :accessor putback-stack :initform nil)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; CLASS c-scanner-class
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defclass c-scanner-class ()
 ((token :accessor token :initform nil)
 (text :accessor text :initform nil)
 (value :accessor value :initform nil)
 (current-char :accessor current-char :initform nil)
 (previous-token :accessor previous-token :initform nil)
 (previous-text :accessor previous-text :initform nil)
 (previous-value :accessor previous-value :initform nil)

 (char-reader :accessor char-reader :initform nil)
 (lexeme-buffer :accessor lexeme-buffer
 :initform (make-array '(1024)
 :element-type 'string-char
 :initial-element (character 0)
 :fill-pointer 0))))
 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; CLASS c-parser-class
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defclass c-parser-class ()
 ((scanner :accessor scanner ; link to scanner
 :initform nil)
 (look-ahead :accessor look-ahead ; look-ahead symbol
 :initform -1)
 (stack :accessor stack ; parse stack
 :initform nil)))
 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; CLASS symbol-table
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defclass symbol-table ()
 ((table :accessor table
 :initform (make-hash-table
 :size 1000
 :test #'equal))))


Listing Two

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Function acting as primary interface between
;;; C Statement Editor window and *c-parser*.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun test-xlate-statement (string ; source text
 &optional ; optional args follow
 (print nil) ; print translation if t
 (interpret t)) ; run program if t
 ;
 (ct-block-reset *c-environment*) ; Reset all compile-time
 ; block data.
 ; 
 (catch 'exit-parser ; Exception handler: Throw

 ; expressions in parser 
 ; methods are caught here.
 ;; ==================================================
 ;; Define 'echo' function. (to be installed as a 
 ;; function in *symbol-table*). Simulates a compiled C function 
 ;; which echoes its argument to the TopLoop (Transcript) window. 
 ;; Argument can be any data type. (i.e. argument is 
 ;; polymorphic--hence name 'Polymorphic C').
 ;; ==================================================

 (symbol-install *symbol-table* "echo" 'BUILT-IN
 #'(lambda (&rest x)
 (dolist (e x) (format t "~a " e))
 (format t "~%") x))
 ;; ==================================================
 ;; Make new instance of parser and set source
 ;; string to be translated.
 ;; ==================================================
 (setf *c-parser* (make-instance 'c-parser-class))
 (source-string *c-parser* string)

 ;; ==================================================
 ;; PARSE and TRANSLATE. Creates Lisp expression as top element
 ;; of parse stack (stack *c-parser*).
 ;; ==================================================
 (format t "~%COMPILING~%")
 (advance-token *c-parser*)
 (xlate-statement *c-parser*)
 (format t "~%COMPILE DONE~%")

 ;; ==================================================
 ;; Wrap translation in 'return-label' block 
 ;; so C return statements will work in compound statements--even 
 ;; when enclosing function isn't defined.
 ;; ==================================================
 (let ((s (pop (stack *c-parser*))))
 (push `(block return-label ,s) (stack *c-parser*)))

 ;; ==================================================
 ;; PRINT (optional). Expressions can get long, so printing is 
 ;; optional.
 ;; Pass T (true) as second argument to test-xlate-statement 
 ;; if you want to see the translation.
 ;; ==================================================
 (when print
 (terpri)
 (format t "~%LISP TRANSLATION OF=====================>~% ~a" string)
 (format t "~%<========================================")
 (pprint (car (stack *c-parser*))))

 ;; ==================================================
 ;; INTERPRET (run--also optional). Evaluate translated 
 ;; Lisp expression (form) on top of parse stack.
 ;; ==================================================
 (when interpret
 (terpri)
 (format t "~%INTERPRETING STATEMENT==================>~%~a" string)
 (format t "~%<========================================")
 (terpri)

 (eval (car (stack *c-parser*))))
 (format t "~%OK~%> ") t))



Listing Three

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 
;;; printf --- the behavior of C's printf is simulated 
;;; by converting calls to
;;; printf into Lisp expressions which are evaluated at run time.
;;; The basic algorithm is:
;;; while more chars in format-string
;;; get next char
;;; if % process-conversion
;;; else if \ process escape character
;;; else process normal character
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 
(defun printf (format-string &rest args)
 (do* ((in-stream (make-string-input-stream format-string))
 (out-stream (make-string-output-stream))
 (c (read-char in-stream nil 'eof) (read-char in-stream nil 'eof))
 (out-args) (justify) (prefix) (alt-output)
 (width nil nil) (precision nil nil) (conversion-char))
 ((eq c 'eof) ; terminate loop
 (let* ((f-string (get-output-stream-string out-stream))
 (return-expr
 (append `(format t ,f-string ) (reverse out-args))))
 (eval return-expr)))
 (macrolet

 ((next-char ()
 '( progn (setf c (read-char in-stream nil 'eof))
 (if (eq c 'eof)
 (error "printf: eof in print format string"))))
 (get-number-for (attribute)
 `(progn
 (when (char= c #\*)
 (setf ,attribute '*)
 (push (pop args) out-args)
 (next-char)
 (format t " * for ~a specifier " ',attribute))
 (when (digit-char-p c)
 (do ((stream (make-string-output-stream)))
 ((not (digit-char-p c))
 (setf ,attribute (get-output-stream-string stream)))
 (write-char c stream)
 (next-char))))))
 (case c
 (#\% ;;(princ 'print-directive)
 (next-char)
 (when (char= c #\-) (setf justify 'left) (next-char))
 (when (member c '(#\+ #\space)) (setf prefix c)(next-char))
 (when (char= c #\#) (setf alt-output t) (next-char))
 ;; field width
 (when (or (char= c #\*) (digit-char-p c))
 (get-number-for width))
 ;; precision
 (when (char= c #\.)

 (next-char)
 (get-number-for precision))
 (case c
 (#\c ;;(princ 'character)
 (write-string "~c" out-stream)
 (push `(character ,(pop args)) out-args))
 (#\s ;;(princ 'string)
 (write-char #\~ out-stream)
 (when width
 (if (eq width '*)
 (write-char #\v out-stream)
 (write-string width out-stream)))
 ;; parameters done, now modifiers
 (unless (eq justify 'left)
 (write-string ":@" out-stream))
 (write-char #\a out-stream)
 (push (pop args) out-args))
 (#\d ;;(princ 'integer)
 (write-char #\~ out-stream)
 (when width
 (if (eq width '*)
 (write-char #\v out-stream)
 (write-string width out-stream)))
 (write-char #\d out-stream)
 (push (pop args) out-args))
 (#\f ;;(princ 'float)

 (write-char #\~ out-stream)
 (if width
 (if (eq width '*)
 (write-char #\v out-stream)
 (write-string width out-stream)))
 (write-char #\, out-stream)
 ;; if precision not specified, set default precision to 6
 (when (null precision) (setf precision "6"))
 (cond ((eq precision '*)
 (write-char #\v out-stream)
 (write-char #\f out-stream))
 ((zerop (parse-integer precision))
 ;; Precision specified as "0"
 ;; To avoid printing decimal point
 ;; used d (decimal) format instead of f.
 ;; This is what C's printf does. See
 ;; K&R 2nd Ed. pp. 12-13. You must also
 ;; explicitly round the argument in Lisp
 ;; or it will print as a float
 (write-char #\d out-stream)
 (let ((arg (pop args)))
 (push `(round ,arg) out-args)))
 (t (write-string precision out-stream)
 (write-char #\f out-stream)
 (push (pop args) out-args))))
 ((#\x #\X)
 (write-char #\~ out-stream)
 (write-char #\x out-stream)
 (push (pop args) out-args))
 (#\% (princ 'percent)
 (write-char #\% out-stream))))
 (#\\ ; escape character

 (next-char)
 (case c
 (#\t (princ 'tab)
 (write-string "~,8@T" out-stream))
 (#\r (princ 'carriage-return))
 (#\n (princ 'newline) (write-string "~%" out-stream))))
 (t ; normal character
 (write-char c out-stream))))))
;;;
;;; Install printf function in *symbol-table*
;;;
(symbol-install *symbol-table* "printf" 'BUILT-IN
 (function printf))

August, 1994
C++ Namespaces


Controlling global-namespace pollution




Tom Pennello


Tom is vice president of engineering at MetaWare and can be contacted at 2161
Delaware Ave., Santa Cruz, CA 95060. 


Most programmers understand the global namespace problem that C++ inherited
from C. Global names--those of functions, variables, types, and
enumerators--declared in one third-party library can clash with names found in
another third-party library or in the programmer's own application. For
example, a graphics library from Vendor A and a math library from Vendor B can
both define a class Curve with a member function rotate(int degrees). If you run into this
situation, your only recourse is to change the name of the class Curve in one
of the libraries. Sometimes this change is difficult, if not impossible, to
implement.
To address such problems, C++ includes namespaces, which make it possible to
encapsulate library names in a library namespace. In general, a namespace is a
named declarative region, and its name is used to access the names declared
within that region. Thus, Vendor A's rotate function can be known as
A_graphics::Curve::rotate, and Vendor B's as B_math::Curve::rotate. The
external (mangled) names given to the linker for such encapsulated library
names include the namespace name, so that they are distinct to the linker.
Even if you are using a library that predates namespaces, you can use
namespaces to encapsulate names in your own code that clash with those of the
library.
Namespaces are a recent addition to C++, accepted in July 1993 at the Munich
meeting of X3J16. They were preceded by many experiments in
programming-language design, such as the packages of Ada (1980), which were
also incorporated in MetaWare's Pascal (1984). C++'s namespaces offer new
twists not present in these two languages, such as the distinction between a
using declaration and a using directive.
Names are introduced in a namespace via the namespace definition in Example
1(a). A namespace definition is syntactically a declaration, but it may only
appear at the global scope or within another namespace definition. All names
declared directly within a namespace definition, except those introduced by a
using declaration, are "owned" by the namespace.
If the namespace identifier is not declared in the current scope, it is
declared as a namespace. If it has already been declared, the definition
"continues" the namespace. In this way, a namespace can be extended to allow a
single large namespace to be defined across different header files; see Example
1(b).
A library vendor can choose a very long name for a namespace to minimize
potential clashes with other vendors; see Example 1(c). However, it may be
unwieldy to use such a long name to refer to names in the namespace. A
namespace-aliasing definition allows you to use a shorthand name like that in
Example 1(d), where DSOML is declared as a namespace that stands for the
longer name in the scope of the alias definition.


Using the Names: Explicit Qualification


The names declared in a namespace can be denoted explicitly using the existing
qualified name syntax for classes. In Example 2(a), for instance,
Colors::invert denotes a member invert of namespace Colors. A name declared in
a namespace can be defined outside the namespace, if not already defined
within it, using the qualified name to denote the member being defined; see
Example 2(b). Just as with classes, the text following the qualification is
considered in the scope of the namespace. So, although we say Colors::color
for the return type, namespace members color (the parameter type) and red (the
initializer) can appear unqualified.


The using Declaration


A using declaration declares a name N from a namespace as a current
declaration in the scope in which the using declaration appears. The
declaration of N is an alias of its declaration in the namespace. In Example
3(a), for example, a using declaration uses a qualified name to refer to the
used name. (The current X3J16 proposal does not allow the comma-separated
list; MetaWare High C++ does.) 
Just as with normal declarations, a using declaration can introduce a function
that overloads other functions in the same scope. And just as with normal
declarations, a using declaration can cause a duplicate declaration if a
nonfunction of the same name has already been declared; see Example 3(b).
Unlike normal declarations, using declarations can overload functions with the
same argument types. The ambiguity is detected at the point of use, rather
than the point of declaration. Suppose Other_namespace::invert had the same
signature as Colors::invert; then for //1 invert(C), an error message would
result.
If ambiguity exists between two functions with identical argument types, where
one function was introduced by a using declaration and the other by a
non-using declaration, the non-using declaration is preferred over the using
declaration. In Example 3(c), //2 invert(C) calls the extern int function
rather than the function introduced by the using declaration. No diagnostic is
produced; this silent choice could easily be a point of disagreement among
language designers.
A using declaration may appear in the same places as any other declaration. A
using declaration within a namespace N also declares the name in N; see
Example 3(d). Hence, N::name can refer to the using-declared name, as Example
3(e) illustrates. The only difference is that the imported name is still
"owned" by its original namespace. Thus, being using-declared doesn't affect
its external mangled name. You can now say TV::color or using TV::color, even
though color is owned by namespace Colors. 
Partly because using declarations allow a name to be declared in multiple
namespaces, and partly because using declarations are just aliases, different
using declarations may import the same declaration. Such an otherwise-invalid
duplicate declaration is allowed and has the same effect as if the duplicate
were not there; see Example 3(e). This is more useful with functions, where
you may use a namespace solely for constructing a collection of functions, as
in Example 3(f). But, as Example 3(g) shows, you may also now import both
collections without fear of duplication. A using declaration does not allow
you to select a single, individual function; see Example 3(h). (High C++ also
supports using N::*, which effectively performs a using declaration on every
name declared in N.)


The using Directive


A using directive is quite different from a using declaration. The directive
imports an entire namespace at once, but not as declarations in the scope
containing the directive. Instead, within the scope containing the directive,
the names are treated as if they were declared in the smallest enclosing
non-namespace scope. (For a namespace declared globally, or nested within such
a namespace, this is the global scope.) Furthermore, apparent duplicate
declarations do not occur until a name's point of use.
As Example 4(a) illustrates, a using directive begins with using namespace
followed by the (potentially qualified) name denoting the namespace. The names
are treated as though they were declared globally, so local declarations of
the same name, like those in Example 4(b), hide the global declarations. The
local declaration //1 hides the global; //2 invokes
//1, Other_namespace::invert. However, in Example 4(c) both namespaces' members
are introduced globally, so in //3 there is the choice of two globally
declared invert functions. 
As with using declarations, ambiguity between two functions with identical
argument types is resolved in favor of a function that was not introduced by a
using directive. Similarly, ambiguity that cannot be resolved via overload
resolution is reported at the use of a name, not at the occurrence of the
using directive.
Names introduced by using directives are ignored when an expression uses
explicit qualification. Thus, even though the names are treated as being
declared globally, a reference ::global_name, as in Example 4(d), does not
refer to any names introduced by using directives.
According to a preliminary paper on namespaces (93-055), directives may be
transitive--if namespace A contains using namespace B in its definition, then
using namespace A also gains access to the names in B. In the formal proposal
for namespaces (93-105), such an intent is not indicated by the language. The
transitivity was intended for "joining" two namespaces together in a single
one, but it does not have the intended effect; thus in High C++ directives are
not transitive.


Friend and extern Declarations


Friends first declared in a class are declared in the smallest enclosing
nonclass, nonfunction-prototype scope. With the addition of namespaces, this
means a friend declaration like that in Example 5 can be "injected" into the
namespace. The same holds for an extern declaration first declared within a
function in a namespace. Rather than being a global extern, the declaration is
owned by the namespace.


Unnamed Namespaces 



You can omit the identifier in a namespace definition, in which case the
definition refers to the "unnamed" namespace that exists for every compilation
unit. The unnamed namespace is unique to that compilation unit, and a using
directive for it is automatically assumed. The resultant effect is that you
can enclose your local data in the unnamed namespace without fear of it
clashing with local data in other compilation units. Example 6(a), for
instance, is equivalent to having written Example 6(b), where UNIQUE is a
compiler-chosen name unique to each compilation unit. 
The proposed C++ draft says that names within the unnamed namespace have
internal linkage, but this is being debated on the X3J16 core language
reflectors, as some think that the internal linkage provision serves no
purpose.
Because namespaces are so new, some issues currently under discussion include:
statements such as "the scope of a class is a namespace"; the implicit
internal linkage of unnamed namespaces; the transitivity of directives; and
the possible inclusion of using N::*. But even without decisions on these
finer points, MetaWare's engineers have found namespaces extremely useful.
Example 1: (a) Namespace definition; (b) extending a namespace; (c) long names
for namespaces can minimize potential clashes; (d) using a shorthand name.
(a) namespace identifier {
 ... list of zero or more declarations
 }

(b) namespace A { // Declare A as a namespace.
 int f() { ... }
 typedef int T;
 }
 ...
 namespace A { // Add more to namespace A.
 int g();
 }

(c) namespace Distributed_System_Object_Model_Library {
 ...
 }

(d) namespace DSOML = Distributed_System_Object_Model_Library;
Example 2: (a) Colors::invert denotes a member invert of the namespace Colors;
(b) defining a name outside the namespace.
(a) namespace Colors {
 enum color {red,white,blue};
 // (a typedef int color; here would conflict with the enum)
 color invert(color);
 extern color c;
 };
 ...
 func() {
 Colors::color C = ...;
 Colors::invert(C);
 }

(b) Colors::color Colors::invert(color c) { ... }
 Colors::color Colors::c = red;
Example 3: (a) A using declaration uses a qualified name to refer to the used
name; (b) a duplicate declaration; (c) //2 invert(C) calls the extern int
function; (d) an imported name is still "owned" by its original namespace; (e)
an allowable duplicate declaration; (f) using a namespace for constructing a
collection of functions; (g) importing both collections without duplication;
(h) a using declaration does not allow you to select a single, individual
function.
(a) func() {
 using Colors::red, Colors::white, Colors::blue;
 Colors::color x = red;
 using Colors::invert;
 x = invert(x);
 }

(b) func() {
 extern int invert(int);
 using Colors::invert; // invert is now overloaded.
 using Other_namespace::invert;// OK if invert's a function.
 Colors::color C = ...;
 invert(C); //1 Selects from 3 inverts.
 typedef int T;
 using Other_namespace::T; // Error.
 }

(c) extern int invert(Colors::color);
 using Colors::invert; // invert is now overloaded.
 Colors::color C = ...;

 invert(C); //2 Selects from 2 inverts.

(d) namespace TV {
 using Colors::color;
 color pixels[1024*1024];
 }

(e) using Colors::color;
 using TV::color; // OK; same as Colors::color.

(f) namespace Collection1 {
 using A::f, B::f, C::f;
 };
 namespace Collection2 {
 using B::f, C::f, D::f;
 };
 ...
 ... Collection1::f(...) ...
 ... Collection2::f(...) ...

(g) using Collection1::f, Collection2::f;
 ... f(...) ... // Select from A::f, B::f, C::f, and D::f.

(h) using N::f(int); // NOT ALLOWED.
 using N::f(char); // NOT ALLOWED.
 // But skip a whole bunch of other f's from N.
Example 4: (a) A using directive begins with using namespace, followed by the
(potentially qualified) name denoting the namespace; (b) hiding global
declarations; (c) namespaces' members are introduced globally.
(a) func() {
 using namespace Colors;
 color C = red;
 invert(C);
 }

(b) func() {
 using namespace Colors;
 using Other_namespace::invert; //1 local declaration
 color C = red;
 invert(C); //2 gets //1.
 }

(c) func() {
 using namespace Colors; //1 invert declared globally
 using namespace Other_namespace;//2 another global invert
 color C = red;
 invert(C); //3 chooses between 1 and 2
 }

(d) namespace A { int glob; }
 using namespace A;
 int glob; // OK, now we have two of them globally.
 func() {
 glob++; // Ambiguous: A::glob or ::glob.
 ::glob++; // Refers to global, non-namespace glob.
 A::glob++; // Refers to namespace A's glob.
 }
Example 5: Injecting a friend declaration into the namespace.
namespace A {
 class C {
 // This is A::func.

 friend int func();
 };
 }
int A::func() { ... }
Example 6: (a) is equivalent to (b). 
(a) namespace {
 int a, b, c;
 void f() { ... }
 }

(b) namespace UNIQUE {
 int a, b, c;
 void f() { ... }
 }; using namespace UNIQUE;

August, 1994
Lotfi Visions Part 2


Concluding our conversation with the father of fuzzy logic




Jack Woehr


Jack is a frequent contributor to DDJ and can be contacted at jax@cygnus.com.


Last month, Jack Woehr spoke with Lotfi Zadeh, the father of fuzzy logic,
shortly after Zadeh presented a paper entitled "Fuzzy Logic: Issues,
Contentions and Perspectives" at the 22nd Annual ACM Computer Science
Conference in Phoenix, Arizona on March 8, 1994. In this installment, Woehr
and Zadeh (identified as "LZ" in the following interview) are joined by
Professor William Kahan (WK) and John Osmundsen (JO), associate director,
public affairs, of the ACM. 
DDJ: Professor Zadeh, do you travel to Japan?
LZ: Yes, I was there a week ago. There are many misconceptions. One of the
misconceptions is that the Japanese just jumped on this thing. No. I wrote my
paper in 1965. Starting in 1968, you'll find that there are quite a few papers
in Japanese literature dealing with fairly sophisticated applications of fuzzy
logic. In 1970 a study group was formed and met once a month, sometimes in
Tokyo, sometimes in Kyoto.
There were some people in Japan, influential people, who saw that this was
promising. The director of the National Laboratory for Fuzzy Engineering is
one of them, Toshio Terano. The late Okicha Tanaka was one of them.
In 1974, there was a joint America-Japan symposium on fuzzy-set theory in
Berkeley. There were about 20 people from Japan that time. Professor Terano
was among them. The Japanese came in at an early stage. It was not a rash
decision. They started working on it a long time ago.
They started working on the Sendai train in 1979. Sendai is a modern city,
beautifully planned. The system is a real model. It runs beautifully, it's
clean, it's an outstanding system. They went into operation in 1987, so they
spent eight years. Hitachi did the electronics, Kawasaki Heavy Industries
built the train. I must take my hat off to the Hitachi people and Kawasaki
people for starting with an idea that, in 1979, was still in a somewhat
embryonic stage. Subway systems are not toys! Lives are at stake.
In Japan, people are so careful, and bureaucracy is so strong. The regulatory
agencies in Japan are notorious in their insistence on all kinds of tests.
Three hundred thousand simulations and two thousand actual runs before they
finally certified that system. No American company would have done that. No
American company would have spent eight years on that sort of thing.
Today in Japan they have a large number of engineers working for various
companies who have quite a bit of experience in the use of fuzzy logic.
DDJ: Hitachi was daring in going with an embryonic technology that did not
have an overwhelming body of field trials. To what extent was this confidence
in your work? To what extent was it confidence in Japanese work? To what
extent was it hubris, the Japanese deciding confidently they could solve any
problems found along the way?
LZ: I would say it was a combination of all of these things. These people are
very serious. They take an idea which perhaps did not originate in Japan, but
are much more thorough, much more serious about developing [it] than we would
be in this country.
All of these systems--our system, their system--have certain strengths and
weaknesses. Their system is very strong when it comes to very methodical, very
detailed research and development. They also have many original ideas, but the
crux of their strength lies in the areas I have noted. They're tenacious,
they're persistent, so they can take an idea and do wonders with it, because
of the way in which they function.
JO: What applications would one definitely prefer to do using fuzzy logic?
LZ: There are many. When you talk about applications, you can divide them into
a couple of groups.
In one group are those applications in which there is no competition for fuzzy
logic. Then in the second group are applications in which you can use fuzzy
logic, or you can use something else. Then it becomes a question of what is
better.
Now, the first group I refer to as the issue of tractability. Forget the
economics: Can you solve the problem? Let me give you some examples of this
sort of thing. Let me start with a very practical sort of thing: the Isuzu
hand-braking system.
In Japan, automotive applications are becoming very widespread. This is a very
simple idea. If you are in a car, and you are going uphill, and you want to
park your car, it becomes a problem. You have to play with the hand brake, and
if you have one of those hand brakes which you have to release with your foot,
you are in real trouble.
This simple idea occurred to the Isuzu people. Assume you know what you want
to do. Express that in fuzzy rules. IF you are on a hill AND you want to back
into a parking space AND IF you want so much pressure to be applied, and so
forth, THEN do this and so forth. Just take these rules and implement them.
The Isuzu system makes it possible for you to slide into a parking space.
DDJ: So you are saying that the various degrees of release of the brake, the
gradations of speed of forward and backward motion, all combine to render this
a fuzzy problem for fuzzy logic.
LZ: Yes. Bill, come and join us! We're talking about fuzzy logic.
WK: As in, "What's the difference between fuzzy logic and fuzzy thinking?"
LZ: We were talking about problems in which there is competition for fuzzy
logic and problems in which there is no competition, problems where if you
want to solve them, you must use fuzzy logic, problems intractable by
conventional solutions.
How do you cross a traffic intersection in a vehicle? In our head, we have a
bunch of rules. For example, if there is a light, a stop sign, two-way street,
one-way street--various dynamics which describe the situation. But given those
parameters, then you can formulate the rules: IF there is a car that is
approaching, if there's a red light, a green light, a yellow light. IF you are
going so fast, THEN do something, press on the brake, et cetera. You can
formulate a bunch of rules. Now, my point is that there is no way in which
such rules could be formulated using some other methodology; there is no way
you can do it.
DDJ: Do you mean there is no practical way, or there is, in an ideal sense, no
mathematical way?
LZ: Humans can't do it. First of all, operations research or something like
that can't provide any answers, and secondly, humans can't formulate these
rules crisply. If this car's speed is greater than or equal to 25 mph and you
are within a distance of 20 feet, then apply so much pressure to the brake.
People just cannot do it.
DDJ: Without fuzzy logic, there is an infinite number of rules.
LZ: People can't be this precise. They have this fuzzy perception of what to
do.
WK: That's an interesting claim. It's always very hard to prove a negative. I
don't know of any evidence, or for that matter, of any body of opinion that
would say that no such set of rules could be formulated.
I suppose we could put it to a test by creating a robot-controlled vehicle and
seeing whether it succeeded in crossing a traffic intersection at least about
as often as human-controlled vehicles do. I feel more optimistic than Lotfi. I
feel I could design such a thing myself. I don't think it would be hard. All I
have to do is get it to know when to stop and when to proceed through the
intersection. There's one difficulty, perhaps what Lotfi is thinking about.
It's called the "Paradox of Buridan's Ass," the hungry donkey so neatly
situated between two equally attractive piles of straw...
DDJ: ...that he starves to death.
WK: It turns out that in every decision situation, every go/no-go situation,
there is always a finite risk...
DDJ: ...of stasis.
WK: ...of being hung up. In fact, switching circuits offer a similar finite
risk, although they are designed to make that risk so tiny that you don't
normally notice it. If that is the difficulty Lotfi means, then there's an
intrinsic impossibility. But if that's not the difficulty he means, if we're
willing to take the chance that every now and then, maybe once in every
millennium, the device may be paralyzed by indecision, then I rather dispute
that it can't be done (by conventional control theory).
DDJ: I have two thoughts simultaneously. One is that I'm not sure that you and
Professor Zadeh have the same meaning for the word "can't."
WK: Perhaps not.
DDJ: The second is that, [speaking] as a former taxi driver, the problem in a
San Francisco intersection is that you are never sure that the oncoming
cross-traffic is actually going to stop for the red light. Thus you are
constantly adjusting your speed planning for a fail-safe abort pending the
diminishing speed of the car coming down the hill towards the red light on the
cross street.
WK: I don't believe that those calculations are of the kind that can only be
characterized in the language of fuzzy sets. The perceptions that you have as
a driver can be quantified. With patience, one can enumerate these
perceptions, and with sensors create a robot that can approximate a human
being's actions without resorting to concepts like "very," "somewhat," et
cetera.
DDJ: Let's look at another domain. The game of chess has an absolute
mathematical solution for every sequence of moves. Knowing that there exists,
in theory, an absolute solution, the field of computer chess does not seek any
such solution except in the endgames, wherein brute force can solve the
problem exhaustively in real time.
Is it possible that there is a "Platonic Ideal" solution to all these
engineering problems that Professor Zadeh brings up, but one that can't be
reached in real time in the late twentieth century by American engineers, only
approximated by fuzzy logic?
WK: No, I think that's a different problem. In chess you have combinatorial
explosion which makes it impossible to implement. The mathematical solution's
existence you can prove, but a generalized algorithm is intractable because it
would simply take too much computing. And so instead, we devise a variety of
strategies. We have extremely good chess-playing programs. These have not been
devised using the fuzzy calculus.
DDJ: But the heuristics of the chess-playing programs resemble fuzzy logic in
many respects.
LZ: Take backgammon. [Chess grand master emeritus] Hans Berliner wrote a
program that was very good at playing backgammon. He used crisp rules. Then he
introduced fuzzy rules, and the performance of the program improved to the
point where the program played championship-quality backgammon. If you look at
commentaries on chess, you'll find that all of the comments are fuzzy. They
say, "the center was strengthened," et cetera. The reason fuzziness comes into
chess is that you have an ultimate goal--checkmate--which is crisp, but that
ultimate goal is too far away, in some sense, at the intermediate stages of
the game. So you have to replace it with local goals. The local goals are
fuzzy goals. 
WK: Lotfi's error here is a failing of pride. He has a fuzzy calculus, infers
that this is the way the mind works, and imputes to the commentators on chess
a similar strategy. 
Another game in which the combinatorial explosion caused even more despair
than chess is the Japanese game of Go. A book has come out recently by
Berlekamp and Wolfe which analyzes a large class of endgames in Go--endgames
which used to be discussed in the same vocabulary as Lotfi is using for chess.
What they've shown is that you can actually analyze these quite exactly, and
consequently discover that positions which experts in Go would have given up
as a draw or loss are actually wins.

DDJ: When you gentlemen get into mathematical theory, you fly way over my
head. But when you talk about chess, Go, and backgammon, I'm a United States
Chess Federation expert-class player. I understand your references to Go
research, and there is a similar problem in chess. That is, if you start
backwards from the end of the game, you can analyze many, many positions.
Nowadays, with the aid of computers, many endings which were unclear have been
solved definitively by brute-force calculation.
I know several grand masters of chess. The chess world hasn't had any doubt
since the 1950s that computers would eventually revolutionize the endgame.
They didn't use to believe they would be beaten by computers over the board,
however, because by the standard of computers of the 1950s and 1960s, it
didn't appear that computers would ever get good enough at working through
approximate situations which have no potential for being solved
deterministically in real time.
WK: In other words, that they would never be able to formulate a strategy.
DDJ: Yes. But now people like grand master Julio Kaplan in Berkeley and
Kaufmann and others have been instrumental in aiding the computer programmers
to understand what strategy in chess consists of. It's a very fuzzy set! If
the center is flexible (which covers a wide variety of possible pawn
positions), then don't attack on the wing...
WK: The question about the use of fuzzy calculus in the context of computing
is a question about whether that design paradigm is preferable in
circumstances where you can see the possibility of using the standard design
paradigms, albeit laboriously, to accomplish the desired technological goals.
When it comes to playing various games, since it may very well be that there
is no design paradigm that can be guaranteed to be better than another, it may
not hurt very much to say, "Let's try fuzzy logic, what have we got to lose?"
LZ: There is one transparency which I sometimes show in my lectures which I
call the "effectiveness chart." I have a triangle there, the vertices of which
are labeled "fuzzy logic," "neural net," and "probabilistic reasoning."
Particular problems are represented as points. If I put a problem close to the
vertex labeled "fuzzy logic," it means that problem can be solved
effectively using fuzzy logic. It may be far away from "neural network," or it
might be perhaps somewhere in between, meaning it can be solved using fuzzy
logic or neural networks.
The purpose of this sort of thing is to say, "Look, you cannot take the
position that any one of these methodologies in itself is superior to the
other ones." It depends really on the problems you are examining.
Returning to backgammon, there has recently been what I consider to be a
highly significant development. Gerald Tesauro [at IBM Hawthorne] came out
with a paper in which he described using reinforcement learning. You start
with a description of the legal moves, the board, the rules. That's all. The
system then begins to play, and eventually it learns to play quite well, until
finally it plays championship-quality backgammon! 
JO: The interesting thing for me is not only the fact that the program
eventually plays world-class backgammon, but that for the first few hundred
thousand games it is a mess. Its weighting is graphed, and the weights appear
randomly distributed. All of a sudden it kicks in, and there is order in the
distribution of the weighted guesses. It reaches a point where it
self-organizes its neural net and can repeat its high-quality plays from then
on.
LZ: If you look then at what Hans Berliner did, he himself supplied the rules.
What can fuzzy logic do? Nothing. In that kind of a thing, fuzzy logic can do
nothing, because fuzzy logic does not have the capability to start from
scratch and learn. In combination with neural networks, it can, but not by
itself. There is learning and training in fuzzy logic, but it is nowhere
nearly as advanced as Tesauro's work and neural networking in general.
Here is an example of a problem which can be solved in an impressive way by
one methodology and not at all by other techniques!
DDJ: But on the other hand, Berliner, who heads a small and not terribly
well-financed team, was able to achieve roughly equal tournament results
simply by applying fuzzy logic based on what he already knew about backgammon.
LZ: Exactly. But notice the different starting points. One starts with a blank
slate; the other starts with Berliner's knowledge.
There are problems of that kind for which fuzzy logic is the only thing you
can use; others for which neural networks are the only solution. It's not
just, in these cases, a matter of competition--that this is better than that.
We talked about traffic intersections before. Let's look at parking a car. The
problem has three components. The first component is how do you find a parking
space? There is no methodology, and I include here operations research,
control theory...
DDJ: There being no guarantee you are going to find a parking space at all...
LZ: ...where you can say, "Here I have a theory, and I will turn the crank and
the theory will tell me the precise steps to finding a parking space." Fuzzy
logic's solution is [to ask] "How do you find parking spaces?" and encode it
in this language.
The second component is that once you have found a parking space, you must
assess the desirability and difficulty of parking there. Is it close to where
you are going? Is it tight? Is it safe? Again, operations research will do you
no good whatsoever. There is nothing but fuzzy logic, which will again do the
same thing: It will ask you to articulate whatever criteria you may have in
judging.
The third component is actually maneuvering your car into that space. Here
some competition develops. You can say, "What about control theory, or
neural-network theory?" What fuzzy logic will do is say, "How do you park the
car? Give me the rules." Control theory will ask, what is the final goal? The
final goal is that the car is in the parking space, so it will use backwards
iterations, "Here is the final goal, the near goal, the subgoals," and work
back to the starting point.
Neural networking can take two approaches. It can observe the driver and try
to work out the sequence of commands that the driver uses. The other is just
to try this way, that way, until it succeeds, in the manner of reinforcement
learning and genetic algorithms. Neural networks will not succeed in the case
of parking, although it does work in the case of trailer trucks where you have
to back up without maneuvering back and forth. It will not work with parking a
car.
The only theory which can be used in the parking problem is fuzzy logic. The
control-theory approach will fail because it will not exploit the tolerance
for imprecision. Neural networks will fail because strategy is much more
difficult to learn than a sequence of moves.
JO: And in the Sendai train [which is operated under human control once a week
so that the operators can refresh their skills] you have a weekly comparison
of how well the task has been learned, compared to human skills.
LZ: William [Kahan] will say that we could have done it just as well another
way.
WK: Or better!
DDJ: Fuzzy-logic compilers eventually output assembly code. A genius
understanding the problem domain and understanding assembly code could have
written the same code without the fuzzy-logic front end.
WK: Or he could have written better code, or used a paradigm which admits of
logical analysis.
DDJ: Plutarch says that logic takes us from point to point, but there are
great gaps in between those points that have to [be] bridged by things other
than logic.
WK: People will extoll the virtues of some other style of thought, but the
fact remains that a large component of creative thought is, in fact, trial and
error. For that you need logic.
DDJ: But are all creative flashes based on logic?
WK: Oh, no, what is often required is to allow yourself to think illogically,
or even intentionally incorrectly, in order to come up with other possible
scenarios. Still, you use logic to filter out the ones that are just pipe
dreams.
One of the serious failings of the fuzzy paradigm is that contradictory
information is averaged, and that there is no incentive to resolve
contradictions.
I don't claim that other design paradigms automatically expose contradictions
or conflicts. There's an impossibility of proving the consistency of the
system without having a model of the system built up in terms of some other
system you believe to be consistent. That's Gödel's Incompleteness Theorem.
But control systems written in the conventional way are broken up into modules
with specifications which submit to testing to see whether, in fact, they do
what they are supposed to do. That testing does not mean merely that you try
all possible inputs and see whether you are satisfied with all resulting
outputs. But with something written according to the fuzzy paradigm, there's
not much else you can do but try all possible inputs and see whether you are
satisfied with all the outputs.
LZ: We have all kinds of rule-based systems; they're becoming ubiquitous.
Whenever you use a rule-based system, you have a problem with the resolution
of contradictions. It may be that one rule tells you one thing, and somewhere
else in the system there is another rule that contradicts it. The discovery of
these things may reveal a degree of contradiction, rather than a flat
contradiction.
This problem may exist in any rule-based system, whether that system is a
crisp rule-based system or a fuzzy rule-based system. But in the case of fuzzy
rule-based systems, it is a much less serious problem than in the case of a
crisp rule-based system. In [the] case of a fuzzy rule-based system, the rules
do not fire sequentially, as they do in a conventional system.
In a conventional system, you look for a rule which is satisfied and fire that
rule. Then things change, and you look for another rule. In a fuzzy system all
rules fire in parallel, but each rule fires to a degree. The degree to which a
rule fires is proportional to the degree to which the antecedent is satisfied.
So if the antecedent matches the input to a slight degree, then the weight
associated with that rule will be small.
And then these things are aggregated, not necessarily through arithmetic
averaging, but through disjunction or some other operation. Essentially each
rule votes, but each rule has a weight. In practical systems in operation at
this point, the aggregation is not arithmetical, it's disjunctive. It's the
MAX operation. You take the maximum and you defuzzify and come up with a crisp
conclusion.
What I'm driving at is this: Because it's a voting system, if one rule is
wrong, it will not affect the whole thing.
WK: Wishful thinking.
LZ: There are many systems which have demonstrated this, notably the balancing
pendulum. There are seven rules. When Professor Yamakawa was demonstrating
this thing, he would disable two out of these seven rules. The system would
still continue to function. In other words, there is a graceful degradation.
It's not sudden, because of the aggregation process. 
WK: Or perhaps because of a superfluity of rules, and [because] he could have
gotten along with fewer.
DDJ: But is it better to achieve an ideal solution with more engineering, or
is it better to achieve an adequate solution and save a million dollars in
development costs?
WK: That's where we also have a serious conflict. I propose that the cost of
designing a control system for heating a house with a simple gas furnace can
be a thousand times greater using fuzzy logic, and the cost of implementing
the system about ten times greater, than with classic control theory.
LZ: But that's your supposition. I will counter with the evidence of people
who have actually done that. The Schlumberger company in France, an
oil-exploration company, is becoming concerned with HVAC. One of their
engineers has been using fuzzy logic to come up with an optimal system for
them. He is a very well-informed person, he has tried a variety of techniques.
There are companies in Germany working on heating buildings, there's
Mitsubishi in Japan... Air conditioning uses fuzzy logic, automobiles use fuzzy
logic in their air-conditioning systems... So we have a lot of experience.
Hitachi, Toshiba...we have this kind of experience, versus people for whom I
have great admiration saying they can do this in some other fashion.
We are comparing the work of hard-headed people who have actually designed
systems, built systems, tested them, compared them, with, with...I'm not sure
how to refer to it.
WK: Say something pejorative!
LZ: (Laughing) Professor Tribus, when he was told about the Sendai system,
said, and I quote, "I could do the same thing using Bayesian logic." He
admitted that it hasn't been done, but he thinks it can be done. The thing
that struck me about it--and I know Tribus quite well--he's a person who knows
quite a lot about probability, decision analysis. He knows nothing about
control. He knows nothing about that system. And yet, he felt justified in
saying he could do the same thing using Bayesian logic, without knowing
anything about that system.
I hear this frequently: "I could do the same thing using something." Do you
know something about that system? Do you know how it functions? "No, I don't
really."
In the case of this Sendai system, one has to look at how this system
functions. What are the rules? What's the performance? These people, as I
said, performed three-hundred thousand simulations. They're not stupid, these
Hitachi people. They spent eight years working on these things.
So if you take the people working for Mitsubishi, Toshiba, Hitachi, Siemens,
SGS-Thomson... Siemens has about 30 or 40 people working on fuzzy problems.
Cement kilns were the very first successful application of fuzzy logic. In
1980, the F.L. Smidth company in Copenhagen came out with this application.
Today, all cement kilns use fuzzy logic. They have sort of convinced
themselves that this is the way to go.
Let me take a little peek into the future. In five, ten, fifteen years from
now, fuzzy logic will become a standard part of the curriculum. It's the sort
of thing that engineering and other students really would be learning
routinely. I think it will turn out that this capability to describe phenomena
which do not lend themselves to precise characterization is something which is
lacking in traditional approaches. This will provide a tool which does not
displace the other tools, but adds to the other tools.
I would also say that we should not look at fuzzy logic in isolation, but see
it as part of "soft computing." Soft computing tolerates imprecision and
partial truth, treating them not as defects, because the real world is
imprecise. We cannot capture many aspects of the world if we stick to the
traditional framework. We cannot formulate theorems, we cannot solve problems,
or if we can, the cost is excessive. So let us learn something about this
other methodology.
It is for this reason that I say fuzzy logic will become part of the
curriculum. This controversy which we have today will be forgotten. In fact,
people will be wondering why we were arguing about something that is so
obvious!







August, 1994
Data Attribute Notation in C++


A coding style that emphasizes data abstraction




Reginald B. Charney 


Reg is president of Charney & Day and a voting member of ANSI's X3J16
Committee on the C++ Language. He can be reached on CompuServe at 70272,3427.


Data Attribute Notation (DAN) is an object-oriented coding style that
emphasizes data abstraction. DAN closely relates the abstract concepts defined
in a project's analysis and design stages to the implementation stage. Thus,
the differences across these three stages are minimized. The benefits of this
include: better comprehension, higher productivity through clearer
communications, and fewer errors as a result of fewer misunderstandings. DAN
also provides better data abstraction, stronger type checking, greater
clarity, and fewer errors than traditional programming styles. 
Most C++ classes are composed of one or more components. As such, a class can
be thought of as a collection of components. By implication, a derived class
then is also a collection of components. Normally, components have a type and
a unique name. These names usually reflect an attribute or property of a
class. For example, a product has an id, a description, a cost price, and a
list price. In Listing One, pid, desc, cost_price, and list_price are
attributes of a product. If another component were added to the class, it
would form another attribute for the abstract entity called a "product." For
instance, a quantity attribute could be added to the class Product.
At this point, there are no surprises: A class is defined in terms of its
components, or "attributes."


Attribute Classes


An attribute class is a class encapsulating a single logical entity, usually
represented by one data member. Its interface is limited to a few operations
that make logical sense for the logical entity in the class. Thus, an
attribute class that defines product ids can ensure that only valid codes are
created and only valid operations can be performed on them.
Constructors, including copy and conversion constructors, are used to create
instances consistent with the meaning of the data. Conversion operators are
also needed to allow instances of attribute classes to be used with
non-user-defined data types. To read or write instances of attribute classes,
input and output operations need to be defined. There may also be other member
functions for attribute classes, depending on their requirements. Using the
example of the product code, you can define an attribute class called Pid; see
Listing Two.
Note that now, only certain defined operations are possible on a product id.
Also, the actual implementation of a product code is hidden inside the class
and its member functions. Thus, the Product class can be defined in terms of
attribute classes (as in Listing Three), where the member names play a less
important role in attribute classes than they would in normal classes that use
normal coding practices. As such, the names in attribute classes tend to be
shorter than normal. Since the elements of the class Product are private,
Product users can treat the implementation of the class as a black box,
allowing the implementation of any attribute class to change. Thus, the class
Product can be considered a collection of attributes, and this leads to the
question of how to manipulate the value of Product attributes.


Changing Attribute Class Values


An assignment operator is used to change values. However, it does not work
well for assigning values to implicit members of the class Product. For
example, in Listing Four, to what component is the value 7 assigned?
Defining assignment operators for individual attributes does not work well
either, since it requires both that the user know the names of the components
of the Product class and that the components be public, a clear violation of
the principle of data hiding. Listing Five is a poor solution to this problem.
Listing Six is the classic C++ method for changing attribute values. It
requires that a user learn the names and prototypes for a number of member
functions that can be used to get and set attribute values. An approach that
uses data attributes and the black box, however, avoids the problems
associated with both Listings Five and Six.
DAN treats a given class as being defined by its attribute classes. Also, all
attributes are unique in a given collection. In the Product example, both cost
and list prices started out as type double, but ended up being represented by
the unique attribute classes Cost and List. 
There is a final piece of groundwork that needs to be laid; it is the analogy
to the I/O streams classes. Most programmers use these classes without knowing
how they are implemented. However, you can change the contents of a stream by
using friend functions and overloading operator functions: for example, << for
inserting data, and >> for extracting data. Thus, the stream classes are
treated as black boxes that are modified using the insert and extract
operators. Overloading the << and >> operators also allows streams to appear
in expressions and be extensible.
DAN uses the insert and extract operators for the same reasons that I/O
streams do. Other operators could have been used, and originally were, but the
correspondence between I/O streams as black boxes and collection classes as
black boxes was too good to ignore. Therefore, DAN uses the insert and extract
operators to get and set data in collection classes. Single components can
also be extracted from a collection by using the attribute classes as
conversion operators. Listing Seven illustrates this.


Relationships


While attributes define the properties of a class, they also define
relationships within a class. For example, all instances of the Product class
must have a product id and a description. Attributes can define relationships
between different classes. For example, you can define an invoice line-item
class, called LineItem; see Listing Eight.
A relationship between an invoice line item and a product is formed by the
product id. A product is part of an invoice line item if both have the same
value for their product id. This relationship is expressed in Listing Nine.
At execution time, the isIn function returns nonzero if the supplied product
is in the given invoice line item. Otherwise, it returns zero. This isIn
function can be defined outside either class Product or class LineItem. It
depends on the linking or key attribute. In this case, the key attribute is
Pid. 


Static and Dynamic Relationships


If a relationship exists between two or more classes, then, at a minimum,
there must exist a function to verify the relationship between the classes and
use one or more attributes that the classes have in common. In the isIn
example, the classes Product and LineItem have the Pid attribute in common,
and it is valid C++ to write the function isIn.
During execution, if a given LineItem Pid value matches the Pid attribute
value of a Product, then the corresponding invoice line item contains the
product.
A static relationship always exists if classes have common attributes. A
dynamic relationship exists during execution. Static relationships can be
checked at compile time, while dynamic relationships need to be evaluated at
run time.
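The distinction can be made concrete with a compact sketch (hypothetical names, not the article's listings): the shared key attribute is what makes the comparison well-formed at compile time, while the actual values are matched at run time.

```cpp
// Illustrative sketch: a class that lacked the conversion to Key would make
// isIn ill-formed (the static relationship is checked by the compiler);
// whether two particular instances match is decided at run time (the
// dynamic relationship). All names here are hypothetical.
struct Key {
    long k;
    bool operator==(const Key& o) const { return k == o.k; }
};

struct ProductLike  { Key key; operator Key() const { return key; } };
struct LineItemLike { Key key; operator Key() const { return key; } };

bool isIn(const ProductLike& p, const LineItemLike& li) {
    return Key(p) == Key(li);   // run-time value check, statically type-safe
}
```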


Conclusion


There is a trade-off between classic C++ coding methods and DAN: A program
using DAN has more classes but does not pollute the namespace as badly. That
is, DAN overloads attribute names, thus avoiding the need for access functions
like getX() and setX(), where X is some attribute name.
Using DAN to define a problem is more declarative in nature than most other
C++ coding styles, yet, it leads more directly to an implementation, since you
compile the specifications that you wrote. Because the step from specification
to implementation is very short, the chance of errors and misunderstandings is
smaller.


Listing One 
// classic C++ style

class Product {
 long pid;
 char desc[30];
 double cost_price;
 double list_price;
};



Listing Two

class Pid {
 long p;
public:
 Pid(long pp = 0) { p = pp; }
 Pid(const Pid& pp) { p = pp.p; }
 operator long() const { return p; }
 Pid operator +(const long pp)
 { return Pid(p+pp); }
 Pid operator -(const long pp)
 { return Pid(p-pp); }
 friend ostream& operator <<
 (ostream& os, Pid& pp)
 { return os << pp.p; }
 friend istream& operator >>
 (istream& is, Pid& pp)
 { return is >> pp.p; }
};



Listing Three

class Product {
 Pid p; // product code
 Desc d; // description
 Cost c; // cost price
 List l; // list price
public:
 Product(const Pid& pp,
 const Desc& dd,
 const Cost& cc,
 const List& ll)
 :p(pp), d(dd), c(cc), l(ll) { }
 ~Product() { }
 friend ostream& operator <<
 (ostream& os, Product& p);
 friend istream& operator >>
 (istream& is, Product& p);
}; 



Listing Four


Product p; // product instance
p = 7; // assign 7 to what?




Listing Five

// poor solution - do not use
class Product
{
public:
 Pid p; // bad practice
 // ... other members
};
 Product aP;
 aP.p = 7;



Listing Six

// classic C++ coding style

class Product {
 Pid p;
 Desc d;
 Cost c;
 List l;
public:
 Cost getCost() const
 { return c; }
 void setCost(const Cost& cc)
 { c = cc; }
 // ... other members
};
 Product aP;
 Cost aCost;
 aCost = aP.getCost();
 aP.setCost(aCost);



Listing Seven

// Data Attribute Notation style
class Product
{
 Pid p; // product code
 Desc d; // description
 Cost c; // cost price
 List l; // list price
public:
 // insert & extract operators
 Product& operator <<(Pid& pp) { p = pp; return *this; }
 Product& operator <<(Desc& dd)
 { d = dd; return *this; }
 Product& operator <<(Cost& cc)
 { c = cc; return *this; }

 Product& operator <<(List& ll)
 { l = ll; return *this; }
 Product& operator >>(Pid& pp)
 { pp = p; return *this; }
 Product& operator >>(Desc& dd)
 { dd = d; return *this; }
 Product& operator >>(Cost& cc)
 { cc = c; return *this; }
 Product& operator >>(List& ll)
 { ll = l; return *this; }
 // attribute conversion ops
 operator Pid() const 
 { return p; }
 operator Desc() const
 { return d; }
 operator Cost() const
 { return c; }
 operator List() const
 { return l; }
 // constructors/destructor
 Product() { }
 Product( const Pid& pp, 
 const Desc& dd,
 const Cost& cc, 
 const List& ll)
 : p(pp),d(dd),c(cc),l(ll) { }
 ~Product() { }
 // I/O stream friend functions
 friend ostream& operator <<
 (ostream& os, Product& pp);
 friend istream& operator >>
 (istream& is, Product& pp);
};
ostream& operator <<
 (ostream& os, Product& prod)
{
 os << " Prod: " << prod.p
 << " Desc: " << prod.d
 << " Cost: " << prod.c
 << " List: " << prod.l;
 return os;
}
istream& operator >>
 (istream& is, Product& prod)
{
 is >> prod.p >> prod.d
 >> prod.c >> prod.l;
 return is;
}
int main() {
 Product prod;
 Pid pid(184389);
 Desc desc("Angle Bracket");
 Cost cost;
 // using insert/extract ops
 prod << pid << desc;
 prod >> desc >> pid;
 prod << Pid(34562);
 prod >> cost;

 // using conversion operators
 cout << " Prod: " << Pid(prod)
 << " Desc: " << Desc(prod)
 << " Cost: " << Cost(prod)
 << " List: " << List(prod)
 << endl;
 // using stream I/O
 cin >> prod;
 cout << prod;
 return 0;
}



Listing Eight

class LineItem {
 Iid i; // invoice id
 Pid p; // product id
 Quant q; // quantity
public:
 // ... other members
};



Listing Nine

Bool isIn(Product& p,LineItem& i)
{
 return Pid(p)==Pid(i);
} 






























August, 1994
Associations in C++


Callback lists allow arbitrary objects to work together




Dan Ford


Dan is a software engineer with Hewlett Packard's Medical Products Group,
working with GUIs and real-time applications. He can be reached at
ford@wal.hp.com.


The decomposition of event-driven systems into object-oriented designs poses
interesting questions. This is particularly true when resolving issues about
the relationships between objects. Common questions include: 
Which classes "know" about the relationships to other classes? 
Should you create "manager" classes or "relationship" objects to handle the
associations? 
How are these associations implemented without unduly sacrificing the
reusability of classes? 
Built-in relationships always have the potential for limiting the reusability
of a class, since relationships are what is most likely to be different in a
different context needing the same class.
In an application modeling a car, for example, the program might create
classes for the various objects that together make up a car: engine,
transmission, wheels, brakes, and so on. In modeling the car, the associations
between these objects are surely at least as important as their individual
behavior. However, if intimate knowledge of these relationships is coded into
these objects, they might be much more difficult to reuse in another
application that does not need the same relationships.
We need methods that allow objects to interact with one another without
hardcoding the relationships. In this article I'll present a method of
implementing relationships in terms of the behavior inherent in an object,
rather than the use of references to specific objects. This is achieved via
callback lists.
A callback is a function that's registered at run time with a data structure
or object. The object then calls the function when particular events occur.
Usually, if an object has callback support, it maintains a list of such
callback functions for each type of event. When the event occurs, all callback
functions on the corresponding list are invoked, one at a time.
Maintaining callback lists is a powerful technique because it allows arbitrary
objects to work together in a highly synchronized manner, without building in
specific knowledge about one another. For example, in a real-time medical
application, a system might use callbacks to maintain and update a display.
The system might collect heart rate, blood pressure, respiration, and other
vital signs at irregular intervals. Such a system might build container
objects to store the various measurements. If these container objects invoked
a callback list each time a new value was inserted into the container, then
any number of callbacks could be registered to be notified when this event
occurs. Perhaps one callback could update a heart-rate display; another could
perform a calculation involving the new measurement; others could trigger
alarms if the new value fell within critical parameters, and so on. The point
is that any number of dependencies can be placed on the event; more
importantly, these dependencies can be registered and deregistered at run
time. By providing a mechanism whereby callback procedures can be registered
with an object, other objects can be notified of various events that concern
them.
You can use the C++ class that's the focus of this article to easily and
quickly add callback lists to your own classes. This class takes care of all
the work necessary to maintain the callback list and invokes the chain of
callbacks when told to. It also keeps track of client data that must be passed
to the callback function when invoked. Once the basic infrastructure is in
place, adding callbacks to your objects becomes fairly simple.
Throughout this article, I'll use the term "client" to refer to any object,
function, or subprogram that supplies the actual callback functions. A client
is any entity that wishes to register a callback on an object. In contrast,
"owner" objects are those that own the callback lists. The owner "yanks" the
callback chain when a particular event occurs, thereby invoking the various
callback functions registered by its clients. 
Listing One is Callback.h, which defines the classes Callback and CBMgr.
Listing Two is Callback.cpp, which is the implementation for these classes.
Callback is a class that encapsulates individual callback functions. CBMgr is
a class that manages a chain of callbacks, providing a method to invoke the
entire chain of callbacks, as well as methods to add and remove callbacks from
the chain. An owner class that wishes to use these classes to support
callbacks must do four things:
1. Include a private CBMgr data member for each callback chain it wishes to
maintain.
2. Provide a method for client classes to register callback procedures. Since
the supplied CBMgr class does most of the work, this method can usually be
written as a one-statement inline function.
3. Invoke the corresponding callback chain when the appropriate event occurs.
This, too, can usually be done with a single statement, since traversing the
chain and invoking the callbacks is the job of the CBMgr class.
4. Publish, in the class interface, the format of the event data passed to the
callbacks when they are invoked. The event data can range from a simple NULL
value to a pointer to an arbitrarily complex structure.
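Under the assumption of a CBMgr-like interface (Register/InvokeAll, as in Listings One and Two), the steps above reduce to very little code in the owner class. The MiniMgr below is a stripped, hypothetical stand-in for CBMgr so the sketch is self-contained; Temperature is an invented owner class:

```cpp
#include <vector>
#include <cstddef>

typedef void (*FNCB)(void *pObj, void *clientData, void *eventData);

// Stripped-down, hypothetical stand-in for CBMgr; the real class appears
// in Listings One and Two.
class MiniMgr {
    struct Entry { FNCB fn; void *cdata; };
    std::vector<Entry> list;
public:
    void Register(FNCB fn, void *cdata)
    { Entry e = { fn, cdata }; list.push_back(e); }
    void InvokeAll(void *owner, void *eventData) {
        for (std::size_t i = 0; i < list.size(); i++)
            list[i].fn(owner, list[i].cdata, eventData);
    }
};

// Hypothetical owner class following the four steps.
class Temperature {
    int degrees;
    MiniMgr changed;                        // step 1: private manager member
public:
    Temperature() : degrees(0) {}
    void RegisterChanged(FNCB fn, void *cdata)
    { changed.Register(fn, cdata); }        // step 2: one-line registration
    void Set(int d) {
        degrees = d;
        changed.InvokeAll(this, &degrees);  // step 3: yank the chain;
    }                                       // step 4: event data is int*
};

static int g_seen = -1;
static void Watch(void *, void *, void *eventData)
{ g_seen = *(int *)eventData; }             // relies on the published format
```

Note that the owner adds exactly one data member, one inline wrapper, and one InvokeAll statement at the event.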


Callback and CBMgr Classes


First of all, every Callback object contains a pointer to a function. This
function is the one called when the callback is invoked. This function must
have a particular prototype, which is also defined just above the class
definition (see Listing One).
The second private data member of the Callback class is the client data
supplied with the callback when it's registered. This data is simply held, and
each time the callback is invoked, this item is supplied as the second
parameter to the callback function. Its format and contents need be known only
to the actual callback function. 
The third private data member of the Callback class is the pointer to the
owner object. It is supplied when the callback is created and is simply held
until the callback is invoked. Each time the callback is invoked, this pointer
is passed as the first parameter to the callback function. This parameter can
be used by the callback function to perform additional actions on its owner
object. For example, if an object with several interrelated attributes invokes
a callback chain each time one of its attributes changes, then callbacks can
be registered on the object to perform consistency checking. This guarantees
that the attributes of the object will stay within certain parameters set by
the callback function.
The fourth private data member of the Callback class is a pointer to the next
callback. This simplifies building chains of callback objects. The usefulness
of this member will become clearer when you meet the CBMgr class. This member
can be queried and set using the GetNext() and SetNext() methods. If a
container class library is available, this member can be done away with
because all list manipulation can be encapsulated by the CBMgr class (using an
abstract linked list from the container library). To keep things simple,
however, I build the list directly.
The constructor for the Callback class requires three arguments: a pointer to
the owner object, the pointer to the actual callback function, and the client
data supplied by the client when the function was registered. Callback objects
are never actually instantiated by clients; rather, they're created by the
CBMgr object when its Register method is called.
The Invoke method is used to call the callback function. When Invoke is
called, the supplied parameter is the call data, which is passed to the
callback function as the third argument. (Invoke also passes the client data
as well as the owner object pointer, both of which are supplied by the
Callback object.) As with the constructor, this method is rarely called
directly; rather, it is usually called by the CBMgr object when its own
InvokeAll method is called.
As mentioned earlier, the prototype for callback functions is specified by the
Callback class. Its return type is always void, and it expects these
parameters: 
Parameter 1 is a pointer to the object that owns the callback (that is, the
object that this function is registered on). As mentioned earlier, this can be
used by the callback function to perform additional actions on the owner
object.
Parameter 2 is a client-defined data item that's passed in when the callback
function is registered. This value is usually a pointer. The callback object
holds on to the client data and supplies this item to the function call each
time it's invoked. This client data can be set up prior to registering the
callback to contain whatever data will be necessary for the callback to do its
job. For example, in the medical application described earlier, the client
data might be the window handle or display ID where the heart rate is to be
displayed. Sometimes this parameter is unnecessary, in which case NULL can be
supplied.
Parameter 3 is the event data, the data item supplied by the owner object.
It's provided each time the callback is invoked, and is intended to convey
state information or other relevant data regarding the event that caused the
callback chain to be invoked. Typically this parameter is a pointer to a
structure that contains information relevant to the event that invoked the
callback. In the medical example, the container objects might invoke a
callback chain each time a new item (for example, a heart-rate value) is added
to the container; when the chain is invoked, the event-data parameter points
to the most recently added value. What this parameter contains is entirely up to the
owner class, but its format must be documented by the owner class so that
clients that register callbacks know what to expect. As with the client-data
parameter, if this data is deemed unnecessary by the owner class, it can pass
NULL.
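Putting the three parameters together, a callback function matching this signature might look like the following sketch (Threshold and CountIfOver are hypothetical names, not from the article's listings):

```cpp
// The full callback signature discussed above: owner object, client data,
// and event data.
typedef void (*FNCB)(void *pObj, void *clientData, void *eventData);

struct Threshold { int limit; int count; };     // per-registration client data

// Counts event values that reach the client-supplied threshold. The owner
// class is assumed to document that eventData points to an int.
static void CountIfOver(void * /*pObj*/, void *clientData, void *eventData)
{
    Threshold *t = (Threshold *)clientData;
    int value = *(int *)eventData;              // the published event format
    if (value >= t->limit)
        t->count++;
}
```

Each registration can carry its own Threshold structure as client data, so the same function can serve several registrations at once.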
The CBMgr class maintains a list with an arbitrary number of Callback objects.
It does the work of registering new callbacks and deregistering callbacks that
need to be removed. Finally, it simplifies the task of calling all the
callbacks on the chain, which is the most frequent action performed on a
callback chain. With the CBMgr class, objects can invoke an entire chain of
callbacks with a single function call requiring one argument.
Internally, the CBMgr object must maintain pointers to the first and last
objects in the chain. (As mentioned earlier, if a linked-list container class
is available, these pointers will be replaced by the linked-list container
object.) The public methods are Register, Deregister, and InvokeAll.
Register(void*pObj, PFNCB pfn, void*clientData) is the method used to add a
callback to a CBMgr. Typically this method is called by the owner object (on
behalf of the client) and returns a pointer to a Callback object, which should
be passed back to the client registering the callback. The pointer can be used
by the client to uniquely identify the callback when it needs to remove the
callback.
Deregister(PCB pcb) is the counterpart of the register method and removes a
callback from a CBMgr object. Again, this function is usually called by the
owner of the callback chain on behalf of the client. This client must supply
the pointer to the callback object it received when the callback was
registered. You might be tempted to use a simpler identifier, such as the
address of the callback function itself, so that the client need not hold on
to the returned pointer. The function address alone is not adequate, however,
since a given function can be registered several times on the same callback
list, each instance with different client data. In effect, a callback is
uniquely identified by the pair (function address, client data), which is
exactly what the Callback object encapsulates; returning a pointer to the
Callback object therefore uniquely identifies the callback instance.
InvokeAll(void*eventData) is the method that the owner object calls when it
wishes to "yank" a callback chain. When this function is called, the CBMgr
object traverses its list, invoking the callbacks one by one and passing to
each the eventData parameter. The eventData is formatted by the owner object,
and may be as simple as a single discrete value or as complex as a pointer to
an elaborate structure. In either case, it should never surprise the
callback function, since the format of the eventData is published in
the owner-class interface.


An Example


Listing Three is an example program that demonstrates the use of the callback
classes. It defines a class, NumContainer, which is given integers one at
a time via the AddNumber method. Each time AddNumber is called, NumContainer
invokes the callback list.
After the program creates an instance of NumContainer, it registers four
callbacks on the container (actually all four callbacks are the same function,
but they have different clientData). The program then generates 100 random
numbers, putting them into the NumContainer. Finally, the program prints the
results. The count of numbers and the average are obtained by calling methods
on NumContainer. The results of the callbacks can be observed by examining the
clientData of the callbacks, which are in the global variable aCData (an array
of structures, one for each callback). The output from a typical run is shown
in Figure 1.
This example shows how to add callbacks to your own classes. Notice that
adding callbacks to a class involves very little overhead. The Register
and Deregister methods on NumContainer are each implemented as a one-line
inline function, and the AddNumber method needs only one extra statement to
invoke the callback chain.



Conclusion


Once you begin to use callbacks, you'll find many uses for them. They're
especially suited for configurable, event-driven systems that can change
configuration during run time. Since associations can be added/removed at run
time, they need not be hardcoded. Callbacks are also very useful in
applications requiring "watchdogs" to monitor the value of certain parameters.
These watchdogs can be installed (as callbacks) without disrupting the primary
design of the system.
Another use is in the design of GUIs. If, for example, you'd like a certain
label highlighted whenever a particular entry field receives the focus, a
callback would do nicely. Just have the entry-field object invoke the "gaining
focus" and "losing focus" callback chains. Register the callback functions for
each of these. The "gaining focus" callback would highlight the associated
label, while the "losing focus" callback could reset the label to its normal
state. Using similar methods, callbacks can enforce many different types of
dependencies between fields in dialogs and other user-interface code.
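A minimal sketch of the focus idea (hypothetical names throughout; a real GUI toolkit would supply the field and label types) might look like this, with one-slot "gaining focus" and "losing focus" chains:

```cpp
// Toy sketch, not from the article's listings: an entry field whose focus
// callbacks toggle a label's highlight state.
typedef void (*FocusFn)(void *labelObj);

struct Label {
    bool highlighted;
    Label() : highlighted(false) {}
};

static void HighlightLabel(void *p) { ((Label *)p)->highlighted = true; }
static void ResetLabel(void *p)     { ((Label *)p)->highlighted = false; }

class EntryField {
    FocusFn onGain, onLose;
    void *labelObj;
public:
    EntryField() : onGain(0), onLose(0), labelObj(0) {}
    void RegisterFocusCallbacks(FocusFn gain, FocusFn lose, void *obj)
    { onGain = gain; onLose = lose; labelObj = obj; }
    void GainFocus() { if (onGain) onGain(labelObj); }  // yank "gaining" chain
    void LoseFocus() { if (onLose) onLose(labelObj); }  // yank "losing" chain
};
```

The entry field never learns what a Label is; the dependency lives entirely in the registered functions.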
Figure 1: Output of C++ callback demonstration program (Listing Three).
There were 100 numbers generated.
The average is 52
There were 11 numbers greater than or equal to 90
There were 19 numbers greater than or equal to 80
There were 31 numbers greater than or equal to 70
There were 42 numbers greater than or equal to 60

Listing One 
/***** Callback.h *****/

//----------------------------- defines ----------------------------
#ifndef NULL // avoid redefining NULL if a run-time header already has
#define NULL 0L
#endif
#define TRUE 1
#define FALSE 0

//----------------------------- typedefs ---------------------------
typedef int BOOL; // define a Boolean type
typedef void FNCB(void * pObj, void * clientData, void * callData);
typedef FNCB * PFNCB;

//------------------------------ Class -----------------------------
class Callback {
 void * cdata; // client data
 PFNCB pfnCallback; // function to be called
 void * pOwner; // object that owns the callback
 Callback *pNext; // pointer to next callback in chain
public:
 Callback(void * pObj, PFNCB pfn, void * clientData);
 ~Callback();
 void Invoke (void * callData)
 { pfnCallback(pOwner, cdata, callData); };
 Callback * GetNext ()
 { return pNext; };
 void SetNext (Callback * pCB)
 { pNext = pCB; };
};
typedef Callback * PCB;
class CBMgr {
 Callback *pFirst; // pointer to first callback in chain
 Callback *pLast; // pointer to last callback in chain

 void AddToList (PCB pcb); // method to add a callback to the list
 BOOL RemoveFromList (PCB pcb); // method to remove a callback from list
public:
 CBMgr();
 ~CBMgr();
 PCB Register (void * pObj, PFNCB pfn, void * clientData);
 BOOL Deregister (PCB pcb);
 void InvokeAll (void * callData);
};




Listing Two

/***** Callback.cpp *****/

//------------------------- Includes ------------------------------
#include "Callback.h"

//--------------------------- code --------------------------------
Callback::Callback(void * pObj, PFNCB pfn, void * clientData)
{ cdata = clientData;
 pfnCallback = pfn;
 pOwner = pObj;
 pNext = NULL;
}
Callback::~Callback()
{
}
CBMgr::CBMgr()
{ pFirst = NULL;
 pLast = NULL;
}
CBMgr::~CBMgr()
{ PCB p, pNxt;
 p = pFirst;
 while (p) { // traverse list and destroy all
 pNxt = p->GetNext(); // callbacks that still remain
 delete (p);
 p = pNxt;
 }
}
void CBMgr::AddToList (PCB pcb)
{ if (pLast) {
 pLast->SetNext(pcb);
 pLast = pcb;
 }
 else {
 pLast = pcb;
 pFirst = pcb;
 }
}
BOOL CBMgr::RemoveFromList (PCB pcb)
{ PCB p;
 BOOL fFound = FALSE;
 p = pFirst;
 if (p == pcb) {
 pFirst = pFirst->GetNext();
 fFound = TRUE;
 } else
 while ((p) && !fFound)
 if (p->GetNext() == pcb) {
 p->SetNext(pcb->GetNext());
 if (pLast == pcb) // removed the tail; keep pLast valid
 pLast = p;
 fFound = TRUE;
 } else
 p = p->GetNext();
 if (fFound && pFirst == NULL) // list is now empty
 pLast = NULL;
 return fFound;
}
PCB CBMgr::Register (void * pObj, PFNCB pfn, void * clientData)

{
 PCB pcb = new Callback(pObj, pfn, clientData);
 AddToList (pcb);
 return pcb;
}
BOOL CBMgr::Deregister (PCB pcb)
{
 if (RemoveFromList (pcb)) {
 delete (pcb);
 return TRUE;
 } else
 return FALSE;
}
void CBMgr::InvokeAll (void * callData)
{
 PCB p;
 p = pFirst;
 while (p) { // traverse list
 p->Invoke (callData); // invoking each callback in the list
 p = p->GetNext();
 }
}



Listing Three

/***** C++ callback demonstration program *****/

//-------------------------------- Includes ----------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h> // needed by randomize()
#include "Callback.h"

//-------------------------------- Defines -----------------------------
#define NCRITERIA 4 // number of callbacks we will register
#define NROUNDS 100 // number of random numbers generated
#define MAXNUM 100 // max range for random numbers

//--------------------------------- Types -----------------------------
// The NumContainer class will be the 'owner' class in this example. Random
// numbers will be given to an instance of NumContainer. The numbers will be
// summed and counted. On arrival of each number, the NumContainer class will 
// also invoke the callback list. The eventData passed to the callbacks when 
// invoked will be the new number just added to NumContainer.
class NumContainer {
 int Total;
 int Count;
 CBMgr CBList;
public:
 NumContainer(); // constructor
 void AddNumber (int num);
 int QueryAvg() { return Count ? (Total/Count) : 0; }; // guard divide-by-zero
 int QueryCount() { return Count; };
 PCB RegisterCallback (PFNCB pfn, void * clientData)
 { return CBList.Register ((void *)this, pfn, clientData); };
 BOOL DeregisterCallback (PCB pcb)
 { return CBList.Deregister (pcb); };

};
// The structure defined below will be used by the callback functions as
// their clientData. The structure defines the threshold value, and
// a count of the numbers that are greater than or equal to the threshold.
typedef struct {
 int Threshold;
 int Count;
} CRITERIA;

//------------------------------ Static Data ---------------------------
static CRITERIA aCData[NCRITERIA] = { {90,0}, {80,0}, {70,0}, {60,0} };

//------------------------------- Prototypes ---------------------------
// This is the function that will be registered as the callback. It will be 
// registered 4 different times, that is, there will be 4 instances of this 
// function registered, each with a different threshold value in the structure
// pointed to by cData.
void CounterCallback (void * pObj, void * cData, void * eventData);

//---------------------------------- Code ------------------------------
NumContainer::NumContainer() // constructor for NumContainer
{
 Total = 0;
 Count = 0;
}
void NumContainer::AddNumber (int num)
{
 Total += num;
 Count++;
 CBList.InvokeAll((void *)num);
}
void CounterCallback (void * pObj, void * cData, void * eventData)
{
 CRITERIA *pC = (CRITERIA*)cData; // cast client data
 if ((int)eventData >= pC->Threshold) // if new value >= threshold then
 pC->Count++; // increment counter
}
int main ()
{
 NumContainer nc; // number container object
 int i;
 // Register callbacks on the number container object.
 for (i=0; i < NCRITERIA; i++)
 nc.RegisterCallback (CounterCallback, &aCData[i]);
 // Generate random numbers, and add them to the number container. Each time 
 // we Add a new number, the entire chain of callbacks should be called.
 randomize();
 for (i=0; i < NROUNDS; i++)
 nc.AddNumber(random(MAXNUM));
 // Display results.
 printf ("There were %d numbers generated.\n",nc.QueryCount());
 printf ("The average is %d\n",nc.QueryAvg());
 for (i=0; i < NCRITERIA; i++)
 printf ("There were %d numbers greater than or equal to %d\n",
 aCData[i].Count, aCData[i].Threshold);
 exit(0);
}

































































August, 1994
Using the Microsoft Mail API


Encapsulating message information 




Jim Conger


Jim is a business manager for Chevron Chemical and the author of numerous
programming books, including the recently published Windows API New Testament
(The Waite Group, 1994). Jim can be contacted on CompuServe at 73220,324.


Electronic mail has become a mainstream application. For many of us, the
electronic-mail system is the primary application we use every day, with word
processors and spreadsheets filling lesser roles. For the most part, e-mail
systems have been used to simply send and receive text messages which would
otherwise have been sent on a printed page. Many developers, however, have
discovered that e-mail is capable of doing much more than simply transmitting
text. Lotus Notes has popularized the concept of a textual database, which
allows groups to collect and organize more-structured documents. The
integration of OLE into several mail systems has made file transfer simple
enough for most users to master with very limited training. A logical
extension to e-mail is to build custom applications that mirror a company's
existing work procedures. These can vary from simple input forms to complex
applications that lead the user through a complete work process. Figure 1 is
the main window of an application which structures requests for technical
information. This application can be run from within Microsoft Mail or as a
stand-alone program. In the latter case, the application starts a background
mail session to transmit the data as a specially identified MS Mail message.
The application also "pops up" whenever an incoming message of this type is
selected from within the MS Mail in-box.


Microsoft Mail Extensions


There are two keys to extending MS Mail beyond simple messaging. The first is
the MAPI.DLL library, which exports 12 functions that you can call within your
own program. Microsoft calls this the "simple mail API" since it intends to
add to the function list in future releases of Mail. Table 1 is a list of the
MAPI functions, which are documented in the Microsoft Mail Technical Reference
and the Microsoft Development Library (MSDN) CD-ROM. You can call these
functions within Access, Excel, Word Basic, Visual Basic, or just about any
other development platform. The MAPI functions are easy to use, and provide a
lot of horsepower for a minimum effort. 
The second key to extending MS Mail is Microsoft's APPEXEC.DLL, which passes
information about a message received by MS Mail to another application. The
information is transferred by passing the handle of a global-memory block
containing a message identifier. The handle is passed using a clever trick
that I have not seen elsewhere. If APPEXEC.DLL is called with a command line
containing the string <PARAMBLK>, that string is replaced in the command-line
arguments with the hexadecimal handle of the global-memory block containing
the message information. In other words, if APPEXEC.DLL is called with the
command-line arguments APPEXEC.DLL <PARAMBLK>, the DLL will overwrite the
<PARAMBLK> string with the handle, yielding something like APPEXEC.DLL B046.
The program wishing to read the message can then use this handle to read the
memory block. Example 1 shows how the memory block is accessed. The
application takes ownership of the memory block by calling GlobalReAlloc()
with the GMEM_MODIFY | GMEM_MOVEABLE | GMEM_SHARE flags, and then works
directly with the data in memory.
The memory block will contain information about the message (or messages) that
the user has selected from within MSMAIL.EXE. The data is organized as a
PARAMBLK structure; see Example 2. The key member of this structure is
lpMessageIDList, which contains one or more unique message identifiers. The
message identifier is used to read the actual message data using
MAPIReadMail().
The combination of the mail functions in MAPI.DLL and the APPEXEC.DLL library
gives you the flexibility to build your own application right into the fabric
of MS Mail. Figure 2 is a conceptual model of how these tools work together.


Custom Message Types


Mail supports the concept of different message types. Normal text messages are
the default type, but you can create your own specialized message types to
handle transmittal of more-structured data. The message identifier allows Mail
to route specific types of messages to different applications, instead of
always displaying the message data as text. Mail (MSMAIL.EXE) is made aware of
the specialized message types by adding an entry to the MSMAIL.INI file in the
user's Windows subdirectory. Example 3 is a typical entry that creates a
custom message type "MailDemo.Demo"; the prefix IPM stands for "interpersonal
message." The MSMAIL.INI entry specifies a custom menu item in the Mail
program window titled "Mail Demo..."; selecting this menu item will load the
library APPEXEC.DLL and pass the command line \MAILDEMO\MAILDEMO.EXE -m
<PARAMBLK> to that DLL. The binary 1s and 0s specify that this custom message
type be recognized for composing, reading, replying, forwarding, printing, and
saving of messages marked as having the type "MailDemo.Demo." The MSMAIL.INI
entry must all be on one continuous line of text.


Message Data


The MS Mail system uses several data structures to encapsulate message
information. Example 4 shows the definitions of the MapiMessage and
MapiRecipDesc structures, which are defined in the MAPI.H header file supplied
by Microsoft. The MapiMessage structure defines how an individual Mail message
is organized. For example, the MapiMessage structure contains a pointer to the
subject string (lpszSubject), a pointer to the contents of the message
(lpszNoteText), and a pointer to a string used to identify unique message
types (lpszMessageType). Messages are transmitted including information about
who sent the message and everyone on the distribution list. Address
information is organized using the MapiRecipDesc structure (Example 4). The
MapiMessage structure contains pointers to two sets of MapiRecipDesc data.
lpOriginator points to MapiRecipDesc data for the individual that sent the
message, while lpRecips points to an array of one or more MapiRecipDesc
entries for the people copied on the message-distribution list, including the
originator. The long-integer value nRecipCount is used to store the number of
entries in the MapiRecipDesc array pointed to by lpRecips. Messages can also
include embedded files and OLE objects. At the bottom of the MapiMessage
structure definition are the member variables nFileCount and lpFiles, which
are used to organize embedded data. MAPI.H also includes the definition of the
MapiFileDesc structure, which is used to store individual file and OLE
entries. 


Using the CMapiUtil Class


MAPIUTIL.H and MAPIUTIL.CPP (Listings One and Two) show how the MAPI functions
exported by MAPI.DLL can be organized into a C++ class named CMapiUtil.
CMapiUtil does not use every possible mail feature, but is sufficient for most
projects. Table 2 summarizes the member functions of the CMapiUtil class.
Listing Three shows how the CMapiUtil class can be used to send a mail
message. The Logon() member function either starts a new mail session or
establishes a link to an existing session if MS Mail is already running.
GetAddressList() displays the standard MS Mail address-list dialog box for the
user to make selections. The SendMessage() function then uses the address list
to distribute the message. FreeMessage() clears memory buffers used by the
CMapiUtil class. Finally, Logoff() exits the mail session or simply severs the
link if MS Mail was already running.
The complete MAILDEMO program, including executables and programmer notes, is
available electronically; see "Availability," page 3. This program sends and
receives both standard text messages and special message types. The
application was written using Visual C++ 1.5 and the MFC library, plus the
CMapiUtil class presented here. 
Figure 1 Typical customized mail application.
Figure 2 Inserting your own application into the MS Mail system.
Example 1: Reading the message data passed by APPEXEC.DLL.
int CTsrApp::ReadMessageFromMemory (char * psCmdLine)
{ // convert the handle from ASCII hex digits to a number
 HGLOBAL handle = (HGLOBAL) atol_hex (psCmdLine) ;
 if (handle == 0)
 return HANDLE_ERROR ;
 // get ownership of block from APPEXEC.DLL
 HGLOBAL hMem = GlobalReAlloc (handle, 0,
 GMEM_MODIFY | GMEM_MOVEABLE | GMEM_SHARE) ;
 if (hMem == NULL)
 return HANDLE_ERROR ;

 // and lock the block to access its members
 PARAMBLK * pParamBlk = (PARAMBLK *) GlobalLock (hMem) ;
 // use the PARAMBLK data in the memory block here...
 GlobalUnlock (hMem) ;
 GlobalFree (hMem) ;
 return 0 ;
}
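Example 1 calls a helper atol_hex() that is not shown; a minimal portable sketch of such a hex-string parser (an assumption about its behavior, not the author's actual implementation) might be:

```cpp
#include <cstdlib>

// Hypothetical stand-in for the atol_hex() helper used in Example 1:
// parse a string of ASCII hex digits (e.g. "B046") into a number.
static unsigned long atol_hex(const char *s)
{
    return std::strtoul(s, 0, 16);  // base 16; stops at first non-hex char
}
```

The returned value would then be cast to HGLOBAL, as Example 1 does with the parsed command-line handle.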
Example 2: The PARAMBLK structure defined in MAILEXTS.H.
typedef struct tagPARAMBLK
{
 WORD wVersion; // eg. 0x0300 for ver 3.0
 WORD wCommand; // eg. wcommandOpen (def. MAILEXTS.H)
 LPSTR lpDllCmdLine; // command line string for msg. type
 LPSTR lpMessageIDList; // array of message identifiers
 WORD wMessageIDCount; // no. of items in lpMessageIDList
 HWND hwndMail; // the mail window handle
 HANDLE hinstMail; // the mail instance handle
 LPSTR lpHelpPath; // the help file name (if any)
 DWORD hlpID; // the help file context ID
} PARAMBLK;
Example 3: Defining a custom message type within the MSMAIL.INI file.
[Custom Messages]
IPM.MailDemo.Demo=3.0;Mail;&Mail
 Demo...;2;\maildemo\APPEXEC.DLL;\maildemo\maildemo.EXE -m
 <PARAMBLK>;1111111000000000;Mail demo application;;;
Example 4: The MapiMessage and MapiRecipDesc structures from MAPI.H.
typedef struct
 {
 ULONG ulReserved; // Reserved (Must be 0)
 LPSTR lpszSubject; // Message Subject
 LPSTR lpszNoteText; // Message Text
 LPSTR lpszMessageType; // Message Class
 LPSTR lpszDateReceived; // in YYYY/MM/DD HH:MM format
 LPSTR lpszConversationID; // conversation thread ID
 FLAGS flFlags; // unread,return receipt
 lpMapiRecipDesc lpOriginator; // Originator descriptor
 ULONG nRecipCount; // Number of recipients
 lpMapiRecipDesc lpRecips; // Recipient descriptors
 ULONG nFileCount; // # of file attachments
 lpMapiFileDesc lpFiles; // Attachment descriptors
 } MapiMessage, FAR * lpMapiMessage;
#define MAPI_UNREAD 0x00000001 // flFlags values
#define MAPI_RECEIPT_REQUESTED 0x00000002
#define MAPI_SENT 0x00000004
typedef struct
 {
 ULONG ulReserved; // Reserved for future use
 ULONG ulRecipClass; // Recipient class
 // MAPI_TO, MAPI_CC, MAPI_BCC, MAPI_ORIG
 LPSTR lpszName; // Recipient name
 LPSTR lpszAddress; // Recipient address (optional)
 ULONG ulEIDSize; // Count in bytes of size of pEntryID
 LPVOID lpEntryID; // System-specific recipient reference
 } MapiRecipDesc, FAR * lpMapiRecipDesc;
#define MAPI_ORIG 0 // Recipient is message originator
#define MAPI_TO 1 // Recipient is a primary recipient
#define MAPI_CC 2 // Recipient is a copy recipient
#define MAPI_BCC 3 // Recipient is blind copy recipient
Table 1: Mail API functions exported by MAPI.DLL.

 Function Description 
 MAPIAddress Uses the Mail addressing features.
 MAPIDeleteMail Deletes a message.
 MAPIDetails Provides details of an address-list entry.
 MAPIFindNext Locates a message in the message queue.
 MAPIFreeBuffer Deletes a buffer used by another MAPI function.
 MAPILogoff Ends a Mail session.
 MAPILogon Starts a new Mail session.
 MAPIReadMail Accesses the data in a message.
 MAPIResolveName Gets a unique address for a recipient.
 MAPISaveMail Saves a message.
 MAPISendDocuments Transmits files.
 MAPISendMail Transmits a message.
Table 2: CMapiUtil member functions from MAPIUTIL.CPP.
Function Description
Logon() Loads MAPI.DLL into memory, obtains
 pointers to the MAPI functions,
 and starts a mail session.
GetAddressList() Displays the standard Mail dialog box
 for creating a distribution list.
 The list is initially empty.
GetUpdatedAddressList() Displays the standard Mail dialog box
 for creating a distribution list.
 The list is initialized with the names
 saved by SaveAddressList().
FreeAddressList() Frees the buffers used
 by the GetUpdatedAddressList function.
SendMessage() Sends a message using the current address list.
ReadMessage() Reads a message from the Mail system.
SaveAddressList() Saves the distribution list from the last message.
FreeMessage() Frees memory tied up with the last message.
Logoff() Logs off the mail session and unloads MAPI.DLL.
NoteText() Returns a pointer to the contents of a message.
NoteLong() Returns the number of bytes in the contents of a message.

Listing One 

// mapiutil.h header file for mapiutil.cpp, jim conger, 1994
#ifndef MAPIUTIL_H
#define MAPIUTIL_H
// MAPI function types (from mapi.h prototypes)
typedef ULONG (FAR PASCAL *LPMAPILOGON)(ULONG, LPSTR, LPSTR, FLAGS,
 ULONG, LPLHANDLE);
typedef ULONG (FAR PASCAL *LPMAPILOGOFF)(LHANDLE, ULONG, FLAGS, ULONG);
typedef ULONG (FAR PASCAL *LPMAPISENDMAIL)(LHANDLE, ULONG,
 lpMapiMessage, FLAGS, ULONG);
typedef ULONG (FAR PASCAL *LPMAPIADDRESS)(LHANDLE, ULONG, LPSTR, ULONG, 
 LPSTR, ULONG, lpMapiRecipDesc, FLAGS, ULONG, LPULONG, 
 lpMapiRecipDesc FAR *);
typedef ULONG (FAR PASCAL *LPMAPIREADMAIL)(LHANDLE, ULONG, LPSTR, FLAGS,
 ULONG, lpMapiMessage FAR *);
typedef ULONG (FAR PASCAL *LPMAPIRESOLVENAME)(LHANDLE, ULONG, LPSTR,
 FLAGS, ULONG, lpMapiRecipDesc FAR *);
typedef ULONG (FAR PASCAL *LPMAPIFREEBUFFER)(LPVOID);
class CMapiUtil
{
public:
 CMapiUtil () ;
 ~CMapiUtil () ;

private:
 HINSTANCE m_hMAPI ; // handle of MAPI.DLL
 unsigned long m_MailHandle ; // mail session handle
 MapiRecipDesc m_MapiRecipDesc ; // defined in MAPI.H
 ULONG m_lRecipients ; // number of recipients
 lpMapiRecipDesc m_lpRecipList ; // pointer to list of recips
 ULONG m_lOldRecipients ; // number of recipients
 lpMapiRecipDesc m_lpOldRecipList ; // last list of recips
 lpMapiMessage m_lpMessage ; // message data in memory
 CStringList m_NameList ; // string list of recipients
 LPMAPILOGON lpMAPILogon ; // pointers to functions in 
 LPMAPISENDMAIL lpMAPISendMail ; // mapi.dll
 LPMAPIFREEBUFFER lpMAPIFreeBuffer ;
 LPMAPILOGOFF lpMAPILogoff ;
 LPMAPIADDRESS lpMAPIAddress ;
 LPMAPIREADMAIL lpMAPIReadMail ;
 LPMAPIRESOLVENAME lpMAPIResolveName ;
public:
 int Logon (CWnd * pWnd) ;
 int GetAddressList (CWnd * pWnd) ;
 int GetUpdatedAddressList (CWnd * pWnd) ;
 void FreeAddressList () ;
 int SendMessage (CWnd * pWnd, MapiMessage * lpMessage) ;
 int ReadMessage (CWnd * pWnd, LPSTR lpMessageID) ;
 void SaveAddressList () ;
 void FreeMessage () ;
 void Logoff (CWnd * pWnd) ;
 char * NoteText () ;
 int NoteLong () ;
} ;
#endif // MAPIUTIL_H



Listing Two

// mapiutil.cpp utility functions for working with mail api, jim conger, 1994
#include "stdafx.h" // used by VC++ for standard MFC includes
#include "mapi.h"
#include "mailexts.h"
#include "mapiutil.h"
#include <stdlib.h>
CMapiUtil::CMapiUtil ()
{
 TRACE ("\nCMapiUtil constructor") ;
 m_hMAPI = NULL ;
 m_MailHandle = NULL ;
 m_lRecipients = 0 ;
 m_lpRecipList = NULL ;
 m_lOldRecipients = 0 ;
 m_lpOldRecipList = NULL ;
 m_lpMessage = NULL ;
}
CMapiUtil::~CMapiUtil () // allows mindless shutdown of mail
{
 TRACE ("\nCMapiUtil destructor") ;
 if (m_lpOldRecipList != NULL)
 delete [] m_lpOldRecipList ;
 if (m_lpMessage != NULL)
 (*lpMAPIFreeBuffer) (m_lpMessage) ;
 if (m_hMAPI != NULL) 
 ::FreeLibrary (m_hMAPI) ;
}
 // Load mapi.dll into memory, get function addresses, and logon
int CMapiUtil::Logon (CWnd * pWnd)
{
 TRACE ("\nCMapiUtil::Logon ()") ;
 ULONG lHwnd = (ULONG)(LPSTR) pWnd->GetSafeHwnd () ;
 m_hMAPI = ::LoadLibrary ("MAPI.DLL") ;// explicitly load MAPI.DLL
 if (m_hMAPI < HINSTANCE_ERROR)
 return 0 ;
 // get address of MAPILogon()
 lpMAPILogon = (LPMAPILOGON) ::GetProcAddress (m_hMAPI,"MAPILogon") ;
 if (lpMAPILogon == NULL)
 return 0 ;
 // if MAPILogon() was found, assume the other functions can be found
 lpMAPISendMail = (LPMAPISENDMAIL) 
 ::GetProcAddress (m_hMAPI, "MAPISendMail") ;
 lpMAPIFreeBuffer = (LPMAPIFREEBUFFER) 
 ::GetProcAddress (m_hMAPI, "MAPIFreeBuffer") ;
 lpMAPILogoff = (LPMAPILOGOFF) 
 ::GetProcAddress (m_hMAPI, "MAPILogoff") ;
 lpMAPIAddress = (LPMAPIADDRESS) 
 ::GetProcAddress (m_hMAPI, "MAPIAddress") ;
 lpMAPIReadMail = (LPMAPIREADMAIL) 
 ::GetProcAddress (m_hMAPI, "MAPIReadMail") ;
 lpMAPIResolveName = (LPMAPIRESOLVENAME) 
 ::GetProcAddress (m_hMAPI, "MAPIResolveName") ;
 
 // open a mail session - does nothing if a session is running
 ULONG lStatus = (*lpMAPILogon) (lHwnd, NULL, NULL, 
 MAPI_LOGON_UI, 0, (LPLHANDLE) &m_MailHandle) ;
 if (lStatus != SUCCESS_SUCCESS)
 return 0 ;
 else
 return TRUE ;
}
 // Show empty address dialog box, and get selections from user
int CMapiUtil::GetAddressList (CWnd * pWnd)
{
 TRACE ("\nCMapiUtil::GetAddressList ()") ;
 ULONG lHwnd = (ULONG)(LPSTR) pWnd->GetSafeHwnd () ;
 
 long lStatus = (*lpMAPIAddress) ((LHANDLE) m_MailHandle, lHwnd,
 NULL, 2,NULL, 0, NULL, 0, 0, &m_lRecipients, &m_lpRecipList) ;
 return (int) lStatus ;
}
 // Show initialized address dialog box, get final selections 
int CMapiUtil::GetUpdatedAddressList (CWnd * pWnd)
{
 TRACE ("\nCMapiUtil::GetUpdatedAddressList ()") ;
 ULONG lHwnd = (ULONG)(LPSTR) pWnd->GetSafeHwnd () ;
 lpMapiRecipDesc lpRD ;
 
 if (m_lOldRecipients > 0) // if there is a list of names 
 {
 if (m_lpOldRecipList != NULL) 
 delete [] m_lpOldRecipList ;
 m_lpOldRecipList = new MapiRecipDesc [m_lOldRecipients] ;
 ASSERT (m_lpOldRecipList) ;
 lpMapiRecipDesc lpMRD = m_lpOldRecipList ; 
 
 for (int i = 0 ; i < (int) m_lOldRecipients ; i++)
 {
 POSITION pos = m_NameList.FindIndex (i) ;
 ULONG lStatus = (*lpMAPIResolveName) (
 (LHANDLE) m_MailHandle, lHwnd, 
 (LPSTR) (const char *) m_NameList.GetAt (pos), 
 MAPI_DIALOG, 0, &lpRD) ;
 if (lStatus == SUCCESS_SUCCESS)
 lpMRD [i] = * lpRD ; // copy recipient data to array
 else
 {
 delete [] m_lpOldRecipList ;
 m_lpOldRecipList = NULL ;
 m_lOldRecipients = 0 ;
 break ;
 }
 }
 }
 long lStatus = (*lpMAPIAddress) ((LHANDLE) m_MailHandle, lHwnd,
 NULL, 2, NULL, m_lOldRecipients, m_lpOldRecipList, 0, 0, 
 &m_lRecipients, &m_lpRecipList) ;
 return (int) lStatus ;
}
 // free the buffers tied up by the GetUpdatedAddressList() function
void CMapiUtil::FreeAddressList ()
{
 TRACE ("\nCMapiUtil::FreeAddressList ()") ;
 if (m_lpOldRecipList != NULL) 
 {
 delete [] m_lpOldRecipList ; // delete array holding copies
 m_lpOldRecipList = NULL ;
 m_lOldRecipients = 0 ; 
 } 
}
 // Send the message, using addresses from GetAddressList()
int CMapiUtil::SendMessage (CWnd * pWnd, MapiMessage * lpMessage)
{
 TRACE ("\nCMapiUtil::SendMessage ()") ;
 ULONG lHwnd = (ULONG)(LPSTR) pWnd->GetSafeHwnd () ;
 m_MapiRecipDesc.ulReserved = 0 ;
 m_MapiRecipDesc.ulRecipClass = MAPI_ORIG ;
 m_MapiRecipDesc.lpszName = "TSR" ;
 m_MapiRecipDesc.lpszAddress = NULL ;
 lpMessage->lpOriginator = &m_MapiRecipDesc ;
 lpMessage->nRecipCount = m_lRecipients ;
 lpMessage->lpRecips = m_lpRecipList ;
 long lStatus = (*lpMAPISendMail) ((LHANDLE) m_MailHandle, lHwnd,
 lpMessage, 0L, 0L) ; 
 return (int) lStatus ; 
}
 // Read message identified by lpMessageID, save pointer to message
 // Returns TRUE if message was read OK, FALSE on error
int CMapiUtil::ReadMessage (CWnd * pWnd, LPSTR lpMessageID)
{
 TRACE ("\nCMapiUtil::ReadMessage ()") ;

 static lpMapiMessage lpMessage ;
 ULONG lHwnd = (ULONG)(LPSTR) pWnd->GetSafeHwnd () ;
 if (m_lpMessage != NULL) // make sure last message is purged
 (*lpMAPIFreeBuffer) (m_lpMessage) ;
 if (m_lpRecipList != NULL) // and clean up old address list
 {
 (*lpMAPIFreeBuffer) (m_lpRecipList) ;
 m_lpRecipList = NULL ;
 }
 m_lRecipients = 0 ;
 // read the message
 ULONG lStatus = (*lpMAPIReadMail) (m_MailHandle, lHwnd, 
 lpMessageID, 0L, 0, &lpMessage) ;
 if (lStatus == SUCCESS_SUCCESS)
 {
 m_lpMessage = lpMessage ; // save pointer to message if OK
 return TRUE ; // and return TRUE
 }
 else
 {
 m_lpMessage = NULL ; // otherwise, quit and return FALSE
 return FALSE ;
 }
}
 // save the names of all recipients in a message for use
 // in initializing the address dialog box in next mail send
void CMapiUtil::SaveAddressList ()
{
 TRACE ("\nCMapiUtil::SaveAddressList ()") ;
 if (m_lpMessage == NULL)
 return ;
 m_NameList.RemoveAll () ; // empty the last list (if any)
 
 m_lOldRecipients = (long) m_lpMessage->nRecipCount ;
 // add all of the cc's to the end of the list
 for (int i = 0 ; i < (int) m_lpMessage->nRecipCount ; i++)
 {
 m_NameList.AddTail (m_lpMessage->lpRecips [i].lpszName) ;
 }
}
 // Free all buffers associated with a message
void CMapiUtil::FreeMessage ()
{
 TRACE ("\nCMapiUtil::FreeMessage ()") ;
 if (m_lpMessage != NULL)
 {
 (*lpMAPIFreeBuffer) (m_lpMessage) ;
 m_lpMessage = NULL ;
 }
 if (m_lpRecipList != NULL)
 {
 (*lpMAPIFreeBuffer) (m_lpRecipList) ;
 m_lpRecipList = NULL ;
 m_lRecipients = 0 ;
 }
}
 // Log off from mail, and release MAPI.DLL
void CMapiUtil::Logoff (CWnd * pWnd)
{ 

 TRACE ("\nCMapiUtil::Logoff ()") ;
 ULONG lHwnd = (ULONG)(LPSTR) pWnd->GetSafeHwnd () ;
 if (m_MailHandle != 0) 
 {
 (*lpMAPILogoff) (m_MailHandle, lHwnd, 0L, 0L) ;
 m_MailHandle = 0 ;
 }
 if (m_hMAPI != NULL) 
 {
 ::FreeLibrary (m_hMAPI) ;
 m_hMAPI = NULL ;
 }
}
 // Returns a pointer to the text of the message, NULL if no message
char * CMapiUtil::NoteText ()
{
 TRACE ("\nCMapiUtil::NoteText ()") ;
 if (m_lpMessage != NULL)
 return m_lpMessage->lpszNoteText ;
 else
 return NULL ;
}
 // Returns number of chars in the text of the message, 0 if none
int CMapiUtil::NoteLong ()
{
 TRACE ("\nCMapiUtil::NoteLong ()") ;
 if (m_lpMessage != NULL)
 return ::lstrlen (m_lpMessage->lpszNoteText) ;
 else
 return 0 ;
}



Listing Three

void CMainFrame::OnMailSendText()
{
 CMapiUtil MapiUtil ;
 if (MapiUtil.Logon (this))
 {
 if (MapiUtil.GetAddressList (this) == SUCCESS_SUCCESS)
 {
 MapiMessage Message ;
 Message.ulReserved = 0 ;
 Message.lpszSubject = "Message subject" ;
 Message.lpszNoteText = "Text of the message." ;
 Message.lpszMessageType = NULL ; // standard msg. 
 Message.lpszDateReceived = "1994/01/01 12:00" ;
 Message.lpszConversationID = NULL ;
 Message.flFlags = MAPI_UNREAD | MAPI_DIALOG ;
 Message.lpOriginator = NULL;
 Message.nRecipCount = 0 ; 
 Message.lpRecips = NULL ; 
 Message.nFileCount = 0 ; 
 Message.lpFiles = NULL ; 
 MapiUtil.SendMessage (this, &Message) ; 
 MapiUtil.FreeMessage () ;

 }
 MapiUtil.Logoff (this) ;
 }
 else
 MessageBox ("Could not log on to MS Mail.", "Error",
 MB_ICONEXCLAMATION | MB_OK) ;
}


August, 1994
Extending C with Prolog


An expert advisor for resolving IRQ conflicts




Dennis Merritt


Dennis is the author of the Active Prolog Tutor and a principal of Amzi!, a
vendor of Prolog products and custom applications. Dennis can be reached at
508-897-7332 or on the Internet at amzi@world.std.com.


When used together, Prolog and C complement each other, allowing you to
quickly build extremely powerful applications. For example, at the heart of
KnowledgeWare's Application Development Workbench (ADW) CASE tool, you'll find
a giant Prolog program. ICARUS, a company that provides project-estimation
tools for chemical engineers, uses Prolog to do much of its work. Pacific AI
provides educational tools that use C libraries for presentation and Prolog
for internal logic. And Windows NT even uses Prolog to manage networking
installation (see the accompanying text box entitled "Small Prolog and Windows
NT Networking"). While C can be used to write anything written in Prolog,
Prolog code is much less complex. In fact, KnowledgeWare claims its Prolog
source modules are one-tenth the size of the equivalent C code. Ultimately,
Prolog developers can manage and maintain a greater degree of complexity, thus
providing the user with a more sophisticated application.
This article examines the design of an interface between Prolog and C and
presents a simple expert advisor that identifies IRQ conflicts. All comments
about and examples of Prolog adhere to the Edinburgh Prolog standard and apply
to any conforming Prolog implementation. The C interface to the advisor
program is specific to the Cogent Prolog API (which my company develops),
which provides tools for building and breaking down lists and complex Prolog
structures, and capturing Prolog stream I/O and errors. These allow access to
any Prolog structures from C and vice versa, and enable you to write Prolog
code that is truly environment independent. Similar capabilities are, of
course, available in many Prolog implementations.


Prolog in a Nutshell


While artificial intelligence (AI) might seem to be about the problems of
simulating intelligence on a computer, the essence of most AI programming is
developing search and pattern-matching algorithms. A chess program searches
for patterns in the game, a natural-language program searches for patterns in
lists of words, and a diagnostic program searches for rules that match
symptoms. There are two features in a programming language that make pattern
matching easier: support for symbols as a primitive data type that can be
manipulated without the need to call special functions; and dynamic-memory
management that lets you use the symbols without worrying about
memory-allocation issues. Languages that have these features, such as Prolog
and Lisp, are called "symbolic languages."
For instance, Figure 1(a) is a simple control loop (written in C) that reads a
command from a user and then does some processing. Figure 1(b) is the
equivalent code written in Prolog. Notice the lack of data-definition
statements and string compares in the Prolog version. In Prolog, dynamically
allocated symbols are used instead of character strings.
As an integral part of the language, Prolog also has a sophisticated
pattern-matching algorithm called "unification" and a search mechanism called
"backtracking." In Figure 1(b), the pattern do(X) is unified against the first
of the three do rules, defined by the if operator (:--). If the user types
open when prompted, then the first rule (or "clause," as its usually called)
is matched and the code on the right of the :-- is executed. If the user
enters something other than open, Prolog backtracks to the adjacent clause and
continues to look for a do rule that matches the user's input. Similarly,
matching and backtracking take place in the main clause with the use of repeat
and X==quit, which cause the code between these two statements to loop until
the user types quit. 
With Prolog, you don't write if-then-else statements, function calls, while
loops, or other flow-control constructs. However, between unification and
backtracking, you can induce any flow-control behavior in a Prolog program
that can be achieved in any other language. Symbols, unification,
backtracking, and dynamic-memory management all tend to eliminate the
procedural code you normally need to write. It is no surprise that what is
left looks much more declarative than conventional code and is often a
fraction of the size. 


From Prolog to C and Back


Prolog programs are essentially a collection of rules activated by queries
much in the same way a database is queried. This is true whether the code is
interpreted or compiled. When Figure 1(b) is compiled, main is used to start
the program; it then queries other rules, which query other rules, and so on.
However, this program can also be loaded and run from an interpreter. In this
case, any of the clauses in the program can be queried directly. The nature of
this interaction with Prolog dictates the nature of a C-to-Prolog interface,
which must be able to either execute compiled Prolog code or query a loaded
Prolog program. In this sense, the C-to-Prolog interface will look more like a
database API than procedural, interlanguage calls.
Figure 1(b) also illustrates that the write statement has nothing to do with
logic, pattern-matching, or searching; it simply performs I/O. Prolog provides
a number of special predicates, such as write, used primarily for their side
effects. It is in this area that Prolog is weaker than C. In Prolog, you must
rely on whatever special predicates a particular vendor provides with an
implementation. So, for example, if a particular Prolog implementation doesn't
supply the tools for accessing Windows, it can't be used to implement Windows
applications. This is where Prolog-to-C connections come in--they let you
define extended predicates, such as write, to allow Prolog code access to any
services accessible from C.


Advising on IRQ Conflicts


We recently installed Gateway multi-media kits on our PCs, but the
installation was somewhat difficult because of conflicts in our
interrupt-request lines (IRQs). Since an expert system is ideal for resolving
such conflicts, I wrote a Prolog program (Listing One) that embodies a few
rules of expertise for sorting out IRQ conflicts. It first checks to see if
the default IRQ for the device being installed is available. If so, there is
no problem. If not, it tries the alternate IRQs and recommends an available
slot, telling the user to reset the IRQ switches on the card for the device.
If the alternates aren't available, the program tries to move existing IRQs
around. Failing all else, the program looks for COM ports that can be doubled
up on a single IRQ, thus freeing one for the new device.
This example is intended to illustrate how this knowledge is expressed in
Prolog, not to be the last word on IRQ conflicts. The example does illustrate
how expert systems evolve. These rules came about from the particular cases of
installing the Gateway SoundBlaster on two different machines. As new cases
are encountered, new rules can be added, or old rules can be modified to
reflect the new situations. In this way, the system gets smarter as new
installation situations are encountered.
Listing Two is a simple DOS C program that calls the Prolog IRQ advisor. The
main entry point could easily be a function called from a menu choice of a
larger installation application. Listing Two finds out what type of device is
being installed, then calls the IRQ advisor to see if there will be any IRQ
conflicts installing that device, and if so, how they can be resolved. The
advisor has the knowledge of the various devices and their allowable IRQs
coded into it.
The main() function in Listing Two uses cpCallStr(&t, "irq_advice('%s')",
sDevice) to call the main entry point of the advisor. It dynamically builds a
query and poses it to the compiled Prolog program. This is very similar to a
database call. When the Prolog clause irq_advice(Device) is called, it gets
the existing IRQ assignments and asserts them into Prolog's dynamic database
by calling the Prolog predicate get_irqs. get_irqs is in turn mapped to a C
function that provides the service. (In this example, the C function simply
reads the data from a file of test IRQ data. However, a typical implementation
would have the code necessary to determine the IRQs from the machine itself.)
In the C function p_get_irqs, each IRQ is asserted to the Prolog database
using the appropriate API function calls. In this case, a printf-like function
is used to build the Prolog term to be asserted.
These terms (or facts) are used by the Prolog rules for finding open slots or
making recommendations on rearranging slots. For example, the make_room
predicate in Listing One uses pattern matching to find two IRQs with single
COM ports on them. It then recommends that the user combine them and makes the
same change to its own dynamic database. The rule that called make_room then
tries again to fit in the device's IRQ request with the new free space. This
approach is very similar to the approach taken with puzzle-solving
applications. The program starts with an initial state (the current IRQs) and
a goal state (the IRQs with the device installed on one of them). The rules
transform the state until the goal condition is reached. The various steps of
the transformation are the recommended steps for the user to take in
rearranging IRQs.
The sample program is set up to allow installation of two different devices, a
SoundBlaster and a Mitsumi CD-ROM. Each has different IRQs it can use. The
current IRQ settings are listed in the file IRQTEST1.DAT; see Listing Three.
These are the possible devices installed on IRQ channels 0--15, respectively.
Running the program for each of these produces the results shown in Figure 2.
The output is the result of the third key aspect of the C-Prolog interface:
Rather than use Prolog I/O statements, the Prolog msg predicate calls the
p_msg function defined by the C program, which uses printfs to generate the
output. This way, the Prolog program is independent of the user-interface
environment in which it is deployed. (The interface could have just as easily
been Windows, or been implemented using a GUI library, such as Zinc or XVT.)
To illustrate other transformations between Prolog and C, the p_msg function
accepts either a term or a list of terms (represented within square brackets
[] in Prolog) as an argument. The C function dynamically determines which type
of Prolog argument it has received, and, if it's a list, walks through the
list outputting each term after first converting it to a C string.


Conclusion


The IRQ advisor illustrates the possibility of a whole class of advisor
modules adding expertise to larger applications. This technology could be
applied to tuning an operating environment, such as Windows, or as part of
very specific applications that control physical devices, as in a
manufacturing environment. Many organizations that have invested in
expert-system technology for help-desk applications have found that they wind
up with many small advisors, rather than one large system. They might have an
advisor for printers, another for LANs, and others for various software
packages.
Help systems could provide natural-language parsers that allow users to
express what they're trying to do. The natural-language parser could then use
the information in the user's question to steer the user to the appropriate
documentation. In addition, natural-language front ends on database queries
can also be developed in a straightforward way from Prolog. Using this
technology, users can be shielded from underlying databases and be able to
express in everyday terms what they wish to get from the database.
Finally, there are a wide variety of standard-conforming Prolog
implementations, which range from shareware to commercial implementations.
Prolog is available for all sorts of platforms, from PCs to graphical
workstations and mainframes. Many of these implementations provide external
language interfaces. The Internet news group comp.lang.prolog is a good source
of information about Prolog. The Frequently Asked Questions (FAQ) file lists
numerous sources for Prolog, as well as learning resources. In the end, you
may find that adding Prolog to your tool chest will allow you to better manage
complex applications while reducing the amount of coding required.
Small Prolog and Windows NT Networking
David Hovel
David is a developer at Microsoft Research and can be contacted at
davidhov@microsoft.com.
Windows NT networking installation and configuration is controlled by the file
NCPA.CPL, which the user sees as the Networks icon within the Control Panel's
main window. The bulk of this DLL is written in C++, but it also contains a
simplified Prolog interpreter known as "Small Prolog," written by Henri de
Feraudy during the late 1980s and put into the public domain. It is available
through the C User's Group (Lawrence, KS). 
Small Prolog (or "SProlog") follows the Cambridge syntax and includes most
standard built-ins (predicates) defined by Clocksin and Mellish in their
seminal book Programming in Prolog; you can further extend the language with
user-defined built-ins. The disk from the C User's Group (#CUG297) includes C
source files, makefiles, documentation, and examples that demonstrate Prolog
features for C programmers. The interpreter is written in portable C code--it
compiles for MS-DOS, UNIX, and Atari platforms, as well as Macintosh (if you
modify the UNIX version using the Think C Console function) and OS/2 (by
modifying the PC version).
SProlog supports most standard Prolog-interpreter functionality with one
notable exception: It does not do garbage collection. SProlog defines a
reduced form of the Prolog language which looks very much like Lisp. For
example, a typical Prolog predicate might read
pet(C) :- animal(C), ownedBy(D), person(D). In SProlog, this would look
like ((pet C) (animal C) (ownedBy D) (person D)).

To be useful, each hardware or software component installed on a Windows NT
machine must be connected at run time to some other component--nothing stands
alone. However, certain constraints define which connections are reasonable
and legitimate. Possibilities such as multiple network cards, multiple
protocol stacks, and conflicting software needs must be handled.
As each network-related component is installed, it writes several textual
records into its own area of the Configuration Registry. In typical OOP
fashion, these records define the class of each component and the constraints
and requirements of each class. At configuration time, the C++ code in the
NCPA.CPL file exhaustively browses the Registry, collects these records, and
converts them to SProlog format.
Along with its C++ code and SProlog interpreter, the NCPA.CPL file has a
700-line SProlog program embedded into it as a textual Windows "resource."
This program is the actual network-configuration algorithm; a C++ class
"wrapper" encapsulates the SProlog engine and exposes C++ member functions for
facilities such as consulting and querying. Network configuration is performed
by consulting the "rules" (the resource-based algorithm), consulting the
"facts" (the Registry-derived declarative information), and performing a
single query which runs the "configure everything" predicate. Then the C++ code
performs many smaller queries to enumerate the results of the main query from
the SProlog database. This information is rearranged and written back into
each network component's Registry area. Then, when the network is started,
this updated information is used by each software process to determine what
other software modules are to be dynamically connected to it.
The SProlog algorithm has several features, but primarily it exploits Prolog's
inherent backtracking mechanism by exhaustively checking each extant component
with every other to determine compatibility. A set of potential "bindings"
(one-directional links) is asserted into the Prolog database in the first
pass. Then the other constraints are checked, and any associations which would
violate negative constraints are retracted. Finally, the remaining database
information is used to construct NT namespace device names for each
component's "bindings." 
This network configuration methodology was designed to facilitate the
installation of component ensembles which could not be foreseen in 1991, when
the work began. Since components declare their classes and constraints
themselves, the Prolog interpreter can be counted upon to perform correctly
without requiring updates to the NT binaries themselves.
The SProlog interpreter was chosen for this project because the problem
required search-space control, simple and reliable database access, and easy
string-manipulation operations, all of which are hallmarks of Prolog.
Figure 1: (a) Control loop written in C that reads a command from the user and
processes; (b) similar function written in Prolog.
(a)
void main()
{
 char buf[20];
 do {
 gets(buf);
 if (0 == strcmp(buf, "open")) ...
 else if (0 == strcmp(buf, "add")) ...
 else if (0 == strcmp(buf, "delete")) ...
 } while (strcmp(buf, "quit"));
 printf("done");
}
(b)
main :-
 repeat,
 read(X),
 do(X),
 X == quit,
 write(done).
do(open) :- ...
do(add) :- ...
do(delete) :- ...
do(quit).
Figure 2: Sample session with the IRQ advisor program.
What device are you installing? Sound Blaster
Use which test file? irqtest1.dat
IRQ Conflict Resolution Analysis
Put com ports 3 and 4 together on IRQ 3
Move device mouse to IRQ 4
Continue normal install
What device are you installing? Mitsumi CD-ROM
Use which test file? irqtest1.dat
IRQ Conflict Resolution Analysis
Default IRQ not available. Set device to use 11 instead
Continue normal install

Listing One 

/* Prolog program to illustrate how an expert system that resolves
 installation conflicts might be implemented. In this case, it resolves
 conflicts in the IRQ table by either selecting another IRQ for the device
 being installed, rearranging another device's IRQ to open a slot for the
 new device, or, if there is no room, doubling up the COM ports on a single
 IRQ to free up a slot.
 The predicate msg/1, which takes either a single term or a list as
 an argument, is implemented in C.
 The predicate get_irqs/0 is also implemented in C and asserts a
 number of Prolog facts in the form irq/2.
*/

irq_advice(Device) :-
 msg($IRQ Conflict Resolution Analysis$),
 get_irqs, % get IRQs from C program

 free_irq(Device),
 msg($ Continue normal install$).

/* rules for finding a free IRQ */
free_irq(Dev) :- % fail if unknown device
 not(ok_irq(Dev,_,_)),
 msg([$ Unknown Device $, Dev]),
 !,
 fail. 
free_irq(Dev) :- % see if requested IRQ is free
 ok_irq(Dev,IRQ,Option),
 is_free(IRQ,Option), !.
free_irq(Dev) :- % see if requested IRQ can be cleared
 ok_irq(Dev,IRQ,Option),
 clear(IRQ), !.
free_irq(Dev) :- % see if an IRQ can be opened
 make_room,
 free_irq(Dev). % try again with new open slot
is_free(IRQ,default) :- % if default is free, no problem
 irq(IRQ, open),
 msg($ The default IRQ is open. No action needed$).
is_free(IRQ,optional) :- % use different device IRQ
 irq(IRQ, open),
 msg([$ Default IRQ not available. Set device to use $, IRQ, $ instead.$]).
clear(I) :- % move another device's IRQ to free up requested IRQ
 irq(X, open),
 irq(I, D),
 ok_irq(D, X, _), % make sure the device can be moved
 msg([$ Move device $, D, $ to IRQ $, X]),
 retract(irq(X,open)), % update dynamic database with the switch
 retract(irq(I,D)),
 assert(irq(X,D)),
 assert(irq(I,open)).
make_room :- % double up COM ports to make room, if possible
 irq(IRQ_X, com(COM_X)),
 irq(IRQ_Y, com(COM_Y)),
 IRQ_X \= IRQ_Y,
 msg([$ Put com ports $, COM_X, $ and $, COM_Y, $ together on IRQ $, IRQ_X]),
 retract(irq(IRQ_X, _)), % update the dynamic database
 assert(irq(IRQ_X, com(COM_X)+com(COM_Y))),
 retract(irq(IRQ_Y, _)),
 assert(irq(IRQ_Y, open)).
% IRQs that can be used for different devices. We only do Sound Blasters
% and Mitsumi CD-ROMs for now.
ok_irq('Sound Blaster', 5, default).
ok_irq('Sound Blaster', 2, optional).
ok_irq('Sound Blaster', 3, optional).
ok_irq('Sound Blaster', 7, optional).
ok_irq('Mitsumi CD-ROM', 15, default).
ok_irq('Mitsumi CD-ROM', 11, optional).
ok_irq('Mitsumi CD-ROM', 12, optional).
/* this IRQ information is used for relocating the mouse if necessary */
ok_irq(mouse, 2, optional).
ok_irq(mouse, 3, optional).
ok_irq(mouse, 4, optional).
ok_irq(mouse, 5, optional).




Listing Two

/* Sample expert system advisor embedded in C. This particular 
 system is used to resolve and fix conflicts for IRQ slots 
 when installing new devices in a computer. It is set up here 
 as a simple DOS program, but it can be used as a module in a 
 larger program, called, for example, when a menu item is 
 selected in a GUI interface.
 It illustrates the three key aspects of integrating Prolog 
 and C.
 1. The rules for deciding how to rearrange the IRQs are 
 declarative and expressed in Prolog code. (Prolog's 
 advantage shows when there is no clear algorithm for 
 solving a problem, only a collection of seemingly 
 disconnected rules.) The rules, or logic base, are 
 called from the C program.
 2. The Prolog program calls back to C to get low level 
 information. In this case, the C program determines the 
 current IRQ use for the machine it's running on. 
 For the example, the information is read from a test data 
 file, but a real example would have code that digs around 
 in the machine or asks for this information. The 
 predicate is called get_irqs.
 3. The Prolog program relies on C for its I/O, so that the 
 Prolog program is independent of the UI. In this example, it 
 simply sends output to a predicate, implemented in C, called 
 msg. msg is made a little more interesting by the fact that 
 it can take either a single term to be displayed or a list 
 of Prolog terms. The I/O for this example is just printfs, 
 but it could be any fancy GUI display.
*/

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>

#include "cogent.h"

char sTestFile[80]; /* global to hold name of test data file */
/*---- built-in predicates, callable from Prolog ----*/
/* function prototypes */
TF p_get_irqs(void);
TF p_msg(void);
/* Extended predicate table definitions, mapping Prolog predicate
 names to C functions. */
PRED_INIT irqPreds[] =
{
 {"get_irqs", 0, p_get_irqs},
 {"msg", 1, p_msg},
 {NULL, 0, NULL}
};
/* extended predicates */
TF p_get_irqs(void)
{
 int i;
 FILE * fp;
 char buf[80];
/* Assert facts about the IRQs to the Prolog dynamic database */
/* Read them from the test file for now */

 fp = fopen(sTestFile, "r");
 if (fp == NULL)
 return(FALSE);
 for (i=0; i<16; i++)
 {
 fgets(buf, 80, fp);
 buf[strcspn(buf, "\r\n")] = '\0'; /* strip newline before building term */
 cpAssertzStr("irq(%i, %s)", i, buf);
 }
 fclose(fp);
 return(TRUE);
}
TF p_msg(void)
{
 char buf[80];
 TERM t, tm;
 pTYPE ptype;
/* Get the first (and only) parameter and determine its type. If it's
 a list, walk the list, converting each term to a string and printing
 it. If it's not a list, just convert and print the term.
*/
 cpGetParm(1, cTERM, &t);
 ptype = cpGetTermType(t);
 if (ptype == pLIST)
 {
 while (OK == cpPopList(&t, cTERM, &tm))
 {
 cpTermToStr(tm, buf, 80);
 printf("%s", buf);
 }
 }
 else
 {
 cpTermToStr(t, buf, 80);
 printf("%s", buf);
 }
 printf("\n");
 return(TRUE);
}
/* ----- Main ------------------------------------------------------------ */
int main(void)
{
 TERM t;
 char sDevice[80];
 cpInit("irqxs");
 cpInitPreds(irqPreds);
 cpLoad("irqxs");
 printf("What device are you installing? ");
 fgets(sDevice, sizeof(sDevice), stdin);
 sDevice[strcspn(sDevice, "\n")] = '\0';
 printf("Use which test file? ");
 fgets(sTestFile, sizeof(sTestFile), stdin);
 sTestFile[strcspn(sTestFile, "\n")] = '\0';
 cpCallStr(&t, "irq_advice('%s')", sDevice);
 cpClose();
 return 0;
}



Listing Three

timer
keyboard
com(1)+com(2)
com(3)
com(4)
mouse
disk
lpt1
clock
redirect
open
open
open
coprocessor
disk
network




August, 1994
Speech Synthesis in C++


A library for generating speech under Windows 3.1




Neil G. Rowland, Jr.


Neil is a programmer at Gradient Technologies. He can be reached on either
CompuServe at 72133,426, the Internet as neil_r@gradient.com, or Channel One
as Neil Rowland.


You'd think that because human speech is made up of discrete sounds
("phonemes"), all you'd have to do to write a speech-synthesis application is
translate text into a string of phonemes, then output each phoneme from a
table of sampled waveforms. In practice, however, programs that use this
approach sound terrible and are nearly impossible to understand. Obviously,
there's much more to speech than just phonemes. Inflection (change in
pitch) on an accented syllable or at the end of a sentence is another factor
that affects the quality of speech. Speech also involves coarticulation--the short
interval of time in which sound gradually changes from one phoneme to
another--such that there is no sharp dividing line between one sound and the
next.
This article presents a C++ class library for speech synthesis using the
Windows 3.1 Multimedia API. Every step in the synthesis process is
represented, from parsing English text to the calls into the sound driver.
With this library, you can write a Windows app that generates speech on any
MPC-compatible sound card.


Coarticulation on the Cheap


The usual method of coarticulation is to store a sample (a "diphone") of every
possible pair of sounds and the transition between them. This approach yields
good results, but at a price. For one thing, it requires a lot of samples,
which means a great deal of storage space. Inflection is also an issue. If you
simply sampled the waveform for a phoneme, you could change the pitch by
varying the playback rate of the stored waveform. The rate of speech can be
regulated by changing how long you repeat the pattern before going on to the
next phoneme. Unfortunately, there's no way to separate these two in a stored
diphone. As you increase the rate of playback, both the pitch and rate of
speech increase. Compounding this problem, each sound in the diphone may be at
a different pitch. For instance, consider a word where the first sound is in a
stressed syllable and the second is in a nonstressed syllable occurring just
after the stressed syllable. A stored diphone can't easily change pitch as
it's playing back without complicating the timing issues.
The approach I present here is a cheaper solution. While the results aren't
quite as smooth as those you get with diphones, the overhead is much more
reasonable. I start with a waveform sample for each phoneme. When playing a
phoneme, I play back the sample repeatedly, at the appropriate rate for the
desired pitch, until the desired time has elapsed. This way, pitch and rate of
speech are individually controllable. For the transition between each phoneme,
I insert an in-between sound--an existing phoneme that sounds halfway between
the two phonemes--instead of sampling and storing a table of all possible
in-between sounds. In other words, I'm coarticulating. If the two phonemes are
of a different pitch, I output the bridge waveform at a pitch that is the
average of these two pitches.
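The bridging scheme can be sketched in a few lines. This is a hypothetical illustration, not the article's code: the names (Phoneme, Emitted, bridgeFor, coarticulate) are invented, and bridgeFor() stands in for a real lookup table like the coart[][] array in Listing Three.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative sketch of cheap coarticulation: between two tonal
// phonemes, emit one short "bridge" phoneme at the average of the two
// pitches, then the new phoneme at its own pitch.
using Phoneme = int;

// Placeholder for a real table lookup (cf. coart[][] in Listing Three).
Phoneme bridgeFor(Phoneme a, Phoneme b) { return (a + b) / 2; }

struct Emitted { Phoneme code; int ticks; int pitch; };

// Input: sequence of (phoneme, pitch) pairs. Output: the same sequence
// with a one-tick bridge inserted before every phoneme after the first.
std::vector<Emitted> coarticulate(
        const std::vector<std::pair<Phoneme,int>>& seq, int ticksPerPhoneme) {
    std::vector<Emitted> out;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        if (i > 0) {
            Phoneme br = bridgeFor(seq[i-1].first, seq[i].first);
            int avgPitch = (seq[i-1].second + seq[i].second) / 2;
            out.push_back({br, 1, avgPitch});  // one-tick bridge sound
            out.push_back({seq[i].first, ticksPerPhoneme - 1, seq[i].second});
        } else {
            out.push_back({seq[i].first, ticksPerPhoneme, seq[i].second});
        }
    }
    return out;
}
```

The key property is that the bridge costs one tick and one table lookup per transition, instead of a stored sample for every phoneme pair.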


Synthesizing Speech


I've incorporated my coarticulation method into a class library that takes a
buffer of ASCII English text as input and outputs intelligible speech. I've
divided the sounds into three categories, each of which is handled
differently. The first category, "tonals," contains all vowel sounds (and some
we don't ordinarily think of as vowels). A tonal has a definite pitch, and the
particular pitch it's played back at depends on the inflection of the voice.
The most obvious coarticulation occurs between two tonals. 
The speech library implements two kinds of inflection--syllable inflection and
phrase inflection. The former is an accent or stress on a particular syllable,
such as the "a" in "tomato," while the latter is the overall pitch pattern of
a sentence (question, exclamation, or ordinary sentence). 
The other two types of sounds, "percussive" and "atonal," are much simpler
than tonal sounds since they don't have any pitch. I don't implement
coarticulation with these because it would be more difficult. Since they're
less common than the tonals, the cost-to-benefit ratio is much less favorable.
A percussive is a short sound, like p or t. An atonal is any sound that has no
tone, but may have an arbitrary duration, such as s or f. How long an atonal
lasts depends on the speed at which the speaker is speaking. A percussive is
always very short, no matter what the rate of speech.
The four layers of the code are encapsulated into respective classes; see
Listings One and Two. These classes are unusual because each can have
only one instance--there's no need for more than one. Therefore, I declare all
members and methods static, which results in smaller and more efficient object
code. The top level, SPEECH_READER, takes an ASCII text, translates it into a
phoneme string and inflection data, and passes this to the next level,
SPEECH_PHRASER. SPEECH_PHRASER parses the phoneme string and applies the
inflection and coarticulation, using primitives to generate each sound
individually. These primitives are entry points to the third layer,
SPEECH_SOUNDER, which synthesizes a given moment of speech. The samples for
all the phonemes are in this module. WAVEPLAYER, available electronically (see
"Availability," page 3), is a front end to the Windows waveform API. 


SPEECH_SOUNDER


SPEECH_SOUNDER, the low-level class of the speech-synthesis engine, contains
all the sample tables for the various sounds and is rather large. It's split
among several modules, including SPEECLIB.CPP (see Listing Three),
SPEEATON.CPP (for atonals), SPEECTON.CPP (tonals), and SPEEPERC.CPP
(percussives), all of which are available electronically. The application can
set SPEECH_SOUNDER's speech rate by changing the value of static member
svoicespeed. SPEECH_SOUNDER uses a quantum unit of time called a "tick," and
svoicespeed controls the length of the tick. A tick is svoicespeed/11025
seconds long.
The sTonal() method, which plays all tonal sounds, takes three parameters:
_code is an index into a table of sampled tonal sounds (see Table 1); _ticks
is the duration of the sound in ticks; and _pitch is the pitch of the sound,
where a higher number means a higher pitch. Note that SPEECH_SOUNDER has no
knowledge of coarticulation or inflection. These are implemented in
SPEECH_PHRASER. SPEECH_SOUNDER's sTonal() method repeatedly plays the sample
waveform, until the time specified in ticks expires. However, it doesn't play
the waveform samples consecutively. It may skip or repeat samples as needed to
give the desired pitch.
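The skip-or-repeat idea can be shown with a phase accumulator: step through the stored one-cycle sample table with a fractional increment, so a ratio above 1.0 skips samples (higher pitch) and a ratio below 1.0 repeats them (lower pitch). This is a simplified stand-in for sTonal(), not the article's implementation; the names here are invented.

```cpp
#include <cstddef>
#include <vector>

// Replay a one-cycle sample table at an arbitrary pitch by stepping
// through it with a fractional phase increment. pitchRatio = 1.0 plays
// the table at its recorded pitch; 2.0 plays an octave higher by
// skipping every other sample; 0.5 plays an octave lower by repeating
// each sample. Output length is fixed, so pitch and duration stay
// independently controllable -- the property diphones lack.
std::vector<unsigned char> playTonal(const std::vector<unsigned char>& cycle,
                                     double pitchRatio,
                                     std::size_t outSamples) {
    std::vector<unsigned char> out;
    double phase = 0.0;
    for (std::size_t i = 0; i < outSamples; ++i) {
        out.push_back(cycle[static_cast<std::size_t>(phase) % cycle.size()]);
        phase += pitchRatio;  // >1.0 skips samples, <1.0 repeats them
    }
    return out;
}
```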
The sAtonal() method plays all atonal sounds. It takes _code, an index into a
table of atonals, and _tick, the duration. Like sTonal, it repeats the sampled
waveform until the time is up. It's much simpler than sTonal because it
doesn't have to adjust the pitch and can spit out the samples consecutively.
The sPercussive() method takes only _code as an index into a table of
percussives. Since a percussive is of short duration, sPercussive() plays
through the waveform only once. sSilence() inserts a length of silence (that
occurs between words or sentences). It takes one parameter, _tick, to indicate
the duration. sFlush() flushes the output stage (WAVEPLAYER) to prevent buffer
overflow. Because some sound drivers pause between buffers, introducing an
unwanted silence, it's best to call this only when you might want a silence.
For example, sSilence calls sFlush. sFlushMaybe() will flush the buffer if it
is reasonably close to full. Call this whenever an unwanted silence would not
be objectionable, such as between words. Note that real speech doesn't usually
have a full silence between words, but it doesn't hurt to have one.


SPEECH_PHRASER


SPEECH_PHRASER accepts a string of phonemes and a phrase-inflection code. It
parses the string, speaks the phonemes by means of SPEECH_SOUNDER, and
implements coarticulation. SPEECH_PHRASER::sPhrase() is the main method for
this module. The other methods, sTonal(), sAtonal(), sPercussive(), and
sSilence overload the corresponding SPEECH_SOUNDER methods. They implement the
pitch changes that go with inflection, and sTonal() also sticks the
coarticulation in between sounds, all in a way largely transparent to sPhrase.
sPhrase() takes a pointer to the phoneme string and an inflection code. Each
phoneme in the string begins with an uppercase letter, and may also have a
second, lowercase letter (see Table 1). A caret (^) before a phoneme means to
stress that sound, by raising its pitch slightly (syllable inflection). The
inflection code is a single ASCII character indicating how to inflect the
phrase as a whole. It is a punctuation mark and is interpreted just like
punctuation marks in written English. For example, a question mark gives a
rising inflection at the end of the phrase, and a period gives a falling
inflection. An exclamation mark raises the overall pitch at which the phrase
is spoken. A comma, which is for clauses in a sentence, gives no inflection,
but does put a pause at the end of the phrase.
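The phoneme-string grammar sPhrase() consumes can be sketched as a small tokenizer. This is an illustrative analog, not the library's parser; Token and parsePhonemes are invented names, and word spaces are passed through as their own tokens.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Tokenize a phoneme string in the article's notation: each phoneme is
// an uppercase letter plus an optional lowercase second letter, '^'
// stresses the phoneme that follows it, and ' ' separates words.
struct Token { std::string phoneme; bool stressed; };

std::vector<Token> parsePhonemes(const std::string& s) {
    std::vector<Token> out;
    bool stress = false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '^') { stress = true; continue; }   // stress marker
        if (s[i] == ' ') { out.push_back({" ", false}); continue; }
        std::string p(1, s[i]);
        if (i + 1 < s.size() &&
            std::islower(static_cast<unsigned char>(s[i + 1])))
            p += s[++i];                                // 2-char phoneme
        out.push_back({p, stress});
        stress = false;                                 // stress is one-shot
    }
    return out;
}
```

Run against a phrase from Listing Two, "W^AhW" yields W, a stressed Ah, and W.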
sPhrase stores the inflection in the static member sinflection and sets up
scount for the convenience of sTonal(). scount is a rough estimate of how
close SPEECH_PHRASER is to the end of the phrase. Finally, sPhrase() applies
the syllable accent to a long tonal (vowel), by adjusting the pitch argument
it passes to sTonal. The #defines TSHORT and TLONG are the number of ticks for
a short and long tonal, respectively. A long tonal is a vowel. A short tonal
is a sound normally thought of as a consonant, such as the "r" sound. 
sTonal() is where the work of inflection and coarticulation goes on. It
derives the right inflection for the current tick from sinflection and scount.
It uses svoicepitch as a baseline pitch and applies inflection to it to derive
the pitch that it passes on to its counterpart routine in SPEECH_SOUNDER.
sTonal() does coarticulation by substituting the in-between sound for the
first tick in every call. It also uses a pitch that is halfway between the
pitch of the previous tonal tick and the pitch it would otherwise use for this
tick. sprevtonal always has the code of the tonal that the previous call to
sTonal() spoke, and sprevpitch has its pitch. coart[] is the lookup table,
indexed by sprevtonal and the currently passed tonal code.
When two percussives occur consecutively, like the s and t in the word stop,
it's difficult to distinguish them unless there's a brief space in between
them. SPEECH_PHRASER's sPercussive() method uses the fPrevwasperc flag to
catch this and insert the brief space when needed.


SPEECH_READER 


SPEECH_READER is the highest-level module of the library. It takes a string of
ASCII text, translates it into phoneme strings, and feeds it to SPEECH_PHRASER
to be said out loud. Its code is in the module SPEEREAD.CPP (see
"Availability," page 3). The entry point sSayText() parses the input text into
clauses by looking for the punctuation marks that mark the end of a clause. It
then feeds each clause in turn to sText2Phonemes(), which translates the
clause into a phoneme string. Finally, it feeds the phoneme string to
SPEECH_PHRASER and loops back for the next clause.

sText2Phonemes() first preprocesses the text, stripping unpronounceable
punctuation and normalizing letter case. It also pads the beginning and end of the string
with spaces to help match against patterns with spaces. The preprocessor
stashes its result into pszTemp. Then we do pattern substitution on pszTemp.
HardRules[] is the table of pronunciation rules. You go through this from
beginning to end, looking for matches and performing substitutions. To ensure
correct precedence, this table should be sorted with long patterns first. Many
patterns have spaces in them to accommodate the beginnings and ends of words.
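The longest-pattern-first substitution can be sketched as follows. This is an illustrative analog of the HardRules[] scan, with an invented applyRules() and a toy rule set; the real table is far larger and its patterns include word-boundary spaces.

```cpp
#include <string>
#include <utility>
#include <vector>

// Scan the preprocessed text left to right; at each position, try the
// rules in table order and emit the replacement for the first match.
// Because the table is sorted with long patterns first, "th" wins over
// "t" -- the precedence the article requires.
std::string applyRules(
        const std::string& text,
        const std::vector<std::pair<std::string, std::string>>& rules) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        bool matched = false;
        for (const auto& r : rules) {
            if (text.compare(i, r.first.size(), r.first) == 0) {
                out += r.second;        // emit phoneme replacement
                i += r.first.size();    // consume the matched pattern
                matched = true;
                break;
            }
        }
        if (!matched) out += text[i++]; // no rule: copy through unchanged
    }
    return out;
}
```

With a toy table {"th"→"Dh", "t"→"T", "o"→"Oo", "e"→"Ee"}, the input "to the" becomes "TOo DhEe": the "th" in "the" is caught by the longer rule before the bare "t" rule can fire.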


WAVEPLAYER


The WAVEPLAYER class is a refined version of the WAVEPLAYER class described in
my article "Compressing Waveform Audio Files" (Dr. Dobb's Sourcebook of
Multimedia Programming, Winter 1994). This module is not specific to speech
synthesis; it can be used in any application that needs to output digitized
sounds. You feed this class a string of samples, one at a time. The WAVEPLAYER
object handles the buffering. The code calling WAVEPLAYER needn't bother with
any part of the Windows waveform API calls. 


The Blocking Hook


Every Windows application that does time-consuming calls should have a
mechanism to process system messages. Otherwise, all other system activity
could come to a halt while the operation goes on. Since this can be annoying,
you'll want a Cancel button to halt it. However, Cancel buttons won't work
unless the app is dispatching messages. In WAVEPLAY.CPP, I've created a
"blocking hook" mechanism that has two functions: It processes messages to
allow other apps to run, and it detects a user cancel. When the user selects a
Cancel button, it returns True to the caller; otherwise, it returns False.
While waiting for the system to accept more output, the WAVEPLAYER library
from time to time calls a blocking-hook routine. The pointer to the function
is in sfnBlocking, so the user can change it if necessary. If the user leaves
this pointer alone, there is a default blocking hook, sDfltBlocking(), which
will work in most cases. The blocking hook is called through sDoBlocking().
This routine also sets a flag, sfBlocking, to indicate when the blocking
function is in progress. Since the blocking hook dispatches control messages,
it could also dispatch a message that leads to the app using the WAVEPLAYER.
But if the WAVEPLAYER is already in use, we have a reentrancy problem. The
main program can consult this flag to avoid reentrancy. 
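The hook-plus-flag pattern can be shown in a platform-neutral sketch. The real code dispatches Windows messages inside the hook; here a plain callback stands in for that, and the names (Player, doBlocking, inHook) are invented analogs of sfnBlocking, sDoBlocking(), and sfBlocking.

```cpp
#include <functional>

// Sketch of the WAVEPLAYER blocking-hook mechanism: a replaceable
// callback runs while output buffers drain, a flag guards against
// re-entering the player from work the hook dispatches, and a TRUE
// return from the hook means "user cancelled".
struct Player {
    std::function<bool()> blockingHook = [] { return false; };  // default hook
    bool inHook = false;          // like sfBlocking: hook in progress
    bool userCancelled = false;   // like sfUserCancelled

    // Called periodically while waiting for the system to accept output.
    bool doBlocking() {
        if (inHook) return userCancelled;   // reentrancy guard
        inHook = true;
        if (blockingHook()) userCancelled = true;
        inHook = false;
        return userCancelled;
    }
};
```

A caller that might be reached from inside the hook checks inHook first and bails out, exactly as PhonemeDoIt() in Listing Two checks sfBlocking before speaking.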


Conclusion


You might consider adding longer samples. In my experience, a longer sample
usually yields better quality and a less robotic-sounding voice. I've kept my
samples short as an economy measure. You may also want to provide the ability
to add pronunciation rules on the fly. A program would look in any .INI file
for rules to add and insert them into the rules list at the proper points.
Either way, this library should serve as a good starting point for an
affordable, all-software, speech-synthesis solution. All that is needed is a
user interface.
Table 1: Phonemes and their codes. (a) Tonal (usually long); (b) tonal
(usually short); (c) atonal; (d) percussive.
(a) Tonal (usually long):   00 Ah; 01 Aw; 02 Ee; 03 Oo

(b) Tonal (usually short):  04 A (cat); 05 Eh (short E); 06 Ih (it);
    07 Uh (short U); 08 Ue (foot); 09 R; 10 Z; 11 Jh (g in fudge);
    12 Zh (pleasure); 13 M; 14 N; 15 Ng; 16 V; 17 Dh (this); 18 L;
    19 Oe (long O)

(c) Atonal:     00 S; 01 CH; 02 SH; 03 F; 04 H; 05 TH

(d) Percussive: 00 T; 01 K; 02 B; 03 P; 04 D; 05 G

Listing One

//***************************** SPEECLIB.H **********************************
// Header file for SPEECH library. Copyright (c) 1994 by Neil G. Rowland, Jr.

#ifndef __SPEECLIB_H
#define __SPEECLIB_H
extern "C"
 {
 #include <windows.h>
 #include <mmsystem.h>
 }
extern int WavelibErrno; // last error code in WAVELIB
void WavelibErrorBox(); // report last error to user.
typedef struct
 { // Play waveform output

 static HWAVEOUT shwaveout;
 static HANDLE shHdr;
 static LPWAVEHDR slpHdr;
 static HANDLE shBuf; // current waveform buffer.
 // blocking function...
 static BOOL (*sfnBlocking)(); // return TRUE to cancel output.
 static BOOL sfBlocking; // TRUE when in blocking function.
 static BOOL sfUserCancelled; // set when we see IDCANCEL.
 static BOOL sDfltBlocking(); // default blocking function;
 // accum buffer for feeding in a sample at a time...
 static HANDLE shBufS;
 static LPSTR slpBufS;
 static unsigned scountS;
 static BOOL sOpen(PCMWAVEFORMAT* _pFmt);
 static void sClose();
 static void sPlaySample(WORD _sample);
 static void sFlush();
 protected:
 static void sDoBlocking();
 static inline void iCloseoutBuffer();
 static inline void iCloseoutSampleBuffer();
 static inline void iPlay(MMIOINFO* _pInfo);
 static inline void iPlay(HANDLE _hbuf, LPSTR _lpBuf, int _len);
 }
WAVEPLAYER;
//+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
#define MAXTONAL 19
#define MAXATONAL 5
#define MAXPERC 5
class SPEECH_SOUNDER {
 public:
 SPEECH_SOUNDER(); ~SPEECH_SOUNDER();
 static WAVEPLAYER sOut;
 static unsigned svoicespeed; // in quanta (1/11025sec) per tick.
 static void sFlush();
 static void sFlushMaybe();
 static void sSilence(BYTE _ticks);
 static void sPercussive(BYTE _code);
 static void sAtonal(BYTE _code, BYTE _ticks);
 static void sTonal(BYTE _code, BYTE _ticks, BYTE _pitch);
 };
class SPEECH_PHRASER : public SPEECH_SOUNDER {
 public:
 static BYTE svoicepitch; // overall pitch of voice
 static void sPhrase(const char* _pszPhon, char _inflection);
 static void sSilence(BYTE _ticks); // overload
 static void sTonal(BYTE _code, BYTE _ticks, BYTE _pitch); // overload
 static void sPercussive(BYTE _code); // overload
 static void sAtonal(BYTE _code, BYTE _ticks); // overload
 protected:
 static char sinflection;
 static int scount; // of chars to go in current phrase
 static int sinf; // inflection differential
 static BYTE sprevtonal; // used in co-articulation
 static BYTE sprevpitch; // used in co-articulation
 static BOOL fPrevwasperc; // for detecting consecutive percussives.
 };
// This is the phrase that read English text...
class SPEECH_READER : public SPEECH_PHRASER {

 public:
 static void sText2Phonemes(char* _pszPhon,int _lenPhon,const char* _pszText);
 static void sSayText(const char* _pszText);
 static void sSayCliptext();
 };
#pragma hdrstop
#endif

Listing Two

//*************************** SPEECH.CPP **********************************
// Main app for SPEECH speech synthesizer. Copyright (c) 1994 Neil Rowland, Jr.
extern "C" {
 #include <math.h>
 #include <stdlib.h>
 #include <string.h>
 }
#include "speeclib.h"
#define IDM_ABOUT 11 // menu items

//***************************************************************************
int PASCAL WinMain(HANDLE hInst, HANDLE hPrev, LPSTR lpszCmdLine, int iCmdShow);
BOOL FAR PASCAL AboutDlgProc(HWND hwnd, unsigned wMsg, WORD wParam, LONG lParam);
BOOL FAR PASCAL MainDlgProc(HWND hwnd,unsigned wMsg,WORD wParam, LONG lParam);

//****************************************************************************
char gszAppName[] = "Speech demo"; // for title bar, etc.
HANDLE ghInst; // app's instance handle
void PhonemeDoIt()
 {
 if (SPEECH_PHRASER::sOut.sfBlocking) return; // re-entrancy.
 SPEECH_PHRASER T;
 WORD code, pitch;
 // Test out the SPEECH_PHRASER stage...
 T.svoicespeed = 350;
 T.svoicepitch = 70;
 for (int c=0; c<2; c++) {
 T.sSilence(4);
 T.sPhrase("W^AhW", '!');
 T.sPhrase("Uh T^AwLKEeNg KUhMPY^OoTEr", '?');
 T.sPhrase("IZ DhIS IMPR^ESIV", ',');
 T.sPhrase("OeR W^AhT", '.');
 T.svoicespeed += 40;
 T.svoicepitch -= 5;
 }
 T.sPhrase("EeN^UhF AwLR^EhDEe", '.');
 }
void TextDoIt()
 {
 if (SPEECH_PHRASER::sOut.sfBlocking) return; // re-entrancy.
 SPEECH_READER T;
 // Now take SPEECH_READER out for a spin...
 T.svoicespeed = 350;
 T.svoicepitch = 70;
 T.sSayText("This is a test of the Emergency Broadcasting System. "
 "This is only a test, so calm down already.");
 }
void ClipDoIt()
 {

 if (SPEECH_PHRASER::sOut.sfBlocking) return; // re-entrancy.
 SPEECH_READER T;
 // Now take SPEECH_READER out for a spin...
 T.svoicespeed = 350;
 T.svoicepitch = 70;
 T.sSayCliptext();
 }
//****************************************************************************
int PASCAL WinMain(HANDLE hInst, HANDLE hPrev, LPSTR lpszCmdLine, int iCmdShow)
 {
 FARPROC fpfn;
 HWND hwd;
 MSG msg;
 // Save instance handle for dialog boxes.
 ghInst = hInst;
 // Display our dialog box.
 fpfn = MakeProcInstance((FARPROC) MainDlgProc, ghInst);
 if (!fpfn) goto erret;
 hwd = CreateDialog(ghInst, "LOWPASSBOX", NULL, fpfn);
 if (!hwd) goto erret;
 ShowWindow(hwd, TRUE); UpdateWindow(hwd);
 while (GetMessage(&msg, NULL, 0, 0))
 if (!IsDialogMessage(hwd, &msg))
 DispatchMessage(&msg);
 DestroyWindow(hwd);
 FreeProcInstance(fpfn);
 return TRUE;
 erret:
 MessageBeep(0);
 return FALSE;
 }
// MainDlgProc - Dialog procedure function.
BOOL FAR PASCAL MainDlgProc(HWND hWnd, unsigned wMsg, WORD wParam, LONG lParam)
 {
 FARPROC fpfn;
 HMENU hmenuSystem; // system menu
 HCURSOR ghcurSave; // previous cursor
 switch (wMsg) {
 case WM_INITDIALOG:
 // Append "About" menu item to system menu.
 hmenuSystem = GetSystemMenu(hWnd, FALSE);
 AppendMenu(hmenuSystem, MF_SEPARATOR, 0, NULL);
 AppendMenu(hmenuSystem, MF_STRING, IDM_ABOUT,
 "&About LowPass...");
 return TRUE;
 case WM_COMMAND:
 switch (wParam) {
 case IDOK: // Phoneme demo
 // Set "busy" cursor, filter input file, restore cursor.
 ghcurSave = SetCursor(LoadCursor(NULL, IDC_WAIT));
 PhonemeDoIt();
 SetCursor(ghcurSave);
 break;
 case 3: // Text demo
 // Set "busy" cursor, filter input file, restore cursor.
 ghcurSave = SetCursor(LoadCursor(NULL, IDC_WAIT));
 TextDoIt();
 SetCursor(ghcurSave);
 break;

 case 4: // Clipboard demo
 // Set "busy" cursor, filter input file, restore cursor.
 ghcurSave = SetCursor(LoadCursor(NULL, IDC_WAIT));
 ClipDoIt();
 SetCursor(ghcurSave);
 break;
 case IDCANCEL: // "Shut Up"
 SPEECH_SOUNDER::sOut.sfUserCancelled = TRUE;
 break;
 case WM_USER+IDCANCEL: // "Done"
 SPEECH_SOUNDER::sOut.sfUserCancelled = TRUE;
 PostQuitMessage(0);
 break;
 }
 break;
 }
 return FALSE;
}



Listing Three

//***************************** SPEECLIB.CPP *****************************
// SPEECH speech synthesizer library. Copyright (c) 1994 by Neil G. Rowland, Jr.
extern "C" {
 #include <math.h>
 #include <stdlib.h>
 #include <string.h>
 }
#include "speeclib.h"

//****************************************************************************
BOOL fInited = FALSE;
//*************************** SPEECH_SOUNDER ***********************
WAVEPLAYER SPEECH_SOUNDER::sOut;
unsigned SPEECH_SOUNDER::svoicespeed = 500;
//------------------------------------------------------------------------------
SPEECH_SOUNDER::SPEECH_SOUNDER()
 {
 static PCMWAVEFORMAT Fmt = {{WAVE_FORMAT_PCM, 1, 11025L, 11025L, 1},8};
 if (fInited) return;
 if (sOut.sfBlocking) return;
 if (!sOut.sOpen(&Fmt)) { WavelibErrorBox(); return; }
 fInited = TRUE;
 }
SPEECH_SOUNDER::~SPEECH_SOUNDER()
 {
 if (!fInited) return;
 sOut.sClose();
 fInited = FALSE;
 }
//****************************************************************************
void SPEECH_SOUNDER::sFlush()
 { sOut.sFlush(); }
void SPEECH_SOUNDER::sFlushMaybe()
 { // Here for between words...
 // Sometimes a flush leads to an audible pause. Put between words.
 if (sOut.scountS > 15000)

 sOut.sFlush();
 }
void SPEECH_SOUNDER::sSilence(BYTE _ticks)
 {
 unsigned t; 
 unsigned len = svoicespeed * _ticks;
 if (_ticks > 1) sOut.sFlush();
 for (t=0; t < len; t++)
 sOut.sPlaySample(0x8000);
 if (_ticks > 1) sOut.sFlush();
 }
//*************************** SPEECH_PHRASER ***************************
BYTE SPEECH_PHRASER::svoicepitch = 30;
char SPEECH_PHRASER::sinflection = '.';
int SPEECH_PHRASER::scount = 0;
int SPEECH_PHRASER::sinf = 0;
BYTE SPEECH_PHRASER::sprevtonal = 255; // 255 means no immed. prev tonal
BYTE SPEECH_PHRASER::sprevpitch = 30;
BOOL SPEECH_PHRASER::fPrevwasperc = FALSE; 
 // for detecting consecutive percussives.
//------------------------------------------------------------------------------
void SPEECH_PHRASER::sPhrase(const char* _szPhon, char _inflection)
 {
 if (!fInited) return;
 if (!_szPhon) return;
 if (sOut.sfBlocking) return; // re-entrancy.
 #define TSHORT 3 // ticks for a short sound
 #define TLONG 5 // ticks for a long (vowel) sound
 int accent = svoicepitch/17; // how much pitch rises on an accent;
 char cPhon[2];
 const char* szPhon = _szPhon;
 char cnext;
 int pitchbase = svoicepitch;
 BOOL fLongTonal = FALSE;
 BOOL fAccent = FALSE; // whether current vowel is being stressed
 int longtonal;
 // Init variables for benefit of sTonal...
 sinflection = _inflection;
 sprevtonal = 255;
 sprevpitch = svoicepitch;
 fPrevwasperc = FALSE;
 scount = strlen(_szPhon);
 if (accent < 1) accent = 1;
 while (*szPhon && !sOut.sfUserCancelled) { // Parse the phoneme string...
 cPhon[1] = '\0';
 cPhon[0] = *szPhon;
 szPhon++;
 cnext = *szPhon;
 if (cnext>='a' && cnext<='z') { // part 2 of a 2-char phoneme
 cPhon[1] = cnext;
 szPhon++; scount--;
 }
 // Now look it up...
 switch (cPhon[0]) {
 case ' ': sSilence(1); sFlushMaybe(); break; 
 // space between words
 case '^': fAccent = TRUE; break;
 case 'A':
 switch (cPhon[1]) {

 case 'h': longtonal=0; fLongTonal=TRUE; break;
 case 'w': longtonal=1; fLongTonal=TRUE; break;
 default: sTonal(4,TSHORT,pitchbase); break;
 }
 break;
 case 'B': sPercussive(2); break;
 case 'C':
 if (cPhon[1]=='h') sAtonal(1, 1); else sPercussive(1); break;
 case 'D':
 if (cPhon[1]=='h') sTonal(17,TSHORT,pitchbase); 
 else sPercussive(4); break;
 case 'E':
 switch (cPhon[1]) {
 case 'e': longtonal=2; fLongTonal=TRUE; break;
 case 'r': longtonal=9; fLongTonal=TRUE; break;
 default: sTonal(5,TSHORT,pitchbase); break;
 }
 break;
 case 'F': sAtonal(3, 1); break;
 case 'G': sPercussive(5); break;
 case 'H': sAtonal(4, 1); break;
 case 'I': longtonal=6; fLongTonal=TRUE; break;
 case 'J': sTonal(11,TSHORT,pitchbase); break;
 case 'K': sPercussive(1); break;
 case 'L': sTonal(18,TSHORT,pitchbase); break;
 case 'M': sTonal(13,TSHORT,pitchbase); break;
 case 'N': sTonal((cPhon[1]=='g')?15:14, TSHORT, pitchbase); break;
 case 'O': 
 if (cPhon[1]=='o') { longtonal=3; fLongTonal=TRUE; }
 else if (cPhon[1]=='e') { longtonal=19; fLongTonal=TRUE; }
 else sTonal(1, TSHORT,pitchbase); break;
 case 'P': sPercussive(3); break;
 case 'R': sTonal(9,TSHORT,pitchbase); break;
 case 'S': sAtonal((cPhon[1]=='h')? 2:0, 1); break;
 case 'T':
 if (cPhon[1]=='h') sAtonal(5, 1); else sPercussive(0); break;
 case 'U':
 if (cPhon[1]=='e') sTonal(8,TSHORT,pitchbase); 
 else sTonal(7,TSHORT,pitchbase); break;
 case 'V': sTonal(16,TSHORT,pitchbase); break;
 case 'W': sTonal(3,TSHORT,pitchbase); break;
 case 'X': break;
 case 'Y': sTonal(2,TSHORT,pitchbase); break;
 case 'Z': sTonal((cPhon[1]=='h')?12:10, TSHORT, pitchbase); break;
 }
 if (fLongTonal) { // play long tonal, applying any (syllable) accent.
 sTonal(longtonal, 1, pitchbase + (fAccent?accent/2:0));
 sTonal(longtonal, TLONG-2, pitchbase + (fAccent?accent:0));
 sTonal(longtonal, 1, pitchbase + (fAccent?accent/2:0));
 fLongTonal = FALSE; fAccent = FALSE;
 }
 scount--;
 }
 sSilence(TLONG);
 }
static BYTE coart[MAXTONAL+1][MAXTONAL+1] =
 { // coarticulation table... code //
// Ah Aw Ee Oo A Eh Ih Uh Ue R Z Jh Zh M N Ng V Dh L Oe // prevtonal
 0, 5, 5, 8, 4, 5, 5, 1, 8, 8, 10, 11, 12, 8, 8, 5, 5, 5, 3, 7, // 0 = Ah

 5, 1, 5, 3, 4, 5, 5, 8, 8, 8, 10, 11, 12, 8, 8, 14, 16, 17, 3, 7, // 1=Aw
 5, 5, 2, 7, 4, 5, 5, 5, 8, 8, 10, 11, 12, 8, 8, 14, 16, 17, 6, 8, // 2=Ee
 1, 1, 5, 3, 4, 5, 8, 8, 8, 9, 10, 11, 12,13, 14, 14, 5, 17, 7, 7, // 3=Oo
 4, 1, 2, 3, 4, 5, 6, 8, 8, 8, 10, 11, 12, 8, 8, 14, 16, 17, 3, 5, // 4=A
 5, 1, 2, 3, 4, 5, 6, 8, 8, 8, 10, 11, 12, 8, 8, 14, 16, 17,18, 6, // 5=Eh
 5, 5, 2, 3, 5, 5, 6, 5, 8, 8, 10, 11, 12, 5, 5, 2, 5, 5, 5, 7, // 6=Ih
 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,13, 14, 14, 16, 17, 8, 19, // 7=Uh
 8, 1, 2, 3, 4, 5, 6, 7, 8, 3, 10, 11, 12, 8, 8, 14, 16, 17,18, 19, // 8=Ue
 5, 7, 5, 3, 6, 5, 2, 8, 3, 9, 10, 11, 12, 8, 8, 14, 16, 17,18, 8, // 9=R
 0, 1, 2, 3, 4, 5, 6, 5, 8, 9, 10, 11, 12, 8, 8, 14, 16, 17,18, 19, // 10=Z
 0, 1, 2, 3, 4, 5, 6, 5, 8, 9, 10, 11, 12, 8, 8, 14, 16, 17,18, 19, // 11=Jh
 0, 1, 2, 3, 4, 5, 6, 5, 8, 9, 10, 11, 12, 8, 8, 14, 16, 17,18, 19, // 12=Zh
 7, 0, 7, 7, 5, 5, 6, 8, 8, 9, 10, 11, 12,13, 14, 14, 16, 17,18, 8, // 13=M
 7, 0, 7, 7, 5, 5, 6, 8, 8, 9, 10, 11, 12,13, 14, 14, 16, 17,18, 8, // 14=N
 14, 1, 5, 3, 4, 5, 5, 8, 8, 9, 10, 11, 12,13, 14, 15, 16, 17,18, 19, // 15=Ng
 7, 1, 2, 3, 4, 5, 6, 5, 8, 9, 10, 11, 12, 8, 14, 14, 16, 17,18, 19, // 16=V
 5, 1, 2, 3, 4, 5, 6, 5, 8, 9, 10, 11, 12, 8, 14, 14, 16, 17,18, 7, // 17=Dh
 0, 1, 5, 7, 4, 5, 5, 7, 7, 8, 10, 11, 12,13, 14, 14, 16, 17,18, 19, // 18=L
 8, 8, 8, 7, 5, 5, 5, 8, 8, 8, 10, 11, 12, 3, 3, 7, 8, 8,18, 19 // 19=Oe
 };
void SPEECH_PHRASER::sTonal(BYTE _code, BYTE _ticks, BYTE _pitch)
 { // sTonal, applying end-of-sentence inflection and co-articulation... 
 BYTE pitch;
 BOOL fFirstTick = TRUE;
 if (sinflection == '!') _pitch += svoicepitch/20; // talk excited.
 if (scount > 4) sinf = 0;
 else if (sinf == 0) sinf = 1; // start phrase inflecting.
 while (_ticks--) {
 pitch = _pitch;
 if (sinf) {
 switch (sinflection) {
 case '?': pitch += sinf; break; // rise on question.
 case '.': case '!': pitch -= sinf; break;
 default: break;
 }
 sinf++;
 }
 if (fFirstTick && sprevtonal!=255) { // co-articulate
 SPEECH_SOUNDER::sTonal(coart[sprevtonal][_code], 1, 
 (pitch+sprevpitch)/2);
 fFirstTick = FALSE;
 }
 else SPEECH_SOUNDER::sTonal(_code, 1, pitch);
 }
 sprevtonal = _code; // for next time.
 sprevpitch = pitch; // "
 fPrevwasperc = FALSE;
 }
void SPEECH_PHRASER::sAtonal(BYTE _code, BYTE _ticks)
 { // overload, applying coarticulation...
 SPEECH_SOUNDER::sAtonal(_code, _ticks);
 sprevtonal = 255;
 fPrevwasperc = FALSE;
 }
void SPEECH_PHRASER::sPercussive(BYTE _code)
 { // overload, applying coarticulation...
 if (fPrevwasperc) sSilence(1); // keep consecutive percs distinct
 SPEECH_SOUNDER::sPercussive(_code);
 sprevtonal = 255;

 fPrevwasperc = TRUE;
 }
void SPEECH_PHRASER::sSilence(BYTE _ticks)
 { // overload, applying coarticulation...
 SPEECH_SOUNDER::sSilence(_ticks);
 sprevtonal = 255;
 }






August, 1994
PROGRAMMING PARADIGMS


Buck and the Preacher




Michael Swaine


The application model is not dead yet. But it is in intensive care, and the
heirs are getting impatient. One man who has made a career of impatience,
Steven P. Jobs, resurfaced this summer in a venue that one might think an odd
place to discuss the death of the app: Rolling Stone. Steve had come down from
the mountain to preach the word to the masses, and the word was OOP.
In the 1970s, Steve explained, only the elite could use computers. The
personal-computer revolution changed that. In the 1990s, only the elite can
create software solutions, and object-oriented software development will
change that. Object-oriented software development will bring about a radical
change in the way software is written, used, and distributed; traditional
applications will go away; and blah, blah, blah.
Sorry, but I mean, Steve Jobs is not the first person to make such
predictions, is he? Even as I was writing this month's column, an obscure
newsletter crossed my desk announcing "The Application Is Dead: Long Live the
Business Object." An article in another publication begins, "Everyone knows
that object technology is the future_." (Of course, there are contrarian
voices: BYTE magazine recently announced that OOP has failed, and that we'll
have to rely on Visual Basic. Hmm.)
The point of the BYTE piece, however, was that object-oriented programming
hasn't lived up to the revolutionary predictions. Anybody who has ever
programmed in C++ knows that merely using object-oriented tools doesn't
suddenly change the world. What Steve is talking about is the universal
availability of good reusable software components, a development that
object-oriented tools ought to make not only possible, but real, and that
would indeed change the way software is written, used, and distributed. But,
as the BYTE piece points out, it hasn't happened.


Come the Revelation


Nevertheless, I think Steve's right. At least, if you factor in the spin that
he puts on his arguments in order to make the answer come out NextStep.
Several forces do seem to be leading us toward revolutionary change: The
compelling need among users to focus on the solutions to their problems rather
than learning a new application. The compelling need among developers to
escape from the tyranny of the monster app that does it all. The clear
economic imperative of good reusable software, if we can just figure out how
to make software reusable, and how to know that a piece of reusable software
is any good. The clear economic advantages of rapid development and
responsiveness to user needs that would come with a shift from a market of a
few huge applications to many small components.
Applications are not just monstrous in terms of size; they are beginning to
look a little grotesque with all the gadgets and wires of the life-support
systems they get hooked up to in order to keep them alive one more year. The
monarch is ill. Revolution is in the air.
Steve thinks the winner will be NextStep. IBM and Microsoft have their long-
and short-term plans. Mac fans are optimistic about OpenDoc or vaguely hopeful
about Taligent. But there are people who aren't waiting, like impatient
revolutionary Lee Buck.


Bucking the Trend


I first became aware of Lee's work when I was writing HyperTalk code and
trying to keep a low profile about it so as to maintain some shred of
credibility among real programmers. It turns out that I was not alone: At this
year's Apple Worldwide Developer's Conference, one developer assured Apple
that there were more HyperTalk programmers than even Apple knew about.
Godzillions of closet coders were prototyping apps and even delivering
completed apps to clients using HyperCard, playing every game they knew to
disguise their works as real applications.
Lee Buck was one of them. He founded a consulting company, SDU, in 1988, and
soon he was creating multimedia titles for the likes of National Geographic
and Apple Computer. Lee used HyperCard and found himself developing tools to
make a stack look more like the real thing. One tool, WindowScript, was an
interface builder for HyperCard that let your stacks do
anything--visually--that apps could. It was the ideal prototyper and the
perfect masquerade. (Close to perfect: It didn't do anything for performance.)
The next evolution was away from HyperCard and on to AppleScript. Apple's
system-scripting technology has no interface; Lee's FaceSpan lets you give it
one. More than that, you can use it to create small apps quickly with full
user-interface features. And because it is built on top of Apple's AppleEvents
system, you can build solutions that use system capabilities, or span
applications, or use single applications as functionality toolkits. Because
Apple is bundling FaceSpan with its AppleScript developer package, FaceSpan is
getting a lot of attention.
I talked with Lee in May about FaceSpan, application development, and where
current trends may be leading.
DDJ: FaceSpan is more than just a front end for AppleScript. Since Apple is
bundling it with the development package for AppleScript, it's more like the
official front end for AppleScript. What does it mean to be blessed by Apple?
Buck: For us, it's exciting; it's an opportunity to get a set of tools out to
a larger audience. There's a real opportunity on the Macintosh side to do a
better job of taking the robust functionality and capability that is available
embedded somewhere in all of the wonderful application software out there and
delivering it in a much more focused way to users.
DDJ: For example?
Buck: There are a lot of people out there who use two menu items and three
dialogs of Excel, and that's all they do. I think Apple was right in its
vision of making applications that are smaller, [more] focused to users'
needs.
DDJ: You mean OpenDoc.
Buck: OpenDoc is a really neat technology. I look forward to supporting it
more. At the same time, in the here and now, we can bring some of those
benefits to users today.
DDJ: Benefits like what? OpenDoc is a whole new model of computing.
Buck: Better focused solutions. Ones that meet specific users' needs. The
problem OpenDoc is solving is, in part, that I had to bring along the other 80
percent of Excel in order to let you [use your two menu items and three
dialogs]. If we can help address some of those things now, we don't eliminate
the imperative for OpenDoc, but in the short term we address some of the real,
pressing needs that users have for more-focused solutions.
DDJ: Tell me more about FaceSpan.
Buck: FaceSpan is an interface builder. It is focused on the task of building
applications and the user-interface elements of those applications. It does so
at the expense of some of the other services that you might see in a [product
like] HyperCard. It doesn't have a data-storage model, per se. It doesn't have
strong text-retrieval and searching mechanisms. It doesn't have paint tools.
But it aspires to be a very flexible set of user-interface tools that allows
you to manifest and to create, probably in the next version, 80 to 90 percent
of the functionality of most commercial products.
We always believed that everyone spends far too much time telling a computer
how to do something. We wanted to deliver tools that let people make real
things, things that are indistinguishable from what you might get from using C
and so on, but using much more highly leveraged tools.
DDJ: How is it different from something like HyperCard?
Buck: We are focused on the user experience, and we have AppleScript as the
glue that binds. There is a class of applications that we're good at, there's
a class that we're not. If there's anything that's computationally intensive,
you're going to have to go somewhere else to get that done. One of the
dynamics you find in other environments is that you have to be very mindful of
their limitations when devising a user interface. Part of our desire is to
give you a canvas, the ability to create the right user interface, and then
worry about technical limitations and actually getting it done later. Our
worldview is that you ought to start at the user experience, and then the rest
is just engineering.
DDJ: Where is FaceSpan going?
Buck: There are some possibilities, one of which is that we have the notion of
a component. It's a little run manager that knows how to do things like open a
window, set a window property, close a window. And those are all accessible
from C and Pascal. So one model is that you would write programs in C and
Pascal and just shove off the UI tasks to this API.
DDJ: I understand that you are doing that already.
Buck: We are using our component in that way for some projects. One of the
nice things about that is that we prototype in FaceSpan, get client buyoff on
the screens, and then those are the screens. We don't go make the screens;
those are the screens that get used in the actual product.
DDJ: Is this going to be a product? Or a technology direction?
Buck: I don't know the answer yet, but it sort of makes you go "Hmm."
DDJ: You didn't mention AppleScript at first when you started talking about
FaceSpan. And you characterize it as the glue. How ungluable is it?
Buck: We're OSA compatible, which means that we can plug in any kind of
scripting language. Again, this is not necessarily a product direction, but
with the code-fragment manager, you could imagine making C++ an OSA language,
and it would just sort of work, and it would bind to other segments that
you've got elsewhere in your program. So you could imagine being in FaceSpan
and saying, "I need to write this in C++, please." That would be sort of cool.
DDJ: Of course, one of the things that C and C++ and Pascal programmers have
against HyperCard is its performance, and AppleScript isn't a speed demon. Are
you concerned about performance issues?
Buck: With the advent of PowerPC, when you bring the AppleEvent manager
native, AppleScript native, put a [Power Mac] 620 on everybody's desk--that's
three years out--call it a 5x, 10x improvement over 040. The fact that [an
application] was developed in AppleScript really doesn't matter as much any
more. If the window comes up in one tick or two ticks really doesn't matter to
the user.
DDJ: Most of the first apps to come out on PowerPC aren't getting much more
out of it than speed.

Buck: It reminds me of the old days. Cool, I've got a faster machine, I can
run Lotus quicker. Well, the Macintosh said, No, let's do something with that
horsepower. Let's deliver a better user interface.
Well, we've got more horsepower. Let's do something with that horsepower
instead of just running C faster. There's a whole class of people who have not
sinned sufficiently in life to have to program in C. They shouldn't have to.
They're just trying to get a class of problems done that doesn't have to
perform like a bat out of hell. These are the same people [about whom] we
thought the performance of a Mac 128 was just what they needed ten years ago.
DDJ: I gather that you think there's room for improvement in programming
environments.
Buck: Every time I'm in a language and I hit compile and it comes up and says,
"This variable is not defined," I feel like it's reached out and slapped me
instead of saying, "Shall I define it?" or, "Do you mean this one that is
spelled really close to the one you just typed in?" That's really stupid.
In all of these wonderful studies about productivity and lines of code, I know
there's at least 10 to 40 percent of productivity that we can pick up, crumbs
that are lying around. Instead of reengineering languages, we can start
collecting some of the crumbs.
DDJ: Such as?
Buck: Formatting C. Call me silly; I shouldn't have to spend a lot of time
hitting tab. I just think that's stupid. Different people have different
formatting styles; that's nice. I don't care. The format of your C code isn't
what your engineers should be spending lots and lots of time on.
We just need to treat programmers like users. We need to think about the user
experience [of programming].
DDJ: How about application frameworks?
Buck: I think frameworks are cool, frameworks have a real place, but you're
living in somebody else's house. I think there is at least space for people to
consider another model, which is, "We're a set of really cool services that
can make a whole class of your headaches go away if you want us, but it's
completely under your control."
DDJ: Can you give me an example of a good feature in a programming
environment?
Buck: One of the great innovations of HyperCard that has gone unheralded was
that the programmers spent all of their time in the user's space. They were
looking at a stack. They would go into a script, they would add a little
[code], and they would go back and spend their time in the stack. When you
think about traditional development environments, you spend all your time in
C++ looking at object browsers and tools dedicated to your task--and then five
minutes looking at the program.
And this is bad because you don't inhabit the space that your users are going
to inhabit. [We need to] get tools to a state where I can spend less time in
those environments and more time in the user's space. You don't see it much in
traditional languages--for very good reasons. But there's that underlying
dynamic that I think we need to pay attention to.
Another thing that HyperCard did right was, if you do nothing, you get
something. We tried to do that in FaceSpan. If you open up FaceSpan and create
a new document and do Save as Application, you get a double-clickable
application, it brings up a splash screen, the font menu works, style menu
works, edit menu works, undo works--and then you can incrementally add
behaviors.
In any kind of framework, if you want to deliver that same functionality,
it's, "Let's talk about the class-inheritance structure, and next week you'll
be ready to make a quick something."
DDJ: Let's get back to the software-components revolution, or whatever you
want to call it. Where do you see things heading, in the long run?
Buck: There's an interesting trend. If you look back at the good old days,
there was a monolithic, linear program. And then we had these things called
"subroutines," and we could split the program from one monolithic huge thing
to smaller chunks and have some connections or relationships between them. And
then we moved to object-oriented programming, where we said, "Let's make those
subroutines even smaller still, let's have more sophisticated kinds of
linkages between bits of code." And one way of looking at this is that we're
beginning to put less and less information into the code and more and more
information into the network.
So there are two points: I'm not sure we've really thought enough about a
class of tools that help us understand what information is being encoded in
the network. Object browsers are just access to source code. Those aren't the
answer.
The other interesting thing is, what's the [limit] of this trend? The [limit]
is little bits of code that know almost nothing and where the connection is
everything. Like, for example, oh, a neuron. It knows nothing, but the
connection is everything.
So, in some ways we're finding the path of computing evolving in a way very
close to the biological analog.
DDJ: For which we have no documentation or user interface.
Buck: Yeah. No clues at all. And even in neural nets, I think that they just
sort of go, "Cool, look, it works."








































August, 1994
C PROGRAMMING


Quincy's Debugger




Al Stevens


It's my anniversary again, and the annual C issue. Six years ago this month, I
wrote my first "C Programming" column. It announced my intentions to use the
column as a forum for C code that readers could use and study. Having at the
time about three months' worth of code in reserve, I wondered how soon I would
exhaust my store of ideas. Not soon, I hope.
This month continues the Quincy project by describing the debugger. Quincy is
a C interpreter with a D-Flat front end and an integrated C-language tutorial.
It's the teaching environment for a book I'm writing and an ongoing project
for this column. The program has undergone many changes since I started
writing the book. Quincy has to properly interpret the book's exercises, which
demonstrate most of Standard C. There were times when I was tempted to change
an exercise to avoid a bug in Quincy, but I tried not to yield to those
temptations. The result is a better interpreter. There are still bugs--a
couple that I know about and some, I'm sure, that I haven't seen yet. I'll
hear about them.


The Quincy Debugger


Quincy operates from within an integrated development environment (IDE), which
integrates an editor, a debugger, and a translator. The editor is implemented
as the D-Flat Editor class, which I built specifically for Quincy but made a
part of D-Flat, since it is a general-purpose text-editor class. The
translator is an interpreter that stands alone because I might want to use it
by itself outside of the IDE or integrate it with a different user interface.
The debugger is part of Quincy's application, driven by commands from the
D-Flat CUA interface, and is mostly independent of the interpreter or the fact
that it is debugging C-language source code. It has some D-Flat specific code,
but if I ever port Quincy to another platform, most of the debugger code will
go along. If I ever use the IDE for a different language, the debugger will
port easily enough, too.
A debugger needs cooperation from the executing program, which, in this case,
is the interpreted C program being executed by Quincy's interpreter. When you
run the program outside of the debugger, the debugger must intrude as little
as possible to minimize its overhead. When you are stepping through the
program, the debugger must take control after the execution of every statement
in the program to display the source-code line and let the programmer view the
source code and watch variables. When breakpoints are set, the debugger
observes every line of code as it is executed to see if it is in the
breakpoint list. This means that the executing program reports the execution
of each statement to the debugger, and the debugger associates statements in
the executing program with lines of source code. All of which means that
executable code needs line-number information in order to be debugged.
Quincy's preprocessor strips all of the excess white space from the source
code. But it preserves the program's line-number information by inserting a
line-number comment for each line that has code on it. Quincy's lexical
scanner translates source code into interpretable tokens, and the line-number
comments become line-number tokens. Quincy's interpreter calls a function to
retrieve the next token in the stream. That function bypasses line-number
tokens after storing the current line number globally. That way, the
interpreter and debugger always know what the current line number is. The
debugger needs it to step through the code and sense breakpoints. The
interpreter uses it to report run-time errors.
The debugger consists of three logical parts: executing the program; managing
breakpoints; and viewing, watching, and modifying variables. Execution is the
focus this month. Listing One is debugger.h and Listing Two debugger.c.
Debugger.h declares external and global functions and variables for all three
parts of the debugger, and debugger.c contains the code that operates the
interpreted program from the IDE.
When the user chooses the Run command from the IDE, the RunProgram function is
called. This runs the program without stepping. You can use it to start the
program after some Step commands, after a breakpoint, or after interrupting a
running program with Ctrl+Break. There is a complicated relationship between
these different contexts. If this function finds the Stepping flag set to be
true, the program has already been running and has been interrupted by a step,
breakpoint, or Ctrl+Break. The user has decided to Run uninterrupted from the
current position in the program. The function turns off the Stepping flag and
returns. This sequence finds its way back to the interpreter at the
interrupted position.
If the interpreted program is not stepping, the RunProgram function sets some
flags, hides the IDE so the program can use the screen, and calls
StartProgram. When that function returns, the program has returned from main.
If the Stepping variable is true, the user has chosen Run, the program has hit
a breakpoint, or the user pressed Ctrl+Break and has stepped to the end of the
program. If Stepping is not true, the program was run and terminated without
interruption. In any case, the function displays the IDE.
The StepProgram function is called when the user chooses Step Over or Step
from the IDE. If Stepping is not true, this is the first step into the
program. It sets up the execution and calls StartProgram. If Stepping is true,
this is a subsequent step. The function sets inIDE to false and returns, which
finds its way back to the place in the debugger function where the current
statement was interrupted to wait for the next step.
The StopProgram function is called when the user chooses the Stop command to
stop a program that has not terminated normally. It clears some flags that
manage the running context of the program and sets inIDE to false.
The StartProgram function starts a program running from the Run and Step
commands when the program is not yet running. The function sets up
command-line input/output redirection, builds the command-line parameters from
what the user sets up, and calls qinterpmain to scan, translate, and interpret
the source-code program. When that function returns, the interpreted program
has terminated, either by returning from main, calling exit or abort, or as
the result of the user choosing the Stop command from the IDE.
The interpreter calls the debugger function once for each statement in the
interpreted program. The debugger function returns true to tell the
interpreter to terminate the program or false to continue interpreting. If the
user is not Stepping, no breakpoints are programmed, and the user has not
pressed Ctrl+Break, then the debugger function gives control back to the
interpreter. If the user is stepping, debugger looks at the current line
number to see if it is the same as the last time the interpreter called in.
Several C statements can be on the same line; the interpreter calls in for
each statement. The debugger function, being interested in source-code lines
rather than statements, returns if the statement has already been processed.
Otherwise, it calls DisplaySourceCodeLine twice: first to turn off the step
highlight cursor for the previous line, and then to turn it on for the current
line. Then the debugger calls the RunIDE function. When the RunIDE function
returns, the debugger function returns to the interpreter.
The RunIDE function calls UpdateWatches (to be discussed next month) to query
watched variables and display their values. Then it goes into a D-Flat
message-dispatching loop. This action runs the IDE until the user does
something that clears the inIDE flag, at which time RunIDE returns. 
There are several more small functions in debugger.c. Their comments document
their purposes. At the bottom of the listing are functions to support the
Function List and Function History commands. The user can view a list of the
functions in the program and view the stack of functions as they have been
called from main to the current running function in the interpreted program.
The user can choose an entry from the list, and the IDE will page to the
chosen function and display it in the editor window. Function List and
Function History are implemented by a single dialog box. The difference
between the two is that the function history comes from the run-time table of
functions running, while the function list comes from the table of callable
functions. Neither command is active when the program is not running. This is
because the tables that build them are themselves built by the interpreter,
and they are not available unless the source code has been scanned and
translated.
Next month I'll explain how the debugger manages breakpoints and the Watch
Window.


Cpp: Summation for the Defense


In the "Programmer's Bookshelf" column in this issue, I review Bjarne
Stroustrup's new book, The Design and Evolution of C++. I enjoyed the book and
recommend it to every C++ programmer, but I had a problem with Chapter 18,
Bjarne's brief diatribe on the preprocessor. In the last four pages of the
book, he becomes a bit of a language lawyer, something he usually avoids. His
dislike for the preprocessor is clear. "I didn't like Cpp at all and I still
don't like it.... I'd like to see Cpp abolished." No beating about the bush
there. He is not alone. Many C++ programmers express the same sentiment. I
understand it, but having recently written Quincy's preprocessor and having
used the preprocessor effectively in other projects, I cannot agree to the
extent that I might have otherwise. Understand that these are matters of
opinion, and Bjarne makes it clear that Chapter 18 reflects his own. His
opinions merit attention and carry weight because of the respect that his
achievements have earned him. This response is my view of the same subject.
C++ has language features that obviate some of Cpp's preprocessing directives,
but not all of them. Bjarne stated his intention to provide language
alternatives for as much of Cpp as possible. In most cases he succeeded.
However, there is no replacement for #if, #ifdef, and #ifndef because he
hasn't come up with anything suitable. He has never seen a #pragma that he
liked. I agree with that. The only #pragma that I ever used suppresses
annoying C warning messages and is unnecessary because C++ permits unnamed
parameters in a function header.
Bjarne proposes a C++ include directive that is more restrictive than
#include. It would ignore multiple inclusions of the same header file and
would restrict the effectiveness of macros to the file in which they are
defined. That part of the proposal, which concerns macros, is inconsistent
with the position that macros are replaced by inline functions. If they are
unnecessary, why deal with their scope?
Bjarne's proposed include operator would disable a useful programming idiom if
it ever replaced #include. Despite his good intentions, applications for
#define exist that inline C++ functions do not support and that the proposed
include operator would not support. Consider how D-Flat redefines the ClassDef
macro to use the same initializer list for different purposes. Take a look at
D-Flat's dialbox.h and dialogs.c. The preprocessor is the resource compiler,
and a good one at that. These techniques apply macro definitions that
transcend source files and include header files more than once in different
compile-time conditional contexts. Without #include, these techniques would
not work.
Bjarne prefers that you not have to use Cpp. In his opinion, Cpp does its job
"pretty badly" although "adequately." Furthermore, he dislikes the potential
for programmer abuse. He gives an extreme (and funny) example of a macro coded
on the command line that converts sqrt to rand and return to abort. This is an
effective, albeit unrealistic, point but it is also an inconsistent position.
Bjarne dismisses other potential abuses of the language itself as being
unimportant, saying that a good programming language cannot defend against
intentional malice, and that in the spirit of C++, programmers are trusted to
know what they are doing. Yet, Cpp is unworthy of the same tolerance, and you
cannot be trusted to use it.
That's where I disagree. The preprocessor is only a tool, and, as with unions,
casts, unbounded arrays, unchecked pointers, and every way that a programmer
can subvert the protection mechanisms of the language, good programmers know
better than to abuse it. Careless and irresponsible programmers can use
#define to override the type system if they want to. They can burn down the
data center, too.
Bjarne made his point about Cpp in certain terms, but you don't have to agree
with his opinion. It is a moot point, in any event. Cpp is a source-code
preprocessing program. It reads and emits text, reacting to preprocessing
directives and making macro substitutions. Its output is compilable source
code. It is just a program. They can't make it go away by committee decree.
Even if they remove it from the C and C++ language definitions (which, I
opine, will not happen soon), it remains widely available, and programmers can
and will use it.


"C Programming" Column Source Code


Quincy, D-Flat, and D-Flat++ are available to download from the DDJ Forum on
CompuServe and on the Internet by anonymous ftp. See page 3 for details. If
you cannot get to one of the online sources, send a diskette and a stamped,
addressed mailer to me at Dr. Dobb's Journal, 411 Borel, San Mateo, CA 94402.
I'll send you a copy of the source code. It's free, but if you want to support
my Careware charity, include a dollar for the Brevard County Food Bank.

Listing One 

/* -------- debugger.h ------- */

#ifndef DEBUGGER_H
#define DEBUGGER_H


#define MAXBREAKS 25
#define SIZEWATCH 110
#define MAXARGS 20
#define WATCHHT 2
#define MAXWATCHHT 5
#define MAXWATCHES 25

extern BOOL Running;
extern BOOL Stepping;
extern BOOL CtrlBreaking;
extern BOOL Exiting;
extern WINDOW watchWnd;
extern DBOX ExamineDB;
extern DBOX FunctionStackDB;
extern char *errs[];
extern int LastStep;
extern int BreakpointCount;
extern int DupStdin;
extern int DupStdout;
extern int oldstdin;
extern int oldstdout;
extern char DupFileIn[65];
extern char DupFileOut[65];
extern char ErrorMsg[160];

int isBreakpoint(int);
void AdjustBreakpointsInserting(int, int);
void AdjustBreakpointsDeleting(int, int);
void SetBreakpointColor(WINDOW);
void SetBreakpointCursorColor(WINDOW);
void DisplaySourceLine(int);
void DisplayError(void);
void RunProgram(void);
void StepProgram(void);
void StepOverFunction(void);
void StopProgram(void);
void CommandLine(void);
void ExamineVariable(void);
void ToggleWatch(void);
void AddWatch(char *);
void FunctionStack(void);
void FunctionList(void);
void GetWatch(void);
void ShowWatches(void);
void DelWatch(void);
void DeleteAllWatches(void);
void UpdateWatches(void);
void unBreak(void);
void reBreak(void);

#endif



Listing Two

/* ---------- debugger.c ---------- */

#include "dflat.h"

#include "qnc.h"
#include "debugger.h"
#include "interp.h"

static BOOL StdoutOpened;
static BOOL inProgram;
static BOOL inIDE;
static BOOL Changed;
static BOOL Stopping;
static BOOL FuncStack;
static char CmdLine[61];
static char Commands[61];
static int argc;
static char *args[MAXARGS+1];
static int StuffKey;

BOOL Running;
BOOL Stepping;
int LastStep;
int SteppingOver;
int ErrorCode;
int currx, curry;
char ErrorMsg[160];

static char line[135];

static void CopyRedirectedFilename(char *, char **);
static int StartProgram(void);
static BOOL LineNoOnCurrentPage(int);
static void TurnPageToLineNo(int);
static void TestFocus(void);
static void BuildDisplayLine(int);
static BOOL SourceChanged(int);
static BOOL TestRestart(void);
static void TerminateMessage(int);
static void RunIDE(void);

int DupStdin = -1;
int DupStdout = -1;
int oldstdin;
int oldstdout;
char DupFileIn[65];
char DupFileOut[65];

/* --- run the program from Run command --- */
void RunProgram(void)
{
 int rtn = -1;
 TestFocus();
 if (SourceChanged(F9))
 return;
 Stopping = FALSE;
 if (Stepping) {
 /* --- executed Run after Step or Breakpoint --- */
 OpenStdout();
 inIDE = FALSE;
 Stepping = FALSE;
 return;
 }

 /* --- Running program from start --- */
 StdoutOpened = FALSE;
 HideIDE();
 Changed = editWnd->TextChanged;
 editWnd->TextChanged = FALSE;
 inProgram = FALSE;

 rtn = StartProgram(); /* run the program */

 /* --- returned from running the program --- */
 editWnd->TextChanged = Changed;

 if (Stepping) {
 /* Stepping would be Run/Bkpt/Step, returning here on program exit */
 Stepping = FALSE;
 SteppingOver = FALSE;
 Stopping = FALSE;
 LastStep = 0;
 SendMessage(editWnd, PAINT, 0, 0); /* clear IP cursor */
 }
 else {
 if (ErrorCode==0 && !Exiting && !Stopping && !StuffKey)
 PromptIDE();
 UnHideIDE();
 }
 if (ErrorCode)
 DisplayError();
 else if (!TestRestart())
 TerminateMessage(rtn);
}
/* --- step over the current statement --- */
void StepOverFunction(void)
{
 SteppingOver = 1;
 StepProgram();
}
/* --- execute one statement (Step) --- */
void StepProgram(void)
{
 TestFocus();
 if (SourceChanged(F7))
 return;
 if (!Stepping) {
 /* ---- the first step ---- */
 int rtn = -1;
 Stepping = TRUE;
 Stopping = FALSE;
 StdoutOpened = FALSE;
 LastStep = 0;
 inProgram = FALSE;
 Changed = editWnd->TextChanged;
 editWnd->TextChanged = FALSE;
 rtn = StartProgram(); /* take the step */
 editWnd->TextChanged = Changed;
 Stepping = FALSE;
 LastStep = 0;
 if (!isVisible(applWnd)) {
 UnHideIDE(); /* display the IDE */
 SteppingOver = FALSE;

 }
 if (ErrorCode)
 DisplayError();
 SendMessage(editWnd, PAINT, 0, 0);
 Stopping = FALSE;
 if (!TestRestart())
 TerminateMessage(rtn);
 }
 else
 inIDE = FALSE;
}
/* --- process Stop command --- */
void StopProgram(void)
{
 Stopping = TRUE;
 Stepping = FALSE;
 inProgram = FALSE;
 inIDE = FALSE;
 if (LastStep && LineNoOnCurrentPage(LastStep)) {
 WriteTextLine(editWnd, NULL, LastStep-1, FALSE);
 LastStep = 0;
 }
 TestFocus();
}
/* -- start running the program from Run or Step command -- */
static int StartProgram(void)
{
 int rtn = -1;
 char *cp = CmdLine;
 char *cm = Commands;
 argc = 0;
 args[argc++] = editWnd->extension;
 while (*cp && argc < MAXARGS) {
 while (isspace(*cp))
 cp++;
 if (*cp == '>') {
 CopyRedirectedFilename(DupFileOut, &cp);
 continue;
 }
 else if (*cp == '<') {
 CopyRedirectedFilename(DupFileIn, &cp);
 continue;
 }
 args[argc++] = cm;
 while (*cp && !isspace(*cp))
 *cm++ = *cp++;
 *cm++ = '\0';
 }
 CtrlBreaking = FALSE;
 unBreak();
 if (ErrorCode == 0) {
 Running = TRUE;
 rtn = qinterpmain(editWnd->text, argc, args);
 Running = FALSE;
 }
 reBreak();
 *DupFileIn = *DupFileOut = '\0';
 return rtn;
}

/* --- entry to debugger from interpreter --- */
BOOL debugger(void)
{
 if (SteppingOver) {
 SteppingOver = 0;
 Stepping = TRUE;
 curr_cursor(&currx, &curry);
 UnHideIDE();
 }
 if (inProgram == FALSE) {
 /* -- don't debug the first statement (main call) -- */
 inProgram = TRUE;
 return FALSE;
 }
 if ((CtrlBreaking || Stepping || BreakpointCount) && Ctx.CurrFileno == 0) {
 int lno = Ctx.CurrLineno;
 if (Stepping) {
 if (lno != LastStep) {
 if (LastStep) {
 int ln = LastStep;
 LastStep = 0;
 DisplaySourceLine(ln);
 }
 LastStep = lno;
 DisplaySourceLine(lno);
 SendMessage(editWnd, KEYBOARD_CURSOR, 0, lno-editWnd->wtop-1);
 /* --- let D-Flat run the IDE --- */
 reBreak();
 RunIDE();
 unBreak();
 }
 }
 else if (CtrlBreaking || isBreakpoint(lno)) {
 LastStep = lno;
 Stepping = TRUE;
 UnHideIDE();
 TurnPageToLineNo(lno);
 SendMessage(editWnd, KEYBOARD_CURSOR, 0, lno-1-editWnd->wtop);
 MessageBox(DFlatApplication,
 CtrlBreaking ? errs[CTRLBREAK-1] : "Breakpoint");
 CtrlBreaking = FALSE;
 /* --- let D-Flat run the IDE --- */
 reBreak();
 RunIDE();
 unBreak();
 }
 }
 return Stopping;
}
/* --- copy a redirected filename from command line --- */
static void CopyRedirectedFilename(char *fname, char **cp)
{
 (*cp)++;
 while (isspace(**cp))
 (*cp)++;
 while (**cp && !isspace(**cp))
 *fname++ = *((*cp)++);
 *fname = '\0';
}

/* -- display source code line with bkpt/cursor highlights - */
void DisplaySourceLine(int lno)
{
 TurnPageToLineNo(lno);
 BuildDisplayLine(lno);
 if (isBreakpoint(lno)) {
 if (Stepping && lno == LastStep)
 SetBreakpointCursorColor(editWnd);
 else
 SetBreakpointColor(editWnd);
 }
 else if (Stepping && lno == LastStep)
 SetReverseColor(editWnd);
 else
 SetStandardColor(editWnd);
 PutWindowLine(editWnd, line, 0, lno-1-editWnd->wtop);
}
/* --- build a display line of code ---- */
static void BuildDisplayLine(int lno)
{
 char *tx = TextLine(editWnd, lno-1);
 int wd = min(134, ClientWidth(editWnd));
 char *ln = line;
 char *cp1 = ln;
 char *cp2 = tx+editWnd->wleft;
 while (*cp2 != '\n' && cp1 < ln+wd)
 *cp1++ = *cp2++;
 while (cp1 < ln+wd)
 *cp1++ = ' ';
 *cp1 = '\0';
}
/* ---- prompt user on output screen ---- */
void PromptIDE(void)
{
 printf("\nPress any key to return to Quincy...");
 getkey();
 printf("\r ");
}
/* --- test Run/Step command for source changed --- */
static BOOL SourceChanged(int key)
{
 if (Stepping && editWnd->TextChanged) {
 MessageBox(DFlatApplication, "Source changed, must restart");
 StopProgram();
 StuffKey = key;
 return TRUE;
 }
 return FALSE;
}
/* --- test for restarting with Run or Step command --- */
static BOOL TestRestart(void)
{
 if (StuffKey) {
 SendMessage(editWnd, PAINT, 0, 0);
 PostMessage(editWnd, KEYBOARD, StuffKey, 0);
 StuffKey = 0;
 return TRUE;
 }
 return FALSE;
}
/* --- run the IDE --- */
static void RunIDE(void)
{
 inIDE = TRUE;
 UpdateWatches();
 while (inIDE && dispatch_message())
 ;
}
/* ------------- set command line parameters ------------- */
void CommandLine(void)
{
 InputBox(applWnd, DFlatApplication, "Command line:", CmdLine, 128, 25);
}
/* ---- display program terminated message ---- */
static void TerminateMessage(int rtn)
{
 char msg[50];
 sprintf(msg, "Program terminated: code %d", rtn);
 MessageBox(DFlatApplication, msg);
}
/* --- test if source line is in view ---- */
static BOOL LineNoOnCurrentPage(int lno)
{
 return (lno-1 >= editWnd->wtop &&
 lno-1 < editWnd->wtop+ClientHeight(editWnd));
}
/* ---- display page with specified source code line --- */
static void TurnPageToLineNo(int lno)
{
 if (!LineNoOnCurrentPage(lno)) {
 if ((editWnd->wtop = lno-ClientHeight(editWnd)/2) < 0)
 editWnd->wtop = 0;
 SendMessage(editWnd, PAINT, 0, 0);
 }
}
/* --- make sure that edit window has the focus --- */
static void TestFocus(void)
{
 if (editWnd != inFocus)
 SendMessage(editWnd, SETFOCUS, TRUE, 0);
}
/* --- display an error message when program terminates --- */
void DisplayError(void)
{
 if (!Watching) {
 if (Ctx.CurrFileno == 0) {
 TestFocus();
 TurnPageToLineNo(Ctx.CurrLineno);
 SendMessage(editWnd, KEYBOARD_CURSOR, 
 0, Ctx.CurrLineno-1-editWnd->wtop);
 }
 }
 strcat(ErrorMsg, errs[ErrorCode-1]);
 beep();
 ErrorMessage(ErrorMsg);
 ErrorCode = 0;
}
/* --- hide IDE so program can use output screen --- */
void HideIDE(void)
{
 if (isVisible(applWnd)) {
 SendMessage(applWnd, HIDE_WINDOW, 0, 0);
 SendMessage(NULL, HIDE_MOUSE, 0, 0);
 cursor(currx, curry);
 }
}
/* -- display IDE when program is done with output screen -- */
void UnHideIDE(void)
{
 if (!isVisible(applWnd)) {
 curr_cursor(&currx, &curry);
 SendMessage(applWnd, SHOW_WINDOW, 0, 0);
 SendMessage(NULL, SHOW_MOUSE, 0, 0);
 }
}
/* --- interpreter needs output screen for stdout --- */
void OpenStdout(void)
{
 if (Stepping)
 HideIDE();
 if (!StdoutOpened) {
 StdoutOpened = TRUE;
 putchar('\n');
 }
}
/* --- interpreter is done with output screen for stdout --- */
void CloseStdout(void)
{
 if (Stepping)
 if (!SteppingOver)
 UnHideIDE();
}
/* ---- window proc module for Function Stack DB ------- */
static int FunctionStackProc(WINDOW wnd, MESSAGE msg, PARAM p1, PARAM p2)
{
 FUNCRUNNING *fr = Ctx.Curfunc;
 FUNCTION *fun = FunctionMemory;
 int sel;
 CTLWINDOW *ct =
 FindCommand(wnd->extension,ID_FUNCSTACK,LISTBOX);
 WINDOW lwnd = ct ? ct->wnd : NULL;

 switch (msg) {
 case INITIATE_DIALOG:
 Assert(lwnd != NULL);
 SendMessage(lwnd, CLEARTEXT, 0, 0);
 if (FuncStack) {
 while (fr != NULL) {
 char *fn=FindSymbolName(fr->fvar->symbol);
 if (fn)
 SendMessage(lwnd,ADDTEXT,(PARAM)fn,0);
 fr = fr->fprev;
 }
 }
 else {
 while (fun < NextFunction) {
 if (fun->libcode == 0) {
 char *fn = FindSymbolName(fun->symbol);
 if (fn)
 SendMessage(lwnd,ADDTEXT,(PARAM)fn,0);
 }
 fun++;
 }
 }
 SendMessage(lwnd, PAINT, 0, 0);
 break;
 case COMMAND:
 Assert(lwnd != NULL);
 switch ((int) p1) {
 case ID_FUNCSTACK:
 if ((int) p2 == LB_CHOOSE)
 SendMessage(wnd, COMMAND, ID_OK, 0);
 break;
 case ID_OK:
 if (p2)
 break;
 sel = SendMessage(lwnd,
 LB_CURRENTSELECTION, 0, 0);
 if (FuncStack) {
 while (fr != NULL && sel--)
 fr = fr->fprev;
 fun = fr->fvar;
 }
 else
 while (fun < NextFunction) {
 if (fun->libcode == 0)
 if (sel-- == 0)
 break;
 fun++;
 }
 if (fun->fileno == 0) {
 TurnPageToLineNo(fun->lineno);
 SendMessage(editWnd, KEYBOARD_CURSOR,0,
 fun->lineno-1-editWnd->wtop);
 }
 break;
 default:
 break;
 }
 default:
 break;
 }
 return DefaultWndProc(wnd, msg, p1, p2);
}
/* ---- display the function stack ------- */
void FunctionStack(void)
{
 FuncStack = TRUE;
 FunctionStackDB.dwnd.title = "Function History";
 DialogBox(applWnd, &FunctionStackDB, TRUE, FunctionStackProc);
}
/* ---- display list of functions ------- */
void FunctionList(void)
{
 FuncStack = FALSE;
 FunctionStackDB.dwnd.title = "Function List";
 DialogBox(applWnd, &FunctionStackDB, TRUE, FunctionStackProc);
}




























































August, 1994
ALGORITHM ALLEY


Password Generation by Bloom Filters




William Stallings


William is president of Comp-Comm Consulting of Brewster, MA. He is the author
of over a dozen books on data communications and computer networking,
including Network and Internetwork Security (Prentice-Hall, 1994). He can be
reached at stallings@acm.org.


Introduction 
by Bruce Schneier 
Niklaus Wirth said: "Algorithms+data structures=programs." Every computer
program consists of complex algorithms: to sort the database, compute the
formulas, draw the graphics, and display the data. Often, the difference
between a good program and a bad one is the underlying algorithms.
"Algorithm Alley" explores the design and implementation of algorithms. Every
month I'll present useful algorithms that you can implement today. The
algorithms will cover a variety of areas--computation, graphics, databases,
networking, artificial intelligence, and more--and be relevant to many more
applications. 
The ultimate goal of this column is to help you think about algorithms so you
can develop your own. A craft so varied as programming cannot be taught as a
series of recipes. No matter how many algorithms I present, you're going to
need something else. If I can teach you general principles of algorithms, then
you can take them with you wherever you program.
My first column is about Bloom filters, a method of hashing that greatly
reduces memory requirements at the expense of false "hits." They are useful in
a variety of applications, particularly those in which no calculation is
required if the search is unsuccessful. For example, you might want to check
someone's credit rating or passport number, but do nothing else if the record
doesn't exist. While Bloom filters will occasionally report that a record
exists when it doesn't, they'll never erroneously report that a record doesn't
exist when it does.
Consider a differential file: a separate file of changes to a main database.
Every night, the changes are incorporated into the database. Meanwhile, each
database access must first check the differential file to see if the record of
interest has been modified. A Bloom filter can reduce accesses to the
differential file. Each time a record is updated, you hash the record key with
this technique. Then, whenever you access a record, you check for a hit
against the hash file. If there is no hit, you are guaranteed that the record
was not modified. If there is a hit, you must search the differential file.
How about a hyphenation routine with a general rule and a table of exceptions?
If you don't find the word in the exception hash file, use the general rule.
If there is a hit, search the word database for the particular exception.
Bloom filters can even work as spelling checkers. Occasionally a nonword
"passes," but the dictionary can be stored in far less space than it would be
as individual words.
In this month's column, Bill Stallings uses Bloom filters in a similar
application. Instead of checking for correctly spelled words, however, he uses
Bloom filters to check for easy-to-guess passwords like those that made the
1988 Internet Worm an infamous part of computer lore. As you might guess, such
passwords, which are highly susceptible to computer break-ins, bring smiles to
crackers' faces, and Bill's approach to Bloom filters and computer-generated
passwords should be seriously considered. 
I look forward to hearing from you about the algorithms you find most useful,
algorithms you'd like to find out more about, or those that you've developed
and that you'd like to share with other DDJ readers. You can contact me at
schneier@chinet.com, or through the DDJ offices. 
A system intruder's objective is to gain access to your computer system or to
increase the range of privileges accessible on it. Generally, this requires
that the intruder acquire information that should have been protected, usually
via user passwords. With knowledge of someone else's password, the intruder
can log into a system and exercise all the privileges accorded to the
legitimate user.
Left to their own devices, many users choose a password that is too short or
too easy to guess. However, if users are assigned passwords consisting of,
say, eight randomly selected printable characters, password cracking is
rendered effectively impossible. The problem with this approach is that most
users can't remember such passwords. Fortunately, even if we limit the
password universe to strings of characters that are reasonably memorable, the
size of the universe is still too large to permit practical password cracking.
Our goal, then, should be to eliminate guessable passwords while allowing the
user to select a password that is still memorable. There are four basic
techniques currently in use to enable this:
User education.
Computer-generated passwords.
Reactive password checking. 
Proactive password checking.
Users can be told the importance of using hard-to-guess passwords and can be
provided with guidelines for selecting strong passwords. This user-education
strategy is unlikely to succeed at most locations because many users will
simply ignore the guidelines, while others may not be good judges of what is a
strong password. For example, many users (mistakenly) believe that reversing a
word or capitalizing the last letter makes a password unguessable.
Computer-generated passwords also have problems. If the passwords are random
in nature, users will not be able to remember them. Even if the password is
pronounceable, the user may have difficulty remembering it and so be tempted
to write it down. In general, computer-generated password schemes have a
history of poor acceptance by users.


Reactive Password Checking 


A reactive password-checking strategy is one in which the system periodically
runs its own password cracker to find guessable passwords. The system cancels
any passwords that are guessed and notifies the user. This tactic has a number
of drawbacks. First, it is resource intensive if the job is done right.
Because a determined opponent who is able to steal a password file can devote
hours or even days of full CPU time to the task, an effective reactive
password checker is at a distinct disadvantage. Furthermore, any existing
passwords remain vulnerable until the reactive password checker finds them.


Proactive Password Checking


The most promising approach to improved password security is a proactive
password checker. In this scheme, a user is allowed to select his or her own
password. However, at the time of selection, the system checks to see if the
password is allowable and, if not, rejects it. Such checkers are based on the
philosophy that, with sufficient guidance from the system, users can select
memorable passwords from a fairly large password space that are not likely to
be guessed in a dictionary attack.
The trick with a proactive password checker is to strike a balance between
user acceptability and password strength. If the system rejects too many
passwords, users will complain that it is too hard to select a password. If
the system uses a simple algorithm to define what is acceptable, password
crackers, too, can refine their guessing technique.
The most straightforward approach is a simple system for rule enforcement. For
example, all passwords have to be at least eight characters long, or the first
eight characters must include at least one uppercase letter, one lowercase
letter, one numeral, and one punctuation mark. These rules could be coupled
with advice to the user. Although this approach is superior to simply
educating users, it may not be sufficient to thwart password crackers. This
scheme alerts crackers as to which passwords not to try, but may still make it
possible to do password cracking.
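One way to combine the example rules above is a short C routine. This is only a sketch of the rules as stated in the text (length of at least eight, with the first eight characters covering all four character classes), not code from any actual checker:

```c
#include <ctype.h>

/* Rule-enforcement sketch: the password must be at least eight
   characters long, and its first eight characters must include an
   uppercase letter, a lowercase letter, a digit, and punctuation. */
int RulesOK(const char *pw)
{
    int i, up = 0, lo = 0, dig = 0, punct = 0;
    for (i = 0; i < 8; i++) {
        unsigned char c = (unsigned char)pw[i];
        if (c == '\0')
            return 0;               /* shorter than eight characters */
        if (isupper(c)) up = 1;
        else if (islower(c)) lo = 1;
        else if (isdigit(c)) dig = 1;
        else if (ispunct(c)) punct = 1;
    }
    return up && lo && dig && punct;
}
```

As the text notes, a password that merely satisfies such rules can still be weak; the rules only prune the most obvious choices.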
Another possible procedure is simply to compile a large dictionary of possible
"bad" passwords. When a user selects a password, the system checks to make
sure that it is not on the disapproved list. However, one problem with this
approach is space--the dictionary must be very large to be effective. Another
problem is time--the time required to search a large dictionary may itself be
great. In addition, to check for likely permutations of dictionary words,
either those words must be included (making it truly huge), or each search
must also involve considerable processing.


Bloom Filters


A different approach is based on the use of Bloom filters, a hashing technique
that makes it possible to determine, with high probability, whether a given
word is in a dictionary. The amount of online storage required for the scheme
is considerably less than that required to store an entire dictionary of words
and permutations, and the processing time is minimal. Eugene Spafford and his
colleagues at Purdue have adapted the Bloom filter for proactive password
checking.
A Bloom filter of order k consists of a set of k independent hash functions
H1(x), H2(x), ..., Hk(x), where each function maps a word into a hash value in
the range 0 to N-1. That is, Hi(Xj)=y, where 1 ≤ i ≤ k, 1 ≤ j ≤ D, and
0 ≤ y ≤ N-1; Xj is the jth word in the password dictionary, and D is the
number of words in the password dictionary. This procedure is then applied to
the dictionary: First, a hash table of N bits is defined, with all bits
initially set to 0. Then, for each dictionary word, its k hash values are
calculated, and the corresponding bits in the hash table are set to 1. Thus,
if Hi(Xj)=67 for some (i,j), then the 67th bit of the hash table is set to 1;
if the bit already has the value 1, it remains at 1.
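The two-step procedure can be sketched in C. The hash family here is a single seeded string hash, chosen only for illustration (a production checker needs k genuinely independent functions), and the values of N and k are arbitrary:

```c
#include <limits.h>

#define NBITS  8192U                 /* N: hash-table size in bits */
#define NFUNCS 4U                    /* k: number of hash functions */

static unsigned char table[NBITS / CHAR_BIT];   /* all bits start at 0 */

/* Illustrative stand-in for k independent hash functions: one string
   hash whose seed selects which "function" is being computed. */
static unsigned long hash(const char *word, unsigned seed)
{
    unsigned long h = 5381UL + seed * 7919UL;
    while (*word)
        h = h * 33UL + (unsigned char)*word++;
    return h % NBITS;
}

static void setbit(unsigned long b) { table[b / CHAR_BIT] |= (unsigned char)(1U << (b % CHAR_BIT)); }
static int  getbit(unsigned long b) { return (table[b / CHAR_BIT] >> (b % CHAR_BIT)) & 1; }

/* Step 2: set the k bits for each dictionary word */
void filter_add(const char *word)
{
    unsigned i;
    for (i = 0; i < NFUNCS; i++)
        setbit(hash(word, i));
}

/* Reject only if all k bits are set; a clear bit proves the word is
   absent, so there are no false negatives. */
int filter_reject(const char *word)
{
    unsigned i;
    for (i = 0; i < NFUNCS; i++)
        if (!getbit(hash(word, i)))
            return 0;       /* definitely not in the dictionary */
    return 1;               /* probably in the dictionary */
}
```

Every dictionary word is guaranteed to be rejected; only words outside the dictionary can (occasionally) be rejected in error.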
When a new password is presented to the checker, its k hash values are
calculated (see Figure 1). If all the corresponding bits of the hash table are
equal to 1, then the password is rejected. All passwords in the dictionary
will be rejected. But there will also be some "false positives"--passwords
that are not in the dictionary but that do produce a match in the hash table.
To see this, consider a scheme with two hash functions. Suppose that the
passwords undertaker and hulkhogan are in the dictionary, but XlnsXqtn is not.
Further suppose that:
H1(undertaker)=25

H1(hulkhogan)=275
H1(XlnsXqtn)=665
H2(undertaker)=998
H2(hulkhogan)=665
H2(XlnsXqtn)=998
If the password XlnsXqtn is presented to the system, it will be rejected even
though it is not in the dictionary. If there are too many such false
positives, it will be difficult for users to select passwords. Therefore, you
would like to design the hash scheme to minimize false positives. It can be
shown that the probability of a false positive can be approximated by the
equation in Example 1(a) or, equivalently, Example 1(b), where k=number of
hash functions, N=number of bits in hash table, D=number of words in
dictionary, and R=(N/D), the ratio of hash-table size (bits) to dictionary
size (words).
Figure 2 plots P as a function of R for various values of k. Suppose you have
a dictionary of one million words and wish to have a 0.01 probability of
rejecting a password not in the dictionary. If you choose six hash functions,
the required ratio is R=9.6. Therefore, you need a hash table of 9.6×10^6 bits,
or about 1.2 megabytes of storage. In contrast, storage of the entire
dictionary would require on the order of eight megabytes. Thus, you achieve a
compression factor of almost 7. Furthermore, password checking involves the
straightforward calculation of six hash functions and is independent of the
size of the dictionary, whereas with the use of the full dictionary, there is
substantial searching.


References


Bloom, B. "Space/time Trade-offs in Hash Coding with Allowable Errors."
Communications of the ACM (July 1970).
Spafford, E. "Observing Reusable Password Choices." Proceedings, UNIX Security
Symposium III, September 1992.
----. "OPUS: Preventing Weak Password Choices." Computers and Security (No. 3,
1992).
Stallings, W. Network and Internetwork Security: Principles and Practice.
Englewood Cliffs, NJ: Prentice-Hall, 1994.
Figure 1 Password checking with a Bloom filter.
Figure 2 Performance of Bloom filter.
Example 1 Approximating the probability of a false positive.











































August, 1994
UNDOCUMENTED CORNER


Undocumented OS/2: DosQProcStatus




Troy Folger


Troy is an OS/2 developer for a large company in the retail industry. He can
be reached on CompuServe at 72360,427.


Applications developed for OS/2 2.x may employ multiple threads or processes,
and OS/2 developers have a variety of interprocess communication (IPC)
mechanisms to manage them. It would be helpful if OS/2 apps could examine
external processes, IPC resources, and DLLs in this complex environment, but
IBM does not provide the necessary functionality in OS/2's documented API.
Of course, you can observe external processes or system resources with the
OS/2 command PSTAT. In this article, I'll examine DosQProcStatus (Dos Query
Process Status, or DQPS), the undocumented API that PSTAT uses to obtain
low-level OS/2 system information.
DQPS is a 16-bit API that has existed since OS/2 1.1. As undocumented OS/2
functions go, it is the most well known. A quick search of CompuServe's OS/2
forums reveals a number of DQPS explorers who have made relevant code
available, among them George Brickner, Rick Fishman, Franz Krainer, and Chris
Laforet.
In July 1993, an IBM employee released DOSQPS.ZIP on IBM's CompuServe OS2DF1
Forum (Lib 1). DOSQPS.ZIP includes DOSQPROC.TXT, a description of DQPS, and
DOSQPROC.INF, the same information formatted for the OS/2 VIEW facility.
Although these documents don't focus on PSTAT, they do mention that PSTAT uses
information returned by DQPS. IBM has provided these documents and allowed
them to be circulated freely, but declares that:
...some or all of the interfaces described in this document are unpublished.
IBM reserves the right to change or delete them in future versions of OS/2 at
IBM's sole discretion, without notice to you. IBM does not guarantee that
compatibility of your applications will be maintained with future versions of
OS/2.
This is not an idle threat: DQPS changed significantly between OS/2 1.3 and
2.0. DQPS currently remains "officially undocumented" and is not discussed in
any IBM-provided toolkit, online reference, or sample code, apart from the
file on CompuServe. 
Unfortunately, the information in DOSQPROC.TXT is incomplete and inaccurate.
After examining the DQPS API to learn more about the areas left unexplained
(and to correct the inaccuracies), I wrote the C header file DOSQPROC.H
(available electronically; see "Availability," page 3), which allows anyone
armed with a 32-bit OS/2 compiler capable of calling a 16-bit API and passing
16-bit pointer parameters to fully use DQPS on OS/2 2.x systems. 
I've also written a simple OS/2 text-mode program, PROCINFO.C (also available
electronically), which uses the header file to demonstrate most of the data
supplied by DQPS. PROCINFO shows the process IDs of arbitrary system processes
or the names of their corresponding executables, and allows you to view the
resources in use by those processes. 
The data returned by DQPS can be divided into four major classes: process and
thread records, executable module records, 16-bit system semaphore records,
and named shared-memory segment records. Not every type of system resource is
represented in the information maintained by DQPS; for instance, data
regarding OS/2 pipes, queues, and 32-bit semaphores is absent.


Accessing DQPS


OS/2's DOSCALL1 library provides access to DQPS. DOSQPROC.TXT indicates that
the DQPS entry point is DOSCALL1 ordinal 154, so applications should link with
a module-definition file (.DEF) having the entry DOSQPROCSTATUS=DOSCALL1.154
in the IMPORTS section. 
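A module-definition file carrying this import might look like the following sketch (the module name PROCINFO here is just an example):

```
; PROCINFO.DEF -- import DosQProcStatus by ordinal from DOSCALL1
NAME PROCINFO WINDOWCOMPAT

IMPORTS
        DOSQPROCSTATUS = DOSCALL1.154
```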
Running IBM's EXEHDR utility (using the /VERBOSE option) on OS/2 2.1's
PSTAT.EXE confirms that PSTAT uses DQPS: PSTAT has a relocation record for
DOSCALLS.154. (In the context of this discussion, the DOSCALL1 and DOSCALLS
libraries can be considered equivalent.) 
IBM LAN Server and LAN Requester reveal that FFST/2 (IBM's logging and
diagnostic facility for network users) also uses DQPS: One of the FFST/2 DLLs
(EPWPSI16.DLL) imports DOSCALLS.154. The serviceability and diagnostic aids
that FFST/2 provides to LAN administrators give an idea of the types of
applications that can benefit from DQPS. If nothing else, IBM's own use of
DQPS provides some assurance that the API will be around for a while.
The 16-bit DQPS API takes two parameters: a 16-bit far pointer to a
user-allocated memory buffer and a USHORT containing the length of the buffer.
It returns a 16-bit unsigned value indicating the success or failure of the
call. Figure 1 presents a C-function prototype. (IBM's CSet++ compiler will
automatically thunk the pointer parameters on functions declared with the
APIENTRY16 modifier from 32-bit to 16-bit. However, the current version of
Borland's C++ for OS/2 compiler requires the _far16 keyword in the parameter
declarations.)
DOSQPROC.TXT doesn't enumerate the possible error returns, only mentioning
that a return of zero indicates correct operation, and nonzero indicates an
error. DOSQPROC.TXT also suggests that 64K is the preferred buffer-allocation
size because you cannot accurately predict how much information a DQPS call
will return. 
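Figure 1 is not reproduced here; based on the description in the text, the prototype amounts to something along these lines (IBM CSet++ spelling; under Borland, the buffer parameter would instead be declared VOID _far16 *, and the parameter name usBufLen is my own label):

```
/* 16-bit entry point, resolved as DOSCALL1.154; returns zero on success */
USHORT APIENTRY16 DosQProcStatus(PVOID p16Buf, USHORT usBufLen);
```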
The first parameter to DQPS, p16Buf, is declared as a PVOID (VOID _far16* in
Borland) because the information copied to the passed buffer by the API is not
a single structure, but a collection of several different data types relating
to global data, processes and threads, 16-bit system semaphores, executable
modules, and shared-memory segments (see Figure 2). The buffer is an unwieldy
collection of pointers to arrays of structures and linked lists; to use the
information, you must match the handles found in one data structure to
corresponding handles in other, separate data structures. 
The layout of the data holds a clue as to why this API remains
undocumented--it exposes a broad assortment of information that should be
provided in a more accessible format. Perhaps the complexity of DQPS is also
why PSTAT has been one of the least reliable and most visually crude OS/2
utilities (for example, compare the output from PSTAT /L to the output from
PSTAT /P:xx, where xx is the PID of the first instance of the process
PMSHELL.EXE loaded in the system: PSTAT /L indicates fewer DLLs loaded for
PMSHELL than does PSTAT /P).


Using the DQPS API: PROCINFO.EXE


My PROCINFO.EXE utility shows how to use the contents of the DQPS buffer.
PROCINFO accepts either the ID or the name of an OS/2 process on the command
line and displays information regarding any matching process's usage of system
resources. PROCINFO was developed at the suggestion of an OS/2 developer who
was trying to see if an instance of a given program was already running on an
OS/2 system.
The PROCINFO utility starts by obtaining command-line information, allocating
a buffer of size DQPS_BUFSIZE, and then calling DosQProcStatus. Upon
successful return from the API, p16Buf fills with system-process and resource
information. The first few bytes of the buffer are cast to a pointer to a DQPS
pointer record, a structure that in turn contains pointers to the various
lists of information represented in the buffer. The structure type is named
qsPtrRec_t, and it defines five important members (see the excerpt from
DOSQPROC.H in Listing One), which are 32-bit pointers to the beginning of the
other sections of the buffer.
The first member of qsPtrRec_t, pGlobalRec, points to a "global" data record
containing system-wide information. The next member, pProcRec, points to the
process-data section, the heart of the DQPS API. The pointers p16SemRec,
pShrMemRec, and pLibRec each point to lists of system resources--the 16-bit
system semaphores list, the named shared-memory-segment list, and the
executable-module list. 
DOSQPROC.TXT contains a serious error in its definition of qsPtrRec_t. It
defines only the five significant structure members, omitting the 4-byte value
occurring between the p16SemRec and pShrMemRec members of the structure. This
is a source of confusion for first-time users of DOSQPROC.TXT; an attempt to
use this erroneous type definition will result in an improper initialization
of the pShrMemRec and pLibRec structure members. The type definition of
qsPtrRec_t in Listing One modifies IBM's definition by adding the necessary
new structure member, VOID *Reserved.
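The corrected layout can be sketched as follows. On OS/2 the five significant members are 32-bit flat pointers into the DQPS buffer; uint32_t stands in for them here (an illustrative portability shim, not the DOSQPROC.H declaration) so the 24-byte layout can be checked on any platform:

```c
#include <stdint.h>

/* qsPtrRec_t as corrected in the text: the Reserved member sits
   between p16SemRec and pShrMemRec, a 4-byte gap that the erroneous
   definition in DOSQPROC.TXT omits. */
typedef struct {
    uint32_t pGlobalRec;   /* -> system-wide ("global") data record   */
    uint32_t pProcRec;     /* -> process and thread data records      */
    uint32_t p16SemRec;    /* -> 16-bit system-semaphore list         */
    uint32_t Reserved;     /* the 4 bytes DOSQPROC.TXT leaves out     */
    uint32_t pShrMemRec;   /* -> named shared-memory-segment list     */
    uint32_t pLibRec;      /* -> executable-module list               */
} qsPtrRec_t;
```

Dropping the Reserved member shifts pShrMemRec and pLibRec up by four bytes, which is exactly the misinitialization described above.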


DQPS Process Data


The process data records referenced by the pProcRec member of qsPtrRec_t are
part of a block of information regarding the system processes and threads
active at the time of the call to DQPS. Each process record provides detailed
information about the process's status and lists some types of resources that
it is using. These lists are actually arrays of indexes or handles into the
resource lists found in the other sections of the buffer, allowing the
developer to examine system-resource usage on a per-process basis. Figure 3
shows a sample process that includes two threads, a 16-bit semaphore, and
three loaded DLLs. The process has mapped one shared-memory segment.
Information on threads owned by the process is available via a pointer to an
array of thread data records. The number of thread data records in the array
is given by the thread control-block count member of the process data
structure. 
One of the key uses of DQPS is to determine the name of a process given a PID
or, conversely, the ID of a process given an executable name. Each process has
an associated module table entry in the executable module data-record list,
and the handle (HMTE) of the entry in that list is found in the process data
record. The name of the executable that corresponds to the process data record
can be determined by cross-referencing the process record's module-table
handle with the entries in the executable module data-record list. 
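That cross-reference can be sketched with pared-down records. The field names below are illustrative stand-ins; the real process and module records in DOSQPROC.H carry many more members:

```c
#include <stddef.h>

/* Pared-down illustrative records (not the DOSQPROC.H declarations). */
typedef struct ModRec {
    struct ModRec *pNext;    /* next module record, NULL at end of list */
    unsigned short hMte;     /* module-table entry handle               */
    const char    *pName;    /* full path of the executable             */
} ModRec;

typedef struct {
    unsigned short pid;      /* process ID                              */
    unsigned short hMte;     /* module handle of the process's EXE      */
} ProcRec;

/* Resolve a process's executable name by matching its module handle
   against the executable-module list. Returns NULL if no entry matches. */
const char *ExeNameFromProcess(const ProcRec *proc, const ModRec *modList)
{
    const ModRec *m;
    for (m = modList; m != NULL; m = m->pNext)
        if (m->hMte == proc->hMte)
            return m->pName;
    return NULL;
}
```

Running the same match in the other direction (name to PID) is the mirror image: walk the process records and compare each record's module handle against the entry whose name matched.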
PROCINFO obtains the pointer to the block of process data records from the
DQPS pointer record and performs a quick sanity check to make sure that it
doesn't point to NULL. Depending on the type of request made on the
command-line, PROCINFO will traverse the list of process records looking for
either a particular process ID or a process having a certain name. 
When it finds a process record indicating that a process matches the
command-line specification, PROCINFO calls DisplayProcessInfo to display
system-resource information regarding that process. This information includes:
the full path specification of the executable that the process is running; the
process ID and parent PID; the type and status of the process; and the number
and characteristics of the threads, semaphores, DLLs, and named shared-memory
segments in use by the process. 
In addition to process information, you can also examine the individual
resources a process is using. Sandwiched between the last member of the
process data structure and the first member of the initial thread data
structure may be one or more arrays of indexes or handles into the DQPS
buffer's semaphore, library module, or shared-memory record lists. To
determine the resource associated with a particular process-resource index or
handle, traverse the corresponding resource list and examine successive nodes
until the matching resource is found. PROCINFO performs these operations in
its DisplaySemaphores, DisplayLibraryRecords, and DisplaySharedMemory
functions. 
DQPS can also be used to determine the type or status of external processes.
For instance, the OS/2 DosGetInfoBlocks API reports process type and status
only for the process calling the API. DQPS reports the status and type of all
system processes. 
The type member of qsPrec_t will have the same value that the pib_ultype
member of the PIB struct returned by DosGetInfoBlocks would have if the
corresponding process called DosGetInfoBlocks. I have followed the qsPrec_t
type definition in DOSQPROC.H with macros defining the known process-type
values (see Listing Two).

IBM's DOSQPROC.TXT lists the possible values for the stat member (the process
status flag) of the qsPrec_t struct. This process status flag will have the
same value that the pib_flstatus member of the PIB struct returned by
DosGetInfoBlocks would have if the corresponding process called
DosGetInfoBlocks. I have followed the qsPrec_t type definition in DOSQPROC.H
with macros defining the known process status-flag values (see Listing Two).
Any application using the process-record status flag should treat the value in
the field as a collection of bit flags. PROCINFO utilizes the type and stat
members of qsPrec_t to display additional detail concerning the designated
process.


DQPS Thread Data


If you've written multithreaded or time-critical applications, you know the
importance of managing thread priority in an OS/2 system. Even established
software from large companies is vulnerable to the pitfalls of poor thread
management. Borland's Brief, for example, controls mouse operations through a
low-priority thread. Borland's implementation does not handle thread
starvation, and the result is Brief's erratic mouse performance and its
failure to terminate completely in certain situations. 
DQPS allows developers to easily recognize these situations and respond to
them. With this power comes responsibility--altering the behavior of OS/2's
task scheduler by modifying application thread priorities can have a negative
impact on overall system performance. Of course, DQPS can also be used to
observe system thread priorities, and this is often sufficient for debugging
multithreaded applications. PROCINFO is useful for this purpose; it takes the
information found in DQPS thread data records and displays the characteristics
of a given process's threads.
The pThrdRec member of a process record (qsPrec_t) points to a structure of
type qsTrec_t, the thread data record. The thread data records maintain
information specific to individual threads, including their priorities. Each
process has one or more threads associated with it, and the process's
associated thread records follow the process record in the DQPS buffer.
Accessing thread records is therefore process dependent; you must traverse the
DQPS buffer process by process to access each of the system's thread records.
PROCINFO's DisplayThreads function displays each of the threads associated
with the passed process record, demonstrating how you can dynamically
determine the status and priority of any thread in an OS/2 system.
The thread records following a process record can be referenced as elements in
an array. The first member of a thread record, RecType, is always 0x100, and
is not really useful. The final three bytes of the structure are pad bytes,
probably to provide DWORD alignment.
The slot member of qsTrec_t has the same value that the tib_ordinal member of
the TIB struct returned by DosGetInfoBlocks would have if the thread described
by the qsTrec_t struct were to call DosGetInfoBlocks. The same relationship
exists between the priority member of qsTrec_t and DosGetInfoBlocks' TIB2
member, tib2_ulpri. IBM's DOSQPROC.TXT lists three possible values for the
state member of qsTrec_t; see Listing Two.


DQPS Semaphore and Named Shared-Memory Segment Data


Many OS/2 applications require semaphores or shared memory. PROCINFO reveals
how you can use DQPS to track the names of, and references to, 16-bit system
semaphores and named shared-memory segments.
The 16-bit system semaphore-detail data records (qsS16rec_t) and named
shared-memory-segment data records (qsMrec_t) are both arranged as linked
lists in the buffer returned by DQPS. Each node in the list of semaphore
records carries the specifics of an associated semaphore, and the nodes in the
shared-memory list likewise detail their associated, named shared-memory
segments. The pointer record's pShrMemRec member points directly to the list
of shared-memory records, but the p16SemRec member of the pointer record
points not to the list of semaphore-detail records, but rather to the 16-bit
system semaphore-header record that immediately precedes it. A pointer to the
list of semaphore-detail records is constructed using pointer arithmetic:
increment the pointer record's pointer to the semaphore-header record by one,
then cast the result to type qsS16rec_t * (see Listing Three).
PROCINFO.C shows the relationship between the SEMINDEX indexes and HMEM
handles that follow a process data record and individual semaphores and named
shared-memory segments. The ShowSemaphore function runs through the linked
list of semaphore records, stopping at the record whose ordinal position
matches the passed SEMINDEX parameter. ShowSharedMemorySegment similarly finds
desired shared-memory information by looking at the shared-memory segment
records in the list and comparing each record's HMEM member to the one
requested.


DQPS Executable-Module Data


The executable-module data structures (also called "library records" or
"module-table entries"--MTEs) in the DQPS buffer contain system-resource
summary information describing OS/2 executable modules. Examples of OS/2
executable modules are executables, DLLs, device drivers, and font files. Each
executable-module data structure contains a pointer to the full pathname of
the corresponding module, the module's module-table entry handle (HMTE), and a
count of the number of external modules directly referenced by this module. An
array of MTE handles enumerating the directly referenced modules immediately
follows the library record. The module's pathname pointed to by the pName
structure member follows the HMTE array. A library record has, as its first
member, a pointer to the next executable-module data record in the list.
The executable-module data records maintain many useful bits of information.
The most valuable use of the linked list is to cross-reference its HMTE member
with the MTE handle found in a process data record. This determines the full
pathname of a given process and the names of each of the executable modules
loaded by the process. An application can also traverse the chain of modules
statically referenced by a process by examining the array of module handles
that follows each relevant library record, determining which DLLs or fonts the
process loads.
Executable-module data records have other valuable properties. Rick Fishman's
PROCSTAT.H file (from the KILLEM.ZIP archive in Library 1 of CompuServe's
OS2DF1 Forum) indicates that the first reserved field of the qsLrec_t
structure equals 0 when 16-bit modules are referenced by a given
executable-module record, and 1 when 32-bit modules are referenced. I've
adopted Rick's name for this member of the structure: usModType. PROCSTAT.H
also informs you that the second reserved field is a count of the number of
segments in the executable module associated with the structure (as can be
verified with EXEHDR). I call this member of the qsLrec_t structure
ctSegments. 
PROCINFO's ShowLibraryRecord and GetModuleName use the relationship between
the HMTE field in a process record and the handles of records in the
executable-module list. PROCINFO uses GetModuleName to find the name of a
given process by taking the associated process record's MTE handle and
traversing the linked list of library records until a record with the matching
HMTE is encountered. This returns the name of the executable loaded by the
process, pointed to by the pName field of the library record.


Using DQPS in Your Applications


Although IBM prefers that you not use DQPS in OS/2 applications, you sometimes
don't have a choice. Some environments demand supervisory control of processes
and system resources, and the documented OS/2 API does not address this
requirement. IBM's own use of DQPS underscores this point. It is true that if
you use undocumented functions, you risk having to scramble to fix broken
applications when new versions of operating systems are released, but this is
often a trivial concern. Ask someone porting an application from OS/2 1.x to
OS/2 2.x, or someone moving a Win16 app to Win32, which set of functions
underwent the greatest degree of change: the undocumented functions or the
documented? In any case, remember that you, not the operating-system vendor,
are in the best position to decide what is best for your particular
application. 


Acknowledgments


I would like to thank Jon Wright, Howard Kapustein, Rick Fishman, Wayne
Kovsky, Dwayne Nebeker, and Jim Masse for their assistance with this article.


Other Undocumented OS/2 Areas


There are other undocumented aspects of OS/2 besides the DosQProcStatus API.
What follows is a list of several of the more prominent undocumented functions
or features.
DosGetSTDA. OS/2's system-trace facility API. Potentially useful for debugging
and profiling tools.
DCF (Data Collection Facility). DCF is somehow connected with APIs that allow
performance-monitoring applications to get system statistics and measurements
directly from the OS/2 kernel. IBM's SPM/2 (System Performance Monitor/2) and
a number of third-party applications use these APIs.
DosReplaceModule. When an OS/2 executable module is in use, OS/2 locks the
file, preventing the file from being replaced or deleted. The DosReplaceModule
API allows the on-disk replacement of an existing module with a new module,
while the system continues to run with the old one. The function is located at
DOSCALLS.417, and the prototype is: APIRET APIENTRY DosReplaceModule(PSZ
pszOldModule, PSZ pszNewModule, PSZ pszBackupModule). The contents of
pszOldModule are cached by the system, the file is closed, and a backup copy
of the file, pszBackupModule, is created for recovery purposes, should the
routine performing the module replacement fail. The new module pszNewModule
then takes the place of the original module on the disk. Calling the function
and specifying only the first parameter makes it possible to delete or copy
the indicated module. Apparently, DosReplaceModule is the routine that IBM's
display-driver installation program (DSPINSTL) uses to replace or upgrade OS/2
display drivers.
IFS/HPFS. Installable file systems, an important part of OS/2, are largely
undocumented.
INF/HLP file format. It is widely known that OS/2 .HLP and .INF files differ
by only one byte. Why is it that the OS/2 view facility works with .INF files
but not with .HLP files? The format of these files is not documented.
DosQueryTmr. This is supposedly an API that allows higher-resolution timing
than that provided by documented OS/2 APIs. 
Presentation Manager window messages. Undocumented window messages abound
among PM applications. For instance, WM_CANCELMODE (0x005A) is sent to a
window when it is "usurped" by the Window List or by other system pop-ups.
Responding to this message with an (MRESULT)TRUE will disable the Ctrl-Esc and
Alt-Esc hot-key combinations for that window, and responding to a subsequent
WM_CANCELMODE message with (MRESULT)FALSE will reenable the hot keys.
WinSetErrorInfo. The online programming reference for the OS/2 Workplace Shell
describes wpclsSetError, a method "analogous to the WinSetErrorInfo function
that is used by Presentation Manager functions to log their error return
codes." No mention is made of the WinSetErrorInfo API in IBM's Presentation
Manager documentation. 
Calling DosSleep from a DOS application. Example 1(a) shows an undocumented
way for DOS applications running in an OS/2 VDM to yield a time slice or sleep
for a specified number of milliseconds.
WinStretchPointer. This function is intended to allow 32x32 or smaller pointer
bitmaps to be stretched or shrunk to new sizes. The function's entry point is
at PMWIN.968, and the function prototype is similar to the prototype for the
documented API WinDrawPointer, with the addition of parameters specifying the
destination bitmap height and width; see Example 1(b). Rick Fishman adds that
his tests indicate that "if you have a mini-icon as one of your formats
[passed in the HPOINTER parameter], WinStretchPointer will always draw the
mini-icon. If not, it will stretch or compress the pointer." Rick also points
out that this function is not entirely undocumented; it appears in the
OS2386.LIB import library that comes with the IBM OS/2 2.1 Toolkit.
----T.F.
Example 1: (a) Calling DosSleep; (b) function prototype for WinStretchPointer.
(a)
xor dx,dx
mov ax,woMilliSecs ; Number of milliseconds in DX:AX.
 ; A value of 0 means that the current
 ; timeslice is released.

hlt ; Trigger OS/2's exception manager.
db 35h,0CAh ; Signature to differentiate between a
 ; normal HLT instruction and the call
 ; to DosSleep().
(b)
BOOL APIENTRY WinStretchPointer(
 HPS hps,
 LONG x,
 LONG y,
 LONG cx,
 LONG cy,
 HPOINTER hptr,
 ULONG fs);
Figure 1: DosQProcStatus prototype.
#ifdef __BORLANDC__
 /* OS/2 2.x prototype with Borland C++ 1.0, 1.01 _far16 * semantics */
 APIRET16 APIENTRY16 DosQProcStatus(VOID _far16 * p16Buf, USHORT cbBuf);
#else
 /* typical OS/2 2.x prototype */
 APIRET16 APIENTRY16 DosQProcStatus(PVOID p16Buf, USHORT cbBuf);
#endif
/* suggested buffer size for DosQProcStatus */
#define DQPS_BUFSIZE 0xFFFF
Figure 2: Contents of buffer returned by DosQProcStatus.
Figure 3: Sample process record and associated data structures.

Listing One
/* Excerpt from DOSQPROC.H. Complete header file available electronically. */
/* VOID * p16Buf = malloc(DQPS_BUFSIZE);  -- allocate the buffer first
 * USHORT cbBuf = DQPS_BUFSIZE;
 * qsPtrRec_t * pPtrRec;
 * DosQProcStatus(p16Buf,cbBuf);
 * pPtrRec = (qsPtrRec_t *)p16Buf;
 * // ...
 */
typedef struct qsPtrRec_s
{
 qsGrec_t * pGlobalRec; /* ptr to the global data structure */
 qsPrec_t * pProcRec; /* ptr to process data list */
 qsS16Headrec_t * p16SemRec; /* ptr to 16 bit system sem list */
 VOID * Reserved; /* always NULL - see comments above */
 qsMrec_t * pShrMemRec; /* ptr to shared memory seg list */
 qsLrec_t * pLibRec; /* ptr to module table entry list */
} qsPtrRec_t;


Listing Two 
/* Excerpt from DOSQPROC.H. Complete header file 
 available electronically. */
/* process 'type' definitions */
#define PT_FULL_SCREEN 0
#define PT_DOS_OR_WINOS2 1 /* kernel (SYSINIT) process */
#define PT_WINDOWED 2 /* OS/2 windowed session */
#define PT_PM 3 /* Presentation Manager */
#define PT_DETACHED 4
/* process status definitions */
#define PS_IN_EXITLIST 0x01
#define PS_EXITING_THREAD_1 0x02
#define PS_PROCESS_EXITING 0x04

#define PS_TERMINATION_AWARE 0x10
#define PS_PARENT_EXEC_WAIT 0x20
#define PS_DYING 0x40
#define PS_EMBRYONIC 0x80
/* thread status definitions */
#define TS_READY 1
#define TS_BLOCKED 2
#define TS_RUNNING 5


Listing Three
/* Constructing a pointer to the list of semaphore detail 
 records. Excerpt from DOSQPROC.H */
qsPtrRec_t * pPtrRec;
qsS16rec_t * pSemRec;
// ...
pSemRec = (qsS16rec_t *)(pPtrRec->p16SemRec + 1);













































August, 1994
PROGRAMMER'S BOOKSHELF


C++ and the PowerPC




Al Stevens


This month's "Programmer's Bookshelf" looks at two books on completely
different subjects that have two things in common. First, both provide
historical accounts of events that shaped specific niches within the computer
industry. Second, the subjects of these books will influence what we
programmers do for the next several years. The first book relates the history
of C++ from its beginning, up through the near future, when C++, as defined
and invented by ANSI/ISO, will be available. The second book is about the
PowerPC microprocessor, which could become a dominant software-development
platform.


The Design and Evolution of C++


The Design and Evolution of C++ is Bjarne Stroustrup's account of the events
and people that contributed to the current condition of C++. In widespread use
for many years, C++ has grown into a formidable software-development
environment that has progressed from a preprocessor that added classes to C to
the acknowledged language of choice for a generation of programmers. It is now
about 15 years old, sports compilers for virtually every major operating
system and environment, and is undergoing formal standardization and
augmentation at the able hands of ANSI/ISO committees.
This book isn't for everyone. You will understand and appreciate the story much
better if you are a C++ programmer. Sometimes the book describes
characteristics of the language and then explains the rationale behind their
inclusion in the language. Other times, it delves into particular arcane
behaviors of C++. You need to understand the nature of C++ and the potential
implications of the hidden aspects of the language in order to keep up. A lot
of the book uses code to illustrate the point at hand, and a programmer who
already reads C++ has an advantage over one who does not.
This is an important book, an important addition to the culture, not only for
its historical perspective, but for the insight that it provides into the
process of language definition, development, and specification.
You learn a lot about C++ programming, even though that's not Stroustrup's
primary purpose. In explaining why he accepted or rejected proposed features,
Stroustrup offers examples of alternatives that reveal better ways to use
C++--ways made possible by the underlying behavior of the language, ways that
programmers discovered rather than designed. He often expresses his own
surprise at their discovery, which adds insight to the complexities of the
language: Even its creator has to discover (or be told about) an idiom that
applies the language's underlying behavior to the expression of a particular
solution.
The Design and Evolution of C++ is a study in language structure and design,
revealing Stroustrup's resolute philosophy about how a programming language
should work and what compromises are necessary to assure its success. Most
criticisms of C++ fall into two categories--the legacy of language constructs
that descended from C, and its static (compile-time) type checking system,
which purists view as being less than object-oriented. Stroustrup deals with
both of these. First, he could have built a better language instead of a
better C. He could have assigned less importance to compatibility with C.
"Within C++, there is a much smaller and cleaner language struggling to get
out," which he says, "would ... have been an unimportant cult language."
Second, he is committed to the concept of static (as opposed to dynamic) type
checking as being inherently safer and essential to retain the efficiency of
C. Without that guarantee, programmers used to C's efficiency will not switch
to a new language, no matter what promise it holds.
The book is Stroustrup's personal-historical perspective of the growth of C++.
He approaches it chronologically to provide a sense of when different features
were realized. Then he addresses individual programming issues and the
features that support them without regard to their place in time. He
chronicles the successes, the failures, and the forces brought to bear on his
decisions about the growth of C++.
We programmers sometimes believe that programming languages come from one of
two places: large, paradigm-polluting bureaucracies belching out behemoths such
as Cobol and Ada, or independent free spirits who, in one bright light of
inspiration, sit down and cobble together a terse, elegant language like C or
C++ that endures for generations. This book tells a different tale, and you
learn of the contributions of a number of collaborators, both within AT&T and
later on the Committee. Stroustrup gives credit where it is due, naming names.
Whether or not
you like a feature or bemoan the absence of another, you can usually find out
whose idea it was by reading this book. C++ is the product of the minds of
many participants over a long period of time, with Stroustrup as the focal
point.
The book is at times a study in group dynamics. Not bound by the limits that
the ANSI X3J11 Committee imposed upon themselves--to codify existing C
practice--X3J16 is inventing a lot of new language, and the exercise makes for
some dynamic interplay. Stroustrup chairs the Extensions group and tries to
manage the spate of new feature requests that pour in from users. He openly
discusses his attitudes about features and tells about the arguments and
forces of logic that bear him out in some cases and convince him to change his
mind in others.
C++ is what it is because of several criteria that the Extensions group
applies before accepting a feature. Those criteria reflect ones that
Stroustrup applied as the language grew before the Committee was formed.
First, each feature is scrutinized for its need. Is it a provincial demand, or
will the programming community benefit from it? Passing that test, the feature
is implemented and used before being formalized in a release. Its
acceptability is based on that experience. If the feature cannot be easily
implemented, it is suspect. If it cannot be explained to a C++ programmer in
short order, it is suspect. If there are reasonable alternatives existing in
the language, the feature is suspect. If it breaks a significant amount of
existing code, it is almost sure to be rejected. If it involves a new keyword,
it has two strikes going in. (Stroustrup has no apparent strong bias against
new keywords--he's introduced plenty of them himself--but he does want to
avoid the inevitable outcry of protest. Sometimes, he says, it's easier to
find an alternative notation than to fight the new keyword fight.) You will
learn how these criteria were applied to requested features that were accepted
or rejected based on the outcome. It is as interesting to learn what didn't
make it and why as to learn what did. If your favorite feature isn't there,
chances are it's been considered and rejected. Now you will know why.
Finally, this book prepares you for the inventions that are coming from the
committee. We are properly concerned about new language features that fall out
of the deliberations of large numbers of people with shared and diverse
interests. We worry about issues being resolved with compromise based on the
strength of the debaters rather than technical merit. This book discusses them
all and relates the content and context of those deliberations and, for the
most part, puts our fears to rest. What's coming? Templates are changing
significantly from their first definition in the ARM and from existing,
divergent implementations. Run-time type identification has been approved and
is already implemented in some compilers. Stroustrup explains his theories
about how this feature should and should not be used. There are a number of
new cast conventions intended to obviate C's inherently unsafe typecast
mechanism. The new namespace feature solves a long-standing problem with name
collisions among user code, standard libraries, and third-party libraries. The
namespace mechanism reflects differing opinions about how names ought to be
managed in a programming environment. The results are several notations from
which you can choose depending on where you stand on the issue. They all work,
and you can use the ones you like. The book explains the situation and how it
came to be.
The Design and Evolution of C++ expands your understanding of C++ by
explaining how and why it evolved. You will be more tolerant of some of its
vagaries once you understand the alternatives. You will embrace new features
after you've learned their motivations. You will anxiously wait for your
favorite compiler vendor to release versions that implement the new features
so that you can try them out. If you write C++ code, you need this book.


Inside the PowerPC Revolution


Inside the PowerPC Revolution, by former DDJ columnist Jeff Duntemann and Ron
Pronk, is many things. First and foremost, it is a wall-to-wall treatment of a
totally new computer architecture, designed by an alliance formed between
Apple, IBM, and Motorola. Beyond that, the book is a foundation technical
overview of traditional and future computer architectures, a history of
processor evolution and the development of RISC technology in particular, an
analysis of corporate dynamics when foes form friendships, and an abundance of
projections, plans, rumors, and speculation.
The book reveals a PowerPC bias. The authors don't come right out and say so,
but you can tell that if they aren't altogether convinced that the PowerPC
will rule, they at least hope that it will.
In 1991, Apple, IBM, and Motorola formed an alliance to define a standard for
desktop workstations and computers based on a family of RISC processors.
Motorola and IBM would cooperate with complementing chip design and
fabrication technologies. IBM and Apple would produce desktop systems that
complied with the standard specification and that would run PC and Mac systems
software and applications. Motorola and Apple would coordinate the issues of
compatibility between the Mac's 680x0 and the PowerPC. At the time of the
announcement, industry interest centered on the unlikely nature of the
alliance rather than its goals. In March of this year, the lid came off. Apple
introduced several models of the Power Mac based on the new architecture with
processor chips manufactured by IBM.
The implications of the alliance and the new architecture were not clear to
the marketplace because the media did not cover it with much enthusiasm. For
now, many computer buyers and software developers are uneducated about what's
coming. This book fills that void.
The most enduring part of Inside the PowerPC Revolution is its historical
account of the evolution of desktop computers in the first two chapters. The
authors repeat information that has been widely reported, but they are brief,
do not belabor the subject, and lay the foundation for what follows. No matter
what else happens, Chapters 1 and 2 will always be relevant. Chapter 3
compares RISC and CISC. It starts out by asserting that "There is no such
thing as RISC" [their italics]. The reason behind this position is that
so-called RISC chips have as many or more instructions in their repertoire as
so-called CISC chips. The chapter goes on to say that the real differences
are: the RISC architecture minimizes memory access by working mostly in
registers; instructions are of uniform length to eliminate complex instruction
fetch operations; and RISC manufacturers discard rather than sell the
processors that don't run fast. Chapter 3 is also a tutorial on caching,
pipelining, superscalar execution, and parallelism, which are techniques that
processors use to run programs faster, techniques that do not usually involve
cognizant cooperation from the programmer.
Past Chapter 3, the book gets into the dynamics, politics, and market
implications of the alliance. This part of the book has an almost tabloid
attraction. We like to read about unlikely alliances, declared wars on
competing alliances (like Intel/Microsoft), deal-making, and the like. But
there are benefits to this insight. To decide to purchase and develop software
for a different architecture, you need as much information as you can get
about its chances to succeed. Contributing to these chances are the strengths
of the agreements, the degree of financial commitment each party makes to
them, and the potential for hidden agendas that could compromise their future.
Chapter 9 describes the Power Macs that you can buy today from Apple. Chapter
10 discusses IBM's yet-to-be-announced PowerPCs. These chapters address how
the PowerPC emulates Mac, DOS, and Windows to run applications from each of
these platforms. Chapter 10 discusses IBM's human-centered computing paradigm,
which, the authors' enthusiasm notwithstanding, reminds me of those exciting
multimedia demonstrations of a few years back--birds flying, music playing,
spreadsheets talking, databases listening, and the like. Impressive, but
disappointing in view of what has actually materialized in the way of useful
applications.
Chapter 10 also talks about IBM's operating-system crisis. For users to buy
the machine and developers to write software, the computer needs a popular PC
operating system that will run Windows 3.x applications--and IBM doesn't have
one. The operative word is "popular," which rules out OS/2. Apparently IBM's
first offering will run their AIX UNIX-lookalike and NT, which, if anything,
are both less popular than OS/2. Later comes Workplace/OS, which is somehow
supposed to become popular and fix everything.
Chapter 11 gets into emulation and compatibility. Although the PowerPC is
supposed to be a standard architecture for desktop computers, IBM's operating
systems will not run on the Power Mac, and System 7 will run only on the Power
Mac. The situation with respect to applications, operating systems, and
hardware, and what runs on what is complex enough that the book uses two
tables to chart it. So much for the promise of an open architecture.
Operating systems--specifically so-called object-oriented operating
systems--are the subject of Chapter 12. Apparently users are going to have to
make a paradigm shift the same way we programmers did. That will be
interesting to watch. 
Interviews with major players in the PowerPC project are scattered throughout
the book. These interviews provide insight into plans and motivations,
although they sometimes seem to be vehicles to entrench the interviewee's
agenda. The interviews do convey information, but because of the
time-dependent perishability of that information, the interviews seem to be
more suitable for magazines than a book.
I have two criticisms of Inside the PowerPC Revolution, but they are not big
ones. First, the book repeats itself too often. I was treated to a detailed
explanation of clean-room BIOS cloning twice. I learned at least three times
that pundits without vision predicted failure first for the PC and then for
Windows because of the lack of applications. I lost count of the times that I
found out that without a 486DX, you need a separate numerical coprocessor to
do floating point math in hardware. These kinds of repetitions are typical
when more than one author writes a book and no one takes charge and pulls it
together. At least this book does not contradict itself, and that is to its
credit.
My second criticism involves the chapter and paragraph headings in the book.
Many of them are too cute and cloying. In their attempt to be chummy and funny
(which fails), the authors do not let the headings tell you what the chapters
and paragraphs teach. As a result, the table of contents is about half as
effective as it ought to be.
I was glad to get this book to review because the PowerPC is one of those
things in the industry that kind of snuck up on me. Like most programmers, I
get mired down in the current project and come up for air and revelation only
every now and then. When I last did, everyone was oohing and ahhing over something
new called the PowerPC. I wanted information and found hype. Since then,
having read Inside the PowerPC Revolution, I feel like I know as much as a
potential user and programmer can know about the PowerPC without joining the
alliance and making a financial commitment. Because of the infancy of the
technology, the uncertainty of its direction, and the vagaries of a fickle
marketplace, this book might have a short life. Its necessary dependence on
speculation could guarantee its obscurity fairly soon. Nonetheless, the book
represents the orderly presentation of a significant body of research on a
relevant and current topic, and is the only work I've seen on the PowerPC that
can be called comprehensive. Soon there will be a glut of PowerPC books, good
and bad, just as there are C++ books, Internet books, Windows books, and John
Grisham mysteries. Until then, Inside the PowerPC Revolution is the only game
in town, and it will be hard to beat.
The Design and Evolution of C++
Bjarne Stroustrup
Addison-Wesley, 1994, 461 pp.
$26.95
ISBN 0-201-54330-3
Inside the PowerPC Revolution
Jeff Duntemann and Ron Pronk
Coriolis Group Books, 1994, 395 pp.
$24.95
ISBN 1-883577-04-7































































August, 1994
SWAINE'S FLAMES


Fussy Logic II


Aside to Certain Readers (you know who you are).... True, I did more or less
give you permission to use the contents of my recent Info Highway Cliché Kit
column, so I can't really complain that you posted the whole thing to the
Internet, though it did torpedo my ongoing negotiations with Reader's Digest.
In the spirit of that column, here's a free factoid that you may find useful
in constructing one of those abstruse references that make you the life of any
party.
A flat-iron was once called a "sad iron," based on an archaic use of the word
"sad," meaning "heavy or dense." I leave it to you to figure out how to apply
this, but you might think about the recent fortunes of Digital Equipment
Corporation. Sad iron.


Here Begins the Column Proper...


Long before Don Norman discovered that Turn Signals are the Facial Expressions
of Automobiles, Rust Hills, the onetime fiction editor of Esquire magazine,
addressed himself to the subject of how things ought to work. He did this most
memorably in a book entitled How to Do Some Particular Things Particularly,
or The Memoirs of a Fussy Man; and in some fussy sequels.
I particularly like the essay, "How to Set an Alarm Clock." It turns out that
there were, back when he wrote the piece, five steps to remember in setting an
alarm clock: Set the clock, set the alarm, wind the clock, wind the alarm, and
pull out the little knob so the alarm will go off.
Thank goodness we've simplified that interface. My alarm clock needs
adjustment only if the power goes out, the time changes, or I want to get up
early (each of which happens about twice a year).
Sometimes I think that fussy feedback is just what we need in user-interface
design. Then again....
I don't suppose the clock makers were responding to feedback from Rust Hills
when they made clocks easier to set. It's possible, but somehow I doubt it.
Hills invented a system for remembering the five steps in setting an alarm
clock. But when clocks came along that let you wind clock and alarm at the
same time, there were only four steps. And with electric clocks, there were
only three. His system no longer worked. So he adapted it, pretending to wind
the clock and alarm and preserving his five steps, some of which became, for
some clocks, virtual steps. He would twist an imaginary key, calling out
"three!" or "four!," driving his wife crazy.


The Moral of Our Story


Users will do the stupidest things and have excellent reasons for doing them.


A Postscript


It turns out there are an awful lot of clocks in my house and yard, and I
relearn just how many twice a year, when the time changes. It always takes me
a week or so to find them all. In the meantime, the sprinklers and pool filter
are coming on an hour early or late, the computers are timestamping files
wrong, the kitchen clock is giving bad advice on how much time we have before
the lunch guests arrive, and forget about the dashboard and guest-room clocks,
which I always do.
I've worked it out, though. It turns out that the whole clock-setting process
consists of exactly 14 steps....
Michael Swaine
editor-at-large


August, 1994
OF INTEREST
ObjectSoftware has released ObjectTrace 1.0, a profiler and tracer for C++
applications. Using instrumentation techniques, the tool traces C++ objects
and object-member functions and detects memory leaks caused by C++ objects. It
then produces a call graph of the traced application, along with a detailed
report on object instances and memory
leaks. ObjectTrace is currently available for SunOS 4.1.x and Solaris 2.3. A
single-user license sells for $395.00. Reader service no. 20.
ObjectSoftware Inc.
1266 Hidden Ridge, Suite 1030
Irving, TX 75038
214-550-0747
Version 2 of SQL Objects++ C/C++ Database Library from Objects++ Software
supports ODBC, IDAPI, abstract SQL classes, and direct database access. The
vendor claims you can gain database independence without issuing any SQL code
by using the abstract SQL classes. Version 2 also supports SQL Base, Watcom
SQL, ASCII files, Oracle, Sybase, SQL Server, DB2/2, DDCS/2, NetWare SQL, and
Btrieve. Platforms supported include Windows, OS/2, DOS, and NT. Depending on
which database drivers you want, the price of the library ranges from $695.00
to $4995.00. Source-code options are also available. Reader service no. 21.
Objects++ Software
47 Stonewall Street
Cartersville, GA 30120
404-382-6585
Dragon VoiceTools is an SDK that lets C programmers build voice-activated
interfaces for Windows or DOS applications based on the Dragon Systems
speech-board system. The SDK's interface allows speech recognition to be
integrated into any source code that can call C functions. 
The SDK consists of DOS and Windows speech drivers; the SDAPI (Speech Driver
Application Program Interface) library, C functions that work with Borland and
Microsoft C/C++; an FSG (finite-state grammar) compiler to convert a text file
into words and phrases your application-specific program will recognize; and
speaker-independent, acoustic voice models. VoiceTools sells for $1995.00.
Reader service no. 22.
Dragon Systems Inc.
320 Nevada Street
Newton, MA 02160
617-965-5200
Pacific Communication Sciences Inc. (PCSI) has announced its Ubiquity CDPD
SDK--a software-only approach for simulating an end-to-end CDPD network on a
PC without resorting to cellular airlinks. The Ubiquity SDK enables software
developers to create DOS- and Windows-based wireless applications for the CDPD
cellular data network. The SDK includes a CDPD network simulator that permits
applications to be built, tested, and demonstrated in a controlled environment
reflecting the wide range of conditions that occur in normal field
environments.
The Windows PC-based network simulator supports configurable serial
communications interfaces for two mobile computers and one fixed-end computer.
Its programmable event logs and traffic statistics give developers superior
application debugging and analysis capabilities. By eliminating the need
for the CDPD cellular airlink, the simulator reduces both development time and
the need for cellular service.
The Ubiquity SDK provides a library of communication interfaces such as the
Windows Sockets API and the SLIP protocol. The SDK provides an AT emulation
mode, allowing existing modem applications to operate over the CDPD-packet
data service. The library includes an optimized DOS API utilizing the TCP/IP
protocol stack built into all PCSI Ubiquity subscriber products. The Ubiquity
CDPD SDK sells for $995.00. Reader service no. 23.
PCSI
10075 Barnes Canyon Rd.
San Diego, CA 92121
619-535-9500
The KL Group has released XRT/table, a multipurpose widget that enables
OSF/Motif developers to include graphical, tabular text display and editing
capabilities in their applications. All of XRT/table's attributes are
programmed through resources. The widget supports programming interfaces
through C, C++, UIL, and resource files. Cell values can be specified in
advance or on the fly. Tables can be as large as memory allows--up to two
billion rows by two billion columns. The widget also supports compound strings
within each cell. XRT/table is available on Alpha/OSF, DECstation, HP 9000,
IBM RS6000, SCO ODT 386/486, GI Sun Sparc Motif, and UNIX V.4 386/486. It
sells for $995.00. Reader service no. 24.
KL Group Inc.
260 King Street East
Toronto, ON
Canada M5A 1K3
416-594-1026
EsiObjects 1.1, an object-oriented application-programming environment based
on the M language (a MUMPS derivative), has been released by ESI. The
Windows-hosted system combines over 140 classes containing almost 2000
methods. These classes are stored in a network-based multidimensional database
so that a multitude of programmers can share tools across the network.
Additionally, the database can be built into client/server applications.
EsiObjects runs on most M implementations that support workstation hardware
and a windowing interface. Run-time environments are supported on all M
implementations. Single-user licenses cost $1295.00. Reader service no. 25.
ESI
5 Commonwealth Road
Natick, MA 01760
508-651-1400
Visigenic Software has licensed the source code to Microsoft's ODBC SDK 2.0 in
order to port the ODBC technology to UNIX. ODBC is Microsoft's interface for
accessing data in a heterogeneous environment of relational and nonrelational
database-management systems. The license covers the Driver Manager, ODBC
utilities, and documentation. 
In porting the ODBC Driver Manager 2.0 to UNIX, Visigenic provides a complete
ODBC SDK for UNIX database programmers. Developers will be able to use the
Visigenic ODBC SDK to create C or C++ programs that use the ODBC API, or
conversely, create their own drivers for new data sources. The SDK is also
required by any other type of language or application, such as a spreadsheet
or word processor, that needs to access DBMS data on UNIX through a standard
programming interface, regardless of the brand of SQL engine. Initially,
Visigenic is developing drivers for the following UNIX data sources: Informix,
Oracle, and Sybase. These drivers, as well as drivers written by other data
sources, are available separately from the SDK.
The Visigenic SDK will include the Driver Manager, header files, sample
programs, and utilities to speed the development of applications or new
drivers. Since these components are common with the Microsoft SDK for Windows,
DBMS developers can write to a single database API for applications that can
be executed in both Windows and UNIX environments. The Visigenic ODBC SDK will
be priced at $995.00. Reader service no. 26.
Visigenic Software 
951 Mariners Island Blvd., Suite 460
San Mateo, CA 94404
415-286-1900
IBM has announced that its Mwave DSP-based multimedia technology now supports
the OS/2 2.1 Multimedia Presentation Manager (MMPM/2). Mwave has also been
extended to support the V.32bis protocol for 14.4-Kbps modem transmission, the
V.17 protocol for 14.4-Kbps fax transmission, Video for Windows JPEG, wave-table
sound synthesis, and voice capabilities. Qsound special effects have also been
added. There is no charge for the audio functions. The V.32bis and V.17
protocols sell for $5.00/copy, while the wave-table synthesis and Qsound sell
for $3.00/copy. Reader service no. 27.
IBM Microelectronics
1500 Route 52
Hopewell Junction, NY 12533-6531
800-426-0181 ext. 500
Three C programming tools have been released by Interactive Instruments. The
first, DataOrgan, is a data-management tool which serves as a base for
multi-instance B*-trees and a record-set manager for keyed or direct access.
Keyed access is performed by extendable hash in the foreground or by B*-tree
indexing.
The second tool is KeyPoint, which contains AVL-balanced binary trees, a
keyword-table manager that allows access by abbreviations or synonyms, and a
constructor and interpreter for languages of the operator/operand type.
The third tool, TextMatch, consists of a finite-state pattern translator and
context-free macro-substitution processor. Both work as character pipelines
with user-supplied input/output handling.
DataOrgan sells for $450.00, KeyPoint for $130.00, and TextMatch for $180.00.
As a set, the three packages, which are supplied in source-code form with no
run-time royalties, are available for $680.00. Reader service no. 28.
Interactive Instruments
Beethoven Platz 14
53115 Bonn Germany
+49-228-650041
The NetWare Client SDK for Visual Basic, which makes it possible for Visual
Basic programmers to implement NetWare client APIs into applications, has been
released by Apiary. The SDK consists of a Windows help file covering the
entire NetWare API with prototypes in C, Pascal, and Visual Basic; numerous
examples in NetWare 2.x, 3.x, and 4.x; functions such as drive mapping and
directory services; and several Basic files containing the NetWare Client data
structures and DLL prototypes. The NetWare Client SDK for Visual Basic sells
for $395.00. Reader service no. 29.
Apiary
10201 W. Markham, Suite 101
Little Rock, AR 72205
501-221-3699
Aritek Systems has released Arisoft CornerStone, an SDK that allows you to
incorporate CAD functionality into Windows applications. At the heart of the
toolkit is a CAD engine around which you can build your CAD application. To
work with this engine, you use a C-like macro language (called "Ariflex") and
the accompanying compiler. The SDK also includes a converter for translating
AutoCAD .DCL files to Windows .RC format so that you can create customized
interfaces. CornerStone sells for $2800.00. Run-time royalties are also
required. Reader service no. 30.
Aritek Systems
10 Inverness Drive, Suite 105
Englewood, CO 80112
303-799-6559
Novell has announced support for the VIM (Vendor Independent Messaging API
from the VIM consortium), CMC (Common Messaging Calls from XAPIA, the X.400
API Association),
and Simple MAPI (from Microsoft) messaging APIs. Applications based on these
APIs can be used to interoperate with native NetWare MHS programs that run on
SMF, Novell's messaging API.
The VIM API is a procedural, cross-platform, transport-independent messaging
API developed by a consortium of industry vendors, including Novell, IBM,
Apple, Borland, Lotus, MCI, Oracle, and WordPerfect. Applications developed
with VIM include Borland's Quattro Pro for Windows, Central Point Tools for
Windows, Microsoft Office, and WordPerfect InForms.
CMC was designed to offer basic mail-enabling capabilities in a procedural,
cross-platform, transport-independent messaging environment. Applications
developed with CMC include Collabra Share, MS Word, and Excel. 
Simple MAPI was originally provided as a subset of the MAPI 1.0 API. Microsoft
has since replaced it with CMC as its recommended basic mail-enabling API.
Simple MAPI is a basic mail-enabling API for Windows upon which a variety of
older mail-enabled solutions are based.
SMF provides full access to all features of the NetWare MHS product line,
including NetWare Global MHS, NetWare Basic MHS, and NetWare Remote MHS. 
All these API libraries are available free of charge as a set of DLLs for
Windows. The libraries will be distributed on NetWire, included as part of the
next NetWare SDK release, and integrated with all NetWare MHS product
revisions. Reader service no. 31.
Novell
122 E. 1700 South
Provo, UT 84606
800-638-9273 
Motorola's Microcontroller Technologies Group has introduced the RMCU500
family of 32-bit RISC microcontrollers, which is based on the PowerPC
architecture. The first chip to be made available is the 3.3 volt, 25-MIP
RMCU505. The microcontroller family is targeted at consumer electronics,
computer peripherals, communications, and a variety of control applications.
Supporting the RMCU505 is an SDK which includes an optimizing C compiler,
macro assembler, debugger, linker, archiver, and SRecord generator. The SDK is
designed to support the development of embedded code and to facilitate
operation with ROM and RAM. In 100-quantity, the RMCU505 will sell for $75.00.
Reader service no. 32.
Motorola 
Microcontroller Technologies Group
6501 William Cannon Drive West
Austin, TX 78735-8598
408-982-0400
A new 80-minute video by William Hall provides tips for successful software
internationalization. The video, called Software Internationalization: Theory
and Practice, discusses necessary program changes ranging from modifying time,
date, and currency displays, to changing how lists of words are alphabetized
and formatted for display. Other topics include working with non-U.S.
keyboards and using colors preferred by various localities. (Hall recently
launched an article series entitled "Internationalization in Windows NT" in
the May 1994 issue of Microsoft Systems Journal.) The video sells for $299.95.
Reader service no. 33.
InternaX
6 Johnson Way
Scotts Valley, CA 95066
408-438-2270
TeachFuzz is a fuzzy-logic learning tool from Impatiens Publications. The
software, available for both PCs and Macs, lets you define a system composed
of two inputs, one output, and up to 25 rules. TeachFuzz sells for $24.95.
Reader service no. 34.
Impatiens Publications
4028 Pleasant Ave.
Minneapolis, MN 55409-1545
612-822-1799


September, 1994
EDITORIAL


Forward Thinking 


The coming year looks to be a milestone for Dr. Dobb's and a watershed for the
computer industry at large. For our part, 1995 launches DDJ into its 20th year
of publication. Not bad, the good Doctor recently said, if I don't mind saying
so myself. His prescription: another 20 years of the same. To honor the
occasion, it seems a party should be the order of the day. How about getting
together with us up at Swaine's mountaintop pool sometime next summer? Michael
can roll out the Smokey Joe and his cousin Corbett can take charge as social
director/lifeguard. I can see it now--games like "Pin the Tail on the Pundit,"
"Babble" (a Scrabble-like game where you're only allowed to use computer
buzzwords), and "Name that Algorithm." On second thought, I think that's the
day I'm supposed to get my hair cut.
Still and all, we have been making concrete plans for 1995, starting with our
editorial calendar. Naturally, you'll see some familiar topics in the
lineup--algorithms, for instance, remain one of the fundamental building
blocks of programming, no matter which platform you're developing for or what
language you're using. Other subjects--visual programming, to name one--are
emerging and bear examination. Of course, this doesn't mean that these are the
only topics we'll be looking at next year. We're always looking for technical
articles that address issues and techniques important to the art and science
of computer programming. If you have an article you'd like to share with your
fellow programmers, give us a call or drop us a line (e-mail or otherwise).
We'll get a copy of author guidelines right out to you. It's important to
remember that the magazine generally comes out about a month prior to its
cover date--you probably received this September issue in early August--so
make your plans accordingly if you're targeting a specific issue.
Don't forget that if you've run across an undocumented interface of some kind
or another, Andrew Schulman would like to hear from you, as would Bruce
Schneier if you've come up with a new algorithm (or a unique twist on an old
one). And we're always looking for articles that focus on tools, projects, or
books for our "Examining Room," "Programmer's Workbench," and "Programmer's
Bookshelf" sections. If there's something other than an article on your mind,
send us a letter--your views on software development are important to us.
There's little doubt that 1995 will provide a clearer picture of the
directions in which the computer industry will be lurching. For instance,
component (or interchangeable) objects--the Holy Grail of object-oriented
programming--hold promise, although (as we'll discuss in next month's DDJ)
many of the models for dealing with them are still "vaporspecs" (that's
vaporware in its pre-beta form). By this time next year, however, today's
press releases should morph into specifications, and the current crop of
specifications will likely be real implementations. 
Similarly, we should know by next fall whether or not the PowerPC really is
the platform of the future, as we've been told. Questions abound, primarily
about issues such as emulation and resulting performance, but it's still
early, and true native PowerPC apps are forthcoming. (We'll be examining
factors related to PowerPC application development, starting with next month's
coverage of the processor's bi-Endian capabilities.)
Finally, 1995 should produce even more startling developments in the world of
communications and networking--particularly considering the recent legislative
changes governing what telephone companies can and cannot do. Frankly, if you
gave me a feather I couldn't be more tickled about the prospects of the
telephone, electric, and cable-TV companies slugging it out over the right to
send digital data into my house.
Technology-wise, 1995 should be an exciting year, and here's hoping you'll
enjoy it with us.
Dr. Dobb's Journal 1995 Editorial Calendar 
January Numerical Programming
February Distributed Computing
March Portability and Cross-Platform Development
April Algorithms
May Operating Systems & Microkernels
June Software Engineering & Design Methodologies
July Graphics Programming
August C/C++ Programming
September User Interfaces
October Object-Oriented Programming
November Client/Server Architectures
December Visual Programming
Jonathan Erickson
editor-in-chief


September, 1994
LETTERS


3-D Morphing 


Dear DDJ,
I really enjoyed the fine article, "Morphing 3-D Objects in C++" by Glenn
Lewis (DDJ, July 1994) and would like to congratulate Glenn, as well as
discuss a few points with him. How can I get in touch with him?
Ben Allen
Los Angeles, California
DDJ Responds: Thanks for your note, Ben. Glenn is an engineer with Intel's
real-time group and can be contacted at glewis@pcocd2.intel.com.


Timing is Everything


Dear DDJ, 
I just read Tom Swan's June 1994 "Algorithm Alley" column, where he shows how
recursion can be removed by essentially replacing the system stack with a
private stack. At the end of the column he says: "I didn't profile any of the
code listed here--but removing recursion usually produces a speed boost."
I have read similar statements in many algorithm books, but I have not found
any evidence for the general validity of this rule of thumb. Over the years, I
have used many techniques to enhance performance, including recursion removal,
and I have learned to always time code if time is important. What I have found
(in general) is:
1. Tail recursion removal is always beneficial. It is also a very simple
adaptation. 
2. General recursion removal requires a private stack and routines to deal
with this private stack. It depends on the algorithm, the compiler, and the
operating system whether you may expect an increase in performance. C has very
little overhead for a function call, so recursion can be very fast. C++ and
Windows require more function prologue and epilogue, which makes recursion
slower.
3. A benefit of recursive routines is that they are very simple. But similar
to their nonrecursive brothers, recursive routines can be optimized. Most
recursive routines begin by testing whether to exit immediately. A very simple
optimization is to move this test before each recursive call; if the recursive
function call would exit immediately, there is no need to call the routine.
4. Sometimes you find a nonrecursive routine that does not need a stack. Here
you have, in fact, found a completely different algorithm. This algorithm is
often much faster than the recursive routine.
5. Don't trust profilers. Use a stopwatch or the internal timer of the
computer.
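Point 3 can be illustrated with a small, hypothetical tree-summing routine (a
sketch of the principle, not code from the column under discussion): by
testing each child before recursing, the many calls that would exit
immediately are never made at all.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *left, *right;
};

/* Plain version: every call, even on a NULL child, pays the cost of
 * a function call just to return 0. */
int sum_plain(const struct node *n)
{
    if (n == NULL)
        return 0;
    return n->value + sum_plain(n->left) + sum_plain(n->right);
}

/* Optimized version: the exit test is moved before each recursive
 * call, so calls that would return immediately never happen.  The
 * caller must pass a non-NULL node. */
int sum_tested(const struct node *n)
{
    int total = n->value;
    if (n->left != NULL)
        total += sum_tested(n->left);
    if (n->right != NULL)
        total += sum_tested(n->right);
    return total;
}
```

In a tree with L leaves, the plain version makes roughly 2L extra calls on
NULL children; the tested version makes none. Whether the difference is
measurable is exactly what a stopwatch, per point 5, should decide.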
Thiadmer Riemersma 
Bussum, Netherlands


The Right Tool...


Dear DDJ, 
I read Jay Frederick Ransom's letter ("Letters," DDJ, April 1994) commenting
on P.J. Plauger's "Programming Language Guessing Games" (DDJ, October 1993). I
agree with Mr. Plauger that C/C++ is a complex language. Without special
training, not everybody can read this cryptic language. C was written by two
excellent professionals (Kernighan and Ritchie) to create an operating
system--UNIX. Since C and C++ are actually high-level assemblers, why would
you use them to write scientific and mathematical or financial and commercial
applications? Why use pliers to tighten a nut if you can use a wrench?
Fortran is the correct tool for the first, and Cobol is the correct tool for
the second.
Simple programs, like that shown by Mr. Ransom, can be written in any language
(he chose C++). Example 1 is the same solution written in Visual Cobol (I must
admit that I originally wrote it in Fortran), but with a great difference: I'm
sure everybody can read and understand it. [Editor's Note: Executables and
related files are available electronically; see "Availability," page 3.] And
using his words, note the "simple and elegant" code used to solve the
problem. No tricks, no hidden features; and straightforward, too.
Jaime Orozco-V.
Santafé de Bogotá, Colombia


More on Secure Algorithms


Dear DDJ,
In a recent "Letters" column discussing secure algorithms (DDJ, July 1994),
William Hause suggested what's essentially the one-time pad system. Although
unbreakable, it produces many security difficulties in practical
use--especially with multiple correspondents, where every pair exchanging
messages must always use different random keys. If any two messages use the
same key, decryption becomes much easier.
All users must take great care not to reuse any part of any key, yet must
ensure that the recipient always knows where to start. This point is a major
problem with the system, exacerbated by the weaknesses of human nature.
Natural random numbers require a high-quality electronic source not available
to most people. The British government uses a complex electronic device
(ERNIE) to generate the random numbers used for the monthly
national-savings-prize draw. A technical article describing the equipment
stated that each number combined two electronic sources to avoid any risk of
nonrandom output. Additionally, actual output is checked statistically after
every use.
The distribution of keys produces a major security problem. How does one
ensure--even with a personal messenger--that no one has taken a copy en route?
Even sending multiple keys by different routes and combining them for use must
always leave some element of doubt.
The most secure methods avoid transmission of any data sensitive to
interception. Widely published data, such as new books in the lists of agreed
large publishers, easily provide the several numerical values to derive
indirectly the multiple seeds necessary for a secure key. Each pair of
correspondents agrees--and keeps secret from all others--when to change lists
and books and how to calculate the several seeds from each chosen book.
Although every message requires a completely different key, book references
may be infrequent. The message date, time, number, and so on can
provide--again indirectly--some of the necessary variability in seeds.
A Poisson test provides the most critical test of randomness, sensitive to any
trace of sequence repetition. Many computer algorithms pass more-popular tests
but fail with a Poisson test. The best methods encipher each bit
independently. Check for and avoid correlations between bits in the same
sequence, as found with most shift-register designs.
Do not delete zeros from any random key; omission of the expected plaintext
characters actually assists decryption.
For any message, there are 128^N possible different keys (one is plain
language), where N is the number of characters in the message. The best
methods approach this limit.
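The key-reuse hazard is easy to demonstrate concretely. In the usual
one-time-pad scheme, each ciphertext byte is the XOR of a plaintext byte with
a key byte; XOR the two ciphertexts of two messages enciphered with the same
key and the key cancels out entirely, handing the analyst the XOR of the two
plaintexts. A minimal sketch in C (an illustration of the principle, not code
from the letter):

```c
#include <assert.h>
#include <stddef.h>

/* One-time-pad transform: XOR each input byte with the matching key
 * byte.  Because XOR is its own inverse, the same routine both
 * enciphers and deciphers. */
void pad_xor(const unsigned char *in, const unsigned char *key,
             unsigned char *out, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        out[i] = in[i] ^ key[i];
}
```

Given c1 = p1 ^ k and c2 = p2 ^ k, the value c1 ^ c2 equals p1 ^ p2 for every
byte, with no key material left in it--which is why each pair of
correspondents must never reuse any part of a key.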
R.G. Silson
Tring, Herts, England


Big Numbers, Cont.



Dear DDJ,
I would like to take you up on an offer which was extended to DDJ readers in
the September 1993 issue, page 10. You were responding to Mike Neighbors, who
wrote a letter entitled "Searching for Mr. Big Number," in which he asked
readers for good references on high-precision arithmetic. 
I am the founder of Nth Power Software, a software-development company which
specializes in symbolic-algebra applications. Currently, we have a
symbolic-algebra programming language called the "Nth Programming Language,"
which runs on DOS- and Windows-based systems. Nth has been under development
for several years, and Version 1.0 was released in December of 1993. Nth has a
robust, multiple-precision number capability with far more than the
40-decimal-place accuracy mentioned by Mike Neighbors. Its key features
include multiple-precision numbers, multivariate rational polynomial
procedures, and the capability for you to build your own distributable
applications which utilize the full power of the Nth language via the NthDX
module. 
Lloyd E. Nesbitt, 
Nth Power Software
Adair, Oklahoma


Ray Tracing and POV-Ray 


Dear DDJ,
As a member of the POV-Team, I was pleased to see Craig Lindley's article,
"Ray Tracing the POV-Ray Toolkit" (DDJ, July 1994). While the article was
essentially correct for POV-Ray Version 1.0, it contains some outdated
information that may cause some confusion with users of the current version.
Craig's article and sample code were based on POV-Ray 1.0. The current version
of POV-Ray is 2.2. While the sample code that was provided will run, POV-Ray
2.x must be run in "backwards-compatibility mode" to prevent syntax errors.
This mode can be set via the command line or by using the #version 1.0
directive in the scene file itself. More information is available in the
POV-Ray documentation.
Also, Drew Wells is still a POV-Team member, but the job of coordinator has
been assumed by Chris Young (CompuServe 76702,1655) and inquiries should be
addressed to him.
Because of the large number of platforms that POV-Ray can run on, it is
advised that downloaders first look for the file POVINF.DOC, which will
explain a little about POV-Ray and which files are necessary to get it working
on any particular platform. An alternate Internet site for POV-Ray related
files is now available at uniwa.uwa.edu.au in the pub/povray directory.
While the article mentioned POVCAD as one possible modeler for the PC, it
should be noted that another modeler, Moray, is probably in more common use.
There is a great need for a portable modeler for POV-Ray. 
There are a couple of DOS modelers, but nothing for non-DOS platforms. Because
of POV-Ray's portable nature, it would be extremely desirable to have a
freeware portable modeler. Many issues arise from this, some of which have
been covered in DDJ, particularly portable GUI issues. I have tried to get a
group of Internetters together to work on this, but it's been really tough and
I'm not so sure that it'll happen. It'd sure make an interesting DDJ project
for someone, though.
Dan Farmer, 
POV-Team Member
CompuServe 74431,1075
DDJ Responds: You're absolutely right, Dan. A portable modeler for POV-Ray
would be a great project. If any readers are interested in working on such a
project, please contact Dan or DDJ. 
Example 1: The right tool...
Id Division.
 Program-Id. TeamPlay.

 Data Division.
 Working-Storage Section.
 01 TeamsTable.
 05 Team Occurs 100 Times, indexed by TeamX, TeamY
 Pic X(25).
 01 Temp Pic X(25).

 01 NroTeams Pic 9(3).
 01 LastTeam Pic 9(3).

 01 Dia Pic 9(3).

 Procedure Division.
 Begin-TeamPlay.
*----------------------------------------------------------------*
* Read in Team names... And no more than expected! *
*----------------------------------------------------------------*
 Display "Enter Teams Names... Ends with *"
 Perform With Test After Varying NroTeams From 1 By 1 Until
 Team (NroTeams) (1:1) Equal "*" or
 NroTeams Greater 100
 Accept Team (NroTeams)
 End-Perform
*----------------------------------------------------------------*
* Delete the "*" Team and calculate an even number of teams *
*----------------------------------------------------------------*
 Subtract 1 from NroTeams
 Compute LastTeam = Function Integer ((NroTeams + 1) / 2) * 2
*----------------------------------------------------------------*
* Print out playing schedule *
*----------------------------------------------------------------*


 Perform Varying Dia From 1 by 1 Until
 Dia Greater LastTeam - 1
 Display "Day " Dia

 Set TeamY to LastTeam
 Perform Varying TeamX from 1 by 1 Until
 TeamX Greater TeamY
 Display Team (TeamX) " .VS. " Team (TeamY)
 Set TeamY Down by 1
 End-Perform
*----------------------------------------------------------------*
* Rotate Teams for next day *
*----------------------------------------------------------------*
 Move Team (LastTeam) to Temp
 Perform Varying TeamX from LastTeam by -1 Until
 TeamX Equal 2
 Move Team (TeamX - 1) to Team (TeamX)
 End-Perform
 Move Temp to Team (2)
 End-Perform
 Stop Run.


September, 1994
The BMP File Format


When a standard isn't necessarily a standard 




Marv Luse


Marv is president of Autumn Hill Software and author of Bitmapped Graphics
Programming in C++ (Addison-Wesley, 1994). Marv can be contacted at
303-494-8865.


Over the last few years, the BMP format has become an important graphics file
standard. This is not surprising as it is the native graphics file format of
both OS/2 and Windows. To most developers, however, the BMP format is still
something of a stranger, albeit one we know by name. This was true in my own
case, and it was only after writing a book on graphics file formats (Bitmapped
Graphics Programming in C++, Addison-Wesley, 1994) that I came to appreciate
the format's many vagaries.
As it turns out, the BMP format is actually a sheaf of formats bundled under
the same name. Under OS/2, for example, the format is used to store images,
icons, cursors, pointers, and image arrays (and I'm not entirely sure this
list is complete). Here, I'll look primarily at one of these variants--the BMP
image format. 
There are currently two versions of the BMP image format for Windows and two
for OS/2. Think of these as "old" and "new" versions on each platform. The two
old versions are identical, reflecting the common ancestry of OS/2 and
Windows. The two new versions, however, are different. This means that an
application that wishes to handle any valid instance of the BMP image format
must be prepared to deal with three format variants.
Few commercial applications actually support all three variants. You may have
had the experience of an image editor or word processor trying to import a BMP
file, only to be notified that the file is invalid. In such cases, it is
likely that the BMP file originated on OS/2, but the application only knows
how to handle the Windows-format variants. Another possibility is that the
file actually contains something other than an image, such as an icon or
cursor. And, of course, if the application is sufficiently old, it is possible
that it only knows about the common version of the format.
From an application-development perspective, a program should be prepared to
deal with any of the three valid image formats, and the program's
error-handling logic should recognize three distinct situations: a valid BMP
image file, a valid BMP file that contains something other than an image, and
an invalid file of undetermined format. Anything less is likely to generate
unnecessary and annoying technical-support calls (and if you are like me,
anything that reduces technical-support calls is nothing less than manna from
heaven!).
In this article, I'll examine the format itself and present techniques for
encapsulating it in C++, including a class design that implements the
strategy just described.


The BMP Image Format


The BMP image format is a general-purpose format designed to accommodate
images of any size and possessing from 1 to 24 bits of color information. The
bitmap-handling machinery of OS/2 and Windows prefers a chunky (nonplanar)
pixel format; this is considered the norm for the format. Although multiplanar
images can be accommodated, I have never encountered a BMP file that contained
one. The format also supports RLE compression under Windows and RLE and
Huffman 1D encoding under OS/2. Again, I've never encountered a compressed BMP
file. One unfortunate canon of the format is that images are stored as
scanlines ordered from the bottom up. This is never a problem when dealing
only with the Windows or OS/2 Presentation Manager (PM) APIs, but if you must
deal with the bitmap data directly (say, to perform a format conversion or to
print the image on a dot-matrix printer), then the bottom-up ordering is a
real pain.
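The bottom-up ordering means that locating a given image row requires counting from the end of the bitmap. A minimal sketch of the arithmetic, assuming an uncompressed bitmap (the helper name and its parameters are mine, not part of either API):

```cpp
#include <cassert>

// File offset of the pixel data for image row y, where y = 0 is the
// *top* of the picture as displayed. BMP stores scanlines bottom-up,
// so the top row lives at the far end of the bitmap area.
long bmp_row_offset(long offBits, long height, long rowbytes, long y)
{
    return offBits + (height - 1 - y) * rowbytes;
}
```

For example, with a 10-row image whose bitmap starts at offset 54 and whose padded scanlines are 40 bytes wide, the top row sits 9 scanlines into the bitmap and the bottom row sits at the start of it.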
As noted previously, the domain of the BMP image format is generated from two
format versions on each of two operating environments and contains three
distinct variants; see Figure 1. In each case, a BMP file contains, in the
following order, a file header, bitmap header, optional palette, and bitmap.
Format variances are confined to the bitmap header and palette.
Under Windows 3.1, the old format is represented by the structures
BITMAPCOREHEADER, BITMAPCOREINFO, and RGBTRIPLE, while the newer format
comprises BITMAPINFOHEADER, BITMAPINFO, and RGBQUAD. Both versions also
include the BITMAPFILEHEADER structure. It should also be noted that a
"COREINFO" structure consists of a "COREHEADER" followed by one or more
RGBTRIPLE instances; similarly, an "INFO" structure consists of an
"INFOHEADER" followed by one or more RGBQUAD instances. (I'm using the
convention of paraphrasing names by dropping the BITMAP prefix and placing the
remaining name in quotes.)
The corresponding OS/2 structure names are a bit more logical by themselves
and a bit less logical when you lump them with the Windows names. The old
format consists of BITMAPINFOHEADER, BITMAPINFO, and RGB, while version 2
names are BITMAPINFOHEADER2, BITMAPINFO2, and RGB2. As with the Windows
versions, both structure sets also include a BITMAPFILEHEADER structure. The
file-header structures are identical under both operating environments.
You may have noticed that the old OS/2 names are the same as the new Windows
names. A second complication is that the structures overlap in both
environments. Thus, a "COREINFO" consists of a "COREHEADER" and an RGBTRIPLE
array, and an "INFO" consists of an "INFOHEADER" and an RGBQUAD array (using
Windows in this example). If this all seems confusing, don't be alarmed--it
is!
Palette entries in the old form of the format consist of three bytes
indicating a blue, green, and red intensity, respectively. The newer versions
add a fourth byte so that the palette can be read as an array of longs. Note
that the traditional RGB order is evident when read into a long value on an
Intel processor and then notated in hex.
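That byte-order observation can be checked with a couple of lines. The hypothetical helper below models what a little-endian 32-bit load does to the on-disk byte sequence blue, green, red, reserved; the resulting value, written in hex, reads in the familiar red-green-blue order:

```cpp
#include <cassert>

// Model a little-endian load of the new-format palette bytes
// b, g, r, 0: in the resulting long, red lands in bits 16-23,
// green in bits 8-15, and blue in bits 0-7, i.e. 0x00RRGGBB.
unsigned long pack_bgr0(unsigned char b, unsigned char g, unsigned char r)
{
    return ((unsigned long)r << 16) | ((unsigned long)g << 8) | b;
}
```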
The bitmap of a BMP file is organized as a series of scanlines and is
presented beginning with the bottom row of the image and proceeding up. A
second requirement is that scanlines are always padded, if necessary, so that
they occupy a whole number of 32-bit double-words. Given an image w pixels
wide where each pixel is d bits deep, the number of bytes per scanline is
calculated as rowbytes=((w*d+31)/32)*4.
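The padding rule is easy to get wrong by a few bytes, so it's worth capturing in one place. A minimal sketch of the formula above (the function name is mine):

```cpp
#include <cassert>

// Bytes per BMP scanline: w pixels at d bits each, with the row padded
// up to the next 32-bit boundary, per rowbytes = ((w*d + 31) / 32) * 4.
long bmp_rowbytes(long w, long d)
{
    return ((w * d + 31) / 32) * 4;
}
```

Note that even a 1-pixel-wide monochrome image takes four bytes per scanline, and a 100-pixel-wide, 24-bit scanline (2400 bits) pads out to 300 bytes.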


BMP Structure Items


Generally speaking, unused or unimportant fields can safely be set to 0 in all
situations. (In the following discussion, the names of the OS/2 structure
items precede Windows names.)
usType/bfType is used to validate the file as being a BMP file and also to
indicate its content. (The first word of the BITMAPFILEHEADER is always the
first item of a BMP file.) For a BMP image file, this field always contains
the value 4D42h which, when presented as low-byte/high-byte, consists of the
two ASCII characters BM, short for "BitMap." Windows does not appear to
support any additional values for this field, but OS/2 supports at least six
possible values; see Table 1.
cbSize/bfSize indicates the size of the file. In older instances of the format
it is frequently 0, and when set, it is sometimes the size of the file in
bytes, and other times, the size of the file in words, depending on who wrote
the file. Thus, you should never make assumptions based on its value. The
correct interpretation is file size in bytes.
offBits/bfOffBits indicates the offset in bytes from the beginning of the file
to the beginning of the bitmap, and is an important value. It is used to
locate the file's bitmap and to calculate the number of entries in the palette
as: ncolors=(offBits-file_hdr_size-bitmap_hdr_size)/rgb_size.
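The palette-size calculation deserves a concrete check, since the divisor changes between format versions (3-byte entries in the old format, 4-byte in the new). A sketch of the formula above, with hypothetical naming:

```cpp
#include <cassert>

// Number of palette entries implied by the file layout: everything
// between the end of the bitmap header and the start of the bitmap
// data is palette. rgb_size is 3 for the old format, 4 for the new.
long bmp_palette_entries(long offBits, long file_hdr_size,
                         long bitmap_hdr_size, long rgb_size)
{
    return (offBits - file_hdr_size - bitmap_hdr_size) / rgb_size;
}
```

A 16-color old-format file, for instance, has offBits = 14 + 12 + 16*3 = 74, while a 256-color new-format Windows file has offBits = 14 + 40 + 256*4 = 1078.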
cbFix/biSize is the size of the bitmap header in bytes. This value is also
used to determine what version of the format you are dealing with. If the
value is 12, then you have an instance of either the old Windows format or the
old OS/2 format (they are identical). If the value is 40, you have an instance
of the new Windows format. If the value is 64, then you have an instance of
the new OS/2 format. And if the value is some other value between 12 and 64,
you might have an instance of a newer format, and then again, you might not.
For example, the file OS2LOGO.BMP that comes with OS/2 Version 2.1 has a value
of 36 because the last double-word of the bitmap header is not present. Many
applications check this field for value 40; if it isn't 40, they assume that
the file is not a BMP file. As you can see, this is not a very good strategy.
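A more forgiving classifier than the check-for-40 approach might look like the sketch below. The function and its category strings are my own invention; the cutoffs simply encode the rules described above, including the tolerance for truncated OS/2 2.x headers such as OS2LOGO.BMP's 36-byte variant:

```cpp
#include <cassert>
#include <string>

// Classify a BMP variant from the cbFix/biSize field: 12 = old format
// (shared by Windows and OS/2), 40 = new Windows, 64 = new OS/2.
// Other values in the 12..64 range may be truncated OS/2 2.x headers,
// so they are flagged as possible rather than rejected outright.
std::string bmp_variant(long hdr_size)
{
    if (hdr_size == 12) return "old Windows/OS2";
    if (hdr_size == 40) return "new Windows";
    if (hdr_size == 64) return "new OS/2";
    if (hdr_size > 12 && hdr_size < 64) return "possible truncated OS/2";
    return "not a BMP";
}
```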
cx/biWidth is the width of the image in pixels. This value does not reflect
scanline padding.
cy/biHeight is the height of the image in rows.
cPlanes/biPlanes is the number of planes in the bitmap. I have yet to see a
value here other than 1, since the preferred format is for chunky pixels.
cBitCount/biBitCount is the number of bits per pixel. Valid values are 1, 4,
8, and 24.
ulCompression/biCompression is a value indicating the type of compression
employed on the bitmap. Zero indicates no compression, and this is likely to
be the only value you will ever encounter.
cbImage/biSizeImage is the size of the bitmap in bytes. Some applications
depend upon a correct (nonzero) value here.


A BMP-Format Class Design


There are many possible approaches to handling the BMP format from an
application, all of which depend upon the requirements and operational domain
of the application. For example, if you are writing for Windows 3.1 only, then
dealing only with the newer Windows version of the format is an acceptable
option. On the other hand, we are seeing more and more emphasis on
cross-platform capabilities these days, and from this standpoint it seems
desirable to handle the format no matter what--no excuses, no exceptions. This
is obviously the most desirable strategy in any case, but also the one
requiring the most work and presenting the most headache. However, once a
suitable black box has been constructed and determined to function properly,
it will never again have to be pondered; at least, not until somebody deems
that a version 3 of the format is necessary.
The approach I took in my own development was to base a BMP class on the OS/2
2.1 format, which represents, in effect, a superset of all other format
variants. When another variant is encountered on input, it is treated as if it
were an incomplete form of the OS/2 format, one where default values are
supplied for missing items. Conversely, if it is required to output a
different format variant, it is a simple matter to omit extraneous items and
to supply reformatting where necessary. A second aspect of the design is that
I elected to treat BMP files as consisting of two components only: a large
header followed by a bitmap. Thus, in place of a file header, a bitmap header,
and a color array there is simply a header. This goes against conventional
design wisdom to a certain extent, in that it is less modular. However, the
interdependence of the various data structures tends to neutralize the
benefits of a modular implementation.
When it is necessary to supply data components from the format for an instance
of a BITMAPFILEHEADER alone, the BMP class provides member functions that
return void pointers that can then be cast to the appropriate type. And if the
required type is formatted differently than the class's "native" type, a
reformatted version can be constructed on the fly in a temporary buffer and a
pointer to the buffer returned.
With this background, a suitable, skeletal class definition would look
something like Example 1. For more details, see Listings One and Two, page 82.



Final Thoughts


At this point you probably have a good idea of the complexity and capabilities
of the BMP format. What is more difficult to convey is the format's importance
and how to access and use it within a PM or Windows application. You might
start by examining those functions in either platform's API that use
BMP-format components as arguments and experimenting with them. BMPTEST.CPP,
available electronically in both source and executable form (see
"Availability," page 3) is a program that can serve as a starting point, in
this case for my platform of choice, OS/2. Eventually you should find this
familiar stranger to be a bit more familiar, and hopefully, a bit less
strange!
Figure 1 BMP format domains.
Table 1: BMP file header usType (OS/2) values.

  Hex Value   Characters   Meaning
  4142        BA           Bitmap array
  4D42        BM           Bitmap
  4943        CI           Color icon
  5043        CP           Color pointer (mouse cursor)
  4349        IC           Icon
  5450        PT           Pointer (mouse cursor)
Example 1: A skeletal class definition for encapsulating BMP files.
class BmpImage
{
 public:
 enum BmpVersions
 {
 BMPOS2NEW,
 BMPOS2OLD,
 BMPWINNEW,
 BMPWINOLD
 };
 BmpImage( );
 ~BmpImage( );
 int read( char *path );
 int write( char * path, int version=BMPOS2NEW );
 void * filehdr( int version=BMPOS2NEW );
 void * bmaphdr( int version=BMPOS2NEW );
 void * palette( int version=BMPOS2NEW );
 void * bits( );
} ;

Listing One 

//------------------------------------------------------------------//
// File: BMP.H -- Classes for encapsulating the BMP format //
// Copr: Copyright (c) 1994 by Marv Luse //
//------------------------------------------------------------------//

#ifndef _BMP_H_
#define _BMP_H_

#include "stdio.h"      // FILE and the SEEK_xxx constants

//.......Useful types
typedef unsigned char uchar;
typedef unsigned short ushort;
typedef unsigned long ulong;

//.......Useful constants
enum FileOrigins
{
 FILEBGN = SEEK_SET,
 FILECUR = SEEK_CUR,
 FILEEND = SEEK_END,
};
enum FileStates

{
 FILEOKAY,
 FILEENDOFFILE,
 FILENOTFOUND,
 FILEINVALID,
 FILENOTBMP,
 FILENOTBMPIMG,
 FILEERROR,
 FILENOMEMORY,
};
enum BmpVersions
{
 BMPWINOLD,
 BMPOS2OLD,
 BMPWINNEW,
 BMPOS2NEW,
};
enum BmpSizes
{
 BMPFILEHDRSIZE = 14,
 BMPOLDANYHDRSIZE = 12,
 BMPNEWWINHDRSIZE = 40,
 BMPNEWOS2HDRSIZE = 64,
};
enum BmpTypes
{
 BMPARRAY = 0x4142, // 'BA'
 BMPBITMAP = 0x4D42, // 'BM'
 BMPCLRICON = 0x4943, // 'CI'
 BMPCLRPOINTER = 0x5043, // 'CP'
 BMPICON = 0x4349, // 'IC'
 BMPPOINTER = 0x5450, // 'PT'
};
//.......A class for performing binary input
class BinaryInput
{
 private:
 FILE * inp;
 public:
 BinaryInput( char * path );
 ~BinaryInput( );
 //.....read various types
 int byte( );
 int word( );
 long dword( );
 int block( void * blk, int nbytes );
 //.....file management members
 int ok( );
 int error( );
 int seek( long ofs, int org );
 long tell( );
};
//.......A class for a BMP header and bitmap
class BmpImage
{
 private:
 char *bmBits; // bitmap data
 ulong bmNumColors; // size of palette
 int fiBmpStatus; // a status code


 char *tmpfilehdr; // temporaries
 char *tmpbmaphdr;
 char *tmppalette;

 public:

 ushort fiType; // type - 'BM' for bitmaps
 ulong fiSizeFile; // file size in bytes
 ushort fiXhot; // 0 or x hotspot
 ushort fiYhot; // 0 or y hotspot
 ulong fiOffBits; // offset to bitmap
 ulong bmSizeHeader; // size of this data - 64
 ulong bmWidth; // bitmap width in pixels
 ulong bmHeight; // bitmap height in pixels
 ushort bmPlanes; // num planes - always 1
 ushort bmBitCount; // bits per pixel
 ulong bmCompression; // compression flag
 ulong bmSizeImage; // image size in bytes
 long bmXPelsPerMeter; // horz resolution
 long bmYPelsPerMeter; // vert resolution
 ulong bmClrUsed; // 0 -> color table size
 ulong bmClrImportant; // important color count
 ushort bmUnits; // units of measure
 ushort bmReserved; // reserved
 ushort bmRecording; // recording algorithm
 ushort bmRendering; // halftoning algorithm
 ulong bmSize1; // size value 1
 ulong bmSize2; // size value 2
 ulong bmIdentifier; // for application use
 ulong bmPalette[256]; // image palette

 BmpImage( char * path );
 ~BmpImage( );

 //.....member functions - query
 long width( ); // bitmap width in pixels
 long height( ); // bitmap height in pixels
 long depth( ); // bitmap depth in bits
 long rowbytes( ); // scan line width in bytes
 long size( ); // bitmap size in bytes
 int planes( ); // number of planes
 int bits( ); // bits per plane
 int compression( ); // compression type
 int xres( ); // x res as pels/meter
 int xdpi( ); // x res dots/inch
 int yres( ); // y res as pels/meter
 int ydpi( ); // y res dots/inch
 void * filehdr( int vers ); // ptr to bmp file header
 void * bmaphdr( int vers ); // ptr to bmp info header
 void * palhdr( int vers ); // ptr to bmp rgb array
 void * bitmap( int vers ); // ptr to bmp bitmap data
 int status( ); // image/file status code
};
#endif


Listing Two
//-------------------------------------------------------------------//

// File: BMP.CPP -- Classes for encapsulating the BMP format //
// Copr: Copyright (c) 1994 by Marv Luse //
//------------------------------------------------------------------//
// Notes... (1) Only BMP input is illustrated here, and in general, the code 
// is intended as a model only. (2) No size typing is performed on pointers or
// the objects to which they point (i.e., near, far, huge, etc). This is 
// normal for OS/2, but since Windows is 16-bit, the code will need to be 
// modified slightly for that environment. In particular, if the entire bitmap
// is to be accessible through a single pointer, that pointer should be 
// declared huge. (3) The code was tested under OS/2 2.1 using the Borland
// 1.0 OS/2 compiler. Tweaking may be necessary with other environment mixes.

#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include "bmp.h"

//.......A class for performing binary input
BinaryInput::BinaryInput( char * path )
{
 inp = fopen( path, "rb" );
}
BinaryInput::~BinaryInput( )
{
 if( inp ) fclose( inp );
}
int BinaryInput::byte( )
{
 return fgetc( inp );
}
int BinaryInput::word( )
{
 short s;
 fread( &s, sizeof(short), 1, inp );
 return s;
}
long BinaryInput::dword( )
{
 long l;
 fread( &l, sizeof(long), 1, inp );
 return l;
}
int BinaryInput::block( void * blk, int nbytes )
{
 return fread( blk, nbytes, 1, inp );
}

int BinaryInput::ok( )
{
 return ((inp==0) || ferror(inp) || feof(inp)) ? 0 : 1;
}
int BinaryInput::error( )
{
 if( inp == 0 ) return FILENOTFOUND;
 if( feof(inp) ) return FILEENDOFFILE;
 if( ferror(inp) ) return FILEERROR;
 return FILEOKAY;
}
int BinaryInput::seek( long ofs, int org )

{
 return inp ? fseek( inp, ofs, org ) : FILEERROR;
}
long BinaryInput::tell( )
{
 return inp ? ftell( inp ) : -1;
}
//.......A class for a BMP header and bitmap
BmpImage::BmpImage( char * path )
{
 //.....initialize nonformat items
 bmBits = 0;
 bmNumColors = 0;
 tmpfilehdr = tmpbmaphdr = tmppalette = 0;
 //.....the remaining items constitute a valid OS/2 2.x BMP header set
 memset( &fiType,0,BMPFILEHDRSIZE + BMPNEWOS2HDRSIZE + sizeof(long) * 256 );
 //.....instantiate the input stream
 BinaryInput inB( path );
 if( ! inB.ok() )
 {
 fiBmpStatus = inB.error( );
 return;
 }
 //.....get the file header type field and verify
 fiType = (ushort) inB.word( );
 switch( fiType )
 {
 case BMPBITMAP:
 break;
 case BMPARRAY:
 case BMPCLRICON:
 case BMPCLRPOINTER:
 case BMPICON:
 case BMPPOINTER:
 fiBmpStatus = FILENOTBMPIMG;
 return;
 default:
 fiBmpStatus = FILENOTBMP;
 return;
 }
 //.....read rest of file hdr, which isn't versn dependent
 fiSizeFile = inB.dword( );
 fiXhot = (ushort) inB.word( );
 fiYhot = (ushort) inB.word( );
 fiOffBits = inB.dword( );
 //.....get the bitmap header size field and verify
 bmSizeHeader = inB.dword( );
 switch( bmSizeHeader )
 {
 case BMPOLDANYHDRSIZE:
 case BMPNEWWINHDRSIZE:
 case BMPNEWOS2HDRSIZE:
 break;
 default:
 if( (bmSizeHeader < BMPOLDANYHDRSIZE) ||
 (bmSizeHeader > BMPNEWOS2HDRSIZE) )
 {
 fiBmpStatus = FILENOTBMP;
 return;

 }
 break;
 }
 //.....read the rest of the bitmap header and palette
 if( bmSizeHeader == BMPOLDANYHDRSIZE )
 {
 bmWidth = inB.word( );
 bmHeight = inB.word( );
 bmPlanes = (ushort) inB.word( );
 bmBitCount = (ushort) inB.word( );
 bmNumColors = (fiOffBits - bmSizeHeader - BMPFILEHDRSIZE) / 3;
 bmSizeImage = rowbytes( ) * bmHeight;
 for( int i=0; i<bmNumColors; i++ )
 {
 long blu = inB.byte( );
 long grn = inB.byte( );
 long red = inB.byte( );
 bmPalette[i] = (red << 16) | (grn << 8) | blu;
 }
 }
 else
 {
 long nbytes = bmSizeHeader - 4;
 inB.block( &bmWidth, nbytes );
 bmNumColors = (fiOffBits - bmSizeHeader - BMPFILEHDRSIZE) / 4;
 if( bmNumColors > 0 )
 inB.block( bmPalette, bmNumColors * 4 );
 }
 //.....read the bitmap. Works only for bitmaps 64K or smaller under Windows.
 bmBits = new char [ rowbytes() * bmHeight ];
 if( bmBits )
 {
 inB.block( bmBits, rowbytes() * bmHeight );
 fiBmpStatus = inB.ok( ) ? FILEOKAY : inB.error( );
 }
 else
 fiBmpStatus = FILENOMEMORY;
}
BmpImage::~BmpImage( )
{
 delete [] tmpfilehdr;
 delete [] tmpbmaphdr;
 delete [] tmppalette;
 delete [] bmBits;
}
long BmpImage::width( )
{
 return bmWidth;
}
long BmpImage::height( )
{
 return bmHeight;
}
long BmpImage::depth( )
{
 return bmPlanes * bmBitCount;
}
long BmpImage::rowbytes( )
{

 return (((bmPlanes*bmBitCount*bmWidth) + 31) / 32) * 4;
}
long BmpImage::size( )
{
 return bmSizeImage ? bmSizeImage : rowbytes() * bmHeight;
}
int BmpImage::planes( )
{
 return bmPlanes;
}
int BmpImage::bits( )
{
 return bmBitCount;
}
int BmpImage::compression( )
{
 return bmCompression;
}
int BmpImage::xres( )
{
 return bmXPelsPerMeter;
}
int BmpImage::xdpi( )
{
 return (int) ((bmXPelsPerMeter * 100) / 3937);
}
int BmpImage::yres( )
{
 return bmYPelsPerMeter;
}
int BmpImage::ydpi( )
{
 return (int) ((bmYPelsPerMeter * 100) / 3937);
}
void * BmpImage::filehdr( int vers )
{
 // file header is not version dependent
 return (void *) &fiType;
}
void * BmpImage::bmaphdr( int vers )
{
 // the first 40 bytes of the new OS/2 header are the same as the new
 // Windows header, except for the length value (64 versus 40); the old
 // headers, however, require reformatting
 if( vers == BMPOS2NEW )
 return &bmSizeHeader;
 // allocate space for worst case - 40 bytes
 if( tmpbmaphdr == 0 )
 tmpbmaphdr = new char [ BMPNEWWINHDRSIZE ];
 if( (vers == BMPWINNEW) && (tmpbmaphdr != 0) )
 {
 memcpy( tmpbmaphdr, &bmSizeHeader, BMPNEWWINHDRSIZE );
 *((ulong *) tmpbmaphdr) = BMPNEWWINHDRSIZE;
 }
 else if( ((vers == BMPWINOLD) || (vers == BMPOS2OLD)) &&
 (tmpbmaphdr != 0) )
 {
 // this is ugly, but safe and functional!
 *((ulong *) tmpbmaphdr) = BMPOLDANYHDRSIZE;

 short * hdr = (short *) tmpbmaphdr;
 hdr[2] = (short) bmWidth;
 hdr[3] = (short) bmHeight;
 hdr[4] = bmPlanes;
 hdr[5] = bmBitCount;
 }
 return (void *) tmpbmaphdr;
}
void * BmpImage::palhdr( int vers )
{
 // The palette format is the same for both new
 // format versions, but the old format requires reformatting.
 if( (vers == BMPOS2NEW) || (vers == BMPWINNEW) )
 return (bmNumColors > 0) ? (void *) &bmPalette[0] : 0;
 // allocate space for old palette
 if( (tmppalette == 0) && (bmNumColors > 0) )
 {
 tmppalette = new char [ bmNumColors * 3 ];
 if( tmppalette != 0 )
 {
 char * s = (char *) &bmPalette[0];
 char * d = (char *) tmppalette;
 for( int i=0; i<bmNumColors; i++ )
 {
 *d++ = *s++;
 *d++ = *s++;
 *d++ = *s++;
 s++;
 }
 }
 }
 return tmppalette;
}
void * BmpImage::bitmap( int vers )
{
 // bitmap is not version dependent
 return (void *) bmBits;
}
int BmpImage::status( )
{
 return fiBmpStatus;
}


September, 1994
K-Tree Container Data Structures


Fast subscripting, slicing, and concatenation of sequences




Rodney Bates


Rod is an engineer with Boeing aircraft and can be contacted at
bates@salsv3.boeing.com.


In dealing with the problem of browsing and debugging incomplete programs, I
needed to efficiently handle tree nodes with variable (possibly large) numbers
of children. To address this problem, I developed a data structure called a
"K-tree," which also has general applicability.
K-trees are container data structures that represent linear sequences of
integers, pointers, and the like. There are countless ways of representing a
sequence, but almost all are variations on arrays or linked lists.
Arrays are very fast when you have a subscript, say, I, and want to find the
Ith element of the sequence. The time required by the address computation does
not increase as the sequence gets longer. This is called a "constant-time"
operation. On the other hand, if you want to concatenate two sequences or
extract a subsequence (a slice), part of some array must be copied. The time
this requires increases in direct proportion to the length of the sequences.
This is a "linear-time" operation.
The many variants of linked-list representations tend to be just the opposite.
You can cut apart pieces of a linked list and splice them together in constant
time. But finding the Ith element requires traversing from one end of the
list, which is linear.
Now, suppose you need to do a mixture of constant-time operations and linear
operations. As the problem gets bigger, linear operations will account for
almost all the time the program takes to run, while the overall effect of
constant-time operations is negligible. In an application where both kinds of
operations are needed, performance is indicated by the most inefficient
operation--linear, in this case. 
With K-trees, subscripting, slicing, and concatenation all take time
proportional to the logarithm of the length of the sequence. This is not as
good as constant time, but it's much better than linear time. Since no
operation is worse than logarithmic time, the logarithmic performance
dominates.
K-trees have one other important characteristic that I needed in my
application. When you extract a slice, the original sequence from which it was
taken is preserved. The same goes for concatenation and subscripted
assignment. Most of the array and linked representations have some operation
which destroys operands, unless you first make a copy, which is, of course,
linear.
To make this preservation of operands happen, K-trees use heap objects that
are immutable--once created, they never change. These objects can be shared
among several sequences, and this is vital to making the operations
logarithmic. On the downside, some kind of garbage collection is needed to
reclaim objects no longer used in any sequence.


The K-tree Data Structure


A K-tree is a pointer to one of two kinds of nodes, both of which contain an
integer field named Height. If Height=1, the node is a leaf node and contains
a field named LeafElems (a small array of sequence elements). If Height>1, the
node is a nonleaf node and contains a field named NonleafElems (a small array
of records). Each record contains two fields named CumChildCt and ChildRef.
CumChildCt has the type of sequence subscripts. ChildRef is a K-subtree
pointer.
Every node has a field named ElemCt which gives the number of elements. The
elements of both leaf and nonleaf nodes have subscripts in the range
0..ElemCt-1. Each node is dynamically allocated when created, with exactly
enough space to hold ElemCt elements.
Figure 1 shows a graphical notation for both kinds of nodes. The fields and
array subscripts are labeled, showing how to interpret the nodes in the
examples that follow.
There is a global maximum, N, for the number of array elements in any node of
either kind. N must be at least 3. A leaf node with one element occurs only in
the representation of a singleton sequence. Every other node always has at
least two elements. Figure 2 shows some examples of small K-trees and the
sequences they represent. The elements of sequences are integers. Figures 2(b)
and 2(c) are two different K-trees which represent the same sequence. In
general, there are many K-tree representations for a given sequence. A given
K-tree represents only one sequence, according to the following rules:
A NIL pointer represents the empty sequence. A pointer to a leaf node
represents just the elements of LeafElems. A pointer to a nonleaf node
represents the concatenation of the sequences represented by the ChildRef
fields.
The value of CumChildCt in the Jth element of a nonleaf node is the sum of the
lengths of the subsequences represented by the ChildRef fields of
NonleafElems[0..J]. This means that NonleafElems[ElemCt-1].CumChildCt is the
length of the sequence represented by this node.
From a given node, all paths to leaves have the same length. This value is
stored in the Height field. K-trees are always full in the sense that,
although ElemCt may vary, there are no NIL pointers in ChildRef fields.
No node contains any information derived from its parent or siblings. Since
the nodes are immutable, any subtree can be shared among many parents, each of
which belongs to a different K-tree.


Subscripted Fetching


Subscripted fetching proceeds top-down, using an intermediate subscript I that
is always relative to the current K-subtree. If the K-subtree is a leaf, the
sequence subscript is the subscript to LeafElems and leads directly to the
desired sequence element.
If the K-subtree is a nonleaf, fetch must determine to which of the node's
subtrees it should descend by comparing I with the values of CumChildCt in the
elements of NonleafElems. These values are, by definition, in ascending order,
so this can be done using a classic binary search.
Before fetch descends into the subtree, it must reduce the sequence subscript
by CumChildCt of the subtree to the left of the one it is about to descend
into.
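The descent just described can be sketched in C++. The node layout below is hypothetical (it merely follows the field names used in the text), and where a real implementation would binary-search the CumChildCt values, this sketch uses a linear scan for brevity:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical K-tree node. A leaf (Height == 1) holds sequence
// elements directly; a nonleaf holds (CumChildCt, ChildRef) pairs
// with CumChildCt values in ascending order.
struct KNode {
    int Height;
    std::vector<int> LeafElems;                       // Height == 1
    struct Child { long CumChildCt; std::shared_ptr<KNode> ChildRef; };
    std::vector<Child> NonleafElems;                  // Height > 1
};

// Subscripted fetch: descend top-down, at each nonleaf choosing the
// first child whose CumChildCt exceeds i, and reducing i by the
// CumChildCt of the subtree to its left. Assumes i is a valid subscript.
int kfetch(const std::shared_ptr<KNode>& t, long i)
{
    const KNode* n = t.get();
    while (n->Height > 1) {
        size_t j = 0;
        while (n->NonleafElems[j].CumChildCt <= i) ++j;  // linear stand-in
        if (j > 0) i -= n->NonleafElems[j - 1].CumChildCt;
        n = n->NonleafElems[j].ChildRef.get();
    }
    return n->LeafElems[i];
}

// Small demo: the sequence 1,2,3,4,5 as two leaves under one nonleaf,
// with CumChildCt values 3 and 5.
std::shared_ptr<KNode> demo_tree()
{
    auto l1 = std::make_shared<KNode>();
    l1->Height = 1; l1->LeafElems = {1, 2, 3};
    auto l2 = std::make_shared<KNode>();
    l2->Height = 1; l2->LeafElems = {4, 5};
    auto root = std::make_shared<KNode>();
    root->Height = 2;
    root->NonleafElems = { {3, l1}, {5, l2} };
    return root;
}
```

Note that the subscript reduction keeps i relative to the current subtree at every level, which is what makes the whole fetch logarithmic in the sequence length.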


Subscripted Assignment


Subscripted assignment begins like subscripted fetching, proceeding top-down
through the K-tree to the desired leaf element. However, the located element
cannot be altered in place, as this would violate the preservation-of-operands
property.
Instead, store allocates a copy of the old leaf node and alters the Ith
element of the copy. It then returns the pointer to the new node to the level
above. Each nonleaf level does essentially the same thing, except it replaces
only the ChildRef field of the selected element of its copied node with the
pointer it receives from below.
The result is a new K-tree, in which all nodes on the path from the root to
the leaf node containing the Ith sequence element have been replaced, while
all nodes off this path are shared with the original K-tree.
Figure 3 shows the result of assigning value 8 to element number 4 of the
K-tree in Figure 2(d). The shaded nodes have identical contents and are the
same nodes as before the assignment. The other nodes are new. The two new
nonleaf nodes look the same as before, but have different values in one of the
ChildRef fields. To illustrate this, pointer values which have changed are
shown as dashed arrows.


Concatenation



Concatenation is done by constructing a seam along the right edge of the
left-operand K-tree and the left edge of the right-operand K-tree. The seam is
constructed bottom-up, matching, and possibly joining, pairs of nodes of the
same height from both sides of the seam. 
The K-tree representation rules leave some choices as to how the seam is
constructed. If the two operand K-trees have the same height, concatenate
could just create a new nonleaf node of two elements, with the operand K-trees
as its two children. This is simple, but the resulting trees tend to
degenerate toward binary trees. It would be better to keep nodes more
nearly full.
Starting from the bottom in the algorithm I chose, concatenate moves higher as
long as the total number of elements in the nodes on either side of the seam
at a given level is greater than N. Once it reaches a level with N or fewer
elements, it allocates a new node and repacks the elements.
Once repacking has started, every level above has to have one or two new nodes
allocated because some changes in child pointers will be needed to reflect the
replacement of old nodes at the level below.
If the total number of new elements along the seam is greater than N, two new
nodes will be needed. In this case, I divide the elements equally between the
new nodes, so as to keep node sizes equal.
Figure 4 gives an example of K-trees before and after concatenation, showing
only the nodes along the seam. The CumChildCt fields are omitted in this
example. The small numbers in circles represent nodes to either side of the
seam without showing their contents. All these nodes are reused.
At height one, the two leaf nodes collectively have seven elements, which
can't be repacked into one node. They are reused intact in the result. At
height two, there are six elements altogether, so they are repacked into one
new nonleaf node. 
At height three, there are initially eight elements. Two point to nodes that
are not reused in the result and whose replacement consists of only one node.
This leaves a total of seven new elements needed along the seam. These are
distributed into the two new nodes, three elements in one node and four in the
other.
At height four, only the right-operand K-tree has a node. The root pointer of
the left operand is treated as a fictitious, one-element node, which must be
repacked with the elements from the right side of the seam. This requires a
total of seven elements: Two point to replacements for old nodes, and the rest
point to reused nodes to the right of the seam.
Finally, a new node at height five is needed to collect the two nodes of
height four. Thus the height of the result K-tree is one greater than that of
the taller operand K-tree.
Implementing concatenation is somewhat more complex than the concept suggests. A
recursive procedure has to start at the top of the K-trees, descend to the
leaves, and then do its work on the way back up. The operand K-trees could
have different heights. The Height field allows the descent to synchronize
itself so it is working on nodes of the same height on each side of the seam.
Unequal heights also create some special cases for node construction during
the return to the top.


Slicing


K-trees are sliced bottom-up by constructing two cuts through the operand
K-tree along the left and right edges of what will become the result K-tree.
At each level, the node is divided between the elements belonging to the slice
and those outside the slice. The node slice of the divided node on its
"sliceward" side must be included in the result K-tree. 
The node slice could have only one element. If this happens, it is repacked
with the adjacent node in the sliceward direction. This will give a total of
at least three and at most N+1 elements, which can always be packed into
either one or two new nodes. As an optimization to keep nodes more nearly
full, the node slice and the adjacent node are also repacked any time they
will collectively fit into one new node.
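The repacking rule just described can be sketched as a small decision function. This is an assumption-laden C++ illustration, not the author's implementation; the names are mine.

```cpp
#include <cassert>

// Illustrative sketch of the slice-repacking rule. N is the node
// capacity; a one-element node slice must be repacked with its
// sliceward neighbor, yielding between 3 and N+1 elements.
const int N = 6;

// Return how many new nodes the repacked elements occupy:
// 0 = no repacking (both nodes reused), 1 or 2 = repacked.
int sliceRepack(int sliceSize, int neighborSize) {
    int total = sliceSize + neighborSize;
    if (sliceSize == 1)
        return (total <= N) ? 1 : 2;  // forced repack: 3..N+1 elements
    if (total <= N)
        return 1;                     // optimization: merge into one node
    return 0;                         // keep both nodes as-is
}
```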
As with concatenation, the slice algorithm must start at the top of the
operand K-tree, descend recursively, and do its reconstruction on the way back
up. However, when slicing, the descending phase must determine, at each level,
which child to descend to, using the starting and ending slice subscripts.
This is done using the same binary-search technique used in subscripting.
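That descent step can be sketched in C++ as a binary search over cumulative child counts. This is a hedged illustration: it assumes each nonleaf element carries a CumChildCt-style running total, and the names are mine, not the author's Modula-3 identifiers.

```cpp
#include <cassert>
#include <vector>

// Find the index of the child containing a given zero-based subscript,
// assuming cumChildCt[i] holds the cumulative element count through
// child i (the article's CumChildCt field).
int findChild(const std::vector<long> &cumChildCt, long subscript) {
    int lo = 0, hi = (int)cumChildCt.size() - 1;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        // Seek the first child whose cumulative count exceeds the subscript.
        if (subscript < cumChildCt[mid])
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}
```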
When computing a wide-enough slice at low-enough levels, the left and right
cuts are separated by other K-tree nodes, which will be reused in the result
K-tree. Whenever at least two nodes separate those containing the two cuts,
each side of the slice can be constructed independently.
At higher levels, the two cuts must be handled interdependently whenever they
are spread over three or fewer nodes, since a sliceward node adjacent to a cut
is involved. In these cases, the new elements will fit in at most three new
nodes. When only one element exists, no new node is constructed. Instead, the
single pointer is passed up, eventually to become the result K-tree, which
will be lower than the operand K-tree.
Figure 5 gives an example of slice construction, showing only relevant nodes
along the cuts. The notation is the same as in Figure 4, except that wavy
lines through nodes are used to show the location of the left and right cuts.
At height one, the two cuts are independent. On the left, the node slice of
the leftmost node shown in full contains one element whose value is 20. This
is repacked with the three elements 19, 4, and 25 of the next rightward node.
On the right, the entire node containing element values 16 and 10 is reused.
At height two, three nodes are involved in the slice. At the left end, two
elements have been replaced by one new element, returned from below. All other
elements involved are reused. This gives a total of seven elements, which are
repacked into two new nodes.
At height three, only two nodes are involved. The two new pointers returned
from the level below are packed into a single new node.
Finally, at height four, only one node is involved. Two of its elements are
replaced by one new node. Since one pointer does not require a node at this
level, it becomes the entire result K-tree. 


The Implementation


The source-code implementation of K-trees (along with programmer notes) is
available electronically; see "Availability" on page 3. I implemented K-trees
in Modula-3 for a couple of reasons. First, its type system is rich enough to
handle the K-tree data structure without resorting to type-unsafe tactics.
Second, it has built-in garbage collection.
I tested the K-tree program with many randomly generated cases. If you run the
test program, be aware that the large number of trees and the brute-force
verification make it a memory and CPU hog. You might want to reduce
SingletonCt, CatCt, CatOfSliceCt, and StoreCt for a more modest test run.
Figure 1 Notation for K-tree nodes.
Figure 2 K-trees and their sequences; (a) 13; (b) 7, 25, 19, 47, 5; (c) 7, 25,
19, 47, 5; (d) 16, 0, 15, 23, 6, 14, 11, 7, 3, 19, 29.
Figure 3 New K-trees after assignment of value 8 to element number 4.
Figure 4 (a) Before concatenation; (b) after concatenation; N=6.
Figure 5 (a) Before slicing; (b) after slicing.
























September, 1994
Extending REXX with C++


Combining REXX's ease of use with the power and flexibility of C++




Art Sulger


Art specializes in database administration, analysis, and programming for the
State of New York. He can be contacted on CompuServe at 71020,435.


Windows programmers are accustomed to using C or C++ to extend visual development
tools. OS/2 developers haven't always had it so easy, however. It's only
recently that a number of OS/2-targeted visual development tools have come
onto the scene, most of which use OS/2's built-in REXX interpreter (see "A
Multicolumn List-Box Container for OS/2," DDJ, May 1994). With tools such as
Gpf's GpfRexx, HockWare's VisPro/REXX, and Watcom's VX-REXX, REXX can call
routines written in other languages--as long as those languages can create
DLLs.
I recently wrote a number of database routines in C++ for OS/2 Presentation
Manager applications. I wanted to use visual development tools to build the
user interface without recoding the database routines. Consequently, I
extended my C++ classes into REXX-callable external functions. To illustrate
this process, I'll develop, test, and debug a C++ class that displays text
files. I'll then write an application using one of the visual Presentation
Manager tools linked to the C++ class. 


External REXX Functions


REXX allows programs to be written in a clear and structured way. External
functions that REXX can call, however, must follow a rigid prototype; see
Example 1. You do not pass these parameters directly; they are built
by the REXX interpreter. The only parameters you can even indirectly control
are argc and argv, which are built from the function arguments in your REXX
program--just as main(int argc, char ** argv) in C is built by the
command-line arguments.
REXX has no built-in data typing. The type is inferred from the context, so
variables are passed as character strings. Unlike C strings, these are not
null terminated, so the structure includes the length as well as a pointer to
the value. Example 2 is the structure from the REXXSAA.H file (supplied with
the OS/2 Developer's Toolkit; you must include this file in your C module).
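Because the length travels with the pointer, a small helper that copies exactly strlength bytes keeps C library functions safe to use on the value. This is a sketch of my own, not part of the REXX API; the structure is reproduced here only so the fragment is self-contained (the real definition lives in REXXSAA.H).

```cpp
#include <cstring>
#include <string>

// RXSTRING as described in the text: a length plus a pointer to a
// string that is not guaranteed to be null terminated. (PCH in the
// toolkit headers is a plain char pointer.)
typedef struct _RXSTRING {
    unsigned long strlength;  /* length of string  */
    char         *strptr;     /* pointer to string */
} RXSTRING;

// Copy exactly strlength bytes into a std::string so standard string
// functions can be applied without running past the REXX buffer.
// (Helper name is illustrative, not part of the REXX API.)
std::string rxToString(const RXSTRING &rx) {
    if (rx.strptr == nullptr)
        return std::string();
    return std::string(rx.strptr, rx.strlength);
}
```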
For instance, suppose you included the commands in Example 3(a) in a REXX
program. The C function that receives the command would look like Example
3(b). Notice that the fname argument is the argv[0].strptr parameter of the
x->Open() call.
RXSTRING is also the structure you use to build the character string to return
to the REXX command file. Interestingly, the C module is really returning
values to two different processes--your REXX command file and the system's
REXX interpreter. You notify the interpreter with the return statement, and it
expects either 0 (VALID_ROUTINE) or 40 (INVALID_ROUTINE). An INVALID_ROUTINE
halts program execution. 
You supply the other return value at the address of the last *retstr argument
passed into your function by the REXX interpreter. The value you pass back
from your function is assigned to the special REXX variable RESULT; you can
also assign it to a variable in your REXX program. In Example 4, say emits the
value you have put into the retstr parameter in your FileDBNext function.
Similarly, the string you build in FileDBError is interpreted as a numeric
return code in the do while phrase. With the interpreter passing memory
addresses from your REXX program to your C functions, you may be wondering,
who owns this memory?
The return value (retstr) is allocated by REXX. The REXX interpreter allocates
256 bytes for return values when the process starts. This memory is owned by
the process and is accessible by both your REXX program and C functions. In
addition, any memory you allocate remains available to the process. The usual
scoping rules apply, with the presumption that the REXX program and C
functions are treated as a single module.
The return value that REXX allocates is slightly different from the memory
allocations that you make. You don't have to worry about freeing it, but you
do have to worry about not having enough. For instance, what happens if the
value you want to return is greater than 256 bytes? After all, REXX imposes no
limit on the length of the returned string. The function CopyResult (Listing
One) checks to see if there is enough storage, and, if not, allocates more.
CopyResult() uses DosAllocMem(), which handles memory in 4K chunks, so chances
are you won't have to do too many allocations. This memory stays around until
the process ends, or until the REXX program encounters a return statement that
has no expression attached to it, in which case REXX drops (uninitializes) it.


Using C++


Except for the way C++ mangles names, it is relatively straightforward to
extend these concepts to C++. The REXXSAA.H file should wrap the declarations
in extern "C" directives (although some versions of the header omit them). If
you get compile warnings about undefined pragmas or link errors listing
unresolved externals, chances are the header file is obsolete. You can
download the correct file (REXXSAA.ZIP) from CompuServe (GO IBMDF1). 
Listing Two is a C++ class for reading text files. It opens files, reads lines
sequentially, and performs some simple error checking. The class is a
scaled-down version of the database class that I designed to use with the new
REXX tools.
REXX development tools need only interface to the public methods of your C++
class. In the FileDB class these consist of: two constructors, of which one
immediately opens a file and the other instantiates the class without a
particular filename; a destructor, which simply calls the Close() method;
Next(), which returns character strings from the file; and Error(), which
returns the class status.
Constructors should go in one function and destructors in another. Your REXX
procedures will then follow the scenario in Example 5. The constructor
function will initialize the class (clss*x=new clss), and the destructor
function will invoke the class destructor (delete x).
There are two methods of mapping the external function(s) to the public class
members: You can write a separate function for each public method, and the
REXX calls will look like Example 6(a); or, you can put all of the interface
code in a single function, and the same REXX program will look like Example
6(b). Listing One uses both methods. 


Telling REXX About New Functions


You have to notify the REXX environment about any new external functions
written in a language other than REXX. One approach is the built-in function
RXFuncAdd, which you must use for at least one of your external functions; see
Example 7(a). This loads the external routine Func1, located in MYDLL.DLL, and
tells REXX that it will be known by the name Func1. Another approach is for
the C program to register external functions. Example 7(b) from RXFILEDB.CPP
registers all the functions in a previously defined data structure.
The usual procedure is to build a LoadFunctions procedure in your DLL that
will handle the bulk of the registrations. The REXX program need only register
this single function, then call it. Once registered, the external functions
are visible to all processes until OS/2 shuts down or the functions are
deregistered. Usually you won't deregister functions, and often you may want
to write a procedure that will automatically register your functions at
startup.
The code does the registration in the body of the pseudoconstructor. This is
acceptable to OS/2; there is no penalty for duplicate registrations.


Compiling, Linking, and Testing


Compiling and linking are straightforward if you know how to code an OS/2 DLL.
You will need to link to the OS/2 Developer's Toolkit with the appropriate
headers and the REXX.LIB file. These are supplied by the compiler vendor as
part of the OS/2 support files. If you are using fopen() and similar
functions, you should link to the multithreaded C run-time support libraries.
Listing Three shows the compile and link commands for IBM C Set++ and Watcom.
During development, you may get an error while the DLL is being linked because
the DLL you are testing may still be owned by the operating system. Make sure
you end the process in which you are testing the DLL by exiting the window in
which the testing is being done.
If you want to use your library from more than one application concurrently,
direct the linker to provide a data segment for every process that invokes it.
If your application uses any C run-time functions, tell the linker to
initialize the library at each invocation. This is shown in the linker
definition files in Listing Three for both IBM C Set++ and Watcom.
Testing DLLs can be a problem. You will first want to make sure your class
works correctly. I've included test.cpp (see Listing Four) for just this
purpose.
Once you are sure the C++ class works, you should test the REXX callable parts
by writing a command file--a text file containing REXX commands. Command files
must start with a C-style comment, and clauses are delimited by semicolons or
line ends, unless the last character in a line is a comma (the continuation
character). Your test program will make calls to the external functions which
you have built. A call is made using the CALL keyword, or simply by writing
the function. In the latter case, the interpreter will execute the function, then
attempt to execute the result.
The RXSTRING that you build in the RexxFunctionHandler to return a value is
placed in a special variable, RESULT, which is set automatically during
execution of a REXX program. The clause in Example 8(a) is equivalent to
Example 8(b), which in turn is functionally identical to Example 8(c).
The sample program T.CMD (Listing Five) prompts for a filename, then lists it.
The program starts with a menu to test concurrent invocations of the DLL. The
menu allows you to select a file in more than one window, then start the
listings almost simultaneously.

If the command file works correctly, then you are ready to use the DLL. If, on
the other hand, you run into a bug, you can source-level debug DLLs called by
REXX by debugging the REXX interpreter (cmd.exe). To do so with the C Set++ PM
debugger, choose the Breakpoints/Load Occurrence option from the Disassembly
window and type in the name of your DLL. The command for loading the debugger
initially is IPMD cmd /K x:\dir\MyREXX.cmd. 


A VX-REXX Application


Figure 1 shows two VX-REXX applications and one REXX command file (T.CMD), all
using the RXFileDB DLL. The code generated is based on the screen design. The
three applications are displaying three different listings. Listing Six
provides only the additional code needed for the VX-REXX application. The
FileDBStart pseudoconstructor call is in the Initialization routine of the
REXX source, and the destructor is in the Quit routine. The File dialog in the
Menu routine calls the remaining functions.
The ability to extend the new REXX development tools can provide a major boost
in productivity. As you can see here, you can extend these tools even further
using your existing library of C++ classes.
Figure 1 Two VX-REXX applications and one REXX command file (T.CMD).
Example 1: Prototype for C function callable by REXX.
ULONG _System // The _System directive inhibits C++ name mangling
FileDBLoadFuncs(
CHAR *name, // The name of this function.
ULONG argc, // Number of arguments, as in main(int argc,...
RXSTRING argv[], // Arguments, as in main(int argc,char *argv[])
CHAR *q_name, // Current Rexx queue name
RXSTRING *retstr)// The value that this function will return
Example 2: REXXSAA.H structure.
typedef struct _RXSTRING { /* rxstr */
 ULONG strlength; /* length of string */
 PCH strptr; /* pointer to string */
} RXSTRING;
Example 3: (a) Sample REXX program; (b) the C function called by the REXX
program in (a).
(a) pull fname
 say FileDBOpen(fname)

(b) ULONG _System
 FileDBOpen(CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr)
 {

 char tmp[5] ;
 if( argc != 1 )
 return(INVALID_ROUTINE);
 if (x)
 {
 x->Open(argv[0].strptr) ; // argv[0].strptr == REXX fname
 strcpy(retstr->strptr, itoa(x->Error(), tmp, 10)) ;
 retstr->strlength = strlen(retstr->strptr) ;
 }
 else
 . . .
 return(VALID_ROUTINE);
 }
Example 4: Assigning a value to a variable in your REXX program.
do while FileDBError() = 0
say FileDBNext()
 . . .
Example 5: Constructors should go in one function and destructors in another.
/* typical REXX call to C++ */
call FileDBStart /* the constructor */
do until FileDBError() <> 0
 say FileDBNext()
end
call FileDBFinish /* the destructor */
Example 6: (a)Writing a separate function for each public method to map the
external function to the public class members; (b) putting interface code in a
single function to map the external function.
(a) call FileDBOpen("filename")
 if FileDBError() = 0 then do
 say FileDBNext()

 end

(b) call MyREXXDLL("Open", "filename")
 if MyREXXDLL("Error") = 0 then do
 say MyREXXDLL("Next")
 end
Example 7: (a) Using a built-in function to notify the REXX environment about
a new external function; (b) using a C program to register external functions.
(a) call RXFuncAdd 'Func1',
 'MYDLL', 'Func1'

(b) for( j = 0; j < entries; ++j )
 RexxRegisterFunctionDll(
 RxFncTable[ j ].rxName,
 DLLNAME,
 RxFncTable[ j ].cName );
Example 8: Returning a value. The clause in (a) is equivalent to (b) which, in
turn, is functionally identical to (c).
(a) say FileDBNext()

(b) call FileDBNext
 say Result

(c) result = FileDBNext()
 say result

Listing One 

/* RXFileDB.cpp */
#include "FileDB.hpp"
#define INCL_DOS
#define INCL_NOMAPI
#include <os2.h>
#define INCL_RXFUNC
#include <rexxsaa.h>
#include <stdlib.h>
#include <stdio.h>
// Change this if you're changing the DLL name...
#define DLLNAME "RXFILEDB"
#define INVALID_ROUTINE 40 // return Rexx error
#define VALID_ROUTINE 0
// Some useful macros:
#define SetNullRXString(dest) {*dest->strptr = '\0' ; \
 dest->strlength = 0 ;}
#define BUILDRXSTRING(t, s) {strcpy((t)->strptr,(s));\
 (t)->strlength = strlen((s));}
// The C++ interface is done by declaring a global class instance and then 
// using members of this global class in our RexxFunctionHandler functions.
// The class can be declared, or a pointer can be declared and then allocated
// within a RexxFunctionHandler function. We will use the latter approach.
// Either of these declarations will work, with the appropriate adjustments
// in the code that references them:
//FileDB x ; // Don't new and delete these.
// Reference members with dot(.) operator.
FileDB * x ; // Must new and delete.
// Refer to members with arrow(->) operator.
// Then declare the functions that access the public members of
// this class. Remember to export these!
//========================================================================
RexxFunctionHandler FileDBLoadFuncs ; // Load the other functions.
RexxFunctionHandler FileDBDropFuncs ; // Drop all these functions; why bother?

RexxFunctionHandler FileDBStart ; // Our class constructor.
RexxFunctionHandler FileDBFinish ; // Our class destructor.
RexxFunctionHandler FileDBClose ; // "call FileDBClose()"
RexxFunctionHandler FileDBError ; // "do while FileDBError() = 0"
RexxFunctionHandler FileDBNext ; // "say FileDBNext()"
RexxFunctionHandler FileDBOpen ; // "say FileDBOpen('fname')"
RexxFunctionHandler FileDBAPI ; // A single function API
// Define the table that lists REXX function names and the corresponding
// DLL entry point. You must change this table whenever you add/remove
// a function or entry point.
typedef struct {
 PSZ rxName;
 PSZ cName;
} fncEntry, *fncEntryPtr;
static fncEntry RxFncTable[] =
 {
 /* function */ /* entry point (may not match function name) */
 { "FileDBLoadFuncs", "FileDBLoadFuncs" },
 { "FileDBDropFuncs", "FileDBDropFuncs" },
 { "FileDBStart", "FileDBStart" },
 { "FileDBFinish", "FileDBFinish" },
 { "FileDBClose", "FileDBClose" },
 { "FileDBError", "FileDBError" },
 { "FileDBNext", "FileDBNext" },
 { "FileDBOpen", "FileDBOpen" },
 { "FileDBAPI", "FileDBAPI" },
 };
// This function builds strings to return to REXX :
void CopyResult(PRXSTRING dest, const char * src) ;
//------------------------------------------------------------------------
// FileDBLoadFuncs -- Register all the functions with REXX.
ULONG _System
FileDBLoadFuncs(CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr)
 {
 int entries;
 int j;
 if (argc != 0)
 return(INVALID_ROUTINE);
 entries = sizeof( RxFncTable ) / sizeof( fncEntry );
 for( j = 0; j < entries; ++j )
 RexxRegisterFunctionDll( RxFncTable[ j ].rxName, DLLNAME,
 RxFncTable[ j ].cName );
 SetNullRXString(retstr)
 return (VALID_ROUTINE);
 }
//------------------------------------------------------------------------
ULONG _System
FileDBDropFuncs( CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr )
 {
 int entries;
 int j;
 if( argc != 0 )
 return( INVALID_ROUTINE );
 entries = sizeof( RxFncTable ) / sizeof( fncEntry );
 for( j = 0; j < entries; ++j )
 RexxDeregisterFunction(RxFncTable[ j ].rxName);
 SetNullRXString(retstr)

 return(VALID_ROUTINE);
 }
//---------------------------------------------------------------------------
ULONG _System
FileDBStart( CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr )
 {
 if(argc == 1)
 {
 x = new FileDB(argv[0].strptr) ;
 CopyResult(retstr, argv[0].strptr) ;
 }
 else
 if(argc == 0)
 {
 x = new FileDB() ;
 BUILDRXSTRING(retstr, "Okay")
 }
 else
 return (INVALID_ROUTINE) ;
 return(VALID_ROUTINE);
 }
//------------------------------------------------------------------------
ULONG _System
FileDBFinish( CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr )
 {
 if( argc != 0 )
 return( INVALID_ROUTINE );
 if (x)
 delete x ;
 SetNullRXString(retstr)
 return(VALID_ROUTINE);
 }
//----------------------------------------------------------------------
ULONG _System
FileDBClose( CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr )
 {
 char tmp[5] ;
 if( argc != 0)
 return(INVALID_ROUTINE);
 if (x)
 {
 x->Close() ;
 BUILDRXSTRING(retstr, itoa(x->Error(), tmp, 10))
 }
 else
 BUILDRXSTRING(retstr, "No Object")
 return(VALID_ROUTINE);
 }
//----------------------------------------------------------------------
ULONG _System
FileDBError( CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr )
 {
 char tmp[5] ;
 if( argc != 0 )
 return( INVALID_ROUTINE );

 if (x)
 BUILDRXSTRING(retstr, itoa(x->Error(), tmp, 10))
 else
 BUILDRXSTRING(retstr, "No Object")
 return(VALID_ROUTINE);
 }
//----------------------------------------------------------------------
ULONG _System
FileDBNext( CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr )
 {
 if (argc != 0)
 return( INVALID_ROUTINE );
 if (x)
 {
 CopyResult(retstr, x->Next()) ;
 if (x->Error() != 0)
 BUILDRXSTRING(retstr, " ")
 }
 else
 BUILDRXSTRING(retstr, "No Object")
 return(VALID_ROUTINE);
 }
//----------------------------------------------------------------------
// Open returns a number, which we convert to a character string for REXX.
ULONG _System
FileDBOpen( CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr )
 {
 char tmp[5] ;
 if( argc != 1 )
 return(INVALID_ROUTINE);
 if (x)
 {
 x->Open(argv[0].strptr) ;
 BUILDRXSTRING(retstr, itoa(x->Error(), tmp, 10))
 }
 else
 BUILDRXSTRING(retstr, "No Object")
 return(VALID_ROUTINE);
 }
//----------------------------------------------------------------------
/* This is the 'all-in-one' function. REXX programs will typically
 call this function as follows: 
 call FileDBAPI("Open", "FileName")
 if FileDBAPI("Error") = 0 then do
 say FileDBAPI("Next") .... */
ULONG _System
FileDBAPI( CHAR *name, ULONG argc, RXSTRING argv[],
 CHAR *q_name, RXSTRING *retstr )
 {
 char tmp[5] ;
 if (argc == 0) // There must be at least one arg, the name of the call.
 return( INVALID_ROUTINE );
 if (!x)
 {
 BUILDRXSTRING(retstr, "No Object")
 return(VALID_ROUTINE);
 }

 if (stricmp("Close", argv[0].strptr) == 0)
 {
 if (argc != 1)
 return( INVALID_ROUTINE );
 x->Close() ;
 CopyResult(retstr, itoa(x->Error(), tmp, 10)) ;
 }
 else
 if (stricmp("Error", argv[0].strptr) == 0)
 {
 if (argc != 1)
 return( INVALID_ROUTINE );
 CopyResult(retstr, itoa(x->Error(), tmp, 10)) ;
 }
 else
 if (stricmp("Next", argv[0].strptr) == 0)
 {
 if (argc != 1)
 return( INVALID_ROUTINE );
 CopyResult(retstr, x->Next()) ;
 }
 else
 if (stricmp("Open", argv[0].strptr) == 0)
 {
 if (argc != 2)
 return( INVALID_ROUTINE );
 x->Open(argv[1].strptr) ;
 CopyResult(retstr, itoa(x->Error(), tmp, 10)) ;
 }
 else
 CopyResult(retstr, "Invalid First Argument") ;
 return (VALID_ROUTINE) ;
 }
//------------------------------------------------------------------------
/* CopyResult -- Copies a string into a result, allocating space for it if
 necessary. If you pass it an RXSTRING with a non-null buffer and a 
 non-zero length, it will try to copy the data into that buffer. Otherwise
 it uses DosAllocMem to allocate a new one. */
void CopyResult(PRXSTRING dest, const char *src)
 {
 int len = strlen(src) ;
 static void *mem = NULL;
 if( !dest )
 return ;
 if( (!src) && dest->strptr != NULL)
 SetNullRXString(dest)
 else
 if( dest->strptr != NULL && len <= dest->strlength )
 {
 memset(dest->strptr, 0, (size_t)dest->strlength);
 memcpy(dest->strptr, src, len);
 dest->strlength = len;
 }
 else
 {
 // The buffer is too small, so allocate a new one
 SetNullRXString(dest)
 if (DosAllocMem(&mem, len + 1, PAG_COMMIT | PAG_WRITE | PAG_READ))
 return ;

 dest->strptr = (char *)mem;
 dest->strlength = len;
 //memset(dest->strptr, 0, len + 1);
 *(dest->strptr + len) = '\0' ;
 memcpy(dest->strptr, src, len);
 }
 }



Listing Two

/* This C++ class can be dynamically allocated and called from REXX. It reads
 text files. There are public members that return integers and others that
 return character pointers. The class does file manipulation using 
 ordinary C runtime functions. */
#ifndef FILEDB_HPP
#define FILEDB_HPP
#include <stdio.h>
#include <string.h>
#define MAXROWSIZE 200
class FileDB
 {
 private:
 FILE * fp ;
 int error ;
 char * bf ;
 long pos ;
 const char * Get()
 {
 if (fgets(bf, MAXROWSIZE, fp) != NULL)
 {
 error = 0 ;
 if (strchr(bf, 0x0d))
 memset(strchr(bf, 0x0d), '\0', 1) ;
 else
 if (strchr(bf, 0x0a))
 memset(strchr(bf, 0x0a), '\0', 1) ;
 if (!strchr(bf, 0x1a))
 return bf ;
 }
 //else
 fseek(fp, 0L, SEEK_SET) ;
 error = -2 ;
 *bf = '\0' ;
 return bf ;
 }
 public:
 FileDB(){fp=NULL;bf=NULL;error=0;}
 FileDB(char * fname){fp=NULL;bf=NULL;Open(fname);}
 ~FileDB(){Close();}
 void Close()
 {
 if(fp)
 {
 fclose(fp);
 fp=(FILE *)NULL;
 }
 if (bf)

 delete [] bf ;
 }
 int Error(){return error;}
 const char * Next(){return Get();}
 void Open(char *fname)
 {
 Close();
 bf = new char [MAXROWSIZE] ;
 if ((fp = fopen(fname, "r")) == NULL)
 error = -1 ;
 else
 error = 0 ;
 fseek(fp, 0L, SEEK_SET) ;
 }
 } ;
#endif // FILEDB_HPP



Listing Three

/* commands for compiling and linking for IBM C-Set++ */
ICC.EXE /Ge- /Gm+ /C .\$*.cpp
 /B" /de /noe /m"
 /Fe"RXFILEDB.DLL" REXX.LIB RXFILEDB.DEF

;RXFileDB.DEF Used for Link step for IBM C-Set++
LIBRARY RXFILEDB INITINSTANCE TERMINSTANCE
DESCRIPTION 'OS/2 Test Dynamic Link Library (c)AFS 1994'
DATA MULTIPLE NONSHARED READWRITE LOADONCALL
CODE LOADONCALL
EXPORTS
 FileDBLoadFuncs
 FileDBDropFuncs
 FileDBStart
 FileDBFinish
 FileDBAPI
 FileDBClose
 FileDBError
 FileDBNext
 FileDBOpen

/* commands for compiling and linking for Watcom */
call wpp386 /bd /bm /d2 rxfiledb.cpp
call wlink @rxfiledb.lnk

;RXFileDB.LNK Used for Link step for Watcom

system os2v2 dll initinstance terminstance
option manyautodata
debug all
option symfile
export FileDBLoadFuncs
export FileDBDropFuncs
export FileDBStart
export FileDBFinish
export FileDBAPI
export FileDBClose
export FileDBError

export FileDBNext
export FileDBOpen
library rexx
file rxfiledb



Listing Four

/* C program to test the FileDB class. Pass in the name of a file to list. */
#include "FileDB.hpp"
#include <iostream.h>
int main(int argc, char * argv[])
 {
 FileDB * f ;
 if (argc != 2)
 return -1 ;
 f = new FileDB(argv[1]) ;
 while (0 == f->Error())
 cout << f->Next() ;
 return 0 ;
 }



Listing Five

/* t.cmd -- Functions with no arguments= call func or rc=func() */
say "Registering the RXFILEDB functions. If the program halts when trying"
say "to call RXFileDB, make sure the RxFileDB.DLL file is in your LIBPATH."
say ""
call RXFuncAdd 'FileDBLoadFuncs', 'RXFILEDB', 'FileDBLoadFuncs'
say "Load Funcs added"
call FileDBLoadFuncs
say "Load Funcs finished"
GotAFile = 0
call FileDBStart
do forever
 say "Main Menu "
 say "1. Choose a File"
 say "2. List a File"
 say "0. Quit"
 say ""
 say "Make a Selection"
 pull input
 select
 when input = 1 then call Chooser
 when input = 2 then call Lister
 when input = 0 then leave
 otherwise
 say "Make a Selection from 1 to 2"
 end
end
say "Left"
call FileDBFinish
say "Finish"
/*call FileDBDropFuncs*/
return
CHOOSER:

GotAFile = 0
say "What is the name of the file? Please include the extension."
say "Example: TEST.EXT"
pull fname
say FileDBAPI("Open", fname)
if FileDBAPI("Error") = 0 then do
 GotAFile = 1
end ;
return
LISTER:
if GotAFile = 0 then do
 say "Choose a file first"
 return
end ;
do until FileDBError() <> 0
 say FileDBNext()
end
say "Fname = " fname
say FileDBOpen(fname) /* Reset to top of the file */
return



Listing Six

/*:VRX File_Click */
File_Click:
 filespec = VRFileDialog( VRWindow(), "File to open?", "O", "*.*" )
 if filespec = "" then do
 return
 end
 call VRMethod("List001", "Clear")
 x = FileDBOpen(filespec)
 do while FileDBError() = 0
 f_line = FileDBNext()
 if FileDBError() = 0 then do
 call VRMethod("List001", "AddString", f_line)
 end
 end
return
/*:VRX Init */
Init:
 window = VRWindow()
 call VRMethod window, "CenterWindow"
 call VRSet window, "Visible", 1
 call VRMethod window, "Activate"
 call RXFuncAdd 'FileDBLoadFuncs', 'RXFILEDB', 'FileDBLoadFuncs'
 call FileDBLoadFuncs
 call FileDBStart
 drop window
return
/*:VRX Quit */
Quit:
 window = VRWindow()
 call VRSet window, "Shutdown", 1
 drop window
 call FileDBFinish
return
































































September, 1994
Inside the RIFF Specification


Designed with multimedia in mind




Hamish Hubbard


Hamish is a computer-science student at Canterbury University in New Zealand.
He can be reached at hamish@kcbbs.gen.nz.


The Resource Interchange File Format (RIFF) is a tagged-file specification
designed for the storage of multimedia data. Data types ranging from C++
objects to full-motion video can be stored based on the RIFF specification,
and new data types can be added; see Table 1.
RIFF provides a standard storage method for different types of multimedia
data. Applications can ignore types of data in a RIFF file that they can't
process, preventing software from becoming obsolete because of the
introduction of a new variation of a data type. The specification's only major
limitation is that, in its current version, the data area of a RIFF file may
not exceed four gigabytes. Given the state of the art of most PCs, this
limitation isn't serious, but it may become so in a future that promises
giant files such as full-length digital HDTV movies. Already, four
gigabytes holds only a few hours of uncompressed CD-quality audio.
Digitized waveforms recorded or synthesized on PCs are easily handled by the
RIFF specification. Waveform data is one of the most readily manageable
multimedia data types used on PCs. Digitized audio waveforms require far less
bandwidth and CPU power to process than full-motion video, for example.
Consequently, audio is the most widespread type of multimedia data in use.
Wave Viewer, the application I present in this article, reads and writes RIFF
files containing waveform data. Wave Viewer, which uses the Borland
ObjectWindows Library (OWL 2.0) and compiles under Borland C++ 4.0, is a
32-bit application compatible with Win32s, Windows NT, and other 32-bit
versions of Windows. 


RIFF Internals


The basis of the RIFF specification is a data structure known as the "chunk,"
which contains a unique chunk-type identifier, an integer value that holds the
size of the chunk's data area, and the data itself. Example 1(a) is Microsoft's
example of the layout of a chunk, using C syntax. RIFF and LIST chunks have an
extra field at the beginning of their data areas; see Example 1(b). These
examples assume that there is no padding between fields in either structure.
Therefore, the data of any chunk other than a RIFF or LIST chunk starts at an
offset of 8 bytes into the chunk (12 bytes in the case of RIFF and LIST
chunks). Chunks are, however, padded at the end to WORD (16-bit) boundaries.
Chunks contain data such as ASCII text, waveform data, or bitmaps; certain
chunks (currently only RIFF or LIST chunks) may contain nested subchunks. The
data-area size includes the size of these subchunks (if any). By splitting a
file into several variable-length chunks, RIFF allows for greater flexibility
than file formats defined around fixed-length and position fields. 
The first chunk in a RIFF file must be a RIFF chunk with a chunk ID consisting
of the four-character code RIFF. The first chunk may alternatively be a RIFX
chunk--the X indicates that all integers in the file are in Motorola format.
In the present version of the RIFF specification, only one RIFF or RIFX chunk
is allowed per file.
The RIFF chunk contains at least one other chunk, with the number of chunks
varying depending on the form type (file format) of the file and on the number
of optional chunks that are present in the file. These chunks are known as
"subchunks" of the RIFF chunk.
RIFF chunks have a special code at the start of the data area that specifies
the form type--the type of data in the file and its format. A RIFF form is
also defined by:
A list of mandatory chunks (which must be present to make up a valid file of
the aforementioned form type).
A list of optional chunks, some or all of which may be present.
Optionally, an order in which to store some or all of the chunks.


LIST Chunks


LIST chunks are the only chunks apart from RIFF chunks that may contain their
own subchunks (although this may change). LIST chunks are usually subchunks of
RIFF chunks themselves. Like RIFF chunks, LIST chunks have a four-character
code in the first four bytes of their data area. This code specifies the list
type (analogous to a RIFF chunk's form type) of the LIST chunk. 
For example, a LIST chunk of list type INFO may contain subchunks such as INAM
(the name of the data stored in the file) and ICRD (creation date). LIST
chunks of type INFO are optional in current RIFF forms, but their use is
recommended. The LIST chunks' subchunks can store much more information about
the file than is available from the filename and date stamp. These LIST
subchunks share a common format: Each contains one ASCIIZ (NULL terminated)
string. The file ckinf.h (available electronically, see "Availability," page
3) describes Microsoft's recommendations about data storage for some of these
chunks. 
As long as LIST chunks are stored in the correct place (according to the
applicable RIFF form), correctly written applications that cannot process LIST
chunks will ignore their presence. Table 2 describes the layout of a typical
RIFF file containing a LIST chunk.
The Wave Viewer source code contains several chunks that may be stored in a
LIST chunk of list type INFO (see ckinf.h), as well as short and long
descriptions of these chunk types. Some chunk types are commented out because
they are inappropriate for RIFF files of form type WAVE, the only RIFF form
that Wave Viewer processes. The chunk types that may be stored in a LIST chunk
of list type INFO are listed separately from the other chunks. Table 3
describes an example WAVE form file. 
Multiple LIST chunks may be stored in a RIFF file if they have different list
types (and are therefore used to store different types of data). If an
application does not allow editing of LIST (list type INFO) subchunks, it may
treat the LIST chunk as though it contains nothing other than one piece of
data when reading it from disk. If no errors occur during file saving, all of
the subchunks will be preserved. Also, the code required to read a RIFF file
remains relatively uncomplicated if the LIST chunk is treated like any other
unknown chunk type.


The RIFF API


Beginning with Windows 3.0 with Multimedia Extensions, all versions of Windows
have included an API known as the "multimedia file I/O services," which
includes functions, data types, and messages that ease the task of navigating
through, reading, and writing RIFF files of all forms. The functions,
according to Microsoft's documentation, are superior to Windows and ANSI C
file I/O routines in several respects. Specifically, the chunk-navigation
functions decrease the complexity of the code needed to navigate the structure
of a RIFF file. They have minimal CPU overhead compared with going directly to
the Windows/DOS file I/O routines, and their use reduces the size of an
application's executable because the API is part of Windows instead of a
statically linked library. However, Windows' multimedia file handles are
incompatible with ANSI C/Windows file handles.
The basic functions of the API, including mmioOpen(), mmioClose(), mmioRead(),
mmioWrite(), mmioSeek(), and mmioRename(), are fairly self-explanatory and can
perform file operations on any file, although they are geared toward use with
RIFF files. The Win16 versions of these functions can use huge pointers; there
is no 64K limit on the amount of data that may be read or written at one time.
The functions that have no analogue in general-purpose file I/O APIs are
mmioDescend(), mmioAscend(), and mmioCreateChunk(). These functions navigate
through the nested structure of a RIFF file and, in the case of
mmioCreateChunk(), build chunks in a file that is being written. Because these
functions take care of the calculation of addresses, offsets, and chunk sizes,
they simplify application code and help ensure that files are read and written
correctly.


Working with RIFF files


Example 2 demonstrates the basics of reading a RIFF file. First, mmioOpen()
attempts to open the file for reading; then mmioDescend(), used with the
MMIO_FINDRIFF flag, finds and descends into the RIFF chunk. The MMIO_FINDRIFF
flag is necessary because of the form-type code that occupies the beginning
of the RIFF chunk's data area. If mmioDescend() is successful, the current file
position is set to the first byte after the form type four-character code in
the RIFF chunk.
If the code is successful, the current file position is set to an offset of 12
bytes from the beginning of the chunk. This location is the first byte of the
first subchunk of the RIFF chunk. mmioDescend() has filled the RIFFCkInfo
structure with information about the RIFF chunk. If the code fails, then the
file represented by the variable fileName is not a RIFF file of form type
WAVE.
mmioDescend() may also be used to search for a chunk of a particular type by
specifying the MMIO_FINDCHUNK flag. This is useful for finding chunks in a
RIFF file that an application can process. If the chunk is a subchunk (that
is, if it has a parent chunk), then a pointer to MMCKINFO structure previously
filled by mmioDescend() with information about the parent chunk must be
supplied as mmioDescend()'s third parameter. Note that mmioDescend() searches
from the current position in the file, and it may be necessary to use
mmioSeek() to seek back to the beginning of the parent chunk before searching.
If mmioDescend() fails to find a chunk, the current file position becomes
undefined.
However, Wave Viewer takes the approach of reading all of the chunks in a WAVE
form file, whether the chunk types are known or not. When mmioDescend() is
called with no flags, it descends into the next chunk in the RIFF file (if
any) and stores that chunk's information in the MMCKINFO structure passed as
its second parameter.

mmioAscend() is the counterpart of mmioDescend(); it ascends out of a chunk
that has been descended into. It moves the current file position to the first
byte following the chunk that was ascended from (unless there is no more data
in the file). A call to mmioAscend() should be made to leave a chunk after
data has been read from it.


Writing a RIFF file


The function mmioCreateChunk() builds new chunks in an open RIFF file. A
MMCKINFO structure specifies the chunk's attributes and a flag must be set if
a RIFF or LIST chunk is being created (MMIO_CREATERIFF or MMIO_CREATELIST,
respectively). Depending on the circumstances of the call to
mmioCreateChunk(), only certain values in the MMCKINFO structure need to be
set by an application; mmioCreateChunk() fills in several of the values. 
If successful, mmioCreateChunk() moves the current file position to the
beginning of the data area of the new chunk (and after the chunk type for RIFF
or LIST chunks). The contents of the chunk may then be written using
mmioWrite(), or a subchunk may be created with another call to
mmioCreateChunk(). Note that mmioCreateChunk() cannot insert a chunk partway
into an already existing file. If this is attempted, existing data in the file
will be overwritten.
Once the contents of a chunk have been written, mmioAscend() ensures that the
correct chunk-size value is written to the header of the chunk that is
currently being written.


The WAVE Form


The WAVE form, which stores digitized sound in files with an extension of
.WAV, is defined as:
A RIFF chunk of form type WAVE.
An fmt chunk containing the waveform's format.
An optional fact chunk with format-dependent information.
An optional cue-points chunk (identifying various locations within the
waveform data).
An optional associated-data list (a LIST chunk of list type adtl).
A data chunk containing waveform data.
The WAVE form requires the fmt chunk to be present before the data chunk, but
there are no other restrictions on the order of chunks in a WAVE file.
Additional chunks such as ZSTR, DISP, and LIST (type INFO) chunks may also be
present. The fact chunk is only necessary if the waveform is not in PCM
format.
The fmt chunk contains (at least) a WAVEFORMAT structure (defined in
mmsystem.h) that contains waveform format information. Listing One is a code
segment from chunkvw.cpp that shows how to decode this information. Depending
on the wFormatTag member of WAVEFORMAT, there may be additional information in
the fmt chunk following the WAVEFORMAT structure. For example, if the data is
in PCM format (WAVE_FORMAT_PCM), there is a WORD value at the end of the chunk
containing the number of bits per sample. The data chunk contains the waveform
data.


Other Chunk Types


Strings can be stored in any of several chunk types defined in the RIFF
specification. These strings may contain annotations or text not appropriate
for any of the LIST (list type INFO) chunks. The most useful chunk is ZSTR,
which may be used to store an ASCIIZ string. Two other string chunks are BSTR
and WSTR, which contain size prefixes of types BYTE and WORD, respectively.
All these chunk types should be stored as subchunks of a RIFF chunk.
A simple representation of a waveform may be stored in a DISP chunk, the
contents of which may be in any format that the Windows clipboard can display.
For example, a DISP chunk could contain an icon to be displayed if the file is
embedded using OLE. A DISP chunk's data area consists of a DWORD containing a
clipboard-format constant (such as CF_DIB), followed by the data used for the
representation. Usage of the CF_TEXT format is discouraged; it's better to use
a string chunk such as ZSTR instead.


Wave Viewer


Wave Viewer is an application written in C++ that reads and writes WAVE form
files, returns information on some chunk types, and enables editing of certain
chunks that contain text. It is a 32-bit Windows application compatible with
Win32s, and it uses C++ features such as templates, exception handling, the
ANSI string class, and run-time type identification (RTTI). At the heart of
the Wave Viewer program are wvfrmdc.h (Listing Two) and wvfrmdc.cpp (Listing
Three). The complete Wave Viewer program (.H, .CPP, .RC, and the like) is
available electronically; see "Availability," page 3.
The only direct calls to the Windows API made by Wave Viewer are to the
multimedia file I/O services. The rest of the application uses only the
ObjectWindows 2.0 class library and Borland container classes. I used
Borland's AppExpert to generate the outline of the application. Code
demonstrating the use of the multimedia file I/O services is in the
TWaveformDoc class (Listing Three), while code demonstrating the use of WAVE
chunk contents is contained in the TChunkView class (chunkvw.cpp, Listing
One). These two classes make up a document/view pair that is consistent with
the doc/view model in ObjectWindows 2.0. These two classes may be readily
transplanted into other ObjectWindows applications or modified to support RIFF
forms other than WAVE.
Table 1: Form types. Several file formats are based on the RIFF
specification; the form-type code is stored at the beginning of a RIFF
chunk's data area.
 Form Description 
 CPPO APPS Software International
 C++ Object Format
 PAL Palette File Format
 RDIB Device Independent Bitmap
 Format
 RMID MIDI Audio Format
 RMMP Multimedia Movie File Format
 WAVE Waveform Audio Format
Table 2: Example RIFF file. This shows the layout of a simple WAVE form file
as stored on disk. The "fmt" and "data" chunks are subchunks of the RIFF
chunk.
 Data type Description 
 FOURCC Chunk type (for example, "RIFF")
 DWORD Chunk size
 FOURCC Form type (for example, "WAVE")
 FOURCC Chunk type (for example, "fmt")
 DWORD Chunk size
 BYTE[Chunk size] Chunk contents (for example, waveform format)
 FOURCC Chunk type (for example, "data")
 DWORD Chunk size
 BYTE[Chunk size] Chunk contents (for example, waveform data)
Table 3: An example WAVE form file. A RIFF file containing these chunks would
hold a digitized waveform, and copyright and filename information. Any
application unable to process the LIST chunk could safely ignore it.

Chunk type Contents Optional
RIFF (WAVE) All other chunks in the file, No
 according to the WAVE form
fmt Waveform-format information No
data Waveform data No
LIST (INFO) All descriptive chunks (ICOP, INAM in this example) Yes
ICOP Copyright information (ASCIIZ string) Yes
INAM The name of the waveform (ASCIIZ string) Yes
Example 1: (a) Microsoft's example of the chunk layout using C syntax; (b)
RIFF and LIST chunks have an extra field at the beginning of their data area.
(a)
typedef unsigned long DWORD;
typedef unsigned char BYTE;
typedef DWORD FOURCC; // Four-character code
typedef struct {
 FOURCC ckID; // The unique chunk identifier
 DWORD ckSize; // The size of field <ckData>
 BYTE ckData[ckSize]; // The actual data of the chunk
} CK;

(b)
typedef struct {
 FOURCC ckID;
 DWORD ckSize;
 union {
 FOURCC fccType; // RIFF form type
 BYTE ckData[ckSize];
 } ckData;
} RIFFCK;
Example 2: Reading a RIFF file.
HMMIO HWaveFile;
MMCKINFO RIFFCkInfo;
HWaveFile = mmioOpen(fileName, 0, MMIO_READ);
RIFFCkInfo.fccType = mmioFOURCC('W','A','V','E');
mmioDescend(HWaveFile, &RIFFCkInfo, 0, MMIO_FINDRIFF);

Listing One 

/* Project WaveView Copyright 1994. All Rights Reserved.
 SUBSYSTEM: waveview.exe 
 FILE: excerpted from chunkvw.cpp AUTHOR: Hamish Hubbard
 OVERVIEW: Source file for implementation of TChunkView (TListView).
*/

#include <owl\owlpch.h>
#include <owl\inputdia.h>
#include "riffsup.h"
#pragma hdrstop

#include "wvfrmdc.h"
#include "chunkvw.h"
#include "txtcked.h"
#include "ckinf.h"
 .
 .
 .
void TChunkView::CmEditItem ()
{
 RIFFCkArray& RIFFChunks = dynamic_cast<TWaveformDoc
*>(Doc)->GetRIFFCkArray();
 int index = GetSelIndex();

 int arrayIndex = 0;
 CkInfo ckInfo;
 GETCKINFO(RIFFChunks[index], &ckInfo);
 // Determine whether the chunk selected by the user is a text chunk
 // and if so, allow user to edit it via the Text-chunk Editor dialog.
 // Otherwise, display information about 'fmt ', 'data', etc. chunks.
 if (ckInfo.ckID == mmioFOURCC('f', 'm', 't', ' ')) {
 // Display format-information.
 string fmtStr;
 WAVEFORMAT *waveFormat = (WAVEFORMAT *)(RIFFChunks[index] + sizeof(CkInfo));
 char numStr[100];
 // WAVE_FORMAT_PCM (0x0001) is the only format defined in the regular
 // mmsystem.h, but there are several others defined by MS.
 fmtStr += "Format type # : ";
 fmtStr += itoa(waveFormat->wFormatTag, numStr, 10);
 fmtStr += "\nNumber of channels (1 = mono, 2 = stereo) : ";
 fmtStr += itoa(waveFormat->nChannels, numStr, 10);
 fmtStr += "\nSamples per second : ";
 fmtStr += ultoa(waveFormat->nSamplesPerSec, numStr, 10);
 fmtStr += "\nAverage # bytes per second : ";
 fmtStr += ultoa(waveFormat->nAvgBytesPerSec, numStr, 10);
 fmtStr += "\nBlock alignment (minimum unit of data) : ";
 fmtStr += ultoa(waveFormat->nBlockAlign, numStr, 10);
 // If the data is in PCM format then there is extra information
 // to be extracted from the FMT structure.
 if (waveFormat->wFormatTag == WAVE_FORMAT_PCM) {
 fmtStr += "\nBits per sample : ";
 fmtStr += itoa( ((PCMWAVEFORMAT *)waveFormat)->
 wBitsPerSample, numStr, 10);
 }
 MessageBox(fmtStr.c_str(), "'fmt ' Chunk Information");
 return;
 }
 // Display basic information about a 'data' chunk.
 if (ckInfo.ckID == mmioFOURCC('d', 'a', 't', 'a')) {
 char sizeStr[100];
 string dataStr;
 dataStr = "Size of data chunk : ";
 dataStr += itoa(ckInfo.ckSize, sizeStr, 10);
 dataStr += " bytes";
 MessageBox(dataStr.c_str(), "'data' Chunk Information");
 return;
 }
 while (CkDesc[arrayIndex].FOURCCStr != 0) {
 if (ckInfo.ckID==mmioStringToFOURCC(CkDesc[arrayIndex].FOURCCStr,0)) {
 string ckStr = (RIFFChunks[index] + sizeof (CkInfo));
 string desc = CkDesc[arrayIndex].longDesc;
 TextCkEditDlg(this, ckStr, desc).Execute();
 // Set chunk-size information for edited chunk 
 // by getting length of text string and adding 1 to 
 // account for necessary null terminator.
 ckInfo.ckSize = ckStr.length() + 1;
 char *buffer;
 try {
 buffer = new char[ckInfo.ckSize + sizeof (CkInfo)];
 }
 catch (xalloc) {
 MessageBox("Error: Out of memory.");
 return;

 }
 memcpy(buffer + sizeof(CkInfo), 
 ckStr.c_str(), ckInfo.ckSize);
 SETCKINFO(buffer, &ckInfo);
 RIFFChunks.Destroy(index);
 RIFFChunks.AddAt(buffer, index);
 // Assume that text of this chunk has been modified.
 dynamic_cast<TWaveformDoc *>(Doc)->SetDirtyFlag(TRUE);
 return;
 }
 arrayIndex++;
 }
}
 .
 .
 .



Listing Two

#if !defined(__wvfrmdc_h) // Sentry, use only if it's not already included.
#define __wvfrmdc_h

/* Project WaveView -- Copyright 1994. All Rights Reserved.
 SUBSYSTEM: waveview.exe 
 FILE: wvfrmdc.h -- AUTHOR: Hamish Hubbard
 OVERVIEW: Class definition for TWaveformDoc (TFileDocument).
*/

#include <owl\owlpch.h>
#include <owl\docview.h>
#include <owl\filedoc.h>
#include <cstring.h>
#include "chunkds.h" // RIFFCkArray (TArrayAsVector) type, etc.
#pragma hdrstop

#include "wvapp.rh" // Definition of all resources.

class TWaveformDoc : public TFileDocument {
public:
 TWaveformDoc (TDocument* parent = 0);
 ~TWaveformDoc (); 
 virtual BOOL Open (int mode, const char far *path = 0);
 virtual BOOL Commit (BOOL force = FALSE);
 void SetDirtyFlag(BOOL flag = TRUE) { DirtyFlag = flag; }
public:
 // TXFile is a class to be used to hold state information when throwing
 // exceptions during execution of code that reads or writes files.
 class TXFile {
 private:
 string errorMsg;
 BOOL closeFile;
 public:
 TXFile (const string& msg = "", BOOL close = TRUE) {
 errorMsg = msg;
 closeFile = close; }
 const string& GetMessage () { return errorMsg; }
 const BOOL GetClose () { return closeFile; }

 };
 RIFFCkArray& GetRIFFCkArray() { return *RIFFChunks; }
protected:
 virtual BOOL ReadWaveFile (int omode, const char *name);
 virtual BOOL WriteWaveFile ();
private:
 void ReadSubchunks (const HMMIO HWaveFile, MMCKINFO& parentCkInfo)
 throw (TXFile, xalloc);
 void WriteSubchunks (const HMMIO HWaveFile)
 throw (TXFile);
private:
 RIFFCkArray *RIFFChunks;
 UINT chunkDepth; // Specifies number of chunks that have 
 // been descended into.
 int saveIndex; // Index of the next chunk to be saved in the array.
};
#endif // __wvfrmdc_h sentry.



Listing Three

/* Project WaveView
 Copyright 1994. All Rights Reserved.
 SUBSYSTEM: WaveView.exe Application
 FILE: wvfrmdc.cpp -- AUTHOR: Hamish Hubbard
 OVERVIEW: Source file for implementation of TWaveformDoc (TFileDocument).
*/

#include <owl\owlpch.h>
#pragma hdrstop
#include "wvfrmdc.h"

TWaveformDoc::TWaveformDoc (TDocument* parent) : TFileDocument(parent)
{
 RIFFChunks = new RIFFCkArray(1, 0, 4);
 chunkDepth = 0;
 saveIndex = 0;
}
TWaveformDoc::~TWaveformDoc ()
{
 // Empty RIFFChunks array, deleting contents of each chunk (thereby freeing
 // memory occupied by each chunk). RIFFChunk's object is then destroyed.
 RIFFChunks->Flush();
 delete RIFFChunks;
}
BOOL TWaveformDoc::Open (int mode, LPCSTR path)
{
 if (path)
 SetDocPath(path);
 if (mode != 0)
 SetOpenMode(mode);
 return ReadWaveFile(GetOpenMode(), GetDocPath());
}
BOOL TWaveformDoc::ReadWaveFile (int, const char *name)
{
 // Attempt to open the file named by name as a RIFF File. If successful, 
 // read information from the file relating to digitized waveform data 
 // and enable the client dialog window. HWaveFile is a RIFF API file 

 // handle. It is not compatible with regular Windows file handles.
 HMMIO HWaveFile;
 MMCKINFO parentCkInfo;
 chunkDepth = 0;
 try {
 // Open the file named by name for reading, using buffered I/O.
 HWaveFile = mmioOpen(const_cast<char *>(name), 0, MMIO_READ);
 if (!HWaveFile)
 throw (TXFile("Unable to open the file.", FALSE)); 
 parentCkInfo.fccType = mmioFOURCC('W','A','V','E');
 if (mmioDescend(HWaveFile, &parentCkInfo, 0, MMIO_FINDRIFF))
 throw TXFile("The file is not a RIFF file containing waveform data.");
 CkInfo *waveCkInfo = new CkInfo;
 waveCkInfo->ckSize = 0;
 waveCkInfo->ckID = parentCkInfo.ckid;
 waveCkInfo->ckType = parentCkInfo.fccType;
 waveCkInfo->ckDepth = chunkDepth;
 RIFFChunks->Add((char *)waveCkInfo);
 chunkDepth++;
 // Read the subchunks contained in the WAVE chunk.
 ReadSubchunks(HWaveFile, parentCkInfo);
 }
 // Handle TXFile-type exceptions (errors during file reading/writing).
 catch (TXFile xFile) {
 // If something fails, clean up and bail out.
 if(xFile.GetClose() == TRUE)
 mmioClose(HWaveFile, 0);
 return FALSE;
 }
 // Handle xalloc exceptions (thrown by operator new).
 catch (xalloc) {
 mmioClose(HWaveFile, 0);
 return FALSE;
 }
 mmioClose(HWaveFile, 0);
 return TRUE;
}
void TWaveformDoc::ReadSubchunks (const HMMIO HWaveFile,MMCKINFO&
parentCkInfo)
 throw(TWaveformDoc::TXFile, xalloc)
{
 MMCKINFO subCkInfo;
 CkInfo ckInfo;
 char *ckContents;
 while (mmioDescend(HWaveFile, &subCkInfo, 0, 0) == 0) {
 // Guard against a subchunk that extends past the end of its parent chunk.
 if ((subCkInfo.dwDataOffset + subCkInfo.cksize) >
 (parentCkInfo.dwDataOffset + parentCkInfo.cksize))
 throw(TXFile("The file contains corrupt or damaged information."));
 // Fill in some of the details about the chunk.
 ckInfo.ckID = subCkInfo.ckid;
 ckInfo.ckType = subCkInfo.fccType;
 ckInfo.ckSize = subCkInfo.cksize;
 ckInfo.ckDepth = chunkDepth;
 chunkDepth++;
 switch (subCkInfo.ckid) {
 case mmioFOURCC('L', 'I', 'S', 'T'):
 ckInfo.ckSize = 0;
 ckContents = new char[sizeof(CkInfo)];

 SETCKINFO(ckContents, &ckInfo);
 RIFFChunks->Add(ckContents);
 ReadSubchunks(HWaveFile, parentCkInfo);
 break;
 default:
 ckInfo.ckID = subCkInfo.ckid;
 ckInfo.ckType = subCkInfo.fccType;
 ckContents=new char[ckInfo.ckSize+sizeof(CkInfo)];
 mmioRead(HWaveFile,ckContents+sizeof(CkInfo),
 ckInfo.ckSize);
 SETCKINFO(ckContents, &ckInfo);
 RIFFChunks->Add(ckContents);
 break;
 }
 // Ascend out of the current subchunk.
 mmioAscend(HWaveFile, &subCkInfo, 0);
 chunkDepth--;
 }
}
// TWaveformDoc -- Save the document to permanent storage.
BOOL TWaveformDoc::Commit (BOOL force)
{
 if (TDocument::Commit(force) == FALSE)
 return FALSE;
 return WriteWaveFile();
}
BOOL TWaveformDoc::WriteWaveFile ()
{
 // HWaveFile is a RIFF API file handle. 
 HMMIO HWaveFile;
 MMCKINFO MMCkInfo;
 try {
 saveIndex = 0;
 // Attempt to open the file for saving to permanent storage.
 // The const_cast in the call below is a bit ugly...
 HWaveFile = mmioOpen(const_cast<char *>(GetDocPath()), 0,
 MMIO_WRITE | MMIO_CREATE);
 if (HWaveFile == 0)
 throw TXFile("Error: Unable to open the file for saving.", FALSE);
 WriteSubchunks(HWaveFile);
 if (mmioAscend(HWaveFile, &MMCkInfo, 0) != 0)
 throw TXFile("Error: Unable to save a chunk to the file.");
 if (mmioClose(HWaveFile, 0) != 0)
 throw TXFile("Error: Unable to close the file being saved.");
 }
 catch (TXFile xFile) {
 if (xFile.GetClose() == TRUE)
 mmioClose(HWaveFile, 0);
 return FALSE;
 }
 return TRUE;
}
void TWaveformDoc::WriteSubchunks (const HMMIO HWaveFile) throw 
 (TWaveformDoc::TXFile)
{
 MMCKINFO MMCkInfo;
 CkInfo ckInfo;
 DWORD flags;
 while (saveIndex < RIFFChunks->GetItemsInContainer()) {

 GETCKINFO((*RIFFChunks)[saveIndex], &ckInfo);
 MMCkInfo.fccType = ckInfo.ckType;
 MMCkInfo.ckid = ckInfo.ckID;
 MMCkInfo.dwFlags = 0;
 switch (MMCkInfo.ckid) {
 case mmioFOURCC('R', 'I', 'F', 'F'):
 flags = MMIO_CREATERIFF;
 break;
 case mmioFOURCC('L', 'I', 'S', 'T'):
 flags = MMIO_CREATELIST;
 break;
 default:
 flags = 0;
 break;
 }
 if (mmioCreateChunk(HWaveFile, &MMCkInfo, flags) != 0)
 throw TXFile("Error: A chunk could not be created during saving.");
 if (ckInfo.ckSize > 0)
 if (mmioWrite(HWaveFile, (*RIFFChunks)[saveIndex] + 
 sizeof (CkInfo), ckInfo.ckSize) == -1)
 throw TXFile("Error: A chunk could not be saved correctly.");
 saveIndex++;
 if (saveIndex + 1 < RIFFChunks->GetItemsInContainer()) {
 CkInfo nextCkInfo;
 GETCKINFO((*RIFFChunks)[saveIndex + 1], &nextCkInfo);
 switch (MMCkInfo.ckid) {
 case mmioFOURCC('L', 'I', 'S', 'T'):
 case mmioFOURCC('R', 'I', 'F', 'F'):
 WriteSubchunks(HWaveFile);
 break;
 default:
 break;
 }
 if (mmioAscend(HWaveFile, &MMCkInfo, 0) != 0)
 throw TXFile("Error: A chunk could not be saved correctly.");
 if (nextCkInfo.ckDepth < ckInfo.ckDepth)
 return;
 }
 else
 // Ascend out of the last chunk.
 if (mmioAscend(HWaveFile, &MMCkInfo, 0) != 0)
 throw TXFile("Error: A chunk could not be saved correctly.");
 }
}















September, 1994
Median-Cut Color Quantization


Fitting true-color images onto VGA displays




Anton Kruger


Anton works for Truda Software, a Fortran and C consulting firm. He can be
reached through the DDJ offices.


In many instances, the number of colors in an image exceeds the number of
displayable colors. For example, the output from 24-bit color scanners and
ray-tracing software produces what's known as true-color images, which simply
means that the red, green, and blue (RGB) components are each eight bits wide.
It is often said that a true-color image can have up to 2^24, or about 16
million, colors, but the image dimensions normally determine the upper limit;
a 512x512 true-color image, for instance, can't have more than 512^2, or
about 262,000, colors.
While true-color displays are common on high-end workstations and some
true-color video cards are becoming available for the PC, most of us still
have to be satisfied with VGA cards that can display no more than 256 colors
at a time. Another reason for wanting to convert a true-color image to a
256-color image is the savings in disk space. A 512x512 true-color image
requires 512x512x3 bytes, or approximately 786 Kbytes, of disk space, while a
256-color version of the image requires one byte per pixel, or approximately
262 Kbytes. The problem then is: Given a true-color image, what 256
colors should we use to display the image?
Note that the RGB components of a color can be viewed as lying along the axes
of a color space, or RGB cube; see Figure 1. For a 24-bit image, the color
space is (practically speaking) continuous, since the smallest difference
between colors is imperceptible. This continuous color space must be mapped to
256 discrete colors, which are then used to represent the color space. Mapping
a continuous variable to a discrete set of values is called quantization.
A simple approach to color quantization is to take the 256 most-common colors
in the true-color image. For each color in the input image, we then search for
the closest color in the set of 256 colors, and display that color instead.
Actually, the 256 colors are loaded in the display system's color map or video
look-up table (LUT), and the pixels' LUT indexes are written out to the
display hardware. The algorithm that picks the 256 most-common colors to load
into the LUT is the popularity algorithm. Its simplicity comes at a price,
because it gives poor results for many images. Other color-quantization
algorithms may give excellent results, but may be very time consuming. For
example, as Wan et al. report, it "may take more than 20 hours on a VAX 780
computer to produce 256 clusters for a full-color image" using an algorithm
known as the K-means algorithm. Color quantization is a post-processing
procedure on an image, and should normally take only a fraction of the time to
generate the image. Thus, such run times are in most cases unacceptable.
Some algorithms quantize an image without regard to its content; their
dominant feature is speed. They use a fixed number of standard colors. For
example, 256 colors require eight bits of color per pixel, and the bits are
divided among red, green, and blue. Since the human eye is less sensitive to
blue, red and green each get three bits, and blue two bits. This gives eight
shades of red, eight of green, and four of blue. Other colors are mixtures of
these primaries. The problem is that for a particular image, most standard
colors may never be used. For example, if an image contains a lot of red and
very little blue, all the entries assigned to blue are effectively wasted.
This is a fundamental flaw of image-independent quantizers--if the image and
the quantizer don't match, the results are poor.


The Median-cut Algorithm


Heckbert's median-cut algorithm is an image-based, color-quantization
algorithm that gives good results for many images. If properly implemented,
it's also quite fast. The aim of the median-cut algorithm is to have each of
the 256 output colors represent the same number of pixels in the input image.
The starting point is the set of colors in the whole image, around which a
tight-fitting cube is placed in RGB space. The cube is then split at the
median of
the longest axis. This ensures that about the same number of colors is
assigned to each of the new cubes. The procedure is recursively applied to the
two new cubes until 256 cubes are generated. The centroids (average values) of
the cubes become the 256 output colors.
For example, Table 1 is a histogram for 14 pixels that have six unique colors;
for clarity, the colors have no blue component. The task is to apply the
median-cut algorithm to this histogram and find four output colors.
The initial rectangle is in Figure 2(a); crosses are used to indicate the
pixels' colors. The distance from maximum to minimum along the R-axis is
80-5=75; along the G-axis, 80-20=60. Thus, the longest distance from maximum
to minimum is along the R-axis. The median along this axis is R=30, so this is
where the rectangle is split, and two tight-fitting rectangles are placed
around the two new regions; see Figure 2(b).
When applying the procedure to the left rectangle, the axis with the longest
distance is the G-axis. The median along this axis is G=50--this is where the
axis is split. For the right rectangle, the axis with the longest distance
from maximum to minimum is also the G-axis. The median along this axis is
G=40, and this is where the axis is split. The final rectangles are shown in
Figure 2(c), where each rectangle has about the same number of pixels.
Following this, the centroids of the colors in each rectangle are computed.
The resulting values are the output colors; see Figure 2(d).
There are two approaches to remapping an input image: fast remapping and best
remapping. With fast remaps, the centroid of a cube represents all the colors
enclosed by the cube. This is often good enough, but does not give the best
results because the centroid of the cube in which a color falls may not be the
closest centroid. Best remap searches the list of output colors for the
closest color, but because searching is involved, it's slower.
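The best-remap search can be sketched as a straightforward nearest-neighbor scan over the output colors. This is illustrative code (the function name is hypothetical); Listing One performs the same search inside InvMap.

```c
/* Sketch of a "best remap" search: return the index of the output
** color closest (in Euclidean distance) to (r,g,b). The colmap
** layout matches the article's ColMap: colmap[i][0..2] = r,g,b. */
typedef unsigned char byte;

int ClosestColor(byte r, byte g, byte b, byte colmap[][3], int ncolors)
{
    int i, best = 0;
    long d, dmin = 0x7FFFFFFFL;

    for (i = 0; i < ncolors; i++) {
        long dr = (long)colmap[i][0] - r;
        long dg = (long)colmap[i][1] - g;
        long db = (long)colmap[i][2] - b;
        d = dr * dr + dg * dg + db * db;  /* squared distance suffices */
        if (d < dmin) { dmin = d; best = i; }
    }
    return best;
}
```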


Implementing the Median-cut Algorithm


The median-cut algorithm is normally described as recursive. However, it's
easier and more practical to implement it as a nonrecursive routine. (Any
recursive algorithm can be converted to an iterative algorithm.) To see why,
apply the recursive algorithm in Figure 3(a), assuming it's used to generate
four cubes.
The first division of the RGB cube results in two smaller cubes called CubeA1
and CubeB1. Then CubeA1 is split into CubeA2 and CubeB2. Next CubeA2 is split
into CubeA3 and CubeB3, yielding four cubes. As Figure 3(b) shows, CubeB1 is
never split: the recursion generates a depth-first spanning tree. To solve this problem,
associate with each cube a level where the initial RGB cube has level 0,
CubeA1 and CubeB1 have level 1, and so on. To end up with four cubes, we now
require that the maximum level for any cube be log2(4)=2; see Figure 3(c).
Smaller cubes are also split on each recursive call. However, when the proper
level is reached, the algorithm returns and splits the larger regions.
Another problem an implementation should address is the special case in which
a cube encloses only one color, typically repeated many times. Obviously, such
a cube cannot be split. One method of dealing with this is to split one of the
other cubes; otherwise, you end up with one less cube than you want. To do
this, you need a list of all the current cubes. However, this isn't available
with the algorithms in Figure 3(a) and Figure 3(c), since you implicitly let
the algorithm save the cubes on the program stack during the recursive calls.
You can deal with this by increasing the maximum allowable level by 1 when
encountering a cube with a single color. Unfortunately, this again opens the
door for the problem of dividing smaller cubes while a larger cube is still
waiting on the stack. Furthermore, the process is complicated further when the
desired number of final cubes is not a power of 2.
To address these problems, I used the following nonrecursive method. A list of
the cubes computed so far is maintained in an array. During each iteration,
the list of cubes is scanned for the cube with the smallest level, but cubes
with single colors are ignored. Several cubes are normally candidates for
splitting, and you could devise a rule for picking the best one to split, but
I simply used the first one found. This cube is then split at the median,
perpendicular to the longest axis. This increases the number of cubes by one.
One of the new cubes is saved in the slot of the cube just split, and the
other is added to the bottom of the list of cubes. When the desired number of
cubes is generated, or all of the cubes enclose a single color, the splitting
phase terminates. This algorithm is summarized in Figure 3(d). 
At this point you have a list of cubes, and the next step is to compute their
centroids. These colors are the output colors, or output color map. The
histogram is no longer required, and you can use its space for a look-up table
that holds the output colors--an inverse color map. The inverse color map
works just like the histogram, where 24-bit colors are mapped to 15-bit colors
that serve as indexes into the histogram. However, the inverse color map
contains the input colors' indexes in the color map, instead of their count.
I've seen implementations of the median-cut algorithm that compute the inverse
color map for the whole RGB cube. However, this is unnecessary because you
know which colors each cube encloses, and they're the only ones required.
Initialization of the inverse color map depends on the kind of remapping
desired. With fast remap, the count for each color in a cube is replaced by
the index of that cube's centroid in the color map. With best remap, the
count for each color is replaced by the index of the closest centroid, found
by searching the list of centroids.
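Once the inverse color map is built, remapping a pixel is just a pack-and-lookup. A sketch (hypothetical function names; the packing follows the listing's 15-bit layout, with 5 bits per primary and blue in the high bits):

```c
/* Sketch of remapping one 24-bit pixel through the inverse color
** map: pack to 15 bits, look up the color-map index, fetch the
** output color. */
typedef unsigned char byte;
typedef unsigned short word;

word Pack15(byte r, byte g, byte b)
{
    return (word)((((b) & ~7) << 7) | (((g) & ~7) << 2) | ((r) >> 3));
}

/* invmap is the 32,768-entry inverse color map (Hist, after
** MedianCut returns); colmap holds the output colors. */
void RemapPixel(byte r, byte g, byte b,
                word *invmap, byte colmap[][3], byte out[3])
{
    word index = invmap[Pack15(r, g, b)];
    out[0] = colmap[index][0];
    out[1] = colmap[index][1];
    out[2] = colmap[index][2];
}
```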


Finding the Median


The time-consuming part of quantization is finding the median along an axis of
a cube. Algorithms exist for finding the median of a set of numbers with a
run-time complexity of O(N), but because of its simplicity, I used the
following method. First sort the numbers in ascending order and compute their
sum. Then start at the smallest number and compute a running sum. When the
running sum is equal to half the total, you've found the median.
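The running-sum method can be sketched as follows. This is illustrative code with hypothetical names; counts[] stands for the histogram counts of a cube's colors after sorting along the split axis, and the returned index is where the cube is cut (mirroring the median-finding loop in Listing One).

```c
/* Sketch of finding the median by running sum: counts[] holds the
** pixel counts of the cube's colors, already sorted along the
** longest axis. Returns the index where the running sum first
** reaches half the total. */
typedef unsigned long dword;

int FindMedian(dword *counts, int n)
{
    int i;
    dword total = 0, running = 0;

    for (i = 0; i < n; i++)
        total += counts[i];
    for (i = 0; i < n - 1; i++) {
        if (running >= total / 2)   /* half the total reached */
            break;
        running += counts[i];
    }
    return i;                       /* median position */
}
```

Applied to the initial cube of Table 1 sorted on red (counts 4,3,2,2,1,2), the running sum reaches 7 (half of 14) after two colors, so the cut falls at index 2, as in the worked example.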
I've used the C standard-library function qsort (based on the Quicksort
algorithm, with its average run-time complexity of O(N log2 N)) to do the
sort. However, the algorithm has a potentially serious pitfall. When it's
called to sort data that is already in order, the run-time complexity is
O(N^2). The difference between an algorithm with O(N log2 N) complexity and
one with O(N^2) is dramatic when N is large. For example, with 32K=2^15
colors, the O(N^2) algorithm takes more than 2000 times longer. Because of
the way the histogram
is constructed, with five bits of each color packed into a 15-bit color, one
color is always sorted. The macros in Listing One are used to pack the 15-bit
colors, which results in the image histogram being initially sorted on the
blue component. A worst-case scenario is when a predominantly blue image is
encountered--qsort is then called several times to sort a large set of colors
already in order.
Another problem with Quicksort is that it is often implemented as a recursive
routine, and can rapidly deplete the program stack when a large number of data
points must be sorted. A nonrecursive implementation needs much less auxiliary
storage. Thus, a good implementation of Quicksort is normally nonrecursive,
and it randomizes the input data somewhat to provide for the case where the
data is already in sorted order. Many implementations, however, don't do this.
For example, I examined the Microsoft C 5.1 run-time library source, and its
qsort is a fairly simple recursive implementation of Quicksort. For a
production version of the median-cut algorithm, you might want to replace the
qsort routine with a sort routine that's less efficient on the average, but
has a better worst-case performance. Heapsort has both an average and
worst-case run-time complexity of O(N log2N), and it needs no (or very little)
auxiliary storage.
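A heapsort replacement for the qsort call might be sketched as follows. This generic version sorts raw word values; it is not from the listing, and a production version would compare on the cube's longest axis, as the listing's compare function does.

```c
/* Sketch of a heapsort for an array of words: O(N log N) in both
** the average and the worst case, with no auxiliary storage. */
typedef unsigned short word;

static void SiftDown(word *a, int start, int end)
{
    int root = start, child;
    while ((child = 2 * root + 1) <= end) {
        if (child + 1 <= end && a[child] < a[child + 1])
            child++;                 /* pick the larger child */
        if (a[root] >= a[child])
            return;                  /* heap property holds */
        { word t = a[root]; a[root] = a[child]; a[child] = t; }
        root = child;
    }
}

void HeapSortWords(word *a, int n)
{
    int i;
    for (i = n / 2 - 1; i >= 0; i--)  /* build max-heap */
        SiftDown(a, i, n - 1);
    for (i = n - 1; i > 0; i--) {     /* repeatedly extract max */
        word t = a[0]; a[0] = a[i]; a[i] = t;
        SiftDown(a, 0, i - 1);
    }
}
```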


Data Structures


The histogram, which I call Hist, is accessed indirectly via the array HistPtr
that functions as an index into the histogram. HistPtr contains the position
of the colors in the actual histogram. When the histogram (or parts of it)
must be sorted, HistPtr (or parts of it) is sorted. A structure is used to
represent a cube. In Listing One, this structure (cube_t) has several members,
but the essential members are lower and upper, and they point to the opposite
corners of a cube in HistPtr.
To see how this works, return to the data in Table 1, and rework the example
in terms of the Hist and HistPtr. Also, keep an eye on Figure 2 to see the
correspondence between the data structures and the geometric interpretation of
the median-cut algorithm. Assume that after the histogram is constructed, it
looks like Hist; see Figure 4(a). Note the "holes" in the histogram--this is
typical. Also, this example assumes that the colors are initially sorted on
green. C1 has the smallest green component, C4 has the second smallest, and so
on. Just after Hist is constructed, HistPtr is filled as in Figure 4(a). The
initial cube is the whole RGB cube, so that Cube.lower is set to 0, and
Cube.upper to 5. This completes the initialization.
Now the cube must be split. The red axis is the longest, so we sort the colors
along this axis. To do this, sort the array HistPtr instead of Hist. To find
the median along the axis, start at Cube.lower; its value is 0. We look in
HistPtr[0]; its value is 5, and this corresponds to C2 in Hist. C2's count is
4, and this is the running sum. Now we look at Cube.lower+1's corresponding
color, C0, and the running sum becomes 4+3=7. This is half the total for the
cube, so it is split to form two cubes, CubeA1, and CubeB1. The result is in
Figure 4(b).
Next we split CubeB1. The colors in CubeB1 are sorted along the longest axis,
the green axis. This is shown in Figure 4(c), while Figure 4(d) depicts the
situation after CubeB1 is split. Finally, we split CubeA1 to get the desired
four cubes; see Table 2.

To compute the centroid for, say, CubeB2, start at CubeB2.lower, whose value
is 4. Look in HistPtr[4] to find 4. The color at Hist[4] is C5, and its count
is 2. Next find the color that corresponds to CubeB2.lower+1, which is also
the last color in CubeB2. This is color C3, which has a count of 2. The
centroid for this cube is shown in Example 1. All centroids are computed this
way.
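The centroid arithmetic is a count-weighted average, which can be sketched as follows (illustrative code, not from Listing One; the values come from Table 1, with the blue component omitted as in the example).

```c
/* Sketch of the centroid computation for a cube: a count-weighted
** average of the enclosed colors. For CubeB2, with C5=(80,50) at
** count 2 and C3=(50,80) at count 2, the centroid is (65,65). */
void Centroid(const int *r, const int *g, const unsigned *count,
              int ncolors, float *rc, float *gc)
{
    int i;
    float rsum = 0.0f, gsum = 0.0f, total = 0.0f;

    for (i = 0; i < ncolors; i++) {
        rsum  += (float)r[i] * (float)count[i];
        gsum  += (float)g[i] * (float)count[i];
        total += (float)count[i];
    }
    *rc = rsum / total;
    *gc = gsum / total;
}
```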


The MedianCut Program


MedianCut, the program in Listing One, was developed on and for DOS PCs. To
ease porting, I've tried to adhere to ANSI C. I also used typedefs for some
variables when the number of bits was important. When moving to a different
platform, you might want to change these so that they reflect the sizes of the
target system. The code compiles and executes cleanly with Microsoft's 5.1
compiler (large memory model). If the /W4 compiler switch is used with version
6.0 of this compiler, some warning messages are issued, but they can safely be
ignored. 
MedianCut takes three arguments: the image histogram Hist, which contains the
15-bit colors; maxcubes, the desired number of output colors (this can be any
number between 1 and 256, but the upper limit can be changed by altering the
#define MAXCOLORS preprocessor directive); and a maxcubes x 3 array ColMap
that MedianCut fills with the output colors, where the red component of color
i is r=ColMap[i][0]; that of green, g=ColMap[i][1]; and that of blue,
b=ColMap[i][2]. MedianCut returns the number of actual colors, which may be
less than maxcubes if the input image already contains fewer colors than the
requested number.
The #define FAST_REMAP preprocessor directive controls whether the inverse
color map is initialized with the fast-remap or the best-remap method. If this
directive is deleted or commented out, initialization is done according to the
best-remap method.
Additionally, I've written a typical driver routine for MedianCut that, in
this case, converts a Targa type-2 image file to a Targa type 1. Truevision's
Targa type-2 image format is a popular format for true-color images, and Targa
type-1 images are color-mapped, so to convert from type 2 to type 1, we must
quantize the true-color image. This program is available electronically; see
page 3.
Instead of dynamically allocating the space for HistPtr and the array for the
list of cubes, I've chosen to have these static variables. These arrays occupy
about 70 Kbytes. The histogram occupies another 64 Kbytes, and depending on
your compiler's qsort routine, several Kbytes of stack may be required. Add to
this about another 20 Kbytes for the file buffers, as well as the space
required for the program code, and a total of about 200 Kbytes of memory are
required.
Figures 5 and 6 each show two images before and after application of the
median-cut algorithm (fast remap). Both were generated with the POV-Ray
(Persistence of Vision) ray-tracing program (see "Ray Tracing and the POV-Ray
Toolkit," by Craig A. Lindley, DDJ, July 1994). The 256-color version of
"grid" is indistinguishable from the true-color original, which shows how
effective the median-cut algorithm can be. However, with the "lamp" image,
there are some false contours in areas of low contrast, but this may not show
up in the photographs, since it is quite difficult to reproduce a video
display faithfully in print.
This implementation of the median-cut algorithm is quite fast. With the
fast-remap method, it takes about eight seconds to quantize 640x480 true-color
versions of the images in Figures 5 and 6 on a 20-MHz AT clone. Much of this
time is I/O, and the actual color quantization takes less than three seconds.


References


Wan, S.J., S.K.M. Wong, and P. Prusinkiewicz. "An Algorithm for
Multidimensional Data Clustering." ACM Transactions on Mathematical Software
(June 1988).
Heckbert, P. "Color Image Quantization for Frame Buffer Display." Computer
Graphics (July 1982).
Figure 5 Grid: (a) original, true-color image; (b) 256-color version
(image-file courtesy of Dan Farmer).
Figure 1 The RGB cube.
Figure 2 The median-cut algorithm: (a) original rectangle; (b) after splitting
once; (c) after splitting three times; (d) output colors.
Figure 3: (a) A flawed recursive implementation of the median-cut algorithm;
(b) depth-first division as a result of the algorithm; (c) another recursive
implementation; (d) pseudocode for the nonrecursive, cube-splitting algorithm.
(a)
Split(Cube){
 if (ncubes == 4) return;
 find longest axis of Cube;
 cut Cube at median to form CubeA, CubeB;
 Split(CubeA);
 Split(CubeB);
}

(b)

(c)
maxlevel = 2;
Split(Cube,level){
 if (ncubes == 4) return;
 if (Cube's level == maxlevel) return;
 find longest axis of Cube;
 cut Cube at median to form CubeA, CubeB;
 Split(CubeA, level+1);
 Split(CubeB, level+1);
}

(d)
build initial cube from histogram;
set initial cube's level to 0;
insert initial cube in list of cubes;
ncubes = 1;
while (ncubes < maxcubes){
 search for Cube with smallest level;
 find the longest axis of Cube;
 find the median along this axis;
 cut Cube at median to form CubeA, CubeB;
 set CubeA's level = Cube's level + 1;
 set CubeB's level = Cube's level + 1;

 insert CubeA in Cube's slot;
 add CubeB to end of list of cubes;
 ncubes = ncubes + 1;
}
Figure 4 (a) Hist and HistPtr after initialization; (b) sorted on red and
split at the median; (c) CubeB1, sorted on green; (d) CubeB1, split at the
median.
Figure 6 Lamp: (a) original, true-color image; (b) 256-color version
(anonymous image-file description).
Table 1: Histogram for median-cut example.
Color (r,g)-coordinates Count 
C0 (20,40) 3
C1 (40,20) 2
C2 (5,60) 4
C3 (50,80) 2
C4 (60,30) 1
C5 (80,50) 2
Table 2: Contents of the data structures after splitting three times.
Cube HistPtr.lower HistPtr.upper Colors Enclosed Cube Centroid 
A3 0 0 C0 (20,40)
B3 1 1 C2 (5,60)
A2 2 3 C1,C4 (46.7,23.3)
B2 4 5 C5,C3 (65,65)
Example 1 Computing cube centroids.

Listing One 
/* median.c -- Anton Kruger, Copyright (c) Truda Software, 215 Marengo Rd, 
** #2, Oxford, IA 52322-9383
** Description: Contains an implementation of Heckbert's median-cut color
** quantization algorithm.
** Compilers: MSC 5.1, 6.0.
** Note: 1) Compile in large memory model. 2) Delete "#define FAST_REMAP" 
** statement below in order to deactivate fast remapping.
*/
#define FAST_REMAP
#include <stdio.h>
#include <stddef.h> /* for NULL */
#include <stdlib.h> /* for "qsort" */
#include <float.h> /* for FLT_MAX, FLT_MIN */
#define MAXCOLORS 256 /* maximum # of output colors */
#define HSIZE 32768 /* size of image histogram */
typedef unsigned char byte; /* range 0-255 */
typedef unsigned short word; /* range 0-65,535 */
typedef unsigned long dword; /* range 0-4,294,967,295 */

/* Macros for converting between (r,g,b)-colors and 15-bit */
/* colors follow. */
#define RGB(r,g,b) (word)((((b)&~7)<<7)|(((g)&~7)<<2)|((r)>>3))
#define RED(x) (byte)(((x)&31)<<3)
#define GREEN(x) (byte)((((x)>>5)&255)<< 3)
#define BLUE(x) (byte)((((x)>>10)&255)<< 3)

typedef struct { /* structure for a cube in color space */
 word lower; /* one corner's index in histogram */
 word upper; /* another corner's index in histogram */
 dword count; /* cube's histogram count */
 int level; /* cube's level */

 byte rmin,rmax;
 byte gmin,gmax;
 byte bmin,bmax;
} cube_t;


static cube_t list[MAXCOLORS]; /* list of cubes */
static int longdim; /* longest dimension of cube */
static word HistPtr[HSIZE]; /* points to colors in "Hist" */

void Shrink(cube_t * Cube);
void InvMap(word * Hist, byte ColMap[][3],word ncubes);
int compare(const void * a1, const void * a2);

word MedianCut(word Hist[],byte ColMap[][3], int maxcubes)
{
 /* Accepts "Hist", a 32,768-element array that contains 15-bit color counts
 ** of input image. Uses Heckbert's median-cut algorithm to divide color 
 ** space into "maxcubes" cubes, and returns centroid (average value) of each
 ** cube in ColMap. Hist is also updated so that it functions as an inverse
 ** color map. MedianCut returns the actual number of cubes, which may be 
 ** less than "maxcubes". */
 byte lr,lg,lb;
 word i,median,color;
 dword count;
 int k,level,ncubes,splitpos;
 void *base;
 size_t num,width;
 cube_t Cube,CubeA,CubeB;

 /* Create the initial cube, which is the whole RGB-cube. */
 ncubes = 0;
 Cube.count = 0;
 for (i=0,color=0;i<=HSIZE-1;i++){
 if (Hist[i] != 0){
 HistPtr[color++] = i;
 Cube.count = Cube.count + Hist[i];
 }
 }
 Cube.lower = 0; Cube.upper = color-1;
 Cube.level = 0;
 Shrink(&Cube);
 list[ncubes++] = Cube;

 /* Main loop follows. Search the list of cubes for next cube to split, which
 ** is the lowest level cube. A special case is when a cube has only one 
 ** color, so that it cannot be split. */
 while (ncubes < maxcubes){
 level = 255; splitpos = -1;
 for (k=0;k<=ncubes-1;k++){
 if (list[k].lower == list[k].upper)
 ; /* single color */
 else if (list[k].level < level){
 level = list[k].level;
 splitpos = k;
 }
 }
 if (splitpos == -1) /* no more cubes to split */
 break;

 /* Must split the cube "splitpos" in list of cubes. Next, find longest
 ** dimension of cube, and update external variable "longdim" which is 
 ** used by sort routine so that it knows along which axis to sort. */
 Cube = list[splitpos];

 lr = Cube.rmax - Cube.rmin;
 lg = Cube.gmax - Cube.gmin;
 lb = Cube.bmax - Cube.bmin;
 if (lr >= lg && lr >= lb) longdim = 0;
 if (lg >= lr && lg >= lb) longdim = 1;
 if (lb >= lr && lb >= lg) longdim = 2;

 /* Sort along "longdim". This prepares for the next step, namely finding
 ** median. Use standard lib's "qsort". */
 base = (void *)&HistPtr[Cube.lower];
 num = (size_t)(Cube.upper - Cube.lower + 1);
 width = (size_t)sizeof(HistPtr[0]);
 qsort(base,num,width,compare);

 /* Find median by scanning through cube, computing a running sum. When
 ** running sum equals half the total for cube, median has been found. */
 count = 0;
 for (i=Cube.lower;i<=Cube.upper-1;i++){
 if (count >= Cube.count/2) break;
 color = HistPtr[i];
 count = count + Hist[color];
 }
 median = i;


 /* Now split "Cube" at median. Then add two new cubes to list of cubes.*/
 CubeA = Cube; CubeA.upper = median-1;
 CubeA.count = count;
 CubeA.level = Cube.level + 1;
 Shrink(&CubeA);
 list[splitpos] = CubeA; /* add in old slot */

 CubeB = Cube; CubeB.lower = median;
 CubeB.count = Cube.count - count;
 CubeB.level = Cube.level + 1;
 Shrink(&CubeB);
 list[ncubes++] = CubeB; /* add in new slot */
 if ((ncubes % 10) == 0)
 fprintf(stderr,"."); /* pacifier */
 }

 /* We have enough cubes, or we have split all we can. Now compute the color
 ** map, inverse color map, and return number of colors in color map. */
 InvMap(Hist, ColMap,ncubes);
 return((word)ncubes);
}
void Shrink(cube_t * Cube)
{
 /* Encloses "Cube" with a tight-fitting cube by updating (rmin,gmin,bmin) 
 ** and (rmax,gmax,bmax) members of "Cube". */
 byte r,g,b;
 word i,color;

 Cube->rmin = 255; Cube->rmax = 0;
 Cube->gmin = 255; Cube->gmax = 0;
 Cube->bmin = 255; Cube->bmax = 0;
 for (i=Cube->lower;i<=Cube->upper;i++){
 color = HistPtr[i];
 r = RED(color);

 if (r > Cube->rmax) Cube->rmax = r;
 if (r < Cube->rmin) Cube->rmin = r;
 g = GREEN(color);
 if (g > Cube->gmax) Cube->gmax = g;
 if (g < Cube->gmin) Cube->gmin = g;
 b = BLUE(color);
 if (b > Cube->bmax) Cube->bmax = b;
 if (b < Cube->bmin) Cube->bmin = b;

 }
}
void InvMap(word * Hist, byte ColMap[][3],word ncubes)
{
 /* For each cube in list of cubes, computes centroid (average value) of 
 ** colors enclosed by that cube, and loads centroids in the color map. Next
 ** loads histogram with indices into the color map. A preprocessor directive
 ** #define FAST_REMAP controls whether cube centroids become output color
 ** for all the colors in a cube, or whether a "best remap" is followed. */
 byte r,g,b;
 word i,j,k,index,color;
 float rsum,gsum,bsum;
 float dr,dg,db,d,dmin;
 cube_t Cube;

 for (k=0;k<=ncubes-1;k++){
 Cube = list[k];
 rsum = gsum = bsum = (float)0.0;
 for (i=Cube.lower;i<=Cube.upper;i++){
 color = HistPtr[i];
 r = RED(color);
 rsum += (float)r*(float)Hist[color];
 g = GREEN(color);
 gsum += (float)g*(float)Hist[color];
 b = BLUE(color);
 bsum += (float)b*(float)Hist[color];
 }

 /* Update the color map */
 ColMap[k][0] = (byte)(rsum/(float)Cube.count);
 ColMap[k][1] = (byte)(gsum/(float)Cube.count);
 ColMap[k][2] = (byte)(bsum/(float)Cube.count);
 }
#ifdef FAST_REMAP
 /* Fast remap: for each color in each cube, load the corresponding slot 
 ** in "Hist" with the index of the cube; its centroid is the output color. */
 for (k=0;k<=ncubes-1;k++){
 Cube = list[k];
 for (i=Cube.lower;i<=Cube.upper;i++){
 color = HistPtr[i];
 Hist[color] = k;
 }

 if ((k % 10) == 0) fprintf(stderr,"."); /* pacifier */
 }
#else
 /* Best remap: for each color in each cube, find entry in ColMap that has
 ** smallest Euclidian distance from color. Record this in "Hist". */
 for (k=0;k<=ncubes-1;k++){
 Cube = list[k];

 for (i=Cube.lower;i<=Cube.upper;i++){
 color = HistPtr[i];
 r = RED(color); g = GREEN(color); b = BLUE(color);

 /* Search for closest entry in "ColMap" */
 dmin = (float)FLT_MAX;
 for (j=0;j<=ncubes-1;j++){
 dr = (float)ColMap[j][0] - (float)r;
 dg = (float)ColMap[j][1] - (float)g;
 db = (float)ColMap[j][2] - (float)b;
 d = dr*dr + dg*dg + db*db;
 if (d == (float)0.0){
 index = j; break;
 }
 else if (d < dmin){
 dmin = d; index = j;
 }
 }
 Hist[color] = index;
 }
 if ((k % 10) == 0) fprintf(stderr,"."); /* pacifier */
 }
#endif
 return;
}
int compare(const void * a1, const void * a2)
{
 /* Called by the sort routine in "MedianCut". Compares two
 ** colors based on the external variable "longdim". */
 word color1,color2;
 byte c1,c2;

 color1 = (word)*(word *)a1;
 color2 = (word)*(word *)a2;
 switch (longdim){

 case 0:
 c1 = RED(color1), c2 = RED(color2);
 break;
 case 1:
 c1 = GREEN(color1), c2 = GREEN(color2);
 break;
 case 2:
 c1 = BLUE(color1), c2 = BLUE(color2);
 break;
 }
 return ((int)(c1-c2));
}














September, 1994
EPROM Emulation


Your own emulator for microcontroller development




David Mockridge


David develops client/server systems for Southern Life Assurance in South
Africa. He also writes for electronics magazines and has a special interest in
embedded-systems software. You can contact David at 13 Ayreshire Rd.,
Rondebosch East, 7700 South Africa.


One of the most tedious parts of embedded-systems development is the
"crash-and-burn" EPROM cycle, where object code is burned into an EPROM
("erasable programmable ROM") chip, which in turn is plugged into the target
system. If (or more realistically, when) code bugs appear, you have to extract
the chip, expose it to an ultraviolet EPROM eraser, and reprogram it with the
debugged program. 
While you can ease the process by keeping a small pile of chips
nearby--erasing several at a time--a better approach is to use an EPROM
emulator plugged into the EPROM socket, disguised as the real thing. As far as
the embedded system is concerned, it is the real thing. However, there's one
difference from your perspective: Once the emulator is plugged into the
EPROM's homebase, you can download your programs over and over again. When
you're satisfied with your program, you only have to burn it into an EPROM
once.


Hardware Design


This article describes the software and hardware that make up an EPROM
emulator. The emulator has several design requirements, of which low cost is
paramount. The programs it uses are created on a PC, so they have to be
transferred out of the PC's memory and into the emulator. To avoid buying an
additional interface card or serial port, I used the PC's existing parallel
printer port. This also buys some speed when downloading. Of course, if you do
a lot of printing on your PC, you may need to buy a second parallel card,
anyway. If you do get a new port and dedicate it to the emulator, you'll have
to change the base addresses in the download program, which currently uses 3BC
hex, the standard printer-port address for LPT1. Although utilities such as
Central Point's PCTools will return your printer-port addresses, you can also
find them with DEBUG using the dump command in Figure 1. The memory dump shows
two ports, LPT1 at 03BC hex and LPT2 at 0278 hex. In any case, if LPT1 on your
PC is not at 03BC hex, you'll have to replace the base addresses in the
download program. 
The download software is a short, DOS command-line utility. Although I used
Basic (it makes for fast development and readable code), the download software
is straightforward enough to easily port to your favorite language. A Windows
version could be knocked out without difficulty, with the standard file-open
dialog and a static control for a bar graph showing the file-download
progress. If you choose a language with no verb for direct-port output,
however, you might need to add a few bytes of assembler. (Incidentally, the
prototype used the no-longer-supported Borland Turbo Basic. Some minor syntax
changes converted the programs to run under Microsoft's QuickBasic 4.5.)
When you run the program, the name of the file to be downloaded is read from
the command line or prompted for, if omitted. A short routine parks the cursor
after the command-line filename and gives a running display of the bytes
downloaded, so the user can see what is happening. Since the software drives
custom hardware, I'll examine the hardware before moving on to exactly how a
download works.
Although an IBM-type PC was used for this project, any computer with access to
one and a half output ports (one port and four extra lines) could run the same
software and drive the emulator just as well.


RAM to the Core


At the core of the emulator is a RAM chip, masquerading as an EPROM. I chose
static RAM over dynamic RAM, since SRAM does not require the constant
refreshing and complex support circuitry of DRAM. Admittedly, SRAM is more
expensive, but this device only needs a single, small-capacity chip.
Figure 2 shows the emulator core, including the central RAM chip. To write a
byte, you present your data (together with an address) to the correct pins,
and then pulse the read/write line briefly from read to write state and back
again. Since the chip is normally left in read mode, if the address data is
unchanged, the new data will now appear on the I/O lines. 
Memory chips have a fair number of lines for carrying address, data, and
control information. The PC printer port only has a few. Therefore, design
problem #1 is that there aren't enough lines to do the job.


Squeezing the Printer Port


Since the nature of the data download is to always send a consecutive block of
bytes, the addresses these bytes are assigned to must start at 0 and increment
by 1 for each byte. Put another way, at download time the address information
is implicit in the data file being downloaded, so you can eliminate explicit
address data from the download process if you can provide these addresses
locally, in the emulator.
A hardware counter (Figure 2) is used to locally address the emulator's
memory. The CD4040 chip chosen is 12 bits wide. After a 1 is applied to the
reset pin, it outputs address 0. On each 0-to-1 transition at the clock pin,
the address increments by one. The clock and reset lines are driven by
software described later. If you connect the counter directly to the RAM's
address lines (a tiny local bus?), all you have to do is advance the counter
once after each byte is loaded into the RAM. 
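The counter's behavior can be modeled in a few lines. This is a simulation sketch of the CD4040's role (names hypothetical), not driver code.

```c
/* Simulation sketch of the CD4040 address counter: reset forces the
** address to 0; each 0-to-1 clock transition increments it. The
** counter is 12 bits wide, so the address wraps at 4096. */
typedef struct {
    unsigned address;   /* 12-bit output */
    int last_clock;     /* previous clock level */
} counter_t;

void CounterReset(counter_t *c)
{
    c->address = 0;
    c->last_clock = 0;
}

void CounterClock(counter_t *c, int level)
{
    if (c->last_clock == 0 && level == 1)       /* rising edge only */
        c->address = (c->address + 1) & 0xFFF;  /* 12-bit wrap */
    c->last_clock = level;
}
```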
Using a counter to address the memory simplifies expansion, since expanding
the RAM by any amount will only require more counters, leaving the two-line PC
interface unchanged.
The prototype emulator was built using a Hitachi 6116 RAM chip. Although it
only holds 2K of RAM, this was sufficient for the assembly-language controller
program being developed. If you're developing more-complex applications using
compilers, you can easily expand memory with a larger RAM and by
daisy-chaining another address counter. The hardware described here could
support either a 4K or a 2K RAM, since the counter is 12 bits wide.
The IBM-PC parallel-printer port actually consists of three physical ports
(see Table 1). They are not memory-mapped and are accessed by the 80x86 IN and
OUT port I/O instructions. Most high-level languages have an equivalent verb
for accessing these functions. Basic has an OUT verb for port output. The
three ports are addressed consecutively from the base address upwards, with
port 3BC handling printer data, while 3BD and 3BE handle printer-control input
and output, respectively. 
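In C terms, the register layout reduces to offsets from the base address. This is a sketch for orientation only (the article's download program is in Basic, and real output would go through DOS port I/O such as outp rather than these helpers):

```c
/* Sketch of the parallel port's register layout: the data, status,
** and control registers sit at consecutive I/O addresses above the
** base (0x3BC for LPT1 on the author's PC, 0x278 for LPT2). */
unsigned DataPort(unsigned base)    { return base; }      /* printer data   */
unsigned StatusPort(unsigned base)  { return base + 1; }  /* control input  */
unsigned ControlPort(unsigned base) { return base + 2; }  /* control output */
```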
As with the printer BIOS routines, you send data through the base port. In
addition, you'll hijack the traditional printer control functions in the third
port to drive the emulator with custom software. 
The emulator plugs into the EPROM socket on the host system. The RAM
connections are simply brought out to a header plug, since the 6116 is
electronically compatible with a 2716 EPROM. In the prototype, the
printed-circuit board was laid out for the often-cheaper (although four times
the size) 27C64 EPROM. This presented no problem, since the 27[C]xxx family of
EPROMs are similar enough that the smaller 2716 can easily pass for a
(partially filled) larger family member, with minor header-connection changes.
Once the data is safely in the RAM, you have to cut all connections to the
emulator to prevent the interface from conflicting with the host system's. At
worst, this could mean clipping two alligator clips onto the chip's
power-supply pins, unplugging it from the emulator, and plugging it into the
host. A better solution would be some kind of software-invoked disconnection.


A Software Guillotine


Address and data lines are fed through data gateways, comprising 74LS244
octal-buffer chips. Each buffer chip has data-bus inputs that connect straight
through to an output bus. A 1 written to the buffer's output-disable pin
effectively disconnects the two buses, leaving the outputs free to follow
whatever they are connected to. Figure 3 shows how three such buffers form a
switchable channel for address, data, and control lines between RAM and
emulator. 
So the emulator will not interfere with the host. But what about the host
clashing with the emulator? A second set of buffers would do the trick, but in
practice, the host was removed by powering it down for downloads, which
conveniently triggers a hardware reset in the target system when it is powered
back up. Since the emulator has its own power supply, its data is saved
irrespective of the host.


Software to Drive a Download



Listing One shows the complete listing for EMULOAD.BAS, the download program.
(An executable version of the program is available electronically; see
"Availability," page 3.) The emulator hardware requires individual bits in the
parallel port to be manipulated, while the PC is essentially byte (or word)
oriented. Instead of maintaining the current port value and masking bits on or
off as needed, whole-byte constants were defined. This faster approach was
made possible because the set of all useful bit patterns was small enough.
The PC downloads to the emulator by writing data to the first printer-port
address, where it is latched until replaced by another write. In between, it
writes control byte constants to the third address, which fires various pieces
of emulator hardware. 
Before sending any data, the address counter is reset to zero by writing the
ResetCounter% constant. (The percentage-sign suffix denotes a Basic integer
variable.) After the initial reset, this line is left at 0 for all other
operations. 
The first byte in the download file is written to the data port. Software
generates positive-going pulses by writing 0,1,0 to the required pin. This
technique is used in successive writes of the ReadState% and WriteState%
constants. They flip the RAM's write line to place your byte inside the chip.
ClockRising% then similarly advances the address counter. This process will
repeat until end of file is reached. Finally, the DisableCard% constant is
written to disable the buffers that carry emulator address and data lines. 


Software for Debugging


Construction of the prototype produced a rather dense mass of wiring, with a
proportionately high chance of construction errors. An advantage of hybrid
software/hardware projects is that software can sniff out hardware problems
for you. Hardware buffs may argue otherwise, but for me, it beats holding the
device in front of your nose with a magnifying glass or spending hours with a
continuity tester and circuit diagram. With this in mind, some short and
simple programs (such as TESTEM1.BAS in Listing Two, which tests individual
lines) are used to pulse individual lines slowly, flip the counter over, and
kick the latches in and out of circuit. I discovered several board-wiring
errors and one cable error using this program. Fortunately, none of these were
fatal for the chips. (An executable version of the testing program is
available electronically.) 


Output Protection


The buffer chips have output protection; they can field nasty spikes in the
power supply and various overload conditions. As I found out, they also go
into a dead short if you connect the buffer inputs directly to ground. All
unused inputs on any MOS chips should be grounded through a current-limiting
resistor (around 10K, for example).


The OE Line


Once the prototype was built, tested and ready, I plugged it into an 8051
embedded system, where...nothing happened. I then plugged it into a
breadboard, where it functioned perfectly. Test programs wrote data into every
byte of memory, from which they all read back faultlessly using external
circuitry. After suspecting and replacing everything but the programmer, a
careful review of the 8051 architecture revealed the problem. The OE (output
enable) pin on a real EPROM is used to turn the EPROM's data outputs off when
the CPU does not want it on the bus. This control line exists on the 6116 RAM,
but was not in the correct place on the header plug. Once this was properly
placed, the 8051 board could turn the emulator off whenever it wanted the
busses for itself. It worked. 
After a few hours, however, the emulator stopped working again. After much
digging around, it emerged that any programs larger than a quarter of the
total memory size did not work correctly. The two most-significant address
lines had been swapped around. Downloaded programs were being written to
memory quadrants in the order 0213, instead of 0123, so the problem only
occurred with programs larger than a quarter of memory.


Have Chip, Will Travel


While the emulator was designed for an 8051-type CPU embedded system, freeing
the design from any particular host has clear portability advantages. The
final emulator design supports any conceivable EPROM use, across all
microprocessors, through to straight hardware applications like a character
generator or decoder. This is why the emulator has its own power supply and is
not connected to the host system in any way besides plugging into the target
EPROM socket.
Although described here as a stand-alone device, I originally designed the
emulator for integration into an embedded-system development tool, where it
would be compiled as a subroutine, callable from an integrated-development
environment menu bar.
Choice of RAM type may be critical for high-speed systems. Careful
consideration of system-clock speed versus chip-propagation delay time is
advised. The prototype system ticked along comfortably with a 16-MHz system
clock and a Hitachi HM6116LP-3 (a 300ns RAM).
Future enhancements for the emulator could include a verification facility and
software-switched activation of the host, although the current arrangement of
flipping a switch is not an intolerable burden. It would be nice to expand the
emulator to the full 64K address space of the 8031 by expanding the address
counter. A new counter's clock input would connect to the existing counter's
most-significant address bit. If more than one RAM chip is used, an
address-decoding block will be required to make sure that only the chip being
addressed is enabled. The prototype's chip-select input was tied permanently
low, since it was the only chip in the system. In a system with only two RAMs,
the counter line one bit above the most-significant address bit could be
inverted to provide a chip select for a second chip, with the current chip
enable being driven from the same line.
After using an EPROM emulator, you'll never want to return to the
"crash-and-burn" method. Does anyone out there want to buy some second-hand
EPROMs?


References


8-Bit Microcontrollers. Matra Harris MHS, Nantes, Cedex, France, 1989. 
CMOS Data Book. National Semiconductor Corp., Santa Clara, CA.
IC Memory Products 1986. Hitachi Electronic Components Europe GmbH, Munich,
Germany, 1986. 
MOS Memory Data Book. National Semiconductor Corp., Santa Clara, CA, 1984. 
Veary, Trevor. "BITPNET: Build Your Own Inexpensive Parallel Network." BIT
Magazine (March 1983).
Figure 1: Use this DEBUG command to find your printer-port addresses.
C:\> DEBUG
-d 40:8 L 8
0040:0008 BC 03 78 02 00 00 00 00
-q
Figure 2: The emulator core.
Table 1: The PC printer port.
 Port Bits Printer function D-connector pins Emulator function 
 0 0..7 Data output 2..9 Data input
 1 0..7 Control inputs -- Unused
 2 0 Strobe 1 Address clock
 -- 1 Auto linefeed 14 Read/write
 -- 2 Initialize 16 Address reset
 -- 3 Select 17 Enable
Figure 3: The complete emulator.


Listing One 

' EMULOAD.BAS EPROM emulator downloader by David Mockridge Copyright (C) 
' Downloads a file to dedicated EPROM emulator hardware attached to PC
' parallel printer port. Current file addr displayed during download.

' Subroutine declaration
DECLARE SUB GetCursor (pCursorX%, pCursorY%)

'----- Main ------
GOSUB Initialize
GOSUB Download
END

'----- Initialize -------
' PC standard parallel printer port base addresses for Lpt1:
PrinterData% = &H3BC
PrinterRead% = PrinterData% + 1
PrinterCtrl% = PrinterData% + 2
' Control line constants
ResetCounter% = 13 ' Reset on, Card enabled, RAM Read, Clock off
ReadState% = 9 ' Reset off, Card enabled, RAM Read, Clock off
WriteState% = 11 ' Reset off, Card enabled, RAM Write, Clock off
ClockRising% = 8 ' Reset off, Card enabled, RAM Read, Clock on
DisableCard% = 1 ' Reset off, Card disabled, RAM Read, Clock off
' Other constants
AppName$ = "EMULOAD" ' Name of this program as appears on DOS cmd line
ChunkSize% = 64 ' n Bytes after which addr. display is updated
NullString$ = ""
Quote$ = CHR$(34)
GOSUB FindProgNam ' Find end of program name on screen
LOCATE CursorY%, CursorX% ' Park cursor there
FileName$ = COMMAND$ ' Read name of file to download, from command line
IF FileName$ = NullString$ THEN
 ' No file specified so prompt for it
 INPUT "Enter filename to download "; FileName$
 LOCATE CursorY%, CursorX%
 PRINT FileName$; " ";
 CursorX% = CursorX% + 1
ELSE
 CursorX% = CursorX% + LEN(FileName$) + 1
 LOCATE CursorY%, CursorX% ' Park cursor past filename
END IF
RETURN
'---- FindProgNam: Find column where .exe cmd calling this program ends ----
CALL GetCursor(CursorX%, CursorY%)
IF CursorY% > 1 THEN CursorY% = CursorY% - 1 ' Go up to command line
' Build a string with current prompt and command line
CmdLine$ = NullString$
FOR NextCol% = 1 TO 40
 CmdLine$ = CmdLine$ + CHR$(SCREEN(CursorY%, NextCol%))
NEXT NextCol%
CmdLine$ = UCASE$(CmdLine$)
' Now search this string for download command
NameStarts% = INSTR(CmdLine$, AppName$)
IF NameStarts% > 0 THEN
 CursorX% = NameStarts% + LEN(AppName$) + 1
END IF

RETURN
'----- Download: Open file, download to emulator hardware -----
FileByte$ = " " ' Set size of file i/o buffer to one byte
OPEN FileName$ FOR BINARY AS #1 LEN = 1
 FileLen% = LOF(1)
 IF FileLen% = 0 THEN
 PRINT "Could not open file "; Quote$; FileName$; Quote$;
 PRINT " - 0 bytes downloaded.";
 PLAY "T255DCBA" ' Error Fanfare
 END ' End right here
 ELSE
 PRINT "Downloading "; FileLen%; " bytes. ";
 ' Reset address counter
 OUT PrinterCtrl%, ResetCounter%
 PRINT "Address: 0";
 CALL GetCursor(CursorX%, CursorY%)
 CursorX% = CursorX% - 1 ' Backspace past "0"
 Chunk% = 0
 FOR Addr% = 0 TO FileLen% - 1
 ' Update current address display only every ChunkSize% bytes
 Chunk% = Chunk% + 1
 IF Chunk% > ChunkSize% THEN
 Chunk% = 0
 LOCATE CursorY%, CursorX%
 PRINT Addr%;
 END IF
 GET #1, , FileByte$ ' Read from file buffer
 DataByte% = ASC(FileByte$)
 ' Manipulate emulator hardware using control lines
 OUT PrinterData%, DataByte% ' Set up data
 OUT PrinterCtrl%, ReadState% ' Write pulse. (0 = Write)
 OUT PrinterCtrl%, WriteState%
 OUT PrinterCtrl%, ReadState%
 OUT PrinterCtrl%, ClockRising% ' Now advance address counter
 OUT PrinterCtrl%, ReadState%
 NEXT Addr%
 ' Update final addr display if file not a multiple of chunk size
 IF Chunk% <> 0 THEN
 LOCATE CursorY%, CursorX%
 PRINT Addr%;
 END IF
 END IF
CLOSE
' Disable emulator so CPU card can take over the RAM chip
OUT PrinterCtrl%, DisableCard%
PLAY "T255AE" ' Short bleep to say we're done.
PRINT "-Done.-";
END
SUB GetCursor (pCursorX%, pCursorY%)
 pCursorY% = CSRLIN ' Save current cursor row
 pCursorX% = POS(0) ' Save current cursor column
END SUB




Listing Two

' TESTEM1.BAS Program to test emulator hardware on printer port lines

' by David Mockridge Copyright (C). Control line 2 (constant value 4, the
' reset line) is true; all other lines are inverted by the hardware.

' Subroutine declarations
DECLARE SUB ShowPorts (CtrlPort%, DatPort%)
DECLARE SUB GetKbd (KbdKey$)

NullString$ = ""
ControlPort% = 0
DataPort% = 0

' Expected port addresses for Lpt1:
PrinterData% = &H3BC
PrinterRead% = PrinterData% + 1
PrinterControl% = PrinterData% + 2
' Init Control lines
AddrClock% = 1 ' Inverted
ReadWrite% = 2 ' Inverted
ResetAddr% = 4 ' True
Enable% = 8 ' Inverted
' Print menu of line testing options
CLS
StartCol% = 21
StartRow% = 2
LOCATE StartRow%, StartCol%: PRINT "** Printer port line tester **"
LOCATE StartRow% + 3, StartCol%: PRINT "R To raise addr reset line"
LOCATE StartRow% + 4, StartCol%: PRINT "A To clock address counter"
LOCATE StartRow% + 5, StartCol%: PRINT "W To pulse write line"
LOCATE StartRow% + 6, StartCol%: PRINT "D To raise disable line"
FOR LineNo% = StartRow% + 8 TO StartRow% + 15
 LOCATE LineNo%, StartCol%
 BitName$ = CHR$(ASC("0") + LineNo% - 10)
 PRINT BitName$; " To raise line for data bit "; BitName$
NEXT LineNo%
LOCATE StartRow% + 17, StartCol%: PRINT "F To flash all lines x5"
LOCATE StartRow% + 19, StartCol%: PRINT "Q To quit "
CALL GetKbd(KeyHit$) ' Which key was hit?
WHILE KeyHit$ <> "Q"
 SELECT CASE KeyHit$
 CASE "R": ControlPort% = ResetAddr%
 CASE "A": ControlPort% = AddrClock%
 CASE "W": ControlPort% = ReadWrite%
 CASE "D": ControlPort% = Enable%
 CASE "0" TO "7": DataPort% = 2 ^ (ASC(KeyHit$) - ASC("0"))
 CASE "F"
 ' Flash all lines on and off 5 times
 FOR Flash% = 1 TO 5
 OUT PrinterControl%, &HFF
 OUT PrinterData%, &HFF
 CALL ShowPorts(&HFF, &HFF)
 SLEEP 1 ' Delay 1 second
 OUT PrinterControl%, &H0
 OUT PrinterData%, &H0
 CALL ShowPorts(0, 0)
 SLEEP 1
 NEXT Flash%
 END SELECT
 ' Write data to ports
 OUT PrinterControl%, ControlPort%

 OUT PrinterData%, DataPort%
 CALL ShowPorts(ControlPort%, DataPort%)
 CALL GetKbd(KeyHit$)
WEND
END
' Get key hit, after waiting until key hit
SUB GetKbd (KbdKey$)
 KbdKey$ = ""
 WHILE KbdKey$ = ""
 KbdKey$ = INKEY$
 WEND
 KbdKey$ = UCASE$(KbdKey$)
END SUB
' Print current port values
SUB ShowPorts (CtrlPort%, DatPort%)
 LOCATE 23, 7
 PRINT "Current control port value : "; HEX$(CtrlPort%); " "
 LOCATE 23, 38
 PRINT ". Current data port value : "; HEX$(DatPort%); " "
END SUB










































September, 1994
A Print Filter for UNIX


More power for your LaserJet 4M printer




Michael A. Covington and Mark Juric


Michael is an associate research scientist and manages the
artificial-intelligence lab at the University of Georgia. He is the author of
Natural Language Processing for Prolog Programmers (Prentice-Hall, 1994). Mark
is a master's degree candidate and Sun system administrator at the University
of Georgia. His specialties are genetic algorithms and neural networks.
Contact the authors at mcovingt@ai.uga.edu.


Printers are getting smarter. The Hewlett Packard LaserJet 4M, for example,
can print in three modes: PostScript, HP-control code, and plain ASCII. In
this article, we will present lj4m, a UNIX print filter that enhances the
power of this versatile printer and keeps it out of mischief. Among other
things, our program performs the following:
Distinguishes PostScript, HP, and ASCII code, kicking the printer into the
right mode for each. (The printer can supposedly do this itself, but we had
problems with it mistaking Prolog source listings for PostScript code.)
Supplies a Ctrl-D to properly terminate every PostScript job.
Bails out gracefully if a user tries to print an unprintable binary file
(essential in a student lab, where such mistakes are common).
Displays the user's name on the printer control panel (see Figure 1).
Logs every print job in /var/adm/lpd-errs (or a location specified in
/etc/syslog.conf).


Print Filtering


Our lj4m filter was developed under SunOS 4.1.3, but it should work under any
UNIX-like operating system that uses /etc/printcap in the conventional way,
including Linux.
Like any other filter, a print filter copies standard input to standard
output, making appropriate changes along the way. The filter resides in any
convenient directory (we used /etc), has global execute permissions, and is
invoked in the printer's /etc/printcap entry.
Example 1(a) shows the printcap entry for our LaserJet 4M. Decoded, this
means: "The default printer (lp) is connected to /dev/ttyb at 19,200 baud,
with the litout and ixon stty parameters. Suppress formfeeds, suppress
headers, allow unlimited length files, and communicate with /dev/ttyb in
read/write mode. Filter each print job through /etc/lj4m (our program); use
/var/spool/lpd as the spool directory; and use /dev/null as the accounting
file."
Why use /dev/null as an accounting file? Because we want accounting
information to be supplied to the filter (as command-line parameters; more
about this in a moment), but we don't actually need an accounting file. If the
af= line were not there, the filter would not receive accounting information.
Notice that our program is an input filter (if=), not an output filter (of=).
The difference is that an input filter is started afresh for every file sent
to the printer, while an output filter is started only once for a whole series
of jobs and cannot handle the individual jobs separately.
For printing through a network, the input filter resides and runs only on the
print server. The other machines send their print jobs to the server through
the rp= printcap field and do not need if= fields.
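A client entry can therefore be much shorter than the server's; a minimal sketch (the server name printhost is hypothetical):

```
lp|LaserJet 4M on printhost:\
        :lp=:rm=printhost:rp=lp:\
        :sd=/var/spool/lpd:mx#0:
```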


What Kind of File am I?


Listing One is the C source code for the lj4m filter. The heart of the program
is main(), which classifies each job by looking at its first two characters:
%! for PostScript, Esc-E or Esc-% for Hewlett Packard code. Anything else is
presumed to be plain ASCII. PostScript and HP code are just copied to the
printer, preceded and followed by appropriate printer commands. 
ASCII jobs are more complicated to handle, for two reasons. First, ASCII lines
must be truncated at column 80 to keep them from wrapping around and throwing
page headers out of sync with the pages. Consider what happens when a user
pipes a file to lpr -p. The -p option generates a heading every 66 lines. If
any line wraps around to the next, there will not be 66 lines on the page, and
the headers will start shifting. This is not an arcane situation; it arises
whenever anyone prints e-mail or netnews, because the headers almost always
contain long lines.
We chose, therefore, to truncate at column 80 and underline the last character
of each line that was cut off. In counting columns, our program recognizes
that Ctrl-H is a backspace and that Return, New Line, and Form Feed return the
print position to column 1. Users can still print long lines by using fold to
wrap them before the file gets divided into pages: fold < myfile | lpr -p.
Second, any putative ASCII file may turn out to contain unprintable data. The
filter uses a table of printable characters stored in an array. The printable
characters are considered to be codes 32 through 126, plus Ctrl-D (UNIX
end-of-file),
Ctrl-G (bell), Ctrl-H (backspace), Ctrl-I (tab), Ctrl-J (new line), Ctrl-L
(form feed), Ctrl-M (return), and Ctrl-Z (MS-DOS end-of-file mark, often
present in uploaded text files). This table can, of course, be altered to fit
local requirements.
When it hits an unprintable character, the filter resets the printer, prints a
message, dumps 30 lines of data in (hopefully) readable form, and terminates
the job; see Figure 2.


Printer Control


The printer-control codes used by the filter are #defined at the beginning of
the program. Several are familiar LaserJet PCL code sequences that begin with
Escape (octal 033): Esc E to reset the printer, Esc & k 2 G to tell the
printer to insert returns before all line feeds, and similar-looking codes to
select an appropriate font and margins for 80x66 ASCII printing.
However, to select or deselect PostScript and manipulate the display on the
printer console, it is necessary to use HP's higher-level Printer Job Language
(PJL). Sequences of PJL commands are introduced and terminated by the string
Esc%-12345X (no spaces between characters), presumably chosen because it is
unlikely to appear in a print job. The first command in a sequence must be
@PJL; other commands used in this program are listed in Example 1(b). The last
of these, of course, displays a string on the printer console.


Identifying the User


The print filter receives command-line arguments from the system. For example,
if the print filter is named lj4m, it executes as if invoked by a command like
this: lj4m -w132 -l66 -i0 -n username -h hostname /dev/null.
The first three arguments give nominal values for page width, page length, and
extra indentation; our filter, like most, ignores them. The arguments we use
are the fifth and seventh, which identify the user and the host; the program
puts them together into a string, hostname:username, which is used in the log
and on the printer display. If the printcap entry does not contain an af=
field, the filter does not receive these arguments, and it displays "Unknown
username" instead.



Writing in the Log 


We chose to log all print jobs in the LPD error log rather than in a separate
accounting file. Accordingly, our program outputs log entries using syslog()
rather than ordinary file-output routines. Actually, three function calls are
involved. First, openlog("LaserJet 4M",0,LOG_LPR); specifies that the program
will be writing on the printer log and that each message will be preceded by
"LaserJet 4M:" (complete with the UNIX-supplied colon). Next, calls of the
form syslog(LOG_DEBUG,"format string",arg,arg,...); actually write the log
entries. The first argument specifies the priority of the message; the
priorities we use are LOG_DEBUG for normal events and LOG_ERR for errors. The
format string and subsequent arguments work just like those of printf().
Finally, closelog(); closes the log.
Where do the messages go? Wherever /etc/syslog.conf says they should. For
example, our /etc/syslog.conf contains, among other things, the code in
Example 1(c). ERR messages from any process get written on the console and on
/var/adm/messages; DEBUG messages from the printer daemon (including those
from our filter) get written on /var/adm/lpd-errs. 


Further Possibilities


A print filter like this can easily be extended. One obvious possibility is to
extract the %%Pages: comments in PostScript files and thereby log the number
of pages that each job claims to have. Another possibility is to perform
additional diagnosis on every unprintable file: Identify it (perhaps even
using "file", or at least /etc/magic) and print a more meaningful message for
the user.
Unlike ordinary filters, print filters are allowed to lseek() (reposition)
their input files under some flavors of UNIX; this raises the possibility of a
two-pass filter. For example, a filter could read a whole ASCII file,
determine the maximum line length, and set the typeface and margins
accordingly.
We encourage you to customize this filter rather than install it unaltered.
Every printer and every printing situation has its own needs; an intelligent
print filter can go a long way toward giving users what they want in every
situation, even if they don't explicitly ask for it.
Figure 1: PJL commands make the printer display the user's name and machine.
Example 1: (a) Printcap entry for LaserJet 4M; (b) sample commands used in the
lj4m program; (c) logging messages.
(a)
lp|LaserJet 4M:\
 :lp=/dev/ttyb:\
 :br#19200:ms=litout,ixon:\
 :sf:sh:mx#0:rw:\
 :if=/etc/lj4m:\
 :sd=/var/spool/lpd:\
 :af=/dev/null:

(b)
@PJL ENTER LANGUAGE = POSTSCRIPT
@PJL ENTER LANGUAGE = PCL
@PJL RDYMSG DISPLAY = "your string here"

(c)
*.err /dev/console
*.err /var/adm/messages
lpr.debug /var/adm/lpd-errs
Figure 2: Sample dump produced when a file is found to be unprintable.
aisun1:mcovingt
Unprintable data! Partial dump follows...
..... ... ........... ......... ....@...D.* .......@......"........b.........
...@.......@.......@....#. @......@..........h/......(......".@..K.. .......\@
..K.. .. ...O.......^..a....K......`...d...h......../......H.. ... .. ......
...@..4.. ............d.. .. .........@..+..../......t......".@..#.. .......h.
......./......H.......... .. .........@..... ....p.....#..#...#.......b..#...
.....`..#........`d.#.... ...... .............. ....... G...... ... .........
......... .......... ... ... ... ... ...@./usr/lib/ld.so./dev/zero.......crt0:
 no /usr/lib/ld.so.......&crt0: /usr/lib/ld.so mapping failure........crt0: no
/dev/zero......%d %d.....@..Z...... ..'........ c... ...... ..'....... c.....
................. ..............b.........@..e..........`......'..............
.`......'............................... ........... ...`... ....... ..`... ..
.`...@... ...................... ..*`...............?.2.......................
. ...`....... .......... ....... ...............#@@..*.. ..................`..
. ....... ......"`..............@l..........@x..........@...........@.........
.@........L...................................................................
..............................................................................
.............'H.........@...........@........... ......P........%..........,.
....."....2..d<..@...;..M3..@....E...>.......K../...$....S..3.......[..b1..#@.
..n..W...@....}.....@.......%...@.......b!..#.......S...@.......e>..._etext._e
data._end.start.start_float.__exit._main._environ._DYNAMIC._exit.__main._print
f.___do_global_dtors.__DTOR_LIST__.__exit_dummy_ref.__exit_dummy_decl.__do_glo
bal_ctors.___CTOR_LIST_._on_exit...../usr/local/lib/gcc-lib/sparc-sun4-sunos4.
1.3/2.5.2:/usr/local/lib......... .............."............c.dl.............

..............................................................................
..............................................................................
..............................................................................
..............................................................................
..............................................................................
..............................................................................
..............................................................................

Listing One 

/* lj4m.c -- Michael A. Covington and Mark L. Juric, 1994
 Pre-spooler filter for LaserJet 4M. Compile with gcc or ANSI C.
 Install with the "if=" (not "of=") option in /etc/printcap.
 Print jobs are logged as lpr.debug messages.
*/

#include <syslog.h>
#include <stdio.h>
#include <string.h>

#define CTRLD "\004"
#define LINELENGTH 80

/*** HP LaserJet control sequences ***/
#define RESET "\033E"
#define LF_TO_CRLF "\033&k2G"
#define LMARGIN "\033&a11L"
#define FONT "\033(s0p12h4b4099T"
#define START_PJL "\033%-12345X@PJL\n"
#define END_PJL "\033%-12345X"
#define POSTSCRIPT "@PJL ENTER LANGUAGE = POSTSCRIPT\n"
#define PCL "@PJL ENTER LANGUAGE = PCL\n"


/*** Global variables ***/
int c0, c; /* first 2 chars of file */
long int bytes; /* character count */
char userinfo[64] = ""; /* will be "machine:username" */

/*** Printer console display functions ***/
void DisplayOnPrinterConsole(char *s)
{
 fprintf(stdout,"%s%s%s%s%s%s%s",
 RESET,
 START_PJL,
 "@PJL RDYMSG DISPLAY = \"",
 s,
 "\"\n",
 END_PJL,
 RESET);
}
void ResetPrinterConsole() /* to display "00 READY" */
{
 fprintf(stdout,"%s%s%s%s%s",
 RESET,
 START_PJL,
 "@PJL RDYMSG DISPLAY = \"\"\n",
 END_PJL,
 RESET);

}

/*** PostScript and HP file handling ***/
void PrintPostScriptFile()
{
 /* Choose language */
 fprintf(stdout,"%s%s%s",RESET,START_PJL,POSTSCRIPT);
 /* Transmit file transparently */
 putc(c0,stdout);
 for (bytes=1; !feof(stdin); bytes++)
 {
 putc(c,stdout);
 c = getc(stdin);
 }
 /* Add newline, Ctrl-D, and reset at end */
 fprintf(stdout,"\n%s%s%s",CTRLD,END_PJL,RESET);
 /* Log results */
 syslog(LOG_DEBUG,"%s, PostScript file, %ld bytes",userinfo,bytes);
}
void PrintHPFile()
{
 /* Choose language */
 fprintf(stdout,"%s%s%s",RESET,START_PJL,PCL);
 /* Transmit file transparently */
 putc(c0,stdout);
 for (bytes=1; !feof(stdin); bytes++)
 {
 putc(c,stdout);
 c = getc(stdin);
 }
 /* Reset printer at end */
 fprintf(stdout,"%s%s",END_PJL,RESET);
 /* Log results */
 syslog(LOG_DEBUG,"%s, HP file, %ld bytes",userinfo,bytes);
}

/*** ASCII and unprintable file handling ***/
#define PRINTABLE(c) (printable[(unsigned char) c])
char printable[256] =
 /* Table of which ASCII codes are printable characters */
 { 0,0,0,0,1,0,0,1,1,1,1,0,1,1,0,0, /* NUL to ^O */
 0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0, /* ^P to 31 */
 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 32 to 47 */
 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 48 to 63 */
 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 64 to 79 */
 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 80 to 95 */
 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 96 to 111 */
 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0, /* 112 to 127 */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 128 to 143 */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 144 to 159 */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 160 to 175 */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 176 to 191 */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 192 to 207 */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 208 to 223 */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 224 to 239 */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 }; /* 240 to 255 */
 /* -1 (eof) maps onto 255 */
void RejectFileAsUnprintable()
{

 /* Set up printer again because it may have been disrupted */
 fprintf(stdout,"\n%s%s%s%s%s%s",
 RESET,
 START_PJL,
 PCL,
 LF_TO_CRLF,

 FONT,
 LMARGIN);
 /* Dump 30 lines, printing '.' for unprintable characters */
 fprintf(stdout, "%s\nUnprintable data! Partial dump follows...\n", userinfo);
 for (bytes=1; bytes<LINELENGTH*30+1; bytes++)
 {
 if (feof(stdin)) break;
 if (c<32 || c>126) c = '.';
 fputc(c,stdout);
 if (bytes % LINELENGTH == 0) fputc('\n',stdout);
 c = getc(stdin);
 }
 /* Log the error */
 syslog(LOG_ERR, "LJ4M: %s: tried to print unprintable data\n",userinfo);
}
void PrintASCIIFile()
{
 int position = 1; /* where the next char on the line will be */
 int ok_to_print; /* true if no bad chars found */
 /* Set up printer */
 fprintf(stdout,"%s%s%s%s%s%s",
 RESET,
 START_PJL,
 PCL,
 LF_TO_CRLF,
 FONT,
 LMARGIN);
 /* Deal with the first character already read */
 ok_to_print = PRINTABLE(c0) && PRINTABLE(c);
 if (ok_to_print && (c0 != EOF)) putc(c0,stdout);
 /* Process rest of file, breaking at column 80 and
 underlining last character if line continues beyond */
 for(bytes=1; ok_to_print && !feof(stdin); bytes++)
 {
 if (c==4 || c==26) break; /* Skip UNIX or DOS EOF mark */
 /* Compute where c will print */
 if (c==10 || c==12 || c==13) position=0; /* CR, FF, or LF */
 else if (c==8 && position>0) position--; /* Backspace */
 else position++;
 /* If in a printable column, print it */

 if (position <= LINELENGTH) fputc(c,stdout);
 /* If we have just run past margin, underline last character */
 if (position == LINELENGTH+1) fputs("\b_",stdout);
 /* Obtain and check next character */
 c = getc(stdin);
 ok_to_print = PRINTABLE(c);
 }
 /* If a bad byte was found, print messages and dump */
 if(!ok_to_print) RejectFileAsUnprintable();
 /* Reset printer at end */
 fprintf(stdout,"%s%s",END_PJL,RESET);

 /* If normal termination, report results */
 if (ok_to_print)
 syslog(LOG_DEBUG,"%s, ASCII file, %ld bytes",userinfo,bytes);
}

/*** Main program ***/
main(int argc, char* argv[])
{
 /* Obtain machine name and user name from cmd line args */
 if (argc>8)
 sprintf(userinfo,"%s:%s",argv[7],argv[5]);
 else
 strcpy(userinfo,"Unknown username");
 DisplayOnPrinterConsole(userinfo);
 openlog("LaserJet 4M",0,LOG_LPR);
 /* Examine first 2 bytes, decide how to handle file */
 c0 = getc(stdin);
 c = getc(stdin);
 if (c0=='%' && c=='!')
 PrintPostScriptFile();
 else if (c0==27 && (c=='E' || c=='%'))
 PrintHPFile();
 else
 PrintASCIIFile();
 /* Clean up */
 ResetPrinterConsole();
 closelog();
 return(0); /* UNIX insists on return code 0. */
}

































September, 1994
Examining OS/2 2.1 Executable File Formats


An inside look at 32-bit LX-style executables




John Rodley


John is president of AJR Co., located in Cambridge, MA and can be contacted on
CompuServe at 72607,3142 or at john.rodley@channel1.com.


Executable files are the end result of a massive collaboration of makefiles,
source files, include files, compiler flags, linker flags, environment
variables, definition files, resource files, and even source-control flags.
Get something wrong in one of them, and even the prettiest algorithm morphs
into a UAE. 
Windows programmers who've forgotten to export a dialog window procedure, for
instance, know all about this--the dialog comes up, runs (sort of), then
crashes. You know you have to link with the DLL version of the run-time
library (LLIBCDLL.DLL), but you have this nagging suspicion you might still be
catching the static run-time library. What you need is a tool that looks into
the executable, telling you exactly what's going on inside by first dumping
the Resident and Non-Resident Name Tables where the offending window procedure
should appear, then the Imported Name Table where LLIBCDLL.DLL should appear.
SHOWEXE.C does just this. The original version of SHOWEXE, which exploded
NE-style executables, was written by David Schmitt (PC Tech Journal, November,
1988). The updated version I present here does the same for 32-bit,
flat-memory-model, LX-style executables. Although LX is documented in the IBM
OS/2 32-bit Object Module Format and Linear Executable Module Format
(available on CompuServe, type GO OS2SUP, library 17, OMF.ZIP), I've yet to
run across any NE documentation. Consequently, I relied on Schmitt's article,
my debugger, and lots of experimentation to write this update of SHOWEXE. 


Headers


An NE or LX .EXE file always contains the old DOS 2.1 MZ .EXE and the new NE
or LX .EXE. The MZ .EXE (so called for the two ASCII bytes at offset 0 in the
file) contains the DOS stub program that prints the message "This program
requires Microsoft Windows." The MZ header contains a pointer to the NE or LX
header (see lfanew in Listing One) that is the file offset of the new style
.EXE header. Listing One shows a simplified MZ header structure that gets you
the new .EXE file offset. Between the MZ header and file offset lfanew, there
may reside an actual DOS program of variable size. 
The ASCII chars at lfanew contain the executable format specifier: NE, LE, LX,
or PE (for Windows NT). NE indicates a 16-bit, segmented Windows or OS/2 .EXE;
LX a 32-bit, flat-model OS/2 2.1 .EXE. Windows 3.1 is made up almost entirely
of NE executables, while OS/2 2.1 contains a mix of NE and LX executables.
Table 1 shows the file types of some of the files delivered with Windows 3.1
and OS/2 2.1. SMARTDRV and EMM386 are the only LE executables I have found in
either of these systems.
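The header walk just described can be sketched in a few lines of C. This is a stand-alone illustration, not SHOWEXE itself: the buffer-based helper classify_exe() and its return strings are hypothetical, and it assumes the whole file fits in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Classify a "new-style" executable. lfanew, the file offset of the new
   header, is a little-endian 32-bit value at offset 0x3C in the MZ header;
   the two ASCII bytes it points at name the format: NE, LE, LX, or PE. */
const char *classify_exe(const unsigned char *file, size_t len)
{
    uint32_t lfanew;
    if (len < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return "not MZ";
    lfanew = (uint32_t)file[0x3C]         | ((uint32_t)file[0x3D] << 8) |
             ((uint32_t)file[0x3E] << 16) | ((uint32_t)file[0x3F] << 24);
    if (lfanew + 2 > len)
        return "plain DOS";               /* no new header present */
    if (file[lfanew] == 'N' && file[lfanew + 1] == 'E') return "NE";
    if (file[lfanew] == 'L' && file[lfanew + 1] == 'E') return "LE";
    if (file[lfanew] == 'L' && file[lfanew + 1] == 'X') return "LX";
    if (file[lfanew] == 'P' && file[lfanew + 1] == 'E') return "PE";
    return "unknown";
}
```

Running this over the files in Table 1 would reproduce the Format column.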
A look at the header flags shows that there are several moves in the direction
of cross-processor portability in LX, the most significant being the Byte
Order and Word Order specifiers. Occurring directly after the Executable Type
specifier (to minimize the amount of wrong-endian processing any loader might
have to do), these allow either Big- or Little-endian byte and word orders
over the entire rest of the executable file. Processor Type and OS Type
specifiers also now each get 16 bits of their own, where NE relegated them to
a couple of bits each in the Flags word. Interestingly, the Memory Page Size,
fixed at 4K on Intel x86 CPUs, is also parameterized in the header, presumably
to vary over different hardware platforms.
NE and LX take different approaches to preserving executable integrity. NE
allowed space for a 32-bit file checksum in the header. This was intended as a
layer of protection against viruses, to be checked off-line via a separate
virus checker. LX takes a finer-grained, load-time approach to executable
integrity. There are individual checksums for each 4K page as well as the
Fixup Section, the Non-Resident Name Table, and the Loader Section, which
includes all tables except the Non-Resident Name Table. In sum, these cover
the entire file; individually, they allow the loader to do much less expensive
checksums against small pieces of the file as they're loaded, rather than
doing the whole file at once as the NE checksum requires.
Initial values in both models are much what you'd expect: code segment/offset,
data segment/offset, stack size, heap size, and so on. They change from 16-bit
values in NE to 32-bit values in LX, but their intent remains the same.


Segments, Pages, and Objects


The real difference between OS/2 1.x and OS/2 2.x is the shift from the 16-bit
segmented-memory model to the 32-bit, flat-memory model. This shift is best
reflected in the NE Segment Table and its LX analogs, the Object and Object
Page Tables. 
Listing Two has two global arrays of 32K integers, with a single reference to
each array. When compiled and linked as an NE .EXE using Microsoft C 6.x, the
linker produces one code segment from the example program, and three data
segments: a distinct segment for each array (segments 3 and 4) and one for the
Auto-DS Segment (segment 5). The same program linked 32-bit under Borland C++
for OS/2 produces the list of objects and pages shown in Table 5. As with NE,
you get only one code object that has one page, but the two large arrays
produce only one data object made up of 64 zero-filled, 4K pages (32,767
ints*4 bytes per int*2 arrays=262,136 bytes; 262,136 bytes/4096 bytes per
page=64 pages). Thus, LX replaces the NE Segment Table with the Object and
Object Page Tables. 
An NE Segment Table entry contains a set of flags (READ/WRITE/EXECUTE, and so
on), the file location of the Segment image, the file size of the segment, and
its size in memory. An LX Object Table entry looks much the same, containing
the object's size and attributes (read/write/execute, and so on), a count of
pages that make
up the object, and a pointer into the Object Page Table where this object's
first page is described (and the rest are stored consecutively). Listing One
shows the structures of NE's Segment Table and LX's Object and Object Page
Table entries.
For these objects/segments to make a coherent program, the code object/segment
has to be able to access addresses in the other objects/segments.
Relocation--the process of connecting references within a lump of executable
code to things outside that lump--consists of the target, source, and Fixup
Record. The target is the place in the code where a symbolic reference must be
replaced by a real address (the target of the relocation process), the source
is where the real address can be found, and the Fixup Record links the target
to the source. The Fixup Record Table is a list of Fixup Records for a
particular segment/object.
External fixups are references that resolve to something outside this
executable, typically DLL calls. Internal fixups are references to objects
within this executable, typically references to the data segment. To discuss
32- versus 16-bit executables, I'll first examine internal fixups. 
NE and LX structure the relocation data differently. NE attaches a separate
Fixup Record Table to each segment that needs one. If a segment contains
targets, NE physically appends a Fixup Record Table to the end of that segment
and sets the Relocations Available bit in the segment description in the
Segment Table. The fixup records themselves contain the offset of the target
and segment number and offset of the source.
LX linkers place a single Fixup Record Table in the header. Each page in the
Object Page Table contains an index into the Fixup Record Table pointing to
the first fixup record for that page. The linker stores all the Fixup Records
for that page consecutively, right up to the first fixup record for the next
page. The fixup record contains the offset of the target and the object number
of the source. The target itself contains only the offset of the
source. When loading a particular page, the loader runs through the Fixup
Record Table, reading the attributes of the target and source, the
source-object number, and the location of the target from the fixup record.
Then it gets the actual offset of the source within the source object from the
contents of the target.
While their fixup records are roughly equivalent, NE requires one record for
each source, while LX uses the more obvious one-per-target strategy. Listing
Three has three large arrays and two references to each array. NE produces
three fixup records, while LX produces six; see Table 3. Obviously, NE is more
load-time efficient, while LX is more run-time efficient. NE also allows for
chaining of targets.
External fixups refer to objects located physically outside the executable,
most commonly DLL calls. Both NE and LX executables support import-by-ordinal
and import-by-name external fixups. The linker generates import-by-name
external fixups for called functions defined explicitly in the IMPORTS section
of the .DEF file. It generates import-by-ordinal external fixups for called
functions defined in import libraries, a much more common technique. The
benefit of import-by-name is that you don't need the import library to link
the executable, but this technique is very rare.
With import-by-name DLL calls, the linker stores the names of both the
external DLL and the function within the DLL in the .EXE itself. In LX, the
target field in the fixup record contains indexes into the Import Module Name
Table and the Import Procedure Name Table. This gives you the name of both the
DLL and the function within the DLL. In NE, the target field contains two
indexes into the Imported Name Table that do the same job.
With an import-by-ordinal reference, the linker specifies the DLL name and the
function's ordinal within that DLL. The fixup-record target field contains an
index into the Import Module Name Table (Imported Name Table in NE). However,
instead of an index into the Import Procedure Name Table to get the function
name, it contains the function's ordinal number within the external DLL. Thus,
to get the function name to go with the ordinal, you have to run something
like SHOWEXE on the external DLL and find the ordinal in either the Resident
or Non-Resident Name Tables. 
While the fixup records and sources for external fixups are much the same in
NE and LX, the targets can be very different. In NE, targets within the same
segment that resolve to the same source are chained together so that the
loader has only to read one fixup record. Listing Four makes three calls to
DosSleep(). You might expect this to generate three fixup records, but you
only get one. Listing Five is a disassembled Listing Four in which the
contents of the three target calls to DosSleep are 0000:001B, 0000:0024, and
0000:FFFF. There's the chain: The fixup record points to the target at offset
0012, which points to the target at offset 001B, which points to the target at
offset 0024,
which signals the end of the chain with FFFF. At load-time, the loader gets a
real address for the source, then marches up the target chain, replacing the
chain links with the real address. (You have to look at the disk file to see
the target chain. CodeView will replace the target-chain links with a real
address.) In LX, there is no support for target chaining. The Fixup Record
Table for the LX version of Listing Four is shown in Table 4. As you can see,
the linker produces a separate fixup record for each call to DosSleep().
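The chain walk in Listing Five can be sketched as follows. This is a hypothetical, buffer-based illustration of the mechanism, not actual loader code: the function name and the simplified 16:16 patching are assumptions.

```c
/* Walk an NE target chain, as an NE loader would at load time. Each
   4-byte far-call operand in the segment image holds the offset of the
   next chain link in its low word; 0xFFFF ends the chain. Every link is
   overwritten with the resolved segment:offset of the source. */
void ne_patch_target_chain(unsigned char *seg, unsigned short first,
                           unsigned short src_seg, unsigned short src_ofs)
{
    unsigned short ofs = first;
    while (ofs != 0xFFFF) {
        /* read the next link before destroying it */
        unsigned short next = seg[ofs] | (seg[ofs + 1] << 8);
        seg[ofs]     = src_ofs & 0xFF;       /* offset word, little-endian */
        seg[ofs + 1] = src_ofs >> 8;
        seg[ofs + 2] = src_seg & 0xFF;       /* segment word */
        seg[ofs + 3] = src_seg >> 8;
        ofs = next;
    }
}
```

For the chain in Listing Five, a single call starting at offset 0012 patches all three DosSleep() call sites.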


Reading the Tables


In either format, the linker places the "file offsets" of all the tables
(except the Fixup Record Table in NE) in the header, but you have to be
careful with these offsets. Though most of them are relative to the beginning
of the new header, some are relative to other offsets or to the beginning of
the file. Some of the tables have corresponding element counts within the
header, but some are terminated by a special character, and others are only
terminated by reaching the file offset of the next file section. And just to
make it interesting, some of the tables, such as the Entry and Fixup Record
Tables, contain entries which are structures made up of members of variable
size (8, 16, and 32 bit). Table 2 lists the tables for LX, where they're
located, what type of elements they contain, and how they're terminated.
The structures of the fixed-size entries are shown in Listing One. To read
them, keep reading the fixed-size struct until you hit the terminator. The
only table you have to dig for is the NE Fixup Record Table. The NE linker
appends Fixup Record Tables to the segments to which they belong. Loading them
requires finding the segment through the offset in the Segment Table record,
adding the segment size, reading the two-byte relocation count at that offset,
and then reading that many fixup records. 
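That digging procedure can be sketched like this; ne_fixup_record_count() is a hypothetical helper that assumes the whole .EXE image has been read into memory.

```c
/* NE appends a segment's Fixup Record Table directly after the segment
   image: first a 2-byte record count, then that many fixup records.
   ns_sector and ns_cbseg come from the Segment Table entry; align is the
   sector-alignment shift from the NE header. */
unsigned ne_fixup_record_count(const unsigned char *file,
                               unsigned short ns_sector,
                               unsigned short ns_cbseg,
                               unsigned short align)
{
    /* segment file offset = sector index shifted by alignment */
    unsigned long tbl = ((unsigned long)ns_sector << align) + ns_cbseg;
    return file[tbl] | (file[tbl + 1] << 8);   /* little-endian count */
}
```

The fixup records themselves follow the count and are read with the NEW_REL structure from Listing One.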
Under NE, the "Name" table entries (Imported, Resident, and Non-Resident) all
contain a Pascal-style string (one-byte length followed by length bytes,
non-null-terminated string). Resident and Non-Resident Name Table entries also
add a 2-byte ordinal to the end. Under LX, the Name table-entry structures
(Import Module, Import Procedure, Resident and Non-Resident) are identical
except that they expand the string length to two bytes. 
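A decoder for one such NE name-table entry might look like this sketch; the helper name and caller-supplied buffer convention are assumptions, and under LX only the 1-byte length would grow to 2 bytes.

```c
#include <string.h>
#include <stddef.h>

/* Decode one NE Resident/Non-Resident Name Table entry: a 1-byte length,
   'length' characters with no NUL terminator, then a 2-byte ordinal.
   Returns the bytes consumed so the caller can step to the next entry;
   a length byte of 0 terminates the table. */
size_t ne_read_name_entry(const unsigned char *p, char *name,
                          unsigned short *ordinal)
{
    unsigned char len = p[0];
    memcpy(name, p + 1, len);
    name[len] = '\0';                           /* make it C-friendly */
    *ordinal = p[1 + len] | (p[2 + len] << 8);  /* little-endian */
    return (size_t)1 + len + 2;
}
```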
The Entry Table in NE and the Entry and Fixup Record Tables in LX contain
more-complex, variable-sized entries. All these tables have to be read as byte
streams. Entry Table entries are bundled, with the first two bytes being the
count of entries in the bundle and their type, followed by count entries of a
single format. So, you hit the bundle, figure out the count and the
encapsulated format, and read count encapsulated entries. Figure 1 shows the
format for LX Entry Table entries.
Unlike NE, the LX Fixup Record Table is part of the file header. The fixup
records themselves can take one of four formats, and within each of these
formats, two or three of the fields have a variable size. The fixup-record
types you'll run into most often are 16-bit Internal Fixups and
Import-by-Ordinal External Fixups. Figure 2 is the format of Fixup Records for
LX.


Physical Objects



The only other types of data in the EXE are Debugging Info and Segments/Pages.
Both NE and LX punt on Debugging Info, allowing the linker to reserve a
section of the executable for proprietary-format debugging data. LX does,
however, assist the debugging process by adding the file offset and length of a
linker/debugger-specific Debugging Info Section to the header. 
In NE, the linker places the file offsets of all the code and data segments in
the Segment Table. The actual segments are located on sector boundaries and
are found by shifting the sector index in the Segment Table entry (ns_sector
in the SEG struct) left by the alignment (align from NE struct). Get the
offset and read ns_cbseg bytes, and you have the actual segment.
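As a sketch of that shift (the function name is hypothetical):

```c
/* File offset of an NE segment image: the Segment Table sector index
   shifted left by the alignment value from the NE header. */
unsigned long ne_segment_file_offset(unsigned short ns_sector,
                                     unsigned short align)
{
    return (unsigned long)ns_sector << align;   /* sector * 2^align bytes */
}
```

With 512-byte sectors (align = 9), a segment at sector 4 begins at file offset 2048, and reading ns_cbseg bytes from there yields the image.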
In LX, the linker puts the offset of the page within the Data or Iterated Page
Section in the page's Object Page Table entry, while placing the sizes and
file offsets of the actual Data and Iterated Page Sections in the header. You
locate the physical pages within the file by reading the page-data offset from
the Object Page Table, shifting it left by the Page Offset Shift and adding it
to either the Data or Iterated Data Pages Offset (depending on flags in the
OBJPG struct). All pages are physically 4K long with any sub-4K pages
zero-filled to reach the required size. Remember that both formats support
zero-filled pages/segments that exist as entries in the Object Page or Segment
Tables but don't have physical images in the executable.
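A sketch of the LX page calculation, using hypothetical parameter names that mirror the header and OBJPG fields:

```c
/* Subset of the Object Page Table flag values, as used here. */
enum { PG_PHYSICAL = 0, PG_ITERATED = 1 };

/* File offset of an LX page: the page-data offset from the Object Page
   Table entry, shifted left by the Page Offset Shift from the header,
   added to the Data or Iterated Data Pages Offset depending on flags. */
unsigned long lx_page_file_offset(unsigned long pg_offset,
                                  unsigned short pg_flags,
                                  unsigned long pg_ofs_shift,
                                  unsigned long data_pgs_ofs,
                                  unsigned long iter_pgs_ofs)
{
    unsigned long base = (pg_flags == PG_ITERATED) ? iter_pgs_ofs
                                                   : data_pgs_ofs;
    return (pg_offset << pg_ofs_shift) + base;
}
```

Zero-filled pages (flag value 3 in Table 5) are skipped entirely; they have no physical image to locate.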


Conclusion


The days when the loader simply copied all the bytes into memory and jumped to
the entry point are long gone. With OOP, GUIs, internationalization, and the
drive toward portability, EXE file formats have become more interesting. The
loader and operating system now encompass functionality that would previously
have been written into the source--if written at all. To master this
functionality, you need to understand the reaction of the loader and the OS to
particular facets of the executable, and be able to see clearly all the parts
of that executable. 


Acknowledgments


Special thanks to Michael Roth at IBM Austin for his help with this article.
Table 1: (a) Windows 3.1 programs and their .EXE formats; (b) OS/2 2.1
programs and their .EXE formats.
Format Name Description 
(a) NE SOL.EXE Solitaire game
 LE EMM386.EXE Extended-memory manager
 NE PRINTMAN.EXE Print manager
 NE PROGMAN.EXE Program-manager shell
 LE SMARTDRV.EXE SmartDrive disk cache
 NE GDI.EXE Graphical-device interface API
 NE RECORDER.DLL Windows recorder
 NE VBRUN300.DLL Visual Basic run-time DLL

(b) LX PMSHELL.EXE Presentation Manager shell
 LX CMD.EXE Command-line shell
 NE LINK386.EXE Linker
 NE RC.EXE Resource compiler
 LX DOSCALL1.DLL System-call DLL
 LX OS2KRNL OS/2 2.1 kernel
Table 2: LX tables, their file locations and terminations. All symbolic
references are to the LX .EXE header structure of Listing One except lfanew,
which is a member of the MZ header.
Table Name          File Location           Termination                 Entry Size
Object              lfanew+ObjTblOfs        Item count NumObjs          Fixed
Object Page         lfanew+ObjPgTblOfs      Item count*                 Fixed
Resource            lfanew+RscTblOfs        Item count NumRscEntries    Fixed
Resident Name       lfanew+ResTblOfs        Null terminator             Variable/string
Non-Resident Name   NResTblOfs              Byte count NResNmTblLen     Variable/string
Entry               lfanew+EntryTblOfs      Null terminator             Variable/bundled
Module-format       lfanew+ModFmtTblOfs     Item count NumModEntries    Fixed
  Directive
Fixup Page          lfanew+FixupPgTblOfs    See Object Page Table       Fixed
Fixup Record        lfanew+FixupRecTblOfs   File offset <               Variable
                                              lfanew+ImpModTblOfs
Import Module Name  lfanew+ImpModTblOfs     Item count ImpModEntries    Variable/string
Import Procedure    lfanew+ImpProcTblOfs    File offset < DataPgOfs     Variable/string
  Name
Table 3: LX relocation list for Listing Three.
 Type      Target  Source
 Internal  1.0054  2.000401ec
 Internal  1.004a  2.000401e8
 Internal  1.0040  2.000201f0
 Internal  1.0036  2.000201ec
 Internal  1.002c  2.01f4
 Internal  1.0022  2.01f0
Table 4: LX relocation list for Listing Four.
 Type Target Source 
 Import by ordinal 1.37 DOSCALLS.229
 Import by ordinal 1.2d DOSCALLS.229
 Import by ordinal 1.23 DOSCALLS.229
Table 5: LX Object and Object Page Tables for Listing Two.
Object and Object Page Tables
 Object 1, Size 1598, Addr 0, Flags 2005, PgTableInd 1, NumPgs 1, Rsv0
 READ, EXECUTE, 32-BIT,
 PAGES: Number Offset Size Flags
 1 0 (0x0) 2048 Legal Physical Page - 0000
 Object 2, Size 262808, Addr 0, Flags 2003, PgTableInd 2, NumPgs 65, Rsv0
 READ, WRITE, 32-BIT,
 PAGES: Number Offset Size Flags
 2 4 (0x4) 512 Legal Physical Page - 0000
 3 0 (0x0) 0 Zero Filled Page - 0003
 4 0 (0x0) 0 Zero Filled Page - 0003
 5 0 (0x0) 0 Zero Filled Page - 0003
 6 0 (0x0) 0 Zero Filled Page - 0003
 7 0 (0x0) 0 Zero Filled Page - 0003
 8 0 (0x0) 0 Zero Filled Page - 0003
 9 0 (0x0) 0 Zero Filled Page - 0003
 10 0 (0x0) 0 Zero Filled Page - 0003
 11 0 (0x0) 0 Zero Filled Page - 0003
 12 0 (0x0) 0 Zero Filled Page - 0003
 13 0 (0x0) 0 Zero Filled Page - 0003
 14 0 (0x0) 0 Zero Filled Page - 0003
 15 0 (0x0) 0 Zero Filled Page - 0003
 16 0 (0x0) 0 Zero Filled Page - 0003
 17 0 (0x0) 0 Zero Filled Page - 0003
 18 0 (0x0) 0 Zero Filled Page - 0003
 19 0 (0x0) 0 Zero Filled Page - 0003
 20 0 (0x0) 0 Zero Filled Page - 0003
 21 0 (0x0) 0 Zero Filled Page - 0003
 22 0 (0x0) 0 Zero Filled Page - 0003
 23 0 (0x0) 0 Zero Filled Page - 0003
 24 0 (0x0) 0 Zero Filled Page - 0003
 25 0 (0x0) 0 Zero Filled Page - 0003
 26 0 (0x0) 0 Zero Filled Page - 0003
 27 0 (0x0) 0 Zero Filled Page - 0003
 28 0 (0x0) 0 Zero Filled Page - 0003
 29 0 (0x0) 0 Zero Filled Page - 0003
 30 0 (0x0) 0 Zero Filled Page - 0003
 31 0 (0x0) 0 Zero Filled Page - 0003
 32 0 (0x0) 0 Zero Filled Page - 0003
 33 0 (0x0) 0 Zero Filled Page - 0003
 34 0 (0x0) 0 Zero Filled Page - 0003
 35 0 (0x0) 0 Zero Filled Page - 0003
 36 0 (0x0) 0 Zero Filled Page - 0003
 37 0 (0x0) 0 Zero Filled Page - 0003
 38 0 (0x0) 0 Zero Filled Page - 0003
 39 0 (0x0) 0 Zero Filled Page - 0003
 40 0 (0x0) 0 Zero Filled Page - 0003
 41 0 (0x0) 0 Zero Filled Page - 0003
 42 0 (0x0) 0 Zero Filled Page - 0003
 43 0 (0x0) 0 Zero Filled Page - 0003
 44 0 (0x0) 0 Zero Filled Page - 0003
 45 0 (0x0) 0 Zero Filled Page - 0003
 46 0 (0x0) 0 Zero Filled Page - 0003
 47 0 (0x0) 0 Zero Filled Page - 0003
 48 0 (0x0) 0 Zero Filled Page - 0003
 49 0 (0x0) 0 Zero Filled Page - 0003
 50 0 (0x0) 0 Zero Filled Page - 0003
 51 0 (0x0) 0 Zero Filled Page - 0003
 52 0 (0x0) 0 Zero Filled Page - 0003
 53 0 (0x0) 0 Zero Filled Page - 0003
 54 0 (0x0) 0 Zero Filled Page - 0003
 55 0 (0x0) 0 Zero Filled Page - 0003
 56 0 (0x0) 0 Zero Filled Page - 0003
 57 0 (0x0) 0 Zero Filled Page - 0003
 58 0 (0x0) 0 Zero Filled Page - 0003
 59 0 (0x0) 0 Zero Filled Page - 0003
 60 0 (0x0) 0 Zero Filled Page - 0003
 61 0 (0x0) 0 Zero Filled Page - 0003
 62 0 (0x0) 0 Zero Filled Page - 0003
 63 0 (0x0) 0 Zero Filled Page - 0003
 64 0 (0x0) 0 Zero Filled Page - 0003
 65 0 (0x0) 0 Zero Filled Page - 0003
 66 0 (0x0) 0 Zero Filled Page - 0003
 Object 3, Size 49152, Addr 0, Flags 2003, PgTableInd 67, NumPgs 1, Rsv0
 READ, WRITE, 32-BIT,
 PAGES: Number Offset Size Flags
 67 0 (0x0) 0 Zero Filled Page - 0003
Figure 1 LX Entry Table entry format.
Figure 2 LX fixup-record format.

Listing One 

// NE and LX Header structures and structures of all fixed size table entry 
// types. Dummy struct that gets you file offset of "new" exe header. Ignores
// MZ header items other than ID word and lfanew. Read at offset 0 of file.
typedef struct {
 unsigned short magic; // MUST BE ASCII "MZ"
 char useless_bytes[58]; // Skip to offset 0x3C, where lfanew lives
 unsigned long lfanew; // Here's the file offset of new exe header.
 } SIMPLE_MZ_EXE;
// structure of NE header. Follows magic bytes "NE" in file.
typedef struct {
 unsigned char ver; // Version.
 unsigned char rev; // Revision
 unsigned short enttab; // File offset of Entry Table from lfanew.
 unsigned short cbenttab; // Entry Table byte count.
 long crc; // CRC checksum of entire file.
 unsigned short flags; // Exe flags (such as ERROR ...)
 unsigned short autodata; // Segment num of Auto-DS seg, 1-based.
 unsigned short heap; // Segment number of heap, 1-based.
 unsigned short stack; // Segment number of stack, 1-based.
 unsigned short ip; // Initial value of IP register.
 unsigned short cs; // Initial value of CS register.
 unsigned short sp; // Initial value of SP register.
 unsigned short ss; // Initial value of SS register.
 unsigned short cseg; // # of segments in Segment Table.
 unsigned short cmod; // # of modules in Module Reference Table.
 unsigned short cbnrestab; // Byte count of Non-Res Name Table.

 unsigned short segtab; // Offset of Segment Table from lfanew.
 unsigned short rsrctab; // Offset of Resource Table from lfanew.
 unsigned short restab; // Offset of Res Name Table from lfanew.
 unsigned short modtab; // Offset of Module Ref Table from lfanew.
 unsigned short imptab; // Offset of Imp Name Table from lfanew.
 unsigned long nrestab; // Offset of Non-Resident Name Table from 
 // beginning of file.
 unsigned short cmovent; // Number of movable entries.
 unsigned short align; // File sector size, Segments are aligned 
 // on boundaries of this value.
 unsigned short cres; // Item count of Resource Table.
 char resv[10]; // reserved.
 } NE_EXE;
// an entry in the NE Segment Table
struct SEG {
 unsigned short ns_sector; // The file sector segment starts at.
 unsigned short ns_cbseg; // # of bytes in segment image.
 unsigned short ns_flags; // Type of segment (code,data ...)
 unsigned short ns_minalloc; // Minimum size in memory.
 };
// an entry in NE Module Reference Table.
struct MOD_REF {
 unsigned index; // An index into the Imported Name Table.
 unsigned uModNum; // Module number used by Fixup Records 
 // trying to use this Module Reference.
 };
// an entry in a NE Fixup Record Table
struct NEW_REL {
 unsigned char target; // Type of target (see targets below)
 unsigned char source; // Type of source (see sources below)
 unsigned offset; // Offset in this segment of target.
 unsigned module_num; // Module number (see Mod Reference entry) if
 // source = 1 or 2, segment number if source = 0
 unsigned ordinal; // target offset if source=0, function ordinal if 
 // source=1 and function name, offset in Imported 
 // Name Table if source=2
 };
// Possible values for target
#define NE_TARG_16SEG 2 // 16-bit segment
#define NE_TARG_16SEGOFS 3 // 16-bit segment, 16-bit offset.
#define NE_TARG_16OFS 5 // 16-bit offset.
#define NE_TARG_16SEG32OFS 11 // 16-bit segment, 32-bit offset.
#define NE_TARG_32OFS 13 // 32-bit offset.
// Possible values for source
#define NE_DEST_THISEXE 0 // Source is in this exe.
#define NE_DEST_DLLBYORDINAL 1 // Source is imported by ordinal.
#define NE_DEST_DLLBYNAME 2 // Source is imported by name.
// Structure that defines the LX exe header. Follows the two magic bytes "LX".
typedef struct {
 UCHAR ByteOrder; // LITTLE_ENDIAN or BIG_ENDIAN
 UCHAR WordOrder; // LITTLE_ENDIAN or BIG_ENDIAN
 ULONG FormatLevel; // Loader format level, currently 0
 USHORT CpuType; // 286 through Pentium+
 USHORT OSType; // DOS, Win, OS/2 ...
 ULONG ModVersion; // Version of this exe
 ULONG ModFlags; // Program/Library ...
 ULONG ModNumPgs; // Number of non-zero-fill or invalid pages
 ULONG EIPObjNum; // Initial code object
 ULONG EIP; // Start address within EIPObjNum

 ULONG ESPObjNum; // Initial stack object
 ULONG Esp; // Top of stack within ESPObjNum
 ULONG PgSize; // Page size, fixed at 4k 
 ULONG PgOfsShift; // Page alignment shift
 ULONG FixupSectionSize; // Size of fixup information in file
 ULONG FixupCksum; // Checksum of FixupSection
 ULONG LdrSecSize; // Size of Loader Section
 ULONG LdrSecCksum; // Loader Section checksum
 ULONG ObjTblOfs; // File offset of Object Table
 ULONG NumObjects; // Number of Objects
 ULONG ObjPgTblOfs; // File offset of Object Page Table
 ULONG ObjIterPgsOfs; // File offset of Iterated Data Pages
 ULONG RscTblOfs; // File offset of Resource Table
 ULONG NumRscTblEnt; // # of entries in Resource Table
 ULONG ResNameTblOfs; // File offset of Resident Name Table
 ULONG EntryTblOfs; // File offset of Entry Table
 ULONG ModDirOfs; // File offset of Module Directives
 ULONG NumModDirs; // Number of Module Directives
 ULONG FixupPgTblOfs; // File offset of Fixup Page Table
 ULONG FixupRecTblOfs; // File offset of Fixup Record Table
 ULONG ImpModTblOfs; // File offset of Imp Module Table
 ULONG NumImpModEnt; // Number of Imported Modules
 ULONG ImpProcTblOfs; // File offset of Imported Proc Table
 ULONG PerPgCksumOfs; // File offset of Per-Page 
 // Checksum Table
 ULONG DataPgOfs; // File offset of Data Pages
 ULONG NumPreloadPg; // Number of Preload Pages
 ULONG NResNameTblOfs; // File offset of Non Resident Name Table 
 // from beginning of file!
 ULONG NResNameTblLen; // Length in bytes of Non Resident Name Table; 
 // table is also NULL terminated.
 ULONG NResNameTblCksum; // Non Resident Name Table checksum
 ULONG AutoDSObj; // Object number of auto data
 ULONG DebugInfoOfs; // File offset of debugging info
 ULONG DebugInfoLen; // Length of Debugging Info
 ULONG NumInstPreload; // Number of instance-preload pages
 ULONG NumInstDemand; // Number of instance-demand pages
 ULONG HeapSize; // Heap size
 ULONG StackSize; // Stack size
 } LX_EXE;
// An entry in the LX object table
typedef struct {
 ULONG size; // Load-time size of object
 ULONG reloc_base_addr; // Address the object wants to be loaded at.
 ULONG obj_flags; // Read/Write/Execute, Resource, Zero-fill ...
 ULONG pg_tbl_index; // Index in Object Page Table at which this
 // object's first page is located.
 ULONG num_pg_tbl_entries; // Number of consecutive Object Page Table 
 // entries that belong to this object.
 ULONG reserved; // reserved.
} LX_OBJ;
// An entry in the LX Object Page Table.
typedef struct {
 ULONG offset; // File offset of this pages data. Relative to
 // beginning of Iterated or Preload Pages 
 USHORT size; // Size of this page. <= 4096
 USHORT flags; // Iterated, Zero-filled, Invalid ...
 } LX_PG;
// An entry in the LX Fixup Page Table is a single 32-bit value.

// possible values for LX Object Page Table flags member
enum pg_types {
 LX_DATA_PHYSICAL = 0, // Legal Physical Page, file offset relative to 
 // Preload Pages
 LX_DATA_ITERATED, // Iterated Data Page, file offset relative to 
 // Iterated Pages
 LX_DATA_INVALID, // Invalid page.
 LX_DATA_ZEROFILL, // Zero-filled page.
 LX_DATA_RANGE // Range of pages.
 };
// An entry in the LX Resource Table
typedef struct {
 USHORT type_id; // one of rsc_types
 USHORT name_id; // ID application uses to load this resource
 ULONG size; // size of the resource
 USHORT object; // which object is this resource located in?
 ULONG offset; // resource offset within the object 
 } LX_RSC;



Listing Two 

// TwoArray.C - Two almost-64k arrays, with one reference to each.
// NE builds 3 segments for this, LX one object with 64 4k-pages.
#include <stdio.h>
int array1[32767];
int array2[32767];
int main(){ array1[0] = 1; array2[0] = 2; return( 0 ); }



Listing Three

// SixRef.C - Three almost-64k arrays, with two references to each.
// NE uses 3 fixups for the references, LX 6.
#include <stdio.h>
int array1[32767], array2[32767], array3[32767];
int main(){ 
 array1[0] = 1; array1[1] = 2; 
 array2[0] = 1; array2[1] = 2; 
 array3[0] = 1; array3[1] = 2; return( 0 ); }



Listing Four

// ThreeExt.C - Three DLL calls. NE uses 1 fixup record with the three targets
// chained together. LX uses three fixups, no chain.
#define INCL_DOSPROCESS
#include <os2.h>
int main(){
 DosSleep( 2 ); DosSleep( 2 ); DosSleep( 2 );
 return( 0 ); }



Listing Five


000F:0001 8BEC MOV BP,SP
000F:0003 B80000 MOV AX,0000
000F:0006 9A72020F00 CALL 000F:0272
000F:000B 57 PUSH DI
000F:000C 56 PUSH SI
7: DosSleep( 2 ); DosSleep( 2 ); DosSleep( 2 );
000F:000D 6A00 PUSH 00
000F:000F 6A02 PUSH 02
000F:0011 9A1B000000 CALL 0000:001B
000F:0016 6A00 PUSH 00
000F:0018 6A02 PUSH 02
000F:001A 9A24000000 CALL 0000:0024
000F:001F 6A00 PUSH 00
000F:0021 6A02 PUSH 02
000F:0023 9AFFFF0000 CALL 0000:FFFF
8: return( 0 ); }
000F:0028 B80000 MOV AX,0000
000F:002B E90000 JMP 002E
000F:002E 5E POP SI
000F:002F 5F POP DI
000F:0030 C9 LEAVE 









































September, 1994
Image Acquisition Using TWAIN


Understanding container data structures is the key




Craig A. Lindley


Craig is the founder and an officer of Enhanced Data Technology of Colorado
Springs, CO. He is the author of Practical Image Processing in C and Practical
Ray Tracing in C, both published by John Wiley & Sons. Craig can be contacted
on CompuServe at 73552,3375.


The TWAIN software specification is designed to provide a uniform interface
between graphics-supporting software and image-capturing hardware, making it
possible for you to build image-acquisition capabilities directly into your
application. This means that you can spend more time writing the application
and less time worrying about low-level device drivers for scanners, digitizer
boards, digital cameras, and the like. I will discuss TWAIN in this article by
presenting a C++ class, implemented as a Windows DLL, which can be used to add
image acquisition to any Windows application. I've also included an example
application to drive the DLL to show how the interface works. Although the
focus here is on Windows apps, this code, along with that provided in the
TWAIN toolkit, can be implemented for the Macintosh with very little trouble.
Of course, using this code requires access to a scanner or other raster-image
generating device along with its TWAIN-compliant device driver (referred to as
a "Source"). If you have an older device without a TWAIN Source, contact the
manufacturer to see if there is now a TWAIN Source available for it. 
TWAIN has its roots in the 1990 Macintosh Scanner Roundtable, the forerunner
of the TWAIN working group formed by representatives from imaging companies
such as Aldus, Caere, Kodak, Hewlett-Packard, and Logitech. The goal of the
working group was to create an easy-to-use image-acquisition protocol and API
that was useful to both image producers (hardware manufacturers) and image
consumers (application developers). 
In February 1992, the working group released the TWAIN Toolkit 1.0 (1.51 is
the current version, available on CompuServe in the HP Peripheral Forum
Library 15), which defined "the protocol and API for generalized acquisition
of raster data." Figure 1 is a high-level diagram of the TWAIN Release 1.0
architecture. Most TWAIN transactions involve three entities: 
The application code that understands the TWAIN protocol, referred to as "the
TWAIN Code." 
The TWAIN Source Manager (DSM), a go-between for the application and a Source.
Under Windows, the Source Manager is twain.dll, a DLL located in the Windows
directory. On the Macintosh, the Source Manager is called "Source Manager" in
the Preferences folder.
The Source (or device driver) for the imaging hardware. This too is a DLL
under Windows (with a .DS file extension, however).
You are responsible for developing the TWAIN Code, but you can use code in the
TWAIN toolkit. The Source Manager was developed by the TWAIN working group and
is distributed free of charge. The Source is provided by the hardware vendor
in support of its TWAIN-compliant device.
The application program controls the acquisition process by making calls to
DSM_Entry, the single entry point of the Source Manager. Parameters used in
conjunction with these calls control the process. An application never calls a
Source directly. As requests for service are made to the Source Manager, it
acts on some directly and passes others to the selected Source as required.
Image data is returned by the Source to the application under the supervision
of the Source Manager. Because the application knows nothing about the
hardware specifics concerning the Source with which it is communicating, the
Source can be a local device, such as a SCSI-connected scanner, or a remote
device connected via a network. Only the developer of the Source must (or
should) be aware of the hardware specifics. An application simply requests
connection to a specific Source, not caring how the connection is made.
The Source must provide a user interface (UI) for controlling its device. This
releases you from having to develop a UI specific to each device your
application supports. For instance, the Polaroid CS-500i TWAIN-compliant
scanner UI is shown in Figure 2.


TWAIN Operational Overview


The TWAIN toolkit recommends two menu selections for control of TWAIN
transactions: Select Source... and Acquire..., both preferably located in the
File menu. The Select Source... operation allows the user to determine which
Source (if more than one is available in the system) is to be used for image
acquisition. Once the user selects a Source, it is used for all subsequent
image acquisitions until another Source is selected. The Acquire... operation
typically brings up the UI of the selected Source for control of its
corresponding device. Using the controls provided within the UI, the user
decides how images are acquired for incorporation into the application.
A Source's UI can be treated as modal or nonmodal under Windows, although it
is inherently nonmodal. A modal interface (like the one presented here)
restricts the user to dealing with the scanner until the UI is closed down;
typically, after an image is acquired and transferred into the application. At
that time, control is returned to the application program. Other applications,
however, use the Source's UI in a nonmodal way, bringing up the UI and keeping
it up as just another window of the application. This allows images to be
acquired for as long as the UI is active. Which method to use is dictated
entirely by the application.
Except for error detection and recovery, the steps required to support the
Select Source... operation are:
1. Open the Source Manager (OpenDSM). This brings the Source Manager into
memory and extracts the DSM_Entry point for all subsequent TWAIN operations.
2. Select the data Source (SelectDS). Executing this function causes the
Source Manager to locate all Sources on the system and display a dialog box
containing a list box for selecting which Source to utilize. While the dialog
box is visible, F1 can be pressed to get a description of the highlighted
Source for your inspection. Similarly, pressing Alt+W+G will bring up the list
of members of the working group. Making a Source selection dismisses the
dialog box.
3. Close the Source Manager (CloseDSM). This may or may not be appropriate for
a given application. In the code presented here, the Source Manager is
unloaded after each operation, including this one. In other applications, the
Source Manager might be brought up during program initialization and shut down
when the application terminates.
Again, except for error detection and recovery, the steps required to support
the image-acquisition operation are:
1. Open the Source Manager (OpenDSM). This brings the Source Manager into
memory and extracts the DSM_Entry point used for all subsequent TWAIN
operations. 
2. If it is not already open, open the specified Source (OpenDS).
3. Negotiate with the Source for any capabilities required by the application.
The functions SetResolution, SetupFileTransfer, and RestrictToRGB "negotiate"
capabilities between the application and the selected Source. 
4. Enable the data Source (EnableDS). Executing this function brings up the UI
supplied by the selected Source. The UI communicates with the application code
via Windows messages. Any negotiation performed before the UI was brought up
should now be in place. A message, MSG_XFERREADY, will be sent to the
application whenever there is an image to transfer. A message, MSG_CLOSEREQ,
will be sent to the application whenever the user requests the UI to shut
down. To shut down, perform the following: 
 a. Disable the data Source (DisableDS). Upon reception of the MSG_CLOSEREQ
message, DisableDS is called to close down the Source's UI.
 b. Close the data Source (CloseDS). Again, this may or may not be
appropriate.
 c. Close the Source Manager (CloseDSM). May or may not be appropriate.
As straightforward as this seems, this doesn't mean the TWAIN interface is
easy to understand and use. On the contrary, the specification will take time
to fully understand and appreciate. A glance at DC.H (available
electronically) makes this apparent. The specification document is full of
information on controlling image transfers (native, through file, through
memory, formatted/not), capability negotiation, state-transition diagrams,
JPEG-compression issues, detailed message descriptions, and more. Anyone
interested in understanding TWAIN should get the toolkit and read the
documentation. After the second or third pass, the specifications will begin
to make sense. (To get the $35.00 toolkit, call 800-722-0379 or download it
from CompuServe--GO HPPERIPH--and access the TWAIN library, #15.)


Containers


Containers are data structures, used to hold structured information, that are
passed between an application and a Source. Specifically, containers are used
for information exchange during "capability negotiation." The four container
types are: 
One-Value containers, which can hold one 32-bit value.
Array containers, which contain an arbitrary number of values of any defined
type. These values are accessed using an index value just as in an array in C.
Range containers, which contain information describing a range of values of a
specified type: minimum value, maximum value, step-size value, default value,
and current value.
Enumeration containers, which contain a list of values of a defined type from
which to choose, along with a current and default value.
In all, there are 13 TWAIN-defined data types that can show up in containers;
see Table 1. CONTAINR.HPP (Listing One) and CONTAINR.CPP (Listing Two) include
the functions for manipulating containers of all types. Note that it is always
the application program's responsibility to release any memory occupied by
containers when it is finished with them. This is true whether the container
memory was originally allocated by the Source or by the application.


Capability Negotiations



It is via capability negotiations that an application program informs a Source
about the type of image(s) that it desires or that it can deal with. For
example, if an application only wants to handle color images but is connected
to a Source capable of handling both color and gray-scale images, the
application would negotiate the ICAP_PIXELTYPE capability with the Source (see
RestrictToRGB in TWAIN.CPP, available electronically). A successful
negotiation of this capability results in the Source's UI not allowing the
selection of gray-scale images. Therefore, the Source could only acquire color
images for the application.
Most negotiations between an application and a Source are of a similar form.
The application first asks the Source its capabilities, and the Source returns
a container describing them. The application then chooses from these
capabilities and requests the Source to limit itself to a certain subset of
its capability. The Source will either agree or disagree with the
application's request. Capability negotiations are tricky because Sources are
not required to negotiate on every conceivable capability. Consequently, the
application program must be ready for a refusal to negotiate at any point in
the process. This is demonstrated by SetResolution (in TWAIN.CPP), which
accepts as a parameter a specification in dots per inch for the maximum
resolution for image acquisition. 
To begin the SetResolution negotiation, you must inquire of the Source which
resolutions it supports by sending a GET message on the Source's
ICAP_XRESOLUTION capability. The messaging is done by forming the parameters
and calling the DSM_Entry point. The TWAIN specification says that a container
of One-Value, Range, or Enumeration type can be used to return the XResolution
capability data. Consequently, the code must be prepared to parse any of these
returned container types. The returned data type for the XResolution
capability is specified as DCTY_FIX32, a fixed-point representation of a
floating-point number. 
The various container types are processed using a switch statement on the
returned container type. If a One-Value container type is returned, the Source
is capable of only a single resolution and no negotiations are possible. If
the Source returns a Range container, all values less than or equal to the
requested resolution are stored for later use. The same is true of all values
returned in an Enumeration container. Once all of the data in the returned
container is parsed, container memory is freed. 
At this point, all resolutions supported by the Source that meet the
specification have been saved. The application must then tell the Source which
resolutions it is free to use by forming an Enumeration container containing
the allowed values in FIX32 format. This is passed to the Source via a SET
message on the ICAP_XRESOLUTION capability. If the Source accepts the request,
it will limit the resolution selections provided in its UI to just those
specified values. It may, however, still refuse the negotiation.
Negotiation must take place within certain states of the TWAIN protocol. For
example, to negotiate with a source, you must have first loaded and executed
the Source Manager, then opened a Source. Only then will the Source be
available for the negotiation of capabilities. Most capabilities must be
negotiated in this state (state 4, according to the specification), although a
mechanism exists within the specification for extended negotiation in other
states. 


A Sample Application


The code for the sample application is shown in Listing Three. In addition to
the code in Listing Three, the application requires a Windows DLL called
MYTWAIN.DLL that supports TWAIN. The files required to produce the DLL and
sample application are available electronically; see "Availability," page 3.
The sample app has a File menu with Select Source..., Acquire..., and Exit
entries. When the Select Source... operation is executed, the Source Manager is
instructed to put up the Source-selection dialog box so the user can select
the Source to acquire from. When the Acquire... operation is selected, the UI
provided by the Source is brought up so the user can acquire an image. When an
image is acquired, it is written as a TIFF file to the filename hardcoded into
the application program. Admittedly, this isn't very flexible, but it's fine
for illustration.
The important thing to notice about the app in Listing Three is that when the
application window (TAppWindow) is created, an instance of the Twain class is
instantiated. When the application window is closed, that reference to the
Twain class is deleted. When the Select Source... message is detected, the Twain
class member function SelectSource is called with the handle to the
application's window. This puts up the selection dialog box. When the Acquire...
message is detected, the Twain class member function ScanImage is called,
passing the path and filename in which to store the scanned image. For
simplicity, the scanned image is not displayed within the application's
window; it is only written to the file.
Figure 1 The TWAIN architecture.
Figure 2 Typical UI for a TWAIN-compliant device.
Table 1: TWAIN data types.
 Data Type Description 
 DCTY_INT8 8-bit signed value
 DCTY_INT16 16-bit signed value
 DCTY_INT32 32-bit signed value
 DCTY_UINT8 8-bit unsigned value
 DCTY_UINT16 16-bit unsigned value
 DCTY_UINT32 32-bit unsigned value
 DCTY_BOOL Boolean value
 DCTY_FIX32 Fixed-point description of a
 floating-point number
 DCTY_FRAME Data structure defining an
 area. Includes Left, Top,
 Right, and Bottom.
 DCTY_STR32 A string 32 bytes in length
 DCTY_STR64 A string 64 bytes in length
 DCTY_STR128 A string 128 bytes in length
 DCTY_STR255 A string 255 bytes in length

Listing One 

/***************************************************************************/
/*** containr.hpp -- interface class for TWAIN containers. ***/
/*** adapted by Craig A. Lindley -- Revision: 1.0 Last Update: 12/11/93 ***/
/****************************************************************************/
// See the file containr.cpp for the revision history
// Check to see if this file already included
#ifndef CONTAINR_HPP
#define CONTAINR_HPP

#include "dc.h"
class huge Containr {
 private:
 void GetItem(DC_UINT16 Type, LPVOID lpSource, LPVOID lpDest,
 int SourceIndex, int DestIndex);
 public:
 DC_FIX32 FloatToFIX32(float AFloat);
 float FIX32ToFloat(DC_FIX32 Fix32);
 BOOL BuildUpOneValue(pDC_CAPABILITY pCap,DC_UINT16 ItemType,DC_UINT32 Item);
 BOOL ExtractOneValue(pDC_CAPABILITY pCap, LPVOID pVoid);
 BOOL BuildUpEnumerationType(pDC_CAPABILITY pCap, pDC_ENUMERATION pE,
 LPVOID lpList);
 BOOL ExtractEnumerationValue(pDC_CAPABILITY pCap, LPVOID pVoid, int Index);
 BOOL BuildUpArrayType(pDC_CAPABILITY pCap, pDC_ARRAY pA, LPVOID lpList);
 BOOL ExtractArrayValue(pDC_CAPABILITY pCap, LPVOID pVoid, int Index);
 BOOL BuildUpRangeType(pDC_CAPABILITY pCap, pDC_RANGE lpRange);
 BOOL ExtractRange(pDC_CAPABILITY pCap, pDC_RANGE lpRange);
};
#endif



Listing Two

/****************************************************************************/
/*** containr.cpp -- interface class for TWAIN containers. ***/
/*** adapted by Craig A. Lindley -- Revision: 1.0 Last Update: 12/11/93 ***/
/***************************************************************************/

#include "containr.hpp"

// Array of type sizes in bytes
int DCItemSize[] = {
 sizeof(DC_INT8),
 sizeof(DC_INT16),
 sizeof(DC_INT32),
 sizeof(DC_UINT8),
 sizeof(DC_UINT16),
 sizeof(DC_UINT32),
 sizeof(DC_BOOL),
 sizeof(DC_FIX32),
 sizeof(DC_FRAME),
 sizeof(DC_STR32),
 sizeof(DC_STR64),
 sizeof(DC_STR128),
 sizeof(DC_STR255),
};
/*** FloatToFIX32 -- Convert a floating point value into a FIX32. ***/
DC_FIX32 Containr::FloatToFIX32(float AFloat) {
 DC_FIX32 Fix32_value;
 DC_INT32 Value = (DC_INT32) (AFloat * 65536.0 + 0.5);
 Fix32_value.Whole = Value >> 16;
 Fix32_value.Frac = Value & 0x0000ffffL;
 return(Fix32_value);
}
/*** FIX32ToFloat -- Convert a FIX32 value into a floating point value ***/
float Containr::FIX32ToFloat(DC_FIX32 Fix32) {
 float AFloat;
 AFloat = (float) Fix32.Whole + (float) Fix32.Frac / 65536.0;
 return(AFloat);
}
/*** GetItem -- Gets data item at lpSource[SIndex] of datatype Type and stores
 *** it at lpDest[DIndex]. ***/
void Containr::GetItem(DC_UINT16 Type, LPVOID lpSource,
 LPVOID lpDest, int SIndex, int DIndex) {
 switch (Type) {
 case DCTY_INT8:
 *((pDC_INT8)lpDest + DIndex) = *((pDC_INT8)lpSource + SIndex);
 break;
 case DCTY_UINT8:

 *((pDC_UINT8)lpDest + DIndex) = *((pDC_UINT8)lpSource + SIndex);
 break;
 case DCTY_INT16:
 case 44: // DCTY_HANDLE
 *((pDC_INT16)lpDest + DIndex) = *((pDC_INT16)lpSource + SIndex);
 break;
 case DCTY_UINT16:
 case DCTY_BOOL:
 *((pDC_UINT16)lpDest + DIndex) = *((pDC_UINT16)lpSource + SIndex);
 break;
 case DCTY_INT32:
 *((pDC_INT32)lpDest + DIndex) = *((pDC_INT32)lpSource + SIndex);
 break;
 case DCTY_UINT32:
 case 43: // DCTY_MEMREF
 *((pDC_UINT32)lpDest + DIndex) = *((pDC_UINT32)lpSource + SIndex);
 break;
 case DCTY_FIX32:
 *((pDC_FIX32)lpDest + DIndex) = *((pDC_FIX32)lpSource + SIndex);
 break;
 case DCTY_STR32:
 lstrcpy((pDC_STR32)lpDest + DIndex, (pDC_STR32)lpSource + SIndex);
 break;
 case DCTY_STR64:
 lstrcpy((pDC_STR64)lpDest + DIndex, (pDC_STR64)lpSource + SIndex);
 break;
 case DCTY_STR128:
 lstrcpy((pDC_STR128)lpDest + DIndex, (pDC_STR128)lpSource + SIndex);
 break;
 case DCTY_STR255:
 lstrcpy((pDC_STR255)lpDest + DIndex, (pDC_STR255)lpSource + SIndex);
 break;
 }
}
/*** FUNCTION: BuildUpOneValue
 *** ARGS: pCap, pointer to a capability structure, details about container
 * ItemType, constant that defines the type of the Item to follow
 * Item, the data to put into the OneValue container
 * RETURNS: pData->hContainer set to address of the container handle, ptr is 
 * returned there. A TRUE BOOL is returned from this function if
 * all is well and FALSE if container memory could not be allocated.
 * NOTES: This function creates a container of type OneValue and returns 
 * with the hContainer value (excuse me) "pointing" to the container. The 
 * container is filled with the values for ItemType and Item requested by the caller. */
BOOL Containr::BuildUpOneValue(pDC_CAPABILITY pCap, DC_UINT16 ItemType, 
 DC_UINT32 Item) {
 pDC_ONEVALUE pOneValue;
 if ((pCap->hContainer = (DC_HANDLE) GlobalAlloc(GHND, 
 sizeof(DC_ONEVALUE))) != NULL) {
 // log the container type
 pCap->ConType = DCON_ONEVALUE;
 if ((pOneValue = (pDC_ONEVALUE)GlobalLock(pCap->hContainer)) != NULL) {
 pOneValue->ItemType = ItemType; // DCTY_XXXX
 pOneValue->Item = Item; // DCPT_XXXX...
 GlobalUnlock(pCap->hContainer);
 return TRUE;
 } else { // If lock error, free memory
 GlobalFree(pCap->hContainer);
 pCap->hContainer = 0;

 }
 }
 // Could not allocate or lock memory
 return FALSE;
}
/*** FUNCTION: ExtractOneValue 
 * ARGS: pCap pointer to a capability structure, details about container
 * pVoid ptr will be set to point to the item on return
 * RETURNS: pVoid pts to extracted value.
 * NOTES: This routine will open a container and extract the Item. The Item 
 * will be returned to the caller in pVoid. I will type cast the returned 
 * value to that of ItemType. */ 
BOOL Containr::ExtractOneValue(pDC_CAPABILITY pCap, LPVOID pVoid) {
 pDC_ONEVALUE pOneValue;
 if ((pOneValue = (pDC_ONEVALUE)GlobalLock(pCap->hContainer)) != NULL) {
 // Extract the one value
 GetItem(pOneValue->ItemType, (LPVOID) &(pOneValue->Item), pVoid, 0, 0);
 GlobalUnlock(pCap->hContainer);
 return TRUE;
 }
 return FALSE;
}
/*** FUNCTION: BuildUpEnumerationType 
 * ARGS: pCap pointer to a capability structure, details about container
 * pE ptr to struct that contains the other fields of ENUM struct
 * *pList ptr to array of elements to put into the ENUM array
 * RETURNS: pData->hContainer set to address of the container handle, ptr is 
 * returned here
 * NOTES: The routine dynamically allocates a chunk of memory large enough 
 * to contain the DC_ENUMERATION struct as well as store its ItemList 
 * array INTERNAL to the struct. The array itself and its elements must be
 * type cast to ItemType. I do not know how to dynamically cast elements
 * of an array to ItemType so it is time for a big static switch.
 * Protocol: Used by MSG_GET... calls where the Source allocates the container 
 * and the APP uses and then frees the container. */
BOOL Containr::BuildUpEnumerationType(pDC_CAPABILITY pCap, pDC_ENUMERATION pE,
 LPVOID lpList) {
 pDC_ENUMERATION pEnumeration; // template for ENUM fields
 int Index; // anyone with more than 32K array elements
 // should crash. Could type on NumItems.
 LPVOID pVoid;
 // allocate a block large enough for struct and complete enumeration array
 if ((pCap->hContainer = (DC_HANDLE) GlobalAlloc(GHND,
 (sizeof(DC_ENUMERATION)-sizeof(DC_UINT8))+
 pE->NumItems*DCItemSize[pE->ItemType])) == NULL)
 return FALSE; // return FALSE if memory error
 if ((pEnumeration = (pDC_ENUMERATION) GlobalLock(pCap->hContainer)) == NULL)
{
 GlobalFree(pCap->hContainer); // return FALSE if memory error
 return FALSE;
 }
 pCap->ConType = DCON_ENUMERATION; // Fill in container type
 pEnumeration->ItemType = pE->ItemType; // DCTY_XXXX
 pEnumeration->NumItems = pE->NumItems; // DCPT_XXXX...
 pEnumeration->CurrentIndex = pE->CurrentIndex; // current index setting
 pEnumeration->DefaultIndex = pE->DefaultIndex; // default index setting
 // Assign base address of ItemList array to 'generic' pointer
 // i.e. reposition the struct pointer to overlay the allocated block
 pVoid = (LPVOID)pEnumeration->ItemList;
 // Now store the enumerated items

 for (Index=0; Index < (int)pE->NumItems; Index++)
 GetItem(pE->ItemType, (LPVOID) lpList, (LPVOID) pVoid, Index, Index);
 // Unlock the container
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}
/*** FUNCTION: ExtractEnumerationValue
* ARGS: pCap pointer to a capability structure, details about container
* pVoid ptr will be set to point to the item on return
* Index requested index into the enumeration
* RETURNS: pVoid is set to pointer to itemtype
* NOTES: This routine will open a container and extract the Item. The 
* Item will be returned to the caller in pVoid. Returned value will
* be type cast to that of ItemType.
* COMMENTS: only a single value is returned; referred to by indexed value. */
BOOL Containr::ExtractEnumerationValue(pDC_CAPABILITY pCap, LPVOID pVoid, 
 int Index) {
 pDC_ENUMERATION pEnumeration;
 LPVOID pItemList;
 // Lock the container for access
 if ((pEnumeration = (pDC_ENUMERATION) GlobalLock(pCap->hContainer)) == NULL)
 return FALSE;
 // Check that Index is within range
 if (Index > pEnumeration->NumItems-1) {
 GlobalUnlock(pCap->hContainer); // don't leave the container locked
 return FALSE;
 }
 // Assign base address of ItemList array to 'generic' pointer
 pItemList = (LPVOID) pEnumeration->ItemList;
 GetItem(pEnumeration->ItemType, pItemList, pVoid, Index, 0);
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}
/*** FUNCTION: BuildUpArrayType
 * ARGS: pCap pointer to a capability structure, details about container
 * pA ptr to struct that contains the other fields of ARRAY struct
 * *pList ptr to array of elements to put into the ARRAY struct
 * RETURNS: pData->hContainer set to address of the container handle, ptr is 
 * returned here
 * NOTES: The routine dynamically allocates a chunk of memory large enough to
 * contain the DC_ARRAY struct as well as store its ItemList array
 * INTERNAL to the struct. The array itself and its elements must be
 * type cast to ItemType. */
BOOL Containr::BuildUpArrayType(pDC_CAPABILITY pCap, pDC_ARRAY pA,
 LPVOID lpList) {
 pDC_ARRAY pArray;
 int Index; // No more than 32K array elements
 LPVOID pVoid;
 // Allocate a block large enough for struct and complete array
 if ((pCap->hContainer = (DC_HANDLE) GlobalAlloc(GHND,
 (sizeof(DC_ARRAY)-sizeof(DC_UINT8))+
 pA->NumItems*DCItemSize[pA->ItemType])) == NULL)
 return FALSE; // Return FALSE if error
 // Lock the memory
 if ((pArray = (pDC_ARRAY) GlobalLock(pCap->hContainer)) == NULL) {
 GlobalFree(pCap->hContainer);
 return FALSE; // Return FALSE if error
 }
 pArray->ItemType = pA->ItemType; // DCTY_XXXX
 pArray->NumItems = pA->NumItems; // DCPT_XXXX...
 // Assign base address of ItemList array to 'generic' pointer

 // i.e. reposition the struct pointer to overlay the allocated block
 pVoid = (LPVOID)pArray->ItemList;
 // For each item of the array
 for (Index=0; Index < (int)pA->NumItems; Index++)
 GetItem(pA->ItemType, lpList, pVoid, Index, Index);
 // Unlock the memory
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}
/*** FUNCTION: ExtractArrayValue
 * ARGS: pCap pointer to a capability structure, details about container
 * pVoid ptr will be set to point to the item on return
 * Index requested index into the array
 * RETURNS: pVoid is set to pointer to itemtype
 * NOTES: This routine will open a container and extract the Item. The 
 * Item will be returned to the caller in pVoid. Returned value will
 * be type cast to that of ItemType.
 * COMMENTS: only a single value is returned; referred to by indexed value. */
BOOL Containr::ExtractArrayValue(pDC_CAPABILITY pCap,LPVOID pVoid,int Index) {
 pDC_ARRAY pArray;
 LPVOID pItemList;
 // Lock the container for access
 if ((pArray = (pDC_ARRAY) GlobalLock(pCap->hContainer)) == NULL)
 return FALSE;
 // Check that Index is within range
 if (Index > pArray->NumItems-1) {
 GlobalUnlock(pCap->hContainer); // don't leave the container locked
 return FALSE;
 }
 // Assign base address of ItemList array to 'generic' pointer
 pItemList = (LPVOID) pArray->ItemList;
 GetItem(pArray->ItemType, pItemList, pVoid, Index, 0);
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}
/*** FUNCTION: BuildUpRangeType
 * ARGS: pCap pointer to a capability structure, details about container
 * lpRange ptr to RANGE struct
 * RETURNS: pCap->hContainer set to address of the container handle, ptr is 
 * returned here
 * NOTES: The routine dynamically allocates a chunk of memory large enough to
 * contain the RANGE struct. */
BOOL Containr::BuildUpRangeType(pDC_CAPABILITY pCap, pDC_RANGE lpRange) {
 pDC_RANGE pRange;
 // Allocate a block large enough for RANGE struct
 if ((pCap->hContainer = (DC_HANDLE) GlobalAlloc(GHND, 
 sizeof(DC_RANGE))) == NULL)
 return FALSE; // Return FALSE if error
 // Lock the memory
 if ((pRange = (pDC_RANGE) GlobalLock(pCap->hContainer)) == NULL) {
 GlobalFree(pCap->hContainer);
 return FALSE; // Return FALSE if error
 }
 // Copy complete RANGE structure
 *pRange = *lpRange;
 // Unlock the memory
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}
/*** FUNCTION: ExtractRange
 * ARGS: pCap pointer to a capability structure, details about container

 * lpRange ptr to RANGE struct for return
 * NOTES: This routine will open a container and extract the RANGE.
 * COMMENTS: the complete RANGE struct is returned at lpRange. */
BOOL Containr::ExtractRange(pDC_CAPABILITY pCap, pDC_RANGE lpRange) {
 pDC_RANGE pRange;
 // Lock the container for access
 if ((pRange = (pDC_RANGE) GlobalLock(pCap->hContainer)) == NULL)
 return FALSE;
 // Copy the complete structure
 *lpRange = *pRange;
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}



Listing Three

// Sample TWAIN Application Program -- (c) Craig A. Lindley 1993
/* This program exercises MYTWAIN.DLL. It is written in Borland's OWL. It
allows the user to select a Source for acquisition and to acquire an image
from the selected Source. Each acquired image is written to the file
c:\scanimg.tif in the root directory of drive C. Images are not displayed. */

#include <owl.h>
#include "app.h"
#include "twain.hpp"

class TSampleTWAINApp : public TApplication {
 public:
 TSampleTWAINApp(LPSTR AName, HINSTANCE hInstance,
 HINSTANCE hPrevInstance,
 LPSTR lpCmdLine, int nCmdShow)
 : TApplication(AName, hInstance, hPrevInstance,
 lpCmdLine, nCmdShow) {};
 virtual void InitMainWindow();
};
_CLASSDEF(TAppWindow)
class TAppWindow : public TWindow {
 private:
 virtual BOOL CanClose();
 virtual void CMSelectSource(RTMessage Msg)
 = [CM_FIRST + CMID_SELECTSOURCE];
 virtual void CMAcquire(RTMessage Msg)
 = [CM_FIRST + CMID_ACQUIRE];
 virtual void CMHelp(RTMessage Msg)
 = [CM_FIRST + CMID_HELP];
 Twain *TwainClassPtr; // Pointer to the Twain object
 public:
 TAppWindow(PTWindowsObject AParent, LPSTR ATitle);
 ~TAppWindow();
};
// Window Class Constructor
TAppWindow::TAppWindow(PTWindowsObject AParent, LPSTR ATitle)
 : TWindow(AParent, ATitle) {
 AssignMenu("CmdMenu"); // Assign the menu to the window
 TwainClassPtr = new Twain; // Instantiate the Twain class object
}
// Window Class Destructor

TAppWindow::~TAppWindow() {
 delete TwainClassPtr; // Delete the Twain class object
}
BOOL TAppWindow::CanClose() {
 return TRUE; // Allow the window to close
}
// This function is called when the Select Source menu item is clicked
void TAppWindow::CMSelectSource(RTMessage) {
 // Make a call to the DLL to perform the operation
 TwainClassPtr->SelectSource(HWindow);
}
// This function is called when the Acquire menu item is clicked
void TAppWindow::CMAcquire(RTMessage) {
 // Make a call to the DLL to perform the operation
 TwainClassPtr->ScanImage("c:\\scanimg.tif");
}
// This function is called when the Help menu item is clicked
void TAppWindow::CMHelp(RTMessage) {
 MessageBox(HWindow, "(c) Craig A. Lindley, 1993",
 "Sample TWAIN Application", MB_OK);
}
void TSampleTWAINApp::InitMainWindow() {
 MainWindow = new TAppWindow(NULL, Name);
}
int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
 LPSTR lpCmdLine, int nCmdShow) {
 TSampleTWAINApp MyApp("Sample TWAIN Application Program",
 hInstance, hPrevInstance, lpCmdLine, nCmdShow);
 MyApp.nCmdShow = SW_SHOWMAXIMIZED;
 MyApp.Run();
 return MyApp.Status;
}
 // Assign base address of ItemList array to 'generic' pointer
 pItemList = (LPVOID) pEnumeration->ItemList;
 GetItem(pEnumeration->ItemType, pItemList, pVoid, Index, 0);
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}
/*** FUNCTION: BuildUpArrayType
 * ARGS: pCap pointer to a capability structure, details about container
 * pA ptr to struct that contains the other fields of ARRAY struct
 * *pList ptr to array of elements to put into the ARRAY struct
 * RETURNS: pData->hContainer set to address of the container handle, ptr is 
 * returned here
 * NOTES: The routine dynamically allocates a chunk of memory large enough to
 * contain all the struct pDC_ARRAY as well as store it's ItemList array
 * INTERNAL to the struct. The array itself and it's elements must be
 * type cast to ItemType. */
BOOL Containr::BuildUpArrayType(pDC_CAPABILITY pCap, pDC_ARRAY pA,
 LPVOID lpList) {
 pDC_ARRAY pArray;
 int Index; // No more than 32K array elements
 LPVOID pVoid;
 // Allocate a block large enough for struct and complete array
 if ((pCap->hContainer = (DC_HANDLE) GlobalAlloc(GHND,
 (sizeof(DC_ARRAY)-sizeof(DC_UINT8))+
 pA->NumItems*DCItemSize[pA->ItemType])) == NULL)
 return FALSE; // Return FALSE if error
 // Lock the memory

 if ((pArray = (pDC_ARRAY) GlobalLock(pCap->hContainer)) == NULL) {
 GlobalFree(pCap->hContainer);
 return FALSE; // Return FALSE if error
 }
 pArray->ItemType = pA->ItemType; // DCTY_XXXX
 pArray->NumItems = pA->NumItems; // DCPT_XXXX...
 // Assign base address of ItemList array to 'generic' pointer
 // i.e. reposition the struct pointer to overlay the allocated block
 pVoid = (LPVOID)pArray->ItemList;
 // For each item of the array
 for (Index=0; Index < (int)pA->NumItems; Index++)
 GetItem(pA->ItemType, lpList, pVoid, Index, Index);
 // Unlock the memory
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}
/*** FUNCTION: ExtractArrayValue
 * ARGS: pCap pointer to a capability structure, details about container
 * pVoid ptr will be set to point to the item on return
 * Index requested index into the array
 * RETURNS: pVoid is set to pointer to itemtype
 * NOTES: This routine will open a container and extract the Item. The 
 * Item will be returned to the caller in pVoid. Returned value will
 * be type cast to that of ItemType.
 * COMMENTS: only a single value is returned; referred to by indexed value. */
BOOL Containr::ExtractArrayValue(pDC_CAPABILITY pCap,LPVOID pVoid,int Index) {
 pDC_ARRAY pArray;
 LPVOID pItemList;
 // Lock the container for access
 if ((pArray = (pDC_ARRAY) GlobalLock(pCap->hContainer)) == NULL)
 return FALSE;
 // Check that Index is within range; unlock before returning on error
 if (Index > (int)pArray->NumItems-1) {
 GlobalUnlock(pCap->hContainer);
 return FALSE; // Return FALSE if error
 }
 // Assign base address of ItemList array to 'generic' pointer
 pItemList = (LPVOID) pArray->ItemList;
 GetItem(pArray->ItemType, pItemList, pVoid, Index, 0);
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}
/*** FUNCTION: BuildUpRangeType
 * ARGS: pCap pointer to a capability structure, details about container
 * lpRange ptr to RANGE struct
 * RETURNS: pCap->hContainer set to address of the container handle, ptr is 
 * returned here
 * NOTES: The routine dynamically allocates a chunk of memory large enough to
 * contain the RANGE struct. */
BOOL Containr::BuildUpRangeType(pDC_CAPABILITY pCap, pDC_RANGE lpRange) {
 pDC_RANGE pRange;
 // Allocate a block large enough for RANGE struct
 if ((pCap->hContainer = (DC_HANDLE) GlobalAlloc(GHND, 
 sizeof(DC_RANGE))) == NULL)
 return FALSE; // Return FALSE if error
 // Lock the memory
 if ((pRange = (pDC_RANGE) GlobalLock(pCap->hContainer)) == NULL) {
 GlobalFree(pCap->hContainer);
 return FALSE; // Return FALSE if error
 }
 // Copy complete RANGE structure
 *pRange = *lpRange;
 // Unlock the memory
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}
/*** FUNCTION: ExtractRange
 * ARGS: pCap pointer to a capability structure, details about container
 * lpRange ptr to RANGE struct for return
 * NOTES: This routine will open a container and extract the RANGE.
 * COMMENTS: the complete RANGE struct is returned at lpRange. */
BOOL Containr::ExtractRange(pDC_CAPABILITY pCap, pDC_RANGE lpRange) {
 pDC_RANGE pRange;
 // Lock the container for access
 if ((pRange = (pDC_RANGE) GlobalLock(pCap->hContainer)) == NULL)
 return FALSE;
 // Copy the complete structure
 *lpRange = *pRange;
 GlobalUnlock(pCap->hContainer);
 return TRUE;
}





Listing Three

// Sample TWAIN Application Program -- (c) Craig A. Lindley 1993
/* This program exercises the MYTWAIN.DLL. It is written in Borland's OWL. It
allows the user to select a Source for acquisition and to acquire an image
from the selected Source. Each acquired image is written to the file
c:\scanimg.tif in the root directory of drive C. Images are not displayed. */

#include <owl.h>
#include "app.h"
#include "twain.hpp"

class TSampleTWAINApp : public TApplication {
 public:
 TSampleTWAINApp(LPSTR AName, HINSTANCE hInstance,
 HINSTANCE hPrevInstance,
 LPSTR lpCmdLine, int nCmdShow)
 : TApplication(AName, hInstance, hPrevInstance,
 lpCmdLine, nCmdShow) {};
 virtual void InitMainWindow();
};
_CLASSDEF(TAppWindow)
class TAppWindow : public TWindow {
 private:
 virtual BOOL CanClose();
 virtual void CMSelectSource(RTMessage Msg)
 = [CM_FIRST + CMID_SELECTSOURCE];
 virtual void CMAcquire(RTMessage Msg)
 = [CM_FIRST + CMID_ACQUIRE];
 virtual void CMHelp(RTMessage Msg)
 = [CM_FIRST + CMID_HELP];
 Twain *TwainClassPtr; // Pointer to the Twain object
 public:
 TAppWindow(PTWindowsObject AParent, LPSTR ATitle);

 ~TAppWindow();
};
// Window Class Constructor
TAppWindow::TAppWindow(PTWindowsObject AParent, LPSTR ATitle)
 : TWindow(AParent, ATitle) {
 AssignMenu("CmdMenu"); // Assign the menu to the window
 TwainClassPtr = new Twain; // Instantiate the Twain class object
}
// Window Class Destructor
TAppWindow::~TAppWindow() {
 delete TwainClassPtr; // Delete the Twain class object
}
BOOL TAppWindow::CanClose() {
 return TRUE; // Allow the window to close
}
// This function is called when the Select Source menu item is clicked
void TAppWindow::CMSelectSource(RTMessage) {
 // Make a call to the DLL to perform the operation
 TwainClassPtr->SelectSource(HWindow);
}
// This function is called when the Acquire menu item is clicked
void TAppWindow::CMAcquire(RTMessage) {
 // Make a call to the DLL to perform the operation
 TwainClassPtr->ScanImage("c:\\scanimg.tif");
}
// This function is called when the Help menu item is clicked
void TAppWindow::CMHelp(RTMessage) {
 MessageBox(HWindow, "(c) Craig A. Lindley, 1993",
 "Sample TWAIN Application", MB_OK);
}
void TSampleTWAINApp::InitMainWindow() {
 MainWindow = new TAppWindow(NULL, Name);
}
int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
 LPSTR lpCmdLine, int nCmdShow) {
 TSampleTWAINApp MyApp("Sample TWAIN Application Program",
 hInstance, hPrevInstance, lpCmdLine, nCmdShow);
 MyApp.nCmdShow = SW_SHOWMAXIMIZED;
 MyApp.Run();
 return MyApp.Status;
}





















September, 1994
PROGRAMMING PARADIGMS


Power Mac Development(s)




Michael Swaine


I attended Apple's Worldwide Developers Conference with more than my usual
level of curiosity. I wondered what technologies Apple would be pushing and
what new developments I'd hear about, but I was also curious to hear what spin
Apple would be putting on the news and rumors of recent days.
The conference turned out to be less dramatic than past conferences. A few
developers, including some conference speakers, thought that this was just a
consequence of the conference being less well planned than in previous years.
But some attendees, myself included, found the lower-key conference
refreshing. I went with specific questions about specific technologies, and
got some answers. The Newton team, an earnest bunch, had information I
particularly needed.
I also saw demonstrations of hot technologies, including the next version of
QuickTime and the dynamic language, Dylan. Dylan's development environment
seems to answer Lee Buck's plea (see last month's column) for developer tools
that treat the developer with respect.
My less-specific curiosity about future directions for Apple software and
hardware was satisfied, too. The changes Apple is making in all levels of its
technology are dramatic: moving from CISC to RISC; dumping NuBus for the PCI
bus; retooling the toolbox; rewriting the operating system (and moving to a
kernel-based operating system next year); and pushing for a fundamental change
in the way applications are designed, sold, and used in the OpenDoc model of
document-centered programming.
I know that none of these technological directions is unique to Apple; what I
find impressive is that very little of what Apple does isn't slated for
radical change in the 1994--1996 time frame. Things have to go well with each
of these changes or the next change in the sequence will falter. If Apple
hadn't done a good job in introducing the PowerPC-based Power Macintosh this
year, next year's introduction of a kernel-based, nearly native PowerPC
version of the operating system would be hurt.
Of course, we all read in the weeklies how Apple did a great job in
introducing the Power Macintosh. Impressive compatibility with 68K software,
surprisingly acceptable performance in emulation, and lots of native
applications. Or wait, was it "hardly any native applications"? The message
began to get mixed. And sales of the machines, rapid at first, dropped off
within a few months, alarming Apple stockholders. What was going on?
The most obvious theory, and the one I'm buying until it's proven wrong, is
that Apple has a crisis in developer support, has failed to deal with it, and
is suffering as a result. There are other theories, though...


What is Making Tom Pittman So Mad


On the last day of the conference I ran into an old friend. An old friend of
this magazine, that is: Tom Pittman.
Tom will be familiar to long-time DDJ readers as the earliest implementor of
Tiny BASIC. In the very first issue of this magazine, the editors laid down
the challenge of implementing a tiny language that could load and execute in
the meager pittance of RAM available in the microcomputers of the day. Tom
immediately took up the challenge.
I mention this not just out of nostalgia but because it tells you what kind of
guy Tom is: He thrives on challenges. His day job has typically been doing
some sort of embedded-systems development, an area in which you can be
challenged by limitations of the hardware, and not just by the limitations of
the hardware vendor.
In recent years, though, Tom's been writing Macintosh development tools, like
CompileIt. Tom's tools turn HyperCard into a more serious development
environment. They allow HyperTalk scripters to compile and optimize chunks of
HyperTalk code for speed or to access system services not supplied by
HyperCard. They also allow HyperCard to generate stand-alone applications.
(HyperCard itself will now finally produce stand-alones, but Tom's product
still does it better.)
Tom's products are good, and he has a loyal customer base. But he's in an odd
niche: producing serious development tools but selling to HyperCard
developers, people Apple does not consider serious developers. That niche may
have turned into a crack to fall through, as our conversation will show: 
DDJ: I was talking with a mutual friend, and he hinted that you were not happy
with Apple. What's your concern?
TP: My concern is that as near as I can tell, either Apple is lying through
their teeth when they say that they want developers to port their products to
native code, or [the PowerPC team is] staffed by incredibly incompetent bozos.
DDJ: I see.
TP: In that group only. The HyperCard team has been open and helpful and
supportive. The OpenDoc team has been open and helpful and supportive.
DDJ: So what happened between you and the PowerPC team?
TP: I went to them in August of last year. I said: I have thousands of
customers that would like to port their applications to native mode. They're
not using C, so I need some technical answers so my compiler can support them.
I figured six to nine months to get my compiler to run native and produce
native code. I started asking questions in August. Those guys wouldn't give me
the time of day. I asked and I asked and I asked and I asked, and finally they
said, "well, we don't answer technical questions to people who are not
seeded."
DDJ: And you didn't have a seed unit?
TP: You read in MacWEEK how easy it is to get seed units. Right. To this day I
have not got technical answers from that group.
DDJ: I can see why you're annoyed. What did you do?
TP: I bought a Power Macintosh and I am reverse engineering it. That is
deplorable. That is absolutely unacceptable.
DDJ: I should think it would be unacceptable to Apple.
TP: Either they are a bunch of bozos who don't know which end is up, or
Spindler has a secret plan--and I've already said this in public--a secret
plan that recognizes the huge backlog inventory of 030s, a manufacturing
capacity of Power Macintoshes that is not up to speed yet, and the bad PR that
you would get from a demand that exceeds the supply. So while saying "convert,
convert, convert," he's holding the software back. No development software.
DDJ: Well, there's MetroWerks' CodeWarrior.
TP: Okay, MetroWerks was a surprise. They weren't prepared for that. They
couldn't block them. They were too visible. So they gave them support, but
basically held everybody else back. And they withheld the beta units that they
were promising developers. They actually promised developers beta units, and
at the very last minute, somebody very high up in management diverted them.
Where did they get diverted to? They got diverted to education, manufacturing,
industry--people who can't produce software to run on it. We know that those
guys got beta units; they got quoted in MacWEEK the first week. So the PR is
all plus, plus, plus, and the actuality is all minus, minus, minus. Then the
word gets out there's no apps. If you want to run PhotoShop and you need that
superfast speed, buy one today. Otherwise wait.
DDJ: That's certainly the message that a lot of people got.
TP: So Apple comes out with new processors, flushing their 030 pipeline.
DDJ: Well, as you say, that's one theory regarding the lack of developer
support.
TP: That's the sinister spin you could put on this thing.
DDJ: Right. So where does Tom Pittman go from here?
TP: I finally have given up. I am not going to depend on Apple for anything.
CompileIt has to go native. It's no secret. Two years from now there is not
going to be any market for 68K. Zero.
DDJ: And you are...?
TP: As I said, I am reverse engineering the Power Macintosh. I have PEF almost
all figured out.
DDJ: Tell me about PEF.
TP: Ah, yes. PEF. Apple won't tell you about PEF. But I'm not telling you any
secrets, because I have none. It is thought that PEF might involve some patent
issues. This is the "preferred execution format"--that's what PEF stands for.
If they really wanted people developing for the Power Macintosh, they would
make the file format open and available. PEF is not available. You cannot get
it from anybody. When you beat on them hard enough they say, "Talk to Apple
Licensing." When you talk to Apple Licensing you find out, "Oh yes, we can
license it to you for many thousands of dollars per year plus this restrictive
contract that says that you're not allowed to give anything to anybody else
unless they sign it also." Obviously it's a great format for fleecing the
developers. And some of the developers are willing to be fleeced. I'm not one
of them.
DDJ: So you're an unhappy camper.
TP: Yeah. I wrote a nasty flame to the new editor of MacTech before I knew
that he was on the PowerPC team, before he took the editor job.
DDJ: Oops.
TP: I really have to stop writing letters in the heat of passion.



Why Hal Hardenbergh has Emulation Reservations


Maybe Tom should emulate Hal Hardenbergh. Hal manages to write some pretty
devastating criticism while coming across as the most reasonable of men. He
does this by marshaling the facts and presenting them in a manner calculated
to let them speak as boldly as possible. His facts seem dramatic, while his
personal style seems calm and dispassionate.
You may not be familiar with Hal's work, although he has written for this and
other technical magazines, because his main channel these days is an
irregularly appearing letter that he sends to a select group of people--23 of
us, at last count. I don't know what criteria he applies that put me on the
list, but I'm glad to be there.
Hal called recently to ask if I had seen any outrageous claims regarding Power
Macintosh emulation, because he was going to, uh, discuss them in his next
letter. I hadn't seen any claims that he hadn't already seen, and he had even
come across one claim in a less technical publication that the current Power
Macs could run Windows software in emulation at Pentium speeds! Maybe Hal will
let me pass along some of those claims and his thoughts on emulation in a
future column; that is, if he's not too annoyed by the lead sentence two
paragraphs back. 


What Jeff Duntemann is Saying about Your Face


Of course, it's not the emulation but, rather, the native applications that
make the Power Macintoshes fly. And despite Apple's slowness in getting
developer tools for PowerPC development into the hands of developers, there
were a lot of announced native applications either at, or shortly after, the
arrival of the hardware. Were these ports real?
It turns out that, while there were some genuine rewrites, many of the early
native releases were Flashport jobs. Flashport is the code-translation product
from Echo Logic (based on Bell Labs compiler-optimization technology) that
makes it possible to move a 680x0 application to the PowerPC in a matter of
weeks, or even days.
I picked up the Flashport fact in Inside the PowerPC Revolution, by Jeff
Duntemann and Ron Pronk. I recommend this book highly if you are at all
interested in the PowerPC. The authors attempted to do a lot in one book, and
seem to have succeeded overall.
The style is generally light and conversational (the book even includes some
interviews). Duntemann is one of the best explainers around, and with few
lapses, this book exemplifies his passion for finding the least painful way to
put across a complicated idea.
The content ranges broadly, including journalistic coverage of the newsworthy
facts, analytical discussion of the significance of events, and technical
exposition. One early chapter is even a breezy, readable history of personal
computing, included, no doubt, to broaden the target audience for the book by
bringing everyone up to date and identifying the players in the drama to come.
And a drama it is, with chapters on Apple and IBM's future PowerPC strategies
and present maneuvers, a provocative analysis of Intel's shortsightedness, and
some not-too-far-fetched speculations on the future.
But then there's the solid technical information, too: The authors play the
RISC-versus-CISC game, but at a higher level than the weeklies, digging into
just what aspects of RISC are turning out to be important.
The technical level ranges from user-level discussions, such as how it is
possible that an application program could actually run faster in emulation on
a Power Mac than native on the fastest 680x0 Mac, to a rather detailed
discussion of cache architectures.
I like this book, and I think it's because of the breadth and quality of
discussion it presents. But I have to admit that I am intrigued when the
editor of a magazine for PC programmers writes a book that contains a section
titled, "Intel and Microsoft Are in Your Face and in Your Way." Heresy gets me
every time.


What Tom Thompson Thinks about CodeWarrior


Let me tell you about one other book, a book-and-CD package, really.
Power Macintosh Programming Starter Kit, by BYTE senior tech editor-at-large
Tom Thompson, was published this year by Hayden Books, and includes a
stripped-down version of the aforementioned MetroWerks CodeWarrior development
system for PowerPC and 680x0 Macs.
I'm assuming there are some readers of this magazine who are not committed Mac
developers, but who have 40 bucks worth of curiosity about Power Mac
development. This book is for them.
As of this writing, MetroWerks' CodeWarrior is the development environment in
which anybody with a choice would want to develop software for the Mac. You
can work on a 680x0 Mac or on a Power Mac and can develop for 680x0 and/or
Power Mac. It's a reasonably friendly environment and produces fast PowerPC
code. And there's this book to help get you started.
Thompson is as narrowly focused as Duntemann and Pronk are broad. He spends
about ten pages on PowerPC background; the rest of the book is dedicated to
showing how to use CodeWarrior to develop software for the Power Mac.
After a brief look at the development environment, he jumps into code and
moves on to some other programs that exercise the compiler and the programmer.
Full source listings for the programs are in the book.
There are also chapters on debugging, the 68K and PowerPC application run-time
architectures, and a brief checklist for people porting apps to PowerPC. It
really is what the title says: a Power Mac programming starter kit. It's a
very good one, too.
The program I found most educational, as well as most useful, was a little
hack that ejects a CD-ROM from its drive. This gets around an annoyance that
has bugged me for years: Under certain circumstances, circumstances that often
seem to apply in my office, the Mac system software will decide that a CD-ROM
is a shared volume and will refuse to eject it.
Ten years of Mac hacking has taught Tom Thompson what the most annoying
problems on the Mac are. When he wrote this book, he knew that fixing at least
one of those problems ought to be a high priority.



























September, 1994
C PROGRAMMING


Patterns, New C++ Features, and OS/2




Al Stevens


I spoke last week at the Borland International Conference in Orlando, Florida.
The conference was held on Disney property in the Swan and Dolphin resort
hotels. In true Philippe style, the parties were the highlight, and there were
several conference-hosted trips to the Disney theme parks. The vision of
Borland's David Intersimone dressed like an Egyptian and getting shot by a bad
guy in the Indiana Jones attraction is something that we'll not soon forget. 
Several sessions at the conference addressed patterns, a newly evolving design
approach being studied by a small group of industry collaborators. Kent Beck,
one of the group, has written about patterns in DDJ and elsewhere; see, for
instance, "Patterns and Software Development" (DDJ, February 1994). Jim
Coplien and Grady Booch are also members of the patterns group, which is
attempting to define how designers can identify and reuse patterns that repeat
themselves in the expression and solution of software-design problems.
Correlations exist between patterns in architecture, which is centuries old,
and patterns in software design, which is only decades old. A pattern is an
expression of both the problem and its solution. This is not a new idea. In
the early 1970s, we taught that the solution to a problem should resemble the
problem; that you should be able to derive the problem by looking at the
solution. Patterns in solutions are things that we recognize from experience,
and they should clearly express their purpose. Putting this recognition to
advantage involves recording the nature of patterns and identifying where they
apply in the expression of other problems and solutions. Expect to see a lot
more about patterns in the next couple of years. Jim Coplien expresses
concerns that zealous book writers will rush to publish and eager toolmakers
will crank out early CASE tools well before anyone clearly understands
patterns. His advice: Try this at home, not at work. 


C++ Enhanced


The ANSI/ISO X3J16 committee's standard definition for C++ includes extensions
to the language that the C++ programming community at large wants and needs.
The principal new features--those not supported by all current C++
implementations--are templates, exception handling, namespaces, new-style
casts, and run-time type information.
Although several contemporary compilers support templates, the definition is
changing significantly. The view of a template as a kind of macro is gone. In
some implementations, the template member functions must be in view during the
instantiation of a parameterized type object. The template header file has not
only the template class declaration but must have the member-function
definitions, too. The new language definition provides for the member
functions to be in their own translation unit, unseen by the code that uses
the template. The binding of unique parameterized functions to types is done
by the linker, rather than during the compile, to suppress code generation for
multiple uses of the same template for the same type in different translation
units. The latter is the preferred model, according to Bjarne Stroustrup in
The Design and Evolution of C++ (Addison-Wesley, 1994). Not all existing
compilers work this way, and some will have to change.
Several compilers now support exception handling. Its behavior is well
understood, and the design is nailed down. The efficiency with which compilers
will implement exception handling remains to be seen. I discussed the subject
in two successive columns last year.
I have not yet experimented with namespaces. They have not been implemented in
the PC compilers that I have. The feature is designed to eliminate a problem
that occurs when multiple libraries in a project use colliding global
identifiers. The namespace feature places independent global declarations
within unique namespaces to isolate them from one another. Tom Pennello,
vice-president of engineering for MetaWare (developers of a compiler which
does support namespaces) wrote on the topic last month; see "C++ Namespaces"
(DDJ, August 1994).


Casts


New-style casts are replacing traditional C and C++ typecast notation,
providing safer notation that can reflect the design of polymorphic class
hierarchies and that can be readily located in code with text-searching tools
such as grep. The old-style casts are still supported by the language, but
their use is discouraged and they will gradually disappear as new programs
replace old ones. In a perfect universe, programs need no casts at all, and
the framers of the language would like to have eliminated them altogether.
Research shows, however, that many idioms require them, particularly in
systems programming. The old-style cast is known to be unsafe, error prone,
and difficult to spot when we read programs. The new-style casts are an
attempt to improve the casting situation.
There are four new casting operators. Each one returns an object converted
according to the rules of the operator. They use the syntax cast_operator
<type> (object), where the cast_operator is either dynamic_cast, static_cast,
reinterpret_cast, or const_cast; the type argument is the type being cast to;
and the object argument is the object being cast from.


dynamic_cast


The dynamic_cast operator casts a base-class reference or pointer to a
derived-class reference or pointer. You can use it only when the base class
has virtual functions. It provides a way to determine at run time if a
base-class reference or pointer refers to an object of a specified derived
class or to an object of a class derived from the specified class. Example 1
shows how you use it. If you use references rather than pointers, dynamic_cast
throws a bad_cast exception when the target is not of the specified class.
The dynamic_cast operator provides a form of run-time type identification (not
to be confused with run-time type information, RTTI, discussed shortly). A
program can determine at run time which of several known, derived types a
base-class reference or pointer refers to. This feature supports idioms that
virtual functions might not. The Control class in Example 1 knows that a
derived EditBox object has unique requirements not shared by all derived
Control objects. Rather than burden all classes derived from Control with
empty virtual functions to emulate those unique to EditBox, the design casts
the object's base pointer to point to an EditBox object. If the object is not
an EditBox or of a class derived from EditBox, the cast returns a 0 value, and
the program knows not to call functions unique to EditBoxes.


static_cast


Unlike dynamic_cast, the static_cast operator makes no run-time check and is
not restricted to base and derived classes in the same polymorphic class
hierarchy.
If you are casting from a base to a derived type (not always a safe
conversion), static_cast assumes that its argument is an object of (or pointer
or reference to an object of) the base class within an object of the derived
class. The cast can result in a different, possibly invalid address. In
Example 2, if the bp pointer does, in fact, point to an object of type C, the
cast works correctly. If it points to an object of type B, the cast makes the
conversion, but the effective address is less than the address of the B object
with the difference representing the size of the B class. This address is
incorrect.
Similarly, if the pointer points to an object of the base class, using the
derived class pointer to dereference members of the nonexisting, derived class
object causes unpredictable behavior.
If you are unsure about the safety of the cast, use dynamic_cast and check the
result.
If you are casting from a derived to a base type (a safe conversion),
static_cast assumes that its argument is a valid object of the derived class
or a pointer or reference to an object of the derived class.
You can also use static_cast to invoke implicit conversions between types that
are not in the same hierarchy. Type checking is static. That is, the compiler
checks to ensure that the conversion is valid. Assuming that you did not
subvert the type system with an old-style cast or reinterpret_cast to coerce
an invalid address into a pointer or initialize a pointer with 0, static_cast
is a reasonably safe typecasting mechanism.


reinterpret_cast


The reinterpret_cast operator replaces most other uses of the old-style cast
except those where you are casting away "const-ness." It will convert pointers
to other pointer types, numbers to pointers, and pointers to numbers. You
should know what you are doing when you use reinterpret_cast just as you
should when you use old-style casts. That is not to say that you should never
use reinterpret_cast. There are times when nothing else will do.


const_cast



The three cast operators just discussed respect const-ness. That is, you
cannot use them to cast away the const-ness of an object. For that, use the
const_cast operator. Its type argument must match the type of the object
argument except for the const and volatile keywords.
When would you want to cast away const-ness? Class designs should take into
consideration the user who declares a const object of the type. They do that
by declaring as const any member functions that do not modify any of the
object's data-member values. Those functions are accessible through const
objects. Other functions are not. Some classes, however, have data members
that contribute to the management rather than the purpose of the objects. They
manipulate hidden data that the user is unconcerned about, and they must do so
for all objects, regardless of const-ness.
For example, suppose there is a global counter that represents some number of
actions taken against an object of the class, const or otherwise. In Example
3, if the declaration of the A::report() member function was not const, the
using program could not use the function for const objects of the class. The
function itself needs to increment the rptct data member, which it normally
could not do from a const member function. const functions cannot change data
values. To cast away the const-ness of the object for that one operation, the
function uses the const_cast operator to cast the this pointer to a pointer to
a non-const object of the class.


Run-time Type Information (RTTI)


The typeid operator supports the new run-time type-information feature. Given
an expression or a type as an argument, the operator returns a reference to a
system-maintained object of type Type_info, which identifies the type of the
argument. There are only a few things that you can do with the Type_info
object reference. You can compare it to another Type_info object for equality
or inequality. You can initialize a Type_info pointer variable with its
address. (You cannot assign or copy a Type_info object or pass it as a
function argument.) You can call the member function Type_info::name() to get
a pointer to the type's name. You can call the member function
Type_info::before() to get a 0 or 1 integer that represents the order of the
type in relation to another type.
How would you use typeid? What purpose is gained by determining the specific
type of an object? The dynamic_cast operator is more flexible in one way and
less in another. It tells you that an object is of a specified class or of a
class derived from the specified class. But to use it, the specified class
needs at least one virtual member function. And dynamic_cast does not work
with intrinsic types.
Consider a persistent-object database manager. It scans the database and
constructs memory objects from the data values that it finds. How does it
determine which constructors to call? RTTI can provide that intelligence. If
the first component of a persistent-object record is the class name (or,
better yet, a subscript into an array of class names), the program can use
RTTI to select the constructor. Consider Example 4, in which the database
scanner retrieves the class name of the next object and calls the
DisplayObject function. The example, where only three classes are recorded in
the database, assumes that the database manager knows how to construct each
object when the file pointer is positioned just past the type identifier in
the record. This technique assumes that the scanner program knows about all
the classes in the database and is similar to one that I use in the Parody
object-database manager. I'll be discussing Parody in detail in future
columns.


OS/2: Seven Points on the Richter Scale


I am writing this column with Word for Windows 6.0 running as an OS/2-Win
application, and I am not a happy scribe. I just spent three days installing
OS/2. I spent those three days watching OS/2 and its installation procedure
crash with regularity.
Associates have been telling me that I need OS/2, the better Windows than
Windows, the better DOS than DOS, and the best environment for developing DOS
and Windows applications. They smugly point out that their operating system is
better than mine. OS/2 uses preemptive multitasking in protected mode, and
applications cannot take out the operating system when they blow up. Just
flush that sucker out and keep moving along. They say that high-tech
programmers shouldn't piddle around with low-tech software like MS-DOS and
Windows.
Until now, I organized programming projects into Windows Program Manager
groups with icons for the word processor, the C++ compiler, the program
itself, and so on. I jumped from task to task using Windows' simple
multitasking capabilities. It worked well enough, but from time to time a
program would blow up and take Windows, DOS, and anything I hadn't saved along
with it.
A copy of OS/2 2.1 has been on the shelf for a couple of months now, and
Borland's OS/2 C++ compiler arrived recently, something I've been wanting to
try. You can install OS/2 all by itself or with DOS in a dual-boot
configuration. You can use the OS/2 improved file system only if you do not
use the dual-boot feature. Praise glory that I did not go the full high-tech
route. I set up a DOS boot diskette just to be safe and left the FAT file
system intact. A prudent battle plan includes the retreat. Right out of the
box, the installation program died. It got just so far and left me staring at
a dark screen. The installation manual said that it might do that--brand new
software telling me to try it and see if it blows up. What will they think of
next? OS/2 had decided that I had some particular video display configuration,
so it installed itself accordingly. Then, when trying to use the video mode,
OS/2 changed its mind and quit.
The procedure said to restart the installation, press Esc at a certain place,
and run the SETVGA program to reconfigure everything to VGA, the lowest common
video denominator. SETVGA asked me to insert Display Driver Diskette 1. But I
was installing from a CD-ROM--and there is no Diskette 1. SETVGA, apparently
not mindful of how things are being done, wants that diskette. I located a set
of the 19 OS/2 installation diskettes and went from there. SETVGA ran for a
while and said it couldn't find a file to complete the reinstallation. It
didn't say which file, just that it couldn't find one.
I tried booting, which did not work. An error message said that OS/2 couldn't
find VVGA.SYS, a clue to the previously unnamed missing file. Using my boot
diskette and trusty, clunky old DOS, I determined that OS/2's CONFIG.SYS was
trying to load a VVGA.SYS device driver, which was nowhere to be found. I
couldn't find it on the hard disk, the diskettes, or the CD-ROM. There was a
file named VSVGA.SYS, so I changed CONFIG.SYS to load that one. OS/2 booted to
a dark screen again.
Eventually, the display problem mysteriously went away. I had been messing
with the video adaptor's installation program and something I did made the
display start working. I continued the installation, and it ran for a while
and locked up. Several restarts had similar results but at different places.
After deleting everything and starting over, the installation went without a
hitch, and OS/2 was running. Still, OS/2 has decided that my video display is
640x480 VGA only. I want 800x600, which Windows 3.1 under DOS finds perfectly
acceptable. OS/2 refuses to install itself that way. When I finally got
through the selective installation of Super VGA, OS/2 wouldn't run. The only
high-resolution configuration that works is 8514 emulation, which produces a
washed-out look that I don't like. 
I set up a folder and installed Word for Windows 6.0 into it. Ideally, the
first thing you do in the morning is start all the programs you use and leave
them running as icons. Then you activate them when you need them and minimize
them when you don't. With Word running, I pressed Alt+Tab to get back to the
desktop, which is when the unthinkable happened--the system locked up tighter
than a harp string. I had to reboot.
Next I installed WinCIS, the CompuServe Information System for Windows
program. OS/2 claims to be able to handle communications in the background
while you do other things, but the program will not run in the background
without locking up OS/2. Likewise, the CD-ROM-based CorelDraw setup program
crashes the system.
Before you OS/2 mavens rush to write letters telling me what I did wrong, how
unfair I'm being, how OS/2 deserves a more thorough and comprehensive
look-see, listen up. First, this isn't the whole story. I left out a lot of
details because they're more crashingly boring than these. Second, software
needs to be easy to install and use. Nobody should have to go through what I
did, particularly not the typical, nontechnical user. The only reason I got it
running is because, as a programmer, I know how to get around things, deduce
the meaning of arcane system constructs, and find and try alternatives.
Average users are not ready for this. Maybe OS/2 is great with powerful native
OS/2 applications. But OS/2 is promoted as being able to run Windows
applications in an armored environment. It doesn't. This highly touted,
protected-mode, bullet-proof operating system cannot install itself or install
and run three of my favorite Windows applications without crashing.
I'm using OS/2 now to develop C++ programs mainly to use the Borland compiler
and the flat 32-bit memory model. OS/2's DOS emulation is indeed better than a
Windows DOS box. Windows under OS/2 is definitely not better than Windows
running under DOS. Programs take longer to load, they run slower, and the
screen is jumpy and uncomfortable to use.
Things still crash the system. Quincy aborts OS/2 when the help database is
not properly built. The GUI goes off somewhere into the ether, and full
text-screen messages display with codes, memory addresses, and instructions to
write everything down and call my service representative, whoever that is.
Word for Windows 6.0 is sluggish and maims OS/2 from time to time. You can see
it coming. 
The user's manual and online help documentation don't help. I am still trying
to figure out how to get a document on the desktop that associates with Word
for Windows. Sure, tell me how easy it is, but try figuring out how when you
don't already know.
Last week at the Borland International Conference, IBM had their usual array
of PCs with OS/2 installed for test driving. I wanted to look into OS/2 word
processors. They had WordPerfect, Ami Pro, and a third one whose name I
forget. I tried the third one first. Guess what--it locked up OS/2. The
attendant suggested that I move to another PC and not try that program again.
I took his advice and moved to another PC--in a different exhibit.
Example 1: Using the dynamic_cast operator.
class EditBox : public Control { ... };
void Paint(Control *cp)
{
 EditBox *ctl = dynamic_cast<EditBox*>(cp);
 if (ctl != 0) {
  // ctl points to an EditBox
 }
 else {
  // cp points to a non-EditBox Control
 }
}
Example 2: Using the static_cast operator.
class C : public A,
public B { /* ... */ };
B *bp;
C *cp = static_cast<C*>(bp);
Example 3: Using the const_cast operator.
#include <iostream.h>
class A {
 int val;
 int rptct; // number of times the object is reported
public:
 A(int v) : val(v)
, rptct(0) { }
 ~A()
 { cout << val << " was reported " << rptct << " times."; }
 void report() const;
};

void A::report() const
{
 const_cast<A*>(this)->rptct++;
 cout << val << '\n';
}
int main()
{
 const A a(123);
 a.report();
 a.report();
 a.report();
 return 0;
}
Example 4: Using RTTI.
void DisplayObject(char *cname)
{
 if (strcmp(cname, typeid(Employee).name())==0) {
 Employee empl;
 empl.Display();
 }
 else if (strcmp(cname, typeid(Department).name())==0) {
 Department dept;
 dept.Display();
 }
 else if (strcmp(cname, typeid(Project).name())==0) {
 Project proj;
 proj.Display();
 }
}

































September, 1994
ALGORITHM ALLEY


NP-Completeness




Bruce Schneier


A traveling salesman has to make sales calls in ten different cities, but only
has one tank of gas. Is there a route that will allow him to visit all ten
cities on that single tank of gas? See Figure 1.
This variation on the classic traveling-salesman problem is formally known as
the "Hamiltonian circuit problem." You can think of the map as a connected
graph, where the cities are nodes and roads between them are edges. Is there a
single path connecting all the nodes, where the total of all the edges is less
than a given amount?
The data structure for this problem is a matrix (see Figure 2), where element
(A,B) is the value of the distance between City A (Albuquerque) and City B
(Bakatapur). The challenge is to find a set of array elements, such that there
is exactly one element in each row and one element in each column, and such
that the sum of all the elements is less than a given amount.
While this problem is easy to solve with only ten cities, it rapidly
approaches the impossible as the number of cities increases because there's no
efficient algorithm to solve the problem. The best algorithm is to guess at
the solution, and then refine the guess based on the results. In short,
mathematics can't give an efficient algorithm for this problem. 
Unfortunately, mathematics can't prove that there isn't an efficient
algorithm, either. The best a mathematician can say is, "I can't find an
efficient algorithm, but neither can all these famous mathematicians."


Complexity of Algorithms


Complexity theory provides a methodology for analyzing the computational
complexity of different programming algorithms to solve problems such as that
of the traveling salesman. Using complexity theory, you can compare the
efficiency of different algorithms and determine which is faster.
The computational complexity of an algorithm is expressed in what is called
"big O" notation, the order of magnitude of the computational complexity. The
order of magnitude of the complexity is just the term of the complexity
function that grows the fastest as n gets larger; all constant and lower-order
terms are ignored.
For example, if the computational complexity of a given algorithm is
4n^2+7n+12, then the computational complexity is on the order of n^2, expressed
as O(n^2).
The advantage of measuring complexity in this way is that it is system
independent. You don't have to know the exact timings of various instructions,
the number of bits used to represent different variables, or even the speed of
the processor. One computer might be 50 percent faster than another and a
third might have a data path twice as wide, but the order-of-magnitude
complexity of an algorithm remains the same. This isn't cheating. When you're
dealing with algorithms as complex as those here, this is usually negligible
compared to the order-of-magnitude complexity.
What this notation allows you to see is how the time and space requirements
are affected by the size of the input. For example, if T=O(n), then doubling
the size of the input doubles the running time of the algorithm. If T=O(2^n),
then adding one bit to the size of the input doubles the running time of the
algorithm. 
Generally, algorithms are classified according to their time or space
complexities. An algorithm is constant if its complexity is independent of n,
namely, O(1). An algorithm is linear, O(n), if its complexity grows linearly
with n. Algorithms can also be quadratic, cubic, and so on. All these
algorithms are polynomial; their complexity is O(n^t), where t is a constant.
Algorithms that have a polynomial complexity class are called "polynomial
time" algorithms.
Algorithms that have complexities of O(t^f(n)), where t is a constant and f(n)
is some polynomial function of n, are called "exponential." Exponential
algorithms quickly get computationally impossible, so they are often unusable.
Cryptographers like to base the security of their ciphers on these algorithms,
though.
As n grows, the complexity of an algorithm can make an enormous difference in
whether or not the algorithm is practical. Table 1 shows the running times for
different algorithm classes for various values of n. Notice how fast the
complexity grows in the exponential case. If n is equal to 1 million, and if a
computer can perform one iteration per microsecond, it can complete a constant
algorithm in a microsecond, a linear algorithm in a second, and a quadratic
algorithm in 11.6 days. It would take 32,000 years to complete a cubic
algorithm; not terribly practical, but a computer built to withstand the next
ice age would eventually deliver a solution. Solving the exponential algorithm
is futile, no matter how well you extrapolate computing power, parallel
processing, or the heat death of the universe.
Consider the problem of trying to break a cryptographic algorithm. The time
complexity of a brute-force attack (trying every possible key) is proportional
to the number of possible keys, which is exponential in the key length. If n
is the length of the key, then the complexity is O(2n). For example, DES has a
56-bit key. Against a 56-bit key it will take 2^56 attempts (2285 years,
assuming 1 million attempts per second); against a 64-bit key, it's 2^64
attempts (580,000 years, with the same assumptions); and against a 128-bit
key, 2^128 attempts (10^25 years). The first is on the edge of possibility; the
last is ridiculous to even contemplate.


Complexity of Problems


Complexity theory also classifies problems according to the algorithms
required to solve them. The theory looks at the minimum time and space
required to solve the hardest instance of the problem on a Turing machine--a
finite-state machine with an infinite read/write tape of memory. It turns out
that a Turing machine is a realistic model of computation on real computers.
Problems that can be solved with polynomial-time algorithms are called
"tractable" because they can usually be solved in a reasonable amount of time
for reasonably sized inputs. (The exact definition of "reasonable" is
dependent on circumstance.) Problems that cannot be solved in polynomial time
are "intractable," because calculating their solution becomes infeasible.
Intractable problems are sometimes just called "hard" because they are.
It gets even worse. Alan Turing proved that some problems are undecidable. It
is impossible to write an algorithm to solve them, let alone a polynomial-time
algorithm. 
Problems can be divided into complexity classes, depending on the complexity
of their solutions. On the bottom, class P consists of all problems that can
be solved in polynomial time. Class NP consists of all problems that can be
solved in polynomial time on a nondeterministic Turing machine. This is a
variant of a normal Turing machine that makes guesses. The machine guesses the
solution to the problem and checks its guess in polynomial time.
Mathematically, you assume the nondeterministic Turing machine always guesses
correctly. In practice, of course, this doesn't happen.
Class NP includes class P, because any problem solvable in polynomial time on
a deterministic Turing Machine is also solvable in polynomial time on a
nondeterministic Turing Machine. If all NP problems are solvable in polynomial
time on a deterministic machine, then P=NP. Although it seems incredibly
obvious that some problems are much harder than others (a brute-force attack
against a cipher versus encryption of a random block of plaintext), it has
never been proven that P≠NP. Mathematicians suspect that this is the case,
though.
Stranger still, if you find a polynomial-time solution for several specific NP
problems, then you've found a polynomial-time solution for a whole class of NP
problems. A famous mathematical result proves that no NP problem is harder
than one particular NP problem (the satisfiability problem). But some NP
problems have been proven to be just as hard. This means that if the
satisfiability problem is solvable in polynomial time, then all NP problems
are solvable in polynomial time, and if any NP problem is proved to be
intractable, then the satisfiability problem is also intractable. Since then,
numerous problems have been shown to be equivalent to the satisfiability
problem. This club of maximally hard NP problems is NP-complete. If any
NP-complete problem is in P, then every NP-complete problem is in P, and P=NP.
NP-complete problems are considered the hardest problems in NP. Currently, the
fastest known algorithms for solving any of them have exponential worst-case
complexities. When (or, more realistically, if) someone finds a
polynomial-time solution to any of them, it will be a major breakthrough in
mathematics.
Further out in the complexity hierarchy is PSPACE, in which problems can be
solved in polynomial space, but not necessarily polynomial time. PSPACE
includes NP, but there are problems in PSPACE that are thought to be harder
than NP. Of course, this isn't proven either. There is a class of problems,
called "PSPACE-complete," such that if any one of them is in NP, then
PSPACE=NP; if any one of them is in P, then PSPACE=P. Finally, there is the
class of problems called EXPTIME, problems solvable in exponential time.


NP-Complete Problems 


In Computers and Intractability: A Guide to the Theory of NP-Completeness
(W.H. Freeman, 1979) Michael Garey and David Johnson compiled over 300
NP-complete problems, among them: 
3-Way Marriage Problem. There is a roomful of n men, n women, and n members of
the clergy (priests, rabbis, and the like). There is also a list of acceptable
marriages, which consists of one man, one woman, and one clergyman willing to
officiate. Given this list of possible triples, is it possible to arrange n
marriages such that everyone is either marrying one person or officiating in
one marriage?
Knapsack Problem. Given a pile of items, each with a different weight, is it
possible to put some of those items into a knapsack such that the knapsack
weighs a given amount?
3-Way Satisfiability. There is a list of n logical statements, each with three
literals: if (x and y), then z; (x and w) or (not z); if ((not u and not x) or
(z and (u or not x))), then ((not z and u) or x); and so on. Is there a truth
assignment for all the literals that satisfies all the statements? 
Like the traveling-salesman problem, the best algorithms for all these
problems are exponential. No one knows any polynomial-time algorithm to solve
these problems. No one knows that there isn't a polynomial-time algorithm to
solve these problems. However, if you find a polynomial-time algorithm to
solve one of these problems, then you've found a polynomial-time algorithm to
solve all of them and proven that P=NP. Start looking.
Figure 1 The traveling-salesman problem.
Figure 2 Data structure for the traveling-salesman problem.
Table 1: Running times of different classes of algorithms.
Complexity Class n=10 n=20 n=30 n=40 n=50 n=60

Constant    O(1)     .000001 s.  .000001 s.  .000001 s.  .000001 s.  .000001 s.  .000001 s.
Linear      O(n)     .00001 s.   .00002 s.   .00003 s.   .00004 s.   .00005 s.   .00006 s.
Quadratic   O(n^2)   .0001 s.    .0004 s.    .0009 s.    .0016 s.    .0025 s.    .0036 s.
Cubic       O(n^3)   .001 s.     .008 s.     .027 s.     .064 s.     .125 s.     .216 s.
Exponential O(2^n)   .001 s.     1.0 s.      17.9 min.   12.7 days   35.7 years  366 cent.
Exponential O(3^n)   .059 s.     58 min.     6.5 years   3855 cent.  2x10^8 cent. 1.3x10^13 cent.
s. = seconds
cent. = centuries






















































September, 1994
UNDOCUMENTED CORNER


The Windows Global EMM Import Interface




Taku Okazaki


Taku Okazaki ("taQ") is a freelance programmer in the suburbs of Tokyo. He has
worked on software for the NEC PC-980x series. You can contact Taku on
NIFTY:CXB00750 (CXB00750@niftyserve.or.jp) or on CompuServe at 100213,3351.


Introduction 
by Andrew Schulman 
I have received a lot of requests for information about an obscure
undocumented interface that goes under a variety of names--the Windows/386
Paging Import specification, the Global EMM Import specification, V86MMGR
Paging Import, Paging Import/Export specification, and even Paging Import
Services specification, to name a few. 
As Taku explains this month, this interface is what the V86MMGR (Virtual-8086
Memory Manager) virtual device driver (VxD) in Windows Enhanced mode uses to
take over the page tables belonging to a 386 expanded memory manager (EMM)
such as EMM386, QEMM, 386MAX, or Helix NetRoom. The page tables include not
only expanded memory, but also XMS (extended memory) and UMBs (upper memory
blocks).
Perhaps one reason there's so much interest in this interface is that
Microsoft's Windows Device Driver Kit (DDK) mentions it in passing. For
example, the description of the _AddFreePhysPage service provided by the
Virtual Machine Manager (VMM) says "the V86MMGR device adds any unused
physical pages it finds when using the Global EMM Import function of a 386
LIMulator" (a LIMulator is a paging EMM driver, and the term "Global EMM"
refers to the fact that the EMM was present before Windows loaded and is
therefore visible in all virtual machines). Similarly, the documentation for
the V86MMGR_GetPgStatus service refers to "paging import from a
LIMulator/UMBulator" (an UMBulator--ridiculous name!--is a paging XMS driver).
Naturally, avid readers of the DDK wonder what these passing remarks refer to.
Microsoft has a document ("Windows/386 Paging Import Specification") available
to a select number of memory-manager vendors, although at least one vendor,
Novell, was unable to get the latest (Version 1.11) revision of the
specification. Consequently, Novell had to "DIY" (do it yourself), at least
according to an engineer at Novell's European Development Centre in
Hungerford, England. 
Taku's EMMIMP program (available electronically, see page 3) simply dumps out
the EMM import structure in a readable form. The V86MMGR device in Windows
actually uses the structure to import an external-memory manager's page
tables. V86MMGR calls the VMM _MapPhysToLinear service to access the
218Ch-byte EMM import structure, then processes the structure using VMM
services like _PhysIntoV86, _AddFreePhysPage, _Assign_Device_V86_Pages, and
Set_Physical_HMA_Alias. These services are all documented in the DDK. Thus,
the Global EMM Import is basically an undocumented interface that allows those
with access to do something everyone else can do by writing a VxD.
If you have comments or suggestions for this column, please contact me on
CompuServe in the Undocumented Corner area of the Dr. Dobb's CompuServe forum
(GO DDJFORUM), where my ID is 76320,302.
A couple of years ago, I wrote a piece of code for my NEC PC-9800 that jumped
from real mode into protected mode, then came back in virtual 8086 (V86) mode.
It did nothing but slow down the system, and only a system reset could get rid
of it. It was effectively a V86 monitor. I started playing around with the
program, adding bits of this and bytes of that. Soon I wasn't playing with it
but working on it, and for the following months, quite a bit of my real work
was compromised. In the end, I had a full-blown memory manager that supported
EMS, XMS, UMB, BIOS ROM remapping, VCPI, and DPMI--almost everything a memory
manager needs, with one exception: It couldn't run Windows.
Even though in Japan Windows is not as popular on the PC-9800 as on the PC, I
knew that if I were going to sell my V86 memory manager, distribute it as
shareware, or do anything else with that half-year's worth of work, my memory
manager had to run Windows. 
A V86 memory manager like mine runs at the most-privileged level (Ring 0) of
protected mode. Windows Enhanced mode also wants to run in Ring 0. Only one
program at a time can run in Ring 0, so when Windows starts up, the memory
manager must give up control of protected mode.
The Windows DDK documents how Windows can take control of protected mode away
from a memory manager. The memory manager can hook INT 2Fh; when Windows
starts, it calls INT 2Fh function 1605h, and the memory manager can return the
address of an "Enable/Disable Virtual 8086 Mode callback function" in DS:SI.
Function 1605h is a very overloaded interface: You can also use it to tell
Windows whether it can start up, to identify instance data (see "Undocumented
Corner," DDJ, April 1994), and to identify virtual device drivers (VxDs) that
need loading.
On startup, Windows calls the memory manager's V86 enable/disable function
with AX=0. The memory manager switches the CPU from V86 mode into real mode.
Windows can then enter protected mode. When Windows exits, it sets the CPU
into real mode, and calls the function with AX=1. The memory manager then
regains control of protected mode.
So all I needed to do was give my memory manager a mode-switch entry point,
and return its address in DS:SI when Windows called INT 2Fh function 1605h.
Easy! Now, did Windows run? Partially.
Everything looked okay, except there were no upper memory blocks (UMBs) and no
expanded memory (EMS) in the DOS box, even though they were there before
Windows started! Why did they seem to disappear under Windows?
I should have expected this. A memory manager uses the 386 paging mechanism to
create UMBs and virtual EMS. My memory manager knew which physical pages were
used to do that, but I hadn't told any of this to Windows. 
One way to carry over UMBs and EMS into Windows would be to use a VxD. But a
simple experiment with Microsoft's memory manager, EMM386, showed that I
didn't need a VxD. The VxD associated with EMM386 is embedded right inside
EMM386.EXE, so if I booted with EMM386 installed and then temporarily renamed
EMM386.EXE, Windows wouldn't be able to load the VxD. If the EMM386.EXE file
is missing, EMM386 will tell Windows (via INT 2Fh function 1605h, of course)
not to load. However, when I created a dummy file named EMM386.EXE that didn't
contain the requisite VxD, Windows still started. Furthermore, EMS and the
UMBs were in the DOS box. 
A VxD by itself was not essential to carrying over EMS and UMBs to Windows,
but what was? Searching for documentation was futile, but reverse engineering
revealed an undocumented interface between memory managers and Windows that is
responsible for carrying over EMS/UMB to Windows. The interface, as I later
found out, is called the "Global EMM Import Specification."


The EMM Import Specification


The interface itself is relatively simple. When the V86MMGR (V86 Memory
Manager) VxD in Windows 386 Enhanced mode starts up, it generates an IOCTL
read to "EMMXXXX0", the EMS device, which returns the physical address of a
block of memory; see Figure 1. This block of memory is called the "EMM import
structure" and contains information Windows can use to take over the memory
manager's page tables and support its own virtual EMS/XMS/UMB drivers. 
Figure 2 shows the layout of the EMM import structure; Listing One is the EMM
import as a set of C structures. The EMM import is described in more detail in
EMMIMP.H, available electronically; see "Availability," page 3.
The structure starts with a header, which includes one of three version
numbers. Version 1.00 does not support information on UMBs or XMS free memory;
1.10 does; and 1.11 supplies UMB and XMS information plus the memory manager's
vendor and product names. Windows 3.1 can handle all versions, but Windows 3.0
could handle only 1.00. Memory managers handed UMBs over to Windows 3.0
through VxDs. 
Frame descriptors provide Windows with a snapshot of memory status below 1
megabyte. There are 64 frame descriptors, one for each 16K of memory (a
frame). A 386 page is 4K, so each frame represents four pages. In Figure 2,
"Type" indicates either a normal frame, an EMS page frame, or a frame
containing a UMB. For an EMS page frame, "Physical Page #" is the EMS physical
page number, and "Handle #" and "Logical Page #" are the EMS handle and
logical page numbers mapped to the frame, respectively. For a UMB frame, the
"Handle #" instead acts as an index into UMB frame descriptors. A UMB
descriptor contains four physical page numbers of a UMB frame. 
The Pagemap Physical Address field of an EMS descriptor holds the physical
address of an array of 386 page-table values, describing which physical pages
are owned by the EMS handle. 
The Global EMM Import structure also supplies the real-mode INT 67h vector so
that Windows can install a V86 breakpoint. Windows replaces the first byte of
the real-mode INT 67h handler with an ARPL instruction (63h). When a program
calls the INT 67h handler, an Illegal Opcode Exception (INT 6) is generated,
so Windows can trap these calls.
The HMA page table physical address in Figure 2 points to a table of 386
page-table values that make up the Higher Memory Area (HMA). This table is
necessary when HMA consists of pages higher than 1MB+64KB. Windows adds these
pages to its memory pool (via the VMM _AddFreePhysPage service) or calls
Set_Physical_HMA_Alias for each page, depending on the value of Flag0 in the
header. 
Windows uses free page lists to add memory to its pool of free memory (again,
via _AddFreePhysPage). For memory managers that don't share EMS and XMS
memory, this list can be used to provide free EMS memory to Windows. (All XMS
memory is allocated by Windows when it starts up.)
Bit 0 of the Flag field in an XMS descriptor indicates whether or not the
handle is allocated prior to Windows startup. If bit 0 is 1, then the handle
is free, and Windows can use the handle for its XMS driver. If the memory
manager itself handles XMS calls while Windows is running, this array could be
empty.


Reaching for the EMM Import Structure


Earlier, I stated that the EMM import structure address is returned by an
IOCTL read to "EMMXXXX0". But what if this device does not exist? A memory
manager with EMS disabled (such as EMM386 with "NOEMS" or QEMM with
"EMS=NONE") does not install "EMMXXXX0" because EMS is not active. Instead, a
device with a slightly different name ("EMMQXXX0" for EMM386 and QEMM,
"QMMXXXX0" for 386MAX) is installed. 
Windows does not try to IOCTL-read all these devices if it can't find
"EMMXXXX0"; it issues the IOCTL-read to "EMMXXXX0" only. When memory managers
receive a call to INT 2Fh function 1605h, they rename the device to
"EMMXXXX0". The IOCTL-read to "EMMXXXX0" is issued afterwards, so Windows need
not know the actual device name. Memory managers restore the no-EMS device
name when INT 2Fh function 1606h (the Windows exit broadcast) is detected; see
Table 1.
Issuing INT 2Fh AX=1605h and IOCTL-reading the "EMMXXXX0" device gives you the
physical address to the EMM import structure. Simply hex-dumping the structure
would most likely yield garbage, especially if you have not run Windows since
your last boot. 
The reason is simple: Memory managers don't set up the EMM import structure at
the time of the IOCTL read or the INT 2Fh function 1605h call. They don't set
up the structure until the disable V86-mode switch function is called. Why?
Remember that the EMM import structure is supposed to be a snapshot of memory
usage. It must reflect the status of memory just before Windows gets control
of protected mode. Many programs intercept INT 2Fh, and memory managers are
towards the end of the INT 2Fh chain (because they are one of the first
installed in CONFIG.SYS). If the EMM import structure were set up at INT 2Fh
other programs could map EMS memory or allocate/free/resize EMS/XMS handles,
all of which would make the structure stale. The same reasoning applies to
IOCTL-read. Thus, to get the EMM import structure, we must call the memory
manager's V86 enable/disable routine. 
Calling the routine to disable V86 mode puts the machine into real mode and
leaves the system in a very unstable state. When the CPU is in real mode, its
paging mechanism is disabled so there are no UMBs or EMS during that time. If
a TSR or driver that hooks an interrupt uses EMS, it probably won't work
properly; if it lives in a UMB, it will simply disappear. The interrupt chain
still points at that vanished code, so any interrupt--hardware or
software--would be disastrous. For this
reason, when you call the entry point, you must:
1. Disable interrupts.
2. Call the V86 enable/disable routine with AX=0, to put the machine into
real mode.
3. Immediately call the routine again with AX=1, to restore V86 mode.
This will reveal the EMM import structure. (Well, at least with EMM386 and
QEMM. 386MAX requires another twist, described later.)


The EMMIMP Program


EMMIMP is a DOS program that dumps the EMM import structure in a readable
form. EMMIMP.C (Listing Two) basically emulates Windows' interaction with 386
memory managers.
The emulate_win_init and emulate_win_term routines in EMMIMP.C emulate INT 2Fh
functions 1605h and 1606h, respectively. Emulate_win_init receives the
parameter ver, which specifies the version of Windows it should pretend to be
(300h for 3.0, 30Ah for 3.1). Emulate_win_init also returns the mode-switch
entry-point address. Get_emm_imp_addr issues an IOCTL-read to "EMMXXXX0" to
get the physical address of the EMM import structure. Switch_to_real and
switch_to_prot handle calls to the mode-switch entry point. Note that some
crucial registers are saved before calling the mode-switch entry point, since
register preservation is not guaranteed across the call. 
The EMM import structure is somewhere in extended memory, so for the EMMIMP (a
real-mode program) to read it, the program must copy the structure to
conventional memory. MOVEEXT.C (Listing Three) has three functions for the
job: one for the PC (INT 15h function 87h), another for the NEC PC-9800 (INT
1Fh function 90h), and a third for 386MAX.
Figure 3 shows an excerpt of EMMIMP output when running under EMM386 Version
4.45. The output shows EMS and UMB frame descriptors. Frames 0x10 through 0x27
(4000:0--9C00:0) are large-frame EMS physical pages. Frames 0x32 through 0x37
(C800:0--DC00:0) are UMB frames with indexes into UMB descriptors. Frames 0x38
through 0x3b (E000:0--EC00:0) are EMS frames, with frame 0x38 mapped to EMS
handle 1, page 1. The UMB descriptors show where the UMB memory comes from.
For example, the 16K of UMB memory at C800:0 (frame 0x32) has UMB descriptor
index 0, which in turn shows that the frame maps into physical pages 120h
through 123h.
There are two EMS handles: 0, the system handle; and 1, named "test." After
each EMS handle is a dump of the page map, showing the memory owned by the
handle. For example, logical page 0 of EMS handle 1 corresponds to pages 150h
through 153h.
The free-page list shows pages not used by the memory manager while Windows is
running. This is information you can't get using the EMS, XMS, or VCPI
interface. In Figure 3, there are 52 free pages starting from 15Ch.


386MAX


Just as with any memory manager, most of 386MAX's system code and data are in
extended memory, but trying to copy them into a conventional-memory buffer with
INT 15h function 87h won't work. 386MAX will either refuse to copy anything or
(in the case of the EMM import structure) will fill the buffer with 0s.
One way to access the system area of 386MAX is to enter protected mode via
VCPI. This involves setting up the GDT, IDT, and page tables--just to copy
some bytes of extended memory. And since the CPU has already been switched
into real mode to refresh the EMM import structure, you must enter protected
mode directly. This, after all, is what Windows is doing, albeit on a much
grander scale.
Using INT 15h function 87h won't work in real mode because 386MAX's INT 15h
handler is in a UMB, which disappears in real mode. The move_ext_mem_real
routine in Listing Three enters protected mode and copies memory. Since you
don't do any far calls and interrupts are disabled, there is no need for code
segments or an IDT. Only a GDT with a minimum of segments is created.
Table 1: Interaction of Windows and memory managers, (a) at Windows startup;
(b) at Windows exit.
     Windows                       Memory Manager
 (a) INT 2Fh AX=1605h              Set device name to "EMMXXXX0".
                                   Return mode-switch entry-point address.
     IOCTL-read "EMMXXXX0"         Return physical address of EMM import
                                   structure.
     Call mode-switch              Set up the EMM import structure.
     entry point (AX=0)            Set CPU to real mode.
 ...Windows runs...
 (b) Set CPU to real mode
     Call mode-switch              Regain protected mode.
     entry point (AX=1)
     INT 2Fh AX=1606h              Restore device name.
Figure 1: Getting the physical address of the EMM import structure.
#include <fcntl.h>
#include <io.h>

#pragma pack(1)
typedef struct {
 unsigned long addr;
 unsigned char maj, min;
 } EMM_IMPORT_ADDR; // must be 6 bytes
unsigned long get_emm_import_addr(void)
{
 unsigned long addr = 0;
 EMM_IMPORT_ADDR impaddr;
 int emm = open("EMMXXXX0", O_RDONLY);
 if (emm == -1)
 return 0L; // fail: no EMM
 impaddr.addr = 1; // set first byte to 1 before the read (cf. Listing Two)
 #define IOCTLREAD 2
 // IOCTL read returns the number of bytes read, or -1 on error
 if (ioctl(emm, IOCTLREAD, (void far *) &impaddr, sizeof(impaddr)) > 0)
 addr = impaddr.addr;
 close(emm);
 return addr; // physical address
}
Figure 2: Layout of the EMM import structure.
Header: (all versions)

 byte Flag0
 byte ???
 word Size of struct (bytes)
 word Version #
 dword ???
Snapshot of memory status below 1MB: (all versions)
 array of 64 frame descriptors
 a frame descriptor is:
 byte Type
 byte Handle #
 word Logical Page #
 byte Physical Page #
 byte Flag
 followed by:
 byte ???
UMB information: (all versions)
 byte # of UMB frame descriptors
 array of UMB frame descriptors
 a UMB frame descriptor is an array of 4 dwords
 each dword is 386 physical page number
EMS handle information: (all versions)
 byte # of EMS handle descriptors
 array of EMS handle descriptors
 where an EMS handle descriptor is:
 byte Handle #
 byte Flag
 byte Handlename[8]
 word EMS pages owned by handle
 dword Pagemap Physical Address
free memory information: (version 1.10, 1.11)
 dword Real mode INT 67h vector
 dword HMA page table physical address
 byte # of free page list
 array of free page list
 where a free page list is:
 dword Physical page #
 dword # of free pages
XMS information: (version 1.10, 1.11)
 byte # of XMS handle descriptors
 array of XMS handle descriptors
 where an XMS handle descriptor is:
 word Handle #
 word Flag
 word Size (KB)
 dword Base Address
free UMB information: (version 1.10, 1.11)
 byte # of free UMB descriptors
 array of free UMB descriptors
 where a free UMB descriptor is:
 word Segment
 word Paragraphs
Product information: (version 1.11)
 vendor name of the memory manager
 product name of the memory manager
Figure 3: EMMIMP output on EMM386 (excerpt).
C:\> emmimp
mode switch entry:03af0992
emm import structure address:00119000
flag0:0x4

size:0x23c bytes
version:0x10b
frame[0x10] (4000:0):large EMS (phys page 04)
 ...
frame[0x27] (9c00:0):large EMS (phys page 27)
frame[0x32] (c800:0):UMB/c800/c900/ca00/cb00/umb desc index:0
frame[0x33] (cc00:0):UMB/cc00/cd00/ce00/cf00/umb desc index:1
frame[0x34] (d000:0):UMB/d000/d100/d200/d300/umb desc index:2
frame[0x35] (d400:0):UMB/d400/d500/d600/d700/umb desc index:3
frame[0x36] (d800:0):UMB/d800/d900/da00/db00/umb desc index:4
frame[0x37] (dc00:0):UMB/dc00/dd00/de00/df00/umb desc index:5
frame[0x38] (e000:0):EMS (phys page 00)/mapped to handle 1 page 1
frame[0x39] (e400:0):EMS (phys page 01)
frame[0x3a] (e800:0):EMS (phys page 02)
frame[0x3b] (ec00:0):EMS (phys page 03)
resv2:0x74
# of umb desc:6
umbdesc[ 0]:00000120 00000121 00000122 00000123
umbdesc[ 1]:00000124 00000125 00000126 00000127
umbdesc[ 2]:00000128 00000129 0000012a 0000012b
umbdesc[ 3]:0000012c 0000012d 0000012e 0000012f
umbdesc[ 4]:00000130 00000131 00000132 00000133
umbdesc[ 5]:00000134 00000135 00000136 00000137
EMS handle 0: name="", 24 EMS pages, pagemap at 0011b000
 log page[0]:00040267 00041067 00042067 00043067
 ...
 log page[17]:0009c267 0009d067 0009e067 0009f067
EMS handle 1: name="test", 3 EMS pages, pagemap at 0011b180
 log page[0]:00150267 00151067 00152067 00153067
 log page[1]:00154267 00155067 00156067 00157067
 log page[2]:00158267 00159067 0015a067 0015b067
realmode int 67 vector:03af:02b0
hma_page_table_addr:00000000
# of free page lists:1
free page list[00]:page 0000015c, 52 pages
# of XMS info:0
# of umb free seg:1
umb free seg:c93a, 0x16c6 paragraphs
maker:"MICROSOFT "
product:"EMM386 4.45 "

Listing One 

/* emmimp.h, EMM import structure -- copyright (c) taQ 1994 */
typedef unsigned long uint32; /* double word */
typedef unsigned short uint16; /* word */
typedef unsigned char uint8; /* byte */

typedef struct {
 uint8 type;
 uint8 handle;
 uint16 logpage;
 uint8 physpg_num;
 uint8 flag;
} emmimp_frame_desc_t; /* frame descriptors */
typedef struct {
 uint8 flag0;
 uint8 resv0;
 uint16 size;

 uint16 version;
 uint32 resv1;
 emmimp_frame_desc_t frame[64];
 uint8 resv2;
 } emmimp_hdr_t; /* header */
typedef struct {
 uint32 physpgnum[4];
} emmimp_umb_desc_t; /* UMB descriptor */
typedef struct {
 uint8 handle;
 uint8 flag;
 uint8 hndlname[8];
 uint16 pages; 
 uint32 pgmap_phys_addr;
} emmimp_ems_desc_t; /* EMS descriptor */
typedef struct {
 uint32 physpgnum;
 uint32 numpg;
} free_page_list_t; /* free page list */
typedef struct {
 uint16 handle;
 uint16 flag;
 uint32 kb; 
 uint32 baseaddr;
 } emmimp_xms_desc_t; /* XMS descriptor */
typedef struct {
 uint16 startseg;
 uint16 parasize;
} free_umb_desc_t; /* free UMB descriptor */



Listing Two 

/* emmimp.c, dumping the EMM import structure -- copyright (c) taQ 1994 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dos.h>
#include <io.h>
#include <fcntl.h>
#include "emmimp.h"
#include "moveext.h"

#ifndef __TURBOC__
#define asm _asm
#endif

static int for_386max = 0;
static int raw_dump = 0;
static int machine;

static void emulate_win_init(uint16 ver, void (far **entryp)(void))
{
 void (far *entry)(void);

 asm push ds
 asm push es
 asm mov dx, 0

 asm mov ax, 0
 asm mov es, ax
 asm mov ds, ax
 asm mov di, ver
 asm mov si, 0
 asm mov cx, 0
 asm mov bx, 0
 asm mov ax, 1605h
 asm int 2fh
 asm mov word ptr entry, si
 asm mov word ptr entry + 2, ds
 asm pop es
 asm pop ds
 *entryp = entry;
}
static void emulate_win_term(void)
{
 asm mov ax, 1606h
 asm int 2fh
}
static uint32 get_emm_imp_addr(void)
{
 uint8 buf[6];
 int fd;

 fd = open("EMMXXXX0", O_RDONLY | O_BINARY);
 if (fd < 0) {
 printf("can't open EMMXXXX0\n");
 return 0;
 }
 memset(buf, 0, sizeof buf);
 buf[0] = 1;
 asm lea dx, word ptr buf
 asm mov cx, 6
 asm mov bx, fd
 asm mov ax, 4402h
 asm int 21h
 asm jc fail
 printf("emm import structure address:%08lx\n", *(uint32 *) buf);
 printf("emm import structure version:%d.%02d\n", buf[4], buf[5]);
 close(fd);
 return *(uint32 *) buf;
fail:
 printf("emm import struct address get failed\n");
 close(fd);
 return 0;
}
static void switch_to_real(void (far *entry)(void))
{
 asm push ds
 asm push es
 asm push di
 asm push si
 asm push bp
 asm mov ax, 0
 asm call dword ptr entry;
 asm pop bp
 asm pop si
 asm pop di

 asm pop es
 asm pop ds
}
static void switch_to_prot(void (far *entry)(void))
{
 asm push ds
 asm push es
 asm push di
 asm push si
 asm push bp
 asm mov ax, 1
 asm call dword ptr entry;
 asm pop bp
 asm pop si
 asm pop di
 asm pop es
 asm pop ds
}
static uint16 winver = 0x30a; /* Windows version */
main(int ac, uint8 **av)
{
 void (far *entry)(void);
 int i;
 uint32 emm_imp_addr;
 static uint8 buf[8192];

 get_options(ac, av); // not shown
 if (!for_386max)
 machine = machine_type();
 emulate_win_init(winver, &entry);
 if (!entry) {
 printf("illegal mode switch entry\n");
 emulate_win_term();
 return 1;
 }
 printf("mode switch entry:%08lx\n", entry);

 emm_imp_addr = get_emm_imp_addr();

 if (!emm_imp_addr) {
 emulate_win_term();
 return 1;
 }
 if (!for_386max) {
 disable(); /* disable interrupts while in real mode */
 switch_to_real(entry);
 switch_to_prot(entry);
 enable(); /* now Ok to enable interrupts */
 emulate_win_term();
 if (machine == MC_98)
 move_ext_mem_98(emm_imp_addr, buf, sizeof buf);
 else
 move_ext_mem_pc(emm_imp_addr, buf, sizeof buf);
 dump_emm_imp(emm_imp_addr, buf);
 } else {
 disable();
 switch_to_real(entry);
 move_ext_mem_real(emm_imp_addr, buf, sizeof buf);
 switch_to_prot(entry);

 enable();
 emulate_win_term();
 dump_emm_imp(emm_imp_addr, buf);
 }
 return 0;
}



Listing Three 

/* moveext.c, extended memory move -- copyright (c) taQ 1994 */
#pragma inline
#include <stdio.h>
#include <string.h>
#include <dos.h>
#include "moveext.h"

#ifndef __TURBOC__
#define asm _asm
#endif

typedef unsigned short uint16; /* word */
typedef unsigned char uint8; /* byte */
/* segment descriptor */
typedef struct {
 uint16 lim0_15; /* Bits 0-15 of the segment limit */
 uint16 base0_15; /* Bits 0-15 of the segment base */
 uint8 base16_23; /* Bits 16-23 of the segment base */
 uint8 acc_byte; /* Access byte */
 uint8 lim16_19; /* Bits 16-19 of the segment limit
 plus some flags */
 uint8 base24_31; /* Bits 24-31 of the segment base */
} desc_t;
/* convert far pointer to address */
static uint32 fp_to_addr(void far *p)
{
 uint32 addr = FP_SEG(p);
 addr <<= 4;
 addr += FP_OFF(p);
 return addr;
}
static void set_data_gdt(desc_t *gdtp, uint32 addr, uint16 limit)
{
 gdtp->lim0_15 = limit;
 gdtp->base0_15 = addr;
 gdtp->base16_23 = (addr >> 16) & 0xff;
 gdtp->base24_31 = (addr >> 24) & 0xff;
 gdtp->acc_byte = 0x93; /* data, expand up, r/w, dpl = 0 */
 gdtp->lim16_19 = 0;
}
/* move extended memory, PC-AT */
int move_ext_mem_pc(uint32 srcaddr, void far *dest, uint16 bytes)
{
 void far *p;
 static desc_t gdt[6]; /* 10h: src descriptor, 18h: dest */

 memset(gdt, 0, sizeof gdt);
 /* set src & dest descriptors */

 set_data_gdt(gdt + 2, srcaddr, bytes);
 set_data_gdt(gdt + 3, fp_to_addr(dest), bytes);

 p = &gdt;
 asm les si, p
 asm mov ah, 87h
 asm mov cx, bytes
 asm inc cx
 asm shr cx, 1
 asm int 15h
 /* error check omitted */
 return 1;
}
/* move extended memory, PC-9800 */
int move_ext_mem_98(uint32 srcaddr, void far *dest, uint16 bytes)
{
 desc_t far *p;
 static desc_t gdt[6]; /* 10h: src descriptor, 18h: dest */

 memset(gdt, 0, sizeof gdt);
 /* set src & dest descriptors */
 set_data_gdt(gdt + 2, srcaddr, 0xffff);
 set_data_gdt(gdt + 3, fp_to_addr(dest), 0xffff);

 p = gdt;
 asm mov cx, bytes
 asm les bx, p
 asm mov si, 0
 asm mov di, 0
 asm mov ah, 90h
 asm clc
 asm int 1fh
 asm jnc _OK
 return 0;
_OK:
 return 1;
}
typedef struct {
 uint16 len;
 uint32 physaddr;
 uint16 filler;
} gdt_val_t;
/* move extended memory, generic, from real mode, assumes interrupts
disabled*/
int move_ext_mem_real(uint32 srcaddr, void far *dest, 
 uint16 bytes)
{
 gdt_val_t gdt_val;
 static desc_t gdt[3]; /* 8h: src descriptor, 10h: dest */

 memset(gdt, 0, sizeof gdt);
 /* set src & dest descriptors, limit must be 0xffff */
 set_data_gdt(gdt + 1, srcaddr, 0xffff);
 set_data_gdt(gdt + 2, fp_to_addr(dest), 0xffff);

 gdt_val.len = sizeof gdt;
 gdt_val.physaddr = fp_to_addr(&gdt);
 gdt_val.filler = 0;

 asm push ds

 asm push es
 asm push si
 asm push di
 asm .386p
 /* load gdt */
 asm lgdt fword ptr gdt_val
 /* enter protected mode */
 asm mov eax, cr0
 asm or eax, 1
 asm mov cr0, eax
 /* ds <- src descriptor */
 asm mov ax, 8h
 asm mov ds, ax
 asm xor si, si
 /* es <- dest descriptor */
 asm mov ax, 10h
 asm mov es, ax
 asm xor di, di
 asm mov cx, bytes
 asm cld
 asm rep movsb
 /* back to real mode */
 asm mov eax, cr0
 asm and eax, not 1
 asm mov cr0, eax
 asm .8086
 asm pop di
 asm pop si
 asm pop es
 asm pop ds
 return 1;
}
int machine_type(void)
{
 uint16 rc;
 union REGS regs;
 rc = MC_UNKNOWN; /* unknown */
 /* do PC-AT get realtime clock call, NOP interrupt in PC-9800 */
 regs.x.cx = 0;
 regs.h.ah = 4;
 int86(0x1a, &regs, &regs);
 /* if ch contains current century, then PC-AT */
 if (regs.h.ch == 0x19 || regs.h.ch == 0x20)
 rc = MC_PC; /* PC-AT */
 else
 rc = MC_98; /* PC-98 */
 return rc;
}


September, 1994
PROGRAMMER'S BOOKSHELF


Writing Windows Virtual Device Drivers




Walter Oney


Walter is a freelance developer and software consultant based in Boston. He
specializes in system tools and in interfacing complex applications to
Windows, NT, and DOS. Walter can be reached on CompuServe at 73730,553.


The concept of a virtual device driver arose in Windows 3.0 386 Enhanced mode
as a way to "virtualize" hardware devices so that multiple DOS and Windows
applications could share them. If I type on the keyboard, for example, my
keystrokes might at one time belong to the active Windows application, and at
another, to a character-mode program running in a DOS box. Microsoft's
designers built a multitasking operating system--WIN386.EXE--around the idea
of "virtual machines," a familiar paradigm for academics and others who had
long ago used IBM's own CP-67 (later VM/370). Handling the virtual keyboard
attached to a virtual machine calls for a virtual keyboard device (VKD) which
can direct the actual keystrokes from physical hardware to the correct program
in such a way that each consumer of keystrokes believes it's dealing directly
with hardware. Handling some other hypothetical "x" device calls for a Virtual
"x" Device--a VxD (in the cyberspeak shorthand of the folks from Redmond).
Since VxDs are 32-bit, flat-model programs running in the same privileged
ring-0 world of the true operating system, programmers who need to do hardcore
systems programming will gravitate toward this level of programming. Do you
need to control math-coprocessor emulation in a 3.0 system, where the DPMI
0Exx series of services hadn't yet been implemented? Simple. Just write a VxD
that intercepts software interrupt 31h and provides the necessary
virtualization of the processor's CR0 control register. Do you want to provide
demand paging of executables for a 32-bit Windows extender? A VxD is part of
the solution.
Writing virtual device drivers is generally the arcane specialty of trained
stunt programmers--and David Thielen and Bryan Woodruff's Writing Windows
Virtual Device Drivers does nothing to dispel that notion. Organizationally,
the book shows great initial promise. The first three sections of the book,
comprising 13 chapters and 170 pages, contain tutorial material aimed at
teaching how to combine the myriad of possible services into usable
components. The remaining three quarters of the book contain reference
material, including register-by-register instructions about how to use those
services. The reference material duplicates Microsoft's own documentation but,
at least in the section on VMM services, follows an obvious, alphabetic plan
that seems to have escaped the Microsoft writers as a preferable organizing
principle. The book breaks down, unfortunately, in precisely the area of
tutorial exposition for which potential readers have been thirsting for years.
Regrettably, Thielen and Woodruff don't develop the theme of why anyone might
want or need to write a VxD in any straightforward way. In the first two
pages, you are confronted with: the abbreviation VxD without any explanation
of where the "x" comes from; the gratuitous proposal that VMM (mysteriously
equated to WIN386.EXE without further explanation) might launch COMMAND.COM
instead of KRNL386.EXE (which is what, exactly?); advice not to tamper
directly with the IDT; the acronym IRQ; and many other low-level concepts that
belong somewhere in the book, but not right at the start. As a programmer
who's written many VxDs and who teaches VxD programming on occasion, I wasn't
startled by anything in this introductory material, of course; but right from
the start, I knew that this is a book by and for programmers who aren't afraid
to trap I/O ports, deal with a virtualized programmable interrupt controller,
or impale themselves on the "suicide" fence of device initialization.
If the introduction made me feel that I had stepped into the middle of a
manuscript, the ensuing discussion of the mechanics of building a VxD left me
hopelessly confused. The Microsoft DDK's Virtual Device Adaptation Guide
explains that a VxD can contain a real-mode initialization part, a
protected-mode initialization part, a set of handlers for noteworthy events in
the life of a virtual machine, and a collection of service routines for use by
other VxDs and by application programs. Using macros in VMM.INC (a DDK
component), a programmer creates an assembly-language program that contains
one USE16 segment for real-mode initialization and several USE32 code and data
segments for everything else. You assemble the program nowadays with MASM 6.1
and link it with a LINK386 left over from the early beta program of OS/2 2.0.
The resulting LE-format file is then postprocessed to make it usable by the
VxD loader in WIN386 and by the WDEB386 debugger. None of the actual mechanics
of building a VxD are discussed in Writing Windows Virtual Device Drivers,
however. Even a make script with some minimal commentary would be helpful. 
The potentially complex subject of real-mode initialization becomes two
paragraphs in the book. The first paragraph reminds you that it's actually
V86-mode initialization, if you happen to be using a memory manager such as
EMM386, 386Max, or QEMM. The other paragraph supplies the information that
certain segment sizes cause unspecified "problems" within VMM. I was glad to
know this (although I would have appreciated more information about what the
"problems" were, so I could diagnose failures in my own code better), but I've
never had a real-mode initialization section that was large enough to trip on
the restrictions. On the other hand, I didn't read about any of the things
I've actually done with real-mode initializers. How would I learn the
potential benefits of the 2F/1607 device callout function? How to claim owned
pages, prevent a duplicate driver from being loaded, or halt the startup of
Windows altogether? How would I learn about passing reference data to the
protected-mode initialization part of the driver, or about communicating with
a TSR whose 2F/1605 hook caused me to be loaded?
Thielen and Woodruff have fallen into the common trap of programmer/authors,
in which they assume that their readers know almost as much about the subject
as they do. Hence, they leave out many steps of reasoning and explanation.
Since their readers won't, in fact, know very much about the subject (why buy
the book otherwise?), this becomes the book's major failing. Another example,
drawn from Chapter 3 on memory management, illustrates the problem:
The MMGR manages instance data for VMs. Instance data is a range of V86
address space that VMM maintains separately for each VM. It is used frequently
for MS-DOS and some TSRs.
 For example, if an MS-DOS device driver maintains an input buffer, it may be
useful to have the buffered input directed to the VM that was active when the
buffer was filled. In this case, the VxD would query the device driver for the
buffer address and maximum size and add an instance data area as shown here...
This text, which is the entire description provided of instance data, is
followed by an example (in C) that calls the authors' own VMM_AddInstanceItem
function.
I find several problems with this snippet, primarily in regard to what isn't
said. "Instance data" is data that must be private to each virtual machine
even though it has the same real-mode address in every machine. This is
conceptually similar to automatic data in a reentrant subroutine or to thread
local storage in NT. The buffer used by DOSKEY is a good example: It won't do
for a command typed in one DOS box to show up in the recall buffer of another.
The virtualized video RAM is another good example. VMM implements instance
data differently, depending on how large the instanced area is. If an entire
page is instanced, each VM has its own physical page buffer that gets mapped
into the V86 region at the appropriate common address. If a data area smaller
than a page is instanced, VMM marks the containing page "not present" on every
VM switch, and it thereafter copies the per-VM information if the page is ever
touched. You never learn about these implementation details, however.
You also never learn about one of the most important ways of establishing a
region of instanced address space: the response to the 2F/1605 startup
broadcast. These details appear, to be sure, in an appendix that discusses the
important INT 2F interface. But a novice needs an indication about how to use
vocabulary like 2F/1605 in the sentences and paragraphs of a real application.
This example alludes to directing input to the right virtual machine, which
seems to indicate the 2F/1685 (switch virtual machines and call back)
interface created originally for network vendors. While this interface is
undoubtedly implicated in some instance-data situations as well, it's surely
not the main feature requiring explanation. In any case, the code sample
doesn't address VM switching anyway.
The code sample itself provides another small bone of contention. In
principle, you should be able to write a VxD in either assembler or any
high-level language for which you can find a 32-bit compiler. (Visual C++
32-bit edition would not be a good choice because its COFF-format output is
unintelligible to the aging LINK386 linker.) VMM employs a dynamic linking
scheme that uses data in the instruction stream which is then replaced as
links are "snapped" to their run-time locations. Assembly language is the only
way to achieve the linkage, and an assembly language header file (VMM.INC) is
the only official place for finding the right macros and equates. Any
high-level interface for VxD writers must speak to this issue, and the authors
have provided their own C-callable API for this purpose. Thus,
VMM_AddInstanceItem is a C wrapper for the _AddInstanceItem service found in
VMM.INC. Unfortunately, I find this particular sample, as well as all of the
others written in C, too cluttered for expository purposes. The complication
of interfacing in C with extraordinarily long function names subsumes the
logic of the program, especially when the code contains adequate error
handling. In this situation, I think the sufficiently evocative assembly
language macros in VMM.INC would more clearly express the ideas.
Writing Windows Virtual Device Drivers is nonetheless a plausible addition to
the bookshelf of an experienced VxD writer. The authors have unique insight
into infrequently visited areas, like direct memory access (DMA) programming.
Their exposition of how they implemented a C-callable interface for VxDs, a
communications driver, and an inter-VM linkage driver makes interesting
reading that won't fit in a magazine format. And the reorganized and reprinted
reference material is independently useful for people like me who prefer hard
copy to electronic media. As I said at the outset, however, I fear that this
book's terseness makes it virtually unusable for beginners.
Writing Windows Virtual Device Drivers
David Thielen & Bryan Woodruff
Addison-Wesley, 1994, 650 pp. $39.95
ISBN 0-201-62706-X



September, 1994
SWAINE'S FLAMES


Rhymes for Our Times


Borland ate
Ashton-Tate,
Learned that it was poison, but by then it was too late.
***
All praise to earnest Albert Gore,
The Veep.
He's working through the night while you're
Asleep.
Gonna build that highway,
Gonna build it my way,
Says straight, unbending Albert Gore,
The Veep.
But Al, is your way open, free, or
Cheap?
***
Kaleida had a kalision with Spindler's bottom line.
When Michael makes decisions, executives resign.
Heads of two divisions fell under Spindler's frown.
As well as lesser mortals: The shake-up trickles down.
***
Last year Utah-based Novell
Planned, when it bought USL,
To shake the Windows, rattle Gates.
Float UnixWare on the Great Salt Lake.
But new exec Bob Frankenberg
Thinks desktop Unix is absurd,
So former CEO Ray Noorda's
Plan floats dead upon the waters.
***
A bug in a chip that's already been shipped
Is rarely a thing to applaud.
But with Bill saying, "Trust us," 'twas poetic justice
When the Clipper was shown to be flawed.
***
Richard Nixon, scum of the earth,
Was buried with honors in the land of his birth.
Kurt Cobain, the voice of our youth,
Considered a bullet the pathway to truth.
Isn't it time that somebody said,
It's not always wrong to speak ill of the dead?
***
Their RS systems sales were brisk,
So IBM thought, "Run the RISC!"
Apple beat them to the punch.
Will Compaq have them both for lunch?
***
Microsoft lost! Though it's hard to believe,
They're making the payments; there'll be no reprieve.
The judge gave his verdict, to anonymous cheers.
A million a month for almost four years.
And Stac Electronics has reason to grin:

They're now growing fat from making files thin.
But somewhere in Redmond there's code now in test
For sending out royalty payments compressed.
Michael Swaine, poet-at-large


September, 1994
OF INTEREST
Micro Focus Cobol 3.2 for SCO UNIX has been optimized for 486 and Pentium
processors. Also optimized is the Cobol SDK for 32-bit OS/2 and Windows NT for
Intel-architecture-based workstations and servers. The technology involves
specific optimizations to take advantage of the instruction scheduling and
pipelining features of the Pentium processor, as well as general optimization
and optimal instruction ordering of compiled code for both the Intel 486 and
Pentium processors.
Micro Focus Cobol 3.2 for SCO UNIX is priced at $1250.00. The Micro Focus
Cobol SDK for 32-bit OS/2 and Windows NT sells for $2500.00. Reader service
no. 20. 
Micro Focus
2465 East Bayshore Road
Palo Alto, CA 94303
415-856-4161
Readers interested in the Beta programming language described in the October,
1993 issue of DDJ might be equally interested in the Mjølner Beta Newsletter
from Mjølner Informatics, a Beta vendor. The newsletter provides news and
techniques for Beta programmers. Sample issues are provided upon request;
contact newsletter@mjolner.dk. Reader service no. 21.
Mjølner Informatics
Science Park Aarhus
Gustav Wieds Vej 10
DK-8000 Aarhus C, Denmark
+45-86-20-20-00
Former DDJ columnist Michael Abrash has published his Zen of Code
Optimization, which covers code optimization in C, C++, and assembly. The
book, published by the Coriolis Group, covers techniques from loop unrolling
to Boyer-Moore string searching. The Zen of Code Optimization also covers
techniques for optimizing C programs, as well as speeding up C with inline
assembly. Abrash examines how hardware programming affects code performance,
discusses how cycles can be eaten up by the display adapter, dynamic RAM and
the prefetch queue, and provides detailed coverage of the 386/486/Pentium
processors. The book includes the Zen timer, a tool for measuring code
performance; it sells for $39.95. ISBN 1-883577-03-9. Reader service no. 22.
The Coriolis Group 
7721 East Gray Road, Suite 204
Scottsdale, AZ 85260
602-438-0192
The Cross Platform Toolset is, as its name suggests, a new cross-platform
development tool from Visual Edge. The initial release of the tool is hosted
on SunOS 4.1.3 and targeted for Windows 3.1. The software is an add-on to
Visual Edge's UIM/X GUI builder for OSF/Motif. 
With the Cross Platform Toolset, objects can be implemented using everything
from Motif widgets to Visual Basic VBXs, allowing you to work with whichever
tools you're familiar with. The Cross Platform Toolset sells for $2500.00.
Reader service no. 23.
Visual Edge Software
3950 Côte Vertu
Saint Laurent, PQ
Canada H4R 1V4
514-332-6430
Interactive Software Engineering (ISE) has released Personal Eiffel for
Windows, an implementation of the Eiffel language which supports incremental
compilation under Windows 3.1, though the environment is character-based. The
package provides several hundred precompiled classes, including base classes
to support basic data structures and algorithms, lex, and parser classes.
Personal Eiffel for Windows also comes with a variety of tools to manage
classes, including browsers and documentation tools. 
Professional Eiffel for Windows, on the other hand, adds the ability to
directly call C functions and generate a portable C package from an Eiffel
program. Both versions of Eiffel for Windows support the NICE standard
specification, and they are fully source-code compatible with UNIX, VMS, and
NextStep implementations. ISE claims that it will release GUI versions of the
compiler that support the Windows interface, providing both CASE tools and
graphics classes later this year. Personal Eiffel sells for $49.95. Pricing
for Professional Eiffel for Windows is expected to be approximately $500.00.
Reader service no. 24.
Interactive Software Engineering
270 Storke Road, Suite 7 
Goleta, CA 93117
805-685-1006
In other Eiffel news, Tower Technology has ported its TowerEiffel to OS/2.
TowerEiffel includes an Eiffel 3 compiler and an integrated emacs-based
environment with browsing and documentation tools, syntax-directed
highlighting, and auto-indentation. TowerEiffel is also available for Sun
SPARC under SunOS 4.1.x and Solaris 2.3.
The TowerEiffel system supports Eiffel and C++ interoperability, allowing C++
objects to be invoked from Eiffel, and vice versa. The compiler will also
optionally generate C++ code. A commercial license for TowerEiffel for OS/2
sells for $1295.00. TowerEiffel is available to individuals for noncommercial
use at a 70 percent discount; full-time students taking an approved Eiffel
class can receive an 80 percent discount; and university professors can
receive a 90 percent discount. Reader service no. 25.
Tower Technology
3300 Bee Caves Road, Suite 650
Austin, TX 78746
512-452-9455
Iterated Systems has announced a new fractal-image format called "Fractal
Paper." Based on the company's Fractal Transform process, which compresses an
image to a fraction of its original size while maintaining high image quality,
Fractal Paper extends fractally compressed files to third-party toolkits and
provides image protection through encrypted-content password and comment
fields.
The first developer toolkit to adopt Fractal Paper is the yet-to-be-released
Lotus SmarText Release 3. Lotus will include the fractal-image decompressor
within its SmarText Builder and Viewer software. Developers interested in
using fractal compression within a Lotus SmarText title can purchase Images
Incorporated, Iterated's fractal file compressor, and license Fractal Paper
from Iterated. Reader service no. 26.
Iterated Systems
5550-A Peachtree Parkway
Norcross, GA 30092
404-840-0310
Xing Technology has released its XingCD 1.0, CD-ROM MPEG compression software
for Windows. XingCD produces MPEG video streams from .TGA, .AVI, or .BMP files
and creates MPEG audio streams from .WAV files. It also interleaves audio and
video into an MPEG-compatible system stream.
The company claims that XingCD is the only available software that lets you
easily create full-screen MPEG video streams from a variety of source files,
including video captured with AVI-compliant motion JPEG boards, 3-D
animations, and image files. 
XingCD can create full-motion video streams from 32x32 to 352x288 resolution.
Furthermore, compression and interleaving jobs can be queued up to run in the
background, allowing users to continue to work in other applications. XingCD
supports full I, B, and P frames, and full MPEG I system-stream syntax. MPEG
streams encoded by XingCD play back on any MPEG playback hardware. The
software sells for $995.00. Reader service no. 27.
Xing Technology 
1540 W. Branch St.
Arroyo Grande, CA 93420-1818
805-473-0145
RSA Data Security has released, free-of-charge, its RIPEM/SIG
digital-signature encryption software that's built on top of RSA's RSAREF
cryptography toolkit (also available at no charge). The software is released
for both private use and commercial development. Furthermore, RSA has received
permission from the State Department to allow the software to be freely
exported. 
RIPEM was written by Mark Riordan of Michigan State University using the
RSAREF toolkit. A Macintosh version was developed by Ray Lau of MIT. Versions
are available for Macintosh, DOS, UNIX, and other popular platforms. To
receive RSAREF and RIPEM/SIG, send e-mail to RSAREF@rsa.com or download it via
anonymous ftp from ripem@ripem.msu.edu. Reader service no. 28.
RSA Data Security
100 Marine World Pkwy.
Redwood Shores, CA 94065
415-595-8782
ObjectPro from Trinzic is a Windows-hosted, object-oriented database
development tool which, the company claims, combines the power of languages
such as C/C++ with the ease of use of rapid-prototyping languages. During
initial development phases, ObjectPro functions as an interpreter. When the
app is ready for production, ObjectPro translates the code into C, then
compiles and links it into a series of deliverable DLLs. Among the databases
ObjectPro supports are Oracle, Sybase, Ingres, Informix, DB2/2, dBase, Access,
FoxPro, and others via an ODBC interface. ObjectPro sells for $2995.00. Reader
service no. 29.

Trinzic
101 University Ave.
Palo Alto, CA 94301
415-328-9595
MediaShop 1.0 for Windows, an open-architecture, component-based multimedia
authoring and delivery tool, has been released by Motion Works.
MediaShop provides a multimedia-specific API along with links to Visual Basic,
Visual C++, and other languages that support VBXs. MediaShop ships with a
royalty-free run-time player. Its animation file format is compatible with
CorelMove from Corel Corporation and Asymetrix's Multimedia ToolBook 3.0.
MediaShop sells for $695.00. Reader service no. 30.
Motion Works 
1020 Main Land, Suite 130
Vancouver, BC
Canada V6B 2T4 
604-685-9975
The DynaZip 2.0 Data Compression Toolkit for Windows has been released by
Inner Media. This toolkit lets your programs read, test, create, modify, and
write standard ZIP files without shelling to DOS. The toolkit provides
multiple-disk support,
encryption/decryption with user-defined passwords, and easy creation of
Windows-hosted, self-extracting files. Compressed files are compatible with
PKWare's PKZIP 2.04G.
The royalty-free DynaZip DLLs are compatible with C, C++, Visual Basic, and
other languages which support DLLs. DynaZip sells for $249.00. Reader service
no. 31.
Inner Media
60 Plain Road
Hollis, NH 03049
603-465-3216


October, 1994
EDITORIAL


Seeing Isn't Believing Anymore


If Waldo Richardson only knew what the paparazzi were up to these days, he'd
be, well, spinning in his darkroom. Waldo, who lived all his 90-plus years a
few creeks over from where I grew up, was an early-20th century optimist
enamored with the high technology of his day--steam engines, magnetos, and the
like. In particular, he was fascinated with photography, and from about 1900
to 1910 he made a living chronicling life as it was in our Ozark hills. Among
my few prizes, in fact, is a collection of Waldo's 4x5-inch glass negatives
which he gave me shortly before he died. Even with the boxy,
mahogany-and-brass view camera and Sears and Roebuck mail-order chemicals he
used, the images are sharp and the contrasts stunning. From Fourth-of-July
picnics and the Florence German Marching Band to steam engines and the Self
Chapel baseball team, Waldo captured a true history of life tucked away in the
slipstream of time.
When examining old glass negatives like these, you know that what you see is
what you had--real people living real lives--because, as the saying goes,
photographs don't lie. But for better or worse, digital technology has changed
all that. From the malicious manipulation of O.J. Simpson's mug shot on the
cover of Time magazine to the marvelous subtlety of the movie Forrest Gump,
photographs don't necessarily tell the truth anymore. 
Of course, there's nothing new about photo manipulation. Since George
Eastman's first Kodak snapshot, photographers have manipulated images using
light and
shadow in both the field and the darkroom. But today's
photographers--including Michael Carr, who takes the photos you see on the
cover of DDJ--spend more time peering at computer monitors than squinting
through viewfinders. 
This goes to the heart of the biggest difference between today's digital
manipulation and yesteryear's cut-and-paste photo fraud--you just can't tell
the difference anymore. (You didn't really think that legless Vietnam vet in
Forrest Gump was a real-life double amputee, did you?) As movie producer Steve
Starkey recently said, "We're finding that digital technology's great
breakthrough isn't in mind-boggling effects, but in its ability to heighten
the drama of a scene in very subtle ways."
It's one thing to get lost in the fantasy of a movie theater or annoyed by
spin doctoring of news magazines. It is quite another, however, when
digital-image manipulation threatens the foundations of basic research. In
laboratories which rely upon scientific images for evaluation and approval,
computer photography is old hat, as scientists routinely use it to record
everything from cell counts to molecular structures. Consequently, says Mt.
Sinai School of Medicine's Paul Anderson in Science magazine, "the opportunity
for adjusting the photographic representation to fit the hypothesis" is always
there. 
There are proposals to combat unethical and/or illegal doctoring of digital
data. In systems devised by JPL researcher Gary Friedman, image-verification
cameras use the NIST's DSS and RSA Data Security's BSAFE or TIPEM public-key
authentication software to verify that an image hasn't been tampered with.
Other approaches involve time/date stamping using similar algorithms. 
Still, researchers like Friedman see the courtroom as the real battleground
for digital-imagery authenticity. The day a lawyer challenges the truthfulness
of a photograph introduced as evidence is, without a doubt, upon us. The
intelligence-gathering community has already run head-on into this palaver. In
a U.N. debate over whether the U.S. justifiably shot down a Libyan fighter,
U.S. diplomats offered photos taken from the U.S. plane's video system that
clearly showed missiles under the Libyan fighter's wings. Libya countered that
the photos had been manipulated--and the U.S. couldn't prove otherwise.
Ultimately, we're again faced with the double-edged sword of technological
progress--it can make our life better, or diminish it. As for images and
photographs, we'll simply stop believing what we see. If this forces us to
question more and accept less, so much the better. But if the cost is a
society which is perpetually suspicious and hardened by skepticism, you have
to hope progress is worth it. 
Then again, digital-image manipulation would let us see what Jeff Duntemann
would look like with a slicked-back pompadour, or Swaine with a shave and
flat-top. And, if the truth were known, I wouldn't mind shedding 10 pounds.
Maybe there's something to this technology stuff after all.
Jonathan Erickson
editor-in-chief


October, 1994
LETTERS


More on Fractal Rulers


Dear DDJ,
I was interested by the "Algorithm Alley" column about fractal rulers (DDJ,
June 1994), particularly because Tom Swan sped up the algorithm by replacing
recursion with an explicit stack. I was not convinced that this was the best
way, especially as recursion may be faster than implementing the stack
manually.
I noticed that the pattern of the lengths of the markings of the ruler is the
same as the number of 0s on the right-hand end of a binary number; see Example
1(a).
You can clear all but the least significant set bit of a number by calculating
n & -n. This leaves you with a power of two, but the ruler markings should
increase in size linearly. You could take the logarithm of n & -n, but this
isn't very fast; you could test each bit in turn until you find one that's
set, but that isn't very elegant; or you could consult alt.hackers on Usenet.
Somebody will reply, reminding you that for such a small problem size, you
can't beat a lookup table. (Thanks to David Jones at dej@eecg.toronto.edu for
that one!) So the ruler algorithm is like that in Example 1(b). This should be
more efficient, but instead of demonstrating recursion elimination, it shows
that you should look for a better algorithm before optimizing.
Anthony Finch
Bristol, England 
Dear DDJ,
I enjoyed Tom Swan's "Algorithm Alley" column on drawing a ruler, but I should
point out that he took the hard approach. If you number the ticks starting
with 1, then it can be seen that the length of each tick (up to a maximum) is
simply one more than the number of low-order 0 bits in the tick number. Taking
that observation, and using C instead of Pascal (shudder), the routine could
have been written as Example 2, which does not involve either recursion or
stack usage. The code uses line() and label(). The line() function is the same
as in Tom's article; label() was not mentioned there, but something had to
draw the numbers! Also, I have assumed that both line() and label() are zero
relative and that the top variable and the charHeight variable are set
elsewhere. As you can also see, there is substantially less code than in the
Pascal version, although there are additional variable definitions (some of
which were left undefined in the Pascal version).
Michael Lee Finney
Lynchburg, Virginia


The Fuzzy-Logic Language Model


Dear DDJ,
The trouble with fuzzy logic is that it isn't--logic, that is. Fuzzy, maybe.
Calling it "logic" at best confuses the issue, and at worst sounds like
scammery.
Fuzzy logic, at least the way it seems to be used today, is much more like a
computer language--a translation product that allows the programmer to produce
a program in high-level constructs, which are then translated into actual
machine codes. Even when assembly language is used, this is still the
approach, except that translation is done manually.
Briefly, the fuzzy-logic language model is as follows:
 Inputs are characterized in real quantities as fuzzy things: The sensor range
from 70 to 100 might be "fast," and a range from 0 to 80 might be "slow." The
degree of fast and slow is variable, and both can be somewhat true at the same
time. These things are controlled by relatively simple input functions which,
apparently, come in different flavors.
 A set of fuzzy "rules" is used to combine these fuzzy inputs and produce
fuzzy outputs, somewhat analogously to ordinary logic, except that the exact
method used is not fixed. Apparently there are varying strategies available,
which have their fans and likely applications. "Fast" and "heavy" inputs might
trigger a fuzzy "brakes" output, for instance.
 The fuzzy outputs are "de-fuzzified." "Hi-power" might signify that a value
of 200 should be applied to a digital-to-analog converter that controls a
motor.
There is nothing particularly "logical" about this process that I can discern,
anymore than there is anything especially logical about the translation of a C
program into assembly language. (By the way, I learned all this useful stuff,
and therefore became an immediate expert on fuzzy logic, by using the
fuzzyTECH 3.1 MCU-166 Explorer Edition toolkit from Inform Software
Corporation, 1840 Oak Avenue, Evanston, IL 60201, 800-929-2815. I don't know
what other platforms and/or demo packages they offer. The one I used was for
the Siemens 80C166 processor and probably wouldn't appeal to many, but the
product is an excellent introduction to fuzzy logic.)
I am not antifuzzy. To emphasize my initial point, I feel referring to this
stuff as "logic" or "a logic" is confusing at best, and misleading at worst.
As a language tool, this thing is probably quite useful for the
embedded-control applications for which it is in fact used. Will it or
something like it also solve the mysteries of consciousness and the future of
mankind? I don't think so.
Gregor Owen 
CompuServe 71121,625


NT Services


Dear DDJ,
I enjoyed reading Marshall Brain's article "Inside Windows NT Services" (DDJ,
May 1994). However, I did want to alert Marshall and DDJ readers to a
typographical error in the Windows NT SDK concerning NT services. The
prototypes for the ServiceMain and ControlHandler functions in the include
file WINSVC.H should be prototyped to use the WINAPI (or CALLBACK) calling
convention. See Example 3. 
The WINAPI and CALLBACK macros are defined in the NT SDK as __stdcall. The
ServiceMain and ControlHandler functions must use the __stdcall calling
convention. Failure to use the correct calling conventions will result in an
improperly functioning service. The next version of the NT SDK will correctly
prototype these functions.
Peter Santoro
Microsoft Consulting Service
Example 1: An alternative fractal-ruler algorithm.
(a)
 0 = 0000 -> 4 -> -----
 1 = 0001 -> 0 -> -
 2 = 0010 -> 1 -> --
 3 = 0011 -> 0 -> -
 4 = 0100 -> 2 -> ---
 5 = 0101 -> 0 -> -
 6 = 0110 -> 1 -> --
 7 = 0111 -> 0 -> -
 8 = 1000 -> 3 -> ----
 9 = 1001 -> 0 -> -
 10 = 1010 -> 1 -> --
 11 = 1011 -> 0 -> -
 12 = 1100 -> 2 -> ---

 13 = 1101 -> 0 -> -
 14 = 1110 -> 1 -> --
 15 = 1111 -> 0 -> -
(b)
 void ruler(double length, /*of ruler (screen units) */
 int subdivs, /* in an inch (power of 2) */
 double step, /* for each subdivision (screen units) */
 double min_mark_len, /* length of smallest mark (ditto) */
 double mark_incr, /* difference in length of marks (ditto) */
 double long_mark_len,/* length of the inch markers (ditto) */
 double num_size) /* for the inch numbers (do I really need to
 say it again?) */
 {
 double mark_pos, mark_len;
 int counter, inch;
 const char zeros[64] = {
 6, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
 4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
 5, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
 4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
 };
 assert(subdivs == 1 || subdivs == 2 || subdivs == 4 ||
 subdivs == 8 || subdivs == 16 || subdivs == 32 ||
 subdivs == 64);
 inch = 0;
 counter = subdivs;
 for(mark_pos = 0; mark_pos < length; mark_pos += step)
 {
 if(counter == subdivs)
 {
 line(mark_pos, 0, mark_pos, long_mark_len);
 plot_num(inch, mark_pos, long_mark_len, num_size);
 counter = 1;
 ++inch;
 }
 else
 {
 mark_len = min_mark_len + zeros[counter] * mark_incr;
 line(mark_pos, 0, mark_pos, mark_len);
 ++counter;
 }
 }
 }
Example 2: More on fractal rulers.
// assumed variables and functions...
extern unsigned top;
extern unsigned charHeight;
void line(unsigned x1, unsigned y1, unsigned x2, unsigned y2);
void label(unsigned x, unsigned y, char *s);
void ruler(
 unsigned segments, // Number of complete segments
 unsigned partialTicks) // Must be in range 0..ticksPerSegment-1
 {
 unsigned const tickIncrement = 1;
 unsigned const ticksPerSegment = 16; // Must be power of two.

 unsigned const tickLength[ticksPerSegment] =
 {1 * tickIncrement,
 2 * tickIncrement, 0,
 3 * tickIncrement, 0, 0, 0,
 4 * tickIncrement, 0, 0, 0, 0, 0, 0, 0,
 5 * tickIncrement};
 unsigned ticks = ticksPerSegment * segments + partialTicks;
 unsigned length;
 char s[16];
 for (unsigned i = 0; i < ticks; i++)
 {
 length = tickLength[min(i ^ (i + 1), ticksPerSegment - 1)];
 line(i, top, i, top - length);
 if (i && (i % ticksPerSegment == 0))
 label(i, top - length - charHeight, itoa(i / ticksPerSegment,
 s, 10));
 }
 }
Example 3: NT services.
// Function prototype for the service main function
typedef VOID (WINAPI *LPSERVICE_MAIN_FUNCTIONW)(
 DWORD dwNumServicesArgs,
 LPWSTR *lpServiceArgVectors
 );
typedef VOID (WINAPI *LPSERVICE_MAIN_FUNCTIONA)(
 DWORD dwNumServicesArgs,
 LPSTR *lpServiceArgVectors
 );
// Function prototype for the service control handler
typedef VOID (WINAPI *LPHANDLER_FUNCTION)(
 DWORD dwControl
 );


October, 1994
InterOperable Objects


Laying the foundation for distributed-object computing




Mark Betz


Mark is a senior consultant with Semaphore Training, a consulting and training
company specializing in object technology, client/server development, and
distributed computing. Mark can be contacted at 76605.2346@compuserve.com.


Distributed-object computing is swiftly shaping up as the next
computer-industry battleground. Unlike recent skirmishes involving multimedia
or pen-based computing, distributed-object computing will touch everyone
working in the modern enterprise. This is because distributed computing
represents the direction in which the broad mainstream of information
technology will likely evolve, as mainframe-based, enterprise-wide systems
link up with desktop PCs and departmental servers.
At the heart of distributed computing are "interoperable" or "component"
objects that go beyond the traditional boundaries imposed by programming
languages, process address space, or network interfaces (see "Component Object
Wars Heat Up," by Ray Valdés, Dr. Dobb's Developer Update, May 1994). What
makes an object interoperable is the design of its object model. In this
article, I'll survey a set of complex, rapidly changing object technologies
and examine the major object-model designs. Despite its length, this
discussion is incomplete. Because the technologies I'll discuss here span the
entire range of computing from system- and application-level to
network-oriented software and languages, a myriad of technical details cannot
be addressed. In many cases, the technologies in competition for both market-
and mindshare are not directly comparable--too many apples and oranges are in
the mix. In other cases, many technologies are in flux, some being nothing
more than design ideas and press releases. For a few well-publicized
contenders, not much information is available to programmers beyond an initial
white paper or two. For other contenders, the implementations are very real,
but the prospects of a small company influencing the mainstream are remote.


Roots of Distributed Computing


Distributed processing is a natural outgrowth of the computer-hardware
industry's ability to produce smaller, more powerful, and
less expensive CPUs. On a per-transaction basis, simple business applications
are usually less expensive when executing on smaller computers than on their
larger cousins. However, this benefit is reduced--and may become
negative--when large, complex applications are pulled off mainframes.
In addition to these trends in hardware, the nature of business-application
end users has evolved toward more local autonomy (and attendant
responsibility) in the definition and automation of processes. The typical
computing installation of the 1970s was a centralized computing resource that
housed data and applications accessed by users connected via telecommunications
links; today's systems are more accurately described as a network of data
resources and local processors, all cooperating to provide for the flow of
information between the components of an enterprise. The current realization
of this model is called "client/server" because it consists mostly of data
models residing on servers and applications (clients) residing on local
processors that access the server-based data across a local- or wide-area
network. 
Distributed processing envisions sharing applications and functionality in the
same way that data is currently shared. However, instead of having monolithic
applications, as in the current model, local applications in a
distributed-processing environment take on the role of controllers: They
coordinate the activity of "functions" which provide not only data, but the
code to manipulate it. Code and data are provided in a form that can be
accessed by multiple processors in a heterogeneous environment, without regard
to physical location. For example, a forecasting application with access to
sales data from a server might need sophisticated statistical functions to
make its predictions. Rather than building the statistical functions into the
application, the application would locate a statistics package on the network
and ship the data off as part of a request for certain processing services. 
In short, the principal requirements for a distributed-computing environment
are: 
The application must be able to locate the processing capability it requires.
The application must be able to send parameters and data to, and receive
results from, the process. 
The parameters and results of the process must be meaningful on various
machine architectures within the environment.
The process must be usable in different implementation environments and
languages. 
There are most certainly other requirements. For instance, this definition
says nothing about security, interface description, how information is
transmitted, or a host of other topics. Nevertheless, it provides the absolute
basics without which distributed computing simply will not work well. Not all
the technologies I'll discuss here satisfy all these requirements, and in
fact, the requirements can serve as boundary markers for grouping the various
offerings.


Distributed Objects 


How is the idea of distributed objects different from the basic ideas of
distributed processing? Simply put, objects are an enabling technology for
distributed systems, just as they have been for client/server, multimedia,
document processing, and other applications. In the case of distributed
systems, the concerns--naming, address-space conversion, transport protocols,
interface description, and the like--are many. Each of these areas is complex,
and objects are at their best when helping us to abstract and deal with
complexity. 
Not only do objects provide a tractable way of organizing the complexity of a
modern operating system, they can also simplify distributed processing.
Objects, with their natural combination of data and behavior and strict
separation of interface from implementation, make a neat, useful package for
distributing data and processes to end-user applications. In
previous-generation structured approaches, a process is considered separately
from the data it acts upon, complicating the issue of where to locate each in
the design of a distributed system. Objects are complete entities from a
problem domain. For example, a video-server object with a formally described
interface that internally maintains all the state and data needed to perform
its task fits in as any other object would, with the added (and hopefully
transparent) consideration that it is not necessarily located in the local
address space. 


Components are Not Object Models


One area of ongoing confusion is the difference between application-level
component technologies and the object models that support them. At the lowest
level in Figure 1 are "object models" such as the System Object Model
(SOM/DSOM) from IBM and Component Object Model (COM) from Microsoft. These
system-level technologies are basically intended to solve the problem of tight
binary coupling between an application and the objects it relies upon.
Consider a C++ application that uses several classes whose methods are
implemented in DLLs. Conveniently, a DLL is not part of the application's code
and so can be upgraded or altered without affecting the application, as long
as the interface remains the same. Unfortunately, this idea falls apart if
changes are made to a DLL-based class that alter the size of the object, or
even just the layout of the virtual function table. In that case, the calling
application may need to be recompiled even though its source has not changed
textually. 
To sever the binary bond between client and server object implementations, an
object model is defined at a level of abstraction which subsumes the language
object model and renders it transparent. The model is usually expressed in
terms of an Interface Definition Language (IDL), which is processed
independently of the implementation language. IDLs predate object computing,
and many Remote Procedure Call (RPC) mechanisms (such as DCE) use them to
decide the calling protocol of a procedure. An IDL is a way of defining the
interface to a service; often, the mechanism generates stub code callable by
the application. On the implementation end, it describes the interface to the
code that executes the function and generates stubs which can be filled in to
perform the operation. The RPC mechanism bridges the gap between the two.
Since the IDL must be translated (or mapped) into the implementation language,
this approach also fulfills the language-independence requirement: The stubs
can be generated in any language for which the IDL has a mapping. 
The main difference between interface definitions for procedural mechanisms
and those for object models is that in the object model the interface is part
of a semantic construct that represents an object. Depending on the specific
model, this construct may have any or all of the characteristics and
advantages expected of objects, including encapsulation, inheritance, and
polymorphism. As an alternative to the static IDL approach of issuing
requests, many models offer a dynamic means of invoking requests. In dynamic
invocation, the interface is determined at run time, and the request data is
built up into a structure, which is passed to the system. Some implementations
offer both kinds of invocation.
Loose coupling of client and server objects would go a long way toward making
objects more reusable. Our ultimate goal, however, is the use of objects
across process address spaces, either within a single processor or across
multiple processors, in a heterogeneous networked environment. The systems and
proposed designs discussed here don't necessarily achieve that. Some systems
only work across process boundaries on a single machine, and some don't do
even that much. Others work seamlessly across multiple machines in a net, but
are subject to other constraints.


Compound Documents


High above the fray of the operating-system wars are issues of concern to
application designers and users. These are the component-integration
facilities as found in OLE (Microsoft), OpenDoc (Apple), and OpenStep (NeXT).
As Figure 1 shows, these facilities rely on lower-level object models in order
to implement functions such as linking and embedding, drag-and-drop, in-place
activation, and scripting. These facilities all revolve around a
"document-centric" end-user model for applications. In this model, a
"container" application serves as the framework for presenting the user with a
number of individual "objects" or components, each of which is self-contained
in terms of its data and the actions which can be taken on that data. Such
groupings of objects are often referred to as "compound documents," and so we
might refer to all of these technologies as "compound-document technologies."
Figure 2 depicts a typical example of a compound document containing text,
image, sound, and spreadsheet table objects. (For an introduction to compound
documents, see "Compound Documents" by Lowell Williams, DDJ, March 1993.)
Compound documents make a nice model for end-user utilization of shared and
distributed objects. They are compelling enough that most of the major vendors
of these technologies are either producing their own high-level integration
model or cooperating in the development of someone else's. Examples are
Microsoft's OLE 2.0, which utilizes COM as its enabler and is a shipping
product, and the OpenDoc consortium, whose technology will eventually rest
upon IBM's SOM and its fully distributed progeny DSOM. The application
frameworks being designed by Taligent (a startup funded by Apple, IBM, and HP)
will also rest on SOM/DSOM. 
Compound documents are not simply an end-user technology, however. They rely
upon the capabilities of objects to describe themselves to applications and
export their interfaces for use by those applications. In essence, these
objects are dynamically linked modules. Application developers are already
finding OLE 2.0 useful both for integration at the object level and for
allowing applications to export interfaces which can be dynamically linked to
by other applications.


Crossing Borders



The fundamental idea behind interoperable objects is to pass through existing
boundaries such as those in Figure 3. In today's model of object-oriented
programming, there is a tight binary coupling between an application and the
classes of objects it uses. In many mainstream applications, everything is
implemented in a single language running in a single process located on a
single machine under a single operating system. The first boundary to fall is
address space. An "interprocess object model" allows a process in one address
space to request the services of an object in another, or two processes to
share an object in a third address space. 
The next boundary is the machine. It requires only a short leap of the
imagination to move from the idea of objects shared across address spaces to
the notion of objects shared among many interconnected processors. This short
leap, however, spans a great deal of complexity. Any interprocess object model
must be able to translate the data associated with requests between memory
models. A technology which crosses the machine boundary must also locate the
server object, establish communication with it, pack up the request and
parameters and ship them off, then wait for the results, unpack and translate
them, and deliver them back to the application. This is the most basic
requirement. Add to it the increased need for security, versioning,
repositories, name-collision resolution, and a host of other details inherent
in distributing objects across a network and you have the makings of, not a
short leap of imagination, but a big hurdle of technological complexity. Only
those technologies that cross over to an interprocess/interprocessor model can
be called "distributed-object technologies."
Two other boundaries don't have much to do with whether a technology is
distributed or not, but they do affect the potential for its application in
the real world. These boundaries are the programming language and operating
system, both important practical considerations for enterprise-wide systems.
The ultimate goal is a model which allows objects written in any language to
be shared among applications written in any other language, running on any
machine in a network, and under any operating system.


The Combatants


The combatants lining up on the interoperable-object battleground range from
large, cross-industry consortia and established system vendors to small,
entrepreneurial software houses. Due to space constraints, I'll focus here on
those contenders most likely to affect the mainstream, much the way that
Windows has done. Unfortunately, many of the smaller contestants who have
developed proprietary solutions that actually work now will get only brief
mention. 
A bird's eye view of the battlefield reveals that the principal tug-of-war is
currently between the Object Management Group (OMG) and Microsoft. OMG is a
consortium of more than 300 hardware, software, and end-user companies,
including every heavyweight in the business and, nominally, Microsoft itself.
OMG was founded in 1989 by 11 companies including Digital, Hewlett-Packard,
HyperDesk, NCR, and SunSoft. Those companies, along with Object Design, were
authors of the "Common Object Request Broker Architecture" (CORBA)
specification Version 1.0, released in October 1991. It was followed in March
1992 by Version 1.1; the group is currently working on revision 2.0, due
sometime in 1994. CORBA specifies the architecture of an Object Request Broker
(ORB), whose job it is to enable and regulate interoperability between objects
and applications. ORB is part of a larger vision called the "Object Management
Architecture" (OMA).
It is more than passing strange to compare CORBA to Microsoft's OLE 2.0. Among
the aspects of OLE 2.0 is an application-level, component-integration
technology that has no real counterpart in the OMG world. OLE is built on a
foundation called the "Component Object Model" (COM) which performs some of
the same tasks as an ORB, but at a different scale, using different
techniques. Also, Microsoft's idea of an object model differs greatly from
that of the rest of the industry. In fact, the whole basis for comparing the
two technologies rests on the premise that in the future they will have
similar capabilities. 
If it were simply OMG versus Microsoft, this article would be much shorter.
There's more to the story, however. OMG's CORBA specification lays down the
plans for an architecture, but does not address implementation. In addition,
the spec itself leaves many areas undefined. The result is that, while you can
address CORBA's overall design and intent, when turning to real or promised
implementations, you are effectively faced with several proprietary
technologies. IBM, Digital, Hewlett-Packard, Iona, ExperSoft, and SunSoft all
have (or have planned) implementations of the CORBA spec.
IBM's System Object Model is one major CORBA-compliant implementation. In some
ways, the Microsoft versus OMG contest has evolved into a battle between
Microsoft and IBM. Both companies offer technologies that are now shipping;
both are engaged in trying to shift the loyalties of desktop users from the
competitor's operating system to the homegrown alternative; and both consider
their particular visions of shared components as strategic technologies which
will serve them well in the larger contest for the operating-system dollar. 
While the behemoths line up to do battle, a number of small companies have
been quietly producing tools that enable some set of the full capabilities of
distributed-object computing to be realized. Some of these tools are
proprietary, others are headed toward CORBA or COM compliance. Examples
include RDO from Isis, Snap from Template Software, SynchroWorks from Oberon,
ILOG Broker/Server from ILOG Inc., and OpenBase-SIP from Prism Technologies.
Each of these vendors has a shipping product, and testimonials from users who
say they are using it now to create distributed applications. That's more than
some of the larger companies I'll cover can claim, which is a bit of irony,
but there you have it.


OMG, OMA, and CORBA


The Object Management Group was founded in 1989 to adopt a standard for the
interoperation of software--specifically, object-oriented software--across
operating systems and platforms in a heterogeneous networked environment. CORBA
is a specification of an architecture and interface which allows applications
to make requests of objects in a transparent, independent manner, regardless
of language, operating system, or location. The nature of
objects--what they are and how they are created, destroyed, and
manipulated--is specified in the OMG object model, a part of the OMA.
The OMA spec is OMG's complete vision of the distributed environment. While
the CORBA spec focuses solely on the interaction of objects and the mechanisms
which enable it, OMA defines a broad architecture of services and
relationships within an environment, as well as the object and reference
models. As Figure 4 illustrates, OMA is built upon the ORB services defined by
CORBA which provide the interaction model for the architecture. The
environment is made richer with the addition of Object Services and Common
Facilities, both intended to serve as building blocks for assembling the
frameworks within which distributed solutions are built.
Object Services is an area covered by yet another OMG specification, Common
Object Services Specification (COSS), that defines a set of objects which
perform fundamental operations, such as lifecycle, naming, event, and
persistence services. The second stage of the COSS spec, expected late in
1994, defines relationships, externalization, transactions, and concurrency
control. Additional stages planned for the next two years will address issues
such as security, licensing, queries, and versioning. 
Common Facilities (CF) are the newest area of effort by the OMG. Unlike CORBA
and Object Services, which are low-level fundamental operations, the CF has an
application-level focus, and defines objects which provide key
workgroup-support functions: printing, mail, database queries, bulletin boards
and newsgroups, and compound documents. The OMG envisions this as the layer
most often used by developers working within a distributed environment. This
spec is also due sometime in 1994.


The OMG Object Model 


The CORBA specification describes the OMG object model, which underlies CORBA
and all of the OMA, as "classical": Clients send messages to servers, and a
message identifies an object and zero or more parameters to the request. The
OMG model strictly separates interface from implementation. The model itself
is concerned only with interfaces, to the extent that "interface" and "object
type" are synonymous. This approach is used by other technologies (such as
OLE) and results from the model's obligation to define the interface between
components regardless of their implementation language. 
In C++ programs, an object is identified by its unique memory address. In the
OMG model, objects are identified by "references"--an implementation-defined
type guaranteed to identify the same object each time the reference is used in
a request. The CORBA spec is silent on how references are implemented. ORB
vendors have implemented references as objects which carry enough descriptive
information about the referenced object to make them effectively unique. The
CORBA spec explicitly states that references are not guaranteed to be unique.
The OMG chose not to define a Universal Unique Identifier scheme in Version
1.1 of the specification because of concerns about management and interaction
with legacy applications that have a different idea of an object ID. The lack
of a universal means of "federating" (that is, making globally compatible) the
names used to reference objects is a failing that the OMG intends to address
in Version 2.0 of the specification. 
Objects in the OMG model have a life cycle: They are created and destroyed
dynamically in response to the issuance of requests. The specification does
not define a means of allowing the application to create and destroy objects;
however, vendors such as IBM have implemented this capability in their
versions. Objects can also participate in any of the normal types of
relationships, the most important perhaps being subtype-supertype
relationships. Multiple inheritance is also permitted, although in this sense
it is limited to interface inheritance only. Since the OMG model does not deal
with implementation, there is no provision in the spec for implementation
inheritance. Inheritance between object interfaces is specified syntactically
using the OMG's IDL. Nothing prevents the developer of a set of server objects
from using implementation inheritance in the design of the servers, but the
dependency is not made explicit in the Interface Definition syntax. The ORB is
unaware that a set of servers accessed through an interface hierarchy is also
related by implementation inheritance; this therefore becomes a maintenance
and management concern.
The OMG model has a strong concept of "types"--identifiable entities which
have an associated predicate defined over a set of values. Where the predicate
is true, the value is said to satisfy and be a member of the type. Types are
used to restrict and characterize operations. The two primary categories of
types in the object model are Basic and Constructed. Basic types are nonobject
types which represent fundamental data types: signed and unsigned short and
long integers, 32- and 64-bit IEEE floating-point numbers, ISO Latin-1
characters, Booleans, enums, strings, and a nonspecific type, any. In
addition, a special 8-bit data type, octet, is guaranteed not to undergo
conversion when transferred from one system to another.
Constructed types are more complex, higher-level entities, the most important
of which is the Interface type. An object is an "instance" of an Interface
type if it satisfies the set of operations defined by the type. An Interface
type is satisfied by any value which references an object that satisfies the
interface. Other types include Structs, Sequences, Unions, and Arrays. Structs
are pure data structures which operate much like C structs; Unions operate
like C unions. Sequences are a variable-length array type which may contain
any single type of object, including other Sequences. Arrays are fixed-length
arrays of a single type. Figure 5 illustrates the OMG-type hierarchy. 


The Architecture of an ORB


The job of the Object Request Broker is to manage the interaction between
client and server objects. This includes nearly all the responsibilities of a
distributed computing system already mentioned, from location and referencing
to "marshaling" of request parameters and results. To provide this capability,
the CORBA specification defines an architecture of interfaces, all of which
may be implemented in different ways by different vendors. Figure 6 depicts
the CORBA architecture, which consists of three specific components:
client-side interface, implementation-side interface, and ORB core. 
The client-side architecture provides clients with interfaces to the ORB and
to server objects. It consists of the Dynamic Invocation, IDL stub, and ORB
services interfaces. In general, the IDL stub interface comprises functions
generated based on IDL interface definitions and linked into the client
program. The function stubs represent a language mapping between the client
language and the ORB implementation. Thus, ORB capabilities can be made
available to clients written in any language for which stubs can be generated
from IDL specifications. There is currently an accepted language mapping for
C; mappings for C++ and Smalltalk are planned. All vendors of CORBA
implementations provide a C++ mapping based on a not-yet-approved OMG
proposal. The use of the stub interface brings the ORB right into the
application programmer's domain: The client interacts with server objects by
invoking functions, just as it would for local objects.
The Dynamic Invocation interface is a mechanism for specifying requests at run
time, rather than calling linked-in stubs. The dynamic interface is necessary
when the object interface cannot be known at compile time. It is accessed
using a call (or series of calls) to the ORB in which the object, request, and
parameters are specified. The client code is responsible for specifying the
types of the parameters and expected results. This information may come from
an Interface Repository, about which more will be said later. Most clients
will probably use stubs to access object services. In any case, the receiver
of the request--the server object--cannot tell whether the request was sent
via the stub or dynamic interfaces. 
The last of the client-side interfaces are the ORB services, functions of the
ORB which may be accessed directly by the client code. An example might be
retrieving a reference to an object. The details of these services are mostly
undefined by the specification.
ORB services are the one component that the architecture of the
implementation-side interface shares with the client-side architecture.
Additionally, the implementation-side interface consists of the IDL skeleton
interface and the Object Adapter. The skeleton interface is an "up-call"
interface, through which the ORB calls the method skeletons of the
implementation to invoke a method requested by a client. Most functionality
provided by the ORB to object implementations is supplied through the IDL
skeletons and the Object Adapter. The OMG expects only a few services to be
common across all objects and accessed via the ORB core. 
The Object Adapter is the means by which object implementations access most
ORB services, including generation and interpretation of object references,
method invocation, security, activation (the process of locating an object's
implementation and starting it running), mapping references to
implementations, and object registration. The adapter actually exports three
separate interfaces: a private interface to the skeletons, a private interface
to the ORB core, and a public interface for use by implementations. The CORBA
specification is less than concrete about the services an adapter needs to
support, but it is clear that the adapter is intended to isolate object
implementations from the ORB core to as great an extent as possible.
The spec envisions a variety of adapters providing services needed by specific
kinds of objects. The most generic adapter described is the Basic Object
Adapter (BOA). The BOA allows a variety of object implementation schemes to be
accommodated, from separate programs for each method, to separate programs for
each object, to a shared implementation for all objects of a given type (the
C++ model). The specification also describes adapters suited to objects stored
in libraries and object-oriented databases. 


Interface Definition Language (IDL)


Most interprocess object models are expressed in terms of a language for
defining interfaces. Since the early days of RPC mechanisms, these languages
have been known as "Interface Definition Languages" (IDLs). The basic purpose
of an IDL is to allow the language-independent expression of interfaces,
including the complete signatures (name, parameters, parameter and result
types) of methods. This is accomplished by providing a mapping between the IDL
syntax and whatever language is used to implement client and server objects.
The two need not be implemented using the same language--and in fact it is
anticipated that they will not be--as long as a mapping is available for the
client and server implementation languages. 
CORBA IDL is a C-like language with many constructs similar to C++. In fact,
the specification credits Ellis and Stroustrup's The Annotated C++ Reference
Manual as the source for the adaptation which became the CORBA IDL
specification. IDL obeys the same lexical rules as C++, while introducing a
number of new keywords specific to the needs of a distributed system. If
you're familiar with C++, you shouldn't have any trouble adapting to IDL.
Writing interface definitions in IDL is a bit like writing class declarations
in C++. Since IDL is expressly for interface definition, it lacks the
constructs of an implementation language, such as definitions (which actually
create storage for a variable or object), flow control, and operators. In
particular, there is no concept of public and private parts of the interface
declaration, since the notion of encapsulation is implicit in the separation
of the IDL interface from the implementation.


Interface and Implementation Repositories



As an alternative to IDL, the CORBA spec devotes a couple of paragraphs to the
idea of repositories for both interface and implementation definitions. On the
interface side, the repository is intended to augment the dynamic-invocation
interface by providing persistent objects which represent information about a
server's interface. With an interface repository, a client should be able to
locate an object unknown at compile time, query for the specifics of its
interface, and then build a request to be forwarded through the ORB. The
implementation repository contains information which allows the ORB to locate
and activate objects to fulfill dynamic requests. The spec also envisions this
repository being used to contain other incidental information about an object,
such as for debugging, versioning, and administration. The specification does
not define how either repository is implemented, so vendors have gone their
separate ways, as they have with much of the CORBA spec.


CORBA Prospects


The OMG has been criticized for resembling other industry consortia which
began with much fanfare about open architectures and cooperation, but
ultimately produced little of substance. In the case of CORBA, however, the
comparison is unfair; there is broad industry support for the spec, many
implementations are available, and serious work is underway to address
shortcomings in Version 1.1. CORBA is viewed by some large institutions as the
only viable technology that is truly cross platform and cross operating
system.
But is it a standard? Yes and no. Within the consortium it is a standard
description of an architecture, but it is not a standard for implementation,
and it is not as well-defined as it needs to be. The result is that each
implementation of CORBA is a proprietary product. There is currently no
interoperability between ORBs, though various partnerships have been
announced--SunSoft and Iona, for example.
As a technology, CORBA is maturing rapidly. Companies such as Netlinks
Technology (founded by two of the key implementors of DEC's ORB) have produced
tools which ease the building of distributed applications using CORBA. A
number of training companies now offer hands-on courses. Version 2.0 of the
specification may deliver on the promise of interoperability. Currently, CORBA
implementations are available for nearly all the major operating systems, and
if an organization is willing to stick to a single vendor, real-world
solutions can be built today.


IBM's System Object Model 


It would not be accurate to describe IBM's SOM as an implementation of CORBA.
Rather, it is a binary standard for objects that are operating-system and
language neutral, and whose interfaces conform to CORBA definitions expressed
in IDL. DSOM, the distributed-object framework which ships with the SOMobjects
Toolkit, is a CORBA-compliant ORB. Dealing with object implementations sets
SOM well apart from the CORBA spec, which defines object interfaces strictly
without regard to implementation. As do other CORBA-compliant implementations,
SOM extends the spec's capabilities: It supports implementation inheritance
and polymorphism, provides metaclasses which are manipulated as first-order
objects, and allows dynamic addition of methods to a class interface at run
time. 
SOM is not a distributed technology, nor is it even an interprocess
technology. DSOM serves these purposes. SOM was intended specifically to solve
the problem of tight binary coupling between an application and the libraries
of classes it uses. To accomplish this, SOM relies on interfaces defined in an
extended version of CORBA's IDL, which uses the SOM compiler and "emitters" to
generate the interface stubs and implementation skeletons described earlier.
In addition to language-neutral definition of object interfaces, SOM provides
run-time support for objects, which again sets it apart from the OMG model. 


The SOM Object Model 


IBM's SOM object model is a classical model in the same sense as the OMG
model--classes define the characteristics of objects, and method requests
identify a single object on which the method is to be executed. SOM is a
"singly rooted" object hierarchy: All objects derive from the base class
SOMObject, which provides run-time support methods common to all objects in
the system. In contrast to Microsoft's approach, IBM's stated goal is to provide
loosely coupled object libraries while retaining the commonly agreed-upon
principles of object orientation: encapsulation, inheritance, and
polymorphism. SOM provides for method overloading, run-time method resolution
(polymorphism), and all the common forms of implementation inheritance. Types
in SOM are CORBA IDL types, as described in Figure 5. Unlike CORBA, these
types are used in the implementation of SOM objects, as well as the definition
of the interfaces to them.
Also unlike CORBA and yet more like "pure" object-oriented languages, SOM
classes are themselves objects, which are instances of SOM "metaclasses." A
metaclass is (roughly) the type of a class. Whereas a class describes a set of
potential object instances, a metaclass describes a set of potential classes.
In practice, SOM metaclasses function similarly to static-member functions and
variables in C++. Metaclasses in SOM define functions that operate on the
class as a whole, including methods which execute when an instance of the
class is created, functioning much like a C++ constructor. Figure 7 shows the
relationship between classes and metaclasses in the SOM object model. Note
that SOMClass is the parent of all metaclasses in the same way that SOMObject
is the parent of all classes. Interestingly, SOMClass is itself derived from
SOMObject. It is from this derivation that metaclasses in the IBM model
receive the common methods which allow them to behave as first-order objects
in the system. Neither SOMObject nor SOMClass contains member variables, so
classes and metaclasses inheriting from them suffer no increase in size.


SOM Extensions to CORBA IDL


As with CORBA, the process of creating a SOM object involves using the IDL to
define its interface and attributes. Once these are specified, the SOM
compiler generates the stub and skeleton bindings in the preferred language.
The SOM compiler uses "emitters," back-end code generators which perform the
actual mapping of IDL syntax into the implementation language and generate the
implementation skeletons. On the client side, the emitter generates the
include files that specify the method signatures clients use to invoke methods
on objects.
SOM adds to the standard CORBA IDL syntax a number of extensions to support
the SOM model or provide convenience in object specification. These include
implementation statements, instance variables, and private methods and
variables. Implementation statements provide information about an object
implementation to the SOM compiler, such as the metaclass of the object,
version information, whether or not the object is persistent, the name of the
DLL in which it is implemented, and so on. Implementation statements are
nested within the interface statement for the object. In addition to the
implementation information ("modifiers"), these statements allow the
declaration of "instance variables"--declarations of IDL types meant to serve
as private data to an instance of the object. These variables are distinct
from the attributes declared in the interface statement as defined by CORBA.
SOM IDL allows the declaration of private methods and variables in the
specification of an interface. The intent is similar to that of private
properties of C++ classes, though the mechanism is quite different. Under
normal operation, the SOM compiler ignores private methods and variables, and
only the public-interface bindings are generated for client use. A
command-line switch enables generation of bindings for the private methods, as
well as access methods for private variables, so that these declarations can
be provided to modules which need access to them. Methods and attributes
declared as private in the specification can thus behave a bit differently
from their C++ counterparts. Where C++ private properties are visible only
within the class methods, private properties in SOM IDL may be exposed in a
controlled manner to any client that needs them. This is analogous to
declaring a class or function to be a friend of a C++ class, thus allowing
access to private methods and data.


Inheritance in SOM


SOM supports interface inheritance in the same manner as CORBA. Subclasses
inherit the interface signatures of their parent classes, so that any method
available on a parent class is also available on the subclass. Unlike CORBA,
subclasses also inherit the procedures which implement those methods, unless
the methods are overridden or specialized. Subclasses may also introduce new
methods, attributes, and instance variables which will be inherited in turn
from any class derived from them. This is consistent with the common model of
class inheritance in languages such as C++. 
Metaclasses in SOM are also participants in inheritance relationships. These
relationships are separate from the inheritance relationships between classes.
For example, a class A with a metaclass M_A may be subclassed by a class B. If
the class B explicitly specifies its own metaclass M_B, then it does not
automatically inherit the relationship between A and its metaclass. In some
cases, this can lead to incompatibilities. Suppose that in Figure 8, class A
contains a method Foo() which in turn invokes a class method Bar() defined in
metaclass M_A. Class B will inherit Foo() from A; however, since B has no
relationship with A's metaclass, there is no Bar() for the inherited version
of Foo() to invoke. A hierarchy of this type is not allowed in SOM: The SOM
compiler will automatically generate an intermediate metaclass, as in Figure
9. This intermediate metaclass M_C is derived from both M_A and M_B, ensuring
that class B's metaclass provides the method Bar() upon which B::Foo()
depends. 
SOM also supports multiple inheritance, which allows a subclass to inherit the
interface and implementation of multiple base classes. A classic problem with
multiple inheritance is the ambiguities that may arise when a class inherits
either the same method from two different bases or different methods with the
same signatures. Any multiple-inheritance model must provide a means of
disambiguating such method collisions. SOM automatically detects and resolves
such situations by giving precedence to the method inherited from the leftmost
ancestor of the class. IBM calls this "left-path precedence." 
If you decide when implementing a class that left-path precedence is not
appropriate, you have two alternatives. You can create a new metaclass which
alters the makeup of the method table for the class. This effectively alters
the semantics of SOM's default inheritance mechanisms. Alternatively, you can
override the inherited method and make a fully qualified call to the parent
method you select. 
Multiple inheritance in SOM also results in a similar problem. If, for
instance, a class C is derived from two classes A and B, and if A and B both
declare explicit metaclasses M_A and M_B, then the SOM compiler must generate
a new metaclass M_C, which is derived from M_A and M_B, and made the metaclass
of class C. The programmer may override this behavior by creating the derived
metaclass explicitly and assuring that it supports all the required methods.
If all this sounds complicated, that's because it is. The advantage of
metaclasses is the availability of information about a class at run time. C++
classes provide capabilities similar to metaclass methods in SOM, by allowing
static class methods to be declared, but C++ classes are compile-time
constructs about which most information is lost once the program has been
built. 


SOM Method Resolution 


Method calls in SOM are bound at run time using a mechanism similar to virtual
function calls in C++. Each class has a method table which contains pointers
to the procedures that implement its interface methods. Unlike C++, SOM
metaclasses can be made to alter the composition of these method tables. The
SOM table-lookup mechanism, known as "offset method resolution," allows method
calls to behave polymorphically at run time, exactly as C++ virtual functions
do. Like C++ virtual function calls, offset resolution requires that the names
of the method and the class that introduced it be known at compile time.
In addition to offset resolution, a method call may use name-lookup resolution
or dispatch-function resolution. Name-lookup resolution, a dynamic-method
binding similar to that in Smalltalk and Objective-C, is more flexible than
offset resolution because the name of the method can be unknown at compile
time. You can use it when a method is selected at run time based on user input
or when a method has been added to a class interface dynamically. As you might
expect, it is less efficient than offset resolution, because finding the
method procedure involves searching a number of data structures associated
with the class. Dispatch-function resolution is different from both offset-
and name-lookup resolution. A dispatch function allows the implementor of the
class to decide arbitrarily which rules and conditions will be used to find
and invoke a procedure which implements a method. It is the most flexible--and
most costly--of the three means of binding method invocations in SOM. 


Distributed SOM 


The SOM capabilities discussed thus far are for objects which exist in the
same process address space as the calling application. While SOM does provide
a robust implementation of a language- and operating-system-neutral object
model, it is not a distributed-object technology. To address this limitation,
IBM ships with the SOMobjects Toolkit a "framework" (a set of SOM classes)
known as "Distributed SOM" (DSOM). Where SOM defines an
implementation-independent model for objects, DSOM extends this to allow use
of objects independent of their location with regard to the calling
application.
In its current version, DSOM supports two types of distribution: across
process spaces on a single machine, or across multiple machines in a network.
The former is an extension to SOM packaged by IBM as "Workstation DSOM," and
the latter a CORBA 1.1-compliant ORB packaged as "Workgroup DSOM." 
DSOM is currently available on AIX 3.2 (IBM's flavor of UNIX), OS/2 2.0, and
Windows 3.1. Workgroup DSOM supports distribution of objects across local-area
networks composed of machines running all three operating systems, making it
a multiplatform model. Future versions of DSOM will allow distribution across
larger, enterprise-wide networks. Transport protocols currently supported
include NetWare IPX/SPX on AIX, OS/2 and Windows, NetBIOS on OS/2 and Windows,
and TCP/IP on OS/2 and AIX. An application can also define its own transport
protocol.



The SOM Toolkit


IBM's SOM is a complete, shipping technology currently available for three
popular operating systems. In addition to the basic features, the SOMobjects
Toolkit includes several frameworks consisting of SOM classes which provide
higher-level facilities for application developers. These facilities include:
a CORBA-compliant framework for Interface Repositories; a Persistence
Framework, for archiving objects between run-time sessions of an application;
a Replication Framework that allows an object to be mirrored in multiple
address spaces (with locking, synchronization, update propagation,
fault-tolerance, and guaranteed consistency among copies); and an Emitter
Framework to aid developers in creating new language bindings for SOM IDL. The
kit also includes collection classes, utility metaclasses, and
event-management classes as well as bindings for C and C++. 
Like most models of this kind, SOM and DSOM are complex. One development which
may ease the conversion from binary-coupled objects in C++ to SOM objects is
the "direct-to-SOM" support in C++ compilers from MetaWare and Symantec, among
others. In a direct-to-SOM implementation, the compiler generates SOM classes
directly from C++ code, allowing existing class libraries to be recompiled as
binary-insulated SOM classes. 


Microsoft's OLE


Microsoft's OLE 2.0 is the heavyweight wildcard in the race to define
standards for language-neutral and distributed-object technologies. The
foundation of OLE is its Component Object Model (COM). This model, along with
the high-level application integration technology that rests on top of it,
represents a clear challenge to CORBA and CORBA-compliant technologies. As
expressed in OLE, Microsoft's vision of system-object technology presents a
strong contrast to that of CORBA and SOM. It also diverges from some commonly
accepted principles of object orientation.
Despite the differences between low-level system-object technology and
high-level component-integration facilities, Microsoft has striven to combine
the two in the minds of developers. Marketing tactics aside, the reason for
this is likely Microsoft's role as a leading vendor of application packages
and suites, in contrast with the system vendors (HP, Sun, DEC), who are
focusing on CORBA and other low-level technologies.
OLE 2.0 is not Microsoft's first foray into the world of interprocess object
communication. To understand the rationale behind OLE, it's worth a moment to
examine the previous process-interaction model, Dynamic Data Exchange (DDE)--a
broadcast protocol whereby an application can set up a channel of
communication with a "DDE server" located elsewhere on the machine on which
the app is running. DDE is an inherently asynchronous protocol, meaning that
once communication is established (itself no mean feat), the caller ships off
a request and waits in a loop for the results to come back. Such a mechanism
is more complicated than a synchronous function call, due to the possibility
of failed communications, timeouts, and other errors which the looping
application must detect and recover from. Many developers have found DDE
frustrating and error prone, hence its lack of popularity. Microsoft has tried
to make it more palatable by adding a library, DDEML, that handles many of the
more complex aspects of the protocol, but apparently this has not been enough.

Version 1.0 of OLE was designed mostly as an embedding-and-linking mechanism
for compound documents; it used DDE as its underlying communications
mechanism. Thus, OLE 1.0 inherited many of the problems associated with an
asynchronous broadcast protocol. OLE 2.0 enhances Version 1.0 by defining many
system services in addition to 1.0's linking and embedding. These services
include Uniform Data Transfer (an expansion on older data exchange protocols
such as the clipboard), Structured Storage (a way of providing persistent
storage for nested hierarchies of objects), and OLE Automation (a way for
applications to expose interface APIs for use by other applications and
scripting languages). The most important change made to OLE 1.0, however, is
the abandonment of DDE as the underlying protocol in favor of the Component
Object Model. 
The relationship between COM and OLE 2.0 is shown in Figure 10. COM specifies
a binary standard for object interaction. Microsoft provides
run-time support for COM via COMPOBJ.DLL, which implements a small API for use
in creating and manipulating the entities known as "Windows Objects."


The Component-Object Model


A Windows Object is a functional entity that obeys the object-oriented
principle of encapsulation. Clients do not manipulate Windows Objects
directly. Instead, the object exposes to its clients various sets of function
pointers, known as "interfaces." An interface is effectively a pointer to a
table of function pointers. Figure 11 depicts the relationship between an
interface table and the object implementation. An object may support any
number of interfaces. All Windows Objects must support the most basic
interface, IUnknown (by convention, interface names start with "I"), which
supports three methods that supply basic functionality to all Windows Objects.
These methods are QueryInterface, which allows a client to inquire which
interfaces an object supports, and AddRef and Release, which manage reference
counting for objects. Reference counting is a mechanism, familiar to most
object-oriented programmers, by which the system tracks how many clients
possess a pointer to one or more of a given object's interfaces. When a
reference count reaches zero, the system can delete the object and recover its
resources.
Microsoft has specified a set of 60 or so interfaces which comprise the OLE
2.0 architecture. These include interfaces for In-place activation, Linking,
and Embedding--the core of OLE 2.0's compound-document technology. Interfaces
also exist for Drag-n-Drop, Uniform Data Transfer, Automation, Compound Files,
and other useful capabilities. Developers may also define custom
interfaces. However, support for this is currently limited, and in fact
Microsoft recommends that COM developers stick to the standard interfaces for
the time being. 


Inheritance versus Aggregation


Microsoft's opinion is that some of the standard mechanisms of object-oriented
programming are not properly applied in an interprocess object model. In this
view, the particular mechanism that causes the most trouble is inheritance.
While implementation inheritance is useful in constructing stand-alone
applications, Microsoft believes that inheritance is improper when applied to
interprocess object models. The reasons for this lie in the "fragile base
class problem," which results from a dependency between a derived class and
its parents that is "implicit and ambiguous." Should the base class alter its
behavior, that alteration may force changes in derived classes, according to
Microsoft. While this is certainly true, experienced object-oriented
programmers might point out that the interface between any two classes,
whether parent and derived, or client and server, represents a contract which,
if changed, will force alterations on the other side of the relationship.
Nevertheless, Microsoft's concern with the potential management problems of
implementation inheritance was enough to rule out supporting it in COM. To be
fair, it should also be noted that Microsoft has a more compelling argument
against inheritance: It intends to use Windows Objects to implement many
advanced features of its next-generation operating systems. Eventually, COM
object interfaces will take the place of the procedural API through which
Windows is now accessed. When using Windows Objects, which are part of the
operating system, an application will not have access to the source code for
these objects. Such a restriction makes it difficult to use these classes as
bases for implementation inheritance. Developers who use third-party libraries
for which no source is available will likely sympathize.
In place of implementation inheritance, Microsoft offers a different model of
code reuse called "aggregation," which allows an object to be constructed from
subobjects. In object-oriented programming languages, aggregation (or
"composition" or "containment") may take many forms. The containing object may
allow access to the subobjects directly; it may provide forwarding
capabilities through which the subobject's methods can be invoked via the
owner's interface; or, it may use the subobject entirely for internal
purposes. In COM, the first scenario is true aggregation, while the second is
containment. Does aggregation function as a complete substitute for
implementation inheritance? Not really. Inheritance in object-oriented
languages is a syntactic mechanism enforced automatically by the language.
Aggregation is a convention subject to implementation in any number of ways.
Inheritance usually requires little or no code to support it, whereas
aggregation must be completely supported by the programmer. Whether
object-oriented programming can be done effectively without implementation
inheritance is something for individual developers to decide.


COM Object Identity 


COM identifies objects differently from CORBA and SOM. With CORBA, there's a
potentially significant problem with keeping object names globally unique
across distributed systems. In a dynamic environment, name collisions can
cause applications to "link up with" the wrong object, with possibly
disastrous results. Microsoft has foreseen the problem and devised a
mechanism to cope with it: "Globally Unique Identifiers" (GUIDs), 128-bit
integers guaranteed to be "unique across space and time." You can obtain GUIDs
for identifying COM objects either by requesting a block of 256 GUIDs from
Microsoft or by using a network card and the UUIDGEN.EXE utility shipped with
the OLE 2.0 SDK. UUIDGEN uses the date, time of day, and a unique number
embedded in the network adapter to create a set of 256 GUIDs. The chance of
this tool generating duplicate IDs is, according to Kraig Brockschmidt, "about
the same as two random atoms in the universe colliding to form a small
avocado" (Inside OLE 2, Microsoft Press, 1994).


COM Object Creation and Marshaling


In addition to the interface specifications, Microsoft provides run-time
support for COM in the form of COMPOBJ.DLL, a library of API functions for
object creation and marshaling. Objects are created by requesting them from
the API using a GUID. Microsoft has defined GUIDs for the standard interfaces
which come predefined with OLE 2.0. When COMPOBJ.DLL creates an object, it
returns to the requester a pointer to the first interface of the object,
usually IUnknown. COM objects need not be implemented such that they can be
created using this mechanism. Such implementation, however, insulates users of
the object from its implementation language, and in future versions of the
technology will also insulate clients from object location in a distributed
system. To make an object addressable from COMPOBJ.DLL using this mechanism,
the object must reside in a DLL or executable file and must export a specific
set of functions which COMPOBJ.DLL uses to interact with the object during its
life cycle.
The other major piece of functionality in this module involves a process
Microsoft refers to as "marshaling"--translating and delivering parameters to,
and results of, a method invocation across address spaces. The marshaling
mechanism in OLE 2.0, "Lightweight Remote Procedure Call" (LRPC), currently
works across address spaces on a single machine. In the future, Microsoft
intends to implement a more robust RPC mechanism compliant with OSF DCE that
will allow object interaction across networks and between Windows and OSF
servers. Microsoft claims that objects which conform to the current interface
in COMPOBJ.DLL will require no changes--source or binary--to work with the
proposed RPC mechanism. 
One limitation of the current mechanism is that it does not support generic
marshaling. That is, Microsoft has provided code in COMPOBJ.DLL which handles
marshaling only for the standard, predefined interfaces currently shipping
with the OLE 2.0 SDK. Support for generic marshaling remains in the future, so
some developers advise against creating custom interfaces now. At present,
creators of custom interfaces must provide their own marshaling mechanisms, a
difficult task beyond the resources, if not the abilities, of many
programmers. The result is that, today, COM is limited for use in support of
the compound-document architecture defined by Microsoft. 


Wrapping It Up


How do COM and OLE 2.0 compare with the other technologies? COM is available
today only as the set of interfaces which defines OLE 2.0 capabilities. In
that sense, there is no possibility of direct comparison to CORBA or IBM's
SOM. The intent of those technologies is similar to that of COM, but the
implementation of COM is currently too restricted. By comparison, IBM's
technology is more complete and far-reaching. Also, IBM's technology is more
consistent with generally accepted notions of object-oriented programming.
Microsoft claims it intends to create distributed versions of COM (DCOM?) and
to implement OLE 2.0 on non-Windows platforms. The Macintosh version of OLE
was demonstrated in March 1994 and is reportedly in beta.
Microsoft and Digital Equipment have announced an agreement that will
integrate DEC's ObjectBroker, a CORBA-compliant ORB, with Microsoft's COM,
creating the Common Object Model (also known as COM). This will allow the two
technologies to interoperate to some extent. Microsoft has not ruled out
more-direct CORBA compliance, if the market demands this.
Despite the alternatives, the business reality is that, unless some other
system overtakes Windows as the desktop leader (now at 60 million
installations), Microsoft's stated intent to build future operating systems on
top of COM makes this technology one that you ignore at your own financial
peril. The likely scenario is that the current Win32 API continues to exist
within future systems, with advanced features being provided by COM objects in
a gradual migration strategy. OLE 2.0 provides capability for application
integration and interoperation that CORBA and DSOM can only hint at--the
former through the Common Facilities Compound Document initiative, and the
latter through the OpenDoc collaboration with Apple, WordPerfect, Borland, and
others.
In truth, there is no easy answer--and there likely won't be one in the near
future. Windows developers had better pay attention to COM/OLE 2.0, while OS/2
and AIX developers had better become familiar with SOM and DSOM. If you need
to build distributed applications now, then COM is not at all useful. If you
are compelled by the market to interoperate with evolving Windows
implementations, then moving to COM is a virtual mandate emanating from the
company that controls that operating system. 
Figure 1 Relationship between applications, component-integration models, and
object models.
Figure 2 A compound document with text, image, spreadsheet, and sound objects.

Figure 3 Application boundaries for distributed processing.
Figure 4 Object-management architecture.
Figure 5 OMG-type hierarchy.
Figure 6 CORBA architecture. From the programmer's perspective, the standard
interfaces are the Dynamic Invocation, ORB, and (Basic) Object Adapter. The
IDL stubs are also standard, depending on the language mapping used by the
client program.

Figure 7 SOM class relationships.
Figure 8 Example of incompatibility caused by metaclass dependency for method
Foo() of class A.
Figure 9 Generated metaclass to resolve incompatibility caused by metaclass
dependency for A::Foo().
Figure 10 OLE 2.0 interfaces and the Component Object Model.
Figure 11 Relationship between client, component object, and interface.


October, 1994
Oberon System 3


Designed with software reuse in mind




Johannes L. Marais


Hannes is a member of the Institute for Computer Systems, ETH, Zurich and can
be contacted at marais@inf.ethz.ch.


If nothing else, history has taught us that technological progress is achieved
by reusing, refining, and building upon the ideas and work of others. In the
world of software, "componentware" or "interoperable objects," as the current
push for reusable, off-the-shelf software has been dubbed, is viewed as just
such a technological advance. 
True componentware is, however, an elusive goal that requires us to rethink
how software systems are constructed and programmed in-the-large. Issues we
must consider involve how software systems can be extended with new parts and
how parts can be exchanged at run time. Even more problematic is how to extend
or subclass existing parts of a running system, exemplified by the many
technical discussions about inheritance, delegation, message forwarding, and
aggregation. Not only do we have to specify how components are created with
programming languages, but also how the components are combined with each
other to form a system. One approach is to regard both the programming
language and the operating system as a symbiotic whole. There are many reasons
why this is meaningful, but none illustrates it as well as memory
deallocation.
In a truly extensible system, components use other components. The network of
these components is continuously expanded at run time by the addition of new
or existing subclassed components. Since you don't know what components will
be added to the component soup tomorrow, you cannot identify a single instance
responsible for deallocating components when they are no longer used
(referenced). Clearly, what's needed is a system-wide garbage collector, which
implies a type-safe programming language. If these concepts sound foreign,
then you are probably used to linking all your components together in one
monolithic block, blocking the ability of other programmers to extend your
application. 


The Oberon Project


Such considerations were the driving force behind the Oberon project at the
Institute for Computer Systems, ETH Zurich. Oberon is both a programming
language and a programming environment or system. (Incidentally, the name
"Oberon" was taken from one of the moons of Uranus, which the Voyager probe
had photographed around the time we started our project.) The Oberon language,
created by Niklaus Wirth in 1986, is the successor to the Algol, Pascal, and
Modula-2 generation of languages. The Oberon system, developed in cooperation
with Jürg Gutknecht of ETH, illustrates how the Oberon language supports
programming in-the-large. Completed in 1989, the Oberon system and language
were ported from special-purpose hardware to platforms such as DOS, Windows,
and UNIX, and documented in books such as The Oberon System: User Guide and
Programmer's Manual, by M. Reiser (Addison-Wesley, 1992), Project Oberon: The
Design of an Operating System and Compiler, by N. Wirth and J. Gutknecht
(Addison-Wesley, 1992), and others. The Oberon system is available from the
ETH Zurich anonymous ftp server neptune.inf.ethz.ch in the /pub/Oberon
directory. Versions may also be ordered on diskette for a small fee from the
Institute for Computer Systems, ETH-Zentrum, CH-8092, Zurich. A version for
SPARC workstations is also available. 
In 1991, Gutknecht and his group began revising the Oberon system. Their
resulting work shows some similarities with commercial componentware systems,
although it is much simpler and easier to understand, more flexible in many
respects, and more compact. Today's system, called "Oberon System 3," is the
basis for further research and experimentation at the Institute (see Jürg
Gutknecht's "Oberon System 3: Vision of a Future Software Technology" in
Software Concepts and Tools, 1994). The system has been ported to a number of
hardware platforms and is also available at no charge.


The Object Model


Oberon System 3 adds several new concepts to the classic Oberon
implementation. Most importantly, System 3 supports persistence by binding or
inserting objects into an object library, although they can also be unbound or
free when of a more temporary nature. A library, normally stored in a file,
functions as a container for objects and allows applications to access objects
through a simple indexing mechanism. 
Public and private libraries allow applications to control the visibility and
sharing of objects. Interestingly, fonts in Oberon System 3 are nothing more
than public libraries of character-pattern objects indexed by ASCII code. Text
is then reduced to a stream of (Library, Index) pairs. Note that this implies
text may contain objects that do not have a pattern nature. Documents, or
collections of objects with arbitrary relationships (pointers between each
other), are often stored in the same library, although pointers between
libraries may also exist. In this case, one library is importing an object
from another library. Such references can be made by normal pointer variables,
by (Library, Index) pairs, or by names in the form "L.O", where L indicates
the library and O the object. The names of public or exported objects in a
public library are stored in a dictionary associated with the library. The
public library has the useful property of allowing applications to share
objects. In Oberon System 3, this allows you to link objects into other
documents; for example, an icon library collection is shared by many
applications in the system.
Objects have local state and respond to messages. The local state of an
object is referred to as its "attributes." To allow the user to configure the
state of an object, the system defines a special message to retrieve, set, and
enumerate attributes, and a special universal editor called the "Inspector" to
edit them. This is in addition to setting local variables directly by the
program. The system uses three larger message classes: one that all objects
should respond to, another for objects of a visual nature, and a third class
of application-defined messages. 


Gadgets


The object model forms the basis of a GUI toolkit called "gadgets." Each
dialog element, or gadget, is an object that can be embedded in any UI or
application. Gadgets can float inside the text stream of a text document and
be embedded in a panel interface, graphic editor, or even our System 3
page-layout system. In fact, all gadgets can be integrated and reused in any
other System 3 environment. Often, container gadgets manage other gadgets as
their children. They act as the glue to build bigger components out of smaller
ones. The principal containers are the panel gadgets (two-dimensional edit
surfaces) and the text gadgets (complete text editors with support for
embedding), although more-refined containers like books and special elements
are available for building menu-like structures. Most containers specify
relatively few rules for parentship, meaning that almost all gadgets can be
inserted into all containers. Listing One presents the source for a simple
gadget: a small, rectangular block that can be embedded in all System 3
environments, colored using the Inspector, and moved and resized with the
mouse. This module (discussed in detail later) is often the starting point for
creating your own gadgets. 
Another property of gadgets is that they can be modified and used wherever
they are located. In effect, UI construction is reduced to document editing.
Oberon System 3 users create new UIs or modify existing ones in typical
drag-and-drop fashion. A UI is frozen only after it is explicitly locked with
the Inspector, and users can unlock a UI at any time to make adjustments. Such
dynamic behavior requires a relatively thin interface between the UI and the
application. This interface is governed by several rules and system
guidelines.
First, you need to specify what the gadgets should do when they are used
(clicking on a button, for example). Using the Inspector and a very simple
script facility to pass parameters, a user can associate an action with each
gadget. This action is coded as a procedure in the Oberon language. The
procedure can search for objects in the user interface and change their state
accordingly. Often, special-purpose gadgets acting as abstract data types of
commonly used data structures can be linked interactively into a UI, which is
then visualized by other components in the UI. This is, of course, the
well-known Smalltalk Model-View-Controller (MVC) framework which allows us
to decouple applications even further from their UIs. With these and other
techniques, it is possible to decouple the UI to such a degree from the
application that it can be modified or even exchanged while the application is
running.
It is also possible to construct so-called "links" or connections between
objects at run time and follow these links using the Inspector. The resulting
network-like structure of objects can easily be made persistent by inserting
all objects involved into a library. The links relay messages between objects
and form the basis on which the whole hierarchical display space, or screen,
is organized. The screen itself is an object, which is then decomposed into
document gadgets, which in turn have menu panels and contents right down to
the simplest gadgets, like buttons and check boxes. We have experimented with
very fine-grained gadgets, too; the System 3 graphic editor provides line,
spline, circle, ellipse, and other gadgets, which can be used in any other UI.
Using special tuning measures, we are able to edit very complicated diagrams
that contain thousands of single gadgets with respectable speed. 
The success of such a system hinges on a strict message protocol. The complete
integration property of gadgets requires an exact message protocol, and this
is where the application programmer comes in.


Programming Oberon System 3 


Oberon System 3 supports several layers: The first and simplest is creating
UI/documents interactively and adding simple behaviors to them; the next is
programming Oberon code fragments that manipulate existing gadgets; and the
third is programming your own gadget. Here, you must distinguish between two
difficulty levels: constructing your own leaf gadget (that is, a gadget that
does not contain any further gadgets) and programming a container gadget that
manages other gadgets. In this context, "programming" means extending an
existing gadget or creating a new gadget from scratch. System 3 provides code
skeletons that can be modified quickly to reach your goal. 
The Oberon system is structured as a tree of Oberon modules in a shared
address space. Modules are loaded and linked dynamically when required. All
imported modules are also loaded on command. Modules normally remain in memory
until they are explicitly freed. Each module contains one or more
implementations of software components and allocates global storage from a
system-wide, shared-memory heap. Memory is recycled automatically when not
referenced by a mark-and-sweep garbage collector. In addition to type
definitions, variables, and procedures, each module may contain several
exported, parameterless procedures called "commands." Command procedures can
be called by the user directly from the UI. Behind the scenes, the system
converts strings of the form "M.P", where M is the module name and P is the
name of the parameterless procedure, to a jump address. Commands are typically
invoked by simply clicking on the string "M.P" written somewhere on the
display. Commands are also invoked as a side effect of using a gadget.
Commands operate on global variables of their own or imported modules, and are
executed in sequence by the user. Parameters are passed from the UI to
commands by global variables. All Oberon systems contain a set of standard
modules for keyboard and mouse input, display output, file manipulation,
system management, and the like. Most important, support for a few abstract
data types like text and bitmaps are provided, in addition to standard editors
for them. You can write new modules that use all the existing modules; for
example, you can write modules that operate directly on the text-editor
components without having its source code (the compiler generates an interface
definition or symbol file for each module). The editor extension will link
itself dynamically to the editor module as soon as one of its commands is
executed. The Oberon system provides a comfortable environment for writing,
compiling, loading, testing, and then unloading modules again.


Objects and Messages


Oberon System 3 components, called "Objects," are defined as a base type in a
module with the same name. The Objects module defines the messages that each
component should respond to. Typical messages store or load an object, make a
copy of an object, or request attributes of an object. These messages form a
contract that all components should obey. Typical objects have local variables
and respond to messages with a so-called "message handler"--a procedure
variable that determines what message an object has received, then acts on it.
By convention, only one message handler is used, although you may define more.
Messages are declared as RECORDs and passed to the message handler on
the stack; see Example 1. In addition, you have to pass the self parameter
explicitly to the message handler. Message-type extension allows you to build
an additional type hierarchy next to the normal hierarchy of objects. Example
2(a), for instance, creates its own object, a new message type, and a message
handler. The message handler determines what to do by explicitly testing for
each message type. Example 2(b) creates a new instance of the object and sends
a message to it. In languages such as C++, you don't need to initialize method
handlers or define message record types. In Oberon, however, it is necessary
(a variant of Oberon called "Oberon-2" provides some relief here at the cost
of some flexibility).
Messages form another type hierarchy orthogonal to the object hierarchy
(although new message types normally belong to a new object type), allowing
you to layer or screen messages. For example, you can create an extension of
the object's base message type that defines a new base type, which is sent to
objects that can display themselves. In Oberon System 3, these objects are
called "Frames" and have a corresponding FrameMsg base type with additional
information useful to frames; see Example 3(a). A Frame can, for example,
determine if the message it receives is based on a Frame message before
responding to it. In this way, an object can selectively respond or screen
messages based on a certain type without knowing the message's real type. This
is useful with message forwarding. Example 3(b) defines a new Frame message to
display a frame on the display. Note that you sometimes collapse slightly
different semantics of an action into one message and distinguish between them
through an id field in the message itself. The Gadget's base type is again a
type extension of the Frame type and defines further local variables for
attribute management, its visible region on the display, current state, and so
on.
One reason for handling messages this way is that you can send messages to
objects you don't know yet, and objects can forward messages they don't know
to other objects. In most other object-oriented languages, you at least have
to know the type of the object in which you activate a method, and each object
can only respond to messages that it knows about. Oberon System 3's message
handling is exactly what's needed for building extensible systems. First we
investigate why objects need to respond to messages they don't understand.
Structuring a system requires special container objects that manage a set of
child objects. Imagine a display-manager component that arranges a set of
documents on the display and forwards unknown messages to its children, which
might in turn forward them to their subcomponents. For
example, the display manager might not know about a message to the documents
that some shared global data value has changed. We cannot determine beforehand
what the system will be used for, how exactly components will be structured,
or what messages are required to manage objects. Therefore, we must program
all containers or component managers to forward messages to their children, in
the hope that the children will make some sense of them. Often the containers
may determine to what class a message belongs and take some special action
(like refusing to forward the message when it does not manage any displayable
objects). 
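
A minimal C++ sketch of this forwarding pattern (all names hypothetical): the
container knows nothing about the message it forwards, and only the children
that recognize it act on it:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of a container forwarding unknown messages.
struct ObjMsg {
    int res = -1;                     // stays -1 until someone responds
    virtual ~ObjMsg() = default;
};
struct DataChangedMsg : ObjMsg { int newValue = 0; };

struct Object {
    virtual void handle(ObjMsg&) {}   // default: ignore the message
    virtual ~Object() = default;
};

// A document understands DataChangedMsg and updates its cached value.
struct Document : Object {
    int cached = 0;
    void handle(ObjMsg& m) override {
        if (auto* d = dynamic_cast<DataChangedMsg*>(&m)) {
            cached = d->newValue;
            m.res = 0;
        }
    }
};

// The display manager knows nothing about DataChangedMsg; it simply
// forwards every message to its children in the hope that they will
// make some sense of it.
struct DisplayManager : Object {
    std::vector<Object*> children;
    void handle(ObjMsg& m) override {
        for (Object* c : children) c->handle(m);
    }
};
```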

Beyond this, the system needs to be able to communicate with objects that it
does not know about yet (otherwise we can't extend our system!). We need
assume only that the component is an object and thus understands at least the
base object-message type. The message handler, not the type definition of an
object, determines what messages an object understands. This means that an
object can respond to messages declared in different modules, perhaps
logically belonging to different object classes. This can, when compared to
other object-oriented languages, be regarded as a sort of message multiple
inheritance. Oberon System 3 has a set of messages that handle the interaction
with text editors and function with all the different text editors available
for the system. In this way, you can create a spelling checker that functions
for all the different text editors. In other systems, this is often solved
with special scripting languages (AppleScript or Visual Basic, for instance)
that glue applications together. 
Another advantage of this model is that you can change the behavior of objects
without affecting their type definition, and thus without invalidating all
clients or extensions of this object. For example, you can change objects to
respond to more messages, alleviating the so-called "fragile base class"
problem at the cost of opaque object implementations. Opaqueness is a real
cost, because it is not always explicit what messages an object will respond
to. In
Oberon System 3, each message has a result field in which an object must
indicate if it responded to a message or not. Another problem is that
explicitly testing message types takes time. Thus, it is imperative that the
handlers and message hierarchy be structured to facilitate quick determination
of the message type.


The Gadget Example


As mentioned earlier, Listing One is the code for a small, rectangular gadget.
Lines 3--6 declare the new gadget as a type extension of the base gadget type
with a local variable to store the color of the block instance. The asterisks
following the variable and type names indicate that they are exported from the
module (unlike Modula-2, Oberon does not have separate definition and
implementation modules). A new instance of the gadget is allocated by the
Oberon command procedure NewFrame (lines 103--106). After initialization, the
object is stored in the global variable NewObj in the Objects module. This
location allows other Oberon commands to process the gadget, the simplest
process being the insertion of the gadget at the caret position into an
existing user interface. The W and H fields, declared in the base type,
indicate the width and height of the gadget in screen pixels.
The message handler (lines 44--102) is the largest part of a gadget and is
responsible for processing all messages sent to a gadget instance. Messages
addressing the visual aspects of the gadget (Frame messages) appear in lines
48--78 and are distinct from those messages that all objects should understand
(lines 80--97). The FrameHandler handles messages that are not understood
(line 99) and acts as the gadget's base-class-handler implementation.
Lines 91--97 handle the request to an object to make a copy of itself. The
resulting copy is returned in the obj field of the message M. No global copy
service exists; copying is distributed throughout the network of objects,
which can have any topology. Therefore, copy
messages might arrive twice at the same object, although the object should
only be copied once. This is detected by testing the time-stamp of the message
against the time-stamp of the last copy operation, which is conveniently
remembered when the message first arrives. In the meantime, the copy is cached
in a private field called dlink. The actual copying is done in lines 39--43,
including a call to a procedure to copy the fields of the base gadget type
(line 42).
In lines 81--89, the object is stored to and loaded from disk, where the id
field of the FileMsg indicates which of these operations should be performed.
The field R of the message M contains a Rider, the access mechanism for
reading and writing files in Oberon. The FrameHandler stores and loads the
fields of the base type. Files.ReadInt and Files.WriteInt store the object's
color in a portable fashion to ensure that the gadget can be used on all other
hardware architectures. Attribute handling is implemented in lines 80 and
7--22. The FrameAttr procedure checks whether attributes must be retrieved,
updated, or enumerated. The name field of the message indicates which
attribute, the class field gives its type, and various other fields in
the message hold the value (M.i stores an INTEGER value). It is important
to note that the attribute Gen specifies the procedure that allocates another
instance of this gadget (line 10). Thus, a call to this procedure followed by
a FileMsg sent to the resulting object will restore the previously stored
state of a gadget. In lines 11 and 15--16, the color attribute is retrieved
and updated, and in line 20 it is "advertised" by invoking a procedure
variable in the message's Enum field. The advantage of such a strategy is that
we can use the Inspector to edit the attributes of all gadgets, rather than
building a new attribute editor for each gadget class.
The second part of the message-handler processes Frame messages (lines
48--78). Most messages are handled in a default way (lines 70--73, 75).
Messages are normally addressed to a distinct frame F (as indicated by the
field F in a FrameMsg) or to all objects on the display (a so-called "message
broadcast"). For the latter, the destination frame F is set to NIL. Line 50
checks if one of these two cases applies before calculating the coordinates of
the gadget on the display by adding the origin (M.x, M.y) to the actual frame
coordinates (F.X, F.Y, F.W, F.H) in line 51. The coordinates of a frame are
always stored relative to its parent or container frame, allowing us to move
whole containers without updating all the coordinates of its contents.
In this particular case, you are interested only in messages related to
display events (handling DisplayMsg in lines 52--62) and keyboard and mouse
events (handling InputMsg in lines 63--69). A gadget can either be displayed
as a whole (lines 54--56) or as a selected rectangular part of it (lines
57--60). The calls to Gadgets.MakeMask (lines 55 and 58) create a so-called
"display mask" that acts as a clipping region for all further drawing
operations in the Restore procedure (lines 23--29). The call to ReplConst in
line 25 has the actual task of displaying our block. Lines 26--28 display the
gadget in a highlighted state if it is selected. The handling of InputMsg is
also quite straightforward. If the mouse is located in the active area of the
gadget (line 65) (that is, not inside the sensitive control areas around each
frame used for moving and resizing), the mouse is simply drawn in the form of
a pointing hand at position (M.X, M.Y) (line 66). Otherwise, the default
message handler takes over all moving and resizing operations (line 67). 
Printing is done in the same fashion as displaying (lines 74, 30--38).
Coordinates in Oberon are device dependent, so display coordinates are
converted to printer coordinates using the formula in lines 32--34. In
essence, such measurements are made in special units organized so that no
rounding errors occur when converting between different devices.
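
The idea behind these units can be sketched in a few lines (a standalone
illustration in the style of Listing One's P function, not the actual Oberon
interfaces; the constant and names are invented): pick an abstract unit fine
enough that one screen pixel is a large integer number of units, and describe
each device by how many units one of its pixels covers.

```cpp
#include <cassert>

// One screen pixel is defined as 10000 abstract units; a device
// reports how many units one of its pixels covers, so plain integer
// arithmetic converts lengths without accumulating rounding error.
const long kUnitsPerScreenPixel = 10000;

long toDevicePixels(long screenPixels, long unitsPerDevicePixel) {
    return screenPixels * kUnitsPerScreenPixel / unitsPerDevicePixel;
}
```

A printer whose pixels are one quarter the size of screen pixels (2500 units
each) maps a 20-pixel-wide gadget to 80 printer pixels.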
Example 1: The Object base type.
MODULE Objects;
TYPE
 Object = POINTER TO ObjDesc;
 ObjMsg = RECORD END; (* base message type *)
 Handler = PROCEDURE (obj: Object; VAR M: ObjMsg);
 ObjDesc = RECORD (* base object type *)
 lib: Library; ref: INTEGER;
 handle: Handler
 END;
Example 2: MyObject definition.
(a) MODULE MyObjects;
 IMPORT Objects;
 TYPE
 MyObject = POINTER TO MyObjDesc;
 MyObjDesc = RECORD (Objects.Object)
 myColor: INTEGER;
 END;
 SetColorMsg = RECORD (Objects.ObjMsg) (* message to set color of an object*)
 newColor: INTEGER;
 END;
 PROCEDURE MyHandler(obj: Objects.Object; VAR M: Objects.ObjMsg);
 BEGIN
 WITH obj: MyObject DO
 IF M IS SetColorMsg THEN
 WITH M: SetColorMsg DO
 obj.myColor := M.newColor
 END
 ELSIF M IS Objects.CopyMsg THEN
 WITH M: Objects.CopyMsg DO
 (* ... make a copy of the object ... *)
 END
 ELSIF (* ... etc ... *)
 END
 END
 END MyHandler;

(b) PROCEDURE Do;
 VAR me: MyObject; M: SetColorMsg;
 BEGIN
 NEW(me); (* allocate new object on the heap *)
 me.handle := MyHandler; (* don't forget to install the message handler *)
 M.newColor := 1; (* init message *)
 me.handle(me, M) (* send message *)
 END Do;

Example 3: The Frame base class.
(a) TYPE
 Frame = POINTER TO FrameDesc; (* base type of all displayable objects *)
 FrameDesc = RECORD (Objects.ObjDesc)
 next, dsc: Frame;
 X, Y, W, H: INTEGER (* display coordinates of the frame *)
 END;

 FrameMsg = RECORD (Objects.ObjMsg)
 F: Frame; (* target frame*)
 x, y: INTEGER; (* origin of the frame receiving this message *)
 res: INTEGER (* operation result code *)
 END;

(b) DisplayMsg = RECORD (FrameMsg)
 id: INTEGER; (* flag to display the whole frame, or just an area of it *)
 u, v, w, h: INTEGER (* area of destination frame to be displayed *)
 END;

Listing One 
 MODULE Skeleton;

1 IMPORT Files, Display, Display3, Printer3, Effects, Objects, 
 Gadgets, Oberon;
2 TYPE
3 Frame* = POINTER TO FrameDesc;
4 FrameDesc* = RECORD (Gadgets.FrameDesc)
5 mycol*: INTEGER;
6 END;
7 PROCEDURE FrameAttr(F: Frame; VAR M: Objects.AttrMsg);
8 BEGIN
9 IF M.id = Objects.get THEN
10 IF M.name = "Gen" THEN M.class 
 := Objects.String; 
 COPY("Skeleton.NewFrame", 
 M.s); M.res := 0
11 ELSIF M.name = "Color" THEN M.class 
 := Objects.Int; 
 M.i := F.mycol; M.res := 0 
12 ELSE Gadgets.framehandle(F, M)
13 END
14 ELSIF M.id = Objects.set THEN
15 IF M.name = "Color" THEN
16 IF M.class = Objects.Int THEN F.mycol := SHORT(M.i); 
 M.res := 0 END;
17 ELSE Gadgets.framehandle(F, M);
18 END
19 ELSIF M.id = Objects.enum THEN
20 M.Enum("Color"); Gadgets.framehandle(F, M)
21 END
22 END FrameAttr;
23 PROCEDURE RestoreFrame(F: Frame; M: Display3.Mask; x, y, w, h: 
 INTEGER);
24 BEGIN
25 Display3.ReplConst(M, F.mycol, x, y, w, h, Display.replace);
26 IF Gadgets.selected IN F.state THEN
27 Display3.FillPattern(M, Display3.white, Effects.selectpat, 
 x, y, x, y, w, h, Display.paint)
28 END

29 END RestoreFrame;
30 PROCEDURE Print(F: Frame; VAR M: Display.PrintMsg);
31 VAR R: Display3.Mask;
32 PROCEDURE P(x: INTEGER): INTEGER;
33 BEGIN RETURN SHORT(x * LONG(10000) DIV Printer3.Unit)
34 END P;
35 BEGIN
36 Gadgets.MakePrinterMask(F, M.x, M.y, M.dlink, R);
37 Printer3.ReplConst(R, F.mycol, M.x, M.y, P(F.W), P(F.H), 
 Display.replace);
38 END Print;
39 PROCEDURE CopyFrame*(VAR M: Objects.CopyMsg; from, to: Frame);
40 BEGIN
41 to.mycol := from.mycol;
42 Gadgets.CopyFrame(M, from, to);
43 END CopyFrame;
44 PROCEDURE FrameHandler*(F: Objects.Object; VAR M: Objects.ObjMsg);
45 VAR x, y, w, h: INTEGER; F0: Frame; R: Display3.Mask;

46 BEGIN
47 WITH F: Frame DO
48 IF M IS Display.FrameMsg THEN
49 WITH M: Display.FrameMsg DO
50 IF (M.F = NIL) OR (M.F = F) THEN 
 (* message addressed to this frame *)
51 x := M.x + F.X; y := M.y + F.Y; w := F.W; h := F.H; 
 (* calculate display coordinates *)
52 IF M IS Display.DisplayMsg THEN
53 WITH M: Display.DisplayMsg DO
54 IF (M.id = Display.frame) OR (M.F = NIL) THEN
55 Gadgets.MakeMask(F, x, y, M.dlink, R);
56 RestoreFrame(F, R, x, y, w, h)
57 ELSIF M.id = Display.area THEN
58 Gadgets.MakeMask(F, x, y, M.dlink, R);
59 Display3.AdjustMask(R, x + M.u, y + h - 1 + 
 M.v, M.w, M.h);
60 RestoreFrame(F, R, x, y, w, h)
61 END
62 END
63 ELSIF M IS Oberon.InputMsg THEN
64 WITH M: Oberon.InputMsg DO
65 IF (M.id = Oberon.track) & 
 Gadgets.InActiveArea(F, M) THEN
66 Oberon.DrawCursor(Oberon.Mouse, Effects.PointHand,
 M.X, M.Y); M.res := 0
67 ELSE Gadgets.framehandle(F, M)
68 END
69 END
70 ELSIF M IS Display.ModifyMsg THEN Gadgets.framehandle(F, M)
71 ELSIF M IS Oberon.ControlMsg THEN Gadgets.framehandle(F, M)
72 ELSIF M IS Display.SelectMsg THEN Gadgets.framehandle(F, M)
73 ELSIF M IS Display.ConsumeMsg THEN Gadgets.framehandle(F, M)
74 ELSIF M IS Display.PrintMsg THEN Print(F, M(Display.PrintMsg))
75 ELSE Gadgets.framehandle(F, M)
76 END
77 END
78 END
79 (* Object messages *)
80 ELSIF M IS Objects.AttrMsg THEN FrameAttr(F,M(Objects.AttrMsg))

81 ELSIF M IS Objects.FileMsg THEN
82 WITH M: Objects.FileMsg DO
83 IF M.id = Objects.store THEN (* store private *)
84 Files.WriteInt(M.R, F.mycol);
85 Gadgets.framehandle(F, M)
86 ELSIF M.id = Objects.load THEN (* load private *)
87 Files.ReadInt(M.R, F.mycol);
88 Gadgets.framehandle(F, M)
89 END
90 END
91 ELSIF M IS Objects.CopyMsg THEN
92 WITH M: Objects.CopyMsg DO
93 IF M.stamp = F.stamp THEN M.obj := F.dlink
 (* message arrives again *)
94 ELSE (* First time copy message arrives *)

95 NEW(F0); F.stamp := M.stamp; F.dlink := F0; 
 CopyFrame(M, F, F0); M.obj := F0
96 END
97 END
98 ELSE (* an unknown message arrived; framehandler might know it *)
99 Gadgets.framehandle(F, M)
100 END
101 END
102 END FrameHandler;
103 PROCEDURE NewFrame*;
104 VAR F: Frame;
105 BEGIN NEW(F); F.W := 20; F.H := 20; F.mycol := Display3.red; 
 F.handle := FrameHandler; Objects.NewObj := F;
106 END NewFrame;
107 END Skeleton.































October, 1994
Making a Case for Animating C++ Programs


Do new styles of programming require new kinds of tools?




Alan West


Alan is the chief architect of Look!, a C++ animation system. He can be
reached at alan@openobjs.com or alan@ost.co.uk.


The move to object-oriented languages parallels the rise of complex,
event-driven, GUI-based applications that use object-oriented frameworks to
partition and encapsulate their functionality. Unfortunately, there are few
tools currently available that help you understand the dynamic aspects of
object programs. CASE tools do allow certain design diagrams to be reverse
engineered, showing class inheritance, containment, and use relationships, but
these are static and show the structure of the code rather than indicate how
it functions. 
Clearly, the code itself is the main source of information about dynamic
activity, even though most debuggers used for examining C++ programs in action
present a procedural stream of executed source lines. Up to now, the use of
procedural tools to debug and understand C++ programs hasn't been a stumbling
block, since many programmers still don't utilize all of C++'s object-oriented
capabilities. But as C++ programs begin to become truly object oriented,
problems will arise when viewing execution of object systems as a sequential
execution of source code. For instance, you'll need to focus on the changing
set of objects that make up the system and the interactions between objects.
This information supplements--not replaces--that provided by static tools.
When it comes to understanding and debugging programs, the more information
and data points you have, the better off you usually are.
Some would say that dynamic views of program execution are unnecessary, overly
confusing, and generally not worth the effort. In this article, I'll argue
that object-oriented systems require dynamic, object-oriented, animated views
of C++ programs. For the purpose of example, I'll refer to Look! (which I
designed), which lets you "look" inside Windows- or UNIX-based C++ programs as
they execute. Figure 1 shows a typical Look! session. Although Look! is
currently perhaps the only tool providing a peek at program execution that is
both dynamic and visual, other tools support one or the other. For instance,
Pure Software's Purify gives you a dynamic view of memory, although not yet a
visual one, while the TVIEW component of Nu-Mega's Bounds-Checker32/S debugger
provides you with a visual, albeit static, view. 
Smalltalk programmers have long been familiar with object inspectors that
allow object-status examination while the system is running. Likewise, in his
article "A Minimal Object-Oriented Debugger for C++" (DDJ, October 1991),
William Miller presented a simple debugger that lets you watch objects being
created and destroyed while the program is running. Other work in this vein
includes animating processes in concurrent systems (see "An Extensible
Distributed Object Management System, EDOMS" by Burke, Domae, and Johnson,
Proceedings of the Second International Conference TOOLS, 1990) and animating
algorithm execution ("Algorithm Animation" by M.H. Brown, ACM Distinguished
Dissertations Series, MIT Press, 1988). Other projects have concentrated on
the animation of static displays of program components, usually using Prolog,
("An Integrated Prolog Programming Environment" by Schreiweis, Keune, and
Langendorfer, ACM Sigplan Notices, February 1993) or Lisp-derivative languages
("GraphTrace: Understanding Object-Oriented Systems Using Concurrently
Animated Views" by Kleyn and Gingrich, ACM Sigplan Notices, November 1988).
The GraphTrace program aimed to increase the understanding of object programs
through the animation of static layouts of program components. The aim of
Look! is similar; however, Look! produces dynamic object diagrams, showing
objects as they are created and destroyed and as they communicate, rather than
animating static, structural views. Also, this system does not function by
play-and-record--applications are animated as they execute. By pointing at
screen objects, you can map the displayed objects or messages to the
corresponding classes and functions and explore source code and data. 


Dynamic Diagrams of C++ Programs


By embodying basic principles of sound engineering design directly in
programming languages, object technology has standardized, at the
architectural level, the components (program objects) that define a program
both statically--classes and functions/methods--and dynamically--objects and
messages. Likewise, relationships and communications between objects have
become standardized. 
In nonobject environments, the amount of design-level information that can be
extracted from code is limited: Function-call trees can be produced, but these
convey nothing about the state of data in the program; the only other
important architectural information available is the module grouping of
functions. By embedding design-level structures in the code, object technology
allows the derivation of static, class-level information (as well as the usual
function-call trees) and the automatic generation and animation of
object-level diagrams that show the objects in the program as they are created
and destroyed, as they communicate, and as their relationships change.
Together with the source and machine level, the diagrammatic representation of
object programs can be regarded as a third level of representation; see Table
1. At any point during execution of an object program, you can view the
execution state at different levels, moving between them at will.


Visual Analogs of Object Programs


Object-oriented code lends itself naturally to visual representation. Programs
consist of sets of objects that communicate with one another dynamically. You
can represent objects, communications, and relationships using diagrams such
as Figure 2, where objects are represented by circles, although icons
representing object classes could also be used. Abstract objects (container,
list, and the like) still have a visual analog. As well as using icons to
differentiate classes of objects, the appearance of an object can be changed
to indicate internal data changes or the current state of the object. In the
latter case, you can show factors that affect the object as a whole, such as
whether it is active, selected in the user interface, deleted while still
active, or cross-referenced. The icon may also show physical aspects of the
object such as memory allocation (static, heap, auto) or size (class-based).
Grouping and displaying objects with different representations means that
aspects of the state of the program can be rapidly comprehended and checked as
the types and states of existing objects are visually broadcast.
Relationships between objects can be shown spatially (even if only a subset of
relationships can be presented at any one time). The layout of objects is
problematic, however, as the set of objects is dynamic, and the pattern of
communication between objects generally cannot be predicted in advance.
Animators (such as Look!) can incorporate different object-layout strategies
or views:
First-reference hierarchies, which show objects organized into a hierarchy in
which the parent objects are those that first held a reference to the child
objects. This representation automatically lays out most common structures
correctly, including trees and lists, and can directly show incorrect
structuring as it occurs. Objects are first shown below the object that
created them and are then reparented when a reference to them from another
displayed object is created.
Creation hierarchies, which show creation relationships between objects. This
view does not expose structural problems as clearly as the first-reference
hierarchy, but it can show the actual working of a system more clearly
because it maps
directly to code execution. 
Class-ordered hierarchies, which align classes in columns. These are generally
less useful than the others, but when combined with filtering, are helpful for
following the interaction between a small set of classes whose objects are not
very dynamic.
Figure 1 is a sample creation-hierarchy view of a paint program. The program
is structured to have a palette and toolbar on the main window, with tools
being added to the toolbar. In this case, the creation hierarchy effectively
shows the structure of the system because it is based on a containment
organization.
At the object level, the unit of activity is the member-function call, so
system activity is represented as member-function calls taking place between
dynamic, iconic object representations. Each call is represented by dynamic
arrows and message labels. Figure 2 shows a sequence of constructors as the
system is initialized. The object graph changes dynamically as objects are
created and destroyed. 
Each object can have multiple relationships with other objects. The
first-reference view uses one relationship as the basis of organizing the
hierarchy. At any point, the other references to or from an object can be
shown as an additional set of linking lines on an object diagram, acting as a
visual cross reference between the objects. Pressing on a line displays the
data member which actually stores that object reference.
Given the complexity of software systems, many two-dimensional views have to
be used to describe a system. This applies to animation just as much as to
design, so an animator must be able to display and animate multiple views of
the program components simultaneously. These include object diagrams, message
diagrams, class diagrams, and source views. I am focusing primarily on object
diagrams because they are the most important and directly represent the
dynamic state of the system. They are supplemented by the other views: Message
traces show the details and history of object-to-object communications and
allow replay of calling sequences; class diagrams show the usual static class
relationships, although they can also be animated, showing the active class
changing as objects communicate (see Figure 3).


Understanding C++ Details


Although at the object level you gain an overall understanding of program
operation, it is useful to directly link to the source level by synchronously
animating source code and diagrammatic views. At the diagrammatic level, the
system is stepped one function call at a time; at the source level, it is
usual to step line by line. If source and diagrammatic views are
simultaneously animated, then it is possible to see the sequence of calls (and
the creation of temporary objects) caused by executing a single C++ line.
Similarly, stepping by a message shows the lines executed for that call or
return. In Figure 4, for example, the object view shows the implicit
invocation of a constructor to convert a character string to a TdDate object
as an assignment statement is stepped over. 
Object-level animation demonstrates fundamental C++ features such as
construction and destruction ordering, different means of allocating objects,
creation and destruction of temporary objects, invocation of type-conversion
operators, resolution of calls among inherited classes, and base- and
aggregate-object use. For example, the order of creation of the component
parts of an object can be shown visually. The left-to-right ordering in Figure
5 indicates the sequence of initializing bases of the expand object. Last to
be initialized is an aggregate object.
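
The ordering being visualized here is fixed by the C++ language: base classes
are constructed in declaration order, then data members in declaration order,
and only then does the constructor body run. A minimal trace (class names
invented for illustration):

```cpp
#include <cassert>
#include <string>

std::string trace;  // records construction order

struct Base1  { Base1()  { trace += "Base1 ";  } };
struct Base2  { Base2()  { trace += "Base2 ";  } };
struct Member { Member() { trace += "Member "; } };

// Bases run in declaration order (Base1, Base2), then the aggregate
// member, then the constructor body -- regardless of how a
// member-initializer list might be ordered.
struct Expand : Base1, Base2 {
    Member m;
    Expand() { trace += "Expand"; }
};
```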
The C++ puzzle in Example 1 (which originally appeared in "The C++ Puzzle" by
R. Murray, C++ Report, November/December 1992) illustrates how you can
visually follow a program. Ignoring for the moment questions of C++ style
(passing the node by value is unnecessary, and why have a single Node class
for three Node types?), the bottom line is that nodeA does not get printed.
Instead, the unary constructor for Node--Node(Node &)--is invoked, because it
has the form of a C++ copy constructor. The unary constructor doesn't copy,
however; it creates a new node and attaches the one we passed in to it. So now
you are trying to print a temporary unary node instead of a copy of nodeA.
Just looking at the text, this problem may be a bit difficult to spot, but
running the example through the animator shows nodeA being created, followed
by a temporary CnodeA object. When CnodeA is created, nodeA becomes referenced
from CnodeA, and this reference change is automatically detected by the
animator and shown by reparenting nodeA below CnodeA; see Figure 6. The
existence of the CnodeA/nodeA reference clearly indicates the problem;
checking the message trace confirms it.
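
A runnable reduction of the puzzle (members simplified and invented for
illustration): because Node(Node &) has the signature of a copy constructor,
it is exactly what pass-by-value invokes, so the "copy" handed to print is in
fact a new unary node wrapping the original:

```cpp
#include <cassert>

// Reduced version of Example 1's trap. Node(Node &) is meant as the
// unary-node constructor, but its signature makes it the class's copy
// constructor, so passing a Node by value builds a unary wrapper
// instead of a copy.
struct Node {
    Node* child = nullptr;
    bool unary = false;
    Node() {}
    Node(Node& n) : child(&n), unary(true) {}  // "unary", not a copy
};

bool printedUnary = false;

void print(Node n) {           // pass by value invokes Node(Node &)
    printedUnary = n.unary;
}
```

An animator makes this visible immediately: the original node is reparented
below the temporary wrapper the instant the reference is created.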
Besides directly showing incorrect structuring and references, object-level
animation detects various memory-usage errors. To animate C++ without
requiring source changes, a program must be monitored at the object-code level
and a considerable amount of information about the program must be maintained
to identify objects and member-function calls. This monitoring also yields
many common C++ bugs--objects being deleted while still active, use of invalid
object pointers or references, and the passing of invalid pointers or
references--which are automatically detected as they occur.


Filtering


As critics of program animation often (correctly) point out, systems of
significant size often have too many objects and messages for the entire
operation of the system to be animated. Consequently, you may want to view
only selected classes of objects, or objects created by certain other objects,
or objects created after a certain message sequence. Animation systems must
therefore provide facilities for filtering the display of system activity.
We have found that filtering should focus on the exclusion of sections of the
program. It appears more natural to gradually remove irrelevant activity
rather than having to state initially what you want to see. Ideally, all
filtering could be done visually by pointing at the things to leave out.
Static filtering of this type can exclude (and include) classes, functions,
and modules simply by pointing at them on a static view. In the dynamic object
views, pointing at an object allows the corresponding class to be included or
excluded; similarly, pointing at a message label allows the function to be
excluded or included. 
However, in the most general case there is a need to be able to exclude
sections of program activity; for example, to show only calls from objectA to
objectB, or every call from the object which created objectA. To do this, you
must be able to make statements about the dynamic structure of the program;
this can be done by defining the set of messages (member-function calls) to
animate. Each call has a sending and a receiving object, and we can reason
about the attributes of these objects and the function, in order to decide
whether the message is to be included or excluded. For instance, Example 2(a)
doesn't show calls to functions called paint in objectA, while Example 2(b)
stops every time a method is called on a Persistent object. Example 2(c) only
shows object construction, and Example 2(d) shows all activity until any
object created by the display_manager makes a call, and then excludes all
further activity.

Dynamic filters are expressed in an extension of C++ expression syntax that
allows a wide range of statements to be made about a metamodel describing the
inheritance and creation structure of any C++ program. Dynamic filters can be
entered using a graphical editor, or textually, as in Example 2.


Conclusions


With some exceptions, the current interest in C++ focuses on detailed
syntactic features, not opportunities to exploit object technology to
construct large, maintainable systems. Object technology creates systems as
sets of intercommunicating objects, a structure which has an intuitive and
powerful visual representation. Representing program execution with visual
analogs exploits our cognitive abilities, letting you view and interact with
programs in intuitive ways. This has only recently become
practical because of the standardization of program structure brought about by
object technology. Animation is practical and effective. As the capabilities
of GUIs increase (and systems become more standardized), it will become
possible to provide an increasing amount of relevant information.
Figure 1 Typical creation-hierarchy view of a paint program.
Example 1: C++ puzzle illustrating how you can visually follow a program.
class Node
{
 ...
public:
 Node();
 Node (Node &); // Unary
 Node (Node &, Node &); // Binary
};
void display_node()
{
 Node nodeA;
 Display display;
 display.print(nodeA);
}
Figure 2 Representing objects.
Table 1: Different ways to view programs.
Representation level    Visual representation      Granularity
Diagrammatic            Figure                     Member-function call
Source Expresult *ans= 0; Source line
 Expresult *left = leftD->eval();
 Expresult *right = rightD->eval();
 if (left && right)
 {
 ans = applyop(right, left);
 if (left) delete left;
 if (right) delete right;
 }
 return ans;
Machine mov DX,seg D0 Machine instruction
 mov AX,offset D026h
 les BX,6[BP]
 mov ES:0Eh[BX],DX
 mov ES:0Ch[BX],AX
 inc word ptr _nautosD__4Auto
 mov ES:byte ptr 010h[BX],1
 xor AX,AX
Figure 3 The active class changes as objects communicate.
 (a) Source view; (b) object view. 
Figure 5 The sequence of initializing bases of the expand object.
Figure 6 Tree view.
Example 2: Filtering program execution.
(a)
Filter remove_paints
(to.name == objectA &&
 function.name == paint)
{ exclude; }
(b)
Filter stop_on_store (to.class.baseclass.name == Persistent)
{ stop; }

(c)
Filter show_construction
(function.type != constructor)
{ exclude; }
(d)
Filter until_suppliers (from.parent.name == display_manager)
{
 add exclude_all;
 remove until_suppliers;
}
Filter exclude_all() { exclude; }



October, 1994
Endian-Neutral Software, Part 1


System concepts and implications




James R. Gillig


Jim is a software engineer on OS/2 and IBM Workplace technologies in Boca
Raton, Florida. He can be reached through the DDJ offices.


Endian is a processor-addressing model that affects the byte ordering of data
and instructions stored in computer memory, and the data's representation
provided by a programming language. Endian concepts can be confusing since
there are different Endian types, different ways to represent these types, and
intertwined considerations for both code and data portability between
opposite-endian hardware platforms. Historically, the term "Endian" comes from
Gulliver's Travels, by Jonathan Swift:
It is computed that eleven Thousand Persons have, at several Times, suffered
Death rather than submit to break their Eggs at the smaller End. 
In the first installment of this two-part article, I will lay the groundwork
by examining what Endian means from the programmer's perspective. In next
month's article, I'll discuss how you can write portable software by applying
Endian-neutral design and programming principles.
The most common addressing models are Big-endian, derived from the
left-to-right order of writing in western-culture languages, and
Little-endian, stemming from the right-to-left order of arithmetic operations
in hardware processors. As Figure 1 illustrates, the Big-endian (BE)
addressing model assigns or maps the lowest address to the highest-order (that
is, the most significant or leftmost) data byte of a multibyte-scalar data
item. The Little-endian (LE) addressing model assigns or maps the lowest
address to the lowest-order (least significant or rightmost) data byte of a
multibyte-scalar data item.
The "Endianness" of a multibyte-scalar data type such as an integer halfword
or word is BE or LE. When compiled for a LE processor, its byte order is the
reverse of the byte order compiled for a BE processor. The simplest way to
think about Endian is that a LE scalar data item is equivalent to a
byte-reversed BE scalar data item. Such a scalar should be treated as a
single, indivisible data item although it has more than one byte and is
composed of smaller addressable units of storage. Aggregate data such as
files, data structures, and arrays are composed of multiple data elements;
each element that is a multibyte scalar has Endianness. Byte values or
single-byte character data do not have Endianness because the smallest
addressable unit of memory is one byte; consequently, byte order is not an
issue. 
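The byte-reversal relationship is easy to verify on any machine. The following C sketch (the function names are mine, not from the article) inspects the bytes of a scalar through a char buffer, the one representation that has no Endianness:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a Little-endian host, 0 on a Big-endian host.
   The low-order byte of the value 1 sits at the lowest address
   only under the LE addressing model. */
int is_little_endian(void)
{
    uint32_t word = 1;
    unsigned char bytes[4];
    memcpy(bytes, &word, sizeof word);
    return bytes[0] == 1;
}

/* Returns the byte of a 32-bit value at the given address offset. */
unsigned char byte_at(uint32_t word, int index)
{
    unsigned char bytes[4];
    memcpy(bytes, &word, sizeof word);
    return bytes[index];
}
```

On an LE host, byte_at(0x0A0B0C0D, 0) yields 0x0D; on a BE host it yields 0x0A, exactly the reversal described above.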
Some processors are Little-endian (Intel x86), others are Big-endian (IBM
AS/400, System/370, Macintosh), and some are bi-endian (PowerPC) and can run
in either BE or LE mode. In turn, the Endianness of software (code and data)
is determined by the processor for which it is written. 
The data structure in Figure 2 shows how Endianness can affect addressability
and byte order. When a data structure containing different data types is
compiled for a BE processor and again separately for a LE processor, note the
following about the compiled data structure:
Each data item is at the same address location, whether BE or LE (see variable
b at address 0x08, Figure 2). 
The LE byte order within a scalar data item is equivalent to byte-reversed BE
(see variable b byte address, Figure 2). 
Single-byte characters lack Endianness and are at the same byte address in BE
or LE mode (see array d[7], Figure 2).


Endian Maps and Forms


An Endian model maps addresses to the bytes of a multibyte scalar. There are
different ways to illustrate Endian maps and forms of data for human viewing.
The byte addresses of a LE data item are shown in either left-to-right or
right-to-left order, with byte values appearing in the opposite order. For a
BE data item, both addresses and bytes are shown in the same left-to-right
order. The relationship between BE and LE mappings and their forms of
representation are shown in Figure 3. Figure 4 is based on the sample data
structure in Figure 2 but illustrated in the alternate left-to-right
addressing form for LE. A disadvantage of this form is that the scalar data
items do not appear in the more readable (to western cultures) left-to-right
order. 
In addition to BE and LE, other related Endian maps and forms may exist as
part of a processor's addressing architecture or its implementation. Some
special forms may be internal to a processor and transparent to software; they
should not be confused with BE and LE, which are visible to software. BE and
LE are most common, but you should not categorically assume that they are the
only addressing models in existence and that all data in the world is only BE
or LE. 
Finally, it is interesting to compare how halfword, word, and doubleword
integers can appear as members of a data structure in BE and LE form.
The data structure in Figure 5(a) has its BE/LE byte-address mappings shown
next to it. Figure 5(b) shows a different mapping for LE than before. Finally,
Figure 5(c) shows yet a different byte-address mapping for LE. For BE, the
byte address of each byte value is the same in (a), (b), and (c) of Figure 5;
for LE, the byte addresses are all different for the same byte value. 
Multibyte-scalar data should be treated by software as a single, indivisible
entity, such as an integer, pointer, or float. You can write code that treats
a scalar as aggregate data by addressing a specific byte location or byte
subfield internal to the scalar. This practice results in code that is not
readily portable between Endians. In Figure 5, the short-integer s3.k data
item is at address 04 for both Endian types, but its two component bytes are
at different addresses! A program accessing data at location (char*) &s3.k+1
would find 0x16 when running in BE mode and 0x15 in LE mode. In short, when
twiddling with the internal bits and bytes of scalar data, do not assume they
are stored at a particular address; otherwise, such a program may break when
ported to a different Endian. Bits can be more portably selected in BE or LE
with bitwise operations such as n & 0x03FC0000 and be independent of byte
address. The important principle is not to rely on those bits being stored at
a particular byte address.
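The portable-bit-selection advice can be made concrete. This sketch reuses the mask value from the text; the function name is illustrative:

```c
#include <stdint.h>

/* Portable bit selection: picks out bits 18..25 of the value
   arithmetically.  Unlike (char*)&n + 1 style byte poking, this
   yields the same result on a BE or LE host, because shifts and
   masks operate on the value, not on byte addresses. */
uint32_t middle_bits(uint32_t n)
{
    return (n & 0x03FC0000u) >> 18;
}
```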


System Endianness


The classification of a processor, program, or data according to the
addressing model it is based on (usually BE or LE) is its Endian type. A
processor or program is said to execute in BE or LE mode. Furthermore,
Endianness means being of a certain Endian type or mode. More generally,
Endianness means the technical considerations for executing in different
Endian modes and porting program code and data between BE and LE platforms.
Endianness is not limited to any particular component of a system but can
occur wherever data is addressed, retrieved, stored, processed, or
transmitted.
A single-endian processor is architected as either Big- or Little-endian; most
Intel processors, for example, are LE. Some processors are bi-endian, such as
the PowerPC, which has the ability to run in either BE mode, LE mode, or both
under software control. Bi-endian capability makes it possible to migrate
existing operating systems, their applications, and data from both BE and LE
platforms to a common bi-endian processor such as the PowerPC. (For more
details, see the accompanying text box entitled, "PowerPC Bi-endian
Capabilities.") The operating system is responsible for handling
Endian-specific controls, registers, and interrupts that a processor may
provide. 
A processor has Endianness as a characteristic of its architecture. Therefore,
hardware units that have embedded processors, such as video displays,
printers, or communications adapters, take on the processor's Endianness, as
does any software supporting the hardware unit. The Endianness of input/output
data and commands between devices and adapters attached to a system of
opposite Endian must be taken into consideration. Typically, all related
system and attached hardware from a given manufacturer has the same Endian
type. The situation is even more complicated in distributed computing
environments, as described in the accompanying text box, "Distributed
Environments and Endianness."
The user of a stand-alone, single-endian system with all of its data being of
the same Endian does not encounter Endian-related problems; however, if data
of another Endian type is imported by LAN, communications, diskette, or other
media, then software must handle conversion to the correct Endian. Endian
conversion of data requires knowing the data structure, data type, and Endian
type. A cross-platform application that runs on different Endian types of
platforms needs Endian-conversion capability for data interchange with itself
across different platforms. When applications are different, a conversion
utility can be written to convert data files between different applications
running on opposite-endian platforms.
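Conversion of a multibyte scalar between Endian types is a byte reversal, which can be written with shifts and masks so that the same source works on either host; a minimal sketch (the names are mine):

```c
#include <stdint.h>

/* Reverses the two bytes of a halfword. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Reverses the four bytes of a word.  Applying it twice restores
   the original value, so the same routine converts in either
   direction (BE-to-LE or LE-to-BE). */
uint32_t swap32(uint32_t x)
{
    return  (x >> 24)
          | ((x >> 8)  & 0x0000FF00u)
          | ((x << 8)  & 0x00FF0000u)
          |  (x << 24);
}
```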
The machine-executable instructions of compiled source code are handled as
data during compilation into binary code and loading from disk for execution,
and while being managed by the operating system. When being handled as data by
other software, binary program code is subject to the same effects of
Endianness as data and should be treated as multibyte scalar data.


Programming-Language Data Representation


A programming language represents data based on the same addressing model
(Endian type) inherent to the processor for which it is compiled;
left-to-right for BE and right-to-left for LE. Programming languages may
extend data representation and provide data constructs down to the bit level
(for example, bit fields in C) even though the processor allows addressing
only to the byte level. A bit field, which can be thought of as a "tiny"
integer, is a contiguous set of bits, where the most significant bit is on the
left end and least significant bit is on the right end. Multiple bit fields
can be defined within a word. The programming language, in general practice,
extends the Endian type down to the bit-field level; that is, multiple bit
fields defined within a word are represented in left-to-right order for BE and
right-to-left order for LE.
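Because bit-field allocation order follows the compiler's Endian convention, interchange-safe code packs the same fields with explicit shifts instead. A hedged sketch; the field layout is my own illustration, not from the article:

```c
#include <stdint.h>

/* A BE compiler typically allocates these fields left-to-right and
   an LE compiler right-to-left, so the in-memory layout differs
   between platforms and is not interchange-safe: */
struct flags_bf {
    unsigned int version : 4;
    unsigned int type    : 4;
    unsigned int length  : 24;
};

/* Portable alternative: build the word with shifts and masks, so
   the encoded value has one defined layout on every host. */
uint32_t pack_flags(unsigned version, unsigned type, unsigned length)
{
    return ((uint32_t)(version & 0x0Fu) << 28)
         | ((uint32_t)(type    & 0x0Fu) << 24)
         |  ((uint32_t)length & 0x00FFFFFFu);
}

unsigned unpack_version(uint32_t word) { return (word >> 28) & 0x0Fu; }
```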


Why Endian Awareness is Important


Endian awareness is needed in today's open, interconnected systems for program
portability and data interchange across BE and LE platforms. There are two
basic consequences of Endianness:
Code may not be portable to systems of the opposite Endian. This is a result
of Endian-specific program code that twiddles with the internal bits and bytes
of scalar data and assumes an Endian-specific (BE or LE) addressing byte
order. 

Data may not be (automatically) interchangeable between systems of the
opposite Endian. This is a result of LE and BE data items (multibyte scalars)
being the byte reverse of one another. 
How do you deal with these issues? Source-code portability can be facilitated
by writing Endian-neutral code that is more readily portable across BE or LE.
Data portability is achieved by conversion of the Endian type, when the data's
structure, data type, and Endian type are known. In next month's installment,
I'll present techniques for writing Endian-neutral applications.
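One widely used Endian-neutral technique is worth previewing here: decode external multibyte data byte by byte at defined positions, so the identical source compiles correctly for either Endian with no platform #ifdef. A sketch under that assumption (the function names are mine):

```c
#include <stdint.h>

/* Decodes a 32-bit value stored Little-endian in a byte buffer,
   such as a field read from an LE-format file.  The bytes are
   combined arithmetically, so the result is correct on BE and LE
   hosts alike. */
uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
          | ((uint32_t)p[1] << 8)
          | ((uint32_t)p[2] << 16)
          | ((uint32_t)p[3] << 24);
}

/* Same idea for a Big-endian external format. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```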
PowerPC Bi-endian Capabilities
The PowerPC is a bi-endian RISC processor that supports both Big- and
Little-endian addressing models. The bi-endian architecture provides hardware
and software developers with the flexibility to choose either mode when
migrating operating systems and applications from their current BE or LE
platforms to the PowerPC. Figure 6 shows the address mapping of its 32-bit
executable instructions when running in BE mode and LE mode. These examples
illustrate how program instructions are like multibyte-scalar data and are
subject to the byte-order effect of Endian. 
Each individual PowerPC machine instruction occupies an aligned word in
storage as a 32-bit integer containing that instruction's value. In general,
the appearance of instructions in memory is of no concern to the programmer.
Program code in memory is inherently either a LE or BE sequence of
instructions even if it is an Endian-neutral implementation of an algorithm. 
How does the PowerPC handle both LE and BE addressing models? The processor
calculates the effective address of data and instructions in the same manner
whether in BE mode or LE mode; when in LE mode only, the PowerPC
implementation further modifies the effective address to provide the
appearance of LE memory to the program for loads and stores. 
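As commonly documented for 32-bit PowerPC implementations, this LE-mode modification XORs the low-order effective-address bits with a value that depends on operand size. The following toy C model sketches that published rule; it is an illustration, not processor microcode, and the architecture books are the authoritative reference:

```c
/* Models the LE-mode effective-address modification within an
   aligned doubleword, as described for 32-bit PowerPC:
   bytes XOR 0b111, halfwords XOR 0b110, words XOR 0b100,
   doublewords unchanged. */
unsigned long munge_le(unsigned long ea, int operand_size)
{
    switch (operand_size) {
    case 1:  return ea ^ 7;   /* byte access */
    case 2:  return ea ^ 6;   /* halfword access */
    case 4:  return ea ^ 4;   /* word access */
    default: return ea;       /* doubleword: unchanged */
    }
}
```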
The operating system is responsible for establishing the Endian mode in which
processes execute. Once a mode is selected, all subsequent memory loads and
stores will be affected by the memory-addressing model defined for that mode.
Byte-alignment and performance issues need to be understood before using an
Endian mode for a given application. Alignment interrupts may occur in LE mode
for the following load and store instructions:
Fixed-point load instructions. 
Fixed-point store instructions. 
Load-and-store with byte reversal instructions. 
Fixed-point load-and-store multiple instructions. 
Fixed-point move-assist instructions. 
Storage-synchronization instructions. 
Floating-point load instructions. 
Floating-point store instructions. 
For multibyte-scalar operations, when executing in LE mode, the current
PowerPC processors take an alignment interrupt whenever a load or store
instruction is issued with a misaligned effective address, regardless of
whether such an access could be handled without causing an interrupt in BE
mode. For code that is compiled to execute on the PowerPC in LE mode, the
compiler should generate as much aligned data and instructions as possible to
minimize the alignment interrupts. Generally, more alignment interrupts will
occur in LE mode than in BE mode. When an alignment interrupt occurs, the
operating system should handle the interrupt by software emulation of the load
or store. 
A very powerful feature of the PowerPC architecture is the set of integer
load-and-store instructions with byte reversal that allow applications to
interchange or convert data from one Endian type to the other, without
performance penalty. These load-and-store instructions are lhbrx/sthbrx,
load/store halfword byte-reverse indexed and lwbrx/stwbrx, load/store word
byte-reverse indexed. They are ideal for emulation programs that handle
LE-type instructions and data, such as the emulation of the Intel instruction
set and data. These instructions significantly improve performance in loading
and storing LE data while executing PowerPC instructions in BE mode and
emulating the Intel instruction behavior; this eliminates the byte-alignment
and data-conversion overhead found in architectures that lack byte-reversal
instructions. Currently, these instructions can be accessed only through
assembly language. Until C compilers provide support to automatically generate
the right load and store instructions for this type of data, C programs can
rely on masking and concatenating operations or embed the assembly-language
byte-reversal instruction.
--J.R.G.
Distributed Environments and Endianness
A distributed application running between client desktops, servers, midframes,
and mainframes depends on the communications model and its API for resolving
Endian differences. In a mixed, distributed environment, applications must be
able to compensate for differences in data representation between the systems
that participate in the application. 
Specific implementations for handling Endian and other conversions exist
within applications written to lower-layer communications APIs. Higher-level
application-development models like the Remote Procedure Call (RPC) of the
Distributed Computing Environment (DCE) provide more general and robust
support that isolates applications from these differences. 
Most existing distributed software is written directly to a communications
API. Typical communication interfaces are TCP/IP with a sockets or streams
interface, NetBIOS with its own control block-based interface, or various SNA
or ISO OSI interfaces. 
Although communications APIs guarantee that data will be transmitted/received
between network nodes, they do not understand the data types being transmitted
and cannot convert data or data attributes, including Endian type, between
clients and servers that have dissimilar data representations. This forces a
distributed application to compensate for any differences. 
DCE RPC allows an application to be developed as if it were nondistributed. At
the same time, it allows any of the application's subroutines to be executed
on a remote system. The RPC application-development model divides the local
(client) and remote (server) parts of a program along an application's
internal procedural interfaces. 
Since the remote procedures are application defined, they must be able to
support a variety of high-level language data types, including int, char, and
struct. RPC hides the fact that data communications take place between client
and server subroutines, and one of its functions is to interpret and convert
native data-representation differences that may exist between the
communicating systems. These differences include the addressing model
(Endianness), alignment rules, character-set encoding, floating-point
conventions, and numerical data formats. 
Unlike writing directly to a communications API, writing to the DCE RPC
interface allows you to ignore data representation and Endian conversion. DCE
RPC can convert a well-defined, broad set of data types, including most C
scalar and vector types as well as some extended types for use in a
distributed environment. Examples of the latter include a byte data type to
protect data from any conversion and a pipe data type to transfer large blocks
of data. 
The RPC data marshaling and unmarshaling routines handle the bulk of the
data-conversion responsibility. Marshaling converts typed data into an
encoded, linear buffer suitable for data communications. Unmarshaling
recreates the typed data by interpreting the encoded data in the buffer. The
marshaling/unmarshaling process takes, for example, a struct data type,
decomposes it into its elements, and writes the data and a description of the
struct into a single logical buffer. Unmarshaling rebuilds the struct by
reading the data and description contained in the buffer. 
A typical client/server call has at least two data transfers: The first is
from client to server, and the second is the return flow back from server to
client. The RPC subsystem takes the arguments from the procedural interfaces
and assembles them into buffers using the Network Data Representation (NDR)
encoding rules. The buffers constructed by the RPC marshaling routines include
the data itself, as well as descriptors defining the type, size, and relative
location of the data and its elements. Additional protocol information
includes a field describing the native data representation of the transmitting
system. 
Embedded in the buffers containing the transmitted data is a variable that
classifies the data as Big- or Little-endian. The algorithm used to properly
decode or unmarshal the data buffers uses the principle of
receiver-makes-right; see Figure 7. The receiver determines from the protocol
information whether the transmitter's data representation is the same as its
own. If so, no conversion is necessary. If not, a specific, standard
conversion routine is called for each data type unmarshaled from the received
packet(s). The data can then be presented to the application in the
native-machine format.
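The receiver-makes-right rule reduces to a simple test in C: compare the sender's declared representation with the receiver's, and byte-reverse only on mismatch. The names below are illustrative, not actual DCE RPC interfaces:

```c
#include <stdint.h>

/* Hypothetical data-representation flag, as carried in the
   protocol information of each transmission. */
enum drep { DREP_BIG_ENDIAN = 0, DREP_LITTLE_ENDIAN = 1 };

static uint32_t byte_reverse32(uint32_t x)
{
    return  (x >> 24) | ((x >> 8) & 0x0000FF00u)
          | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Receiver-makes-right: convert only when the sender's declared
   representation differs from the receiver's own. */
uint32_t unmarshal_u32(uint32_t wire_value,
                       enum drep sender, enum drep receiver)
{
    return (sender == receiver) ? wire_value
                                : byte_reverse32(wire_value);
}
```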
In summary, a distributed application either compensates for any Endian
differences when using lower-layer communications APIs or uses a higher-level
model such as DCE RPC that supports automatic conversions.
--J.R.G.
Figure 1 (a) Big-endian addressing; (b) Little-endian addressing.
Figure 2 Typical C data structure and its Endian maps. 
Figure 3 Relationship between Big- and Little-endian mappings and their forms;
both mappings are 4-byte word examples.
Figure 4 Multibyte-scalar data items are reversed in this representation (as
compared to Figure 2).
Figure 5 Comparing halfword, word, and doubleword integers as members of a
data structure in Big- and Little-endian form.
Figure 6 The address mapping of PowerPC 32-bit executable instructions when
running in BE and LE modes.
Figure 7 Typical DCE RPC call/return sequence.



October, 1994
Extended State Diagrams and Reactive Systems


Designing systems for unpredictable inputs




Doron Drusinsky


Doron, who holds several patents in the areas of state-chart synthesis and
finite state machine optimization, is the president of R-Active Concepts.
Doron can be contacted at doron@infoserv.com.


As the cost of hardware continues its downward spiral, the application of
embedded electronic control continues to accelerate into new domains. Many of
these new applications have complex designs, however, and graphical tools are
proving to be the most efficient way of specifying, designing, and documenting
such systems. In particular, graphical tools are well suited for the design of
systems based on state machines or data flow. Consequently, in this article
I'll examine the use of extended state diagrams (also known as "Harel
diagrams") for the design of reactive systems--those which endlessly react to
a plurality of partially correlated entities in their environment. To
illustrate extended state diagrams, I'll base my discussion on BetterState, a
graphical state-machine design tool with a built-in code generator my company
has developed. 
Transformational systems are those invoked when the inputs are ready and the
outputs are produced after a computation period; see Figure 1(a). Examples are
voice-compression systems (software or hardware) or (sub)systems which
calculate the square root of input. Top-down decomposition is a natural
design methodology for transformational systems because it breaks down complex
input/output (functional) relationships into simpler, more manageable ones.
Similarly, conventional programming and system-level specification languages
are transformationally oriented; they cater to top-down functional design.
Fundamentally different from transformational systems are reactive systems
such as Figure 1(b), in which inputs are not ready at any given point in time.
A typical reactive system is a traffic-light controller which never has all
its inputs ready--the inputs arrive in endless and perhaps unexpected
sequences. It is virtually impossible to write a transformational program that
implements a controller such as this. In fact, most controllers are by
definition reactive, not transformational, with application domains ranging
from military, aerospace, and automotive applications to DSP, ASIC design,
medical electronics, and similar embedded systems. Just about every system has
a reactive component, because a system is seldom isolated from its
environment. On the contrary, the reason the system exists is typically to
collaborate or interact with some entity or entities in its environment. Such
collaboration is done by sending, receiving, recognizing, and rejecting
sequences of symbols--a reactive behavior.
Finite state machines (FSMs) and state diagrams (FSM's visual counterpart)
have traditionally been used to specify and design reactive (sub)systems. They
are well known, well accepted, highly visual, and intuitive. Their ability to
describe finite and infinite sequences, combined with their visual appeal,
made FSMs one of the most commonly accepted formalisms in the electronic
industry. State diagrams are easier to design, comprehend, modify, and
document than the corresponding textual approach. But FSMs and state diagrams
haven't changed much over the past 30 years and suffer from limitations when
applied to today's reactive applications:
FSMs are flat. They do not cater to top-down design and information hiding. 
FSMs are purely sequential, whereas applications are not. Modern controllers
need to react to signals to and from a plurality of entities in the
environment. Consider an answering machine controller specified to cater to a
"second-call waiting" situation in addition to the "first caller." A
conventional FSM needs to account for all possible combinations of states
catering to the first and second callers, which leads to the well-known
state-blowup phenomenon. 
Text-based entry methods, which are by definition sequential, cannot
effectively capture concurrent behavior. Therefore, drawing state diagrams on
paper and entering them textually is no longer effective.
Top-down design concepts require interactive software to enable the user to
manipulate and browse through complex designs. 
Because of such limitations, FSMs have been used sparingly in recent years.
Compensating for these limitations are extended state diagrams (or
"statecharts"), designed by David Harel and described in his paper
"Statecharts: A Visual Formalism for Complex Systems," published in the Science
of Computer Programming (1987). (Harel, who was my PhD advisor, is also the
author of Algorithmics: The Spirit of Computing, Addison-Wesley, 1987.) While
addressing the hierarchy, concurrency, priorities, and synchronization within
state diagrams, extended state diagrams retain the visual and intuitive appeal
inherent to state diagrams. 


A Traffic-Light Controller Example


To illustrate how you design systems around extended state diagrams, I'll use
a typical traffic-light controller as an example. The specification for this
traffic-light controller (TLC) is as follows: 
There are two directions, Main and Sec, with alternating lights.
Lights alternate based on a Timeout signal, which can be read from the Timeout
variable.
Initially, all lights flash yellow. Upon reset going low (0), the on-going
operation can start. When reset goes high (1), the system must reset into this
initial state.
The priority order is: Reset, Timeout, all other signals.
A Camera, positioned in the Main direction, operates only when the Main
direction has the red light. It should take shots of cars going through a red
light in the Main direction, unless a policeman is present (signal Manual_on
is 1).
When the Main direction has the red light, and four or more cars or a truck
that follows one or more cars are waiting in that direction, Main gets the
green light.
When the Main direction has the red light and three cars are waiting in that
direction, Camera should shoot.
The extended state diagram in Figure 2 realizes the highest level of the TLC's
behavior. It captures the most high-level events and state transitions. State
Red2Main, however, has "hidden" information that can be accessed by double
clicking on Red2Main. Such information hiding makes the diagram more readable
and manageable. Note how the transition labeled Reset takes effect no matter
what the present state is within on_going. Such high-level transitions are a
powerful tool for managing work between designers. Any change made to Red2Main
by one designer is automatically captured by the high-level transition
designed by another.
Figure 3 shows two concurrent threads of control, one capturing the state
sequence for the Camera's (sub)state machine, and the other capturing the
state sequence for the Counter's machine. Concurrency here has little to do
with real time--we are not specifying how fast the design will work.
Concurrency, in this case, is related to independent activities. The Counter
and Camera are independent most of the time, but are always active in the
Red2Main state. The semantics of extended state diagrams implement the desired
behavior, without the designer explicitly implementing suspend-and-resume
behavior. Despite their independence, the specification dictates some
correlation between Camera and Counter. When the Counter counts two cars
waiting in Main, it tells the Camera to shoot. The transition from c_2 to
Shoot effects that behavior precisely, without any message passing.


Code Generation


Code generation for conventional FSMs is straightforward: A case statement (C
switch statement) over all possible states is a common representation.
Similarly, code generation for concurrent FSMs is no more than a set of code
blocks, one per each FSM. However, code generation for extended state diagrams
is more complicated because: 
An extended state machine has flexible concurrency. When the TLC example is in
the Red2Main state, there are two active threads of control (Camera and
Counter); in the YellowAll state there is only one active thread. 
Hierarchy is an additional source of potentially concurrent transitions.
Due to concurrency, a case statement is inappropriate: More than one state
might be active at any given time.
Support for visual synchronization and visual priorities is nontrivial.
For these reasons, handcoding for extended state diagrams can become a
confusing and time-consuming task.
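For contrast, the conventional case-statement style mentioned above can be sketched for a flat three-state machine. The states, events, and reset priority are my own illustration, loosely modeled on the TLC, not generated output:

```c
/* Conventional flat FSM as a C switch: one case per state.
   Reset has highest priority, mirroring the TLC specification. */
enum state { S_RED, S_GREEN, S_YELLOW };
enum event { EV_TIMEOUT, EV_RESET };

enum state fsm_step(enum state s, enum event e)
{
    if (e == EV_RESET)
        return S_RED;
    switch (s) {
    case S_RED:    return (e == EV_TIMEOUT) ? S_GREEN  : s;
    case S_GREEN:  return (e == EV_TIMEOUT) ? S_YELLOW : s;
    case S_YELLOW: return (e == EV_TIMEOUT) ? S_RED    : s;
    }
    return s;
}
```

The switch works only because exactly one state is active at a time, which is precisely the assumption that extended state diagrams break.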
I have invented two code-generation methods for extended state diagrams. The
first method, coinvented with David Harel, is more hierarchical in nature; the
code generated preserves the hierarchy in the original diagram. The second
method, currently in use by BetterState, flattens the diagram in an attempt to
generate simpler code. For this reason, the code BetterState generates is
simple in structure; it is no more than a large set of if statements. The
process of generating this code, however, is entirely nontrivial.
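The flavor of the flattened style can be suggested with a hand-written fragment: one activity flag per state and one if statement per transition, so several flags can be true at once, giving two concurrent threads of control in a single invocation. This is my own illustration, not actual BetterState output:

```c
#include <stdbool.h>

/* Flattened encoding: one Boolean flag per state, one if per
   transition.  Both the Counter and Camera flags can be active
   simultaneously, which is how concurrency is achieved without an
   operating system.  The names are illustrative. */
struct machine {
    bool in_counting;  /* Counter thread active */
    bool in_shoot;     /* Camera's Shoot state active */
    int  cars;
    int  shots;
};

/* Each invocation fires every enabled transition, one after the
   other, then returns to the caller. */
void step(struct machine *m, bool car_arrived)
{
    if (m->in_counting && car_arrived)
        m->cars++;
    if (m->in_counting && m->cars >= 3)  /* c_2 -> Shoot */
        m->in_shoot = true;
    if (m->in_shoot) {                   /* Camera action */
        m->shots++;
        m->in_shoot = false;
    }
}
```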
The BetterState code generator translates the extended state diagram based
on the following code-generation methodology:
Additional code is not required to run the design on a particular device. For
example, the C code generated doesn't require a real-time operating system to
implement concurrency and hierarchy, and is compatible with any processor or
microcontroller equipped with a C compiler.
There is a one-to-one mapping between the transitions in the diagram and the C
code generated. Blowups won't occur.
Concurrency, hierarchy, visual priorities, and synchronization features are
implemented by the code generator. For generated C/C++ code, concurrency is
implemented as a "fair" interleaving of statements, one per transition in the
diagram. The code generator makes sure that all concurrent transitions that
can fire at a given point in time will fire one after the other and that no
others will fire. This is a compile-time schedule.
In all languages supported by the code generator (C, C++, VHDL, and Verilog
HDL), the generated code fits into the system-level design as a "component,"
without forcing a system-level design methodology. In C, the controller is a
function called by the system-level C program (the main program) whenever it
wants. Each invocation fires all concurrent transitions enabled at that time,
then returns control to the calling program. This way the controller can be
scheduled in any way that C permits. In VHDL, the code generated is an
"architecture" for a controller entity. The designer provides the entity as
well as the entire system-level VHDL code. The controller's generated code
fits in as an architecture for that entity. In Verilog HDL, the code is a
"task" that can be invoked by the system-level Verilog module. Each invocation
fires all concurrent transitions enabled at that time, then returns control to
the calling program. Listing One is the C code generated for the traffic-light
controller example.



Visual Synchronization in the Traffic-Light Controller


Once concurrency is provided for, visual synchronization--a means for visually
specifying dependencies and relationships between concurrent threads of
control--is needed. In the traffic-light controller, once the counter has
counted three cars, a transition to state Green2Main fires, aborting
everything else inside the Red2Main state (the Camera). The programmer
simply draws the transition from state c_3 to state Green2Main; everything
else is automatically derived from the diagram's semantics. 
Another example is in the same diagram, where the Counter thread synchronizes
the Camera thread into the Shoot state when the Counter is in state c_2, a
behavior specified by the transition from state c_2 to state Shoot. Another
instance of visual synchronization includes compound transitions with multiple
sources and/or targets, which act as a rendezvous (a meeting place between the
threads). Visual priorities in BetterState are visually programmed using
arrowhead colors. This is superior to hierarchical prioritization, because
event and condition priorities are not necessarily associated with states. For
example, a transition based on a new_car condition inside the counter might be
more important than the Timeout condition.


Scheduling the Traffic-Light Controller


As discussed earlier, the code generated for the controller is a component in
the overall system-level design (in C, this component is a function; in C++,
it's a class; and so on). In each language, this component needs to be invoked
by the system-level code. This is done in C/C++ by a function call, where each
call to the controller's function realizes one pass over all transitions in a
diagram, firing one or more concurrent transitions, and then returning control
to the calling program. In Verilog, the invocation is done using an always
statement that invokes the task, typically based on a clock event. In VHDL,
the invocation is done by a CLOCK input signal to the entity for this
architecture. This simple way of scheduling lets you invoke the controller in
a flexible way. Listing Two shows some possibilities.
Often, the controller will be invoked in some infinite loop, based on a clock,
a certain input, or some other event. Sometimes, the design needs to abort
this infinite invocation when the controller reaches a certain state; a
terminal state supports this property. When the controller reaches a terminal
state, the function returns a value indicating that a terminal state has been
reached, thereby allowing the C invocation in Listing Three to break out of
the infinite loop.
Similarly, in VHDL, when the controller reaches a terminal state, it will
suspend itself, generate a suspended signal, and resume only when the entity
receives a resume signal from the system-level design.


Animated Playback and Graphing


Often, you need to view the behavioral execution of a reactive component to
verify the design or analyze the actual behavior in the field under the real
stimuli. A playback mechanism allows the execution of the generated code to be
recorded in a database, then played back in an animated fashion onto the
original extended state diagram graphics. A state box flashing on/off might
indicate, for instance, that a state is being "visited." The execution might
use simulated stimuli (using a C or C++ debugger, for example) or the real
stimuli from the field.
Graphing is a vehicle for visually displaying visitation information from the
recorded database. This gives both you and system users insight into the
actual behavior in the field. For example, a controller for automatic-door
handling might recognize the fact that certain doors are open more than others
during certain time periods.
Figure 1 (a) Transformational systems; (b) reactive systems.
Figure 2 An extended state diagram.
Figure 3 Two concurrent threads of control.

Listing One 

/* C Code-Generation Output File #1 of 1: Generated for Chart Traffic_light 
by the Better-State Code Generator, Proprietary of R-Active Concepts
Cupertino CA, 95014, (408)252-2808 */

/*----- State ID dictionary
Use this dictionary to symbolically reference your states (for those States
that you gave names in the chart editor). You can also examine the state 
register, using the mapping provided as a remark. 
-------*/
#define DUMMY -7
#define DONT_CARE 0
#define St_c_3_P3 6 /* mapped to PS[0] */
#define St_Yellow_All_P2 14 /* mapped to PS[0] */
#define St_Green2Main_P2 24 /* mapped to PS[0] */
#define St_c_0_P3 86 /* mapped to PS[0] */
#define St_c_1_P3 89 /* mapped to PS[0] */
#define St_c_2_P3 90 /* mapped to PS[0] */
#define St_On_P3 73 /* mapped to PS[1] */
#define St_Off_P3 76 /* mapped to PS[1] */
#define St_Shoot_P3 78 /* mapped to PS[1] */
#define St_Red2Main_P2 33 /* mapped to PS[3] */
#define St_On_going_P2 23 /* mapped to PS[4] */ 
int CHRT_Traffic_light(int BS_Reset)
{
/* RESETing: calling CHRT_Traffic_light with BS_Reset>0 will reset your 
controller to its default composite state, and will execute all on_entry 
actions for that state. */
 static int PS[5]= {St_Yellow_All_P2, DUMMY, DUMMY, DUMMY, DUMMY};
 int NS[5]= {0, 0, 0, 0, 0};
 int BS_i;
if (BS_Reset>0)
 {
 /* Reset state assignments */

 NS[0]=St_Yellow_All_P2; 
 NS[1]=DUMMY; 
 NS[2]=DUMMY; 
 NS[3]=DUMMY; 
 NS[4]=DUMMY; 
 /* On_entry actions for reset states */ 
 
Color_Main=YELLOW;Color_Sec=YELLOW;
 }
else
{
 /*-------*/
 if (PS[4]==St_On_going_P2) 
 if ( (NS[0] == DONT_CARE) 
 && (NS[1] == DONT_CARE) 
 && (NS[2] == DONT_CARE) 
 && (NS[3] == DONT_CARE) 
 && (NS[4] == DONT_CARE) 
 ) 
 if (Reset)
 {
 NS[0] = St_Yellow_All_P2 ;
 NS[1] = DUMMY ;
 NS[2] = DUMMY ;
 NS[3] = DUMMY ;
 NS[4] = DUMMY ;
 Color_Main=YELLOW;Color_Sec=YELLOW;
 }
 /*-------*/
 if (PS[0]==St_Yellow_All_P2) 
 if ( (NS[0] == DONT_CARE) 
 && (NS[1] == DONT_CARE) 
 && (NS[2] == DONT_CARE) 
 && (NS[3] == DONT_CARE) 
 && (NS[4] == DONT_CARE) 
 ) 
 if (!Reset)
 {
 NS[0] = St_Green2Main_P2 ;
 NS[1] = DUMMY ;
 NS[2] = DUMMY ;
 NS[3] = DUMMY ;
 NS[4] = St_On_going_P2 ;
 Color_Main=GREEN; Color_Sec=RED;
 }
 /*-------*/
 if (PS[0]==St_Green2Main_P2) 
 if ( (NS[0] == DONT_CARE) 
 && (NS[1] == DONT_CARE) 
 && (NS[3] == DONT_CARE) 
 ) 
 if (TIMEOUT)
 {
 NS[0] = St_c_0_P3 ;
 NS[1] = St_On_P3 ;
 NS[3] = St_Red2Main_P2 ;
 Color_Main=RED; Color_Sec=GREEN;
 }
 /*-------*/

 if (PS[3]==St_Red2Main_P2) 
 if ( (NS[0] == DONT_CARE) 
 && (NS[1] == DONT_CARE) 
 && (NS[2] == DONT_CARE) 
 && (NS[3] == DONT_CARE) 
 ) 
 if (TIMEOUT)
 {
 NS[0] = St_Green2Main_P2 ;
 NS[1] = DUMMY ;
 NS[2] = DUMMY ;
 NS[3] = DUMMY ;
 Color_Main=GREEN; Color_Sec=RED;
 }
 /*-------*/
 if (PS[1]==St_On_P3) 
 if ( (NS[1] == DONT_CARE)
 ) 
 if (Car_in_Junct)
 {
 NS[1] = St_Shoot_P3 ;
 }
 /*-------*/
 if (PS[1]==St_Shoot_P3) 
 if ( (NS[1] == DONT_CARE)
 ) 
 {
 NS[1] = St_On_P3 ;
 }
 /*-------*/
 if (PS[1]==St_On_P3) 
 if ( (NS[1] == DONT_CARE)
 ) 
 if (Manual_on)
 {
 NS[1] = St_Off_P3 ;
 }
 /*-------*/
 if (PS[1]==St_Off_P3) 
 if ( (NS[1] == DONT_CARE)
 ) 
 if (!Manual_on)
 {
 NS[1] = St_On_P3 ;
 }
 /*-------*/
 if (PS[0]==St_c_0_P3) 
 if ( (NS[0] == DONT_CARE) 
 && (NS[2] == DONT_CARE) 
 ) 
 if (New_car_waiting)
 {
 NS[0] = St_c_1_P3 ;
 NS[2] = 122 ;
 }
 /*-------*/
 if (PS[0]==St_c_1_P3) 
 if (NS[0] == DONT_CARE) 
 if (New_car_waiting)

 {
 NS[0] = St_c_2_P3 ;
 }
 /*-------*/
 if (PS[0]==St_c_2_P3) 
 if (NS[0] == DONT_CARE)
 {
 NS[0] = St_c_3_P3 ;
 }
 /*-------*/
 if (PS[0]==St_c_2_P3) 
 if ( (NS[1] == DONT_CARE)
 ) 
 {
 NS[1] = St_Shoot_P3 ;
 }
} /* if BS_Reset */
/* Assigning next state to present-state */
for (BS_i=0;BS_i < 5;BS_i++)
 if (NS[BS_i] != DONT_CARE) 
 {PS[BS_i]=NS[BS_i]; NS[BS_i]=DONT_CARE;}
 return 1;
 }
 /* end of BS controller */



Listing Two

/* calling the two controllers in a multi-rate scheme: the TLC runs twice
 for each cycle of the TM */

for (i=0;i<100; i++)
 {for (j=0;j<2;j++) CHRT_TLC(0);
 CHRT_TM(0);
 }
/* calling the charts stochastically */
for (i=0;i<100;i++)
 {x1=rand(); 
 if ((double)x1/RAND_MAX < 0.5) CHRT_TLC(0);
 x2=rand();
 if ((double)x2/RAND_MAX < 0.3) CHRT_TM(0);
 }
/* A Round-robin scheduler in an endless execution (infinite-loop) */
while(1)
 {CHRT_chart1(0);
 CHRT_chart2(0);
 CHRT_chart3(0);
 ...
 }



Listing Three

/* Jumping out of an infinite-loop execution of the controller using Terminal
states. When the controller reaches a Terminal state (designated as such using
the State C-Code Dialog while drawing the state), it returns a 0 value. */


while (1)
 {
 if (!CHRT_TLC(0)) break; /* will break if Terminal state has been reached */
 }


























































October, 1994
Network Communications Using the NetBEUI Protocol


Named pipes and mailslots for NT and Chicago




Marshall Brain


Marshall works for Interface Technologies (Wake Forest, NC), which does
software design, consulting, and programmer training in Windows NT, Motif,
C++, and object-oriented design. He is the lead author for Prentice Hall's
five-book series on Windows NT, which includes Win32 System Services: The
Heart of Windows NT. He can be reached at brain@iftech.com.


This article describes the NetBEUI (NetBIOS Extended User Interface) protocol
and shows how to apply it in your own applications. NetBEUI is the native
network protocol for Windows NT Version 3.1 and allows the operating system to
communicate with other NT machines, as well as machines running Windows for
Workgroups. NT uses NetBEUI internally to handle such things as disk and
printer sharing over the network and also supports NetBEUI directly in the
Win32 API, so that application developers can employ network communications in
their own programs. Chicago, Microsoft's successor to Windows 3.1, will use
the same NetBEUI interface and function calls in its API.
Two different facilities in the API, mailslots and named pipes, support
NetBEUI network communications. I'll discuss both techniques in this article
as well as their respective strengths and weaknesses. Because the Win32 API
directly supports NetBEUI, it is remarkably easy to create applications that
use the network in many different ways. For example, you might want to create
a multiuser conferencing system for your network, similar to the "CB" systems
you find on CompuServe and other BBSs. In a system like this, users run the
conferencing program on their machines, and any messages they type get
broadcast to all of the other users on the network. You generally use
mailslots for the implementation because mailslots make it easy to broadcast
information. Any multiplayer game that uses the network employs similar
techniques. 
When you want to stream large quantities of data between two machines, you
normally use point-to-point named-pipe connections. For example, you would use
named pipes for a digitized phone or video system implemented on a network.
Any client/server configuration also uses named pipes. One central machine
acts as the server, and then all of the clients connect to it individually
with named pipes; the server uses a multithreaded approach to handle multiple
connections simultaneously. 


Network Basics


Figure 1 illustrates a simple network that you might find in a small business.
Each machine has a network adapter that connects it to the network, as well as
a name that uniquely identifies it. The network adapter determines the type of
network, generally either Ethernet or Token Ring. The adapter also controls
the media used for the network: coax or twisted pair, for example. In this
kind of simple network, all machines can communicate with all others equally. 
Machines can communicate using NetBEUI via either mailslots or named pipes.
With a mailslot, one machine can broadcast a message that is received by all
of the other machines on the network. With a named pipe, one machine chooses
another and forms a specific connection to it. The advantage of a named pipe
is that the connection is reliable. If the connection breaks--because a
network card or cable malfunctions, for example--both ends of the connection
receive notification of the break immediately. Mailslots are unreliable in the
sense that the sender has no way to confirm receipt of messages. The advantage
of a mailslot is that it is easy to get information to many machines
simultaneously.
Figure 1 shows one network segment. A segment is defined as a group of
machines directly connected to one another. There is a limit to the number of
machines that can exist on one segment, however, because network traffic grows
with the number of machines. Generally, the limit is about 30 machines. In a
large company, each department might have a single segment consisting of 20 to
30 machines. All of the segments are then connected to one another with a
router so that they can intercommunicate, as shown in Figure 2. This
distinction is important because, in general, NetBEUI messages are not
routable.
Using mailslots and named pipes, three different communication architectures
are possible: broadcast, point-to-point, and client/server. In broadcast mode
using a mailslot, one machine sends a message to all others on the segment. In
point-to-point communications, one machine forms a specific connection with
another and data passes back and forth using a named pipe. In a client/server
relationship, one machine acts as the server, and all clients connect to it
with point-to-point named-pipe connections. To emulate a broadcast operation
with a client/server architecture, one machine sends a message to the server,
and then the server sends duplicates of the message individually to each
client. 


Mailslot Connections


Mailslots are the simplest way to perform network communications in the Win32
API. Mailslots provide a one-directional communication path from a sender to
one or more recipients on the same network segment. You generally use
mailslots when you want to send data to many recipients at once. 
Mailslots are extremely easy to create, and reading and writing are done using
the API's normal ReadFile and WriteFile functions. When creating the mailslot,
a special pathname passed to CreateMailslot causes the system to create a
mailslot rather than a normal file. The details of this call are found in the
Win32 programmer reference.
The programs shown in Listings One and Two are as simple as possible so that
you can easily see the steps necessary to transmit and receive data through a
mailslot. Listing One is sms_recv, a program that creates and reads from a
mailslot server using polling. The code shows how to create a mailslot server
with CreateMailslot. The server is a queue that holds messages received until
the user reads them using ReadFile. Messages are stored in the queue in the
order of their arrival. 
The name of the mailslot must be of the form \\.\mailslot\[path]name. This
looks just like a filename, and it acts like a filename in ReadFile. However,
no actual file is created by the function; the mailslot is simply held in
memory. A typical mailslot name, \\.\mailslot\sms, is used in Listing One. It
is also possible to add subdirectories to the path to further categorize
mailslots. 
When you create the mailslot, you can specify the maximum message length, as
well as the read timeout. Mailslots can send no more than 400 bytes over the
network in a single message. If you set the timeout value to 0, then any call
to ReadFile will return immediately whether or not there is anything in the
buffer. If you set the timeout value to a specific number of milliseconds, any
read operation will fail if that amount of time elapses before a message
arrives. You can also use the MAILSLOT_WAIT_FOREVER constant to create a
blocking read.
The sms_recv program takes the non-blocking approach and uses GetMailslotInfo
to make sure that messages exist in the mailslot queue before performing a
read. This function returns the maximum length of messages in the queue, the
length of the next message in the queue, and the number of messages waiting.
The code continuously checks to determine if messages exist in the mailslot.
If so, then it reads the first one. Reading from a mailslot is just like
reading from a file.
Any message sent from any computer on the network to a machine running
sms_recv will be received, provided that the mailslot names of the sender and
the receiver match. The program sms_send (Listing Two) shows how to send
messages to a mailslot; it writes to a mailslot every five seconds. It starts
by using the normal CreateFile to open a writable connection to the mailslot.
The program is referred to as a "mailslot client" because it writes to
mailslot servers already running on the network. CreateFile understands, via
the special mailslot filename, that you are not creating a file but instead
wish to communicate with a mailslot. Four different formats for the filename
are possible. The first is \\.\mailslot\[path]name; in this case, the name
specifies the local machine. Alternatively, you can identify a specific
machine on the net via the second format, \\machine\mailslot\[path]name. To
specify a broadcast operation to all machines in the local machine's primary
domain, you use the format \\*\mailslot\[path]name. Finally, to broadcast to
all machines in an indicated domain, you would use
\\domain\mailslot\[path]name. For more information on domains and domain
controllers, refer to Windows NT
information on domains and domain controllers, refer to Windows NT
Administration: From Single Machines to Heterogeneous Networks (Prentice-Hall,
1994).
After opening the mailslot, sms_send gets the local computer's name using
GetComputerName, and then broadcasts the name to all mailslots in the current
domain every five seconds. 
The sms_recv program checks for messages via polling. Every half-second it
calls GetMailslotInfo and checks to see if any messages are waiting in the
slot. In general, polling is not a good technique for a multithreaded
environment because it is inefficient. You can eliminate polling by setting
the timeout value in CreateMailslot to an appropriate value and then calling
ReadFile with a buffer length of zero to wait for a message to arrive. Once
this call to ReadFile returns, you know a message exists, so you can then call
GetMailslotInfo and ReadFile, as in Listing One.
When you run sms_send, it broadcasts to all machines on the network. If you
run multiple copies of the reader on the same or different machines, all of
them will see the messages produced by the writer. Alternatively, you can run
multiple writers on the net, and any copies of the reader will see the
messages from all of them. In both Listings One and Two, note the presumption
that the program will be terminated externally. You can formally close either
a mailslot server or client using CloseHandle.


Named Pipes


Named pipes provide a guaranteed delivery mechanism. Instead of broadcasting
the packet onto the network, you form a distinct connection to another machine
with a named pipe. If the connection breaks, both parties to the connection
find out as soon as they try to send or receive anything. Packets are also
guaranteed to arrive in sequence through a named pipe. The only problem with
named pipes is that you lose the ability to broadcast packets. To broadcast
anything, all of the target machines must have an individual connection to a
central server, and the server must separately transmit the message to each
one. 
Named pipes are only slightly more difficult to create than mailslots.
Listings Three and Four show how to create a simple, point-to-point connection
between two applications using named pipes. The program ssnprecv (Listing
Three) is a named-pipe server (receiver); its counterpart is ssnpsend (Listing
Four), which connects to the receiver and sends it messages. 
You can run both these programs on the same machine. Launch the receiving
program first, and then on the same machine, run ssnpsend. It will query you
for the name of the machine to connect to. Type "." or enter your machine
name. You will see a message sent from sender to receiver every five seconds
or so. When you kill off the sender, notice that you immediately see a message
in the receiver indicating that it has detected the break in the pipe. If you
try to start up the sender without the receiver running, the sender will fail
immediately because it cannot connect. Unlike mailslots, pipes can tell when
the other end is not working properly.
A named-pipe connection can occur across the network as simply as it occurs on
the same machine. For example, if ssnprecv is running on a machine named
"orion," you can log in to a different machine using an account with the exact
same login ID and password as the one you are using on "orion." Run ssnpsend
on the new machine and enter the name "orion" when it asks for the machine's
name. The connection will occur properly. Note that, with named pipes, you
must know the name of the machine running the server. 
Also, note that if a different user tries to connect to the receiver, the
connection fails. For example, when user "jones" is running the receiver on
the machine "orion" and user "smith" tries to connect from another machine,
the connection fails with an "access denied" error. This is the NT security
system at work. See "Win32 System Services" for details on named-pipe
security. 
The ssnprecv program starts by creating a named-pipe server using
CreateNamedPipe. The name used with CreateNamedPipe will always have the form
\\.\pipe\[path]name. 
As with mailslots, you can specify a path before the name of the pipe to
clearly distinguish it from other pipes on the system. The openMode parameter
passed to CreateNamedPipe lets you determine the direction of the pipe. Named
pipes can be one directional or bidirectional, depending on certain #defined
constants used with the openMode parameter. These constants are:
PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, and PIPE_ACCESS_OUTBOUND.
The pipeMode parameter of CreateNamedPipe determines whether the pipe works
with a pure stream of bytes or with packets of bytes, called "messages." A
stream of bytes has no logical boundaries. Messages contain a group of bytes
perceived as a unit. You can declare byte or message behavior in both the read
and write directions, via the following constants: PIPE_TYPE_MESSAGE,
PIPE_TYPE_BYTE, PIPE_READMODE_MESSAGE, and PIPE_READMODE_BYTE.
A pipe can have more than one instance on a single machine. This capability
allows an application to handle multiple clients, each in different threads,
and is required to create a named-pipe server. Because the sender/receiver
pair in Listings Three and Four comprises a simple point-to-point connection
where only one instance is necessary, a maximum of one instance is specified
in the call to CreateNamedPipe. See "Win32 System Services" for details on
server configurations.
The ssnprecv program next waits for a connection on the named pipe using
ConnectNamedPipe. A connection is formed in the server when a client program
calls CreateFile with the proper machine and named pipe specified as its
destination. Upon connection, ConnectNamedPipe returns. Alternatively, you can
specify an overlapped structure and ConnectNamedPipe will return immediately
and later signal the event upon connection. 
The server then enters a loop, waiting for data to arrive. ReadFile behaves
slightly differently here than it does with files. Because this named pipe is
in message mode, ReadFile will return as soon as it receives a complete
message, regardless of how many bytes the message contains. It is possible to
use a blocking read as shown, or an overlapped read. 

The ssnpsend program in Listing Four is a simple client for ssnprecv. It
starts by creating a connection to the named pipe via CreateFile. It then
writes messages using WriteFile. Each individual call to WriteFile constitutes
a message at the receiving end of the named pipe, so the receiver's ReadFile
function will unblock when it receives the message. Each time the client
writes, the server produces a message on the screen. 
If two copies of the client try to connect to ssnprecv at the same time, then
the server will reject the second client. If you terminate either the client
or the server, then the other half of the pair will immediately terminate when
it detects the broken connection.
Figure 1 A typical network.
Figure 2 Two segments connected by a router.

Listing One 

//*******************************************************************
// sms_recv.cpp -- this creates a mailslot server and reads from it.
// The mailslot receiver uses polling. By Marshall Brain.
//*******************************************************************
#include <windows.h>
#include <iostream.h>

int main()
{ char toDisptxt[80];
 HANDLE hSMS_Slot;
 DWORD nextSize,Msgs,NumBytesRead;
 BOOL Status;

 /* Create a mailslot for receiving messages */
 hSMS_Slot=CreateMailslot("\\\\.\\mailslot\\sms",
 0, 0, (LPSECURITY_ATTRIBUTES) NULL);
 /* Check and see if the mailslot was created */

 if (hSMS_Slot == INVALID_HANDLE_VALUE)
 { cerr << "ERROR: Unable to create mailslot " 
 << GetLastError() << endl;
 return (1);
 }
 /* Repeatedly check for messages until the program is terminated */
 while(1)
 { Status=GetMailslotInfo(hSMS_Slot,
 (LPDWORD) NULL, &nextSize, &Msgs,(LPDWORD) NULL);
 if (!Status)
 { cerr << "ERROR: Unable to get status. " 
 << GetLastError() << endl;
 CloseHandle(hSMS_Slot);
 return (1);
 }
 if (Msgs) /* If messages are available, then get them */
 { 
 /* Read the message and check if read was successful */
 if (!ReadFile(hSMS_Slot, toDisptxt, nextSize,
 &NumBytesRead, (LPOVERLAPPED) NULL))
 { cerr << "ERROR: Unable to read from mailslot " 
 << GetLastError() << endl;
 CloseHandle(hSMS_Slot);
 return (1);
 }
 cout << toDisptxt << endl;/* Display the Message */
 }
 else Sleep(500); /* Check for new messages twice a second */
 } /* while */
}


Listing Two 


//***************************************************************
// sms_send.c -- a simple mailslot sender that writes to
// a mailslot every five seconds. By Marshall Brain.
//***************************************************************
#include <windows.h>
#include <iostream.h>
#include <string.h>

int main()
{ char toSendTxt[100], buffer[100];
 DWORD bufferLen=100, NumBytesWritten;
 HANDLE hSMS_Slot;
 BOOL Status;
 /* Create the mailslot file handle for sending messages */
 hSMS_Slot=CreateFile("\\\\*\\mailslot\\sms",

 GENERIC_WRITE, FILE_SHARE_READ,
 (LPSECURITY_ATTRIBUTES) NULL,
 OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,(HANDLE)NULL);
 /* if the mailslot file was not opened, terminate program */
 if (hSMS_Slot == INVALID_HANDLE_VALUE)
 { cerr << "ERROR: Unable to create mailslot "
 << GetLastError() << endl;
 return (1);
 }
 GetComputerName(buffer, &bufferLen); /* form string to send */
 strcpy(toSendTxt, "Test string from ");
 strcat(toSendTxt, buffer);
 /* Repeatedly send message until program is terminated */
 while(1)
 { cout << "Sending..." << endl;
 /* Write message to mailslot */
 Status=WriteFile(hSMS_Slot, toSendTxt, (DWORD) strlen(toSendTxt)+1,
 &NumBytesWritten, (LPOVERLAPPED) NULL);
 /* If error occurs when writing to mailslot,terminate program */
 if (!Status)
 { cerr << "ERROR: Unable to write to mailslot "
 << GetLastError() << endl;
 CloseHandle(hSMS_Slot);
 return (1);
 }
 Sleep(4800); /* Wait before sending the message again */
 } /* while*/
}








Listing Three

//*********************************************************************
// ssnprecv.cpp --- a simple named pipe server (receiver). The 
// server will wait and accept one connection, then receive messages 
// from it. By Marshall Brain.
//*********************************************************************

#include <windows.h>
#include <iostream.h>

int main()
{ char toDisptxt[80];
 HANDLE ssnpPipe;
 DWORD NumBytesRead;

 /* Create a named pipe for receiving messages */
 ssnpPipe=CreateNamedPipe("\\\\.\\pipe\\ssnp",
 PIPE_ACCESS_INBOUND, PIPE_TYPE_MESSAGE | PIPE_WAIT,
 1, 0, 0, 150, (LPSECURITY_ATTRIBUTES) NULL);
 /* if named pipe was not created, terminate */
 if (ssnpPipe == INVALID_HANDLE_VALUE)
 { cerr << "ERROR: Unable to create a named pipe. "
 << endl;
 return (1);
 }
 cout << "Waiting for connection... " << endl;
 /* Allow a client to connect to the pipe, terminate if unsuccessful */
 if(!ConnectNamedPipe(ssnpPipe, (LPOVERLAPPED) NULL))
 { cerr << "ERROR: Unable to connect a named pipe "
 << GetLastError() << endl;
 CloseHandle(ssnpPipe);
 return (1);
 }
 /*Repeatedly check for messages until the program is terminated. */
 while(1)
 { /* Read the message and check to see if read was successful */
 if (!ReadFile(ssnpPipe, toDisptxt, sizeof(toDisptxt),
 &NumBytesRead, (LPOVERLAPPED) NULL))
 { cerr << "ERROR: Unable to read from named pipe "
 << GetLastError() << endl;
 CloseHandle(ssnpPipe);
 return (1);
 }
 cout << toDisptxt << endl;/* Display the Message */
 } /* while */
}




Listing Four

//***************************************************************
// ssnpsend.cpp -- a simple named pipe sender.
// This connects to receiver (ssnprecv) and sends it messages.
//***************************************************************
#include <windows.h>
#include <iostream.h>

int main()
{ char *toSendtxt="Test String";
 HANDLE ssnpPipe;
 DWORD NumBytesWritten;
 char machineName[80],pipeName[80];


 cout << "Enter name of server machine: ";
 cin >> machineName;
 wsprintf(pipeName, "\\\\%s\\pipe\\ssnp", machineName);


 /* Create the named pipe file handle for sending messages */
 ssnpPipe=CreateFile(pipeName,
 GENERIC_WRITE, FILE_SHARE_READ,
 (LPSECURITY_ATTRIBUTES) NULL,
 OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,
 (HANDLE) NULL);
 /* If the named pipe file was not opened, terminate program */
 if (ssnpPipe == INVALID_HANDLE_VALUE)
 { cerr << "ERROR: Unable to create a named pipe " << endl;
 cerr << GetLastError() << endl;
 return (1);
 }
 /* Repeatedly send message until program is terminated */
 while(1)
 { cout << "Sending..." << endl;
 /* Write message to the pipe */
 if (!WriteFile(ssnpPipe,
 toSendtxt, (DWORD) strlen(toSendtxt)+1,
 &NumBytesWritten, (LPOVERLAPPED) NULL))
 { /* If an error occurs when writing to pipe, terminate program */
 cerr << "ERROR: Unable to write to named pipe "
 << GetLastError() << endl;
 CloseHandle(ssnpPipe);
 return (1);
 }
 Sleep(4800);/* Wait before sending the message again */
 } /* while*/
}





























October, 1994
Examining the Software Development Process


How do you measure the effectiveness of a development process?




James O. Coplien


Jim is a member of the technical staff at AT&T Bell Labs and the author of
Advanced C++ Programming Styles and Idioms (Addison-Wesley, 1992). Jim can be
contacted at cope@research.att.com.


From both an applications and a systems perspective, software is getting
larger and larger, involving more and more programmers during the development
cycle. For reasons ranging from resource planning to cost control, it is
becoming increasingly important that you be able to evaluate and control the
overall software-development process.
At Bell Laboratories Software Production Research Department, we've borrowed
from object-oriented analysis a tool called "CRC cards" to evaluate
development processes. CRC is an acronym for "classes, responsibilities, and
collaborators"--three of the most important dimensions of abstraction in
object-oriented analysis. (For more information on CRC cards, see Kent Beck's
"Think Like An Object," UNIX Review, September 1991.) Our intent was to extend
the use of CRC cards--originally designed to analyze system-software
architecture development--to study software-development organizations within
AT&T. Later, we applied the technique to other development projects (including
Borland's Quattro Pro for Windows) to test the validity of the methodology and
to learn from other organizations. 
When it comes to capturing the essential properties of organizational roles,
we've found that CRC cards have advantages over traditional role-modeling
tools. CRC cards are informal, easily learned and understood by developers,
and delineate the social interactions that are important for sound, empirical
models of organizations such as software-development groups. CRC cards let us
gather process and organizational information from our development colleagues
using their vocabulary, at their level, giving us a faithful model of a
development culture.
To understand, compare, and model the software-development process, we enter
CRC-card data into a process-evaluation program called "Pasteur," which
stores transcriptions of the CRC cards in a hypertext database called
"Eggs." (See Hypertext: Concepts, Systems and Applications, Cambridge
University Press, 1990.) The Pasteur environment lets the programmer create an
on-screen card abstraction, then type information into appropriate fields.
Cards can be resized and moved about the screen. On a typical workstation
display, we can fit about 40 full-size, nonoverlapping cards. Once data is
entered, we can browse, cluster, and animate the cards. 
CRC cards provide an object-oriented analysis of the structure of an
organization by dividing it into objects ("roles") that are cohesive locales
of related responsibilities. Each role's responsibilities to the organization
are written on the left side of a 3x5 index card; see Figure 1. The right side
of the card lists the helpers (or collaborators) used by a role to carry out
its responsibilities. Responsibilities and collaborators are discovered in a
real-life, role-playing exercise where development scenarios are simulated.
The interests of a role are represented by someone who commonly fills the
role, by a domain expert in the appropriate area, or by someone who is
otherwise familiar with the work.
Collaborations between roles form a network, or graph. The edges of the graph
are subjectively weighted by the participants (high, medium, or low) according
to the strength of the coupling between two roles. The graph can be visualized
in many different ways: a force-based network, a topologically sorted
hierarchy or directed graph, an interaction grid, and so on. We use the
Pasteur process-analysis environment to create and interact with such
visualizations.
These visualizations offer insights into organizational dynamics. For example,
cliques can be identified from the natural-force-based networks. Interaction
grids offer insight into the cohesiveness of an organization. Highly
specialized patterns have been noted in visualizations using each of these
techniques, including a tendency for roles to cluster according to their
degree of extrovertedness or introvertedness with respect to the process
community.
Once data has been gathered, analysis can begin. One obvious analysis is to
illuminate all the roles having a strong coupling to a chosen role to
determine the centrality of that role in the process. We consider a role
central to the process to the degree that it has strong coupling with the
remaining roles. One curiosity we discovered was that the "developer"
role had strong coupling with most of the internals of the
software-integration process. This was surprising, since the developer is
supposedly isolated from the details of software integration.
We have found many other recurring patterns in these visualizations, many of
which bear out common management folklore, others which are counterintuitive,
and still others which deserve further study. One goal of our research is to
correlate these patterns to high productivity, quality, and responsiveness in
the organizations that generate them. By understanding these patterns, we hope
to develop principles from which new, highly productive organizations can be
built. This work follows the lead of the generative-pattern movement which,
like CRC cards, started as a software-design phenomenon but has also found a
home in organizational analysis (see "Patterns and Software Development" by
Kent Beck, DDJ, February 1994). 


Applying the CRC-Card Process in the Real World 


The development of Borland's Quattro Pro for Windows spreadsheet (since sold
to Novell) is one of the more remarkable processes we've encountered in the
Pasteur process-research project. The project assimilated requirements,
completed design, implemented one million lines of code, and completed testing
in 31 months. Coding was done by no more than eight people at a time, which
means that individual coding productivity was higher than 1000 lines of code
per staff-week. (Granted, lines of code is an imperfect measure of
productivity at best. However, a disparity of one or two orders of magnitude
between the Borland experience and more typical numbers from the rest of the
industry cannot be explained with the usual attacks on source-line
measurements.)
The project capitalized on its small size by centering development activities
around daily meetings where architecture, design, and interface issues were
discussed. Quality-assurance and project-management roles were central to the
development sociology, in contrast to the developer-centric software
production most often observed in our studies of AT&T telecommunications
software. Analyses of the development process are "off the charts" relative to
most other processes we have studied. 
As with all Borland software, Quattro Pro for Windows (QPW) was designed to be
an independent, self-contained deliverable, sharing a common infrastructure
and look-and-feel (and, conjecturally, code providing this functionality). The
total code volume of all Borland software, expressed as original source lines,
is huge: tens, if not hundreds, of millions of lines of code (my estimate). 
QPW had a core team of four people who interacted intensely over two years to
produce the bulk of the product. Prototyping was heavily used: Two major
prototypes were built and discarded (the first in C; the second, called
"pre-Crystal," in C++). The team defined the architecture, built early
prototypes and foundation code, and participated in implementation through its
delivery. Additional programmers were added after about six months. The
prototypes drove architectural decisions that were discussed in frequent
(almost daily) meetings. 
The methodology was iterative. Except for architectural dialogue, the
developers worked independently. Early code can be viewed as a series of
prototypes that led to architectural decisions, and drove the overall
structure of the final system.
The QPW final implementation stages stressed Borland's C++ compiler--being
developed in parallel with QPW--to the max. There was uncharacteristically
tight coupling between the QPW group and the language group. QPW was one of
the largest and earliest projects for the C++ compiler. 
As soon as the software took shape (after about a year), additional roles were
engaged in development activities. Quality assurance (QA), testers, and others
were at last allowed to see and exercise copies of the code that had been kept
under wraps during early development. These roles had been staffed earlier,
but were engaged only when the developers felt they had something worth
testing. 


Analysis of the Pasteur Data for QPW


We most frequently use a natural-force-based network analysis to analyze
organization data collected in the Pasteur database. This analysis produces an
adjacency diagram in which a default repelling force is established between
each pair of roles. There is also an attracting force between pairs of roles
that are coupled to each other by collaboration or mutual interest; a stable
placement occurs when these forces balance. Figure 2 shows the diagram that
results when applying this analysis to QPW. Several items set this project
apart from other organizational process models we've made: 
The QPW process has a higher communication saturation than 89 percent of the
processes we've looked at.
The adjacency diagram shows that all roles have at least two strong
connections to the organization as a whole. The project's interaction grid is
dense. The coupling per role is in the highest 7 percent of all processes
we've examined. This is a small, intensely interactive organization.
There is a more even distribution of effort across roles than in most other
processes we've examined. The roles in the adjacency diagram are shaded
according to their intensity of interaction with the rest of the organization.
In the QPW process, project manager and QA glow brightly; coders, a little
less so; architect, product manager, and beta sites are "third-magnitude
stars;" and tech support, documentation, and VP still show some illumination.
Most "traditional" processes we've studied show a much higher concentration of
interaction near the center of the process. That is, most other processes
comprise more roles that are loosely coupled to the process than we find in
QPW. That may be because QPW is self-contained, or because it is small. It may
also be because the process was intense: a high-energy development project
racing to hit an acceptable point on the market-share curve.
Project manager and product manager are tightly coupled, central roles in the
process. These managerial roles were filled by individuals who were also key
technical contributors to the project (they wrote real code), which
contributed to their acceptance and success as process hubs. Product manager
was a role that was employed only after a year of development.
QA is a tightly coupled, central role. Many organizations consider QA to be an
external function, outside their organization and process. At Borland, QA
becomes a way of life once development has converged on a good design and a
stable user interface. For QPW, this was about 12 months into development.
The CEO (Philippe Kahn) figures strongly in the organization. 
The overall interaction-grid pattern (Figure 3) differs from that found in
other processes. Interaction grids show patterns of interactions in an
organization, and are particularly useful when the organization is large or
when its interactions are dense. We most often use an interaction grid where
roles are ordered on both axes by their degree of coupling to the organization
as a whole. The most integral roles are placed near the origin. Most other
processes exhibit a characteristic pattern of points along the axes, with
lower point density and lower intensity for increasing distances from either
axis. In QPW, there is a general lessening of density and intensity as you
move toward the northeast quadrant of the interaction grid. The northwest and
southeast quadrants of the Borland grid remain more dense than we've seen in
other processes.
Each project member had to personally sign off on a set of project floppy
disks before they were released to the next stage (beta test or the "street").
Accountability, ownership, and pride in one's work were central to the
process. 
QPW is organized along lines of domain specialization. Domains important to
QPW are dependency-registration software, human interfaces, databases, and a
few others, and an individual was identified for each of those domains. Within
a domain, each individual did what he or she was good at. Equally important is
what these individuals were not good at; they were not expected to take
responsibility for domains not related to their specialty. For example, when
it came to documentation, developers were supported by a documentation team
that developed internal and external documentation. The time spent by
developers in conveying information to the documentation organization is far
less than it would take for them to commit it to writing, put it into an
acceptable format, and have it edited for linguistic elegance. (By contrast,
most AT&T developers write their own memos. It's not clear whether this stems
from our history, our organizational boundaries, the nature of our business,
or reward mechanisms. In any case, developers spend roughly 13 percent of
total development time creating and refining memos.)
QPW development was highly iterative. To understand the nature of the
iteration, you must understand its ramifications for architecture and
implementation. You must also understand the culture in which changes were
approved and decisions made. This takes us into the realm of project meetings,
always a topic of interest in a large development organization.
The core architecture team met daily to hammer out C++ class interfaces,
discuss overall algorithms and approaches, and develop the underlying
mechanisms on which the system would be built. These daily meetings lasted
several hours; from what I heard, the project was made more of meetings than
anything else. Everyone's external interfaces were globally visible and
globally discussed. The software structure was built on the domain expertise
brought to the table by domain experts, but it was socialized and tempered by
the needs of the product as a whole.
In spite of the intense, meeting-oriented development culture, class
implementations were fleshed out in private. Individuals were trusted with
doing a good job of implementation: After all, project members were
acknowledged experts in their domains. 
There are three observations worth noting about the QPW organization's
communication architecture:
Meetings are not a bad thing. 
Development took place on two levels: architecture and implementation. Both
were ongoing and interacted with each other strongly. New implementations
suggested architectural changes, and these were discussed at the daily
meetings. Architectural changes usually required radical changes to the
implementation. The implementors' ability to quickly reflect those changes in
their implementation was essential. 
The development interaction style was a good match for the implementation
technology the group had selected. Object-oriented development leads to
abstractions whose identity and structure are largely consistent across
analysis, design, and implementation. Classes hide implementations and
localize design decisions, though their external interfaces are globally
visible. Mapping C++ classes and people close together made it possible for
developers to reason about the implementation off-line, away from the meetings
that dealt with interface issues.

This is contrary to the commonly presumed model that the object paradigm makes
it possible for an individual to own a class, interface and all, with a
minimum of interaction with other class owners in the organization. It should
be emphasized that classes are good at hiding implementation and detailed
structure (that is, in derived classes) but not at reducing the ripple effect
of interface changes. In fact, because interactions in object-oriented systems
form an intricate graph, and interactions in structured, procedural systems
usually form a tree, the ripple effect of interface changes in an OO system
can be worse than in a block-structured, procedural design.
A question frequently posed to organizations using iterative techniques is:
"How do you mark progress or do scheduling?" For QPW, the answer had two
parts: First, the team relied on its experience in sizing similar jobs, and
found the overall estimates to be satisfactory. Second, they kept multiple
sets of books internal to Borland to achieve different goals. The hardest of
the dates was owned (and not divulged) by the financial group. A "real" street
date was needed so the company could provide planning and resource support to
the development. But internal scheduling provided incentive, focus, and
pressure for development to move ahead. 


Looking Inward


QPW used iteration throughout the development cycle, increasing the stability
of the software and decreasing iteration over time. This iteration took place
in what might be described as a "traditional corporate context." From its
outset, QPW was a strategic, visible product in the company. That meant that
all development areas were primed for its deployment, including QA,
documentation, and product management.
Though these areas were staffed from the outset, they were denied access to
the details of the product until about a year into development. That gave the
architect/developers room to change the functionality, interface, and
methodology of the project before interfacing it with the corporate culture
and ultimately with users. 
Can an organization without an explicit, conscious process effort enjoy the
same process benefits as an organization with full ISO 9000 process
certification? Certified organizations may reap stronger process benefits than
those lacking any formal concern for process; nevertheless, this Borland
project had many of the hallmarks of a mature development organization.
Borland is not subject to the ISO 9000 series process standards, has no
concept of its SEI CMM rating, and is not conversant with the
software-development-process lingo being used increasingly in large software
organizations. A visit from someone interested in "process" was a rare event.
Before the CRC-card exercise, my presence as a process engineer was viewed
with interest, curiosity, and even suspicion. By the time I
left, those involved were able to identify some parts of their value system
and culture with what we call "process." 
Even though the organization had no codified system of process, it was keenly
aware of what it did, how it did it, and what worked. It viewed software
development as something fundamentally driven by special cases (at least for
initial, generic development); repeatability was not an important part of
their value system. Members of the organization were nonetheless able to
articulate, in great detail, aspects of their process that demonstrated that
they shared a single model, perhaps based on development rules, of how
development should be done.
Many organizations we've interviewed have a weak or confused notion of the
responsibilities and interaction of roles within the organization. Most AT&T
organizations with a weak notion of process are those who have not gone
through an ISO audit, yet developers' notions of their roles even in some
ISO-certified organizations are fuzzy at best. Other organizations that do not
have any conscious process culture are still able to articulate their process
in explicit terms, at a level of abstraction that transcends technology,
tools, or methodology.
In his book, Quality Software Management, Vol. 1 (Dorset House, 1991), Gerry
Weinberg describes several levels of organization. Organizations at levels 1
and 2 need strong managerial direction. There is a paradigm shift between
levels 2 and 3 of the SEI Capability Maturity Model (CMM), so organizations at
level 3 and above are self-directing. Borland appears to be in this latter
category--though it may not register a level-3 rating according to commonly
accepted criteria.
Charlie Anderson, one of the QPW architects, told us how the project team felt
about itself and its accomplishments. "We are satisfied by doing real work,"
he noted as he thought about how the project dovetailed daily architectural
meetings with implementation. "Software is like a plant that grows," he mused.
"You can't predict its exact shape, or how big it will grow; you can control
its growth only to a limited degree." 


Process and Quality


One widely held stereotype of companies that build PC products (or of
California-based companies) is that they hire "hackers" and that their
software is magic, unreadable spaghetti. Meeting with the QPW group dissolved
that stereotype for me. Their constant attention to architectural issues,
efforts to build an evolving structure, and care to document the system well
(both externally and internally), are all hallmarks of professionalism. 
If there was any disappointment on the project, it was in the inability to
bring down the bug curve as fast as they wanted. They noted that the shapes of
software-development bug curves are well known, so there is hope of predicting
how long it will take to ferret out an acceptable fraction of the remaining
errors. However, the boundary conditions for the curve aren't known at the
outset, so it is difficult to predict the exact shape of the curve until some
bugs have been discovered and resolved. Inability to predict the exact shape
of this curve resulted in a modest schedule slip.


Conclusions 


Can other organizations capture the architecture of the Borland development
process? To the extent that large jobs can be partitioned into small ones, the
Borland approach may be suitable for individual parts of large developments.
Borland was able to coax a lot of production code from a few people in a short
time. Perhaps a PC-based development environment and PC-based deployment
platform make developers more effective, and perhaps QPW doesn't have the same
fault-tolerance requirements one finds in large telecommunications systems.
However, those considerations alone don't seem to account for figures that are
orders of magnitude above industry norms.
Figure 1 Typical CRC card.
Figure 2 Natural-force-based analysis of the QPW project roles.
Figure 3 Interaction grid for the QPW project.






























October, 1994
OLE2 and .INI Files


Putting OLE2's persistent storage model to work




Billy Cousins


Billy is a senior application developer for AT&T with 14 years of
application-development experience. He can be contacted at
billy.cousins@columbiasc.ncr.com.


Object Linking and Embedding (OLE) is an architecture that allows applications
to integrate data or objects into a compound document. OLE2 provides a large
set of interfaces that developers must understand to produce OLE2-compliant
applications. Applications use these interfaces to provide features for
linking and embedding objects, persistent storage, in-place editing,
drag-and-drop, and more. 
One basic feature of the OLE model is a sophisticated storage system called a
"compound file." Compound files make hierarchies of objects persistent. This
storage system's paradigm is that of a file system. A disk file system is
composed of directories and files. Directory objects contain other directories
and files. A file contains the user's data. Compound files provide two similar
abstractions: storage objects, which are analogous to file-system directories,
and stream objects, which are similar to files. A storage object can contain
other storage objects and streams. A stream contains the equivalent of a
typical file's data. The compound file manages these two logical abstractions
and places the data into a single file in the file system.
Structured storage is a powerful feature provided with OLE2. Most of the OLE
SDK information about compound files discusses the use of structured storage
by OLE client and server applications to store linked and embedded objects in
a compound document. It is important to point out, however, that applications
can take advantage of this technology without being an OLE client or server
application. 
With the functions presented in this article, you can use compound files to
replace and enhance the initialization file functions provided with Windows.
Think of these functions as one example of using compound files for persistent
storage. With a little more work, the sample code provided could be made
generic enough to use for any type of data an application needs to save. Think
of OLE's structured storage as a general-purpose data storage that allows you
to name your data for later access. The storage system will manage the details
of allocation and fragmentation within the file for you.


Initialization-File Functions


A Windows initialization (.INI) file is an ASCII file containing a hierarchy
of information used to store application settings. The file is divided into
sections noted by [section name]. Each section contains entry=value
associations. A typical .INI file fragment is shown in Figure 1.
The Windows API provides several functions used to read and write information
from an .INI file. My example provides clones for the standard
GetPrivateProfileInt, GetPrivateProfileString, and WritePrivateProfileString
routines. My functions store the data in a compound file instead of an .INI
file. The advantages this provides are varied. For one, compound files are
binary files. Storing application settings in a binary format makes it
difficult for users to change the values without being in the application.
Another advantage is more flexibility in the data that can be stored. Normal
.INI files truncate trailing spaces in a value, and they don't allow line
feeds in an entry's value. Neither limitation exists when using compound
files. The binary nature of compound files also allows the development of
extensions to initialization functions. For example, a function could be
written that would serialize C++ objects into an entry's stream. If an
application saved its last window position, it could use the extended function
to save a rectangle object in the stream and then restore it at start-up. This
saves converting a rectangle to a string, writing it to an .INI file, reading
it in, parsing the string, and turning it into a rectangle object.


The Clone Functions


The complete implementation of the system presented here is provided
electronically; see "Availability," page 3. These source listings provide
three C functions to emulate the initialization functions provided by Windows.
The functions CxGetPrivateProfileInt, CxGetPrivateProfileString, and
CxWritePrivateProfileString are identical to the Windows routines, both in
their parameter list and their operation; the only difference is that the data
is stored in a compound file instead of an ASCII .INI file. This different
persistent format is completely transparent to the application using the
functions. I've provided a C interface to make it easy to use these functions
with existing code.


Classes


I used Visual C++ 1.5 and MFC 2.5 for the implementation of the new functions.
MFC provides some excellent classes for both OLE and standard abstract data
types that made the work much easier than writing straight to OLE.
Specifically, I used the MFC COleStreamFile class to write an entry's value to
the compound file. I found that the MFC class COleDocument and its derived
classes did not exactly provide the functionality needed to implement the
profile functions. The COleDocument class is very much directed towards OLE
client and server applications, and many of its member functions and variables
are not applicable to this example. 
There are two classes provided in this example for implementing the profile
functions: CxOleDocFile and CxOleStorage; see Listing One. The CxOleStorage
class provides an encapsulation of OLE's IStorage interface. This class was
modeled after the MFC COleStreamFile class and its relationship to OLE's
IStream interface. The CxOleDocFile class abstracts concepts applicable to OLE
compound files and implements methods for creating and opening compound files.
The term DocFile is the historical term for compound files and is used
throughout the OLE APIs.
The profile functions use the compound-file, storage, and stream abstractions
to provide the information hierarchy that is present in an .INI file. A
compound file corresponds to the .INI file. Storages are created in the file
to represent sections. A stream is used for an entry, and the contents of the
stream represent the value for the entry. You can consider these classes
provided as extensions to MFC. Your application must be an MFC application to
use them. To include the functions, you can add the two source files provided
to the application's project. Also, be sure to include a call to AfxOleInit
in your application's InitInstance method.


Writing Profile Data


To write a string, call CxWritePrivateProfileString with a section name, entry
name, value, and filename as parameters (see Listing Two). This function
instantiates a CxOleDocFile class and calls OpenDocFile to open the file. If
the file does not exist, CreateDocFile is called to create the file. Both of
these methods look in the Windows directory for the file if the filename does
not contain a fully qualified path. If the filename is fully qualified, it
will be created or opened from the specified location. Next, the
WriteProfileString method is called to do the work of writing the new setting
to the file.
WriteProfileString has a fair amount of work to do in order to honor the
various ways the parameters can be specified. If the section name, entry name,
and value are all valid strings, the data is written to the file. If the entry
name pointer is NULL, the section and all of its entries are removed. If the
value pointer is NULL, the entry is removed from the file. While this method
does not throw an exception, it does use exception handling to improve its
robustness. 
WriteProfileString first checks the entry-name parameter. If this parameter is
NULL, the section is deleted from the file by telling the root storage object
to destroy the element specified by the section name. The
CxOleStorage::DestroyElement method calls the IStorage::DestroyElement method
to remove the storage from the compound file.
If an entry name was specified, the section name is used to open or create a
storage in the file. Next, the function checks the value parameter. If this
parameter is NULL, the DestroyElement method is called on the section storage
to remove the entry's stream from the file. Otherwise, a COleStreamFile object
is instantiated with the same name as the entry. The value string is then
written to the stream, and the stream is closed. It is also important to close
the section storage and flush the root storage so that everything is written
to disk.


Reading Profile Data


Reading values from the compound file works similarly to writing them. The
function CxGetPrivateProfileString is used to read a string value, and
CxGetPrivateProfileInt is used to read a 16-bit integer value from the file.
These functions create a CxOleDocFile object and invoke either the
ReadProfileString or ReadProfileInt methods. 
The GetProfileString method first opens a storage with a name specified by
the section name. After the storage is opened, the entry parameter is checked.
If it is NULL, the method enumerates all of the entries within the section
and returns them in the return buffer. Each entry name is null terminated,
and the final string ends with two null characters. If the parameter is not
NULL, a stream is opened with the name specified by the entry parameter. The
contents of the stream are read in and copied to the return buffer. If no
stream exists with the specified name, the default value is copied to the
return buffer.
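The double-null return buffer used for enumeration can be illustrated with a small packing routine. This is a sketch of mine under the format described above, not the article's code:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Packs entry names into a buffer the way the enumeration case is
// described: each name null terminated, with an extra null after the
// last one. Returns characters used, not counting the final null.
int PackEntryNames(const std::vector<std::string>& names,
                   char* buf, int cb)
{
    int used = 0;
    for (const std::string& n : names) {
        int need = static_cast<int>(n.size()) + 1;  // name plus its null
        if (used + need + 1 > cb)     // leave room for the final null
            break;
        std::memcpy(buf + used, n.c_str(), need);
        used += need;
    }
    buf[used] = '\0';                 // second, terminating null
    return used;
}
```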
The GetProfileInt method works in a fashion similar to that of
GetProfileString. Instead of returning a null-terminated string, it reads
the contents of an entry's stream and treats it as a 16-bit unsigned integer.
If the entry stream is not found, the specified default integer is returned.
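The default-fallback integer read can be reduced to a few lines. Storage lookup is faked here with a found flag; the names are mine, not the article's:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Sketch of the integer read described above: the entry's stream text
// is interpreted as a 16-bit unsigned value, and a missing entry
// yields the caller's default.
unsigned GetProfileIntFromText(bool entryFound, const std::string& text,
                               unsigned defaultValue)
{
    if (!entryFound)
        return defaultValue;
    unsigned long v = std::strtoul(text.c_str(), nullptr, 10);
    return static_cast<unsigned>(v & 0xFFFFu);   // clamp to 16 bits
}
```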



Performance Characteristics


I was not sure what to expect in terms of performance for compound files. To
get a feel for the performance of these files, I wrote a sample application
that uses the C functions and some of the methods on the classes; these
functions are available electronically. 
The first thing you can do with the sample program is convert an existing .INI
file to a compound file. The sample copies the selected existing .INI file to
your Windows temp directory. It then creates a compound file in the temp
directory and calls the LoadFromIni method on the CxOleDocFile class. The
amount of time the conversion takes is shown on the screen. After a test .INI
file and a test compound file are created, you can perform several tests in
parallel on the two files. The test will use the Windows API calls to access
the .INI file and the new functions to access the compound file. The time each
of these operations takes is shown on the screen. The sample provides the
following tests: get all entries from both files, change each entry in each
section to be one byte shorter, and change each entry to be one byte longer.
The easiest comparison was between the file sizes of an .INI file and a
compound file. The smallest .INI file on my PC was 20 bytes. The equivalent
compound file was 2560 bytes. My WIN.INI was the largest at around 30 Kbytes.
The converted file was 150 Kbytes. If you need the smallest possible file,
compound files are probably not the way to go.
The next tests were designed to compare read, write, and access times to the
data stored in the files. When I first ran the test, the times on the compound
files were horrible. To improve performance on storages, I changed from the
direct mode to the transacted mode. This change improved performance, but the
time to access all of the entries was still much slower than that for .INI
files. I stepped through some code in the debugger to see what was taking so
long. I found that the code was reading values quickly, but the disk light
came on for long periods when the compound file was closed. More stepping
showed that the call to the Commit method on a storage was the culprit. I was
opening the storage in read/write mode, and even though the code never wrote
to the file, this function took a long time to execute. I changed the code to
open the file in read mode when accessing entries, and the speed test improved
dramatically.
The next big performance difference to tackle was the write time for updating
entries. Writes to the compound file were significantly slower than writes to
the .INI file. Applications
usually write to .INI files in short bursts. Because of this, Windows caches
an .INI file. When a write occurs, the cache is updated in memory and written
to disk. Because a compound file is much larger, I really did not want to
cache the entire file. Instead, I decided to keep the compound file open
between calls when writing information. If a call to write information is
writing to the same file as the previous call, the compound file is already
open, and the update happens very quickly. If you want to close the file, call
CxWritePrivateProfileString with NULL for all parameters except the filename.
You can also use these same parameters on the Windows call to make it refresh
the cache for an .INI file. After adding the code to keep the file open, the
performance for compound files was reasonably close to that of .INI files.
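The keep-open optimization described above can be sketched as a cached handle. In this toy the open storage is reduced to a remembered filename and an open counter; the structure and names are mine:

```cpp
#include <cassert>
#include <string>

// Sketch of the caching strategy: the compound file stays open between
// writes to the same file, and a call naming only the file closes it.
struct ProfileCache {
    std::string openFile;   // empty means no file is currently open
    int opens = 0;          // counts real open operations (for the sketch)

    bool Write(const char* section, const char* entry,
               const char* value, const char* filename)
    {
        if (section == nullptr && entry == nullptr && value == nullptr) {
            openFile.clear();          // flush-and-close request
            return true;
        }
        if (openFile != filename) {    // a different file: reopen
            openFile = filename;
            ++opens;
        }
        // ... the actual storage write would happen here ...
        return true;
    }
};
```

Two consecutive writes to the same file cost one open; the all-NULL call releases the handle, just as described for CxWritePrivateProfileString.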


Extensions


Compound files such as my .INI replacements are just one example of using
OLE's structured storage. There are several other areas where this storage
model can be useful. In a C program, compound files can be very useful if your
application needs to store different types of structures. You can assign
storage and stream names to access the data and let the storage system
allocate and reclaim space in the file.
If you are using C++, modify the CxOleDocFile class to be a container of an
abstract class CxOleDocFileItem. Derive new classes from the CxOleDocFileItem
class to hold different types of data for your application. When you
instantiate these derived classes, assign a storage name to them and add them
to the compound-file class. Add a Checkpoint method to the compound-file class
that goes through all of its contained items and serializes each one into a
COleStreamFile if it has been modified. This is very similar to the way MFC
uses compound files, but without a lot of extra overhead. This is appropriate
if you just want to use structured storage.
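The suggested container design can be sketched as follows. Serialization is reduced to a counter here, and the class names only echo the suggestion; real code would write each modified item to a COleStreamFile in its named storage:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Stands in for the abstract CxOleDocFileItem: a named item that
// tracks whether it has been modified since the last checkpoint.
class DocFileItem {
public:
    explicit DocFileItem(std::string storageName)
        : m_name(std::move(storageName)) {}
    virtual ~DocFileItem() = default;
    virtual void Serialize() = 0;     // derived classes write their data
    void SetModified() { m_dirty = true; }
    bool IsModified() const { return m_dirty; }
    void ClearModified() { m_dirty = false; }
    const std::string& Name() const { return m_name; }
private:
    std::string m_name;
    bool m_dirty = false;
};

// Stands in for the container role of the compound-file class.
class DocFile {
public:
    void Add(std::unique_ptr<DocFileItem> item)
        { m_items.push_back(std::move(item)); }
    int Checkpoint() {                // serialize only the modified items
        int written = 0;
        for (auto& item : m_items)
            if (item->IsModified()) {
                item->Serialize();
                item->ClearModified();
                ++written;
            }
        return written;
    }
private:
    std::vector<std::unique_ptr<DocFileItem>> m_items;
};

// One concrete item type for demonstration.
class SettingsItem : public DocFileItem {
public:
    using DocFileItem::DocFileItem;
    int serialized = 0;               // counts writes in place of real I/O
    void Serialize() override { ++serialized; }
};
```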
Structured storage could also be made available to Visual Basic applications
by putting the code into a DLL and providing the appropriate APIs. This would
extend some very powerful functionality to your VB applications. There are
more features of structured storage than I have covered here. The model
provides for nested transactions with complete program control over committing
or reverting the transactions. There are also functions for moving, copying,
and renaming storages and streams in the compound file. For any Windows
application that has nontrivial data to store, OLE's structured storage should
prove to be a very handy system to use.
Figure 1: Typical [Settings] section within a Windows .INI file.
[Settings]
Font=Arial
Size=12

Listing One 
// Classes for managing OLE compound files and storages within a file.
//----------------------------------------------------------------------------
// CxOleDocFile
class CxOleDocFile : public CObject
{
 DECLARE_DYNAMIC(CxOleDocFile)
// Constructors and Destructors
public:
 CxOleDocFile ();
 ~CxOleDocFile ();
// Operations
 // Create a compound file.
 BOOL CreateDocFile (const char * pszFilename,
 DWORD dwOpenFlags = CX_CREATE_DOCFILE_DEFAULT,
 CFileException * pError = NULL);
 // Open a compound file.
 BOOL OpenDocFile (const char * pszFilename,
 DWORD dwOpenFlags = CX_OPEN_DOCFILE_DEFAULT,
 CFileException * pError = NULL);
 // Get integer data from file.
 UINT GetProfileInt (const char* pszSection, const char* pszEntry,
 int iDefault, CFileException* pError = NULL);
 // Get string data from file.
 int GetProfileString (const char* pszSection, const char *pszEntry,
 const char* pszDefault, char* pszRetBuf, int cbRetBuf,
 CFileException* pError = NULL);
 // Get all of the sections.
 BOOL GetSections (CStringArray& rgSectionNamesRet);
 // Load the contents of an ini file into a compound file.
 BOOL LoadFromIni (const char * pszIniFilename);
 // Write a string to the doc file.
 BOOL WriteProfileString (const char * pszSection,
 const char * pszEntry, const char * pszValue,
 CFileException * pError = NULL);
// Implementation
public:

 virtual void Close ();
 virtual void Flush ();
#ifdef _DEBUG
 virtual void Dump(CDumpContext&) const;
 virtual void AssertValid() const;
#endif
protected:
 CxOleStorage * m_pRootStg;
}; // end class CxOleDocFile
//----------------------------------------------------------------------------
// CxOleStorage
class CxOleStorage : public CObject
{
 DECLARE_DYNAMIC(CxOleStorage)
// Constructors and Destructors
public:
 CxOleStorage (LPSTORAGE lpStorage = NULL);
 ~CxOleStorage ();
// Operations
 // Create a new storage
 BOOL CreateStorage (LPSTORAGE lpParentStg, const char * pszName,
 DWORD dwOpenFlags = CX_CREATE_STORAGE_DEFAULT,
 CFileException * pError = NULL); 
 // Delete a storage or stream from the file.
 virtual BOOL DestroyElement (const char * pszName,
 CFileException * pError = NULL);
 // Enumerate the elements in a storage and return the ones
 // that match the specified type.
 BOOL EnumElements (enum tagSTGTY tyElem, CStringArray& rgNamesRet);
 // Open a storage.
 BOOL OpenStorage (LPSTORAGE lpParentStg, const char * pszName,
 DWORD dwOpenFlags = CX_OPEN_STORAGE_DEFAULT,
 CFileException * pError = NULL);
// Implementation
public:
 virtual void Close (); // May raise exception.
 virtual void Flush (); // May raise exception.
#ifdef _DEBUG
 virtual void Dump(CDumpContext&) const;
 virtual void AssertValid() const;
#endif
protected:
 friend class CxOleDocFile;
 LPSTORAGE m_lpStorage;
 BOOL m_bCloseOnDelete;
}; // end class CxOleStorage



Listing Two

// Just like the Windows APIs except a slightly different name.
UINT CxGetPrivateProfileInt (LPCSTR lpszSection, LPCSTR lpszEntry,
 int iDefault, LPCSTR lpszFilename);
int CxGetPrivateProfileString(LPCSTR lpszSection, LPCSTR lpszEntry,
 LPCSTR lpszDefault, LPSTR lpszReturnBuffer, int cbReturnBuffer,
 LPCSTR lpszFilename);
BOOL CxWritePrivateProfileString (LPCSTR lpszSection, LPCSTR lpszEntry,
 LPCSTR lpszString, LPCSTR lpszFilename);































































October, 1994
PROGRAMMING PARADIGMS


Mind and Life as Mechanism




Michael Swaine


From time to time in this space I attempt to justify to myself my four years
of undergraduate and three years of graduate study of the human mind. I
suppose I should be satisfied with the deep insights into human nature that my
education has given me, if only I knew what they were. Or maybe I should just
put it all behind me as the transient obsession of a youth wasted hanging out
in coffeehouses among poets and social workers. After all, Jerry Pournelle
actually finished a doctorate in psychology and doesn't feel the need to
inject the stuff into his Byte columns.
But I do feel the need to inject the stuff here, and in the past, I have
injected several critiques of books on what could be called the "mechanical
model of the mind." I'm at it again this month, although this time I offer two
points in my defense: 1. The specific model I discuss here was concocted by a
real computer scientist who has implemented at least a part of it in a real
computer program; and 2. toward the end of the column I execute a slippery
segue into a completely different subject, a discussion of a real commercial
computer program that embodies an interesting programming paradigm or two.


The Mind is Software


The mechanical model is simply this: The brain is a computer, and the mind is
its software.
To many people, this notion is unpalatable. Certainly, anyone with traditional
religious beliefs (almost any tradition) should be uncomfortable with it. But
it's possible to be skeptical about the mechanical model and also to be
agnostic with respect to any religious beliefs about the mind. Many
nonreligious psychologists and philosophers, and some computer scientists, are
mechanical-model skeptics.
What isn't so easy, apparently, is to come up with an alternative model of the
mind that has an equivalent level of scientific rigor. A lot of the critics of
the mechanical model (including Hubert Dreyfus and John Searle) only attack
it, without offering any model of their own.
There are exceptions. In The Emperor's New Mind (Oxford University Press,
1989), Roger Penrose has presented an approach that rests ultimately on
quantum uncertainty. Penrose's approach is brave, because it's open to easy
ridicule: Drawing on quantum uncertainty to explain the workings of the mind
can seem like an act of scientific desperation. I've discussed Penrose's
approach here before, and my view is more generous: I'm willing to believe
that the questions we want to ask about the mind may be the sort of questions
that can only be answered by extraordinary, credibility-challenging answers.
Maybe. But Penrose has the burden of demonstrating that his theory has any
clear scientific advantage over the simpler, mechanical model. It's not clear
that he can do it.


Enter David Gelernter


Like Penrose, David Gelernter is a mechanical-model skeptic. He has also
presented a model of the mind that challenges the mechanical model in his The
Muse in the Machine (The Free Press, 1994). But Gelernter's model does its
work without also challenging credibility. And Gelernter is not a psychologist
or a philosopher, but a computer scientist, the inventor of the distributed
programming language Linda, and a leading light in programming for parallel
architectures. And Gelernter is actually building a program that embodies his
model of the mind.


The Turing Test is a Black Hole


It is arguable that this mechanical model is not really a theory so much as a
choice of research instruments. Acting as though the mind were the brain's
software lets psychologists use the computer as a tool for doing research into
mental processes, and they've been doing that for more than 20 years. In Human
Associative Memory (V.H. Winston & Sons, 1973), psychologists John Anderson
and Gordon Bower indicated how widespread, and how useful, computer
simulations had already become in psychological theory:
The various neo-associationist theories of memory..., including our own, have
been cast in the form of computer simulation models.... This is no accident.
The task of computer simulation simultaneously forces one to consider both
whether his theory is sufficient for the task domain to be simulated and also
whether it can deal with the particular trends found in particular
experiments.
But these guys are talking about particular simulations of particular aspects
of the mind. If we consider the mechanical model itself as a theory, is it
really specific enough to generate any testable predictions?
Testing the mechanical model does seem to present some problems. A lot of the
questions we want to ask about the mind don't immediately lead to critical
experiments that could demolish the model as a theory of mental organization.
The really interesting questions are often as vague as they are interesting.
And so, somehow, attempts to test assertions from some mechanical model
regarding the workings of the mind often lead to some sort of Turing test, and
the Turing test never proves anything of scientific interest.
Here's how all questions about the mind tend to get sucked into the Turing
test, as though it were a black hole, whenever they approach the mechanical
model:
There's precious little that we might consider the mind capable of doing that
we can't convince ourselves that software can also do, in principle. The mind
doesn't do any better with uncomputable problems than a computer does. And if
a mind or a computer program fails to solve a computable problem, it's
arguable that the failure was a practical one having to do with available
resources (including time) rather than a fundamental limitation.
So the question morphs into one of not whether but how problems are solved.
Certainly the mind doesn't do math, for example, the way Mathematica does. But
a more meaningful question is, "Can we write a program that does math the way
the mind does?"
But that's basically a programming challenge. Are you a good enough programmer
to write a program that simulates some aspect of the operation of the human
mind sufficiently well to meet some kind of Turing test?


Problem Solving is not the Problem


Some would argue--Gelernter for one--that this is simply not the point. 
It is a common view that there are two modes of thought, Gelernter says: the
rational, problem-solving, goal-directed mode, and the creative, intuitive,
emotional mode. Gelernter spends much of his book describing these two modes
of thought. He calls the rational mode "high-focus," and the emotional mode
"low-focus," for a reason I'll explain momentarily.
All existing computer models of the mind, he argues, tackle only the
high-focus mode. The reason that research questions about the mind get sucked
into the Turing test is that high-focus thinking places such emphasis on
problem solving. Can the mind solve such-and-such a problem? How does the mind
solve such-and-such a problem?
But a lot of thought is not problem solving. Particularly low-focus thought.
So should we consider a model of this other mode of thinking? A low-focus
model, or perhaps a dual-mode model? Some (such as psychologist Endel Tulving)
have proposed this, but Gelernter thinks it's a bad idea.
Gelernter thinks that what we've got is not really two discrete modes of
thought, but a continuum. This is, in fact, his central thesis. He calls it
the "continuum focus," and uses the terms high- and low-focus to describe the
ends of the continuum. By focus he doesn't mean focus of attention, although
that's close to his meaning. Instead he's talking about how detailed your
perception is. High-focus thought looks at aspects or attributes of a scene or
a phenomenon or a memory. For high-focus thought, the usual sorts of
associative models make sense: Things get recorded in memory and are retrieved
from memory on the basis of their attributes. Connections get made on the
basis of attributes. Things that have many attributes in common are more
likely to call each other up from memory. Chains of thought will tend to be
made up of ideas that are close in meaning, in the sense that they share many
attributes. For high-focus thought, the usual associative mechanical models
are more or less the correct story.


Feelings are the Glue of Thought



But then there's low-focus thought. Here, entire scenes get stored away in
memory, uninterpreted, with trivial details and coincidentally occurring but
logically unrelated events getting as much importance as crucial attributes.
Even the feeling you were experiencing when you laid down a memory trace gets
stored away with it. Low-focus thought deals with information in a different
form: as a large, uninterpreted chunk of perception. If it were a data type,
it would be a BLOB (binary large object). Low-focus thoughts don't have
addressable attributes.
This kind of memory storage clearly requires a different kind of retrieval
mechanism. If high-focus thoughts are retrieved on the basis of their
attributes, that won't work for low-focus thoughts, which are stored as
uninterpreted BLOBs. If they are to be retrieved at all, and if there is to be
any way of associating one with another, they need to be tagged in some way.
Gelernter proposes feelings as the tagging mechanism. The emotional state you
were in when you experienced the event or thought gets attached to its
representation in memory. No internal detail is accessible. Only the emotion
is available to use for retrieval or for associating such memories. So two
low-focus thoughts that have the same emotional tag have something in common.
One can call up another. If you are now in emotional state X, it is easier to
access memories that have state X as their emotional tag, which is to say,
memories of thoughts or events that occurred when you were also in emotional
state X. Emotion is the glue for low-focus thoughts.
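As a toy illustration only (my sketch, not Gelernter's FGP program): memories are stored as opaque blobs carrying nothing but an emotional tag, and retrieval matches the current emotional state against those tags.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A low-focus memory in this toy: uninterpreted content with a single
// emotional tag. No internal detail is addressable; only the tag is.
struct Memory {
    std::string blob;      // the whole scene, stored uninterpreted
    std::string emotion;   // the only retrieval key
};

// Retrieval by feeling: memories whose tag matches the current
// emotional state are the ones that can be called up.
std::vector<std::string> RecallByFeeling(const std::vector<Memory>& store,
                                         const std::string& currentEmotion)
{
    std::vector<std::string> out;
    for (const Memory& m : store)
        if (m.emotion == currentEmotion)
            out.push_back(m.blob);
    return out;
}
```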
Gelernter makes clear that he's not just talking about the kinds of emotional
states for which we have common adjectives: happy, sad, jealous, disgruntled.
He's imagining a much richer and subtler palette of feelings. The feeling of
satisfaction from solving a problem. The different satisfaction of hitting a
nail squarely. The still-different satisfaction that comes at the end of a big
sneeze. Gelernter is thinking in terms of a whole lot of distinguishable
feelings.
He is also thinking in terms of this continuum. Thoughts are not typically one
or the other, high focus or low focus; most thoughts are somewhere in between.
Focus is a continuum, Gelernter claims.
Gelernter presents many examples designed to show that focus must be a
continuum, but he does something else as well. He argues that you can see the
continuum in the development of the individual, with children operating in a
more low-focus mode than adults, and in the development of the human mind over
the centuries, with early texts showing a low-focus worldview. This last point
becomes important when he draws upon ancient texts for support, something I'll
get into shortly.
And Gelernter makes another interesting claim about low-focus thought and
feelings, a claim that makes him a skeptic about the mechanical model. We
can't model low-focus thought as the software that runs on the computer that
is the brain, he says, because feelings do not reside strictly in the brain.
How we feel is as much a function of glandular secretions and other bodily
states as it is of brain states. The mind doesn't live in the brain, it lives
in the body as a whole.


The Feeling Program


Having claimed that a computer model is impossible, Gelernter proceeds to
build one. Here's how he does it: His computer model cheats. He freely admits
this fact. What he does is feed ready-made emotions into the program. He tells
it how it feels about things, bypassing the need for a body to resolve these
matters.
Gelernter's program is called "FGP," short for its primary operations: Fetch,
Generalize, and Project. It embodies the kind of memory storage and retrieval
that his low-focus and high-focus memories require. It's still in early
development, so there are not a lot of results to report, apparently.


A Critique of Pure Feeling


But Gelernter's program does not appear to be equipped to test Gelernter's
theory.
It's a theory that presents some difficulties in terms of testing. For one
thing, there are all those different emotions. Unless Gelernter presents a
model for how emotions are generated or classified or related to one another,
these emotional states are just so many independent variables in the theory.
Too many independent variables for the theory to be testable, I'd think. And
he doesn't present a model for these emotions, unless the single-dimension
numbers he uses in one example are to be taken seriously.
There are a number of other unanswered questions about emotions.
Given some satisfactory answers to these questions, does Gelernter present any
means for testing his claim that emotion is the glue for thoughts? It should
be testable--but he doesn't present any test of it.
Gelernter's central thesis is this idea of a continuum, but it may not be as
easy as Gelernter thinks to distinguish between a model involving two
processes and one involving a continuum. Note that the normal distribution, a
continuous model, does a fine job of predicting runs of heads in coin
flipping. One specific continuous model versus one specific discrete one, yes,
but even if Gelernter had a specific continuous model, he'd have to demolish
all reasonable discrete ones to establish support for his continuous one.
To be fair, Gelernter doesn't try to prove anything scientifically about his
model in his book. When he gets to the point where you would expect to see
results of tests, he launches into a literary exegesis. I confess that this
mystifies me.
His analysis of a Biblical story about Abraham and circumcision is way over my
head, but his conclusions do seem to hinge on the assumption that we know what
emotions would have been stirred centuries ago in the average Jew by the idea
of circumcision at birth, as opposed to circumcision at puberty. This exegesis
is supposed to support the idea that primitive thought was more emotion-based
than present-day thought, and this in turn is supposed to support the idea of
a continuum of modes of thought, from emotion based to rational. I don't see
how.
This literary approach seems to me capable of "proving" anything. For example,
take the Arabian Nights.


The Thousand and One Theories


The most salient aspect of the Arabian Nights is its structure of stories
within stories. No commentator on the Nights writes at any length about it
without touching on this obvious and seemingly important fact. Stories aren't
written this way any more, but in the time of the Arabian Nights, it was
common.
What does this blatant difference in the structure of narrative tell us about
modes of thought in primitive and modern times? Is it possible that nested
narratives are easier to remember than sequentially presented narratives? In a
time when stories were passed on via oral tradition, this would have been
crucial to the survival of the stories.
Which leads us to postulate the psychological hypothesis that hierarchical
structures for things like narratives aid in their recall from memory.
The point is that you can pick any piece of literature at random and do this
kind of speculative stuff.
That doesn't mean that it's not useful if followed up on. I actually did a
study in graduate school that showed that recall for a certain type of
narrative material was better when that material had a hierarchical structure.
Sounds more relevant than it was: Since I was looking at the structure within
a single narrative rather than a structure, like that of the Arabian Nights,
that ties narratives together, my results don't really have anything to do
with the Arabian Nights question. Except this: They do show that it is
possible to formulate testable conjectures about the way the mind works on the
basis of a critical reading of ancient texts.
It seems to me that this is exactly what Gelernter fails to do, and this is
why I find his arguments ultimately unconvincing.


Life is Software


Gelernter rejects the mechanical model because emotions are part of the work
of the mind, and emotions depend on the whole body rather than just on the
brain. So the formulation, "brain=computer and mind=its software" can't be
right.
Well, why not just extend the formulation: "body=computer and mind=its
software"? This could be called the "mechanical model of life," and it looks
like the assumption underlying artificial-life research.
If you aren't up to speed on "a-life," a good place to start is with an
entertaining tool by Rudy Rucker called Artificial Life Lab, published by the
Waite Group.
The Waite Group has been publishing a lot of book-and-disk packages in trendy
areas like fractals, morphing, and a-life, and while they're all pretty
entertaining, most are of little real interest to developers. This package,
based on work that Rucker did while at Autodesk, should be of interest to
anyone. Although he doesn't give you a language to work in, he does provide
enough technical detail about the a-life productions this program generates to
serve as a solid introduction to the subject.
This isn't just cellular automata spreading patterns across a grid. Rucker's
Boppers program lets you create colonies of critters, snip their DNA, and
fiddle with their sexual habits, muck around with diet and death, and do
infinite tweaking of the supplied algorithms. This in addition to watching
cellular automata spread patterns across a grid. Rucker's chapter on theory is
as clear an introduction to the subject as I've seen.
Oh, and it's a lot of fun.













October, 1994
C PROGRAMMING


Lexical Scanning and Symbols




Al Stevens


This month I'll look at the lexical scan that Quincy uses to build a run-time
interpretable token stream and at the symbol-table process. Quincy is
operational: I finished the tutorial book it was designed to support and
integrated its tutorial mode with the book's exercises. I'll devote the next
several columns to the completion of the project.


On and Off the Road Again


I'm back at home after a two-week tour of the Midwest, playing the piano at
jazz festivals and concerts. I always figure I'll get a lot of day-gig work
done on the road by taking the laptop. I've got one of those converters that
you plug into the car cigarette lighter to get household current. The idea is
to pound out columns, books, and software while Judy drives. (Dream on, Al.)
That converter would heat a small stadium. Luckily, the minivan has an
adequate air conditioner. Judy has been working on her family tree, so between
the two of us, we got a lot of computing mixed in with the commuting. She uses
a program called "Family Tree Maker for Windows," and that is one slick
program. Being a devoted DOS user, Judy reluctantly switched to Windows
because she hit the wall on FTM's DOS version. It holds only about 1200
people, and there are a lot more Stauffers than that in Ringtown,
Pennsylvania.
Upon our return, I hooked the laptop into the network to upload all our road
work to the desk machines. I have one of those dongle-like Ethernet adaptors
that plugs into the laptop printer port. Without it I'd have to move files
around on diskettes. Since I was in a hurry, Murphy's law kicked in, and the
network device's AC/DC adaptor fell apart when I plugged it into the wall. The
tiny ends of the transformer winding were sheared off and too short to solder.
A search of the workshop turned up no other adaptor that delivers 12 volts DC
and 500 milliamps. The garage, however, provided the solution. I am now making
high-speed file transfers across my high-tech network through a
state-of-the-art network adaptor powered by a 30-year-old automobile battery
charger/eliminator. Greasy, dented, bigger than a hatbox, with bug-eyed volt
and amp meters, fat cables, and brass clamps, it squats on my desk next to all
the fancy stuff and proudly does what it does best--delivers a well-regulated,
flat, clean, 12 volts that you could weld with.


Lexical Scanning


A language translator has several processes. When the language is C, the first
process is the preprocessor. Not all languages have preprocessors; the
operations served by C's preprocessor are often built-in operators in other
languages. C's preprocessor converts C code and preprocessing directives into
C code ready to be translated. I discussed Quincy's preprocessor in the June
and July issues.
The first part of translation beyond preprocessing is the lexical scan, which
reads the source code and translates it into tokens--single-character codes
that represent discrete language elements. Subsequent translation operates on
this stream of tokens. When the translator is an interpreter, the token stream
is what the interpreter reads to execute the program.
Tokens represent identifiable language elements. The lexical scan parses the
source code from beginning to end, extracting code fragments and translating
them into tokens. The four discrete parts of C code are keywords, operators,
constants, and identifiers. When the interpreter is integrated with a
debugger, the token stream must also identify line numbers from the original
source code.


Syntax Checking


The main purpose of the lexical scan is to reduce source code into smaller,
more easily interpreted code. The code's grammatical correctness is the
concern of a later process. Compilers do that in the code-generation process
when they compile tokens into executable code. An interpreter such as Quincy
involves some mix of compile-time and run-time grammatical-syntax checking. In
general, the lexical scan determines only that each element of code is
translatable into a language token without respect to its grammatical context,
but there are exceptions. For example, the colon character has four uses in
the C language. It terminates statement labels; terminates case expressions;
serves as the second (else) delimiter of the ?: conditional operator; and may
appear in string and character constants. (Its potential appearance in
comments is unimportant in this context because the preprocessor strips
comments from the source code.) The same thing applies to the dot, which can
be part of a floating-point constant, the structure-member dereferencing
operator, or one-third of the ellipsis token. The lexical scan recognizes and
translates constants, ellipses, and statement labels, so it must do some
contextual analysis of the source code.


The tokenize Function


Listing One is scanner.c, Quincy's lexical scanner, which consists of the
tokenize function and some local functions. The tokenize function accepts two
pointer arguments. The first points to a buffer to receive the token stream,
and the second, to the preprocessed source code. The scanner reads the source
code a character at a time and determines which token to build, based on the
character's value. First the scanner calls the FindOperator function to see if
the current character and the next one constitute a two-character operator,
such as the != not-equal operator. That function returns the token when it
finds such an operator and 0 when it does not. If FindOperator returns a
token, the scanner copies it to the token stream. If the token is one of the
shift operators, the scanner tests to see if it is followed by an equal sign,
which signifies a shift-assignment operator. If so, the token is modified
accordingly. 


Newlines and Line-Number Information


The scanner recognizes the newline character to identify the file and line
number for the debugger. The preprocessor inserts a newline and comment for
each nonblank source-code line. The comment has the format /*01:02*/, where
the first value is the file number of the source-code file, and the second is
the line number within that file. The preprocessor assigns a file number to
each included file and puts the filename in a global table that the
interpreter and debugger can use to report errors to the programmer. The token
stream for a newline contains a T_LINENO token followed by the file number as
an unsigned character and the line number as an integer.
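The record the scanner emits might be built as in this sketch (the T_LINENO value here is a placeholder; the real one comes from qnc.h, and Listing One writes the fields inline rather than through a helper):

```c
#include <string.h>

#define T_LINENO 0x01   /* placeholder value; the real token is defined in qnc.h */

/* Append a line-number record to the token stream: the T_LINENO token,
   the file number as one unsigned byte, then the line number as an int. */
char *emit_lineno(char *tknptr, unsigned char fno, int lineno)
{
    *tknptr++ = (char)T_LINENO;
    *tknptr++ = (char)fno;
    memcpy(tknptr, &lineno, sizeof(int));  /* Listing One uses a cast instead */
    return tknptr + sizeof(int);
}
```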


String Constants


A double quote in the source code indicates the beginning of a string
constant. The scanner copies the T_STRCONST token to the token stream,
followed by the length of the string and the null-terminated string value
itself. The scanner remembers that it just processed a string constant so that
if another one immediately follows, it can properly concatenate adjacent
strings. Translation of the string calls the uncesc function for each
character, which returns the character value or its value as represented by a
backslash-escape sequence.


Character Constants


Character constants are like string constants except that character constants
occupy one character position. The scanner builds a character constant when it
sees an apostrophe, inserting a T_CHRCONST token and the character-constant
value into the token stream. The scanner gets the character value from the
uncesc function, which decodes escape sequences.



Operators


The scanner converts C operators into their token equivalents, which are the
same as the operators themselves. If the operator is one of those that can
combine with the equal sign to form an assignment operation (+=, -=, etc.),
the scanner puts a true value into the op variable. Then, if the next
character in the source-code stream is an equal sign, the scanner sets the
operator's most significant bit, which identifies the operator as an
assignment operation. 
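The high-bit encoding can be sketched with a few helper functions (the names are mine; Listing One performs the same operations inline):

```c
/* Mark a single-character operator token as its op= form by setting
   the most significant bit, the same trick Listing One uses inline. */
unsigned char make_assign_op(unsigned char op_token)
{
    return (unsigned char)(op_token | 0x80);
}

/* The interpreter can later recover both facts from one byte: */
int is_assign_op(unsigned char token)       /* was the MSB set?    */
{
    return (token & 0x80) != 0;
}

unsigned char base_op(unsigned char token)  /* recover the operator */
{
    return (unsigned char)(token & 0x7f);
}
```

The scheme works because operator tokens are ordinary seven-bit character values, leaving the eighth bit free to carry the assignment flag.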


Dots, Braces, Colons, and Question Marks


If a dot is followed by two more dots, they are converted into the T_ELLIPSE
token. If the dot is followed by a digit, however, the scanner calls the
fltnum function to build a floating-point constant.
The scanner counts and balances pairs of left and right braces and then copies
them into the token stream as their own tokens. It uses the brace count when
it builds statement labels, and it uses a brace count of 0 to know that it can
assume an external function declaration or definition.
The question mark is the if part of a conditional expression. The scanner
remembers that one has been seen so that it does not try to interpret a
subsequent identifier/colon pair as a statement label.


Numerical Constants


When the scanner sees a digit in the source code, it assumes a numerical
constant. It calls the intnum function to decide which kind of constant to
build into the token stream.
The intnum function scans the characters from the point of the digit until it
finds a nondigit character. If that character is a dot or an upper- or
lowercase "E," the constant is a floating constant, and intnum calls fltnum to
translate it. Otherwise, intnum looks at the first digit. If it is a 0 and the
next character is a digit, the constant is an octal number. If the first two
characters are 0x, the constant is a hexadecimal number. Otherwise, the
constant is a decimal integer. Depending on the range of the number and
whether or not it is followed by L, the constant is either short or long.
Accordingly, the program builds either the T_INTCONST or T_LNGCONST token into
the token stream followed by the constant's integer value of the appropriate
length.
The fltnum function builds a floating constant, which follows the T_FLTCONST
token in the token stream.
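A simplified version of intnum's classification logic might look like this sketch (classify_int and is_long_const are hypothetical names, and suffix handling is reduced to the L suffix):

```c
#include <stdlib.h>
#include <string.h>

/* Classify a numeric constant the way intnum does: return the radix
   (16, 8, or 10) and store the value through *value. */
int classify_int(const char *s, long *value)
{
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        *value = strtol(s + 2, NULL, 16);   /* hexadecimal: leading 0x */
        return 16;
    }
    if (s[0] == '0' && s[1] >= '0' && s[1] <= '7') {
        *value = strtol(s + 1, NULL, 8);    /* octal: 0 then a digit   */
        return 8;
    }
    *value = strtol(s, NULL, 10);           /* otherwise decimal       */
    return 10;
}

/* Long if an L suffix follows the digits or the value overflows int,
   mirroring the T_INTCONST/T_LNGCONST choice. */
int is_long_const(const char *s, long v)
{
    size_t n = strlen(s);
    return (n > 0 && (s[n - 1] == 'l' || s[n - 1] == 'L'))
           || v != (long)(int)v;
}
```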


Keywords and Identifiers


An alphabetic character or an underscore means that the next language element
is either a keyword or an identifier. The scanner calls the FindKeyword
function, which returns a keyword's token if the text is a keyword, and 0 if
not.
If the text is not a keyword, it is either a variable, a function identifier,
or a statement label. If it is followed by a colon (and a case or conditional
expression is not being built), the identifier is a statement label, so the
scanner builds and installs a Variable structure for the label. The label
includes the current brace-nesting count, which helps the interpreter to find
its way to the label when it processes a matching goto statement.
If the text is neither a keyword nor a statement label, the scanner calls
AddSymbol to add the identifier to the symbol table and to get an integer
value to represent the symbol. The AddSymbol function adds the symbol to the
table and returns its integer value or, if the symbol is already in the table,
returns the previously assigned integer value. The scanner is not interested
in identifier scope or reuse. It needs only a unique integer offset for each
unique identifier.
An identifier can be either a variable reference, a function
declaration/definition, or a call to a function, for which the scanner assigns
the tokens T_SYMBOL, T_FUNCTION, and T_FUNCTREF, respectively.
The interpreter does not deal with actual identifiers. It uses the integer
offsets that the scanner assigns to represent unique identifier values. The
interpreter keeps the symbol table throughout the run so that the user can
name identifiers to watch, examine, and modify from within the debugger.
You might wonder why Listing One includes a preprocessing directive statement
that undefines isxdigit, forcing the compiler to use the function version of
isxdigit rather than the macro version. For some reason, the Borland C++
compilers (3.1 and 4.0) that I use to compile Quincy do not compile the macro
version correctly all the time.


Symbol Tables


Quincy uses five symbol tables during translation. Four of them are static;
they record library-function identifiers, keywords, multiple-character
operators, and preprocessor directives. The fifth is the dynamic table of
identifiers that the program declares.
Listing Two is symbols.c, the program that declares and manages symbol tables.
The tables are arrays of SYMBOLTABLE objects. SYMBOLTABLE is a structure
defined by the qnc.h header file. Listing Two declares and initializes the
static symbol tables. 
Each symbol table associates its symbols with an integer code. Library
functions are associated with codes that tell the interpreter which functions
to execute. Keywords and operators are associated with interpreter tokens.
Preprocessor identifiers are associated with a value that the preprocessor
uses to determine which directive to process. Declared identifiers are
associated with unique integer values that the scanner assigns in the order in
which the identifiers occur.
Symbol tables are maintained in identifier sequence to facilitate a binary
search on an identifier argument. The SearchSymbols function implements the
binary search and is followed by several specialized functions that call
SearchSymbols to search the individual tables. (The FindOperator function
mentioned earlier is one such function.) Each of these functions takes an
identifier as an argument and returns the code that matches the identifier or
0 if the identifier is not in the table. The FindSymbolName function takes an
identifier code and returns a pointer to the matching identifier. The debugger
uses this function to display function lists and matched variable names. The
AddSymbol function adds an identifier to the dynamic symbol table and returns
the code that the function associates with the new entry.


Run Time versus Compile Time


Quincy's lexical scanner and linker--the process that follows the
scanner--defer until run time some of the language translation and error
checking. This approach reflects the compromises that I made in the interest
of performance. Inasmuch as Quincy is an interactive interpreter, I want to
get the program running as soon as possible after the user makes some changes
and chooses the run command. Therefore, some of the operations normally done
by a compiler are done during run time by the interpreter. As a result, the
program does not run as efficiently as it would if operations such as
recursive-descent parsing were done earlier, by the translator. The performance
hit does not concern me; Quincy's purpose is to support an interactive
C-language tutorial, not to be a comprehensive software-development
environment. Sometimes I mull over design changes that would support object
linking, build stand-alone EXE files, optimize the run-time token
interpretation, or yield some other slick improvement. After pondering those
subjects for a while, it occurs to me that I would just be rebuilding Turbo C
1.0, and nobody needs that.


Book Report: The Zen of Code Optimization


It's supposed to be hip to name a book the "Zen" of something. The local
library lists over a dozen such books on as many subjects. The classic
standard-bearer of that title is Zen and the Art of Motorcycle Maintenance, by
Robert M. Pirsig, which had a lot to do with Zen and something to do with
motorcycle maintenance. Most of the others have very little to do with Zen and
a lot to do with the other part of their titles. The Zen of Code Optimization,
by Michael Abrash (Coriolis Group Books, 1994, ISBN 1-883577-03-9) falls into
that category. It is one of a series of Zen books from the publisher who dubs
his authors "Zen masters." This represents, I suppose, the other side of the
world--far away from a population of Dummies who talk about computers and
programming in spite of their doubts, lack of confidence, and low self-esteem.
It also reflects a trend among book publishers to create "lines" of books with
titles that are like one another, usually following an unexpected success.
Thus, we might see Advanced C++ Class Design for Dummies, Teach Yourself
Quantum Mechanics, and OLE2 Made Easy in 21 Days.
Zen is a school of Buddhism wherein enlightenment is attained through
meditation, self-contemplation, and intuition rather than through scriptures.
A Zen Buddhist would therefore conclude that such a line of contemporary
scriptures could not lead the reader anywhere near enlightenment. So, do not
buy this book expecting to learn about Zen or to attain perfect programming
enlightenment.
But, by all means, buy this book.
The Zen of Code Optimization is about writing fast code for the PC. More to
the point, it is about viewing code that you've written with an eye to
optimizing its performance. To optimize is to make as good as possible. What's
good? Fast? Cheap? Small? Portable? On time? Depends on who you ask. In this
case, only fast is good. Use the fewest processor cycles to do the same job.
Optimize for speed. The book takes the position that you, the programmer, are
the best speed optimizer. Only you can find the bottlenecks and only you can
widen them. When is speed important? Are all bottlenecks bad? Why worry about
cutting down on the cycles used waiting for a keystroke? Where in your program
is valuable time wasted? This book encourages you to ask and answer such
questions and does so by example. It starts with a program that works but that
is not as fast as it might be. Then it moves the program through successive
performance enhancements, explaining each time the implications of the change,
why it speeds things up, and what there is about the earlier version that
should call your attention to the change. That's the important lesson. Learn
how to spot the performance bottlenecks. Can you remove them without
compromising the program's readability and maintainability? If not, is it
worth it?
To help spot the bottlenecks, Abrash provides a timer program written in
assembly language that uses the PC's 8253 timer chip to measure the
performance of C programs. The C program links with these functions and calls
them to record the elapsed times of processes that you want to measure. A poor
man's profiler, of sorts. 
The book teaches about exploiting hardware, using assembly language where it
makes sense, and not using assembly language where it doesn't matter; it also
includes several chapters on the Pentium. There are examples that optimize the
game of Life, the Boyer-Moore string-searching algorithm, and much more. The
book includes a diskette. Perhaps you will use the code in a project, perhaps
not. The important contribution is not so much Abrash's code, which is
certainly good, but his influence on your view and attitude toward your own
code. When a book keeps you thinking well after you are finished reading it,
then it has served you well, and this is such a book.


Listing One 


/* --- scanner.c - Quincy's lexical scanner --- */
#include <stdio.h>
#include <stdlib.h>
#include "qnc.h"
#undef isxdigit

static int uncesc(char **);
static void fltnum(char **, char **);
static void intnum(char **, char **);

/* --- Convert C in srcbuf to tokens in tknbuf --- */
int tokenize(char *tknbuf, char *srcbuf)
{
 char *start, *laststring = NULL, *cp, c, c2, c3, op;
 char buf[8];
 int i;
 int BraceCount = 0;
 char *tknptr = tknbuf;
 int sawCond = 0;
 int sawCase = 0;
 while (*srcbuf) {
 /* --- search for 2-char C operators --- */
 if ((i = FindOperator(srcbuf)) != 0) {
 srcbuf+=2;
 if ((i == T_SHL || i == T_SHR) && *srcbuf == '=') {
 srcbuf++;
 i |= 0x80; /* encode op= operator */
 }
 *tknptr++ = i;
 continue;
 }
 c = *srcbuf++; /* next src code char */
 c &= 0x7f;
 op = 0;
 c2 = *srcbuf; /* lookahead 1 */
 c3 = *(srcbuf+1); /* lookahead 2 */
 if (c != '"' && c != '\n')
 laststring = NULL;
 switch (c) {
 case '\n': /* File/Line */
 /* _____________
 * T_LINENO 
 * _____________
 * fileno (byte)
 * _____________
 * lineno (word) 
 * _____________ */
 handshake(); /* keep D-Flat clock ticking */
 *tknptr++ = T_LINENO;
 Ctx.CurrFileno = atoi(srcbuf+2);
 *tknptr++ = (unsigned char) Ctx.CurrFileno;
 srcbuf = strchr(srcbuf, ':');
 Assert(srcbuf != NULL);
 srcbuf++;
 Ctx.CurrLineno = atoi(srcbuf);

 *(int*)tknptr = Ctx.CurrLineno;
 tknptr += sizeof(int);
 srcbuf = strchr(srcbuf, '/');
 Assert(srcbuf != NULL);
 srcbuf++;
 break;
 case '"': /* string constant */
 /* ___________
 * T_STRCONST
 * ___________
 * length 
 * ___________
 * char(s) 
 * ___________
 * 0 
 * ___________ */
 if (laststring != NULL)
 /* ---- concatenated string ---- */
 tknptr = laststring+strlen(laststring);
 else {
 *tknptr++ = T_STRCONST;
 laststring = tknptr++;
 }
 while ((c = *srcbuf) != '"' && c)
 *tknptr++ = uncesc(&srcbuf);
 *tknptr++ = '\0';
 *laststring = tknptr - laststring;
 if (c)
 ++srcbuf;
 break;
 case '\'': /* character constant */
 /* ___________
 * T_CHRCONST
 * ___________
 * value 
 * ___________ */
 *tknptr++ = T_CHRCONST;
 *tknptr++ = uncesc(&srcbuf);
 /* --- Skip to delimiting apostrophe --- */
 while ((c = *srcbuf++) != '\'' && c)
 ;
 if (!c)
 --srcbuf;
 break;
 /* --- operators --- */
 /* ___________
 * op token 
 * ___________ */
 case '*':
 case '^':
 case '%':
 case '&':
 case '|':
 case '+':
 case '-':
 case '/':
 op = c;
 case '=':
 case '!':

 case '<':
 case '>':
 case '[':
 case ']':
 case '(':
 case ')':
 case ',':
 case '~':
 case ' ':
 case ';':
 /* --- single character operator --- */
 *tknptr++ = c;
 break;
 case '?':
 sawCond++;
 *tknptr++ = c;
 break;
 case ':':
 if (sawCond)
 --sawCond;
 sawCase = 0;
 *tknptr++ = c;
 break;
 case '{':
 BraceCount++;
 *tknptr++ = c;
 break;
 case '}':
 --BraceCount;
 *tknptr++ = c;
 break;
 case '.':
 if (c2 == '.' && c3 == '.') {
 *tknptr++ = T_ELLIPSE;
 srcbuf += 2;
 }
 else if (isdigit(c2)) {
 /*
 * floating-point number.
 */
 --srcbuf;
 fltnum(&srcbuf, &tknptr);
 }
 else 
 *tknptr++ = c;

 break;
 default:
 if (isdigit(c)) {
 /* --- constant --- */
 /* ___________
 * T_INTCONST (or T_LNGCONST, 
 * ___________ T_FLTCONST, etc.)
 * value <- binary value of the
 * ___________ number. Number of
 * . bytes depends on type
 * ___________ */
 --srcbuf;
 intnum(&srcbuf, &tknptr);

 }
 else if (alphanum(c)) {
 /* --- identifier --- */
 start = cp = tknptr+2;
 --srcbuf;
 while (alphanum(*srcbuf))
 *cp++ = *srcbuf++;
 *cp++ = 0;
 if ((i = FindKeyword(start)) != 0) {
 /* --- keyword --- */
 /* ___________
 * key token 
 * ___________ */
 *tknptr++ = i;
 if (i == T_CASE)
 sawCase = 1;
 }
 else if (!sawCond && !sawCase &&
 *srcbuf == ':') {
 /* --- label for gotos --- */
 VARIABLE var, *lvar;
 NullVariable(&var);
 var.vkind = LABEL;
 var.vsymbolid = AddSymbol(start);
 var.vclass = BraceCount;
 lvar = InstallVariable(&var,
 &Ctx.Curfunction->locals, 0,0,1,0);
 lvar->voffset = tknptr - tknbuf;
 srcbuf++;
 }
 else {
 /* symbol, function declaration,
 prototype, or call? */
 FUNCTION *funcp;
 int fsymbol = AddSymbol(start);
 
 if ((funcp =
 FindFunction(fsymbol)) != NULL) {
 /* decl, func call, or addr */
 /* ____________
 * T_FUNCTREF 
 * ____________
 * Function 
 * Number 
 * ____________ */
 *tknptr++ = T_FUNCTREF;
 *(unsigned *)tknptr =
 (funcp - FunctionMemory);
 tknptr += sizeof(unsigned);
 }
 else if (*srcbuf == '(' &&
 BraceCount == 0) {
 FUNCTION func;
 NullFunction(&func);
 /* declaration or prototype */
 /* _____________
 * T_FUNCTION 
 * _____________
 * symbol offset

 * _____________ */
 /* --- install the function --- */
 func.symbol = fsymbol;
 func.libcode = SearchLibrary(start);
 func.ismain =
 (strcmp(start, "main") == 0);
 func.fileno = Ctx.CurrFileno;
 func.lineno = Ctx.CurrLineno;
 Ctx.Curfunction = NextFunction;
 InstallFunction(&func);
 *tknptr++ = T_FUNCTION;
 *(int *)tknptr = func.symbol;
 tknptr += sizeof(int);
 }
 else {
 /* variable reference */
 /* _____________
 * T_SYMBOL 
 * _____________
 * symbol offset
 * _____________ */
 *tknptr++ = T_SYMBOL;
 *(int *)tknptr = fsymbol;
 tknptr += sizeof(int);
 }
 }
 }
 else
 /* --- Bad character in input line --- */
 error(LEXERR);
 }
 if (*srcbuf == '=' && op) {
 tknptr[-1] |= 128;
 ++srcbuf;
 }
 }
 *tknptr++ = T_EOF;
 *tknptr = '\0';
 return tknptr - tknbuf;
}
static int uncesc(char **bufp)
{
 /* Unescape character escapes */
 char *buf, c;

 buf = *bufp;
 if ((c = *buf++) == '\\') {
 int i;
 char n[4];

 switch (c = *buf++) {
 case 'a': c = '\a'; break;
 case 'b': c = '\b'; break;
 case 'f': c = '\f'; break;
 case 'n': c = '\n'; break;
 case 'r': c = '\r'; break;
 case 't': c = '\t'; break;
 case 'v': c = '\v'; break;
 case '\\': c = '\\'; break;

 case '\'': c = '\''; break;
 case '"': c = '"'; break;
 case 'x':
 sscanf(buf, "%x", &i);
 c = i;
 while (isxdigit(*buf))
 buf++;
 break;
 default:
 if (isdigit(c)) {
 --buf;
 for (i=0; i<3 && isdigit(*buf); ++i)
 n[i] = *buf++;
 n[i] = 0;
 sscanf(n, "%o", &i);
 c = i;
 }
 break;
 }
 }
 *bufp = buf;

 return c;
}
static void fltnum(char **srcstr, char **tknstr)
{
 /* Parse a floating point number */
 char *srcp, *cp;
 char numbuf[64];
 char c, n, dot, e, sign;
 double f;
 n = dot = e = sign = 0;
 srcp = *srcstr;
 **tknstr = T_FLTCONST;
 ++(*tknstr);

 while (*srcp) {

 if ((c = *srcp++) == '.') {
 if (dot) {
 /* Already saw a dot */
 --srcp;
 break;
 }
 ++dot;
 }
 else if (c=='e' || c=='E') {
 if (!(dot || n) || e) {
 /* 'E' does not immediately follow dot
 or number */
 --srcp;
 break;
 }
 ++e;
 }
 else if (c=='+' || c=='-') {
 if (e != 1 || sign) {
 /* Sign does not immediately follow an 'E' */
 --srcp;

 break;
 }
 ++sign;
 }
 else if (isdigit(c)) {
 ++n;
 if (e) {
 /* number follows an 'E' - don't allow
 the sign anymore */
 ++e;
 }
 }
 else {
 --srcp;
 break;
 }
 }
 /* copy number into local buffer and null terminate it */
 n = 0;
 cp = *srcstr;
 while (cp < srcp)
 numbuf[n++] = *cp++;
 numbuf[n] = 0;
 f = atof(numbuf);
 *((double*)*tknstr) = f;
 *srcstr = srcp;
 *tknstr += sizeof(double);
}
/* --- Parse a decimal, octal or hexadecimal number --- */
static void intnum(char **srcstr, char **tknstr)
{
 char *srcp, *cp, c;
 int i;
 long j;
 int isDecimal = 1;
 /* ---- test for float number ---- */
 srcp = *srcstr;
 while (isdigit(*srcp))
 ++srcp;
 if (*srcp == '.' || *srcp == 'e' || *srcp == 'E') {
 fltnum(srcstr, tknstr);
 return;
 }
 /* ----- not a float ----- */
 c = T_INTCONST;
 srcp = *srcstr;
 if (*srcp++ == '0') {
 if (isdigit(*srcp)) {
 /* --- octal constant --- */
 sscanf(srcp, "%o", &i);
 while (isdigit(*srcp))
 ++srcp;
 isDecimal = 0;
 }
 else if (tolower(*srcp) == 'x') {
 /* --- hexadecimal constant --- */
 sscanf(++srcp, "%x", &i);
 while (isxdigit(*srcp))
 ++srcp;

 isDecimal = 0;
 }
 }
 if (isDecimal) {
 cp = --srcp;
 while (isdigit(*cp))
 ++cp;
 /* --- decimal integer number --- */
 i = atoi(srcp);
 j = atol(srcp);
 if (*cp == 'U')
 cp++;
 if (*cp == 'l' || *cp == 'L') {
 c = T_LNGCONST;
 ++cp;
 }
 else if (j != (long)i)
 c = T_LNGCONST;
 srcp = cp;
 }
 *srcstr = srcp;
 **tknstr = c;
 ++(*tknstr);
 if (c == T_LNGCONST) {
 *((long *)*tknstr) = j;
 *tknstr += sizeof(long);
 }
 else {
 *((int *)*tknstr) = i;
 *tknstr += sizeof(int);
 }
}




Listing Two

/* --------- symbols.c --------- */
#include <string.h>
#include <stdlib.h>
#include "qnc.h"
#include "sys.h"

SYMBOLTABLE LibraryFunctions[] = {
 /* --- These have to be maintained in alphabetic order --- */
 { "_Errno", SYSERRNO },
 { "_filename", SYSFILENAME },
 { "_lineno", SYSLINENO },
 { "abs", SYSABS }, 
 { "acos", SYSACOS }, 
 { "asctime", SYSASCTIME },
 { "asin", SYSASIN },
 { "atan", SYSATAN },
 { "atan2", SYSATAN },
 { "atof", SYSATOF },
 { "atoi", SYSATOI },
 { "atol", SYSATOL },
 { "ceil", SYSCEIL },

 { "clrscr", SYSCLRSCRN },
 { "cos", SYSCOS },
 { "cosh", SYSCOSH },
 { "cprintf", SYSCPRINTF },
 { "cursor", SYSCURSOR },
 { "exit", SYSEXIT }, 
 { "exp", SYSEXP },
 { "fabs", SYSFABS },
 { "fclose", SYSFCLOSE },
 { "fflush", SYSFFLUSH },
 { "fgetc", SYSFGETC },
 { "fgets", SYSFGETS },
 { "findfirst", SYSFINDFIRST },
 { "findnext", SYSFINDNEXT },
 { "floor", SYSFLOOR },
 { "fopen", SYSFOPEN }, 
 { "fprintf", SYSFPRINTF },
 { "fputc", SYSFPUTC },
 { "fputs", SYSFPUTS },
 { "fread", SYSFREAD },
 { "free", SYSFREE },
 { "fscanf", SYSFSCANF }, 
 { "fseek", SYSFSEEK },
 { "ftell", SYSFTELL },
 { "fwrite", SYSFWRITE },
 { "getch", SYSGETCH },
 { "getchar", SYSGETCHAR },
 { "gets", SYSGETS },
 { "gmtime", SYSGMTIME },
 { "localtime", SYSLOCALTIME },
 { "log", SYSLOG },
 { "log10", SYSLOG10 },
 { "longjmp", SYSLONGJMP },
 { "malloc", SYSMALLOC },
 { "mktime", SYSMKTIME },
 { "pow", SYSPOW },
 { "printf", SYSPRINTF },
 { "putch", SYSPUTCH },
 { "putchar", SYSPUTCHAR },
 { "puts", SYSPUTS },
 { "remove", SYSREMOVE },
 { "rename", SYSRENAME },
 { "rewind", SYSREWIND },
 { "scanf", SYSSCANF }, 
 { "setjmp", SYSSETJMP },
 { "sin", SYSSIN },
 { "sinh", SYSSINH },
 { "sprintf", SYSSPRINTF },
 { "sqrt", SYSSQRT },
 { "sscanf", SYSSSCANF }, 
 { "strcat", SYSSTRCAT }, 
 { "strcmp", SYSSTRCMP }, 
 { "strcpy", SYSSTRCPY },
 { "strlen", SYSSTRLEN },
 { "strncat", SYSSTRNCAT }, 
 { "strncmp", SYSSTRNCMP },
 { "strncpy", SYSSTRNCPY },
 { "system", SYSSYSTEM },
 { "tan", SYSTAN },

 { "tanh", SYSTANH },
 { "time", SYSTIME },
 { "tmpfile", SYSTMPFILE },
 { "tmpnam", SYSTMPNAM },
 { "ungetc", SYSUNGETC }
};
#define MAXLIBFUNCTIONS (sizeof(LibraryFunctions)/sizeof(SYMBOLTABLE))
/* --------- keyword lookup table ------------ */
static SYMBOLTABLE Keywords[] = {
 /* --- These have to be maintained in alphabetic order --- */
 { "auto", T_AUTO },
 { "break", T_BREAK },
 { "case", T_CASE },
 { "char", T_CHAR },
 { "const", T_CONST },
 { "continue", T_CONTINUE },
 { "default", T_DEFAULT },
 { "do", T_DO },
 { "double", T_DOUBLE },
 { "else", T_ELSE },
 { "enum", T_ENUM },
 { "extern", T_EXTERN },
 { "float", T_FLOAT },
 { "for", T_FOR },
 { "goto", T_GOTO },
 { "if", T_IF },
 { "int", T_INT },
 { "long", T_LONG },
 { "register", T_REGISTER },
 { "return", T_RETURN },
 { "short", T_SHORT },
 { "sizeof", T_SIZEOF },
 { "static", T_STATIC },
 { "struct", T_STRUCT },
 { "switch", T_SWITCH },
 { "typedef", T_TYPEDEF },
 { "union", T_UNION },
 { "unsigned", T_UNSIGNED },
 { "void", T_VOID },
 { "volatile", T_VOLATILE },
 { "while", T_WHILE }
};
#define MAXKEYWORDS (sizeof(Keywords)/sizeof(SYMBOLTABLE))
/* -------- multi-character operator lookup tbl ------------ */
static SYMBOLTABLE Operators[] = {
 /* --- These have to be maintained in collating order --- */
 { "!=", T_NE },
 { "&&", T_LAND },
 { "++", T_INCR },
 { "--", T_DECR },
 { "->", T_ARROW },
 { "<<", T_SHL },
 { "<=", T_LE },
 { "==", T_EQ },
 { ">=", T_GE },
 { ">>", T_SHR },
 { "||", T_LIOR }
};
#define MAXOPERATORS (sizeof(Operators)/sizeof(SYMBOLTABLE))

static SYMBOLTABLE PreProcessors[] = {
 /* --- These have to be maintained in collating order --- */
 { "define", P_DEFINE },
 { "elif", P_ELIF },
 { "else", P_ELSE },
 { "endif", P_ENDIF },
 { "error", P_ERROR },
 { "if", P_IF },
 { "ifdef", P_IFDEF },
 { "ifndef", P_IFNDEF },
 { "include", P_INCLUDE },
 { "undef", P_UNDEF }
};
#define MAXPREPROCESSORS (sizeof(PreProcessors)/sizeof(SYMBOLTABLE))
/* --- search a symbol table for matching entry --- */
int SearchSymbols(char *arg, SYMBOLTABLE *tbl, int siz, int wd)
{
 int i, mid, lo, hi;

 lo = 0;
 hi = siz-1;
 while (lo <= hi) {
 mid = (lo + hi) / 2;
 i = wd ? strncmp(arg, tbl[mid].symbol, wd) :
 strcmp(arg, tbl[mid].symbol);
 if (i < 0)
 hi = mid-1;
 else if (i)
 lo = mid + 1;
 else
 return tbl[mid].ident;
 }
 return 0;
}
/* --- search for library function identifier --- */
int SearchLibrary(char *fname)
{
 return SearchSymbols(fname,LibraryFunctions,MAXLIBFUNCTIONS,0);
}
/* --- search for keyword --- */
int FindKeyword(char *keyword)
{
 return SearchSymbols(keyword, Keywords, MAXKEYWORDS, 0);
}
/* --- search for two-character operator --- */
int FindOperator(char *oper)
{
 return SearchSymbols(oper, Operators, MAXOPERATORS, 2);
}
/* --- search for preprocessing directive --- */
int FindPreProcessor(char *preproc)
{
 return SearchSymbols(preproc,PreProcessors,MAXPREPROCESSORS,0);
}
/* --- search for user-declared identifier --- */
int FindSymbol(char *sym)
{
 if (SymbolTable != NULL)
 return SearchSymbols(sym, SymbolTable, SymbolCount, 0);

 return 0;
}
/* --- find identifier given code --- */
char *FindSymbolName(int id)
{
 int i;
 for (i = 0; i < SymbolCount; i++)
 if (SymbolTable[i].ident == id)
 return SymbolTable[i].symbol;
 return NULL;
}
/* --- add identifier to symbol table --- */
int AddSymbol(char *sym)
{
 int symbolid = 0;
 if (SymbolTable != NULL) {
 symbolid = FindSymbol(sym);
 if (symbolid == 0) {
 if (SymbolCount < qCfg.MaxSymbolTable) {
 int i, j;
 int len = strlen(sym)+1;
 char *s = getmem(len);
 strcpy(s, sym);
 for (i = 0; i < SymbolCount; i++)
 if (strcmp(sym, SymbolTable[i].symbol) < 0)
 break;
 for (j = SymbolCount; j > i; --j)
 SymbolTable[j] = SymbolTable[j-1];
 SymbolTable[i].symbol = s;
 SymbolTable[i].ident = ++SymbolCount;
 symbolid = SymbolCount;
 }
 else
 error(SYMBOLTABLERR);
 }
 }
 return symbolid;
}
/* --- delete the symbol table entries --- */
void DeleteSymbols(void)
{
 int i;
 for (i = 0; i < SymbolCount; i++)
 free(SymbolTable[i].symbol);
}

















October, 1994
ALGORITHM ALLEY


Genetic Annealing




Kenneth V. Price


Ken holds a BS in physics from Rensselaer Polytechnic Institute. He is
currently engaged in research on artificial intelligence and system modeling.
Ken can be reached at kprice@solano.community.net.


Introduction 
by Bruce Schneier
How do you determine the best machine for a job? 
Step #1. Express all the parameters of your machine using some kind of coding
scheme. Each particular machine will then be expressed as a list of these
parameters. 
Step #2. Generate some machines with random parameters. 
Step #3. Test the machines against each other and select the few best ones. 
Step #4. Generate new machines by combining parameters from the ones selected
in Step #3, occasionally making random modifications in some of the
parameters. 
Step #5. Repeat Steps #3 and #4 until you're tired of watching the show.
What I've just described is a genetic algorithm, and it's done a pretty good
job with life on this planet. Generation after generation of iterations,
selecting the best few to reproduce and occasionally throwing in the odd
mutation, has resulted in millions of different species. The end result is
that there are lifeforms suited for every niche in every environment.
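The five-step recipe reduces to a small loop of C. In this toy sketch (everything here is illustrative, not from the article), a "machine" is just a string of 32 bits and its fitness is the count of 1 bits, so the search should climb toward the all-1s string:

```c
#include <stdlib.h>
#include <string.h>

#define POPSIZE 20
#define BITS    32
#define GENS    200

/* Fitness of a "machine": the count of 1 bits in its parameter string. */
static int fitness(const unsigned char *m)
{
    int i, f = 0;
    for (i = 0; i < BITS; i++)
        f += m[i];
    return f;
}

/* A toy run of Steps #2-#5: random machines, tournament selection,
   uniform crossover, rare mutation. Returns the best fitness found. */
int onemax_ga(unsigned seed)
{
    unsigned char pop[POPSIZE][BITS], child[BITS];
    unsigned char *ma, *pa;
    int g, i, b, p1, p2, q1, q2, victim, best = 0;

    srand(seed);
    for (i = 0; i < POPSIZE; i++)          /* Step #2: random parameters */
        for (b = 0; b < BITS; b++)
            pop[i][b] = (unsigned char)(rand() & 1);

    for (g = 0; g < GENS; g++) {
        /* Step #3: keep the better machine of two random pairs */
        p1 = rand() % POPSIZE; p2 = rand() % POPSIZE;
        q1 = rand() % POPSIZE; q2 = rand() % POPSIZE;
        ma = fitness(pop[p1]) > fitness(pop[p2]) ? pop[p1] : pop[p2];
        pa = fitness(pop[q1]) > fitness(pop[q2]) ? pop[q1] : pop[q2];

        /* Step #4: combine parameters, occasionally mutating one */
        for (b = 0; b < BITS; b++) {
            child[b] = (rand() & 1) ? ma[b] : pa[b];
            if (rand() % 100 == 0)
                child[b] ^= 1;
        }
        /* the child displaces a random member it can match or beat */
        victim = rand() % POPSIZE;
        if (fitness(child) >= fitness(pop[victim]))
            memcpy(pop[victim], child, BITS);
    }
    for (i = 0; i < POPSIZE; i++)
        if (fitness(pop[i]) > best)
            best = fitness(pop[i]);
    return best;
}
```

Different seeds give different runs, but on this trivial landscape a couple of hundred generations is usually plenty to approach the optimum.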
Genetic algorithms might not be the fastest way to generate the "best" machine
for a job (it would be a lot easier to design a sentient lifeform from scratch
than to wait for evolution to stumble across a human), but their main
advantage is that they require no knowledge about the system. For instance,
the traditional way of generating an aircraft wing is to study aerodynamics
and spend months calculating airflows, lifts, stresses, and the like.
Alternatively, you could just express a generic wing as a string of parameters
(size, shape, weight, and so on), create some random wings, test them against
each other, and let the fittest reproduce. A weekend of iteration on a
supercomputer would probably yield a pretty good design. 
Solving problems such as these falls into the category of "combinatorial" or
"global" optimization. Other approaches to finding the "best" (according to
some predefined criteria) configuration involve simulated annealing, a
stochastic-search technique familiar to chip designers, who must determine the
best geometrical arrangement for millions of circuits on microprocessors.
DDJ has examined both genetic algorithms and simulated annealing before (see
"Genetic Algorithms" by Mike Morrow, April 1991 and "Simulated Annealing" by
Michael P. McLaughlin, September 1989). In this month's column, however,
Kenneth Price combines the two techniques, resulting in an approach called
"genetic annealing" which takes the best from both the genetic and annealing
worlds.
Genetic annealing is a hybrid, random-search technique that fuses simulated
annealing and genetic methodologies into a more efficient algorithm. Like its
genetic counterpart, the genetic-annealing algorithm creates new solutions by
exchanging genetic material between members of a population. Instead of using
a competitive tournament to decide which members will survive into the next
generation, however, genetic annealing uses an adaptive, thermodynamic
criterion based on a simple feedback scheme. With this strategy, the
genetic-annealing algorithm can outperform the algorithms whose principles it
embodies even though it is hardly more complex than running a suite of greedy
algorithms in parallel.


Bipartitioning a Necklace


To evaluate the effectiveness of a random-search algorithm, it helps to use a
problem whose global minimum is already known. To this end, I have chosen a
simple graph-bipartitioning problem to illustrate how genetic annealing works.
A graph is a collection of vertices connected by edges. Bipartitioning is the
process of assigning each vertex to one of two sets. The goal of the
bipartitioning problem is to assign half of a graph's vertices to each set so
that the number of edges connecting the two sets is minimized. In many ways,
this problem resembles the real-world task of the circuit designer, who wants
to divide a set of components equally between two chips so that the number of
lines needed to connect the chips is minimal.
The particular graph used in this example is commonly called a "necklace"
because of its obvious resemblance to neckwear; see Figure 1(a). An [M,2]
necklace is a ring of M vertices ("beads") each of which is connected radially
to a companion vertex for a total of 2M vertices. As Figure 1(b) illustrates,
any diameter across the necklace constitutes a bipartition since it divides
the necklace into two sets, each of which contains M vertices. The bipartition
generated by a diameter is optimal because only two edges cross the diameter
to connect opposing sides. Because of its rotational symmetry, an [M,2]
necklace has M optimal bipartitions in all.
For the purposes of computation, you can represent an [M,2] necklace as a
string of 2M bits. Bits at even positions are the beads on the ring itself,
and bits at odd positions represent the dangling beads; see Figure 2. The
actual binary value assigned to a bead indicates the set to which it belongs.
To count the number of edges that span sets, perform an XOR operation on every
pair of bits that represent connected vertices and sum the results. When the
vertices at the ends of an edge belong to the same set, the bits they contain
will be identical, and the XOR operation will return a 0. Conversely, a pair
of connected vertices with different binary values represents an edge with a
vertex in each set. The XOR operation counts these spans by returning a 1.
To simulate a heat bath, you will need an entire population of bit strings.
The number of bit strings depends on the problem: for this example a
population of 20 strings is adequate. For convenience, you can arrange the bit
strings into a two-dimensional array so that one coordinate locates a string
within the population while the other coordinate gives you the position of a
bit within a string.
Next, you initialize the bit strings by filling each one with an equal number
of 1s and 0s. In a genetic-annealing experiment, you must randomly select the
initial population to ensure that the system starts out like a "white-hot"
thermodynamic ensemble. Starting from this condition makes it less likely that
the population will prematurely cool to a suboptimal minimum. To randomize a
bit string while keeping its populations of 1s and 0s equal, try swapping each
bit with another bit from a randomly selected location within the same string.

The number of edges having a vertex in each set measures the quality of a
bipartition as a solution. In the context of a genetic algorithm, this number
reflects the "fitness" of a bit string to survive as a "gene" in a competitive
selection procedure. By contrast, simulated annealing treats the fitness of a
configuration like an energy in order to exploit the laws of statistical
mechanics. Genetic annealing adopts the annealing metaphor, treating a
configuration's fitness like an energy, even though the annealing process
itself is driven by population dynamics.


Acceptance Criteria 


Like its component algorithms, genetic annealing is a random-search technique.
This class of algorithm tries to improve an existing configuration by
subjecting it to trial mutations. A mutation is any procedure that alters
either the structure of a configuration or the data that it holds. One of the
keys to a successful random search is knowing when a mutation produces an
acceptable improvement. For example, under the "greedy" criterion, a mutant is
deemed acceptable whenever its energy is less than or equal to the energy of
the configuration from which it was derived. If it is accepted, the mutant
supplants its progenitor and becomes the target for subsequent mutations. This
way, each improved mutant lowers the acceptability criterion until
configurations with lower energies fail to turn up.
In the genetic-annealing approach, you assign an energy threshold to each bit
string. Initially, each threshold equals the energy of the randomized bit
string to which it is assigned. Unlike the greedy criterion, a bit string's
threshold, not its energy, determines which trial mutations constitute
acceptable improvements. If the energy of a mutant exceeds the threshold of
the bit string that spawned it, you reject the mutant and move on to the next
bit string. However, if its energy is less than or equal to the threshold, you
accept the mutant as a replacement for its progenitor.
While the greedy algorithm accepts a better configuration without regard to
how much better it is, the genetic-annealing algorithm uses this information
to drive the annealing process. Genetic annealing uses an "energy bank,"
represented by the real variable DE, to keep track of the energy liberated by
successful mutants. Whenever a mutant passes the threshold test, you add the
difference between the threshold and the mutant's energy to DE for temporary
storage. Once you account for this quantum of heat, you reset the threshold so
that it equals the energy of the accepted mutant and then move on to the next
bit string.


Reheating


After each bit string has been subjected to a random mutation, you "reheat"
the population by raising each threshold a little. The size of the increase
depends both on the amount of energy accumulated in the energy bank and on the
rate at which you want to cool the population. If you let N equal the number
of bit strings in the population, then the average contribution to the energy
bank is just DE/N. To fully reheat the population, all you need to do is add
DE/N to each threshold. Annealing results from repeated cycles of collecting
energy from successful mutants (spontaneous cooling) and then redistributing
nearly all of it by raising the threshold energy of each population member
equally (uniform reheating).
After they have been reheated, thresholds are higher than the energies of the
bit strings to which they have been assigned. This means that sometimes you
are forced to accept a mutant even though its energy is not as low as the
energy of the bit string it replaces. Replacing a bit string with a worse one
may seem counter-productive, but these occasional reversals of fortune provide
floundering bit strings with a helpful energy boost. In essence, the entire
population acts like a giant heat reservoir that exchanges energy among its
members. Less successful bit strings can escape suboptimal configurations by
borrowing the energy they need from the more successful versions.


The Cooling Constant



To relax the bit strings into their optimal condition, they must be cooled
very slowly. In a genetic-annealing program, you control the rate of cooling
with the cooling constant, C--a real number in the closed interval [0,1] that
represents the fraction of DE that is returned to the population. For example,
C=1 holds the population at a constant "temperature" by using 100 percent of
the energy stored in DE to reheat thresholds. By contrast, C=0 releases all of
DE's stored energy from the system and leaves thresholds unaltered. In effect,
C=0 sets up a suite of greedy algorithms since each threshold is always equal
to the energy of the bit string to which it's assigned. Most problems of
consequence require very slow cooling. Typically, C ranges from 0.9 to 0.99
and beyond, although for a given problem, the optimal choice of C depends on a
variety of factors. Pseudo-code for the genetic-annealing algorithm is shown
in Figure 4.
You can also use the genetic-annealing algorithm to "cool" a population of
configurations to a condition of maximum energy. All you need to do is replace
the "greater than" symbol with a "less than" symbol in the portion of code
that determines whether or not a mutant is acceptable. This has the effect of
reversing the sign of the spontaneous energy so that during the reheating
cycle each threshold is lowered by the amount dE=C*DE/N. This feature lets you
use a problem's natural measure of fitness without having to invert it. Figure
3 shows what the maximum energy solution to the necklace problem looks like
for M=8.
This energy bank approach to simulated annealing can substantially reduce the
time you need to design and execute difficult combinatorial optimizations.
Traditional annealing methods invoke the venerable Metropolis algorithm to
forge a link between the laws of statistical mechanics and combinatorial
optimization. Despite its great utility, the Metropolis algorithm is
computationally expensive because the decision of whether or not to accept a
mutant usually requires that you generate a random number and compare it to an
exponential term. Configurations that are not much worse than their
progenitors may be rejected depending on the random number generated. The
genetic-annealing algorithm accepts every configuration that is not much worse
than its progenitor based on the result of a simple comparison that requires
neither a random-number generator nor the use of acceptance probabilities.
Traditional annealing methods also rely on an empirically derived "annealing
schedule" to control the rate at which the Metropolis temperature should be
lowered. The advantage of using thresholds to track the time-averaged loss of
spontaneous energy from individual configurations is that you can maintain
equilibrium at any temperature without using an annealing schedule just by
restoring energy to each threshold at the same average rate that the ensemble
loses it. In most cases, this reduces the researcher's challenge to finding
the smallest value of C that will allow the population to maintain equilibrium
as it anneals. 


Mutations


Choosing the right mutation scheme is just as important to the success of a
random search as determining which mutants are acceptable. A mutation can be
an elementary operation like flipping a bit, or it can be a more complex
procedure like a symmetry operation or crossbreeding. In general, each problem
will have its own menu of mutations. Frequently, mutations are subject to
constraints based on a problem's symmetries. For example, mutations used in
the bipartitioning problem must preserve the equal number of 1s and 0s in each
bit string. 
Of all the ways you can alter a bit string without changing the number of 1s
(or 0s) that it contains, swapping bits is perhaps the easiest. To swap a pair
of bits, just randomly select two bits with different binary values and
replace each with its one's complement.
While it may be the simplest form of mutation, swapping a single pair of bits
provides you with only a limited search capability. You can explore remote
regions of solution space more effectively if you occasionally swap more than
one pair of bits at a time. For example, you can usually enhance the
efficiency of a random search by drawing the number of elements involved in a
mutation from an exponential distribution. You can characterize an exponential
distribution by its decay constant, EX, where EX is a real number in the
half-open interval [0,1). In the case of swapping bits, EX=.5 means that a
single pair of bits is swapped half of the time, two pairs of bits are swapped
one quarter of the time, three pairs of bits are swapped one eighth of the
time, and so on. EX=0 means that you never swap more than one pair of bits.
With the right decay constant, an exponential distribution of mutation sizes
makes distant points in solution space more accessible without sacrificing the
ability to efficiently fine tune a configuration.


Symmetry Operations as Mutations


In most problems, a well-chosen symmetry operation can make a valuable
addition to your mutation scheme. When appropriate, operations like rotations
and reflections can generate large-scale variations of a configuration without
destroying crucial local relationships. For example, a reflection operation
in which the first four bits of the sequence: 0000111100001111 are exchanged
with the second four bits, produces the optimal configuration:
0000000011111111. Despite altering the target string on a relatively large
scale, the reflection procedure not only conserves the number of 1s, but also
maintains the XOR relationships between the bits within the reflected
substring. To accomplish the same transformation in a series of random swap
operations would require considerable good fortune. Despite their utility,
symmetry mutations reduce the novelty of a "random" search, so you should not
rely on them exclusively.


Splicing


Perhaps the greatest advantage that cooling a population in parallel affords
you is the opportunity to employ crossbreeding as a form of mutation.
Actually, instead of mating pairs of bit strings in a separate cross-breeding
procedure (as you do in the genetic algorithm), the genetic-annealing
algorithm transfers genetic information between bit strings by "splicing" it.
Splicing tentatively replaces a portion of the target string with the
corresponding section from another bit string chosen at random from the
population. In the splicing scenario, the randomly chosen string remains
unaltered while it donates a copy of a segment of its "genetic material" for
the target string to use as a trial mutation; see Figure 5. By drawing upon
the success of other configurations in the population, splicing brings to the
genetic-annealing algorithm all of the problem-solving power that
crossbreeding imparts to the genetic algorithm. 
You must be careful when implementing a splicing procedure for the
bipartitioning problem because the number of 1s in the substring being donated
must be the same as the number of 1s in the target substring. To ensure that
1s are conserved, the donated substring is allowed to grow until the number
of 1s it contains equals the number of 1s in the corresponding target
substring and the two substrings differ in at least one bit position, or
until the whole string has been used.
The mutation scheme in GENNEAL.C (described later) includes all three forms of
mutation: random bit swapping, a reflection operation, and splicing. PS, PR,
and PX are the probabilities that a mutation will swap random bits, reflect a
substring, or splice a substring, respectively. Of course, PS+PR+PX=1. The
size of a swap mutation is controlled by EXS, while EXR determines the size of
a reflection mutation.


Putting it All Together


Along with the mutation probabilities and their respective decay constants,
other control variables include the population size, N, and the cooling
constant, C. Given a necklace of size M (for a minimum energy of 2, make M
even), what combination of these variables will repeatedly produce an optimal
partition with a minimum of computational effort?
Since run times on a sequential computer are proportional to population size,
you will usually want to reduce N until further reductions begin to jeopardize
the population's ability to simulate a heat bath. A population containing
anywhere from 10 to 40 configurations should be sufficient for most problems.
Smaller populations hamper the annealing process by failing to provide energy
when it is needed, while larger populations increase execution times while
enhancing the thermal simulation only incrementally.
The robust flexibility of the genetic-annealing algorithm makes it easy to
experiment with a variety of computational approaches to find the one that
works best. You can transform the genetic-annealing algorithm into a greedy
algorithm, a suite of annealing programs, or a genetic-style algorithm simply
by changing a few control constants. When you approach a new problem, try
starting with N=20 and run a suite of greedy algorithms in parallel by setting
C=0 and PX=0 (no annealing or splicing). These results will provide you with a
convenient performance benchmark. 


Results


Although it occasionally turns up an optimal configuration, the greedy
algorithm performs poorly because it is not consistently successful. The true
performance of the greedy algorithm, like any stochastic search, is more
reliably gauged by the average and variance of an ensemble of results.
In a series of ten trials with N=20 and M=80, the greedy version of the
genetic-annealing algorithm (in which C, EXS, PR, and PX are all zero),
produced an average minimum energy of 6.62 and a variance of 4.63. In each
trial, the program was allowed to run for 256,000 generations. When each
mutation swapped only one pair of bits (EXS=0), just four out of the total of
200 configurations (10 trials x 20 strings per trial) reached the optimal energy
of 2. Increasing EXS produced a modest improvement. A series of ten trials
that used EXS=.35 turned up six optimal bit strings and dropped the average
final energy to 6.37. For the sake of simplicity, the random-swap procedure
uses a constant value of EXS=.35 throughout the remainder of this
demonstration.
As expected, results improved significantly once the population was annealed.
With N and C both greater than 0 and no splicing, this version of the
genetic-annealing algorithm resembles a suite of annealing algorithms running
in parallel. Initial trials, which used C=.84, drove the average energy down
to 4.83, reduced the variance to 2.61, and produced 21 optimal bit strings.
The next set of trials used C=.92, which cooled the population at about half
the rate that C=.84 did, because annealing times are roughly proportional to
1/(1-C). With C=.92, the average final energy dropped to 4.51, and 28 bit
strings found their way to an optimal state. Cutting the cooling rate in half
again gave still better results, but the returns were clearly diminishing.
C=.96 lowered the average final energy to 4.43 and produced a total of 38
optimal bit strings. When the cooling rate was halved yet again to C=.98, the
average final energy rose to 4.52, indicating that cooling was so slow that
even 256,000 generations were not enough to completely quench the population.
In the absence of the reflection mutation, schemes that used only a minuscule
fraction of splicing performed best. If splicing is used too frequently, the
population will misconverge. Annealing, however, can overcome misconvergence.
For example, the "greedy-genetic" scheme: C=0, PX=.0002, PS=.9998 produced
optimal results in only five of ten trials with convergence occurring at about
100,000 generations. Annealing with C=.84 took a little longer but
misconvergence abated. By 120,000 generations, all 200 bit strings had found
an optimal state.
The real surprise comes when you use a scheme that uses both long reflections
and splicing. In particular, the scheme EXR=.95, PR=.3, PX=.7 not only
produced perfect results in ten out of ten trials without annealing, but also
converged on average in only 280 generations! This level of performance is
somewhat atypical, due primarily to the reflection mutation's ability to speed
equilibration across a bit string when large values of EXR are used. Some
annealing is needed when you bipartition larger necklaces and/or use a less
effective mutation scheme. 


Computational Synergy


In the genetic-annealing approach, a synergy exists between splicing and
annealing. Annealing controls the rate of convergence, tolerates error, and
can endure high rates of random mutation. All these factors help alleviate the
tendency of the genetic algorithm to misconverge. Furthermore, splicing
genetic material empowers the annealing algorithm by providing the means to
efficiently compare competing bit strings that are widely separated in
solution space. Run times benefit from this synthesis, since you can exploit
the superior search capabilities of splicing without having to maintain a
large population to ensure genetic diversity. Add to this the increase in
computational speed made possible by using thresholds instead of the
Metropolis algorithm, and you can improve run times substantially when
compared to traditional genetic and annealing techniques. Of course, you can
also improve performance by running the genetic-annealing algorithm on a
parallel computer, especially one using the SIMD architecture, since each
configuration can then be assigned its own processor.


The GENNEAL.C Program


GENNEAL.C is a bipartitioning program available electronically; see
"Availability," page 3. I've kept the code simple for the sake of clarity and as a
result, several routines have not been optimized. For instance, GENNEAL.C
computes the energy of a mutant by performing an XOR operation on every pair
of connected vertices, even though most mutations affect only a small part of
the target string. It is faster to compute the energy of a mutant by
reevaluating just those links of the target string that have been altered.
Similarly, if a mutation fails, you need to restore only that part of the
target string affected by the mutation. GENNEAL.C copies the whole bit string
and then restores it in its entirety, if necessary.



Conclusion


You should now have enough information about how genetic annealing works to
experiment on your own. If you currently use either a traditional genetic or
simulated annealing algorithm, give genetic annealing a try. I would enjoy
hearing from those of you who do the comparison. 
I would also like to acknowledge Margaret E. Burwell for her valuable
assistance in the preparation of this article.
Figure 1 (a) The "necklace" graph; (b) any diameter across the necklace
constitutes an optimal bipartition.
Figure 2 Bits at even positions are the beads on the ring itself; bits at odd
positions represent the dangling beads.
Figure 3 Maximum-energy solution to the necklace problem for M=8.
Figure 4: The genetic-annealing algorithm (minimization version).
 1. Randomly select an initial population of N configurations.
 2. For i=1 to N: Initialize the ith threshold,
    Th[i], with the energy of the ith configuration.
 3. DE=0 /* Empty the energy bank */.
 4. For i=1 to N: /* Begin cooling loop */.
 5. Splice or randomly mutate the ith configuration.
 6. Compute the energy, E, of the resulting mutant.
 7. If E>Th[i] then restore the old configuration.
 8. If E<=Th[i] then:
    a. DE=DE+Th[i]-E /* Add energy difference to DE */.
    b. Th[i]=E /* Reset threshold */.
    c. Replace old configuration with successful mutant.
 9. Mutate next configuration or end cooling loop.
10. dE=DE*C/N /* Compute reheating increment, dE. 0<=C<=1 */
11. For i=1 to N: /* Begin reheating loop */.
12. Th[i]=Th[i]+dE /* Add dE to each threshold */.
13. Return to step 3 once all thresholds have been reheated.
Figure 5 The splicing scenario.


October, 1994
UNDOCUMENTED CORNER


Microsoft's Grip on Software Tightened by Antitrust Deal




Andrew Schulman


On Friday, July 15, Microsoft signed a consent decree with the Antitrust
Division of the U.S. Department of Justice (DoJ), ending a four-year
investigation by U.S. antimonopoly agencies--first the Federal Trade
Commission (FTC) and later the DoJ--into Microsoft's trade practices. At the
same time, Microsoft signed a nearly identical settlement with the
Directorate-General for Competition of the European Commission. The judgment
lasts for six and a half years in the U.S., four and a half in Europe.
Microsoft agreed to immediately abandon several arrangements for licensing the
MS-DOS and Windows operating systems to PC hardware vendors. It also agreed to
halt some "unnecessarily restrictive" clauses in its nondisclosure agreements
(NDAs) for the forthcoming "Chicago" version of Windows. The consent decree
explicitly excludes Windows NT.
The consent decree is still subject to a 60-day public review. The full text
of the DoJ's July 15 complaint against Microsoft for violations of sections 1
and 2 of the Sherman antitrust act, the U.S. District Court final judgment in
U.S. v. Microsoft, and the "Stipulation" signed by the DoJ and Microsoft
consenting to the final judgment, are available via Internet Gopher from the
DoJ's Gopher server. 


Who Won?


The consent decree was first viewed as a victory for the DoJ and Microsoft's
competitors. The New York Times (July 17) carried the front-page headline,
"Microsoft's Grip on Software Loosened by Antitrust Deal," and crowed that
"the pact could reshape the world of computing.... The accord could undermine
Microsoft's near total control of the market for operating systems." The
Boston Globe's headline was equally enthusiastic: "Microsoft Accord to Create
Competition in US, Europe." 
Indeed, the consent decree sounds at first as if it should cramp Microsoft's
style, and lead to more competition in PC software. For years, Microsoft has
provided PC hardware manufacturers (original equipment manufacturers, or OEMs)
with per-processor licenses to MS-DOS and Windows, in which the vendor pays
Microsoft based on the number of machines it thinks it will ship, rather than
the number of copies of DOS or Windows it actually uses. In 1993, such
per-processor agreements accounted for about 60 percent of MS-DOS OEM sales,
and 43 percent of Windows OEM sales.
According to the DoJ, "Microsoft's per processor contracts penalize OEMs,
during the life of the contract, for installing a non-Microsoft operating
system. OEMs that have signed per processor contracts with Microsoft are
deterred from using competitive alternatives to Microsoft operating systems."
The consent decree put an immediate stop to this practice, leading to the hope
that non-Microsoft operating systems would now have a shot at the desktop.
But the morning after, nearly everyone realized that, in fact, U.S. v.
Microsoft is a victory for Microsoft. Directly contradicting the previous
day's headline, a New York Times (July 18) news analysis by John Markoff spoke
of "Microsoft's Barely Limited Future": "Rather than reining in the Microsoft
Corporation, the consent decree...frees the company to define the computer
industry's ground rules through the rest of the decade." The Wall Street
Journal had a similar take: "A Winning Deal: Microsoft Will Remain Dominant
Despite Pact In Antitrust Dispute." According to the Journal, Gates "has just
won big again, this time by letting the Justice Department rake in a small pot
while his company retains the power to dominate the nation's desktops."
In the first day of trading after the settlement, Wall Street made its
statement on the consent decree: Microsoft stock rose $1.87, to $50.50. Rick
Sherlund, an analyst for Goldman Sachs, stated that with the settlement,
Microsoft "should dominate the market for desktop software for the next 10
years." Another frequently quoted analyst, Richard Shaffer, announced that
"The operating system wars are over--Microsoft is the winner.... Microsoft is
the Standard Oil of its day."
But how could a ban on an important Microsoft trade practice be viewed as
cementing Microsoft's hold on the industry?
First, to achieve the DoJ's goals, the change from per-processor to per-copy
licensing probably comes about four years too late. Despite some brave words
from IBM and Novell after the consent decree, it seems unlikely that the
change will lead to a larger presence for OS/2 or Novell DOS. As a spokesman
for Compaq (which already offers OS/2 to its customers) noted, "Windows is the
standard--not much will change."
Nor does the consent decree address the key questions about Microsoft's role
in the PC software industry. Companies such as Lotus and Borland that compete
with Microsoft in application areas such as word processors and spreadsheets
have long asserted that Microsoft "leverages" its control of the operating
system to benefit its applications--particularly the Microsoft Office "suite,"
which bundles together Microsoft Word, Excel, Access, Mail, and PowerPoint--at
the expense of applications and suites from other vendors.


Grabbing the Whole Pie


More and more, Microsoft's applications seem like part of the operating
system. Many PCs today come, not only with MS-DOS and Windows preinstalled on
the hard disk, but also with Microsoft Office. The forthcoming "Chicago"
release of Windows will include numerous features once considered the province
of third-party applications developers. Microsoft not only has a near-monopoly
on the operating system, but is constantly expanding the definition of what
belongs in the operating system. 
Some commentators see these increasing ties, and the DoJ's apparent refusal to
touch them, as a good thing. For example, Steward Alsop was quoted in the New
York Times (July 18) as saying, "If you really care about improving the
personal computer, you want Microsoft to take over all the pieces of the pie."
There is a certain logic in this. For example, one reason the Apple Macintosh
was for so long far easier to use than a PC was that Apple had a closed
architecture and completely dominated the market, guaranteeing that almost
everything came from a single vendor. Monopoly has some clear benefits. In
certain situations, such as public utilities, monopoly may be the only viable
industry structure, leading to a so-called "natural monopoly." 
Interestingly, the superb biography Gates, by Stephen Manes and Paul Andrews
(Doubleday, 1993), quotes a 1981 statement by Microsoft chairman Bill Gates
where he noted that volume and standards in PC software can lead to a "natural
monopoly." But companies in such a favored position usually are forced to make
an important trade-off: so-called natural monopolies are generally regulated,
are prevented from expanding their monopoly into new areas, and so on. 
Microsoft already has MS-DOS installed on about 120 million PCs in the world,
and Windows on about 50 million. With the DoJ consent decree, Microsoft can
move even more rapidly toward its goal of becoming an unregulated, nonpublic
utility providing total, one-stop shopping for all your software needs. 


Exposing Microsoft's Monopoly


Microsoft continues to deny that it monopolizes the PC software industry. Nor
has it admitted to any guilt by consenting to the court's final judgment. The
consent is explicitly "without trial or adjudication of any issue of fact or
law; and without this Final Judgment constituting any evidence or admission by
any party with respect to any issue of fact or law."
Nonetheless, the PC software industry has been treated to some puzzling
denunciations of Microsoft trade practices from high government officials.
After the signing of the consent decree, U.S. Attorney General Janet Reno
said, "Microsoft's unfair contracting practices have denied other U.S.
companies a fair chance to compete, deprived consumers of an effective choice
among competing PC operating systems, and slowed innovation."
The Assistant Attorney General for Antitrust, Anne Bingaman, noted that
"Microsoft is an American success story but there is no excuse for any company
to try to cement its success through unlawful means, as Microsoft has done
with its contracting practices." 
"Microsoft has used its monopoly power, in effect, to levy a 'tax' on PC
manufacturers who would otherwise like to offer an alternative system," said
Bingaman. "As a result, the ability of rival operating systems to compete has
been impeded, innovation has been slowed and consumer choices have been
limited." According to a DoJ press release, Bingaman noted that Microsoft has
maintained the price of its operating systems even while the price of other
components has fallen dramatically, and that, since 1988, Microsoft's share of
the market has never dropped below 70 percent.


The Road Not Taken


No matter what else it says, the fact remains that the consent decree
addresses only a narrow issue: OEM sales represent less than 25 percent of
Microsoft revenue. 
The complaint notes that "At least 50,000 applications now run on MS-DOS and
over 5000 have been written to run on Windows. Microsoft sells a variety of
its own very successful and profitable applications." But that is all it has
to say about applications!
The complaint also notes that "All versions of Windows released to date
require the presence of an underlying operating system, either MS-DOS or a
close substitute," but says nothing about alleged tying arrangements between
Windows and MS-DOS (see "Examining the Windows AARD Detection Code" DDJ,
September 1993). 
Similarly, the complaint mentions "critical information about the interfaces
in the operating system that connect with applications--information which the
ISVs need to write applications that run on the operating system"--yet doesn't
address the issue of whether or not Microsoft unfairly withholds some critical
information, trying to give its developers exclusive use of undocumented
interfaces.
Likewise, the DoJ was well aware of, and quite interested in, the issues
surrounding Microsoft's ownership of the vastly important DOS and Windows
standards. Yet none of this is addressed in the consent decree, which ends up
looking quite similar to what Microsoft probably could have got from the FTC a
year ago. Even Bill Gates, who was apparently in the habit of denouncing even
the mildest FTC and DoJ questions as "communistic" and "socialistic," had to
admit that the final settlement was no big deal, saying of the years of
investigation that "this is what they came up with" (Wall Street Journal,
July 18).



Why So Little?


Why did the DoJ settle for so little? How could they seemingly ignore the
entreaties of so many PC software vendors?
One theory is that the Clinton administration views Microsoft as a "national
treasure," and put pressure on DoJ to leave Microsoft alone. The press made
much of a May 25 meeting between Bill Gates and Clinton's chief economic
advisor, Robert Rubin. The date is significant because just one week later,
Gates testified under oath before the DoJ. According to one anonymous source,
Gates pointed out to Rubin that Microsoft is responsible for a substantial
portion of U.S. software exports (Information Week, June 27).
Frankly, I don't buy Clinton administration pressure as an explanation for the
DoJ's limited settlement. Microsoft may be highly visible, but it simply isn't
that important to the U.S. economy, at least when compared to companies such
as IBM or GM that make tangible goods. Microsoft, remember, produces software.
While software is a crucial part of the modern world economy, consider that
even "giant" Microsoft has only about 15,000 employees and that its quarterly
sales are about $1.25 billion, compared to $13.3 billion for IBM, or even $2.5
billion for Apple.
What makes Microsoft different is its incredibly low costs. This is very nice
for Microsoft, but it's hard to see what it does for the U.S. economy,
especially when 45 percent of Microsoft's stock is owned by insiders. Had it
wanted, the DoJ could have made a moderately plausible case to the American
public that Microsoft, far from being a "national treasure," is simply a
grossly profitable monopolist, with few employees and few stockholders, that
gives back little to the public.
Another explanation is that DoJ feared a repeat of U.S. v. IBM, which dragged
on for 13 years, only to be dropped as "without foundation." While you could
easily imagine lawyers for the DoJ not wanting to stake their careers on a
losing battle, you have to wonder whether U.S. v. IBM was such a complete
washout, after all. Even though the case was eventually dropped, for years it
had a serious effect on IBM. You could even argue that it was this supposedly
unsuccessful case that caused IBM to unbundle software from hardware, thereby
opening the way to an independent software market, making room for software
upstarts, including a company called Microsoft. In many cases, Microsoft was a
beneficiary of U.S. v. IBM, and "the next Microsoft" could have been a
beneficiary of a U.S. v. Microsoft case. 
Ultimately, I think that the DoJ didn't push for more against Microsoft for
the very simple reason that they felt they couldn't win anything else.
Responding to widespread criticism of the settlement as a DoJ sell-out, Anne
Bingaman protests, "folks, we looked at every aspect of this. We brought the
case that was there to bring." According to the DoJ, the Microsoft settlement
was "everything we could have hoped for in a fully litigated case, and
possibly more."
This is probably true. Law, like politics, is an "art of the possible." While
the settlement gives the Microsoft steamroller the green light, at the same
time it's hard to see what the DoJ could have done differently. The DoJ's job
is to enforce the antitrust laws, not to make industries more competitive--and
the two are not the same.
What all this means is that those Microsoft practices studied by the DoJ, but
not covered in the settlement, are either not illegal, or would be too
difficult to prove illegal. 


Where To Now?


While there might be some private antitrust action from Novell, Lotus, or
Borland, and while the terms of the settlement are subject to public review,
Microsoft must be feeling emboldened by the limited scope of the consent
decree. Microsoft should be able to go full-steam ahead with its plans to
greatly expand the operating system's scope in Chicago. Microsoft Office
will increasingly seem like an essential part of Windows. With policies such
as its stringent new requirements for using the "Windows Compatible" logo (see
"How to Adapt an App for Chicago: Requirements for the New Windows Logo,"
Microsoft Developer Network News, July 1994), Microsoft is raising the Windows
development bar ever higher.
The PC-software industry is rapidly headed in the same direction as many other
technology-based industries before it: rapid consolidation to a handful of
vendors. There once were hundreds of U.S. car manufacturers; now there are
just a few. With Novell's acquisition of WordPerfect and parts of the Borland
product line, with Symantec's acquisition of Central Point, and Microsoft's
purchasing a minority share in Stac Electronics, we are already seeing the
same (probably inevitable) process occurring in software. As Table 1 shows,
market shares reflect an already highly concentrated industry.
On most scales, Microsoft is nearly twice the size of its two nearest
competitors combined. Lotus had 4450 employees and Novell also had 4450;
Microsoft has 14,450. In 1993, Lotus sales were $981 million and Novell sales
were $1.123 billion; Microsoft sales were $3.753 billion.
Given that the DoJ could apparently do very little about this increasing
concentration in the software industry, what are software developers and
vendors to do?
It is probably stating the obvious, but there is little point in trying to
compete with Microsoft over productivity apps and office suites. These are
rapidly becoming a de facto part of Windows itself, and even Novell and Lotus
probably have little chance in this area. Microsoft Office is everywhere and
everything. Perhaps there is still some room in databases, desktop publishing,
and personal-finance software.
As always, another interesting area is plugging holes in Microsoft's own
offerings: add-ins to Microsoft Office, remedying the inevitable temporary
problems in Chicago, and so on.
The best bet is to find areas where Microsoft doesn't have a product, and
where there is a chance of a several-year window of opportunity before it does
have a product. On the other hand, the only market I've ever heard of that
Microsoft didn't want to get into was pornographic screen savers and related
multimedia titles. As one company employee told me, "We looked carefully at
adult software, and decided to leave that money on the table."
Table 1: Application-software market shares (percent).

                        Operating   Word
                        Systems     Processors   Spreadsheets
  Microsoft                66          47             52
  Novell/WordPerfect       14          35             --
  Lotus                    --           3             37
  IBM                      17          --             --
  Apple                     2          --             --
  Borland                  --          --              6



October, 1994
PROGRAMMER'S BOOKSHELF


So What Do You Think?




Peter D. Varhol


Peter is chair of the graduate computer-science department at Rivier College
in New Hampshire. He can be contacted at varholp@alpha.acast.nova.edu.


I recently came across a new printing of What Computers Still Can't Do, by
Hubert Dreyfus, which explores the technical limitations of computers before
leaping to the ambitious (and dubious) conclusion that trying to make
computers appear intelligent is a waste of time and money. Since I'm more used
to extending my imagination than limiting it, I turned to two books whose
purpose was to guide and direct intelligence, whether human or machine.
Between the two books, I found a much more positive perspective on the future
interactions between people and computers.


Can Psychologists Make Us Smart?


I confess to losing respect for "psychologists in computing" after being one
of them. For the most part, the psychologists in computing I knew could not be
bothered with learning about the technical limitations and opportunities
afforded by hardware and software (they were, however, more than happy to
collect paychecks equivalent to those of their engineering counterparts).
That's not to say that there isn't a role for specialists who can bridge the
gap between the human mind and the computer processor. While software moves
closer to the human way of perceiving the world, software developers continue
to lack a good conceptual model of human thought that can assist us in making
design decisions.
Donald Norman, once professor of cognitive science at the University of
California at San Diego and now an Apple Fellow, doesn't pretend to have just
such a conceptual model in Things That Make Us Smart: Defending Human
Attributes in the Age of the Machine. However, Norman does point out some
rather remarkable things about the way humans think and work, and how
computers can be used to complement rather than detract from our natural
traits. 
Norman begins with a simple distinction between experiential and reflective
modes of thinking. Experiential thinking concerns the lessons we learn from
experiences--lessons which are usually learned and applied with little or no
delay, and no conscious thought behind them. Reflective thinking, on the other
hand, involves reasoning out a situation that may not have been encountered
before, and making an evaluation and a decision based on that evaluation.
This distinction affects people's use of software if the software requires
them to reflect when they should be experiencing and experience when they
should be reflecting. Desktop publishing, for example, didn't become popular
until we could immediately see on the page the impact of a design change,
rather than trying to picture the result of a change in our minds as we typed
in the formatting characters.
The way we represent information determines how the user makes use of
technology. Different designs can force users into specific ways of working
that rarely get noticed. For example, while Windows can be used without a
mouse, very few users bother to do without some kind of pointing device. The
graphical display almost demands some way of moving the pointer other than
the cursor keys. A spreadsheet, on the other hand, has a basic
design that's so flexible it can be used for a wide variety of activities for
which it was never intended.
What lessons does Norman provide for software developers? The answer is in his
last chapter, which distinguishes between hard technologies--those that are
inflexible and require the user to adapt to their way of working, and soft
technologies--those that adapt to and complement the way people work and think
about things.
How important is it to distinguish between hard and soft technologies when
designing and building software? It may make the difference between a
successful product and a failure, and it may not always be possible for the
developer to consciously distinguish between the two types of technologies.
How many times have we felt that we have "conquered" or "mastered" a
particular software package, when in fact it should have eased its way into
our style of working, while we were unconsciously adapting to its perspective
on our problem?
Unlike Norman, I don't blame the technologists, or anyone else for that
matter, for producing computers and software that are difficult to use. In the
beginning of the computer era, computers were impossible to use for just about
everyone. They had little power to spare from their number-crunching
activities, and most knowledgeable people believed that they would always be
run by specialists for very specific purposes. There was little understanding
of, or need for, the distinction between hard and soft technologies. They were all
hard because of the limitations of the technology and how we thought about it.
As computers became more powerful and less expensive and gained a foothold on
the desktop, software developers responded with products that became easier
for individual, nonspecialist users. Today, software has to be graphical to
survive, while the very best software is making a few tentative efforts at
working in a way that makes it appear less of a tool and more of a
semi-intelligent assistant.
All this means that people like Donald Norman have important lessons to impart
that software developers are not always aware of. In software development, we
rarely design with the idea of a sensible division of labor between the human
and the computer. Norman points out that we can learn valuable lessons from
looking at why people do certain activities the way they do, and then support
those activities with computers rather than replace them with something
better.


Thinking Fuzzy Thoughts


Like usability engineering, fuzzy logic is a field that has not achieved
widespread acceptance, despite the cheerleading efforts of Lotfi Zadeh and
Bart Kosko. The problem is that it sounds too much like yet another theory of
probability, and its proponents are terribly defensive as they try to explain
how the two differ. Nevertheless, there is a certain appeal to being able to
program a computer using a type of logic that seems to map directly into the
imprecise notions of how people think and communicate.
I began reading The Fuzzy Systems Handbook fully expecting a nuts-and-bolts
account of assembling fuzzy constructs. The book did contain a C++ disk, after
all, and had code examples of just about every type of fuzzy concept. I was
pleasantly surprised to find a lucid, engaging discussion on just why fuzzy
sets were good at representing human reasoning processes. 
In one sense, The Fuzzy Systems Handbook is a true how-to book on constructing
fuzzy systems from scratch. Each concept, from the mathematical constructs of
fuzzy sets to the fuzzy-membership function, is described, illustrated, and
coded into a working example. The code is presented in the text and on disk,
so you can examine it as it runs. Even better, the code is of high enough
quality to be integrated into your own programs, letting you quickly build
fuzzy models into new and existing applications. Several months ago, for
instance, I began writing a set of fuzzy-operator libraries for my visual
simulation language. I simply substituted Cox's code for mine and ended up
with more capable and robust functions.
However, the book goes well beyond the code to give you a sense of how fuzzy
logic emulates human thought. There is no preaching, as is Bart Kosko's
tendency when describing fuzzy thinking as the "universal truth," but rather a
practical discussion of why it can be a useful computational model for
imprecision. As I read the text and ran the examples, it became clear that
fuzzy systems are an attempt to adapt the Boolean nature of computers and
computer programming to the imprecise nature of many human activities.
The word "warm," for example, has different meanings to different people and
under different circumstances. Yet people understand well enough the concept
of warm so that it need not be more precisely defined. Fuzzy systems also
understand the concept of warm by defining a membership function across the
possible range of temperatures that connotes the degree of "warmness." By
using ordinary set operators such as union and intersection (with their own
fuzzy definitions), fuzzy systems can manipulate these concepts to produce
evaluations and decisions that might not normally be possible with software.
Cox starts with some of the simpler concepts, such as fuzzy-set operators and
membership functions, and uses these as a foundation to progress into fuzzy
reasoning, fuzzy models, and the fuzzy-system life cycle. Several example
projects are included that enable the reader to assemble working models of
integrated fuzzy systems. At every step, Cox describes the concept, relates it
to previous concepts, and shows you the code. He works with Borland C++ but
does not use any object constructs, so it should be possible to use the code
with any ANSI C compiler.
The Fuzzy Systems Handbook is best read beside your computer. The disk is full
of C++ examples and demos, and examining, compiling, and running them while
you're reading about them is just about the best way of learning. Possibly the
best thing about it is that, unlike most how-to books on new or esoteric
subjects, The Fuzzy Systems Handbook does not require you to buy yet
another book to understand the background behind what you've just done.


Closing the Book


Hubert Dreyfus bases many of his arguments for the limitations of computers on
a dichotomy between things that people do well and things that computers do
well. Norman recognizes a dichotomy too, but instead of adhering slavishly to
it, he uses the dichotomy as the starting point for weaving together the
strengths of people and software to get more than is possible from either one
individually. For their part, fuzzy systems question the very existence of
such a dichotomy, or at least demonstrate that it may not be where we think it
is. In any case, both Norman and Cox provide fitting responses to What
Computers Still Can't Do.
Things That Make Us Smart: Defending Human Attributes in the Age of the
Machine
Donald A. Norman
Addison-Wesley, 1993, 290 pp., $22.95
ISBN 0-201-58129-9
The Fuzzy Systems Handbook
Earl Cox
AP Professional, 1994, 615 pp., $49.95
ISBN 0-12-194270-8



October, 1994
SWAINE'S FLAMES


Is Justice Done?


I am afraid that this is going to be one of those dry, sober,
voice-of-reason-type columns. I'd much rather do a crazy, irresponsible,
hyperbolic diatribe, and I'm sure that's what you expect of me.
Trouble is, with respect to this particular subject, the other diatribesmen
just got there first. The subject is the final resolution in the case of
Microsoft v. Justice.
The facts of the decision are as follows: After the FTC deadlocked on its
investigation of widespread claims of monopolistic practices by Microsoft, the
Justice Department, under Anne Bingaman, Assistant Attorney General for
Antitrust, took over the case, expanded it beyond the original scope,
threatened a lawsuit, and reached a settlement that represents a retreat to
more or less the original FTC scope: No more per-processor licensing, no more
multiyear licenses, and no more nondisclosure provisions that effectively
prevent developers from working on products from Microsoft's competitors.
The meaning of these facts has been variously interpreted, depending on who's
doing the interpreting.
Justice brags that it went toe-to-toe with Bill Gates and Bill blinked.
Microsoft disagrees about who did the blinking, and claims that the decision
means nothing and will have no effect. Computer-magazine columnists mainly
take one of two stances: 1. The decision means nothing and will have no
effect; or 2. quit your whining, you sniveling developers you. Microsoft's
competitors mainly say either: 1. The decision means nothing and will have no
effect and it's just not fair; or 2. give us a level playing field and we'll
crush Microsoft like a flea.
All of these claims are hooey. Taking the last first: On a level playing
field, Microsoft would still have you for lunch. No other software company
today can match Microsoft's understanding of the industry or its marketing
savvy, and few can match its dedication (killer instinct, need to dominate) or
its depth of experience. Even in some dream world in which technical
excellence is all that matters, Microsoft could prevail simply by moving
technical excellence up from fifth to first priority. "Level playing field,"
my eye.
As to the "quit your whining, be a manly man, don't cry to the government"
position: That's fine if you like rolling in the mud. Without rules, the
dirtiest fighter wins. Or the richest; in any case, that's probably not you.
As to the claim that the decision means nothing and will have no effect, I'd
say that's up to Microsoft's competitors. Microsoft formerly engaged in
certain practices that presumably gave it a competitive advantage. At least
with respect to those practices, that advantage has now been removed. This
removal represents an opportunity for someone. Of course, there is no
guarantee that someone will seize this opportunity.
It's even been claimed that Microsoft got the settlement it wanted. Excuse me?
Microsoft wanted to be forced to change its licensing policies and be branded
a monopolist?
True, the settlement was disappointing to critics of Microsoft's practices in
that it addressed only three of the issues in contention. But does this mean
that Justice caved in to Microsoft on the other issues, or was it only
pursuing them as a threat to get Microsoft to knuckle under on these three? We
don't know. We do know that nothing in the settlement rules out further action
by the Justice Department or a class-action suit against Microsoft over other
alleged monopolistic practices.
It's not over till it's over.
Michael Swaine, editor-at-large



October, 1994
OF INTEREST
PKWARE recently began shipping a Windows version of its Data Compression
Library, which provides a small set of compression functions allowing
developers to add data compression to their applications. The library provides
an implode() function which encapsulates the data compression engine and an
explode() function to handle decompression. These functions allow for
compression and decompression of either ASCII or binary data from virtually
any data source. The library also includes a 32-bit CRC function, provides
support for error handling, and supplies a mechanism for handling user-defined
callback functions. Two additional functions are provided explicitly for
Visual Basic support. Other languages supported by the library include
Microsoft C, Visual C++, and Turbo Pascal for Windows 1.5. Although the
implode() compression format is not compatible with the company's PKZip
format, it is compatible with the DOS and upcoming OS/2 versions of the
compression library. The library is provided both as a DLL and as a static
.LIB file. PKWARE claims that the DLL version requires only 36 Kbytes of
memory for compression and 12.5 Kbytes for decompression. There are no
royalties for DLL distribution. Both the Windows and upcoming OS/2 versions
are priced at $350.00. Reader service no. 20.
PKWARE
9025 N. Deerwood Drive
Brown Deer, WI 53223
414-354-8699
Watcom has begun shipping Watcom C/C++ 10.0, which provides a new, graphical
IDE. The environment includes a C++ class browser, GUI debugger, text editor,
and profiler. Additionally, Version 10.0 provides Windows resource editing
tools for icons, bitmaps, dialogs, and menus; it also includes the Spy,
Heapwalker, and Dr. Watson debugging utilities.
Watcom C/C++ 10.0 is, for the most part, C++ 3.0 compliant, supporting both
templates and exception handling. It also supports MFC, OLE, and SOM/DSOM.
Application targets include 16-bit DOS, Windows 3.x, OS/2 1.x, 32-bit DOS
(with extenders), OS/2 2.x, Windows NT, Win32s, 32-bit Windows 3, and NetWare
NLMs.
For a limited period, Watcom C/C++ 10.0 is available on CD-ROM for $199.00,
although the regular retail price is $350.00. Reader service no. 21.
Watcom International
415 Philip Street
Waterloo, ON
Canada N2L 3X2
519-883-6308
ILOG has announced two new tools. The first, ILOG Broker, is a tool that lets
you turn any existing C++ application into a distributed app by changing
header files. The second, the ILOG Server, is a tool for building dynamic
servers of C++ objects so that programmers designing complex C++ applications
can simultaneously access objects and the ILOG Broker. 
ILOG Server is based on the concept of view coherence and is an extension of
the Smalltalk Model-View-Controller (MVC) architecture. ILOG Server provides a
C++ preprocessor and a set of libraries with two types of services: Object
Model classes and Object Server classes. 
ILOG Broker is designed for C++ programmers who need to develop distributed,
object-oriented applications without leaving C++ or learning a new programming
language. It is a light, distributed programming tool that can be used at
several levels in cooperative processing environments. Its core technology
utilizes the RPC protocol and can be used, for example, to effectively
implement both C++-based Object Request Brokers and transparent C++ support
for CORBA/IDL applications. Since ILOG Broker is an extension of the C++
language, you can revamp any existing C++ application. Both tools are
currently available for UNIX-based systems and sell for $5000.00 each. Reader
service no. 22.
ILOG 
2105 Landings Dr.
Mountain View, CA 94043
415-390-9000
Greenleaf Software has provided a major upgrade to its asynchronous
communications library, CommLib. The latest release, Version 5.0, provides
over 350 C functions and support for the Win32 API and 32-bit DOS extenders,
including Phar Lap's TNT DOS-Extender and Rational Systems' DOS/4G. CommLib
5.0 will also include language-independent DLLs for Windows NT and Windows
3.x.
The library provides both high-level functions for quick development and
low-level functions for fine-tuning communications apps. CommLib provides
support for multiple multiport boards in the same PC; 16550 UART FIFO modes;
numerous file-transfer protocols (including the addition of CompuServe's B+
protocol); and XON/XOFF, RTS/CTS, and DSR/DTR flow control. The library
supports all of the popular C compilers, including Borland C/C++, Microsoft
C/C++, Symantec C++, and Watcom C/C++. Additionally, the library is callable
from any language that supports DLLs and Pascal calling conventions, including
Visual Basic, Pascal, and Smalltalk. CommLib is priced at $359.00. Reader
service no. 23.
Greenleaf Software
16479 Dallas Parkway, Suite 570
Dallas, TX 75248
214-248-2561
Archimedes IDE for Windows is a point-and-click embedded-systems development
environment for 8-, 16-, and 32-bit microcontrollers. The Windows-hosted
environment includes an editor, ANSI C compiler, intelligent linker/loader,
librarian, C libraries, real-time debugger, simulator debugger, and the make
utility. The toolset is currently available for the 680x0, 683xx, 68HC16,
68HC11, 68HC08, and 68HC05 controllers. The Archimedes IDE toolset sells for
$1595.00. Reader service no. 24. 
Archimedes Software
2159 Union Street
San Francisco, CA 94123
415-567-4010
CommTouch has announced Pronto/IP, a PC-based e-mail client to TCP/IP hosts,
which enables Windows users to directly (that is, without a gateway) exchange
e-mail with UNIX-based TCP/IP hosts. The tool also makes it possible for text
and binary attachments to be sent internally, as well as through the Internet,
using UNIX or other TCP/IP-based hosts as mail servers.
Pronto/IP uses the Windows Sockets (Winsock) API to interface with TCP/IP
stacks such as those built into Windows for Workgroups or other commercial
offerings. Pronto/IP uses the standard POP and SMTP protocols for exchanging
incoming and outgoing mail with the host mail server. Pronto/IP sells for
$69.00 in single quantities. Reader service no. 25.
CommTouch Software
1206 W. Hillsdale Blvd., Suite C
San Mateo, CA 94403 
415-578-6580
A consortium of computer companies has teamed up to define a common
speech-recognition API for Windows. The Speech Recognition API Committee, which
includes WordPerfect, Dragon Systems, IBM, Kolvox, Kurzweil, Lernout &
Hauspie, Philips, and Novell, is developing an open standard to enable the
integration of speech-recognition technology into their Windows applications.
The Speech Recognition API promotes a consistent user interface, along with
features and interactions that can be handled more easily and efficiently by
voice than by keyboard or mouse.
The Speech Recognition API will support continuous and discrete command and
control speech capabilities (that is, individual voice commands such as "File
Print") as well as continuous and discrete dictation capabilities for
inputting of text and data (natural-language input such as "Print five copies
of this document"). Reader service no. 26.
WordPerfect Corp.
1555 N. Technology Way
Orem, UT 84057
801-225-5000
CardTrick FFS, a flash file system that's compatible with Microsoft's FFS2
linked-list installable file system, has been released by Datalight. Flash
memory is read/write, nonvolatile memory that can retain information without
power. 
In addition to being compatible with FFS2, Datalight says that CardTrick FFS
is about 10 percent smaller and less expensive when it comes to license fees.
The company also claims that CardTrick FFS does not accumulate speed
degradations (leading to increasingly slower access times) when files are
changed, as does FFS2. Reader service no. 27.
Datalight 
307 N. Olympic Ave., Suite 201
Arlington, WA 98223
206-435-8086
The MIDI Programmer's Toolkit for Windows, available from Music Quest, is a
kit that allows multimedia and music-program developers to create applications
ranging from sequencing, music notation, and music instruction to live
performance using MIDI instruments and sound cards. The Toolkit hides much of
the Windows API while providing a complete library of functions. The library
allows content developers to read and write songs in Standard MIDI File form;
receive and transmit MIDI events to and from MIDI instruments and sound cards;
filter events; and synchronize to either a MIDI clock, internal timebase, or
SMPTE time code. The library is provided as a DLL and supports Microsoft and
Visual C++, Borland C/C++ and Visual Basic. The MIDI Programmer's Toolkit
sells for $99.95. Reader service no. 28.
Music Quest
1700 Alma Drive, Suite 300
Plano, TX 75075
214-881-7408
The Open Mail System C++ Class Library has been released by Raindrop Software.
The library is designed to provide C++ programmers a straightforward means of
writing e-mail support into Windows applications using protocols such as VIM,
MAPI, and MHS. Future releases will provide support for CompuServe, MCI mail,
and SMTP. The company also provides Open Mail System support in the form of
VBXs. The royalty-free C/C++ libraries sell for $995.00. Reader service no.
29.
Raindrop Software

833 Arapaho, Suite 104
Richardson, TX 75081
214-234-2611
Durand Communications Network (DCN) has released DC Genesys, a relational
multimedia-database toolkit for electronic bulletin-board systems (BBSs). The
software, which is dBase and FoxPro compatible, supports JPG, CMP, PCX, BMP,
FIF (fractal), TIF, TGA, and GIF image formats. The database search engine is
based on query-by-example via custom forms as well as the command line. In
addition to the database engine, the package includes an application engine,
DOS and Windows terminal software, DOS and Windows compression software, and a
sample phonebook application. The royalty-free software sells for $995.00.
Reader service no. 30.
Durand Communications Network
147 Castilian Drive
Santa Barbara, CA 93117
805-961-8700
Apiary recently began shipping Version 2.0 of its NetWare Client SDK for
Visual Basic. The development kit, which includes the NetWare API, allows VB
programmers to create NetWare client applications. The product also includes a
Windows Help file that documents all versions of the API and includes function
prototypes for C and Visual Basic. The NetWare Client SDK additionally
supplies example programs demonstrating various functions in NetWare 2.x, 3.x,
and 4.x, as well as Novell's new Directory Services API. The SDK sells for
$395.00. Reader service no. 31.
Apiary
10201 West Markham, Suite 101
Little Rock, AR 72205
501-221-3699
Taligent has announced the start of its Partners Early Experience Kit (PEEK)
program, which will provide Taligent code, documentation, and training to
qualified developers. In addition, the company stated that developers will
receive a prebeta release of its Taligent Application Environment (TalAE),
which the company describes as an open, portable application system that is
operating-system independent. The environment, which is based on Taligent's
Task-Centered Computing model, provides the foundation for both functionality
and interoperability. TalAE will initially run on IBM's AIX operating system.
The program will include six one-week training courses and participation in
the Taligent Developer Technical Services (DTS) program. Reader service no.
33.
Taligent Inc.
10201 N. De Anza Blvd.
Cupertino, CA 95014
408-255-2525
In a related announcement, Taligent has licensed the SNiFF+ development
environment from TakeFive Software. SNiFF+ is a C++ portable programming
environment that runs on UNIX workstations, including Sun SPARC, IBM RS/6000,
HP 9000/7xx, DEC RISC, DEC Alpha, and SCO ODT. SNiFF+ provides browsing and
design visualization, cross referencing, editing, and documentation support
for any C++ compiler. For portability reasons, debugging support is delegated
to the C++ compiler. The product features an extractor which retrieves
information from the source code, making use of a fast fuzzy C++ parser to
parse source files and send a tokenized stream of extracted information to
client applications. An execution component either compiles and executes or
interprets the source and provides debug information. The product also
features an information repository based on a symbol table kept in main memory
and provides tools to browse and manipulate information in the data
repository. SNiFF+ is priced at $2990.00. Reader service no. 32.
TakeFive Software
20823 Stevens Creek Blvd., Suite 440
Cupertino, CA 95014
408-777-1440
Versions 1.1, a version-control system for Windows and Windows NT, is now
available from StarBase. The versioning software, which is based on a
project-library concept, allows for any number of files to be checked in or
out of the project library and provides access to both current and previous
versions of a given file. The product supports all file types, including
binary files, allowing users to collaborate on anything from source code and
resource files to word processing, graphics, or spreadsheet files. Versions
1.1 includes a feature called "delta versioning" that allows side-by-side
comparisons of ASCII files. Other features include a DOS command-line
interface for batch operations, automatic schedule check-in, a Project Wizard
for project setup, user-definable build and test commands, drag-and-drop
support, and more. Versions 1.1 retails for $279.00. Reader service no. 34.
StarBase Corp.
18872 MacArthur Blvd., Suite 400
Irvine, CA 92715
714-442-4400


November, 1994
EDITORIAL


Who's that Tapping at Your Back Door?


After reading two different headlines from two different newspapers, you begin
to understand why sales of Bruce Schneier's book, Applied Cryptography, are
spiraling up, why RSA Data Security president Jim Bidzos is going to be a rich
man, and why encryption will be in the eye of technological and social storms
in the coming decade. 
For starters, consider the May 12, 1994 Associated Press article headlined,
"Agents Tap More Lines Than Ever." In the first year of Clinton's
administration, so the story goes, court-approved phone taps and electronic
bugs by federal agents were up 50 percent over the previous high. This was
followed by an August 9th headline proclaiming, "Bill Would Make Wiretapping,
Tracing Messages Easier." While I'd be the last to suggest that the title
intentionally referred to the Prez, Clinton has nonetheless supported the
Digital Telephony bill that would force telephone and cable-television
companies to grease the skids for government agents who want access to your
private data. Adding insult to digital injury, the government plans on forking
over $500 million of your tax dollars to phone and cable companies to pay for
developing and installing the necessary software, not to mention ponying up an
unspecified amount of money for unknown future costs.
Almost lost in this summer's crime-bill and health-care shuffle, House bill HR
4922 (sponsored by Rep. Don Edwards, D-Calif.) and the Senate version S 2375
(backed by Sen. Patrick J. Leahy, D-Vt.) underwent one compromise after
another as they quietly slid from subcommittee to the floor. In the end, we're
left with legislation that does, in fact, make it easier for government agents
to tap digital communications and gain access to billing records and audit
data, assuming they first obtained court-sanctioned search warrants. As if
that isn't enough, telcom carriers would have to guarantee they can quickly
isolate targets, identify origins and destinations, and transmit this
information to the government--all without the target catching on. As
proposed, the law would apply to the surveillance of public communications
networks, including yet-to-come, two-way, cable-based communications systems
that will carry both voice and data.
As bad as this sounds, the Leahy/Edwards compromise is still good news to the
civil libertarians who've been tracking it, especially considering the
previously proposed alternatives. In one incarnation, the bill went so far as
to give the attorney general thumbs-down power over technological advances
that could hinder the government's ability to tap communications. The bill
also imposed stiff penalties on common carriers that didn't comply with
wiretap regulations.
Widespread opposition by the communication industry, civil libertarians,
computer professionals, legislators, and even former administration officials
like Roy Neel (who now heads the United States Telephone Association) forced
the government to back down. According to Neel, who spoke at last spring's
Computers, Freedom, and Privacy Conference, the Digital Telephony act would
undermine consumer confidence in the privacy of the nation's communications
networks, turn your local phone company into "an agent of law enforcement" and
"put a damper on technological development."
This isn't to say that there aren't big-time criminals who pose a real threat
to our personal safety. In particular, the government claims it needs help in
coping with technically sophisticated, cash-rich drug dealers who can afford
to stay a step or two ahead of government agents. According to the
Administrative Office for U.S. Courts, about 75 percent of the court-approved
wiretaps and bugs in 1993 were for narcotics-related investigations. In
congressional testimony, FBI Director Louis Freeh has stated that new
communications technologies are making it harder to tap phones, arguing that
it would be a "disaster" if Congress doesn't give law enforcement an express
lane on the information highway. 
But the fact remains that the government is involved in more electronic
eavesdropping than ever before and, with its carrot-and-stick approach to
dealing with communication carriers, is looking for ways to increase its
electronic-surveillance capabilities. Privacy watchdogs like the Electronic
Frontier Foundation were mollified by the Leahy/Edwards compromises,
particularly those requiring court-approved search warrants. "This is a key
part of the package that makes it, if not palatable, at least in the ballpark
of acceptability for us," said Jerry Berman, EFF policy director. However,
Representative Edwards, recalling his days as an FBI agent under J. Edgar
Hoover, notes that although "it was illegal for us to tap telephones, I seem
to remember we did it anyway."
If the Administration and FBI had thought this through, they'd realize that
what they need is not more crime statistics, but better spin doctors. Think
about it. If the $500 million package for developing and installing software
back doors into the communication system were pitched as a jobs package for
out-of-work programmers, it would go through Congress like a kid down a water
slide. Oh, well. Knowing how Congress works, some legislator would probably
tack on a rider to the effect that all work had to be done in PL/1. In the
meantime, encryption experts like Bidzos and Schneier will go on being the big
winners as developers and users look for better ways of keeping their private
digital data truly private.
Jonathan Erickson
editor-in-chief


November, 1994
LETTERS


Pound, Pound, Pound


Dear DDJ,
Eric Zapletal makes some good points ("Letters," DDJ, August 1994). However,
as a software developer with an EE background, I am qualified and compelled to
respond to his criticism of programming languages. I believe Eric is correct
in his assertion that a schematic representation would be far more productive
for RLU programming than traditional languages; he is incorrect, however, in
implying that schematic representations are inherently superior to programming
languages in general. Eric states: 
A circuit schematic is 2-D, and it is understood that you can read (or look
at) any part of the schematic in any order. For a language [presumably he
means "program"] to make sense, you must start at the beginning and work
steadily through to the end (clearly, programs don't run steadily from BEGIN
to END--the main reason why languages are not suited to programming).
For starters, the comparison itself is flawed; he is comparing the process of
understanding a single part of a schematic with the understanding of an entire
program. Furthermore, any part of a program (or at least a well-written
program) may be viewed and understood separately from the whole. That is a key
principle of virtually every programming methodology--schematic or language
based. (And where is it written that because programs do not run steadily from
BEGIN to END, languages are not suited to programming? Indeed, that is quite a
leap in reasoning.)
I think it true that a visual methodology for language-based programming would
be more productive for some tasks than some current methods, and the
evolutionary direction of certain software-development tools supports this.
However, until the field of software engineering matures, I do not believe
this will be entirely possible. The very notion of a simulation environment
implies a well-defined number of tightly controlled parameter inputs and
outputs. A physical component can be modeled as a truth table, transfer
function, or appropriate metaphor on a diagram--can the same be said for a
function? Certainly a great number of common algorithms have come into
widespread use in the programming community, but is any significant percentage
thereof truly standardized? I think not.
With regard to Eric's questions about the existence of 32- and 64-bit
software, there is a paradox of inertia involved. The primary motivating force
behind most product development is sales. Why should one develop software for
a 2x-bit platform when the market for x-bit software is much more lucrative?
It is in this manner that hardware stifles software. Paradoxically, it is the
enticement of more powerful software that generally moves the installed base
to upgrade. So, what really comes first, the upgrade or the software?
Concerning bugs becoming a "programming badge of merit," I share Eric's
opinion that this is disgraceful. I have, however, never personally met a
developer whose goal was to generate bugs or wear them as a badge of merit. I
have, unfortunately, met developers who seem to share Eric's implied
conviction that perfection is possible. I am sure that virtually every
nontrivial piece of code I have ever written has a hidden bug somewhere, but I
take no perverse pride in this; it is simply a painful acknowledgment of my
flawed and fragile humanity.
In the same way that different programming languages lend themselves to
different tasks, different development methodologies also lend themselves to
different products. I continue to enjoy the process of learning new languages
and learning to best differentiate the class of problems to which a particular
language is best suited. In summation, Eric's comments remind me of the old
adage about one's possession of a hammer so inclining one to (myopically) view
each new task as just one more nail to pound.
John B. Williston
Plainwell, Michigan


Mr. Postman


Dear DDJ,
In his "Editorial" on the U.S. Postal Service (DDJ, June 1994), Jonathan
Erickson falls into some common misconceptions about the Post Office. He
states that the Postal Service is stuck between universal delivery of the mail
and (often cheaper) competitors who can pick and choose where to deliver.
Jonathan should check with some of these competitors. UPS delivers to every
address in the United States. They also deliver to every address in many
European and east Asian nations. I think Federal Express also delivers to
every address in the United States.
UPS and Federal Express do enjoy the advantage of primarily serving the
business-to-business markets of express and package delivery, but this is
because of Postal Service monopolies which prevent competitors from delivering
many types of mail.
The reason private companies provide universal delivery even when not required
to is quite simple. If a private delivery service didn't provide universal
delivery, shipping a package would involve checking lists of destinations to
decide who delivers where. It's simpler to go with a single shipping
service--adding a little to delivery costs (to get to remote areas) increases
the volume of business immensely.
This translates into the "information superhighway." Congress is being lobbied
to legislate universal, subsidized access to "worthy" causes. The incentive
exists to provide universal access without government mandates since anything
less results in a "look up how to send the information" problem. It's one
reason why the major online services (CompuServe, MCI Mail, America Online)
are all connected to the Internet: It avoids a question of which service to
log onto and permits me to send this letter electronically even though I don't
have an account on any of the online services.
Thomas Wicklund
Longmont, Colorado
DDJ Responds: Thanks for your letter, Thomas. You're right. It would have been
irresponsible for me not to check with competitors to the U.S. Postal Service
to get their side of the story--that's why I called both Federal Express and
UPS. According to the spokesperson I talked to, not only is FedEx barred by
law from competing with the Postal Service in the home-to-home market, they
have no interest in doing so.


IPC Kudos


Dear DDJ,
In his article, "IPC: UNIX vs. OS/2" (DDJ, May 1994) John Rodley did an
excellent job of comparing the UnixWare and OS/2 approaches to IPC. In
particular, his use of analogies made the article very clear. This issue of
DDJ was timely for me, because I'm involved in a project that requires
portability between UNIX, OS/2, and Windows, and relies on shared-memory IPC.
Also, I'd like to mention that I ported the example code to Linux with only a
single change in one #define.
Thanks for this interesting issue. An article on comparing streams frameworks
on different OSs is welcome!
Carlos Crosetti
Buenos Aires, Argentina


DAN Feels Right


Dear DDJ,
I just read Reg Charney's article, "Data Attribute Notation and C++" (DDJ,
August 1994) and must comment that his approach "feels" very right. As he
points out, the idea of encapsulating attributes in their own class allows a
close match with systems design and can ensure that the application has a
consistent method of handling attributes across all classes that define that
attribute (a rudimentary data dictionary). I look forward to trying this
method on my own projects.
James Mitchell
Auckland, New Zealand


Those Installation Blues


Dear DDJ,
I read about Al Stevens' experience with OS/2 with significant empathy ("C
Programming," DDJ, August 1994). I feel vindicated in my decision to not load
OS/2 2.x on my system at all. Back when IBM had the $49.00 upgrade offer, I
bought a copy. After I read the installation directions, I decided it wasn't
worth the hassle, and I gave my copy away. I am also a professional
programmer, and I had only curiosity to satisfy. I didn't feel that satisfying
my curiosity was worth the risk of trashing my system and having to restore it
all from back-ups.
Meanwhile, quite a bit of time has passed, and I have decided to push my way
through the difficulties and try Windows NT and Coherent, Mark Williams' UNIX
clone. I thought I had a difficult time with Windows NT, but it wasn't quite
as bad as Al Stevens' experience with OS/2. I recently purchased a 2GB Seagate
Barracuda hard drive and a WangDat 3200 for backup. Part of the reason for the
extra disk space was to have some room to play with things like Windows NT and
various programming languages and development environments. 
First, I backed up everything to a DAT tape, and then began to install Windows
NT. I had both a CD-ROM and a set of disks. My CD-ROM came as part of a
SoundBlaster Multimedia kit, and so was not directly supported by Windows NT.
Nonetheless, there were instructions for installing using a nonsupported
CD-ROM. I followed those instructions. The installation program asked me where
to put the Windows NT files, and since the 200-Mbyte IDE C: drive was nearly
full, I specified the E: drive (second 500-Mbyte logical drive on the
Barracuda). Well, the end result of this was that I lost everything on the
2-gigabyte drive. Windows NT and DOS had some kind of disagreement about which
drive was the E: drive, and I had to use my backup tape. (At least that worked
fine!) So I moved lots of stuff from the C: drive to the D: drive to make room
for Windows NT, backed up again, and tried again. This time everything loaded
fine. I took the default 640x480 video configuration, planning to follow the
directions and change to a higher resolution after installation. When I tried
to install a higher-resolution driver, Windows NT asked for an installation
disk, but refused it when I put it in the floppy drive. It also refused to
take it from the CD-ROM, even though the CD was accessible via the driver I
had downloaded from Creative Labs' BBS. I didn't want to fuss with the stack
of floppy disks, so I decided to live with 640x480 for a while. I noticed that
NT was having problems with my 90-Mbyte Bernoulli drives. They would spin up
and down repeatedly for 15 minutes or even longer before NT finally decided it
couldn't tell what file system was installed. It booted more quickly when I
turned off the Bernoullis. Meanwhile, I didn't try to do any serious work with
NT.
Time passed. I replaced my Trident SVGA card with a Hercules Dynamite Pro, and
my Adaptec 1522 SCSI card with an Adaptec 1542. I decided to bite the bullet
and install NT from the floppies so that I could use 1024x768. I made the
mistake of asking for 1024x768 during installation, and ended up having to
repeat the first part of the installation process. Finally, I was able to
install a 1024x768 driver, but I still had the Bernoulli problem. Now,
however, I couldn't simply turn off the drives to get NT to boot, I also had
to reconfigure the AHA-1542 to supply a termination. Also, NT failed to
properly migrate my Windows desktop to NT. It completely missed a few groups,
and in the groups it did get, it initially set all the icons to a question-mark
icon. I found that by selecting an icon and hitting Ctrl-Enter followed by
Enter, NT would then find the correct icon. However, my Microsoft Office group
was nowhere to be found--in a strange sort of poetic justice, it was only a
couple of groups of Microsoft applications that failed to migrate from
Windows. Eventually, I stumbled onto the solution to the Bernoulli problem.
The AHA-1542 ROM setup has an option to "Send Start Unit Command." The default
is disabled. When I enabled the option for the Bernoullis, NT booted with no
problem. So far, I haven't been able to make NT crash, but then, I haven't
tried very hard, either. I suspect that it is more bullet-proof than OS/2 per
Al Stevens' experience.
After I realized that Coherent required a separate hard-drive partition, not
merely a logical drive in an extended DOS partition, I decided not to install
Coherent on my main system, but to use another computer, at least initially,
to avoid reorganizing my 2-gigabyte hard drive. I purchased an additional
420-Mbyte Connor IDE drive for less than $250.00 and managed to get it working
as the master with my old 212-Mbyte Connor IDE drive. I was then able to
install Coherent, with the assistance of a couple of tech support calls. It
seems that both Coherent and Windows NT are more picky about hardware than DOS
because they access it directly without using the BIOS. I can no longer reboot
the computer with Coherent on it by pressing reset. I have to turn off the
power, and then power on again. This is apparently some kind of inadequacy in
the chipset or BIOS. The Coherent tech-support person was competent and
helpful, and I didn't have to wait long on hold. (I'll give you one guess why
I haven't even bothered to try to get Microsoft tech support on the phone.)
Meanwhile, I realized that X Windows support could barely limp along in 4
Mbytes of RAM, so I ordered some more RAM, and decided to leave Coherent alone
for a while.

It is certainly not "love at first sight" with Coherent. It really does act
like UNIX, with all the user-unfriendliness included. However, Coherent is an
inexpensive way to learn something about UNIX. Hopefully, the educational
value will be worth the trouble. I'll know more when I fiddle with the X
Windows stuff and the C/C++ compilers.
Daniel E. Hale 
Anaheim, California


Tab is the Key


Dear DDJ,
In Michael Swaine's interview with Lee Buck ("Programming Paradigms," DDJ,
August 1994), Lee says: "Call me silly, I shouldn't have to spend a lot of
time hitting tab. I just think that's stupid." I'm calling Lee silly. Doesn't
he know about indent?
Indent is a BSD program which has been around for about 18 years, is currently
in the GNU suite, and can reformat C code in a wide variety of formats. I
never found the time I spent formatting code to be a waste. (I never have used
a context-sensitive editor--it might be nice to use something where I would
hit a key and get a new function to fill in.)
Marty Leisner
leisner@sdsp.mc.xerox.com 


November, 1994
An SQL Server Message-Handling Class


C++ classes handle messages from both the server and DB-Library 




Mark Betz


Mark is a senior consultant with Semaphore, a consulting and training company
specializing in object technology, client/server development, and distributed
computing. Mark can be contacted on CompuServe at 76605,2346.


Whether applications interact with the Sybase/Microsoft SQL Server by
executing stored procedures, sending complete SQL statements, or some hybrid
of the two, you'll find that you need a mechanism to handle communication
between the client application and the server. Microsoft supplies DB-Library,
an API of C-callable functions for sending data to or retrieving data from the
server. Recently, however, my colleagues and I required an interface to an SQL
Server which consisted of a layer of C++ classes on top of the DB-Library. In
particular, we had to design classes to handle messages from the server and
DB-Library to the client application. 
In this article, I'll examine the nature of these messages, and present the
classes we developed to handle them. In this application, the database logic
was contained largely in stored procedures. Operations were performed by
building SQL statements that invoked procedures and using DB-Library to send
them to the server.
DB-Library and SQL Server identify a connection to a database using a
structure called DBPROCESS that is allocated and maintained by DB-Lib. An
application connects to a database by requesting a DBPROCESS, and receives a
near pointer to use as a handle to it. "Processes" (or "Process objects")
refer to objects in an application that own a DBPROCESS handle and use DB-Lib
to send commands to the server.
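This ownership arrangement can be sketched as follows. The names dblogin(), dbopen(), dbclose(), and DBPROCESS are DB-Library's own, but they are stubbed here with simplified types so the fragment stands alone; the Process class itself is a hypothetical illustration, not code from our interface layer.

```cpp
#include <cassert>

// Stand-ins for the DB-Library declarations normally pulled in from its
// headers; the real DBPROCESS is an opaque structure owned by DB-Lib.
struct LOGINREC  { int unused; };
struct DBPROCESS { int conn_id; };

// Stubs standing in for the real dblogin()/dbopen()/dbclose() calls.
static DBPROCESS g_connection = { 1 };
LOGINREC*  dblogin()                       { static LOGINREC l; return &l; }
DBPROCESS* dbopen(LOGINREC*, const char*)  { return &g_connection; }
void       dbclose(DBPROCESS*)             {}

// A Process object requests a DBPROCESS at construction and keeps the
// returned pointer as its handle for all later DB-Library calls.
class Process {
public:
    explicit Process(const char* server)
        : dbproc(dbopen(dblogin(), server)) {}
    ~Process() { if (dbproc) dbclose(dbproc); }
    DBPROCESS* handle() const { return dbproc; }
private:
    DBPROCESS* dbproc;   // near pointer handed back by DB-Lib
};
```

Tying the connection's lifetime to the Process object's constructor and destructor means the handle cannot outlive its owner.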


Message Generation and the Message Pipeline


When an application tells the server to perform a database operation, the
server may generate a message in reply. DB-Library may also send messages to
the program. There are over 1000 possible messages dealing with everything
from communications problems to acknowledgment of a change of default
database. When a message originates with DB-Library, it is normally an error.
However, the server generates informational messages which are not necessarily
errors. For my purposes here, the terms "error" and "message" are synonymous.
DB-Library is the clearinghouse for server messages. In addition, however, it
generates its own messages. To receive message data, an application registers
one of two callbacks with DB-Lib: one for server messages, the other for
DB-Lib messages. In the Microsoft SQL Server C++ example code (available on
CompuServe), Eric Reel implemented a pipeline allowing messages to be sent to
the part of an application which prompted them. The idea is based on the
assumption that there are classes encapsulating each process, as well as a
central class responsible for logging in to the server and receiving and
dispatching messages. Figure 1 shows a typical scheme in which messages arrive
at the Process objects and are passed on to individual handlers. The scheme
could just as easily use a central message handler, though it seems useful to
allow different types of processes to have differently configured handlers.
Each message handler has an interface which allows it to receive messages, yet
it is unaware of how those messages got there.
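A minimal sketch of such a pipeline follows, with a central Manager routing messages by DBPROCESS pointer. All class shapes and names here are illustrative stand-ins, not the article's actual listings.

```cpp
#include <cassert>
#include <map>

// Stand-in for DB-Library's opaque connection structure.
struct DBPROCESS { int id; };

// Each Process owns a handler with a receive-only interface: the
// handler gets messages without knowing how they arrived.
struct MsgHandler {
    int received;                       // messages seen so far
    MsgHandler() : received(0) {}
    void onMessage(int /*msgno*/) { ++received; }
};

struct Process {
    DBPROCESS* dbproc;
    MsgHandler handler;                 // per-process configuration
    explicit Process(DBPROCESS* p) : dbproc(p) {}
};

// Central Manager: the registered callbacks land here, and the
// DBPROCESS pointer identifies which Process the message belongs to.
class Manager {
public:
    void attach(Process* p) { table[p->dbproc] = p; }
    void dispatch(DBPROCESS* dbproc, int msgno) {
        std::map<DBPROCESS*, Process*>::iterator it = table.find(dbproc);
        if (it != table.end())
            it->second->handler.onMessage(msgno);
    }
private:
    std::map<DBPROCESS*, Process*> table;
};
```

Routing through the Manager keeps the two DB-Lib callbacks in one place while still letting each process type carry a differently configured handler.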


DB-Library Message Data


Messages from the server or DB-Library consist of several items of data passed
as arguments to the callback functions. Table 1 lists the parameters of a
DB-Library error message. All data types are DB-Lib/SQL Server data types.
The first parameter is a handle to the DBPROCESS structure identifying the
process which caused the message event. The second parameter, a DBSMALLINT,
indicates the severity of the condition which caused the message. This value
ranges from EXINFO, an informational message, to EXCONSISTENCY, a severe
internal error in DB-Lib or SQL Server. The possible values for severity are
assigned to constants in the DB-Lib header files. The third parameter, another
DBSMALLINT, gives the number of the message. The range of possible values is
assigned to constants in DB-Lib's header files. The fourth DBSMALLINT gives
the operating-system error number if the message was generated by an
operating-system error. The last two data items are both far pointers to
null-terminated strings and are nonnull if valid. The first contains the text
of any DB-Lib error message, and the second, the text of an operating-system
error message.
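In outline, a DB-Lib error callback has the shape below. The parameter list follows Table 1, and dberrhandle() is DB-Library's actual registration routine, though both are stubbed with simplified types here so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstdio>

// Simplified stand-ins for the DB-Library declarations.
struct DBPROCESS { int id; };
typedef short DBSMALLINT;
enum { INT_CANCEL = 2 };   // one of DB-Lib's handler return codes

typedef int (*ERRHANDLER)(DBPROCESS*, DBSMALLINT, DBSMALLINT,
                          DBSMALLINT, char*, char*);
static ERRHANDLER g_errHandler = 0;
void dberrhandle(ERRHANDLER h) { g_errHandler = h; }   // stub

// Receives the six items of DB-Lib message data; the two string
// pointers are non-null only when they carry valid text.
int err_handler(DBPROCESS* dbproc, DBSMALLINT severity, DBSMALLINT dberr,
                DBSMALLINT oserr, char* dberrstr, char* oserrstr)
{
    (void)dbproc; (void)severity;
    if (dberrstr) std::printf("DB-Lib error %d: %s\n", (int)dberr, dberrstr);
    if (oserrstr) std::printf("OS error %d: %s\n", (int)oserr, oserrstr);
    return INT_CANCEL;   // ask DB-Lib to cancel the failed operation
}
```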


SQL Server Message Data


Messages from the server are more involved. Table 2 lists the parameters of a
server message. The first parameter is identical to that of a DB-Lib message:
a handle to a DBPROCESS. The second, a DBINT, contains the SQL Server message
number. Many of the message numbers have also been assigned to constants in
DB-Lib. The third parameter is a DBSMALLINT and is described by the
documentation as the "message state." 
The severity level of the message is passed in the fourth parameter. It is
ostensibly constrained to the same values as those in a DB-Lib message;
however, we sometimes saw severity levels beyond those documented. The next
three parameters are far pointers to null-terminated strings. The first
contains the text of the server message; the second, the name of the server.
If the message was generated by a stored procedure, then the third string
contains its name, otherwise it is null. If the procedure name is nonnull, the
last parameter, a DBSMALLINT, contains the number of the line in the procedure
which caused the error. This value is useful in debugging stored procs.
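The server-message callback can be outlined the same way. Its parameter list follows Table 2, and dbmsghandle() is DB-Library's real registration routine, stubbed here so the fragment is self-contained.

```cpp
#include <cassert>
#include <cstdio>

// Simplified stand-ins for the DB-Library declarations.
struct DBPROCESS { int id; };
typedef long  DBINT;
typedef short DBSMALLINT;

typedef int (*MSGHANDLER)(DBPROCESS*, DBINT, DBSMALLINT, DBSMALLINT,
                          char*, char*, char*, DBSMALLINT);
static MSGHANDLER g_msgHandler = 0;
void dbmsghandle(MSGHANDLER h) { g_msgHandler = h; }   // stub

// Server message data: number, state, severity, text, server name,
// stored-procedure name (null unless a procedure raised it), and the
// offending line within that procedure.
int msg_handler(DBPROCESS* dbproc, DBINT msgno, DBSMALLINT msgstate,
                DBSMALLINT severity, char* msgtext, char* srvname,
                char* procname, DBSMALLINT line)
{
    (void)dbproc; (void)msgstate; (void)srvname;
    std::printf("Server message %ld, severity %d: %s\n",
                (long)msgno, (int)severity, msgtext ? msgtext : "");
    if (procname)   // the line number is a debugging aid for stored procs
        std::printf("  in %s, line %d\n", procname, (int)line);
    return 0;
}
```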


The Message Structures


Message data is stored and transmitted between objects in our interface layer
in two structures declared in DBMSG.H (see Listing One). 
Neither of the structures exactly duplicates the message data described
previously. In particular, both structures omit the near pointer to the
DBPROCESS structure. The reason is that the DBPROCESS pointer is
used to identify the Process object managing the affected database connection;
the message data is then forwarded to it. Since every Process object contains
a pointer to its DBPROCESS structure, there is no need to retain that data in
the messages.
Both structures contain a flag called "received," which is an implementation
detail of the message pipeline. When a message arrives in the central Manager
class in the interface layer, it is sent to the appropriate process by calling
an interface function of the Process class. Some messages from the server are
accompanied by a notification message from DB-Lib; it is also possible for the
Process object to have delayed forwarding messages long enough for both
structures to have fresh message data. The received flag indicates when
message data has been copied into the structure. It is cleared by the handler
when the message(s) are processed.
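Since Listing One is only summarized here, the two structures can be approximated as below; treat this as a hedged reconstruction, with std::string standing in for the String class the article uses.

```cpp
#include <cassert>
#include <string>

// Approximation of the two structures declared in DBMSG.H; the
// DBPROCESS pointer is deliberately omitted, as described in the text.
struct DBLibMsgData {
    int         received;     // set when fresh data is copied in;
                              // cleared by the handler after processing
    short       severity;     // EXINFO .. EXCONSISTENCY
    short       dberr;        // DB-Lib message number
    short       oserr;        // operating-system error number, if any
    std::string dberrstr;     // DB-Lib message text
    std::string oserrstr;     // operating-system message text
};

struct ServerMsgData {
    int         received;
    long        msgno;        // SQL Server message number
    short       msgstate;
    short       severity;
    std::string msgtext;
    std::string srvname;
    std::string procname;     // empty unless raised by a stored procedure
    short       line;         // line within that procedure
};
```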
Lastly, the null-terminated ASCII strings have been used to initialize objects
of type String. Given the amount of text manipulation sometimes necessary, it
is useful to have a robust string class, and String provides a full complement
of features. (Documentation on String is provided electronically; see
"Availability," page 3.) 


The Message-Handling Classes


The message-handling classes need to be able to handle the following: 
Format and display message data.
Log message data to an error file on disk.
Retry certain errors.
Provide default handling for categories of messages.
Provide customized handling for specific messages.
Provide a simple interface for customization of handling.
The classes we created to do this are ErrStrategy and DBMsgHandler. Both
classes are declared in DBMSG.H (Listing One) and are implemented in DBMSG.CPP
(available electronically). Due to space constraints, I'll focus primarily on
the interfaces of these classes. 


The ErrStrategy Class


The ErrStrategy class was designed to encapsulate the definition of a handling
strategy (that is, the actions taken when a message is received), be
customized for a given message, and provide for added functionality.
The interface to ErrStrategy consists largely of constructors and functions
which access the data. There is no destructor required. Wherever possible, the
implementations of the member functions have been inlined. ErrStrategy is
intended to be copied and manipulated by the application, and we wanted it to
involve as little overhead as possible.
There are three constructors for ErrStrategy objects. The first is a default
constructor, taking no parameters. It initializes the object to some default
values, the result of which is a fairly useless invalid object. The values
assigned are the same as those for the ErrStrategy instance, ESZERO, that is
used as a NULL instance. The default constructor allows arrays of ErrStrategy
objects to be created. After creating an array of ErrStrategy instances, the
application should cycle through the array and set them all to a valid state.
The second constructor is the initialization constructor. It takes eight
parameters which completely define the handling for the message in question.
These arguments correspond to the class data members discussed previously. Of
note is the callFunc parameter, which is defaulted to NULL. This parameter
specifies a callback function for the message. If no callback is to be
specified, this argument can be omitted. The implementation of this
constructor consists entirely of a member-initialization statement given after
the argument list, with an empty function body.
Next is the copy constructor. Its sole parameter is a reference to a constant
ErrStrategy instance which is copied into the instance being constructed. This
constructor is actually implemented in terms of the next member function, the
assignment operator, which does the actual work. The operator function also
takes a constant reference to an ErrStrategy instance as its only argument, as
does the relational-equality operator function, which follows. This operator
returns nonzero if the instance matches the one for which the operator is
invoked. Two ErrStrategy objects are considered equal if they refer to the
same error source and number.
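The interface just described can be sketched in miniature. The following is a hypothetical reduction of ErrStrategy (not the DBMSG.H declaration), assuming only the behavior stated above: a default constructor that yields the same null state as ESZERO, an initializing constructor, and equality defined by error source and number alone.

```cpp
#include <cassert>

enum ErrSource { ES_DBLIB, ES_SERVER };

// Minimal sketch of the ErrStrategy idea: the default constructor yields
// a "null" instance, the initializing constructor defines a strategy, and
// equality compares only error source and number, ignoring other members.
class MiniStrategy
{
public:
    MiniStrategy() : source(ES_SERVER), errNo(-32768), retryCnt(0) {}
    MiniStrategy(ErrSource src, int num, unsigned retries)
        : source(src), errNo(num), retryCnt(retries) {}

    int operator==(const MiniStrategy& rhs) const
    { return source == rhs.source && errNo == rhs.errNo; }

    ErrSource source;
    int errNo;
    unsigned retryCnt;   // not considered by operator==
};
```

An application would default-construct an array of these and then cycle through it, assigning valid strategies, much as the text prescribes.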
The rest of the ErrStrategy interface consists of access functions for member
data that should be self-explanatory. Each function reads or writes a specific
member of the private data described earlier, and all are inlined for
performance reasons.
With the ErrStrategy class, we have an object which can represent the
customized handling of messages. It can be assigned to, copied, compared,
created in arrays, and manipulated through interface functions. We now need a
class which puts the ErrStrategy, along with all the other details already
described, to use. The next section will examine the DBMsgHandler class, which
is the core of the message-handling system.


The DBMsgHandler Class


The DBMsgHandler class is responsible for providing the mechanisms to handle a
message, track retry attempts, manage a list of custom strategies, and allow
strategies to be added or removed. DBMsgHandlers have full copy and assignment
semantics so that they can be easily shared among processes, or copied and
modified using a default "base handler" as a reference. 
The PendingErr structure is declared in the protected interface of
DBMsgHandler, and each handler contains a single instance of this struct
called "pending." The struct is used to hold all of the data on the message
currently being processed. The source of this data is an ErrStrategy instance,
if one has been defined for the message, or the default handling strategy if
no custom strategy is available. In addition, PendingErr holds other control
values needed by the handler. It serves as the basic control structure for
handling messages, and its operation will be examined in more detail in the
description of the HandleMsg member function.
The interface to DBMsgHandler begins with two constructors. The first takes a
single parameter of type String, specifying the name to use for the
message-log file. This parameter defaults to an empty string (""), allowing
the constructor to serve as a default constructor. The log-file name defaults
to SQLERROR.LOG if this parameter is empty. The implementation of the
constructor clears the PendingErr structure and initializes it to a NO ERROR
condition, as defined in DBMSG.H. It then sets the default handling strategy
by setting the action levels using the constants described earlier. Lastly,
the name of the log file is set.
The remaining constructor is a copy constructor implemented in terms of the
assignment-operator function. The assignment operator takes a single argument,
a reference to a constant DBMsgHandler object, and performs the usual
assignment of member data, including copying the source instance's ErrStrategy
list. One twist in the mechanism is required: If the handler being assigned to
has an error currently being retried, the error is flushed before the member
assignment is done. Otherwise, a message might be lost when the source object
was copied into the target. The assignment operator, and thus the copy ctor,
copy the complete state of the source handler, including any pending errors.
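The flush-before-assign twist can be sketched as follows; MiniHandler, BeginRetry, and the vector standing in for the disk log are illustrative assumptions, not part of DBMSG.H.

```cpp
#include <cassert>
#include <vector>

// Sketch of the assignment "twist" described in the text: if the target
// handler has a message mid-retry, that message is flushed (here, pushed
// onto a stand-in log) before its state is overwritten, so no message is
// lost when the source handler is copied into the target.
class MiniHandler
{
public:
    MiniHandler() : retrying(false), pendingErr(0) {}

    void BeginRetry(int errNo) { retrying = true; pendingErr = errNo; }

    MiniHandler& operator=(const MiniHandler& src)
    {
        if (retrying)                       // flush first, so the message
            flushed.push_back(pendingErr);  // being retried is not lost
        retrying = src.retrying;            // then copy complete state,
        pendingErr = src.pendingErr;        // including any pending error
        return *this;
    }

    bool retrying;
    int pendingErr;
    std::vector<int> flushed;   // stand-in for the disk log
};
```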
The destructor for DBMsgHandler has the sole responsibility of cleaning up the
list of ErrStrategy instances. It does so using the ClearStrategies member
function. The destructor was made virtual because it was foreseen that more
specialized message handlers might be derived from DBMsgHandler in the future.


The Message-Handling Mechanism


The public-member function HandleMsg represents the mechanism for applying
handling strategies to messages from the server or DB-Lib. This function is
called by the Process object when a message is received, and passed two
message structures of the types described previously. HandleMsg returns a
value of type SqlAction, as defined in DBMSG.H. SqlAction provides three
constants which allow the handler to inform the application about the status
or consequences of processing a given message. These codes are SA_PROCEED,
SA_RETRY, and SA_CANCEL.
If a message has not been defined as requiring a retry, HandleMsg will process
the message and return either SA_PROCEED or SA_CANCEL. The first code tells
the application that the message was informational or nonfatal and that it can
proceed with the current task. The second represents handling of a fatal error
and instructs the application to terminate the current task. The code returned
depends either on the terminate-severity level, when the default strategy
handles the message, or on the Boolean proceed flag, when a custom strategy
handles it.
The SA_RETRY code is returned when a message is caused by an operation which
should be retried. The number of retries allowed is defined in either the
default strategy or in a custom strategy for the message. The HandleMsg
function will return SA_RETRY on each receipt of this message until the retry
count is matched or until a different message is received. Receipt of a new
message during a retry cycle causes the existing message to be flushed before
the new message is processed. In either case, the return value is ultimately
set to either SA_PROCEED or SA_CANCEL.
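The retry accounting just described might look like this in miniature; RetrySketch and its members are assumptions made for illustration, not the actual HandleMsg implementation.

```cpp
#include <cassert>

enum SqlAction { SA_CANCEL, SA_PROCEED, SA_RETRY };

// Sketch of the retry cycle: the handler returns SA_RETRY on each receipt
// of the same message until the allowed count is reached, then resolves to
// SA_PROCEED or SA_CANCEL. A new message flushes the old cycle first.
class RetrySketch
{
public:
    RetrySketch(unsigned allowed) : retryCnt(allowed), retry(0), errNo(0) {}

    SqlAction HandleMsg(int msgNo, bool fatal)
    {
        if (msgNo != errNo)             // new message: flush old cycle
        { errNo = msgNo; retry = 0; }
        if (retry < retryCnt)
        { ++retry; return SA_RETRY; }   // retry count not yet matched
        return fatal ? SA_CANCEL : SA_PROCEED;
    }

    unsigned retryCnt, retry;
    int errNo;
};
```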


HandleMsg Operation


The HandleMsg member function operates as a state machine controlled by the
current contents of the pending struct, which contains all of the information
required to handle the message currently being processed. This includes all of
the data from either the default or custom strategy, as well as the retry
counter, copies of the two message structures, and the message source and
number. During handling, the new message passed into the function by the
Process object will be compared with the pending message data to determine
what action to take next. The first task is to determine what kind of message
the new data represents.
In most cases only one of the two types of messages will be sent to the
application in response to a given operation. The only case we saw where two
messages were received was when a server message caused DB-Lib to send the
SQLESMSG notification. This message informs the app that a server message is
coming. There may be other cases when both messages are sent, and the design
of our message handler has assumed that other messages from DB-Lib always
constitute an error condition. As mentioned earlier, some server messages are
merely informational.
The HandleMsg function needs to have one message source and number in order to
choose a strategy. If both kinds of messages need to be dealt with and the
DB-Lib message is SQLESMSG, then it is ignored and the server message is
handled. Otherwise, the DB-Lib message is handled. In either case, when action
is taken to display a message or flush a message to the disk log, any data in
either structure that is new since the last call to HandleMsg will be output.
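The selection rule can be sketched as a small decision function; the SQLESMSG value and the ChooseSource helper are stand-ins, since the real constant comes from the DB-Library headers.

```cpp
#include <cassert>

const int SQLESMSG = 10025;   // stand-in value; the real one is in sqlfront.h

enum Source { SRC_DBLIB, SRC_SERVER };

// Sketch of the source-selection rule: when both a DB-Lib and a server
// message are present and the DB-Lib message is only the SQLESMSG
// notification, handle the server message; otherwise handle DB-Lib's.
Source ChooseSource(bool haveDbLib, bool haveServer, int dbLibErr)
{
    if (haveDbLib && haveServer && dbLibErr == SQLESMSG)
        return SRC_SERVER;    // SQLESMSG merely announces the server message
    if (haveDbLib)
        return SRC_DBLIB;     // any other DB-Lib message is an error
    return SRC_SERVER;
}
```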


Conclusion


An application can combine several techniques to handle messages using
DBMsgHandler and ErrStrategy. A simple loop controlled by the result of the
HandleMsg member function will automate retries of those errors that allow
them, while those that don't will return codes that trigger the app's own
error handling. Figure 2 illustrates an example of an
application interacting with the message handler using this method. As an
alternative, the application can define a custom strategy that specifies a
callback function and use this function to take some action, either in place
of, or in addition to, the action defined in the strategy. 
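The loop described above can be sketched as follows; ToyHandler and RunTask are illustrative stand-ins for the application's task code and a real DBMsgHandler.

```cpp
#include <cassert>

enum SqlAction { SA_CANCEL, SA_PROCEED, SA_RETRY };

// Toy handler standing in for DBMsgHandler: it asks for two retries and
// then cancels, so the control flow of the loop can be exercised.
struct ToyHandler
{
    int calls;
    ToyHandler() : calls(0) {}
    SqlAction HandleMsg() { return ++calls <= 2 ? SA_RETRY : SA_CANCEL; }
};

// Sketch of the application loop: re-attempt the operation while the
// handler returns SA_RETRY, then branch on the final code.
int RunTask(ToyHandler& h)
{
    SqlAction act;
    do {
        act = h.HandleMsg();            // re-attempt on SA_RETRY
    } while (act == SA_RETRY);
    return act == SA_PROCEED ? 0 : -1;  // -1: invoke app's own handling
}
```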
Figure 1 SQL Server and DB-Library messages received by the Manager class are
forwarded to individual processes and then on to a message handler.
Table 1: Parameters of a DB-Library message.
 Data type    Purpose
 DBPROCESS    Structure identifying process which generated message.
 DBSMALLINT   Severity level of exception; falls between EXINFO
              and EXCONSISTENCY.
 DBSMALLINT   DB-Library error number; nonzero if DB-Lib error occurred.
 DBSMALLINT   Operating-system error number; nonzero if operating-system
              error occurred.
 LPSTR        Pointer to null-terminated string containing DB-Lib error
              message if DB-Lib error number is nonzero.
 LPSTR        Pointer to null-terminated string containing operating-system
              error message if operating-system error number is nonzero.
Table 2: Parameters of an SQL-Server message.
 Data type        Purpose
 DBPROCESS NEAR*  Pointer to DBPROCESS structure identifying process which
                  generated message.
 DBINT            SQL-Server message number.
 DBSMALLINT       Message state.
 DBSMALLINT       Severity level of exception; falls between EXINFO
                  and EXCONSISTENCY.
 LPSTR            Pointer to null-terminated string containing text of
                  server message.
 LPSTR            Pointer to null-terminated string containing name of
                  server.
 LPSTR            Pointer to null-terminated string containing name of
                  process generating message.
 DBSMALLINT       Number of the line in the above process which caused the
                  message to be generated.
Figure 2 An application interacting with the DBMsgHandler class using the
HandleMsg() member function to respond to server or DB-Lib messages.

Listing One 

//************************************************************************
// DBMSG.H Class, data structure, and constant declarations for SQL
// Server/DB Lib message handling -- by Mark Betz
//************************************************************************

#ifndef DBMSG_H
# define DBMSG_H

# include <windows.h> 
# define DBMSWIN // DB Library needs this for Windows
 extern "C" // so the linker doesn't look for
 { // mangled names
 #include <sqlfront.h> // Microsoft includes for DB Library
 #include <sqldb.h>
 }
# include <string.h> // string class
// action messages returned by DBMsgHandler after processing a server or
// DB Library message.
enum SqlAction
{
 SA_CANCEL, // exit current procedure
 SA_PROCEED, // proceed, non-fatal or informational
 SA_RETRY // retry last operation
};
// error codes for use within DBMsgHandler and related classes. These
// represent errors which occur in the database interface. Server and
// DB Lib errors are signaled through the DBMsgHandler class.
enum DBErr
{
 DB_OK, // no error
 DB_ALLOCFAILED, // memory allocation failed
 DB_IOERR // file or device i/o error
};
// error-source constants, used by DBMsgHandler

enum ErrSource
{ 
 ES_DBLIB, // error source was DB Library
 ES_SERVER // error source was SQL Server
};
// error-display handling constants, used by DBMsgHandler
enum ErrDisplay
{
 ED_ALERTONLY, // display an error alert/no info
 ED_BRIEF, // display error text info only
 ED_VERBOSE // display all error info
};
// default action levels for handling errors without custom strategies. These
// constants define the severity levels at which certain actions will occur,
// and the number of retries allowed.
const int DEF_DISPLAY_LEVEL = EXCONVERSION; // display severity >=
const int DEF_TERM_LEVEL = EXUSER; // terminate severity >=
const int DEF_WRITE_LEVEL = EXUSER; // disk log severity >=
const ErrDisplay DEF_DISPLAY_TYPE = ED_VERBOSE; // default display handling
const int DEF_RETRY_CNT = 0; // default retries
// structure for DB_lib error messages, used by DBMsgHandler
struct ErrorStruct
{
 int received; // true if error message received
 int severity; // error severity
 int dberr; // DB error code
 int oserr; // operating system error code
 String dberrstr; // DB error message text
 String oserrstr; // OS error message text
};
// structure for SQL Server messages, used by DBMsgHandler
struct MessageStruct
{
 int received; // true if server message received
 int msgno; // server message number
 int msgstate; // server message state
 int severity; // message severity
 String msgtext; // server message text
 String server; // name of server issuing message 
 String process; // name of process causing message 
 int lineno; // line of process causing message
};
// function pointer type used in ErrStrategy
typedef void (*SqlErrCall)(MessageStruct&, ErrorStruct&);
class DBMsgHandler; // forward declaration
// SQL/DB Lib error strategy class. Used in DBMsgHandler to set custom
// strategies for handling DB Lib and SQL Server errors.
// IMPLEMENTATION: DBMSG.CPP
class ErrStrategy
{
 friend class DBMsgHandler;
public:
 ErrStrategy();
 ErrStrategy( ErrSource src, int num, int retCnt, bool show, bool notFatal,
 bool log, ErrDisplay disp, SqlErrCall callFunc = NULL );
 ErrStrategy( const ErrStrategy& );
 void operator = ( const ErrStrategy& );
 int operator == ( const ErrStrategy& );


 void SetSource( ErrSource src ) { source = src; }
 ErrSource GetSource() const { return source; }

 void SetErrNo( int num ) { errNo = num; }
 int GetErrNo() const { return errNo; }

 void SetRetryCnt( unsigned retCnt ) { retryCnt = retCnt; }
 unsigned GetRetryCnt() const { return retryCnt; }

 void SetDisplay( bool show ) { display = show; }
 bool GetDisplay() const { return display; }

 void SetProceed( bool procd ) { proceed = procd; }
 bool GetProceed() const { return proceed; }

 void SetWrite( bool log ) { write = log; }
 bool GetWrite() const { return write; }

 void SetDispType( ErrDisplay dispt ) { dispTyp = dispt; }
 ErrDisplay GetDispType() const { return dispTyp; }
private:
 ErrSource source; // the error source, ES_SERVER or ES_DBLIB
 int errNo; // the error number
 bool display; // display the error message
 bool proceed; // ok to proceed after handling
 bool write; // flush the error to a disk file
 unsigned retryCnt; // number of retries allowed
 ErrDisplay dispTyp; // how error display is handled if display == TRUE
 SqlErrCall callBk; // function called on this error
 ErrStrategy* next; // next strategy in the list
};
// for comparing against after an operation on strategies. MSGIMP is
// defined in DBMSG.CPP

#ifndef MSGIMP
 extern ErrStrategy ESZERO;
#else
 ErrStrategy ESZERO(
 ES_SERVER, -32768, -1, FALSE, FALSE, FALSE, ED_VERBOSE, NULL);
#endif
// DBMsgHandler class. This class contains all the logic for handling
// errors using default and custom strategies. 
// IMPLEMENTATION: DBMSG.CPP
class DBMsgHandler
{
public:
 DBMsgHandler( const String& logName = "" );
 DBMsgHandler( const DBMsgHandler& );
 virtual ~DBMsgHandler();
 void operator = ( const DBMsgHandler& );

 void SetDisplayLevel( int severity );
 int GetDisplayLevel() const { return displayLevel; }

 void SetTermLevel( int severity );
 int GetTermLevel() const { return termLevel; }

 void SetWriteLevel( int severity );
 int GetWriteLevel() const { return writeLevel; }


 void SetRetryCount( unsigned retCount ) { retryCnt = retCount; }
 unsigned GetRetryCnt () const { return retryCnt; }

 void SetDisplayType( ErrDisplay dispt ) { dispTyp = dispt; }
 ErrDisplay GetDisplayType() const { return dispTyp; }

 DBErr GetStatus();

 ErrStrategy AddErrStrategy( const ErrStrategy& );
 ErrStrategy GetErrStrategy( int errNum, ErrSource source );
 ErrStrategy DelErrStrategy( int errNum, ErrSource source );
 DBErr LoadStrategies( const ErrStrategy* strats, int count,
 bool clear = FALSE);
 void ClearStrategies();

 virtual SqlAction HandleMsg( const ErrorStruct&, const MessageStruct& );
protected:
 struct PendingErr
 {
 ErrSource source;
 int errNo;
 unsigned retry;
 unsigned retryCnt;
 bool display;
 bool write;
 bool proceed;
 ErrDisplay dispType;
 ErrorStruct es;
 MessageStruct ms;
 SqlErrCall callf;
 } pending;
private:
 SqlAction ResolveErr();
 bool IsPending( ErrSource msgSource, int msgNum );
 void SetMsgData( const ErrorStruct&, const MessageStruct& );
 void NotifyUser();
 DBErr WriteLog();

 int displayLevel;
 int termLevel;
 int writeLevel;
 int retryCnt;
 ErrDisplay dispTyp;
 ErrStrategy* stratList;
 String log;
 String message;
 DBErr status;
};
#endif // DBMSG_H












November, 1994
Object Databases


Object methods in distributed computing




Jonathan Wilcox


Jonathan is president of Menai Corp., a producer of object-oriented
programming tools. He can be contacted at jonathan@menai.com.


Distributed computing and object-oriented programming are central to the
emerging class of computing platforms. Architectures like COM, CORBA, and
OpenDoc, as well as formal and de facto standards such as OLE2, DOE, DCE,
DSOM, ODMG-93, and COSS, all target the intersection where distributed systems
meet objects. Numerous object-database research projects and a dozen or so
commercial object databases are evolving. To date, the implementations have
been based on a wide variety of approaches, and no single one has emerged as
dominant. 
As usual on the cutting edge, confusion is the one constant. Consequently, in
this article I'll examine a number of issues concerning object methods and
object databases. In particular, I discuss where method code is located, how
it is loaded, and where it is executed.
The combination of object databases (ODB) and object-oriented applications
produces two discernible categories of class methods: those specific to
applications, and those relating to database management and data manipulation.
For example, a chemical-formula calculation is application specific, while a
search for an object having a particular attribute value is database specific.
To distinguish between these categories, I'll refer to them as "application
methods" and "ODB methods." It is important not to confuse these labels with
the site of method execution. Some ODB products actually execute their ODB
methods as part of the application, and others execute application methods as
part of an ODB server process; see Figure 1.


ODB Implementations


To be dealt with as instances of a class, objects must be recognizable.
Practical implementations of object technology depend upon instance tagging
for object identification. The tag is placed at a defined position within the
structure of the instance data, where hardwired code of an application (or of
an ODB stub linked into the application) expects to find it. The most commonly
used tags are unique object identifiers (OIDs), class names or numbers, and
schema-version identifiers. Obviously, a control system must exist to prevent
redundant use of OIDs and enforce correct use of class and version tags.
Given a class name or number, the client application can reference the class
methods by finding that name or number with a program-loader mechanism. In
IDB, from Persistent Data Systems (Philadelphia, PA), for example,
distributed-object database instances include a header with a class identifier
that provides the leverage for dynamic dispatch of methods by indirect call
through a vector of method addresses.
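Tag-based dispatch of this kind can be sketched as follows; the header layout, table shape, and method names are assumptions for illustration, not IDB's actual structures.

```cpp
#include <cassert>
#include <string>

// Sketch of tag-based dynamic dispatch: each instance carries a class
// identifier at a defined position in its header, and methods are invoked
// by indirect call through a per-class vector of function addresses.
typedef std::string (*Method)(const void* instance);

std::string FormulaName(const void*)  { return "formula"; }
std::string ReactionName(const void*) { return "reaction"; }

Method methodTable[2][1] = {   // indexed by [class id][method slot]
    { FormulaName },
    { ReactionName }
};

struct Header { int classId; };   // the instance tag

std::string CallName(const Header* obj)
{
    return methodTable[obj->classId][0](obj);   // indirect dispatch
}
```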
Given that the class instance is recognized, where can you find the executable
code of a method of that class? The simple answer is that it must be found
locally, in the form of a linked library, or in database storage, or in a code
library at some network node. This can be done using an automatic
code-retrieval mechanism or remote function server. If the code is installed
locally but the instances are supplied by an object database, the database is
"passive;" if both the instances and the code are supplied by an object
database, it is "active." 
However, you can't describe ODB implementations in terms of a simple
active/passive dichotomy. Further distinctions must be made between
data-management methods and application-specific methods, and between
client/server architecture and peer architecture. These distinctions are hard
to define, because many software implementations allow application designers
to choose either approach.
If the binary code to implement the identified method is not available at the
application site (in the same executable or an accessible library), the
identification of its location can be supported by an accessible name-server
process. Some ODBs employ a universal catalog of methods; others use a
domain-name approach.
An application program need not retrieve remote class instances or remote
method code. It is sufficient that the application can identify an object or
class of objects located elsewhere, because the application can send a message
to the remote server requesting that class methods stored at the server be
invoked with reference to the class instances also stored there. Indeed, in
the world of distributed computing, it may be hard to know just where
execution takes place, because instances and methods may be dispatched to a
node where the compute load is temporarily low.
The Itasca object database from Itasca Systems (Minneapolis, MN) stores both
source and binary-method code in servers where method execution takes place.
This allows active distribution of the computation load among Itasca server
nodes. In a forthcoming version of Itasca that will support heterogeneous
servers, the system will provide automatic compilation/linking of source code
as needed on the execution platform.
There appear to be four design approaches to management of methods in
distributed ODB systems: 
A local code library is directly linked to the executing or ODB server
program. 
The ODB stores reference information to identify the method and the linkable
library where it may be found. This information is passed to a conventional
program loader, usually within the application program, but sometimes within
the ODB server. Occasionally, the method is stored and executed on a remote
network node.
The ODB stores script or intermediate computer code, which is delivered to an
interpreter or incremental compiler built into the server or the application. 
The ODB stores binary code native to the execution platform and involves an
extended program loader on the part of the application to retrieve the code
objects.
A "local" code library is generally a DLL or shared library, although it may
be statically linked. It may only appear to be local to the process that loads
it, because it may be in a remote directory that has been mounted locally by a
network facility such as NFS from Sun Microsystems. The "executing process"
may be an application process, an ODB server process, or both. The ODB may or
may not be implemented by a server process. Finally, the ODB designers may
have employed more than one of these approaches and given application
developers the same option. Certainly, this isn't a subject that yields easy
classifications.


Design Categories


The local-library design is the most conventional. It relies on
method-resolution facilities supplied by the computer language and
program-loader facilities included with standard computer libraries. In C++,
for example, the available methods are known at compile time and accessed
through a vector table associated with the class of a given object. If the
method is not yet in memory, its library file and method name are accessed
from the program's static-memory area and supplied to the program-loader code.
The loader maps the referenced binary code into memory and performs "fixups"
that change internal executable binary code references into memory addresses.
When the library appears to be local as a result of NFS mounting, the same
routine occurs.
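A fixup of this kind can be sketched in miniature; the Image structure is an illustrative assumption, reducing a real loader's relocation records to a single offset.

```cpp
#include <cassert>

// Sketch of a loader "fixup": a reference inside a loaded image is stored
// as an offset from the image base; once the base address is known, the
// loader rewrites the offset into an absolute memory address.
struct Image
{
    char bytes[64];    // the mapped code/data
    int refOffset;     // stored file-offset reference
    char* refAddress;  // produced by the fixup
};

void Fixup(Image& img)
{
    img.refAddress = img.bytes + img.refOffset;   // offset -> address
}
```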
An extreme example of the local-library design is found in the Persistent Data
Systems IDB object database. All application and ODB methods are compiled into
the application. There is no server process. The database consists of a set of
files to which coordinated access is shared by cooperating applications. The
POET object database, from Poet Software (Santa Clara, CA), uses the same
approach. 
A less extreme example is the O2 object database by O2 Technology (Mountain
View, CA). O2 uses local-library methods in association with a page-server
process that is blind to the semantics of the objects that it handles. The
object semantics are managed by code linked into the application.
Similarly, Versant, from Versant Object Technology (Menlo Park, CA), relies
upon ODB and application methods linked to the application. A subset of the
ODB methods operates by remote procedure call to the ODB server process, which
manages only object instances, not method references.
The second design category is exemplified by the many object databases that
employ database-server processes and rely on local libraries for methods. Most
of these tools store method references in the database; however, they rely on
the application to employ these references to find the code in local
libraries. Examples include UniSQL, from UniSQL Inc. (Austin, TX); ONTOS, from
ONTOS Inc. (Burlington, MA); Objectivity/DB, by Objectivity Inc. (Mountain
View, CA); Matisse, from ADB Inc. (Cambridge, MA); and EasyDB, from Basesoft
Open Systems (Kista, Sweden). OpenODB, from Hewlett-Packard (Palo Alto, CA),
also fits into this category (more on it shortly). 
UniSQL stores the name and location of method code in the ODB and passes the
information to an extension of the application-program loader. UniSQL uses NFS
to make remote libraries appear local. ONTOS and Objectivity/DB do likewise,
with ONTOS storing method references in "procedure objects" and Objectivity/DB
maintaining a catalog of schemata and methods. In EasyDB, a run-time view of
the data dictionary is used to look up methods for application loading from a
local library. Illustra (formerly Montage), by Illustra Information
Technologies (Oakland, CA), can be used to manage application methods
according to this second category, though it is usually employed to manage ODB
method code for ODB server execution.
The third design category employs an interpreter facility linked to a process
that retrieves script or compiled intermediate code from the ODB. O2 and ONTOS
exemplify this approach. O2 provides an optional, incrementally compiled 4GL
called O2C, which can be used to develop database-storable ODB
methods. Because the O2 server is only a page server, the O2C methods must be
loaded to run in an interpreter linked with the application process. ONTOS
provides an optional, storable "method object" that records combinations in
which application-linked methods should be executed, thus emulating pre- and
postcondition triggers and method wrappers.
The fourth design category employs database storage of compiled binary code.
This code may be retrieved for execution by the database-server process and/or
by the application. In each circumstance, an extension of the conventional
program loader is required to retrieve the code and possibly to "swizzle"
internal OID references into memory addresses if the binary code is stored as
a class of objects.
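Swizzling can be sketched as a two-pass substitution; the Node layout and OID type here are illustrative assumptions, not any particular ODB's format.

```cpp
#include <cassert>
#include <map>

// Sketch of "swizzling": persistent references are stored as OIDs, and an
// extended loader replaces each OID with the in-memory address of the
// object it names, once that object is resident.
typedef unsigned long Oid;

struct Node
{
    Oid selfOid;
    Oid nextOid;   // reference as stored on disk
    Node* next;    // filled in by swizzling
};

void Swizzle(Node* objs, int count)
{
    std::map<Oid, Node*> resident;   // OID -> memory address
    for (int i = 0; i < count; ++i)
        resident[objs[i].selfOid] = &objs[i];
    for (int i = 0; i < count; ++i)   // rewrite each stored OID reference
        objs[i].next = resident.count(objs[i].nextOid)
                         ? resident[objs[i].nextOid] : 0;
}
```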
Because it uses binary-code storage, OpenODB requires that the code be
developed in the OPL language, which compiles to binary. These methods execute
on the ODB server. ODB storage of binary code is also a developer-specified
option with Itasca, which also requires that stored code be executed by the
ODB server process. OpenODB and Itasca give you the option of storing
application methods or linking them to the application for application-site
execution. OpenODB also achieves that objective by storing a reference to the
method as a remote binary located at the application node and employing an
external function server at the application's node to launch the method.
An extension of the binary-code storage approach is the storage of references
to executable binaries stored in associated databases or available for
automated loading and execution on network nodes. Invoking such a method is
usually location transparent to an application because it is managed by the
ODB server, perhaps with the aid of a function server (automatic program
loader) at a cooperating execution node. This capability exists in OpenODB.
Analogous to database storage of intermediate method code are ODBs that use
the Smalltalk environment. With Gemstone, from Servio (Beaverton, OR), you can
define new methods in Smalltalk or C, and these are stored and executed at the
ODB server. In ObjectStore for Smalltalk from Object Design (Burlington, MA),
the ODB extends the Smalltalk Virtual Machine to obtain demand paging into
applications of any Smalltalk object, which can include methods.


Native Code


Finding executable code and mapping it into memory for execution is the role
of a program loader. Operating systems have program loaders that spawn
processes, and programs have program loaders that map program components into
memory. Overlays, DLLs, and shared libraries are also mapped into memory by
program loaders. Thus, software mechanisms that find and retrieve executable
code from a different network node may be viewed as extensions of program
loaders.

In a network of homogeneous computers that actually execute methods, no
complexities are involved in retrieving method code from a local or remote
node. If the execution takes place only on servers, then only the servers need
to be homogeneous; if the execution takes place only on clients, then only the
clients need to be homogeneous. In all of these instances the same binary code
suffices.
In a network of heterogeneous computers that execute the same methods,
however, obvious problems arise. If methods are to be accessed locally (in
DLLs, for example), manual installation of appropriate libraries may
circumvent the problem. But in the case of a remote server, which may be
accessed by more than one kind of computer seeking executable method code,
some provision must be made to supply the right kind of binary. The case is
essentially the same when a client may invoke method execution on multiple,
dissimilar computers that may not use local libraries.
A number of solutions have been tried: 
Maintain at each code server every version of the binary that could be needed
at any execution site, and require each program-load message to identify the
binary version needed.
Maintain a local intermediate-code interpreter at each execution site and
respond to a code request with the intermediate code or incremental compiler.
Maintain source code at the code server and provide automated compile/link
services appropriate to any node that requests method code and has not yet
received an appropriate binary.
The first alternative is exemplified in the IDB distributed-object database,
in which the server keeps "operations" files for each supported platform.
These files are distinguished from one another by a naming convention (for
example, foo.dll for Windows, foo.on for NeXT, and foo.om for Macintosh). The
second alternative is exemplified by Smalltalk systems, such as ObjectStore
and Gemstone, which use byte code inherent in Smalltalk. Products using other
languages offer this approach as well; for example, HP's OpenODB offers the
OPL programming language. The third alternative is planned for Itasca, which
stores both source and executable code of methods in the database. (At
present, Itasca executes all distributed methods on homogeneous servers.)
Any code-retrieval mechanism requires a means for an application to
unambiguously reference an operation for which a method has been defined. For
DLLs and shared libraries, the reference is generally to a library filename
and specifically to a function or procedure name distinguished according to
class. In C, the distinction can be maintained by a naming convention; in C++,
it is automatically supported by class scoping and function-name mangling. 
When method code is (or appears to be) situated locally to the executing
process, the program loader for the generating system or process will locate
the code by filename and offset within that file. The program loader will
perform "fixups" that change file-offset references into memory-location
references. Shared libraries and DLLs add an intermediate name-lookup step to
secure the offset of the code within the library. The difference from loading
non-object-oriented function code is that the method may have been called
through a C++ vtable (virtual function table) or a C function-pointer array.
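Dispatch through a C-function pointer array can be sketched as follows; the method table and its two operations are hypothetical:

```cpp
#include <cassert>

// Each "class" owns a table of operation pointers; a method is located
// by operation slot rather than by a compiled-in symbol reference.
typedef int (*Method)(int);

static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

// Hypothetical method table for one class: slot 0 = twice, slot 1 = square.
static Method methodTable[] = { twice, square };

int invoke(int slot, int arg) { return methodTable[slot](arg); }
```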
When method code is stored in an object database, more complex techniques must
be used. Methods must be referenced in the database via an object instance
known to the application. Some characteristics of the object instance must
provide the key to accessing an associated method through database functions.
This is often accomplished through use of a class tag, schema version tag,
and/or unique object identifier, any one of which may provide a reference to
the methods of the class. The routine for object-method resolution begins
after the application code has selected an object of a particular class and an
operation upon that object, so that the remaining activity pertains only to
locating the code for the designated method. 
The class-instance and method-locating routines can proceed from the same
starting point, rather than locating first the instance and then the method, if
the searches are based on a unique OID. In such a design, the OID may serve as
a key into an
index that has two separate values associated with that key: a reference to
the instance-storage site and a reference to the method-storage site. Often,
however, the first reference is to a class or schema representation.
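Such a two-valued OID index might be modeled as below; the OID type, the IndexEntry fields, and the sample storage-site strings are assumptions for illustration, not any product's actual layout:

```cpp
#include <cassert>
#include <map>
#include <string>

typedef unsigned long OID;

// One key, two associated values, as described in the text.
struct IndexEntry {
    std::string instanceSite; // where the object's data is stored
    std::string methodSite;   // where its class's method code is stored
};

std::map<OID, IndexEntry> oidIndex;

void seed() {
    IndexEntry e = { "vol1/seg3", "lib/shape.so" }; // hypothetical sites
    oidIndex[42] = e;
}

IndexEntry lookup(OID oid) { return oidIndex[oid]; }
```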
A class tag or schema-version tag is unique to a class, rather than to an
object instance, so the tag must be associated with the instance. This might
be accomplished by physically storing the tag at a position within the instance
that is known to the database implementation, or by an index that associates
it with a unique OID that is physically stored with the instance. The index
may reference the schema itself, or it may reference an intermediate index to
secure the schema version of the object if multiple, concurrent schemata are
supported. Thereafter, the class or schema provides a further reference to the
location of stored method code. 
A schema is a collection of object types that describe the physical storage
layout of object-data instances of a given class. Methods can be included in a
schema directly or by reference to a list of methods. When included in a list,
methods are typically described by signatures that include such information as
name, argument types, return types, exception types, and a reference to the
physical location of the method code in the database or in a linkable library.
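A method-signature entry of the kind just described might look like this; the field names and the matches helper are illustrative assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A signature record: name, argument types, return type, exception types,
// and a reference to the physical location of the method code.
struct MethodSignature {
    std::string name;
    std::vector<std::string> argTypes;
    std::string returnType;
    std::vector<std::string> exceptionTypes;
    std::string codeLocation; // database segment or library file name
};

// Resolve an operation by name and arity against one signature.
bool matches(const MethodSignature& s, const std::string& name,
             std::size_t arity) {
    return s.name == name && s.argTypes.size() == arity;
}
```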
It is not essential that an OID be retained physically together with an
instance, as long as an index is maintained that associates the OID value with
a physical storage site and a class or schema that describes the instance.
This would be hazardous, however, due to the risk of index corruption. Keeping
the OID physically together with the instance allows the index to be rebuilt.
It is unrealistic to be categorical about any of the designs. First of all, a
reference may be direct or indirect at any stage where a reference is used.
Second, the tags and OIDs are often used in parallel for reasons that relate
to support of legacy code and databases. Third, the terminology of
object-oriented design is not settled, so differing meanings and roles of
class, metaclass, schema, schema version, object type, object version, and
method version are embodied in a great variety of ODB implementations.
For example, in some databases it is possible to define and store
data-manipulation methods as "method objects" within the database itself; but
at least one significant maker considers this to be an impure concept and
insists that database methods should only be accessed from linkable libraries.
It may be gratuitous to observe that the practical difference between the two
is that the program loaders differ in their implementation details.


Conclusion


The variety displayed in ODB implementations is at once confusing and
encouraging. We may now expect to see a convergence of particular designs with
application needs to which they are well suited.
Agents and Object Databases
Object databases typically support well-defined applications that access
objects of known classes. The methods of those classes are defined with
precise syntax. A very different run-time environment exists for programs
known as "agents." Agents are being developed as personal productivity
assistants (for example, to cull news items for a particular reader or to
perform database research) and for many other uses. Truly useful agents must
respond to complex and variable messages and often must delegate tasks to
other agents. This circumstance invites the development of generalized message
languages which can be interpreted and generated by agents. Agents may be
viewed as engines that respond to messages of interest (others being ignored),
where the messages state or satisfy a need rather than carry out a procedure.
In contrast, the typical application that uses an object database will fail or
generate an error condition if a malformed or unexpected message is received
by an object.
--J.W.
Figure 1 (a) Library method executed at client site; (b) stored method
executed at ODB-server site; (c) stored method executed at client site.




































November, 1994
Database Management in C++


A single interface to multiple file formats




Art Sulger


Art specializes in database administration, analysis, and programming for the
state of New York. He can be contacted on CompuServe at 71020,435.


Because individual database formats generally require that you write
individualized code, programming to access database files can quickly become
unnecessarily complex. In this article I present a class library which
provides a single interface to multiple database file formats. In addition to
freeing you from code duplication, this class structure allows your DBMS to
support new data types such as those used with multimedia. (Imagine your xBase
files holding sound and pictures!) 
My original intent was to design a class structure for accessing xBase files.
The structure had a parent Database class to provide portability; a Table
class, in turn, descended from that. The Table class reads the xBase header
and handles the opening and closing of files. The Column class implements the
"table-has-columns" relation.
This design worked fine until I began to update a program based on the CData
file format described in C DataBase Development, Second Edition, by Al Stevens
(MIS Press, 1991). Instead of writing a new set of classes for this file
structure, I developed the class structure in Figure 1, which moves duplicate
code into two virtual classes, Table and Column. The advantage of this
approach is that only 200 additional lines of code are necessary to provide
support for a second database format. A user of the CData-based application
also needs to access multiple related files. With the new design, it is easy
to derive a "view" class from the cDataTable class and encapsulate the code
that does the joins. And because a user of this class hierarchy sees any
derived instance of Table as an instance of Table, the new design accommodates
views constructed from different file formats.
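The polymorphism this relies on can be shown with stand-in classes; these only echo the shape of the hierarchy (the real classes appear in Listing One):

```cpp
#include <cassert>
#include <string>

// A user holds a base-class reference, so a view can mix derived tables
// of different file formats behind one interface.
class Table {
public:
    virtual ~Table() {}
    virtual std::string Format() const = 0;
};

class xBaseTable : public Table {
public:
    std::string Format() const { return "xBase"; }
};

class CDataTable : public Table {
public:
    std::string Format() const { return "CData"; }
};

// Code written against Table works for any derived format.
std::string formatOf(const Table& t) { return t.Format(); }
```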


The LogicalDB Class


At the top of the hierarchy is LogicalDB from which you derive Table and
Column. LogicalDB is a convenient place to pool global resources used by all
the descendants. LogicalDB gets its information from the system on which you
run your application. When you compile it as a Windows app, for instance, it
reads the INTERNATIONAL section of the WIN.INI file. If you compile it as a
Presentation Manager app, LogicalDB gets the various PM_National settings. Or
if you use neither of these platforms, LogicalDB reads and writes to its own
default file.
The derived classes use an enum SYSERR type extensively. You can map these
errors onto a STRINGTABLE in a GUI platform and build an error routine with
platform-specific behavior. If you want to take advantage of platform-specific
behavior, put that code into the LogicalDB. Listing One provides all of the
headers, defined for OS/2, DOS, and Windows.
Because some operations work on sets of records, there is a NOTIFY_YESNO
variable that you should set to "No" before doing Set operations to prevent
informational messages from being repeated for every row. Most of these
variables are static. Only one copy exists, no matter how many tables you
open. A static counter (ObjectID) ensures that you initialize them once. The
Message() member and its variables are not static, however. If you use these
classes with a multitasking operating system, you may want to have different
tables run in different threads, each with its own error mechanism.
The LogicalDB class instructs the Column object how to display numbers, dates,
currency, and other system-defined data types; see Example 1. In this
instance, the Date enum, which is in the LogicalDB class, is a switch variable
for the Column class.
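The idea can be sketched with a standalone stand-in for that switch; displayDate is an invented helper, though the enum values mirror the listing's eDate:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

enum eDate { MMDDYY, DDMMYY, YYMMDD };

// The Column consults the LogicalDB's date setting to choose display order.
std::string displayDate(eDate fmt, int yy, int mm, int dd) {
    char bf[16];
    bf[0] = '\0';
    switch (fmt) {
        case MMDDYY: std::snprintf(bf, sizeof bf, "%02d/%02d/%02d", mm, dd, yy); break;
        case DDMMYY: std::snprintf(bf, sizeof bf, "%02d/%02d/%02d", dd, mm, yy); break;
        case YYMMDD: std::snprintf(bf, sizeof bf, "%02d/%02d/%02d", yy, mm, dd); break;
    }
    return bf;
}
```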


The Table Class


Table is an abstract class that descends directly from LogicalDB. A user of
your derived classes will only use the public methods of the Table class. You
should not add any methods to classes you derive from Table without first
putting them into Table. The Table object is responsible for the storage of a
single record. Table also maintains information about the current record
pointer, the length of the record, and the name of the file that contains the
record. You will have to override most of the methods in this class when you
derive one for a specific DBMS, because different formats require different
I/O.


The Column Class


The Column object is where the real action takes place. The Column object
knows what type of data it is and knows how to display or "play" itself. I
implemented only the basic types--NUMBER, CHARACTER, and CURRENCY. You must
write your interpreters for the more exotic domains such as SOUND and COMMAND.
The Table object normally passes domain information into the Column object;
otherwise, it will initialize a battery of Columns, doing the best it can to
tell the Column object to which domain it belongs. For xBase files, this is
easy because of the header information. An SQL derivation would query the
SYSCOLUMNS tables for this information.
The real power of the Column class comes when you pass in domain information
so that you can store a picture or sound in the file. Just write the picture
displayer or sound interpreter, and you can "play" that structure as a native
domain of your database. In the source code (provided electronically, see
"Availability," page 3), for example, both CData and xBase files will have
Timestamps and Record Sequence numbers, neither of which is native to the
original format. The Column objects have two buffers: One is a pointer to the
field's location in the record, and the other is a buffer to display the data.
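The two-buffer arrangement might look like this in miniature; MiniColumn is an invented stand-in whose member names only echo the listing:

```cpp
#include <cassert>
#include <cstring>

// cValue points into the shared row buffer (no copy is made), while a
// separate display buffer holds a formatted, null-terminated copy.
struct MiniColumn {
    const char* cValue;      // points into the Table's row buffer
    unsigned    rawWidth;    // field width within the record
    char        display[33]; // separate buffer for display

    const char* Display() {
        std::memcpy(display, cValue, rawWidth); // copy raw bytes out of the row
        display[rawWidth] = '\0';               // make a displayable C string
        return display;
    }
};
```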


Deriving a DBMS 


Classes for specific formats descend from both Table and Column. xBase, the
file layout used by Borland's dBase, Microsoft's FoxBase, Computer Associates'
Clipper, and other systems, has extensive information about the file in a
variable-length section at the beginning of the file. You can, for instance,
find information about the row length, names, and types of the columns, and
the date you last updated the file. In this respect, the format is similar to
Paradox data files. I've provided xBaseTable and xBaseColumn classes in the
source code.
The CData file format, however, has a dictionary bound into the application
itself; therefore, no column-identifying information is available in the file.
Consequently, you can send in the domain information from the application, or
default everything to simple CHARACTER types. The classes CDataTable and
CDataColumn illustrate this. If you don't send in domain information,
CDataTable will read the first row to gather information about the lengths of
the columns by counting the delimiting nulls. There is no way to deal with a
CData file with no rows unless you explicitly send in domain information.
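The width-guessing pass over the first row can be sketched as follows; columnWidths is an illustrative helper, not the actual CDataTable code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// With no dictionary in the file, scan the first row and treat each
// terminating null as a column boundary, recording each field's width.
std::vector<std::size_t> columnWidths(const char* row, std::size_t rowLen) {
    std::vector<std::size_t> widths;
    std::size_t start = 0;
    for (std::size_t i = 0; i < rowLen; ++i) {
        if (row[i] == '\0') {            // delimiting null ends a column
            widths.push_back(i - start); // width excludes the null
            start = i + 1;
        }
    }
    return widths;
}
```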
Again, the only truly public methods are those in the Table class. When you
want to add another file format to your hierarchy, build the table and column
class at this level, using the virtual methods as parents of your specific
implementations. Be careful about adding new methods to your derived classes
that are too specific to one format. To preserve the polymorphic nature of
Table, all public methods, no matter how trivial, have to be part of the Table
class.
Most of the Column methods are okay as is. Column has many methods that,
although virtual, have instantiated code. Column assumes the DBMS stores dates
CCYYMMDD. CData stores dates MMDDYY, so the CDataColumn class overrides the
AssignDate and DisplayDate methods. CData also stores columns as ASCIIZ, that
is, with a terminating null. The CDataColumn takes care of this by first
calling the virtual parent's Assign method, then it puts the null in the
proper location.


Further Developments


Because I plan to add a class for one of the SQL engines (such as Sybase or
DB2/2), I designed the classes for expansion. (This will also make it possible
for me to add an ODAPI interface in the future.) Consequently, as you move
between different database implementations, you won't need to modify the
application code (or retrain the application coders) if you use this design.
Furthermore, these classes allow ordinary database formats to express
themselves beyond their built-in character types. This is one way you can
extend the life of those formats used in relational databases. 
Aside from expanding them horizontally by including more database formats, you
can grow the classes vertically. First, you derive classes to encapsulate
methods for a specific table. For example, an Employee class would include, as
members, the column structure and any business rules associated with an
Employee object. You can overload the Table methods with Employee-specific
code, then invoke the inherited method explicitly; see Example 2. You can also
inherit classes from Employee and, for example, Department. This is how to
create a "view"--a virtual table resulting from selecting, projecting, or
joining rows from multiple tables. The tables can be in different formats on
different hardware. Figure 2 shows a complete hierarchy (I've included only
the first three levels in this article).
You may have noticed that there are no indexes in these classes because, at
this generic level, I can't decide which format to support. Nor has speed been
a problem because the code here executes quickly and my tables are small. For
example, I have timed xFind at less than 4 seconds on a 5000 record file of
50-character records using a 486/66.

Figure 1 Initial class structure.
Example 1: The LogicalDB class is where the Column object learns how to
display numbers, dates, currency, and other system-defined data types.
 static LogicalDB * db ;
 switch ((int)db->Date)
 case (int)MMDDYY:
Example 2: Growing classes vertically.
void Employee::NewRow()
 {
 i = 0;
 while (key_array[i].tname != NULL)
 {
 key_array[i].IsModified = TRUE ;
 i++ ;
 }
 xBaseTable::NewRow() ;
 }
Example 3: This code writes a row to an output file.
 for (i = 1; i < K.FieldCount() + 1; i++)
 write(out, bf, sprintf(bf, "%s ", K.Display(i))) ;
Example 4: This code prints all the rows in an xBase file where a column
matches a value.
while (!K.IsEOF())
 {
 if (K.IsMatch(ColumnName, "Smith", exact))
 {
 for (i = 1; i < K.FieldCount() + 1; i++)
 cout << " - " << K.Display(i) ;
 cout << endl ;
 }
 K.Next() ;
 }
Example 5: Tables of this array can be xBase or CData.
xBaseTable & e = *new xBaseTable() ;
CDataTable & d = *new CDataTable() ;
struct Tab
 {
 Table & table ;
 } tab[] = {
 Tab(e),
 Tab(d),
 } ;
switch (iAction)
 {
 case 0: tab[i].table.Close() ; break ;
 }
Figure 2 Class hierarchy.

Listing One 


// LogDB.hpp
//this encapsulates error messages
#ifndef LOGICALDB_HPP
#define LOGICALDB_HPP
#include <stdio.h>
#include <stdlib.h>
#include <io.h>
#include <sys\stat.h>

#include <string.h>
#include <fcntl.h>
#include <iostream.h>
#include "System.h"
// errors are all positive so we can map into a Stringtable
// errors above 20,000 are fatal
enum SYSERR
 {
 GOOD_RETURN = 0 ,
 // fatal system errors:
 D_OM = 21000, // out of memory
 D_INDXC = 21001, // index corrupted
 D_IOERR = 21002, // i/o error
 D_LOCK = 21003, // locking failure
 D_DEFAULTS = 21004, // corrupted or missing defaults
 // fatal dbms errors
 D_FORMAT = 22000, // bad header info in data file
 D_PRIOR = 22001, // no prior record for this request
 D_FILENOTEXIST = 22002,
 D_MXTREESMAX = 22003,
 D_BEYONDFILE = 22004,
 D_DBNOTOPEN = 22005,
 D_INDEXLOCKED = 22006,
 D_DISKFULL = 22007,
 D_OPENFAILED = 22008,
 D_CLOSEFAILED = 22009,
 D_READFAILED = 22010,
 D_WRITEFAILED = 22011,
 D_CREATEFAILED = 22012,
 // dbms warnings:
 D_DUPL = 2000, // primary key already exists
 D_DEPEND = 2001, // dependent record exists
 D_NOPARENT = 2002, // no parent record exists for given key
 D_INDEXMAX = 2003, // index number > than max indices
 D_NOINDEXSET = 2004,
 D_ACCESSDENIED = 2005,
 D_WAITCOUNT = 2006,
 D_CASCADEFAIL = 2007, // Child Nullify or Delete failed
 D_NAMENOTFOUND = 2008,
 D_KEYISNULL = 2009,
 D_NOTUNIQUE = 2010,
 D_KEYPARTISNULL = 2011,
 // dbms notifications:
 D_NF = 12003, // record not found
 D_EOF = 12004, // end of file
 D_BOF = 12005, // beginning of file
 D_ZERORECS = 12006, // empty file
 D_NOTNEW = 12007, // New() not called before Write()
 D_NOTSELECT = 12008, // Cursor Selection require a SelectOpen()
 // Business Rules warnings:
 D_INVALIDDATE = 1000,
 D_BADFORM = 1001, // error in sum, count or formula
 D_BADDOMAIN = 1002
 } ;
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0

#endif
#ifndef BOOL
#define BOOL short
#endif
#ifndef UINT
#define UINT unsigned int
#endif
#ifndef USHORT
#define USHORT unsigned short int
#endif
#ifndef ULONG
#define ULONG unsigned long int
#endif
enum NOTIFY_YESNO // interrupt extended operations for messages?
 {
 NO = FALSE,
 YES = TRUE
 } ;
enum ARENULLS
 {
 NOTNULL, PARTLYNULL, ALLNULL
 } ;
const int MXKEYLEN = 20 ;
const int MXCOLUMNWIDTH = 32 ;
const int MXCOLUMNNAME = 32 ;
typedef enum enumCountry {
 OTHER=0,
 USA=1,
 CANADA=2,
 LATIN_AMERICA=3,
 NETHERLANDS=31,
 BELGIUM=32,
 FRENCH=33,
 SPAIN=34,
 ITALIAN=39,
 SWISS=41,
 DANISH=45,
 SWEDEN=46,
 NORWAY=47,
 GERMAN=49,
 AUSTRALIAN=61,
 JAPAN=81,
 KOREAN=82,
 SIMPL_CHINA=86,
 TRAD_CHINA=88,
 PORTUGUESE=351,
 FINNISH=358,
 ARABIC=785,
 HEBREW=972
 } eCountry ;
typedef enum enumCurrencyFormat
{CHARNUM,NUMCHAR,CHARSPACENUM,NUMSPACECHAR} eCurrencyFormat ;
typedef enum enumDate
{MMDDYY,DDMMYY,YYMMDD} eDate ;
typedef enum enumDigits
{ZERO,ONE,TWO,THREE,FOUR,FIVE,SIX,SEVEN,EIGHT} eDigits ;
//===============================================================
class LogicalDB
 {

 private :
 static short LogDBid ; // construct this only once
 #if defined (PM_INCLUDED)
 HAB hab ;
 #elif defined (WINDOWS)
 HINST hInst ;
 #endif
 public :
 static BOOL ReadOnly ;
 static eCountry Country ;
 static eCurrencyFormat CurrencyFormat ;
 static eDate Date ;
 static eDigits Digits ;
 static char s1159[3], s2359[3], sCurrency[2],
 sThousand[2], sDecimal[2],
 sDate[2], sTime[2] ;

 char MessageBuffer[200] ;
 SYSERR er_num ;
 NOTIFY_YESNO notify ;
 SYSERR dberror(SYSERR e){er_num = e; dberror() ; return er_num ;}
 void dberror() ;
 int Message() ; // emit platform-specific messages
 LogicalDB(BOOL READONLY = TRUE) ;
 virtual ~LogicalDB() {if (LogDBid) LogDBid--;}
 void SetNotify(NOTIFY_YESNO x){notify = x ;}
 #if defined (PM_INCLUDED)
 void SetHab(HAB h){hab = h ;}
 #elif defined (WINDOWS)
 void SethInst(HINST h){ hInst = h ;}
 #endif
 SYSERR Error(){return er_num ; }
 } ;
#endif // class LogicalDB

#ifndef COLUMN_HPP
#define COLUMN_HPP
#include "LogDB.hpp"
enum eElementType {
 CHARACTER =0x0001,
 CURRENCY =0x0002,
 DATE =0x0004,
 DECIMAL =0x0008,
 INTEGER =0x0010,
 FLOAT =0x0020,
 GRAPHIC =0x0040,
 LOGICAL =0x0080,
 MEMO =0x0100,
 TIME =0x0200, //hhmmss 24hr clock
 ZEROFILLED =0x0400,
 SPACEFILLED =0x0800,
 RSN =0x1000,
 DOCUMENT =0x2000,
 COMMAND =0x4000,
 UPPER =0x8000,
 LOWER =0x00010000,
 CALCULATION =0x00020000,
 WORDINTEXT =0x00040000,
 NUMBER =0x00080000,

 TIMESTAMP =0x00100000, //ccyy mm dd hh mm ss xxx
 DURATION =0x00200000 , // hhh mm ss hx
 SOUND =0x00400000
};
typedef struct COLUMNINFO
 {
 char * FieldName ;
 eElementType Type ;
 unsigned short Width ; // Unformatted storage width.
 unsigned short Decimals ;
 } COLUMN_INFO, * PCOLUMN_INFO ;
enum ePrecision // For IsMatch().
 {
 exact,
 like
 } ;
class Column : public LogicalDB
 {
 protected :
 char wrec[60] ;
 short bReadOnly ;
 eElementType ColType ;
 unsigned int rawWidth ; // Does not include nulls for CData.
 char * cValue; // raw value portion (created in Table)
 char * cDisplayValue ;
 char * cName ; // External name of the Column.
 char * cDefault ; // Filled in by NewRow()
 struct TabParms // Table passes this in for RSN calc.
 {
 ULONG recordcount ;
 } * pTabParms ;
 unsigned int DispWidth; // includes terminating null
 unsigned int nDecimals ; // Implied decimal for the stored value.
 int rc ; // Generic.
 public :
 Column() {cName=cDefault=cDisplayValue=NULL; }
 virtual ~Column() ;
 virtual SYSERR Assign(char * value) ;
 virtual SYSERR AssignDate() ;
 virtual SYSERR AssignNumber() ;
 virtual SYSERR AssignTimeStamp() ;
 virtual SYSERR AssignTime() ;
 virtual void AssignDefault(void *) ;
 eElementType ColumnType(){ return ColType ; }
 virtual const int DayOf() ;
 virtual const char * Display() ;
 virtual const char * DisplayCurrency() ;
 virtual const char * DisplayDate() ;
 virtual const char * DisplayNumber() ;
 virtual const char * DisplayTime() ;
 const unsigned int DisplayWidth(){return DispWidth - 1 ; }
 virtual void Init // allow Column arrays to be filled out
 (char *Portion,
 char *cName, // column name
 eElementType Type, // see eElementType, above.
 unsigned int ucLen, // Storage length.
 unsigned short ucDec = 0,
 char *DefaultValue = NULL) ;
 BOOL IsMatch(const char * Value, ePrecision p) ;

 virtual const char * Name(){return cName ; }
 void SetDomain(eElementType x) ;
 } ;
#endif // COLUMN_HPP

#ifndef TABLE_HPP
#define TABLE_HPP
#include "LogDB.hpp"
#include "Column.hpp"
class Table: public LogicalDB
 {
 protected:
 Column * * Col ; // The attributes of the table
 int curr_fd; // Current file descriptor
 BOOL bReadOnly, bSelect ;
 int i, j, rc ; // Utility vars.
 char * cRowBuffer ; // Where the raw row is stored.
 enum eFileStatus
 {
 not_open,
 not_updated,
 updated
 } TabStat;
 struct TabParms // Pass to Column in NewRow() processing.
 {
 ULONG recordcount ;
 } Tab_Parms ;
 ULONG ulRecordCount;
 ULONG ulCurrentRecord;
 ULONG ulRowSize ;
 BOOL E_O_F ;
 BOOL AlreadyRead ;
 BOOL IsNew ;
 unsigned int unFieldCount;
 unsigned int unCurrentFieldPointer ;
 char cFullFileName[128];
 void SetColumnDomain(int cl,eElementType d)//Make a Column all it can be!
 {Col[cl - 1]->SetDomain(d) ;}
 public :
 Table()
 {
 unCurrentFieldPointer=0;ulCurrentRecord=0;AlreadyRead=E_O_F=0;
 IsNew=0; bSelect = FALSE ;
 cRowBuffer=NULL;Col=NULL;
 }
 virtual ~Table() { ; }
 virtual SYSERR Assign(int COL, char * data)
 {return Col[COL - 1]->Assign(data) ;}
 int char2offCol(char * colname) ; // Name returns Number.
 virtual SYSERR Close() = 0 ;
 virtual eElementType ColTypeXfrm(char hdr)
 {if (hdr == 'x') ; return CHARACTER ;}
 virtual char ColTypeFromElement(eElementType e)
 {if (e == CHARACTER) ; return 'C' ;}
 const char * ColumnName(int COL){ return Col[COL - 1]->Name() ; }
 virtual SYSERR Create(char * fname, COLUMN_INFO c[]) = 0 ;
 virtual const int DayOf(int COL) // day part of date, timestamp
 {return Col[COL - 1]->DayOf() ; }
 virtual ULONG Delete() = 0 ;

 virtual const char * Display(int COL){return Col[COL - 1]->Display() ;}
 virtual const UINT DisplayWidths(int * ListOfColumns)
 {
 i = 0 ;
 while (*(ListOfColumns))
 i += Col[*(ListOfColumns)]->DisplayWidth() ;
 return i ;
 }
 virtual const UINT DisplayWidth(int COL)
 {return Col[COL - 1]->DisplayWidth() ;}
 const unsigned int FieldCount(){return unFieldCount ; }
 virtual const char * FirstColumnName()
 {
 unCurrentFieldPointer = 0 ;
 return Col[0]->Name() ;
 }
 BOOL IsEOF() {return E_O_F ; }
 BOOL IsColumn(unsigned int C)
 {
 if(C<1)return FALSE ;
 return (C>unFieldCount?FALSE:TRUE) ;
 }
 BOOL IsColumn(char * C)
 {return (char2offCol(C)==-1?FALSE:TRUE) ;}
 BOOL IsMatch(int ColNm, const char * Val, ePrecision e) ;
 BOOL IsMatch
 (const char * ColNm, const char * Val, ePrecision e) ;
 virtual const char * Name(){return cFullFileName ;}
 virtual SYSERR Next() = 0 ;
 virtual const char * NextColumnName()
 {
 unCurrentFieldPointer++ ;
 if (unFieldCount <= unCurrentFieldPointer)
 return (char *)"" ;
 return Col[unCurrentFieldPointer]->Name() ;
 }
 virtual void NewRow() = 0 ;
 virtual SYSERR Open
 (char * name, BOOL readonly=FALSE, COLUMN_INFO c[]=NULL) = 0 ;
 virtual SYSERR Top() = 0 ;
 eElementType Type(int Cl){return Col[Cl-1]->ColumnType() ;}
 virtual SYSERR Write() = 0 ;
 }; // end of class definition
#endif // TABLE_HPP

// xColumn.hpp
#ifndef XCOLUMN_HPP
#define XCOLUMN_HPP
#include "Column.hpp"
/*
 Native xBASE(c) columns are either LOGICAL, MEMO, NUMBER, CHARACTER
 or DATE.
*/
//======================Column descendants=================================
class xColumn : public Column
 {
 private:
 public:
 xColumn(){ ; }

 xColumn
 (char * PortionOfRowBuffer,
 char * cName, // column name
 eElementType Type, // LOGICAL, MEMO, NUMBER, CHARACTER, DATE
 unsigned short rawSize, // display (and storage) length
 unsigned short decimals = 0,
 char * DValue = NULL)
 {
 Init(PortionOfRowBuffer,
 cName, Type, rawSize, decimals, DValue);
 }
 };
 #endif

// xTable.hpp
#ifndef XTABLE_INC
#define XTABLE_INC
#include "Table.hpp"
#include "xColumn.hpp"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int const FIELD_REC_LEN = 32; // length of field description record
int const HEADER_PROLOG = 32; // Header without field desc and terminator
class xBaseTable: public Table
 {
 private :
 short iHeaderSize ;
 unsigned long ulFilesize ;
 struct DBF
 {
 unsigned char dbf_version; // version character
 unsigned char update_yr; // date of last update - year(-1900)
 unsigned char update_mo; // date of last update - month
 unsigned char update_day; // date of last update - day
 ULONG records; // number of records in dbf
 unsigned short header_length; // length of header structure
 unsigned short record_length; // col lengths + 1 for delete mark
 unsigned char reserved_bytes[20] ;
 } dbf ;
 struct FIELD_INFO // This structure is filled in memory
 { // with a fread and passed to Column class
 char name[11]; // name of field in asciz
 char type; // type of field...char,numeric etc.
 char field_data_address[4];// offset of field in record(not used here)
 unsigned char len; // length of field
 unsigned char dec; // decimals in field
 unsigned char reserved_bytes[14]; // reserved by dbase
 } header ;
 SYSERR Go(ULONG recno) ;
 BOOL IsDeleted() ;
 public:
 xBaseTable() {curr_fd = -1 ;}
 xBaseTable(char * FileName, BOOL readonly=FALSE)
 {
 curr_fd = -1 ;
 Open(FileName, readonly) ;
 }
 ~xBaseTable(){Close() ; }

 SYSERR Close() ;
 eElementType ColTypeXfrm(char header_type) ;
 char ColTypeFromElement(eElementType e) ;
 SYSERR Create(char * fname, COLUMN_INFO c[]) ;
 ULONG Delete()
 {
 *(cRowBuffer)='*';
 if (Write()==GOOD_RETURN)return 1L ; else return 0L ;
 }
 SYSERR Next() ;
 void NewRow() ;
 SYSERR Open(char *name, BOOL readonly=FALSE, COLUMN_INFO c[]=NULL) ;
 SYSERR Top() ;
 SYSERR Write() ;
 };
#endif // Table definitions

// CDataColumn.hpp
#ifndef CDATACOL_HPP
#define CDATACOL_HPP
#include "Column.hpp"
/*
CData columns are either Alphanumeric (CHARACTER),
 Numeric (DECIMAL|ZEROFILLED, DECIMAL|SPACEFILLED),
 DATE,
 CURRENCY
CData character columns are left-justified and space filled with a
single terminating null.
Numbers are right-justified and left-filled with either spaces or zeros.
Decimals are fixed.
*/
//======================Column descendants=================================
class CDataColumn : public Column
 {
 private:
 SYSERR AssignDate() ;
 const char * DisplayDate() ;
 public:
 CDataColumn(){;}
 CDataColumn
 (char * PortionOfRowBuffer,
 char * cName, // column name
 eElementType Type,
 unsigned short rawSize, // Does NOT include the Null!
 unsigned short decimals = 0,
 char * DValue = NULL)
 {
 Init(PortionOfRowBuffer,
 cName, Type, rawSize, decimals, DValue);
 }
 SYSERR Assign(char * value)
 {
 *(cValue + rawWidth) = '\0' ;
 return Column::Assign(value) ;
 }
 }; // end of class definition
 #endif

// CDatatab.hpp

#ifndef CDATA_INC
#define CDATA_INC
#include "Table.hpp"
#include "CDataCol.hpp"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

class CDataTable: public Table
 {
 private :
 struct FHDR
 {
 ULONG first_record ;
 ULONG next_record ;
 short record_length;
 } fhdr ;
 SYSERR Go(ULONG recno) ; // from One.
 BOOL IsDeleted() ;
 public:
 CDataTable() {curr_fd = -1 ;}
 CDataTable(char * FileName, COLUMN_INFO col[], BOOL readonly=0)
 {
 curr_fd = -1 ;
 Open(FileName, readonly, col) ;
 }
 ~CDataTable(){Close() ; }
 SYSERR Close() ;
 SYSERR Create(char * fname, COLUMN_INFO c[]) ;
 ULONG Delete() ;
 SYSERR Next() ;
 void NewRow() ;
 SYSERR Open(char * name, BOOL readonly=FALSE, COLUMN_INFO c[]=NULL) ;
 SYSERR Top() ;
 SYSERR Write() ;
 }; // end of CDataTable
#endif // Table definitions
//==========================System.h==========================
#ifndef SYSTEM_H
#define SYSTEM_H
#include <stdlib.h>
#include <direct.h>
#if defined __OS2__
#define INCL_WINSHELLDATA
#define INCL_WINDIALOGS
#define INCL_WINPOINTERS
#include <OS2.H>
#else
#ifndef BOOL
typedef short BOOL ;
#endif
#endif
#if defined (__OS2__) || defined (__MSDOS__)
#define PATH_SEPARATOR "\\"
#else
#define PATH_SEPARATOR "/"
#endif
#define MAXPROFILEPATH 30
char * DataDirectory(char * szFullFileName) ;

char * TimeStamp(char * bf) ;
BOOL IsBlank(const char * c, int wide) ;
BOOL IsValidDate(int iCCYY, int iMM, int iDD) ;
BOOL IsValidDate(char *CCYY, char *MM, char *DD) ;
int Leapyear(int iYYYY) ;
LONG TimeHundreths(const char * v, char sep) ;
//------------------------------------------------------------------
inline BOOL IsBlank(const char * c)
 {
 while ((*c == ' ') || (*c == '\t') || (*c == '\b') ||
 (*c == '\n') || (*c == '\r'))
 c++ ;
 return (*c == '\0') ;
 }
//------------------------------------------------------------------
inline BOOL IsBlank(const char * c, int wide)
 {
 while ((*c == ' ')&&(wide > 0))
 {
 wide-- ;
 c++ ;
 }
 return (wide == 0) ;
 }
#endif // SYSTEM_H





































November, 1994
Endian-Neutral Software, Part 2


Program design and coding practices




James R. Gillig


Jim is a software engineer for IBM in Boca Raton, Florida. He can be reached
through the DDJ offices.


Software designed and written for Endian portability, commonly referred to as
"Endian-neutral," should be recompile-and-run capable. Achieving Endian
portability requires that you identify all necessary Endian dependencies in a
program, separating and isolating them as best you can through interfaces and
conventions. "Endian-aware" programs, on the other hand, may have
well-defined, Endian-specific parts that require some modification for
Endianness when porting the program to a processor of the opposite Endian; see
Figure 1.
Endian-neutral (EN) design goals focus on separating Endian-neutral and
Endian-specific parts, and minimizing the size of any necessary
Endian-specific part and any necessary modifications for porting from a
processor of one Endian type to another.
In general, Endian dependencies can occur when: 
An application provides Endian conversion for data interchange.
An application has both a Big-endian (BE) and a Little-endian (LE) version of
the same set of source code. 
An operating system manages a processor's Endian-related controls if any
exist. 
Device drivers handle Endian differences between the processor the device
driver runs on and its attached devices.
Communications and LAN software handle Endian differences between connected
systems. 
Conversion utilities allow the end user to import data from an opposite-Endian
program.
Compilers provide compiling options to generate different Endian-type code and
data.
Debuggers and dump utilities provide options for viewing data in different
Endian forms.
An instruction-set translator has to emulate the opposite Endian on a
platform. 


Endian-Neutral Programming


You can follow a number of practices for writing source code that's portable
between BE and LE processors. Although the guidelines and examples I present
here are based on C++, the principles apply to any language. 
Use a high-level language. Programming-language keywords in a language such as
C++ allow the declaration of data types and aggregate data and provide for
correct data type conversion. Operations such as casting, union, and bit field
allow flexibility and optimization for handling data but require Endian
awareness to ensure the production of Endian-neutral code. High-level language
constructs should be used for better programming consistency and handling of
data. 
Assembly language does not offer high-level constructs and data type checking,
and is more difficult for architecture-neutral programming. However, using a
high-level language is neither a necessary nor sufficient condition for making
software Endian neutral and, in general, the EN guidelines covered here apply
to assembly-language programming as well. 
Object-oriented design and programming can provide a higher degree of
neutrality than procedural programming alone because objects hide data from
direct access by pointers from outside the object; however, the object class
and its methods are still responsible for implementing Endian neutrality
within the object's program code.
Use data types correctly. In general, the same data should not be used as
different data types. Type conversion can corrupt data and introduce Endian
dependencies. Different data types can have different byte lengths and the
same data type may have different lengths on different processors. 
A data type should be treated by executable program code as intended for that
type. A multibyte-scalar integer, for instance, is treated by the processor as
a single, indivisible data item that represents a numeric value. The location
of bit and byte subfields within a scalar is variant between BE and LE, so
treat data according to its data type, length, and Endian type. For
portability, do not twiddle with the internal bits and bytes of
multibyte-scalar data while dependent on their individual byte addresses. 
Organize aggregate and scalar data. Program code becomes Endian dependent when
a multibyte-scalar data item is treated as aggregate data containing multiple
pieces of data across its bytes. Multiple pieces of data should be organized
as aggregate data using proper programming-language constructs such as a data
structure or data array. The individual members of a structure may be a scalar
element or another aggregate data element with its own members.
You should be cognizant of aggregate data and scalar data when programming and
treat scalars as single, indivisible, nonaggregate data items. Organize
scalars as members of an aggregate data construct when dealing with multiple
pieces of data in a collective manner. 
Avoid pointers within scalars. Do not point or index to smaller units of
storage within the internal byte structure of a scalar. LE data is
byte-reversed BE data, so internal byte positions are different. Do not
address, point, or index to a byte, halfword, or word scalar that may exist
within a longer data type. For example, a short integer contains two bytes, an
integer contains two short integers, and a long integer contains two integers.

Accessing bytes internal to scalar data through pointers and indexing is a
poor practice; see Example 1. Multiple-byte data elements can be defined as a
byte array such as char p[4], which is independent of Endian. 
Avoid overlaying scalars. A data item is said to be overlaid when it has more
than one meaning or type. Examples of overlaid data include using: a short
integer within a long integer; the high-order byte of an integer as a flag; a
bit field and a binary value in the same integer; and a scalar as an array of
bytes.
A longer data type should not overlay a shorter data type so that integers
don't overlay characters or bit fields and longer integers don't overlay
shorter integers. Overlaying data is not the same as sharing memory for
mutually exclusive use by different data types. Sometimes programmers overlay
data to save storage, but other times storage saving is not that valuable and
extra program code may be necessary to get at the desired piece of data. 
Be aware of Endianness with casting, union, and bit fields. C++ provides
programming constructs that may have side effects on data in the same or
different Endian modes. 
Casting forces a data-type conversion to another type: for example, short to
long or char to short. Pointer or reference casts allow a program to view a
variable of one data type as if it were another type. Thus, they make program
code dependent on the Endian mode of execution, as in Example 2(a), where the
cast to a short pointer results in pointing to different data depending upon
whether the execution mode is BE or LE. Regardless of Endian, do not cast a
longer type to a shorter type because if the value does not fit in the smaller
type, it can lead to data corruption as a result of type conversion; see
Example 2(b).
Union allows different variables and data types to share the same memory.
Don't use different data types concurrently in union, and do not use union to
do type conversion. The actual data currently stored in union, its actual data
type, and its referenced data type must be consistent. In Example 3, for
instance, the array name p should not be used to access data stored with
variable a because they are different types (p[0] will access the MSB of
integer a in BE mode and the LSB of a in LE mode).
A bit field is a set of adjacent bits that allows efficient, convenient use of
memory for implementing a program's flags, switches, or discretes. Example
4(a) shows a bit field defined in an integer. Fields can be referenced by
name, such as bitfield.b. 
If bit-field data is created with the structure in Example 4(a) and ported to
an opposite Endian system, then either the order of the actual data bit fields
a, b, c, pad in the integer must be reversed to pad, c, b, a for use with the
same structure or the order of the bit fields defined within the structure
must be reversed for use with the original imported integer and bit-field
ordering; see Example 4(b).
The reversal effect upon bit-field ordering between BE and LE is analogous to
byte reversal between BE and LE; in general practice, it's simply extended
down to the bit-field level by the "programming language." However, this
practice cannot be guaranteed for all compilers.
Alternatively, a program can perform its own bitwise-logical operations to
select, set, and test bit fields by shifting and masking. It must not depend
on byte address and order, however. That would make it Endian specific and not
portable to an opposite-Endian platform. 
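The shift-and-mask alternative can be sketched as follows. The field names and layout here are hypothetical, chosen only for illustration; the point is that the code manipulates numeric values with shifts and masks and never touches byte addresses, so it compiles and runs identically in BE and LE mode.

```c
/* Hypothetical 2-bit "mode" field and 1-bit "ready" flag packed into a
   word by shifting and masking; Endian neutral because no byte addresses
   or compiler bit-field ordering are involved. */
#define MODE_SHIFT  0
#define MODE_MASK   0x0003u
#define READY_MASK  0x0004u

static unsigned set_mode(unsigned w, unsigned mode)
{
    return (w & ~MODE_MASK) | ((mode << MODE_SHIFT) & MODE_MASK);
}

static unsigned get_mode(unsigned w)
{
    return (w & MODE_MASK) >> MODE_SHIFT;
}

static unsigned set_ready(unsigned w)
{
    return w | READY_MASK;      /* select and set the flag bit */
}

static int is_ready(unsigned w)
{
    return (w & READY_MASK) != 0;   /* test the flag bit */
}
```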
Avoid alignment complications. Data alignment is a general portability concern
and, although independent of Endian, it can complicate assumptions about byte
location. A data item is "aligned" when its address is a multiple of its byte
length. Thus a 2-byte short integer aligns on an address that is a multiple of
2; a 4-byte long integer aligns on an address that is a multiple of 4; and an
8-byte long integer aligns on an address that is a multiple of 8. If the
architecture requires it, compilers will align data by default. When
unaligned, a data item may be at a different location than when aligned.
Assumptions about alignment, byte location, and Endian byte order can become
more complex, as in Example 5. When aligned, a padding byte may be inserted by
the compiler after the character "c" to force the integer n onto a proper
even-address boundary.
In Example 5, the pointer p may end up pointing to a pad byte before n if the
data is aligned, to the most significant byte of n if the data is unaligned
and BE, or to the least significant byte of n if the data is unaligned and LE.

To reduce padding for aligned data, avoid intermingling different data-type
lengths by arranging data in order from longer to shorter types. 
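The padding effect, and the longer-to-shorter remedy, can be observed directly with the standard offsetof macro. This is a sketch assuming a common compiler that aligns a 4-byte int on a 4-byte boundary; the exact pad count is implementation defined.

```c
#include <stddef.h>

/* char first: the compiler may insert pad bytes before n to align it. */
struct char_first {
    char c;
    int  n;
};

/* Longer-to-shorter ordering: no interior padding is needed. */
struct int_first {
    int  n;
    char c;
};
```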


The Endian Test



An "Endian-adaptive" program queries or tests the processor to decide on the
Endian mode of its current execution so as to take a LE processing path or BE
processing path at run time. If the processor does not provide a means for
telling software its Endian mode, then a program test can be improvised, as in
Example 6(a), which implements an "Endian test." Endian macros can be defined
for reuse in a program, as in Example 6(b).
Endian-adaptive code is Endian neutral in that the same source can be
recompiled without change to run in the other Endian mode. Recompilation is
necessary for a different processor instruction set as well as Endian type. 
As an example of how to make code neutral with the "Endian test," consider
Example 7(a). This can be written using the macro, as in Example 7(b), or
without the macro and independent of byte order (for this particular example),
as in Example 7(c). This example assumes long is four bytes, but this is not a
safe assumption for portability. An alternate implementation of this example
is to declare long a instead as a character array a[4], in which case a[2] is
the same data 0x03 in either Endian mode, as single-byte character data has no
Endianness.
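The test and the adaptive extraction described above can be folded into one self-checking sketch. It follows the shape of Examples 6 and 7 but uses a fixed-width 32-bit integer (rather than long) to sidestep the unsafe four-byte assumption the text warns about.

```c
#include <stdint.h>

#define BE 1
#define LE 0

/* Improvised Endian test: examine the first byte of a known short value. */
static char endian(void)
{
    short x = 0x0100;
    return *((char *) &x);   /* 0x01 for BE, 0x00 for LE */
}

#define is_big_endian (endian() == BE)

/* Endian-adaptive extraction of the byte 0x02 from 0x01020304. */
static int extract_second_byte(void)
{
    int32_t a = 0x01020304;
    char *p = (char *) &a;
    return is_big_endian ? p[1] : p[2];
}
```

The shift-and-mask form of Example 7(c), `(a & 0x00FF0000) >> 16`, yields the same 0x02 with no test at all.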
You must be careful with an adaptive implementation to make sure it is
portable for BE and LE across the required processors and compilers. You may
be able to implement necessary Endian-specific code with more-portable
Endian-adaptive code; nevertheless, writing Endian-neutral code is the first
step in achieving portable software across platforms of different Endian
types. 


Application Portability 


There are different ways to port a program to run on other processors. The
most common is to recompile the source code to run on the new processor;
another way is to translate the original binary code and its nonnative
processor instruction set on the new processor. 
A program can be recompiled to run on the instruction set of a different
processor (recompile-and-run). If the new processor is of the opposite Endian,
then the source code (unless it's Endian neutral) will need to have its
Endian-specific code modified to run in the new Endian mode before
recompilation. Recompiling an application to another processor produces a new
binary version of the program that will have to be serviced and supported. A
given application may have different binary versions for running on different
processors and in different Endian modes. The application's source-code set,
any Endian-specific parts, and all binary versions of it will have to be
maintained. 
Translation of binary code requires an instruction-set translator (IST) to
translate the original processor's instruction set and Endian mode on the new
(nonnative) processor. Compiled, binary code undergoing translation is treated
as data by the IST. If the new processor is of the opposite Endian then the
IST, running in the Endian mode of the new processor, must handle the
byte-reversed instructions and data of the program being translated. 
A special consideration is a bi-endian processor. A single binary version of
an Endian-neutral program cannot run in both the BE and LE modes of a
bi-endian processor even though the instruction set is the same. This is
because the address model, and therefore the (scalar) byte order, is different
for both executable instructions (binary code), and data. The benefit of a
bi-endian processor is that existing LE and BE applications and their
operating systems can be ported to it and still run in their native Endian
mode. An application must be compiled to the Endian of the operating system
and processor on which it will run. 
Today's modular systems share and reuse software resources, so be aware of the
Endian effect on reusable resources and make them Endian neutral wherever
possible. Presentation resources (icons, dialogs, controls, and image
bitmaps), program resources (pointers, integers, arrays, structures, and
headers), and miscellaneous resources (device drivers, presentation drivers,
objects, server resources, and files) are created and passed around for reuse
in different programs and on different platforms. These resources can
introduce Endian dependencies into a program, so awareness, inspections, and
testing are imperative. 


Data Portability 


Users of today's open, connected systems need the capability to interchange
data between different systems. Data can be interchanged through media
(diskettes, disk, tape, and so on) and local- and wide-area networks of
client, server, and host communications. Data portability requires the ability
to handle any difference between the Endian type of the data being ported and
the Endian mode of the system to which it's being ported. 
Data conversion between BE and LE requires knowing the data type, length, and
Endian type for all data elements of an aggregate data layout to be converted
(such as a data structure, array, file, or table). The conversion algorithm
between BE and LE is straightforward: Reverse the byte order of a given
multibyte-scalar data item. A prerequisite for conversion and often the crux
of the conversion problem is knowing the aggregate data layout or organization
of the data to be converted. Data is often understood only by the application
that creates it. 
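The byte-reversal step itself can be sketched as a single routine; knowing which fields of the aggregate layout to pass to it remains the hard, application-specific part.

```c
#include <stddef.h>

/* Reverse the bytes of one multibyte-scalar data item in place.
   Applying this to every multibyte scalar in an aggregate layout
   converts the layout between BE and LE. */
static void reverse_bytes(void *scalar, size_t len)
{
    unsigned char *b = (unsigned char *) scalar;
    size_t i, j;
    for (i = 0, j = len - 1; i < j; i++, j--) {
        unsigned char t = b[i];
        b[i] = b[j];
        b[j] = t;
    }
}
```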
When importing data created or processed by a program running on another
platform, the data's Endian type and other data characteristics may need to be
converted. Typically, the receiver or importer of the data does the
conversion; this is known as "receiver-makes-right." 
Data can be readily interchanged between the same application running on
different platforms because an application understands its own data. So a BE
version and LE version of the same application running on two different
platforms can exchange and convert data to the correct Endian type of the
application's resident platform.
To interchange data between different applications, a conversion utility can
be written that understands an application's data and converts it to another
application's data format and Endian type. Data-file formats often become
public for popular applications. For example, different versions of a word
processor that runs on an IBM PS/2 (LE, Intel), Macintosh (BE, Motorola), and
a RISC platform (bi-endian PowerPC) may need to import data from another word
processor.
A self-describing data resource can be created, handled, and converted by any
program that understands the public specification of its format and data
descriptors. For example, if a piece of data has associated descriptors for
its data type, length, and Endian type, then it can be easily converted. 
Data can be kept in a standard ("canonical") form for easier conversion. For
example, if a program always writes its data as LE, then a program running on
a BE platform knows to convert that data from LE to BE upon reading the data
from some magnetic/optical media, network, or communications link. 


Text Data and Unicode


A text stream or binary stream is composed of byte data. A byte is the
smallest addressable unit of storage; therefore, character data has no
Endianness and is neutral and portable. This is for single-byte character
encodings handled by the character data types (signed and unsigned).
The Unicode standard defines a fixed-width, uniform-text, character-encoding
scheme. All Unicode characters are represented as 16-bit values. For example,
the hex value 0x0041 represents the letter A; 0x0020, the space character; and
0x0409, the character named "CYRILLIC CAPITAL LETTER LJE." Unicode values must
be treated as single, 16-bit, unsigned integer values. Like other
multibyte-scalar values, Unicode data stored as binary data in a file, or
other interchange media, is Endian dependent. To manage Unicode data
correctly, data-interchange programs must be sensitive to the Endianness of
the source and target platforms.
To help deal with Endian conversion, the Unicode standard describes an
optional technique for programs that manipulate Unicode data. The standard
defines the value 0xFEFF as the character named "BYTE ORDER MARK" (BOM).
Applications may use this character to explicitly announce the Endianness of
the data. The byte-swapped mirror image of the BOM, the value 0xFFFE, is not a
defined Unicode character. Therefore, an application expecting the BOM and
finding 0xFFFE instead, knows that the Unicode data is not in the Endian
format expected. The application may then decide to perform byte swapping to
convert the Endian type or notify the user to run a conversion utility.
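A minimal sketch of the receiver-side BOM check might look like this; the buffer contents in the usage below simulate data imported from an opposite-Endian platform.

```c
#include <stdint.h>
#include <stddef.h>

#define BOM          0xFEFFu  /* "BYTE ORDER MARK" */
#define SWAPPED_BOM  0xFFFEu  /* not a defined Unicode character */

static uint16_t swap16(uint16_t v)
{
    return (uint16_t) ((v >> 8) | (v << 8));
}

/* If the first value is the byte-swapped BOM, the data arrived in the
   opposite Endian form: swap every value. Returns 1 if the data is now
   in native order, 0 if no BOM was found and Endianness is unknown. */
static int fix_unicode_order(uint16_t *data, size_t n)
{
    size_t i;
    if (n == 0)
        return 0;
    if (data[0] == BOM)
        return 1;                 /* already in native order */
    if (data[0] != SWAPPED_BOM)
        return 0;                 /* no BOM present */
    for (i = 0; i < n; i++)
        data[i] = swap16(data[i]);
    return 1;
}
```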


Bit-Field Data


Data portability and compiler implementation of bit fields is another issue.
Within an integer or word, fields may be assigned in left-to-right order by
some compilers and right-to-left by others. The programmer must contend with
these complications in addition to Endian byte order when exchanging data
across systems. In general, using a 32-bit word, BE labels bits 0--31
beginning with the most significant or leftmost bit; LE labels bits 0--31
beginning with the least significant or rightmost bit.
A bit field is a contiguous set of bits, where the most significant bit is on
the left end and the least significant bit is on the right end. The
programming language, if it supports bit fields, will generally extend the
Endian type for compiled data down to the bit-field level, even if a processor
doesn't support bit fields; that is, multiple bit fields defined within a word
appear in left-to-right order for BE and right-to-left order for LE. This is
the prevailing practice for compilers.
So three different bit fields named a, b, and c, appearing in that order and
beginning at the left end of a word for BE would appear in c,b,a order,
beginning at the right end of a word for LE. This is for bit fields, not the
bits themselves; they retain the same relative bit order within a defined
field whether BE or LE. For example, if 16 1-bit fields are defined within a
16-bit word for LE, then all the bits in that word will in effect be in
reverse order when compared to its data representation for BE. 
When importing bit-field data from a system of the opposite Endian type,
either the order of the data bit fields must be reversed or the bit fields
must be accessed by a program in the reverse order.


Acknowledgments


Thanks to the following people at IBM for their contribution and review: Art
Adkins, Ken Borgendale, Norman Cohen, Ian Holland, Alan MacKay, Roy Ritthaler,
Rick Simpson, and Mark Wieland.
Figure 1 Endian-neutral design goals.
Example 1: Avoiding pointing and indexing within scalars.
long a = 0x01020304;
char c, *p;
p = (char *) &a;  // set up pointer to a
c = p[2];         // c is 0x03 in BE mode and 0x02 in LE mode
Example 2: (a) Making program code depend on Endian execution mode; (b) data
corruption due to type conversion.

(a) long a;
 short b;
 a=1;
 b=*(short *)&a;
 // b=0 for BE and b=1 for LE

(b) int a;
 char b;
 b=(char) a;
 // Data loss, but no Endian problem
Example 3: The array name p should not be used to access data stored with
variable a because they are different types.
char c;
union udata {
 int a;
 char p[4];
} uvar ;
uvar.a = 0x01020304;
c = uvar.p[1] ; // c is 0x02 in BE mode and 0x03 in LE mode
Example 4: (a) Bit fields defined in an integer; (b) reversing the order of
data bit fields.
(a)
struct {
 unsigned a : 2 ; // two bit field
 unsigned b : 1 ; // one bit field
 unsigned c : 3 ; // three bit field
 unsigned pad : 10 ; // ten bit padding
} bitfield;

(b)
struct {
 unsigned pad : 10 ; // ten bit padding
 unsigned c : 3 ; // three bit field
 unsigned b : 1 ; // one bit field
 unsigned a : 2 ; // two bit field
} bitfield;
Example 5: Dependency on alignment and Endian mode of execution.
char *p;
struct {
 char c;
 // Alignment pad byte may occur
 // between c and n
 int n;
} s ;
p = (char *) &s.c;
p++; // p points to what byte?
Example 6: (a) Endian test program; (b) Endian macros defined for reuse.
(a) #define BE 1
 #define LE 0
 char endian (void)
 {
 short x = 0x0100;

 return *((char *) &x); // return 0x01 for BE and 0x00 for LE
 }

(b) #define is_big_endian (endian() == BE)
 #define is_little_endian (endian() == LE)
Example 7: Making code neutral with the Endian test.

(a) long a=0x01020304;
 char c,*p;
 p=(char *) &a; // set up pointer to a
 c=p[2]; // c is 0x03 in BE mode and 0x02 in LE mode

(b) if (is_big_endian) c=p[1]; else c=p[2]; // c is 0x02 in BE or LE

(c) c=(a & 0x00FF0000) >> 16; // reliably yields 0x02 without a test






















































November, 1994
Sharing Peripherals Intelligently


Matching client/server needs to device exigencies




Ian Hirschsohn


Ian holds a BS in mechanical engineering and an MS in aerospace engineering.
He is the principal author of DISSPLA and cofounder of ISSCO. He can be
reached at Integral Research, 249 S. Highway 101, Suite 270, Solana Beach, CA
92075.


The problem of coping with mountains of data will get worse before it gets
better. Part of the solution is being able to share available data at
reasonable transfer rates. In this article, I'll describe how a standard PC
can be used to provide a pool of high-performance tapes, disks, image
printers, and other peripherals that are made available to client workstations
at sustained rates ranging from 4--7 Mbytes/sec each. The design uses SCSI-2,
but allows for IEEE 488 or other networks; the clients can be an arbitrary mix
of workstations, Macs and PCs typically connected via SCSI-2 to a 486-based
"peripherals manager." The result is the (commercially available) "STAR
Peripherals Manager," a collaborative effort between Texaco (Briarpark, Texas)
and Integral Research (my company).
Sustained throughput is clearly paramount for gigabyte files. At 50 Kbytes per
second, it takes Ethernet or Token Ring more than five hours to move one
gigabyte. Inexpensive QIC drives can hold several gigabytes, but it takes them
about two and a half hours to move one gigabyte through the printer port. At a
typical 250 Kbytes per second, even the SCSI Exabyte 8200 8mm requires an hour
to transfer one gigabyte. High-performance drives such as the IBM 3490 or the
$35,000 Metrum 2150 T120 will transfer the same amount of data at 2 Mbytes per
second in about 9 minutes, and SCSI-2 disks capable of 4--5 Mbytes/sec take
only about five minutes to do the same. 
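The arithmetic behind these figures is simple division of a gigabyte by the sustained rate; a sketch (taking one gigabyte as 1024x1024 Kbytes, with the rates quoted above):

```c
/* Seconds needed to move one gigabyte (1024*1024 Kbytes) at a
   sustained rate given in Kbytes per second. */
static long seconds_per_gigabyte(long kbytes_per_sec)
{
    return (1024L * 1024L) / kbytes_per_sec;
}
```

At 50 Kbytes/sec this gives about 20,971 seconds (over five hours); at 250 Kbytes/sec, about 70 minutes; at 2 Mbytes/sec, about 8.5 minutes.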
Though client transfer performance is important, sustained device throughput
is critical. If a tape is streaming past the heads at 125 inches per second
(ips), the host processor must be able to accept the data at that rate.
Otherwise, the drive motors brake, reverse the tape, and reposition with
throughput going from Mbytes/sec to Kbytes/sec in "washing-machine mode."
Furthermore, the oscillating tape is susceptible to errors, increased head and
motor wear, and degradation of the tape itself. In short, sustained data
transfer is critical to all mechanical peripherals. Multi-megabyte device
cache is not a panacea: If you're reading one tape record just to find the
location of the next, streaming 20 Mbytes into cache is counterproductive. 
While future fiber networks promise to solve the performance issue, the
problem remains that every workstation platform has its own custom device
drivers and each operating system has its own API. Each variance of the
operating system often needs its own API, plus different SCSI drivers for each
adapter. The resulting 3-D matrix of platforms, APIs, and drivers (or lack of
them) often limits the acceptance of a device more than its physical
characteristics. The optimum solution clearly involves an intelligent
peripheral manager dedicated to matching client needs to device exigencies.


SCSI-2


Theoretically, SCSI-2 is the perfect solution--its 10-Mbyte/sec data-phase
rate (synchronous mode) for 8-bit and 20-Mbyte/sec rate for 16-bit (Wide
SCSI-2) is still beyond today's network capabilities. SCSI-2 is
inexpensive--almost every platform sports the 50-pin connector at the back.
You should be able to string up to eight machines together with cheap cables
and transmit data at rates that leave Ethernet standing still. But it isn't
that simple, even though it is perfectly feasible under the SCSI-2
specification. Although SCSI-2 is a standard, device nuances and
vendor-specific options enter in. For example, SCSI-2-compatible tape drives
are supposed to support variable-length records. However, most QIC drives can
only handle fixed-length blocks, which are often only 512 bytes long.
Applications expecting a QIC tape to behave like a 9-track will be
disappointed, regardless of the SCSI-2 specs. Some drives support fast
positioning, others don't. In practice, different APIs and even drivers are
needed for 9-track, 8mm, 3490, QIC, DAT, and D2. Likewise, many platforms view
SCSI-2 disks differently.
A practical stumbling block to making SCSI-2 universal is the number of
applications and systems that key to a specific vendor's device. For example,
an application may expect an Exabyte 8200 and refuse any device that does not
respond as such--even an Exabyte 8500. Many applications install their own
drivers, and some customers customize their operating systems so that a
complete suite of APIs and drivers may be inadequate. Another issue involves
artificial system limitations. Most Sun SPARC systems, for example, are
limited to 64-Kbyte contiguous tape blocks. If a program writes a 256-Kbyte
record, four separate blocks are unknowingly written. This can be a surprise
when you try to read it on another platform.
Even when devices are fully SCSI-2 compliant, there may be more efficient ways
for them to operate. Most high-performance cassette drives, for example,
sustain peak throughput with maximum-length fixed blocks. Small records
substantially degrade throughput, so it is far more efficient to pack the
small records into large blocks. However, this requires customized unpacking
APIs for each platform. High-performance disks would come closer to delivering
rated throughput if systems used blocks longer than the usual 512 bytes or
took advantage of SCSI-2 queued requests. Published benchmarks show that disks
with a theoretical 4 Mbytes/sec only yield 300--500 Kbytes/sec on a PC.
There are also devices with features not addressed by the SCSI-2 spec. Robotic
tape drives need to be instructed to select a specific cartridge, typically
using a separate RS-232 connection. This presents a challenge when you need to
provide APIs and drivers on different platforms. Finally, many devices don't
support SCSI-2 at all, but they have unique, highly desirable features. 


STAR Peripheral Manager


The STAR Peripheral Manager is a standard 486/Pentium-based PC with up to
eight EISA SCSI-2 adapters (a passive backplane can support up to 18
adapters). Each adapter card can interface up to seven clients or devices,
although client workstations generally utilize a dedicated adapter. Figure 1
shows the STAR configuration. The 486 PC has a VGA monitor, keyboard,
200-Mbyte IDE drive, and from 16 to 256 Mbytes of memory. The IDE drive is for
STAR-system use only: The client disk farm (if any) uses multi-gigabyte SCSI-2
drives connected to the SCSI-2 adapters. This isolates the STAR system from
client disk use. The STAR system uses 8 Mbytes of RAM; the remaining 8--248
Mbytes are dedicated to a cache pool for client-device I/O.
Although STAR is capable of handling up to 24 active client-device queues
concurrently, two to four queues are typical. Generally, a client workstation
will copy a massive file to its own disk at maximum speed, then disconnect. A
486/66 EISA STAR PC easily keeps a 2-Mbyte/sec Metrum T120 tape running at
maximum capability while sustaining 4.5 Mbytes/sec from a SCSI-2 disk.
Measurements using other PCs to emulate tapes and disks (to measure rates
beyond physical devices) show that the STAR Peripherals Manager (PM) can
sustain 7 Mbytes/sec to a single client and about 5 Mbytes/sec to each of two
clients. A limiting factor is the 33-Mbyte/sec DMA limit of the EISA bus
coupled with the fact that each client-device queue requires concurrent I/O to
both the client and device. Therefore, 7 Mbytes/sec corresponds to a
14-Mbyte/sec EISA load. These figures are for 8-bit (Fast) SCSI-2 with a
theoretical 10-Mbyte/sec synchronous bandwidth (closer to 8 Mbytes/sec when
SCSI-2 protocol overhead is included); 16-bit (Fast Wide) SCSI-2 has a
theoretical 20-Mbyte/sec limit. So, the 7 Mbytes/sec could be exceeded with
16-bit SCSI-2, but few devices use Wide SCSI-2 at this time.
The STAR PM enables a mixed group of workstations (Macs and/or PCs) to share a
common pool of disks, tapes, and other peripherals. Since the devices are
connected to the clients via software, the peripherals are not restricted to
SCSI-2. Devices can be interfaced via any protocol, but clients typically see
the devices as generic SCSI-2 tape drives. Under SunOS, for example, a C
program can communicate with a SCSI-2 tape via standard C reads and writes to
/dev/mt0. STAR software then directs the data to or from the actual
peripheral; see Figure 2.
For disks, STAR responds via standard SCSI-2 disk protocol so that STAR disks
are indistinguishable from client system disks. But since STAR-based disks are
connected via software, STAR is able to concatenate multiple disks into, say,
a single 100-gigabyte virtual disk common to all clients. Alternatively, STAR
can partition a large disk or mirror multiple disks as an inexpensive RAID.
Another feature is that STAR caches client disk queues to its 486 extended
memory so that up to 240 Mbytes can wait in cache, thereby improving disk I/O.
An interesting possibility of this software connection is that STAR could
emulate different disk strategies, such as FAT clusters for PCs and UNIX
tables for workstations, on the same disk, enabling mixed-client platforms to
share a common disk.
A second STAR function is to mimic whatever actual device a client expects to
see. For each client, STAR is initialized to respond as a specific vendor's
SCSI-2 device. If the client expects an Exabyte 8200, that is what the client
sees--right down to the vendor's name and model number on the SCSI-2 Inquiry
command. STAR emulates all nuances of the vendor's device. The actual
peripheral may be a 9-track, 3480/90, T120, DAT, or QIC. STAR can even emulate
tape with disk, enabling multiple workstations to retouch frames of the same
movie or share a similar data set. Thus, standard software such as Landmark
ITA, Adobe Photoshop, Advanced Geophysical Promax, and the like can access any
vendor's device without platform-specific drivers or APIs. Additionally, since
only one client usually uses a given STAR adapter at a time, STAR can mimic
the device on multiple IDs and accommodate disparate apps.
An interesting aspect of mimicking widely used devices is that STAR enables
new technologies to be pressed into immediate service. For example, robotic
tape drives can place terabytes of data at your fingertips, but there are no
standard SCSI-2 commands to select a specific cassette. STAR can readily issue
the vendor-specific RS-232 commands for the robotic functions without custom
modifications to the client system. This makes the device available to the
whole mix of client platforms, appearing as a 3480, 9-track, or whatever else they
are comfortable with. Although a driver has to be developed for STAR, it is
only one driver.
A third and critical STAR function is to apply transformations or filters to
the data on the fly. Even minor differences such as blocking factors and
record headers precipitate this conversion process. Since a 486 separates the
client from the actual device, STAR is able to apply arbitrary transformations
to the data as it passes from the device to the client. These transformations
are loaded as 32-bit, protected-mode overlays so that users can program
whatever algorithms they wish. Generally, these transforms are no more than a
few hundred lines of code. To sustain maximum throughput, transforms are
written in 32-bit assembly language, but can be coded in any language that
will link as a standard DOS overlay.
Data transforms also enable optimum utilization of a specific device. As
previously mentioned, a Metrum T120 tape can sustain about 2 Mbytes/sec
provided the tape blocks are its maximum 256 Kbytes, but most apps use tape
records of a few Kbytes. The smaller records are packed/unpacked on the fly
via a transform subroutine, thereby improving performance. Transforms are also
key to emulating a specific vendor's peripheral, RGB-to-CMYK conversion, and
so on.
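The record-packing transform can be illustrated with a minimal sketch; the block and record sizes here are arbitrary stand-ins, not STAR's actual values.

```c
#include <string.h>
#include <assert.h>

/* Illustrative transform: pack small application records into one large tape
   block so a streaming drive is fed full-sized blocks at full speed. */
#define BLOCK_SIZE 4096  /* stand-in for the drive's preferred block size */

/* Append a record to the block; returns bytes used so far, or -1 if full. */
static long pack_record(unsigned char *block, long used,
                        const unsigned char *rec, long reclen)
{
    if (used + reclen > BLOCK_SIZE)
        return -1;                 /* caller flushes the block to tape */
    memcpy(block + used, rec, reclen);
    return used + reclen;
}
```

Unpacking on read is the mirror image: the transform doles out small records from each large block as the application requests them.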


PORT


Although simple in concept, implementing a peripherals manager presents some
practical obstacles. Because it is more than a simple controller, STAR needs
an operating system. STAR uses the PORT system which was expressly designed
for multiprocessors (see my article series "Personal Supercomputing," DDJ,
June--August, 1992). The PORT system is used only for startup, screen
messages, contingency handling, and other functions that are not time
critical. Client-device queues, transformation handling, and all other
time-critical functions are handled by a 32-bit assembly-language kernel. The kernel
programs the SCSI-2 adapter processors directly, even bypassing ROM-BIOS. PORT
remains in suspended animation until the transfer is complete, an error
occurs, a new client signs on, or some other event. 
A key factor in selecting PORT is its use of 486 virtual memory (vm). It is
this feature that limits the use of OS/2 or Windows NT with STAR. This vm
feature (common in workstations, minis, and some mainframes) translates the
contiguous virtual memory seen by a program into 4-Kbyte pages via an on-chip
Translation Lookaside Buffer (TLB). However, these 4-Kbyte pages could be
scattered all over real RAM. Virtual memory enables the operating system to
pack real memory full of tasks and not run out as tasks are spawned and
deleted. The feature is used heavily by all 32-bit protected-mode operating
systems, including the Phar Lap DOS extender. The SCSI-2 adapters, however,
have on-board processors of their own. For reasonable throughput, data needs
to be shipped via DMA direct from 486 memory. The problem is that if you
requested a DMA transfer from address 123ABC of 64 Kbytes, the actual memory
location of the first page may be at 403DEF, the second at 23AB55, and so on.
Although multitasking OSs circumvent this via dedicated blocks of contiguous
memory, the data has to be copied back and forth from these limited buffers.
All 486 memory should be directly addressable by the SCSI-2 microprocessors.
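A toy page table shows why DMA straight from paged virtual memory is awkward: consecutive 4-Kbyte virtual pages can land on unrelated physical frames, so one transfer needs a scatter/gather entry per physical run. The frame numbers here are invented for illustration.

```c
#include <assert.h>

#define PAGE 4096UL

/* Toy page table: virtual page number -> physical page frame. Note the first
   two virtual pages land on wildly separated physical frames. */
static const unsigned long page_frame[4] = { 0x403, 0x023, 0x150, 0x151 };

/* Translate a virtual address to a physical one via the toy page table. */
static unsigned long virt_to_phys(unsigned long vaddr)
{
    return page_frame[vaddr / PAGE] * PAGE + vaddr % PAGE;
}
```

A 64-Kbyte request starting at one virtual address thus decomposes into several physical runs (pages 2 and 3 above happen to be adjacent and could merge into one run), which is exactly the bookkeeping PORT's contiguous real memory avoids.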
PORT is a virtual-memory system. However, the difference is its use of
software virtual memory, not the 486 TLB, so real memory is contiguous for
buffers, cache, multiprocessing, and all kernel functions. PORT uses 8 Mbytes
of extended memory and the remaining 8--248 Mbytes is used by the STAR kernel
for its cache queues, buffers, and so on. The STAR Manager software, resident
in PORT, is written in PORT's 64-bit Fortran_C, simplifying development of the
extensive management program. As a virtual-memory system, PORT enjoys the
benefits of a paged system and can also utilize multiple large buffers for
tape directories, device-content analysis, and so on. Although software
virtual memory (essentially demand overlay) is slower than an on-chip TLB, the
difference is insignificant for the functions the STAR Manager performs. PORT
buffers that overflow its 8-Mbyte space are staged to the STAR IDE disk
transparently.
An interesting aspect of PORT is its ability to support multiple RISC
processors (such as the Intel i860) on PC plug-in cards. On its own i860, the
STAR Manager has almost no overhead. Alternatively, it can farm out transforms
to DSP or RISC processors, yielding performance well beyond the 486 for the
particular algorithm; see Figure 3. A multiprocessor STAR dedicates the 486
entirely to the client-device I/O. PORT's orientation to asymmetrical
multiprocessing (task specific, dissimilar, multiple processors) is a perfect
fit for STAR. Although not required, multiple processors provide STAR with a
growth path beyond the limitations of the 486 family.


Next Time 


A sophisticated operating system should allow STAR to implement a universal
networking and client communication protocol via SCSI-2 tape read/writes.
Although this protocol isn't mandatory, it enables a client to query the
devices available on the STAR, select a transform, position to a specified file,
and perform other functions that may not be supported by the application
itself. All this is transparent to the client's host operating system. This
protocol, which I'll cover in a future article, is important for device
features not covered by the SCSI-2 command set.
Figure 1 STAR Peripherals Manager uses a 486-family PC to interface client
workstations/Macs/PCs to a peripherals pool. Clients generally connect via
SCSI-2, but devices can use a variety of protocols.
Figure 2 Adapter cards typically service a single client or device, but up to
seven clients/devices can chain to a single 8-bit SCSI-2 adapter. Clients and
devices do not share the same adapter.

Figure 3 PORT's Cray multiprocessor model utilizes shared memory for maximum
throughput. The system is designed to manage task-specific processors
(asymmetrical multiprocessing), which is ideal for STAR.





























































November, 1994
Partitions


Dividing data into meaningful groups




Joe Celko


Joe is a database consultant and contributing editor to DBMS magazine. He can
be contacted on CompuServe at 71062,1056.


Formally, a partition of a set of objects is a collection of subsets such
that: 1. the union of all the subsets in the collection is equal to the
original set; 2. the intersection of any two distinct subsets in the collection
is empty; and 3. no subset in the collection is empty. This means that every
element of the original belongs to one and only one subset in the partition.
Informally, a partition is a way of dividing up the loot, cutting up the cake,
or grouping data into classes for a report. 
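When a partition is stored as a signature array sig[i] = subset number of element i (the representation used later in this article), the three properties collapse to a simple check: each element gets exactly one subset, so union and disjointness hold by construction, and only "no empty subset" needs verifying. A small C sketch:

```c
#include <assert.h>

/* Check that sig[0..n-1], with subset numbers 1..k, describes a valid
   partition: every element is assigned, and no subset is empty.
   (k is assumed <= 31 for this sketch.) */
static int is_partition(const int *sig, int n, int k)
{
    int seen[32] = { 0 };
    for (int i = 0; i < n; i++) {
        if (sig[i] < 1 || sig[i] > k)
            return 0;              /* element outside the declared subsets */
        seen[sig[i]] = 1;
    }
    for (int s = 1; s <= k; s++)
        if (!seen[s])
            return 0;              /* an empty subset is not allowed */
    return 1;
}
```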
If you are familiar with SQL, the GROUP BY clause partitions a table. The
GROUP BY clause results in the minimum number of groups where, for each
grouping column of each group of more than one row, all values of that
grouping column are either the NULL value, or equal to each other. For
example, if I used GROUP BY states on a geographic database, I would expect to
get (at most) 51 rows in the grouped table (one grouped sub-table per state
and one for all NULL values). 
In computer science, there is a whole class of NP-complete problems called the
"Knapsack" and "Bin-Packing" problems which depend on partitioning.
Informally, NP-complete problems are those whose solution time grows so fast
with problem size that it is not practical to solve large instances exactly.
They often have a factorial or exponential calculation hidden in them. 
The Knapsack problem takes its name from the way it is usually presented.
Imagine that you are packing a knapsack for a hike. The items which you can
stuff into the knapsack have a weight (or cost) and a value to you. A compass
is extremely light, but very valuable. A bag of wet sand is extremely heavy
and totally worthless on a hike. A canteen of water is somewhere between these
two extremes in both weight and value. 
The problem is to get the most value in the knapsack under a certain weight
limit. This means that you want to partition the items into all possible
subsets, and then rate them for weight and value. Of those whose weight is no
greater than the limit of your back, you wish to find the ones which have the
greatest value. The packing may or may not be unique, if it exists at all. 
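The brute-force approach just described, rating every subset for weight and value, fits in a dozen lines of C. The item data in the test is invented; the 2^n subsets enumerated by the bitmask are exactly why the problem is called intractable.

```c
#include <assert.h>

/* Brute-force knapsack: enumerate every subset of n items with a bitmask,
   keep the best value whose total weight fits the limit. Usable only for
   small n, since the loop runs 2^n times. */
static int best_value(const int *wt, const int *val, int n, int limit)
{
    int best = 0;
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        int w = 0, v = 0;
        for (int i = 0; i < n; i++)
            if (mask & (1u << i)) { w += wt[i]; v += val[i]; }
        if (w <= limit && v > best)
            best = v;
    }
    return best;
}
```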
The Bin-Packing problem is usually explained with a warehouse full of
identical empty bins and a pile of different-sized boxes which must be put
into the bins. The goal is to put all the boxes into the smallest number of
bins. The sizes of the bins and boxes are usually given in integer units to
make the problem easier. The partition we want to find has the smallest number
of subsets, each with a total size no greater than the bin size. 
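A first-fit sketch in C shows the flavor of the problem: place each box in the first bin with room, opening a new bin when none fits. First fit is only a heuristic; finding the true minimum number of bins is the NP-complete part.

```c
#include <assert.h>

/* First-fit bin packing: returns the number of bins used.
   (At most 64 bins for this sketch.) */
static int first_fit(const int *box, int nboxes, int binsize)
{
    int space[64];                     /* remaining room per open bin */
    int bins = 0;
    for (int i = 0; i < nboxes; i++) {
        int b = 0;
        while (b < bins && space[b] < box[i])
            b++;                       /* skip bins that are too full */
        if (b == bins)
            space[bins++] = binsize;   /* open a new bin */
        space[b] -= box[i];
    }
    return bins;
}
```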
Partitions have been an area of study for mathematicians for over 300 years,
ever since Leibniz asked Bernoulli if he had investigated P(n), the number of
partitions of a set of (n) objects. 
The mathematician Eric Temple Bell came up with the "Bell numbers," which count
the number of partitions of a set. There are two formulas to compute the Bell
number; one uses Stirling numbers, and the other uses Binomial coefficients.
The Stirling number (of the second kind), S(m,n), is defined as the number of
ways of partitioning a set of m elements into n nonempty subsets. Binomial
coefficients, C(n,r), represent the number of combinations from a set of n
elements taken r at a time. 
The function for C(n,r) can be coded iteratively and will run very fast, but
the procedure which calls it will be recursive. The function for S(m,n) can be
coded in a simple, recursive fashion, but the procedure which calls it will be
iterative. You simply cannot escape recursion in partitioning problems. I've
given the Stirling version here because it is short and uses simpler
arithmetic. 
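For readers who prefer C, here is a straightforward rendering of the same Stirling-number route to the Bell numbers; Listing One gives the Pascal original.

```c
#include <assert.h>

/* Stirling number of the second kind: S(m,n) = S(m-1,n-1) + n*S(m-1,n),
   the number of partitions of m things into n nonempty sets. */
static long stirling2(int m, int n)
{
    if (n == 0) return m == 0;          /* S(0,0) = 1; S(m,0) = 0 for m > 0 */
    if (m < n)  return 0;               /* more sets than elements */
    if (m == n || n == 1) return 1;     /* all singletons, or one big set */
    return stirling2(m - 1, n - 1) + (long)n * stirling2(m - 1, n);
}

/* Bell number: total partitions of an m-element set, summed over set counts. */
static long bell(int m)
{
    long sum = 0;
    for (int i = 0; i <= m; i++)
        sum += stirling2(m, i);
    return sum;
}
```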
If you wish to try coding the Binomial coefficient version, the function for
C(n,r) is also given. I have not compared the run times. Figure 1(a) shows the
recursive formula.
The Bell number simply tells you how many partitions to expect, but does not
show you what the partitions will look like. For example, Figure 1(b) lists
all 15 possible partitions of the set {1, 2, 3, 4}. The vector beside each
partitioning is the signature of the collection. The first partition has all
four elements in one set; the second partition has the first three elements in
one set and the fourth element in another; and so forth until we have four
partitions of one element each. This signature can be used for generating
partitions. The procedure Part(n) is an algorithm for finding the signatures.
The procedure OutSignature can be replaced by one that will print the actual
subsets. 
Inspect the signatures in the example. There are really only five different
signature patterns in this partition, namely:
One set of four elements.
One set of three elements and one set of one element. 
Two sets of two elements each.
One set of two elements and two sets of one element. 
Four sets of one element each.
The count of partition multisets is called p(n), and there is no simple
closed-form formula to calculate it; you just have to work it out. As you have
likely already figured
out, p(n) is the number of ways that you can write the integer (n) as the sum
of smaller integers. An approximation formula does exist for p(n), but it is a
little high for smaller values of (n). The formula is shown in Figure 2.
The two tricks to generating the signature patterns are relatively simple. The
first is that no signature pattern can be longer than (n) elements; the second
is to realize that if you know the sum of all but one of the elements, you can
compute the remaining element by subtracting that sum from (n). The total of a
signature pattern is always equal to (n). 
This means that you can start with a single element of (n) and expand it in an
orderly fashion, building each series of partitions of a particular length.
See GenPattern1 in Listing One . 
This algorithm can be improved upon by a change in notation for a multiset.
Let the x @ y sign mean "x sets of y members each" so that you can condense
the notation as in Figure 3. This algorithm is given in GenPattern2 in Listing
One. The program could be made more readable by introducing a record data type
with size and count fields to use as an array, but I have not done so. 
Lotteries usually involve picking a subset of numbers from some range. As an
exercise, you're invited to take your state lottery rules and use these
algorithms to make yourself a millionaire. Who says computer science isn't
practical? 


References


Djokic, B., et al. "A Fast Iterative Algorithm for Generating Set Partitions."
The Computer Journal, 1989. 
Er, M.C. "A Fast Algorithm for Generating Set Partitions." The Computer
Journal, 1988. 
Nijenhaus, A. and H.S. Wilf. Combinatorial Algorithms. San Diego, CA: Academic
Press, 1978. 
Reingold, E.M., J. Nievergelt and N. Deo. Combinatorial Algorithms: Theory and
Practice. Englewood Cliffs, NJ: Prentice-Hall, 1977.
Semba, I. "An Efficient Algorithm for Generating All Partitions of the Set
{1,...,n}." Journal of Information Processing, 1984.
Figure 1: (a) Computing the Bell number via Binomial coefficients; (b) 15
possible partitions of the set {1, 2, 3, 4}.
(a) Bell(n+1) = Sum(C(n, k) * Bell(k)) from k := 0 to n;
(b) {{1, 2, 3, 4}} = (1, 1, 1, 1)
 {{1, 2, 3}, {4}} = (1, 1, 1, 2)
 {{1, 2, 4}, {3}} = (1, 1, 1, 2)
 {{1, 2}, {3, 4}} = (1, 1, 2, 2)
 {{1, 2}, {3}, {4}} = (1, 1, 2, 3)
 {{1, 3, 4}, {2}} = (1, 1, 1, 2)
 {{1, 3}, {2, 4}} = (1, 1, 2, 2)
 {{1, 3}, {2}, {4}} = (1, 1, 2, 3)
 {{1, 4}, {2, 3}} = (1, 1, 2, 2)

 {{1, 4}, {2}, {3}} = (1, 1, 2, 3)
 {{1}, {2, 3, 4}} = (1, 2, 2, 2)
 {{1}, {2, 3}, {4}} = (1, 2, 2, 3)
 {{1}, {2, 4}, {3}} = (1, 2, 2, 3)
 {{1}, {2}, {3, 4}} = (1, 2, 3, 3)
 {{1}, {2}, {3}, {4}} = (1, 2, 3, 4)
Figure 2: Function to approximate the partition count.
FUNCTION PartitionCount(n : INTEGER): INTEGER;
{ approximate number of partition patterns for a set of (n)
 elements. Formula tends to guess high. }
CONST pi = 3.141592653;
BEGIN { math can be optimized because of constants }
PartitionCount := Round ((1.0 / (4.0 * n * Sqrt(3.0))) * Exp (pi * Sqrt((2.0 *
n)/3.0)))
END;
Figure 3: Condensing notation for multisets.
{{1, 2, 3, 4}} becomes (1 @ 4)
{{1, 2, 3}, {4}} becomes
 (1 @ 3, 1 @ 1)
{{1}, {2}, {3}, {4}} becomes (4 @ 1)

Listing One 

PROGRAM TestBellNumbers;
VAR n : INTEGER;

FUNCTION Bell(m : INTEGER) : INTEGER;
{ total number of partitions for a set of n elements }
VAR i, sum : INTEGER;


FUNCTION Stirling (m, n : INTEGER) : INTEGER;
{ number of partitions of m things into n nonempty sets }
BEGIN
IF ((m = n) OR (n = 1)) { all singletons, or one set holding all }
THEN Stirling := 1
ELSE IF (m < n) { more sets than elements }
 THEN Stirling := 0
 ELSE IF (n = 0) { empty set }
 THEN Stirling := 0
 ELSE Stirling := Stirling((m-1), (n-1))
 + (n * Stirling((m-1), n));
END;

BEGIN 
sum := 0;
FOR i := 0 TO m
DO sum := sum + Stirling (m, i);
Bell := sum;
END;

BEGIN
Write('Give an n: ');
ReadLn (n);
n := Bell(n);
WriteLn('Bell Number is ', n);
ReadLn;
END.

PROGRAM TestCombinations;

{ version using combination operator }
VAR n, k : INTEGER;

FUNCTION Comb (n, k : INTEGER) : INTEGER;
{ Binomial coefficient 
 or number of k element subsets in set of n }
Var i, top, bottom : INTEGER;
BEGIN
top := 1;
FOR i := n DOWNTO (n - k + 1) DO top := top * i;
bottom := 1;
FOR i := k DOWNTO 2 DO bottom := bottom * i;
Comb := top DIV bottom;
END;

BEGIN
Write('Give an (n): ');
ReadLn (n);
Write('Give an (k): ');
ReadLn (k);
n := Comb(n, k);
WriteLn('Comb Number is ', n);
ReadLn;
END.
PROGRAM GenPattern1;
CONST big = 100;
VAR
 p : ARRAY [0..big] OF INTEGER;

 i, j, n, PatternSize : INTEGER;

PROCEDURE OutPattern;
{ Display the current partition pattern }
VAR i : INTEGER;
BEGIN
Write('(', p[1]);
FOR i := 2 TO PatternSize
DO Write(', ', p[i]);
WriteLn(')');
END;

FUNCTION Sum(a, b :INTEGER) : INTEGER;
{ compute total of subarray p[a:b] }
VAR total, i : INTEGER;
BEGIN
total := 0;
FOR i := a TO b
DO total := total + p[i];
Sum := total;
END;

BEGIN
PatternSize := 1;
p[0] := -1; { sentinel value }
WriteLn('Give me n: ');
ReadLn(n); 
p[1] := n; { load starting value into array }
WHILE (PatternSize <= n)
DO BEGIN

 OutPattern;
 i := PatternSize - 1;
 WHILE ((p[PatternSize] - p[i]) < 2)
 DO i := i - 1;
 IF (i <> 0)
 THEN FOR j := (PatternSize - 1) DOWNTO i 
 DO p[j] := p[i] + 1
 ELSE BEGIN
 FOR j := 1 TO PatternSize 
 DO p[j] := 1;
 PatternSize := PatternSize + 1;
 END; 
 p[PatternSize] := n - Sum(1, (PatternSize - 1))
 END;
END.
PROGRAM GenPattern2;
{ generate partitions in dictionary order }
CONST BIG = 100;
VAR
 m, p :ARRAY [-1..BIG] OF INTEGER;
 i, sum, n, left : INTEGER;

PROCEDURE OutList;
{ this procedure uses the new notation,
 but can be easily modified to print out full multisets }

VAR i : INTEGER;
BEGIN
Write('(', m[1] ,' @ ', p[1]);
FOR i := 2 TO left
DO Write(', ', m[i] ,' @ ', p[i]);
WriteLn(')');
END;

BEGIN
left := 1;
p[-1] := 0;
m[-1] := 0;
WriteLn('Give me n: ');
ReadLn(n);
p[0] := n + 1;
m[0] := 0;
p[1] := 1;
m[1] := n;
WHILE (left <> 0)
DO BEGIN
 OutList;
 sum := m[left] * p[left];
 IF (m[left] = 1)
 THEN BEGIN
 left := left - 1;
 sum := sum + (m[left] * p[left]);
 END;
 IF (p[left - 1] = p[left] + 1)
 THEN BEGIN
 left := left - 1;
 m[left] := m[left] + 1;
 END
 ELSE BEGIN

 p[left] := p[left] + 1;
 m[left] := 1;
 END;
 IF (sum > p[left])
 THEN BEGIN
 p[left + 1] := 1;
 m[left + 1] := sum - p[left];
 left := left + 1;
 END;
 END;
END.
PROGRAM Partitions;
CONST big = 100;
VAR
 p, q: ARRAY [0..big] OF INTEGER;
 n : INTEGER;

PROCEDURE OutSignature;
{ display the signature of a partitioning }
VAR i : INTEGER;
BEGIN
Write('(', q[1]);

FOR i := 2 TO n
DO Write (', ', q[i]);
WriteLn(')');
END; { OutSignature}

PROCEDURE AllSubsets(n : INTEGER);
LABEL 10;
VAR
 i, last, m, ClassCount : INTEGER;
BEGIN
ClassCount := 1;
FOR i := 1 TO n DO q[i] := 1;
p[1] := n;
OutSignature; { display single set }
REPEAT
 m := n;
 WHILE (TRUE)
 DO BEGIN
 last := q[m];
 IF (p[last] <> 1)THEN GOTO 10;
 q[m] := 1;
 m := m -1
 END;
10: ClassCount := ClassCount + m - n;
 p[1] := p[1] + n - m;
 IF (last = ClassCount)
 THEN BEGIN
 ClassCount := ClassCount + 1;
 p[ClassCount] := 0;
 END;
 q[m]:= last + 1;
 p[last] := p[last] - 1;
 p[last + 1] := p[last + 1] + 1;
 OutSignature;
UNTIL (ClassCount = n);
END; { AllSubsets }


BEGIN
WriteLn ('Give me an n: ');
ReadLn(n);
AllSubsets(n);
ReadLn
END.

PROCEDURE OutSubsets;
{ Build actual subsets from the signature array q }
VAR i, j, ThisSet : INTEGER;
BEGIN
ThisSet := 1; { current set id # }
FOR i := 1 TO n { scan array for elements }
DO IF (q[i] = ThisSet)
 THEN BEGIN { format display }
 Write('{ ', i); { first element }
 FOR j := (i+1) TO n { scan rest of set }
 DO IF (q[j] = ThisSet)
 THEN Write (', ', j);
 Write(' } '); { close up brackets }
 ThisSet := ThisSet + 1; { set up for next set }
 END;
WriteLn;
END;


































November, 1994
Interfacing Laboratory Instruments


Moving data from lab instruments to PCs via RS-232




Brian R. Anderson


Brian is an instructor at the British Columbia Institute of Technology in
Burnaby, British Columbia. He can be contacted through the DDJ offices.


When most computer users think of RS-232 serial communications, what usually
springs to mind are modems, BBSs, and the information highway. However, most
modern laboratory instruments include a serial port for transferring data
readings made by the instruments to a PC.
Lab-instrument user manuals frequently include sample programs for initiating
a data transfer. While these programs usually function correctly, they
invariably provide limited functionality, and each is different (requiring
different actions by the users and producing output in a different format).
Since most laboratories use a variety of instruments, the sample programs for
each instrument need to be entered, tested, and debugged, and the operation of
each must be learned by laboratory personnel. It is no surprise, then, that
most scientists, engineers, and technicians still prefer to record readings
from their instruments using pencil and paper!
A local research facility (for which I do consulting work) asked me to propose
a solution to this sort of problem. The client frequently ran a battery of
tests on both raw material and the manufactured product. The results of these
tests were entered into spreadsheets and/or database programs for analysis.
The researchers were wasting far too much time collecting data, and even more
time entering it into the computer. Besides slowing down the collection and
analysis cycle, the manual steps were prone to errors. 
My solution was a program that runs on a laptop computer (the client had a
number of under-utilized laptops), interfaces to the instruments, and
generates data files in formats that can be imported into other programs
(such as Lotus 1-2-3). To make my program (which I dubbed "LabMate") more
flexible, I didn't hardcode "knowledge" of the individual instruments, opting
instead to use a configuration file to describe the particular requirements of
individual instruments. Each entry in the configuration file results in a new
menu item. Regardless of the instrument, a simple and consistent user
interface is maintained. The program provides for a series of individual
measurements (under user control) or a timed series of measurements (under
program control). Readings, including rudimentary analysis (minimum, maximum,
average, and standard deviation) are saved to a file. 
On the downside, my proposal eventually lost out to a competing one that used
a central mainframe computer wired to each station (laboratory instrument).
Each station was outfitted with a small keypad to initiate the transfer of
data back to the central computer. On the upside, however, I'm able to present
LabMate here. LabMate and the UI toolkit are available for both Microsoft C
6.0 and Borland Turbo C. Only the UI toolkit has compiler-specific code. All
source code (including both versions of the UI toolkit), the LabMate
executable, a sample configuration file, the online help file, and an ASCII
version of the user's manual are available electronically; see "Availability,"
page 3. Contact me directly for information on hardcopies of the documentation
and revisions to LabMate.
Some of the code I used for this project was originally developed as sample
code for students in programming classes I teach. This recycled code includes
a module for direct access to the RS-232 communications ports (written in C,
but using many assembly-language techniques); the port module is interrupt
driven (unlike the DOS BIOS port functions). The other recycled code is the
user-interface toolkit--similar to Al Stevens' D-Flat, but much cruder and
simpler (and predating Al's code by a couple of years). The UI code is
actually several modules, each with its own header file, but with all the
compiled code collected into a single library.
The remainder of the program is new, and divided into two modules, each
containing several functions: The main module (labmate.h and labmate.c)
includes code to tie into the UI toolkit and code for accessing and processing
the configuration file. The run.c module contains code relating to
step-by-step readings under user control and continuous readings under program
control.


Serial Communications


The port module--port.h and port.c (Listings One and Two)--interfaces directly
to the UART chip in the computer. This module is interrupt driven--as soon as
the UART receives a character, it generates a hardware interrupt. An interrupt
service routine (ISR) in the port module saves the character into a buffer. The
program can then get the character "at its leisure" with no danger of lost
characters. DOS-based PCs commonly have four communications ports, COM1
through COM4, with COM1 and COM3 sharing interrupt vector 0Ch, and with COM2
and COM4 sharing interrupt vector 0Bh. LabMate's port module uses a single
ISR to handle all four ports. (Despite this sharing, a program can use all four
ports, if need be, although LabMate uses only one port at a time.)
The port module begins with a group of #define statements used to derive the
addresses that access the various registers of the UART (each
communications-port UART has its own unique address space). The next group of
defines includes constants (mostly masks used to pick out particular bits)
needed to access the UART and PIC. Also included is a constant related to the
crystal frequency used to generate the communications pulses. Following the
constants are a number of arrays used to access the UART, PIC, and data
buffers. In some cases, the arrays have four elements (one for each port); in
other cases, they have only two (one for each of the shared interrupts). These
arrays are initialized with the proper address, interrupt vector, and so on
for each port.
The receive interrupt service routine, rxisr(), is the heart of the port
module. Because it is shared among up to four ports, when an interrupt does
occur, we must check to see which port caused the interrupt (a procedure
usually referred to as "poll on interrupt")--a For loop checks the
interrupt-pending bit within the interrupt-identification register of each
UART: ((INTPEND & inpw (adr + INTID)) == 0). If the test is True, the waiting
character is copied into the circular buffer reserved for that port.
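The buffering contract between rxisr() and the rest of the program can be sketched without the hardware: a circular buffer with a head index that the ISR advances and a tail index that the application advances. The buffer size and names below are my own choices, not the port module's.

```c
#include <assert.h>

#define BUFSZ 256
static unsigned char rxbuf[BUFSZ];
static int head = 0, tail = 0;

/* Called from the ISR when the UART delivers a character. */
static void isr_put(unsigned char c)
{
    int next = (head + 1) % BUFSZ;
    if (next != tail) {                  /* drop the byte if buffer is full */
        rxbuf[head] = c;
        head = next;
    }
}

/* Application side: is a character waiting? */
static int rx_stat(void) { return head != tail; }

/* Application side: fetch the next character, or -1 if none. */
static int rx_get(void)
{
    if (head == tail) return -1;
    int c = rxbuf[tail];
    tail = (tail + 1) % BUFSZ;
    return c;
}
```

Because the ISR touches only head and the application only tail, the two sides never race on the same index, which is what lets the program fetch characters "at its leisure."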
The initcom() function writes data directly to the UART control registers to
set the bit rate, number of data bits, number of stop bits, and parity. The
bit rate is set by turning on the divisor-latch access bit (DLAB) and then
depositing a number derived from the clock (crystal) frequency and the bit
rate. The other parameters are set via various bit fields with the
line-control register of the UART.
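The arithmetic behind the divisor latch is worth making concrete. The 8250 family divides its 1.8432-MHz crystal by 16, giving 115200 ticks per second, and the bit rate is 115200 divided by the deposited divisor; the register writes themselves are omitted in this sketch.

```c
#include <assert.h>

/* Divisor to deposit (via DLAB) for a desired bit rate on an
   8250-family UART clocked at 1.8432 MHz / 16 = 115200 Hz. */
static unsigned baud_divisor(unsigned long bitrate)
{
    return (unsigned)(115200UL / bitrate);
}
```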
The start() function installs the ISR. While usually straightforward, this is
complicated somewhat by shared interrupt vectors. For instance, if a program
is using both COM1 and COM3, the interrupt must be installed only once. This
is ensured by an array of two flags. Each port has its own UART, and various
bit fields in the UART must be set to activate it. The UART must also be told
to generate interrupts upon receipt of a character. The stop() function
removes interrupt vectors and deactivates the UART. Again, this is complicated
because of sharing--you may want to deactivate COM3 without removing the
interrupt (because that interrupt is also used by COM1); and again, flags help
sort this out.
The other functions in the port module are simpler: rxstat() checks the buffer
set up by the interrupt routine and returns True if there is a character in
the buffer; rx() returns the next character in the buffer (yet relies on the
previous function for correct operation!); txstat() and tx() directly access
the UART registers, first to check if the UART is busy, and then to send the
UART a character; and ctsstat(), ristat(), dcdstat(), and dsrstat() query the
UART to determine the status of the various RS-232 input-handshake lines.


Configuration


LabMate uses a configuration file to hold information about the instrument(s)
that it interfaces with. This file includes the menu text (what the user sees
in one of the program's menus); information about how the instrument uses the
RS-232 port; and the strings that must be sent to the instrument before,
during, and after data transfer. Figure 1 is a sample configuration-file entry
for a Fluke/Philips multimeter. 
The first line appears in LabMate's Instrument menu--the tilde (~) indicates
that the next letter (R, in this case) will become a hot key for this menu
selection. The second line specifies the RS-232 port settings, including which
input-handshake line to monitor (one of the output-handshake lines from the
instrument must be wired to this line). The third line is an initialization
string for the instrument. The backslashes (\) are used to indicate a
noncharacter code (in decimal)--for example, you may recognize \27 as an ASCII
<Esc> code. The contents of this line (and the next two lines) are determined
solely by the requirements of the instrument: Consult your instrument user's
manual for this information. The fourth line is a deinitialization string sent
after all readings have been completed. It is needed by many instruments to
switch them out of remote-transfer mode back into manual mode. The last line
of the configuration entry for this instrument is the string that is sent to
the instrument to cause a single datum to be transferred to the computer. Note
that lines three, four, and five all end with \10, the ASCII <lf> (line feed)
character. For some instruments this may be omitted; for others it may be <cr>
or <cr><lf>. See your instrument manual for details.
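A routine in the spirit of translate() might expand the backslash-decimal codes like this. This is a guess at the behavior described, not LabMate's actual source.

```c
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* Expand backslash-decimal escapes: "\27" -> ESC, "\10" -> LF.
   Ordinary characters pass through unchanged. */
static void expand_codes(const char *in, char *out)
{
    while (*in) {
        if (*in == '\\' && isdigit((unsigned char)in[1])) {
            int code = 0;
            in++;                              /* skip the backslash */
            while (isdigit((unsigned char)*in))
                code = code * 10 + (*in++ - '0');
            *out++ = (char)code;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```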


Theory of Operation


The two source modules, labmate and run, are available electronically. The
main() function of the text-based UI performs the following operations: 
Sets up structures for the main menu and for keyboard accelerators; both of
these structures include pointers to functions executed when the menu item is
selected or an accelerator key is struck.
Creates (draws) the main window and menu bar.
Enters the message loop. 
Terminates the program. 
Processes the configuration file.
To process the configuration file, readcfg(), translate(), and freecfg() are
used. Most of the work is done by readcfg(); translate() is responsible for
interpreting the backslash sequences (such as: \127); and freecfg() is called
when the program terminates to deallocate the dynamic memory used to save the
configuration information. The readcfg() function processes each line of a
configuration-entry sequence and will process up to MAXINST instruments. The
first line of a configuration entry is saved into an array of pointers to
strings (InstName). In order that the strings all be the same length (which
looks better in the menu), this information is first put into a temporary
array, then padded with a number of spaces, depending upon the length of the
longest instrument name. The next step is to process the communications-port
information: strtok() is used to parse the semicolon-separated fields. As each
field (each a string) is recognized, a field in the COMPORT struct for
that instrument is filled with the correct numeric information. The final
three lines of configuration information are passed through translate() to
convert backslash codes into ASCII and saved for later use. There are five
parallel arrays used to store the configuration information, one for each line
in a configuration entry.
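The strtok() parsing pattern looks roughly like this; the field layout and struct are simplified stand-ins for LabMate's actual COMPORT.

```c
#include <string.h>
#include <stdlib.h>
#include <assert.h>

/* Simplified stand-in for the COMPORT struct. */
struct comport { int port; long baud; int bits; };

/* Parse a semicolon-separated port line like "COM1;9600;8".
   strtok() modifies the line in place; returns 0 on success, -1 on error. */
static int parse_port_line(char *line, struct comport *cp)
{
    char *f = strtok(line, ";");
    if (!f || strlen(f) < 4) return -1;
    cp->port = atoi(f + 3);              /* "COM1" -> 1 */
    if (!(f = strtok(NULL, ";"))) return -1;
    cp->baud = atol(f);
    if (!(f = strtok(NULL, ";"))) return -1;
    cp->bits = atoi(f);
    return 0;
}
```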
The pointers to functions in the menu struct refer to functions responsible
for processing particular pull-down menus. Each pull-down menu causes other
functions to be executed, some of which may also be executed via the pointers
to functions in struct hotkey. The program terminates when one of these
functions returns Quit, which eventually gets back to ServiceMenu() in the
main() message loop. All of these functions--doopen(), dosaveas(), and the
like--are involved with collecting information from the user: filenames,
sample identifiers, and indications of which instrument to use. Eventually,
the user must make a selection from the Collect menu, at which point functions
from the run.c module are called to perform the interaction with the
instrument to collect and save the data.
The run.c module interfaces to LabMate via three functions: dosrun() performs
single-step measurement runs; docrun() performs continuous measurement runs;
and checkmodem() ensures that the proper RS-232 handshake signals are present
before any measurement is attempted. In addition, there are 17 "helper"
functions defined as "static" to keep them local to this module. 
Most of the code in this module is related to making measurements easy for the
user, who can access any of the measurement-screen features via either
keyboard or mouse and get help via the F1 key. 
The dosrun() function begins by initializing the serial-port module and
sending the initialization string to the instrument. Then the screen is
"painted"--that is, a number of user-input fields are placed on the screen.
Next, an event loop is entered, where the program waits for the user to
initiate an action. Within the event loop is a Switch/Case statement for
recognizing keystrokes and a nested If/Else for checking whether the mouse has been
clicked over one of the button commands on the screen. When the user chooses
an action, one of six functions is called to process the request. 
For instance, if the user selects "Measure," dom() is called. This results in
the Trigger string being sent to the instrument via putinststr(), at which
point the program monitors for a reply from the instrument via getinststr().
If a reply is found, it is added to the end of a dynamically allocated array
using realloc(), and a count is updated both internally and on the screen. The
getinststr() and putinststr() functions are similar to the standard library
functions gets() and puts(), except that the instrument input/output functions
provide for a time-out if data cannot be received from, or sent to, the
serial-communications hardware.
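A sketch of what such a time-out wrapper could look like, built on the rxstat()/rx() primitives from Listing Two. The signature and the poll-counting time-out are assumptions (the article's getinststr() is not shown), and the port I/O is stubbed with an in-memory string so the sketch stands alone.

```c
#include <string.h>

/* Stand-ins for rxstat()/rx() from Listing Two: here they drain an
 * in-memory string so the sketch is testable without UART hardware. */
static const char *fakein;
static int rxstat(int port) { (void)port; return fakein && *fakein != '\0'; }
static char rx(int port)    { (void)port; return *fakein++; }

/* Like gets(), but gives up if the instrument stops sending.  A real
 * implementation would test elapsed clock time; this sketch counts
 * consecutive empty polls instead (an assumption).  Returns the reply
 * length, or -1 on time-out. */
int getinststr(int port, char *buf, int max, long maxpolls)
{
    int n = 0;
    long idle = 0;
    while (n < max - 1) {
        if (!rxstat(port)) {
            if (++idle > maxpolls) return -1;   /* timed out */
            continue;
        }
        idle = 0;
        buf[n] = rx(port);
        if (buf[n] == '\n') break;              /* end of instrument reply */
        n++;
    }
    buf[n] = '\0';
    return n;
}
```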
At any time during the reading of a group of measurements, the user may
replace or change a reading; this is useful if a bad reading is taken. On the
measurement screen is a small scrolling window that shows readings as they are
taken and edit events as they occur. The user may choose either Edit or
Delete; in either case, the scrolling window is replaced with a list box used
to select the measurement to be edited (replaced, actually) or deleted. The
list box always reflects the order in which the data is stored, whereas the
scrolling window simply shows the events as they occur. Deletes are shown as a
line of red dashes, replacements are shown in red. 

When the user finally selects OK to finish a group of measurements, the event
loop is terminated after calling doo() (do OK). The doo() function first uses
sscanf() to ensure that all of the data in the dynamic array is numeric, then
finds the minimum and maximum and calculates the average and sample standard
deviation. It then writes all of the information it has collected and/or
calculated to the file, including an error message if any nonnumeric data was
encountered.
The docrun() function is similar in many ways to dosrun() but is somewhat
simpler. First the user is prompted for the number of measurements to take and
the interval between them. Both of these figures must be given "up front"
before the program will continue. At that point, a measurement screen is
presented to the user with the measurements "paused." The user has only three
choices: Stop, Go, or Pause. Readings are shown on a scrolling window as they
occur. When readings are paused, a bright red message appears on screen. 


Conclusion


The process presented here shows how to significantly automate the task of
making and analyzing many measurements via instruments that include an RS-232
interface. Some instruments may have an IEEE-488 parallel interface port
instead of an RS-232 serial port, but these ideas can be adapted to handle
that type of instrument as well.
Figure 1: Sample configuration-file entry.
PM2525 (~Resistance)
COM1;1200;8;N;2;DSR
\27 2, \27 5, \27 4, FNC RTW, OUT N, TRG B, EMO A, X 20 \10
EMO 0, \27 1 \10
X 1 \10

Listing One 

/* Physical Layer Module -- IBM-PC Serial Interface
 * Employs Hardware Interrupts and direct UART access.
 * Programmer: Brian R. Anderson * Date: Sept. 29, 1990
 * Modified: December 17, 1990 -- (Support for COM3 & COM4)
 * Modified: December 10, 1993 -- (Monitor handshake)
 */

#define COM1 0
#define COM2 1
#define COM3 2
#define COM4 3

#define TRUE 1
#define FALSE 0

enum { NONE, ODD, filler, EVEN };  /* 8250 parity codes 0, 1, 3; 2 is unused */

/* initialize baud rate, data bits, parity, and stop bits */
void initcom (int port, int baud, int data, int parity, int stop);
/* start receiving -- install interrupt service routine */
void start (int port);
/* stop receiving -- remove ISR, disable interrupts */
void stop (int port);
/* determine if UART is able to accept another character */
int txstat (int port);
/* send one character to the UART */
void tx (int port, char c);
/* determine if there is a character waiting in the receive buffer */
int rxstat (int port);
/* get the next character from the receive buffer */
char rx (int port);
/* determine status of Clear To Send */
int ctsstat (int port);
/* determine status of Ring Indicator */
int ristat (int port);
/* determine status of Data Carrier Detect */
int dcdstat (int port);
/* determine status of Data Set Ready */
int dsrstat (int port);





Listing Two

/* Physical Layer Module -- IBM-PC Serial Interface
 * Employs Hardware Interrupts and direct UART access.
 * Programmer: Brian R. Anderson * Date: Sept. 29, 1990
 * Modified: December 17, 1990: Added Support for COM3 & COM4
 * NOTE: COM3 shares an interrupt vector with COM1; COM4 shares an interrupt
 * vector with COM2. In this implementation, all ports use a common ISR.
 * Modified: December 10, 1993: Provide monitoring for input handshake
 * lines: DSR, DCD, CTS, RI.
 * N.B.: both output handshake lines (DTR and RTS) are activated when
 * receiving on the port is started.
 */
 
#include <dos.h>
#include "port.h"

/* 8250 UART port address offsets */
#define DATA 0 /* Send or Receive Data */
#define BAUDDIV 0 /* Baud Rate Divisor (DLAB set to 1) */
#define ENINT 1 /* Enable Interrupts */
#define INTID 2 /* Interrupt Identification */
#define LNCTRL 3 /* Line Control */
#define MDMCTRL 4 /* MODEM Control */
#define LNSTAT 5 /* Line Status */
#define MDMSTAT 6 /* MODEM Status */

#define DLAB 0x80 /* Divisor Latch Access Bit (in Line Control Register) */
#define DTR 0x01 /* Data Terminal Ready (in MODEM Control Register) */
#define RTS 0x02 /* Request to Send (in MODEM Control Register) */
#define OUT2 0x08 /* UART OUT2 enables Interrupts on IBM-PC */
#define RXINT 0x01 /* Enable Receive Interrupts for UART */
#define RXREADY 0x01 /* Receive Ready Bit (in Line Status Register) */
#define TXREADY 0x60 /* Transmit Holding & Shift Register Empty Bits */
#define INTPEND 0x01 /* This bit will be Zero if an interrupt is pending */
#define DCTS 0x01 /* Delta (change in) Clear to Send */
#define DDSR 0x02 /* Delta (change in) Data Set Ready */
#define DRI 0x04 /* Delta (change in) Ring Indicator */
#define DDCD 0x08 /* Delta (change in) Data Carrier Detect */
#define CTS 0x10 /* Clear to Send */
#define DSR 0x20 /* Data Set Ready */
#define RI 0x40 /* Ring Indicator */
#define DCD 0x80 /* Data Carrier Detect */

#define CLOCK 0x1C200 /* Baud Rate Clock (1,843,200 / 16) */

#define PICMR 0x21 /* Priority Interrupt Controller Mask Address */
#define PICCR 0x20 /* Priority Interrupt Controller Control Reg. Address */
#define EOI 0x20 /* End Of Interrupt signal to PIC */
 
#define BUFSIZE 1024 /* size of circular receive buffer */

static int PortAdr[4] = {0x03F8, 0x02F8, 0x03E8, 0x02E8};
static char Buffer[4][BUFSIZE]; /* circular buffer for COM1 - COM4 */
static int PtrB[4] = {0, 0, 0, 0}; /* pointer (index) to start of buffer */
static int PtrE[4] = {0, 0, 0, 0}; /* pointer (index) to end of buffer */
static int InUse[4] = {FALSE, FALSE, FALSE, FALSE}; /* port still in use? */
static void (_interrupt _far *OldFunc[2])(void); /* original ISR vectors */

static int Vect[2] = {0x0C, 0x0B}; /* serial port vector numbers */
static int PICMask[2] = {0x10, 0x08}; /* priority interrupt controller mask */
static int Installed[2] = {FALSE, FALSE}; /* ISR installed yet? */

/* local function for handling COM1 - COM4 serial port interrupts */
static void interrupt rxisr (unsigned bp, unsigned di, unsigned si,
 unsigned ds, unsigned es, unsigned dx,
 unsigned cx, unsigned bx, unsigned ax)
{
 char c; /* character to read from port */
 int adr; /* address of port */
 int port; /* port number: COM1 - COM4 */
 int pos; /* position in buffer */
 
 for (port = COM1; port <= COM4; port++) { /* poll each port */
 adr = PortAdr[port]; /* get port address */
 if ((INTPEND & inpw (adr + INTID)) == 0) { /* Interrupt Pending? */
 c = inp (adr + DATA); /* get character */
 pos = PtrE[port]; /* get position of buffer end */
 Buffer[port][pos++] = c; /* add character to end of buffer */
 PtrE[port] = (pos == BUFSIZE) ? 0 : pos; /* update buffer position */
 }
 }
 /* tell priority interrupt controller that interrupt is over */
 outp (PICCR, EOI); 
}
/* initialize baud rate, data bits, parity, and stop bits */
void initcom (int port, int baud, int data, int parity, int stop)
{
 int line;
 int adr = PortAdr[port];
 /* Access Divisor Latch; set baud rate */
 outp (adr + LNCTRL, DLAB);
 outpw (adr + BAUDDIV, CLOCK / baud);
 /* combine data, parity, and stop bits; set port accordingly */ 
 line = (data - 5) | (parity << 3) | ((stop - 1) << 2);
 outp (adr + LNCTRL, line);
}
/* start receiving -- install interrupt service routine */
void start (int port)
{
 char pic;
 int adr = PortAdr[port];
 int ip = port & 0x0001; /* COM3/4 uses ISR for COM1/2 */
 PtrB[port] = PtrE[port] = 0; /* initialize receive buffer */
 if (!Installed[ip]) {
 /* save original value from vector table */
 OldFunc[ip] = _dos_getvect (Vect[ip]);
 /* set vector table with address of our serial port ISR */
 _dos_setvect (Vect[ip], rxisr);
 /* unmask the priority interrupt controller */
 pic = inp (PICMR);
 pic &= ~PICMask[ip];
 outp (PICMR, pic);
 /* mark isr for this "group" as installed */
 Installed[ip] = TRUE;
 }
 if (rxstat (port))
 rx (port); /* remove any character "stuck" in UART */

 /* enable Data Terminal Ready, Request to Send; allow interrupts through */
 outp (adr + MDMCTRL, DTR | RTS | OUT2);
 /* enable receive interrupts in UART */
 outp (adr + ENINT, RXINT);
 /* mark this port as in use */
 InUse[port] = TRUE;
}
/* stop receiving -- remove ISR, disable interrupts */
void stop (int port)
{
 char pic; /* PIC Mask */
 int adr = PortAdr[port]; /* port address */
 int ip = port & 0x0001; /* COM3/4 uses ISR for COM1/2 */
 /* mark this port as no longer in use */
 InUse[port] = FALSE;
 /* disable all interrupts in UART */
 outp (adr + ENINT, 0);
 /* disable Data Terminal Ready, Request to Send; block interrupts */
 outp (adr + MDMCTRL, 0);
 /* disable interrupt only if other port not using it */
 if (!InUse[ip] && !InUse[ip | 0x0002]) { /* neither port using ISR */
 /* mask the priority interrupt controller */
 pic = inp (PICMR);
 pic |= PICMask[ip];
 outp (PICMR, pic);
 /* restore original contents of vector table */
 _dos_setvect (Vect[ip], OldFunc[ip]);
 /* mark ISR as no longer installed */
 Installed[ip] = FALSE;
 }
}
/* determine if UART is able to accept another character */
int txstat (int port)
{
 unsigned char stat; /* UART status */
 stat = inp (PortAdr[port] + LNSTAT);
 return (stat & TXREADY) == TXREADY; /* Holding and Shift Empty */
}
/* send one character to the UART */
void tx (int port, char c)
{
 outp (PortAdr[port] + DATA, c);
}
/* determine if there is a character waiting in the receive buffer */
int rxstat (int port)
{
 return PtrB[port] != PtrE[port];
}
/* get the next character from the receive buffer */
char rx (int port)
{
 char c; /* character to get from buffer */
 int pos = PtrB[port]; /* position within buffer */
 c = Buffer[port][pos++]; /* get the character from the buffer */
 PtrB[port] = (pos == BUFSIZE) ? 0 : pos; /* update buffer position */
 return c;
}
/* determine status of Clear To Send */
int ctsstat (int port)

{
 unsigned char stat; /* MODEM status */
 stat = inp (PortAdr[port] + MDMSTAT);
 return (stat & CTS) == CTS;
}
/* determine status of Ring Indicator */
int ristat (int port)
{
 unsigned char stat; /* MODEM status */
 stat = inp (PortAdr[port] + MDMSTAT);
 return (stat & RI) == RI;
}
/* determine status of Data Carrier Detect */
int dcdstat (int port)
{
 unsigned char stat; /* MODEM status */
 stat = inp (PortAdr[port] + MDMSTAT);
 return (stat & DCD) == DCD;
}
/* determine status of Data Set Ready */
int dsrstat (int port)
{
 unsigned char stat; /* MODEM status */
 stat = inp (PortAdr[port] + MDMSTAT);
 return (stat & DSR) == DSR;
}




































November, 1994
Packet Filtering in the SNMP Remote Monitor


Controlling remote monitors on a LAN




William Stallings


William is president of Comp-Comm Consulting of Brewster, MA. This article is
based on his recent book, SNMP, SNMPv2, and CMIP: The Practical Guide to
Network Management Standards (Addison-Wesley, 1993). He can be reached at
stallings@acm.org.


The Simple Network Management Protocol (SNMP) architecture was designed for
managing complex, multivendor internetworks. To achieve this, a few managers
and numerous agents scattered throughout the network must communicate. Each
agent uses its own management-information database (MIB) of managed objects to
observe or manipulate the local data available to a manager.
The remote-network monitoring (RMON) MIB, defined as part of the SNMP
framework, provides a tool that an SNMP or SNMPv2 manager can use to control a
remote monitor of a local-area network. The RMON specification is primarily a
definition of a data structure containing management information. The effect,
however, is to define standard network-monitoring functions and interfaces for
communicating between SNMP-based management consoles and remote monitors. In
general terms, the RMON capability provides an efficient way of monitoring LAN
behavior, while reducing the burden on both other agents and management
stations; see Figure 1. 
The accompanying text box, "Abstract Syntax Notation One (ASN.1)," gives
details on defining the communication formats between agents and managers.
The key to using RMON is the ability to define "channels"--subsets of the
stream of packets on a LAN. By combining various filters, a channel can be
configured to observe a variety of packets. For example, a monitor can be
configured to count all packets of a certain size or all packets with a given
source address. 
To use RMON effectively, the person responsible for configuring the remote
monitor must understand the underlying filter and channel logic used in
setting it up. In this article, I'll examine this filter and channel logic.
The RMON MIB contains variables that can be used to configure a monitor to
observe selected packets on a particular LAN. The basic building blocks are a
data filter and a status filter. The data filter allows the monitor to screen
observed packets based on whether or not a portion of the packet matches a
certain bit pattern. The status filter allows the monitor to screen observed
packets on the basis of their status (valid, CRC error, and so on). These
filters can be combined using logical AND and OR operations to form a complex
test to be applied to incoming packets. The stream of packets that pass the
test is referred to as a "channel," and a count of such packets is maintained.
The channel can be configured to generate an alert if a packet passes through
the channel when it is in an enabled state. Finally, the packets passing
through a channel can be captured in a buffer. The logic defined for a single
channel is quite complex. This gives the user enormous flexibility in defining
the stream of packets to be counted.


Filter Logic


At the lowest level of the filter logic, a single data or status filter
defines characteristics of a packet. First, consider the logic for defining
packet characteristics using the variables input (the incoming portion of a
packet to be filtered), filterPktData (the bit pattern to be tested for),
filterPktDataMask (the relevant bits to be tested for), and
filterPktDataNotMask (which indicates whether to test for a match or a
mismatch). For the purposes of this discussion, the logical operators AND, OR,
NOT, XOR, EQUAL, and NOT-EQUAL are written out by name where they appear, with
' (prime) denoting NOT and ≠ denoting NOT-EQUAL.
Suppose that initially, you simply want to test the input against a bit
pattern for a match. This could be used to screen for packets with a specific
source address, for example. In the expression in Example 1(a), you would take
the bit-wise exclusive-OR of input and filterPktData. The result has a 1 bit
only in those positions where input and filterPktData differ. Thus, if the
result is all 0s, there's an exact match. Alternatively, you may wish to test
for a mismatch. For example, suppose a LAN consists of a number of
workstations and a server. A mismatch test could be used to screen for all
packets that did not have the server as a source. The test for a mismatch
would be just the opposite of the test for a match; see Example 1(b). A 1 bit
in the result indicates a mismatch.
The preceding tests assume that all bits in the input are relevant. There may,
however, be some "don't-care" bits irrelevant to the filter. For example, you
may wish to test for packets with any multicast destination address.
Typically, a multicast address is indicated by one bit in the address field;
the remaining bits are irrelevant to such a test. The variable
filterPktDataMask is introduced to account for "don't-care" bits. This
variable has a 1 bit in each relevant position and 0 bits in irrelevant
positions. The tests can be modified; see Example 1(c).
The XOR operation produces a result that has a 1 bit in every position where
there is a mismatch. The AND operation produces a result with a 1 bit in every
relevant position where there is a mismatch. If all of the resulting bits are
0, then there is an exact match on the relevant bits; if any of the resulting
bits is 1, there is a mismatch on the relevant bits.
Finally, you may wish to test for an input that matches in certain relevant
bit positions and mismatches in others. For example, you could screen for all
packets that had a particular host as a destination (exact match of the DA
field) and did not come from the server (mismatch on the SA field). To enable
these more complex tests to be performed, use filterPktDataNotMask, where:
The 0 bits in filterPktDataNotMask indicate the positions where an exact match
is required between the relevant bits of input and filterPktData (all bits
match).
The 1 bits in filterPktDataNotMask indicate the positions where a mismatch is
required between the relevant bits of input and filterPktData (at least one
bit does not match).
For convenience, assume the definition in Example 2(a). Incorporating
filterPktDataNotMask into the test for a match gives Example 2(b).
The test for a mismatch is slightly more complex. If all of the bits of
filterPktDataNotMask are 0 bits, then no mismatch test is needed. By the same
token, if all bits of filterPktDataNotMask are 1 bits, then no match test is
needed; in that case, filterPktDataNotMask' is all 0s, and the match test
automatically passes (relevant_bits_different AND 0 = 0). Therefore, the test
for mismatch is as in Example 2(c).
The logic for the filter test is summarized in Figure 2. If an incoming packet
is to be tested for a bit pattern in a portion of the packet, located at a
distance filterPktDataOffset from the start of the packet, the following tests
will be performed:
Test #1: As a first test (not shown in the figure), the packet must be long
enough so that at least as many bits in the packet follow the offset as there
are bits in filterPktData. If not, the packet fails this filter.
Test #2: Each bit set to 0 in filterPktDataNotMask indicates a bit position in
which the relevant bits of the packet portion should match filterPktData. If
there is a match in every desired bit position, then the test passes;
otherwise the test fails.
Test #3: Each bit set to 1 in filterPktDataNotMask indicates a bit position in
which the relevant bits of the packet portion should not match filterPktData.
In this case, the test is passed if there is a mismatch in at least one
desired bit position.
A packet passes this filter if and only if it passes all three tests.
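The three tests can be sketched byte by byte in C. This is an illustration of the logic in Figure 2, not code from the RMON specification, and the byte-array calling convention is an assumption.

```c
#include <stddef.h>

/* Apply the RMON data-filter logic of Figure 2, byte by byte.
 * Returns 1 if the packet passes all three tests, 0 if it fails. */
int filter_pass(const unsigned char *pkt, size_t pktlen,
                size_t offset,                 /* filterPktDataOffset */
                const unsigned char *data,     /* filterPktData */
                const unsigned char *mask,     /* filterPktDataMask */
                const unsigned char *notmask,  /* filterPktDataNotMask */
                size_t len)
{
    size_t i;
    int mismatch_wanted = 0, mismatch_found = 0;

    /* Test #1: the packet must be long enough past the offset. */
    if (pktlen < offset + len)
        return 0;

    for (i = 0; i < len; i++) {
        /* XOR flags differing bits; AND with mask keeps relevant ones. */
        unsigned char diff = (pkt[offset + i] ^ data[i]) & mask[i];
        /* Test #2: where notmask is 0, every relevant bit must match. */
        if (diff & ~notmask[i])
            return 0;
        /* Test #3: where notmask is 1, at least one bit must differ. */
        if (notmask[i]) {
            mismatch_wanted = 1;
            if (diff & notmask[i])
                mismatch_found = 1;
        }
    }
    return mismatch_wanted ? mismatch_found : 1;
}
```

With all-zero notmask bytes, Test #3 passes vacuously, matching the behavior described above.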
Why use the filter test? Consider that you might want to accept all Ethernet
packets that have a destination address of "A5"h but do not have a source
address of "BB"h. The first 48 bits of the Ethernet packet constitute the
destination address and the next 48 bits, the source address. Example 3
implements the test. The variable filterPktDataOffset indicates that the
pattern matching should start with the first bit of the packet; filterPktData
indicates that the pattern of interest consists of "A5"h in the first 48 bits
and "BB"h in the second 48 bits; filterPktDataMask indicates that the first
96 bits are relevant; and filterPktDataNotMask indicates that the test is for
a match on the first 48 bits and a mismatch on the second 48 bits.
The logic for the status filter has the same structure as that for the data
filter; see Figure 2. For the status filter, the reported status of the packet
is converted into a bit pattern. Each error-status condition has a unique
integer value, corresponding to a bit position in the status-bit pattern. To
generate the bit pattern, 2 is raised to the power of each error value and the
results are added. If there are no error conditions, the status-bit pattern is
all 0s. An Ethernet interface, for example, has the error values defined in
Table 1. Therefore, an Ethernet fragment would have the status value of
6 (2^1 + 2^2).
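In C, the rule reduces to setting one bit per error value; a small illustrative sketch:

```c
/* Build an RMON status-bit pattern from a list of error values
 * (see Table 1).  Each error value selects one bit position, so the
 * status is the sum of 2 raised to each error value. */
unsigned status_bits(const int *errors, int n)
{
    unsigned status = 0;
    int i;
    for (i = 0; i < n; i++)
        status |= 1u << errors[i];
    return status;
}
```

An Ethernet fragment (shorter than 64 octets, with a CRC or alignment error) combines error values 1 and 2, yielding the status value 6.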


Channel Definition


A channel is defined by a set of filters. For each observed packet and each
channel, the packet is passed through each of the filters defined for that
channel. The way these filters are combined to determine whether a packet is
accepted for a channel depends on the value of an object associated with the
channel (channelAcceptType), which has the syntax INTEGER {acceptMatched(1),
acceptFailed(2)}. 
If the value of this object is acceptMatched(1), packets will be accepted for
this channel if they pass both the packet-data and packet-status matches of at
least one associated filter. If the value of this object is acceptFailed(2),
packets will be accepted to this channel only if they fail either the
packet-data match or the packet-status match of every associated filter.
Figure 3 illustrates the logic by which filters are combined for a channel
whose accept type is acceptMatched. A filter is passed if both the data filter
and the status filter are passed; otherwise, that filter has failed. If you
define a pass as a logical 1 and a fail as a logical 0, then the result for a
single filter is the AND of the data filter and status filter for that filter.
The overall result for a channel is then the OR of all the filters. Thus, a
packet is accepted for a channel if it passes at least one associated filter
pair for that channel.
If the accept type for a channel is acceptFailed, then the complement of the
function just described is used. That is, a packet is accepted for a channel
only if it fails every filter pair for that channel. This would be represented
in Figure 3 by placing a NOT gate after the OR gate.
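The combination rule might be sketched in C as follows; the arrays and constants are illustrative stand-ins, not actual RMON objects.

```c
#define ACCEPT_MATCHED 1
#define ACCEPT_FAILED  2

/* Combine per-filter results for one channel.  datapass[i] and
 * statpass[i] are 1 if the packet passed filter i's data and status
 * tests.  Within a filter pair the results are ANDed; across pairs
 * they are ORed; acceptFailed complements the whole function.
 * Returns 1 if the packet is accepted for the channel. */
int channel_accept(int accept_type,
                   const int *datapass, const int *statpass, int nfilters)
{
    int i, any = 0;
    for (i = 0; i < nfilters; i++)
        if (datapass[i] && statpass[i])   /* AND within a filter pair */
            any = 1;                      /* OR across filter pairs */
    return (accept_type == ACCEPT_MATCHED) ? any : !any;
}
```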


Channel Operation


The value of channelAcceptType and the set of filters for a channel determine
whether a given packet is accepted for a channel or not. If the packet is
accepted, then the counter channelMatches is incremented. Several additional
controls are associated with the channel: channelDataControl, which determines
whether the channel is on or off; channelEventStatus, which indicates whether
the channel is enabled to generate an event when a packet is matched; and
channelEventIndex, which specifies an associated event. 
When channelDataControl has the value off, then, for this channel, no events
may be generated as the result of packet acceptance, and no packets may be
buffered. If channelDataControl has the value on, then these related actions
are possible.

Figure 4 summarizes the channel logic. If channelDataControl is on, then an
event will be generated if: 1. an event is defined for this channel in
channelEventIndex; and 2. channelEventStatus has the value eventReady or
eventAlwaysReady. If the event status is eventReady, then each time an event
is generated, the event status is changed to eventFired. It then takes a
positive action on the part of the management station to reenable the channel.
This mechanism can therefore be used to control the flow of events from a
channel to a management station. If the management station is not concerned
about flow control, it may set the event status to eventAlwaysReady, where it
will remain until explicitly changed.


Summary


The packet-filtering facility of RMON provides a powerful tool for the remote
monitoring of LANs. It enables a monitor to be configured to count and buffer
packets that pass or fail an elaborate series of tests. This facility is the
key to successful remote-network monitoring.
Abstract Syntax Notation One (ASN.1)
Steve Witten
Steve, a software engineer for Hewlett-Packard, specializes in network testing
and measurement. You can contact him at stevewi@hpspd.spd.hp.com.
The SNMP protocol and MIB are formally defined using an abstract syntax. This
allowed SNMP's authors to define data and data structures without regard to
differences in machine representations. This abstract syntax is an OSI
language called "abstract syntax notation one" (ASN.1). It is used for
defining the formats of the packets exchanged by the agent and manager in the
SNMP protocol and is also the means for defining the managed objects.
ASN.1 is a formal language defined in terms of a grammar. The language itself
is defined in ISO #8824. The management framework defined by the SNMP
protocol, the SMI, and the MIB use only a subset of ASN.1's capabilities.
While the general principles of abstract syntax are good, many of the bells
and whistles lead to unnecessary complexity. This minimalist approach is taken
to facilitate the simplicity of agents. 
Listings One through Three show an MIB, using a fictitious enterprise called
SNMP Motors. Listing One is an ASN.1 module that contains global information
for all MIB modules. Listing Two, another ASN.1 module, contains the
definitions of specific MIB objects. Finally, Listing Three illustrates
manageable objects. 
Once data structures can be described in a machine-independent fashion, there
must be an unambiguous way of transmitting those structures over the network.
This is the job of the transfer-syntax notation. Obviously, you could have
several transfer-syntax notations for an abstract syntax, but only one
abstract-syntax/transfer-syntax pair has been defined in OSI. The basic
encoding rule (BER) embodies the transfer syntax. The BER is simply a
recursive algorithm that can produce a compact octet encoding for any ASN.1
value.
At the top level, the BER describes how to encode a single ASN.1 type. This
may be a simple type such as an Integer, or an arbitrarily complex type. The
key to applying the BER is understanding that the most complex ASN.1 type is
nothing more than a combination of simpler ASN.1 types. Continuing the
decomposition eventually reaches a simple type (such as an Integer), which is
encoded directly.
Using the BER, each ASN.1 type is encoded as three fields: a tag field, which
indicates the ASN.1 type; a length field, which indicates the size of the
ASN.1 value encoding which follows; and a value field, which is the ASN.1
value encoding.
Each field is of variable length. Because ASN.1 may be used to define
arbitrarily complex types, the BER must be able to support arbitrarily complex
encodings.
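As a concrete illustration, a small nonnegative INTEGER encodes as the tag 0x02, a one-octet length, and the value itself; this sketch covers only that minimal case.

```c
/* BER-encode a small nonnegative INTEGER into buf; returns the number
 * of octets written.  Handles only values 0..127 (one content octet,
 * short-form length) -- just enough to show the tag-length-value shape. */
int ber_encode_small_int(unsigned char value, unsigned char *buf)
{
    if (value > 127)
        return -1;          /* would need multi-octet content */
    buf[0] = 0x02;          /* tag: universal class, primitive, INTEGER */
    buf[1] = 0x01;          /* length: one content octet follows */
    buf[2] = value;         /* value octet */
    return 3;
}
```

For example, the integer 5 encodes as the octets 02 01 05.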
It is important to note how the BER views an octet. Each octet consists of
eight bits. BER numbers the high-order (most significant) bit as bit 8 and the
low-order (least significant) bit as bit 1. It's critical that this view be
applied consistently because different machine architectures use different
ordering rules.
Figure 1: RMON description.
Figure 2: Logic for the filter test.
Figure 3: Logic by which filters are combined for a channel whose accept type
is acceptMatched.
Figure 4: Logic for the channel filter.
procedure packet_data_match;
begin
 if (result = 1 and channelAcceptType = acceptMatched) or
 (result = 0 and channelAcceptType = acceptFailed)
 then begin
 channelMatches := channelMatches + 1;
 if channelDataControl = on
 then begin
 if (channelEventStatus <> eventFired) and
 (channelEventIndex <> 0) then generate_event;
 if (channelEventStatus = eventReady) then 
 channelEventStatus := eventFired
 end;
 end;
end;
Example 1: Testing the input against a bit pattern for a match. 
(a) (input XOR filterPktData) = 0 --> match

(b) (input XOR filterPktData) ≠ 0 --> mismatch

(c) ((input XOR filterPktData) AND filterPktDataMask) =
 0 --> match on relevant bits
 ((input XOR filterPktData) AND filterPktDataMask) ≠
 0 --> mismatch on relevant bits
Table 1: Ethernet-interface error values.
 Bit Error 
 0 Packet is longer than 1518 octets.
 1 Packet is shorter than 64 octets.
 2 Packet experienced a CRC or
 alignment error.
Example 2: Assuming the definition in (a), incorporating filterPktDataNotMask
into the test for a match, you end up with (b). Test for a mismatch is shown
in (c).
(a) relevant_bits_different =
 (input XOR filterPktData) AND filterPktDataMask

(b) (relevant_bits_different AND filterPktDataNotMask') =
 0 --> successful match


(c) ((relevant_bits_different AND filterPktDataNotMask) ≠ 0)
 OR
 (filterPktDataNotMask = 0) --> successful mismatch
Example 3: Launching a filter test.
filterPktDataOffset = 0
filterPktData = "00 00 00 00 00 A5 00 00 00 00 00 BB"h
filterPktDataMask = "FF FF FF FF FF FF FF FF FF FF FF FF"h
filterPktDataNotMask = "00 00 00 00 00 00 FF FF FF FF FF FF"h

Listing One 

SNMP-motors-MIB DEFINITIONS ::= BEGIN
IMPORTS
 enterprises
 FROM RFC1155-SMI;
SNMP-motors OBJECT IDENTIFIER ::= { enterprises 9999 }
expr OBJECT IDENTIFIER ::= { SNMP-motors 2 }
END



Listing Two

SNMP-motors-car-MIB DEFINITIONS ::= BEGIN
IMPORTS
 SNMP-motors
 FROM SNMP-motors-MIB;
IMPORTS
 OBJECT-TYPE, ObjectName, NetworkAddress,
 IpAddress, Counter, Gauge, TimeTicks, Opaque
 FROM RFC1155-SMI;
car OBJECT IDENTIFIER ::= { SNMP-motors 3 }
-- this is a comment
-- Implementation of the car group is mandatory
-- for all SNMP-motors cars.
-- ( the rest of the SNMP-motors-car-MIB module )
END



Listing Three

carName OBJECT-TYPE
 SYNTAX DisplayString (SIZE (0..64))
 ACCESS read-only
 STATUS mandatory
 DESCRIPTION
 "A textual name of the car."
 ::= { car 1 }
carLength OBJECT-TYPE
 SYNTAX INTEGER (0..100)
 ACCESS read-only
 STATUS mandatory
 DESCRIPTION
 "The length of the car in feet."
 ::= { car 2 }
carPassengers OBJECT-TYPE
 SYNTAX INTEGER (0..4)
 ACCESS read-only
 STATUS mandatory

 DESCRIPTION
 "The number of passengers in the car."
 ::= { car 3 }
carPassengerTable OBJECT-TYPE
 SYNTAX SEQUENCE OF CarPassengerEntry
 ACCESS not-accessible
 STATUS mandatory
 DESCRIPTION
 "A table describing each passenger."
 ::= { car 4 }
carPassengerEntry OBJECT-TYPE
 SYNTAX CarPassengerEntry
 ACCESS not-accessible
 STATUS mandatory
 DESCRIPTION
 "An entry describing one passenger."
 ::= { carPassengerTable 1 }
CarPassengerEntry ::= SEQUENCE {
 carPindex
 INTEGER,
 carPname
 DisplayString,
 carPstatus
 INTEGER
}
carPindex OBJECT-TYPE
 SYNTAX INTEGER (1..4)
 ACCESS read-only
 STATUS mandatory
 DESCRIPTION
 "Index for each passenger, which ranges from 
 1 to the value of carPassengers."
 ::= { carPassengerEntry 1 }
carPname OBJECT-TYPE
 SYNTAX DisplayString (SIZE (0..64))
 ACCESS read-write
 STATUS mandatory
 DESCRIPTION
 "The name of the passenger."
 ::= { carPassengerEntry 2 }
carPstatus OBJECT-TYPE
 SYNTAX INTEGER { other(1),driver(2) }
 ACCESS read-write
 STATUS mandatory
 DESCRIPTION
 "The status of the passenger."
 ::= { carPassengerEntry 3 }















November, 1994
Character Simulation with ScriptX


A general-purpose framework for dynamic behavior




Assaf Reznik


Assaf is a senior multimedia designer at Kaleida Labs, where he developed
Playfarm with Lisa Lopuck, Mike Powers, and other engineers. He can be
contacted at assaf@well.com.


Filmore T. Goose is an obsessively neat bird, the Felix Unger of the animal
world, who continuously picks up after fellow animals. Irma La Sheep is a
good-hearted quadruped, always ready to knit a sweater as a gift for any
animal that happens to pass by. These are two of the characters in Playfarm, a
prototype multimedia title for children implemented in ScriptX from Kaleida
Labs. ScriptX is a platform-independent, object-oriented development
environment for creating multimedia applications (see the accompanying text
box, "Introducing ScriptX"). 
Playfarm consists of a virtual farm environment populated with some
interesting and truly autonomous characters; see Figure 1. Every character in
the Playfarm simulation runs in its own thread and can interact with the other
characters in a rich, complex, and unpredictable manner. Playfarm's open-ended
structure allows you to dynamically add new characters into the environment
while it is running. This article describes the architecture and
implementation of Playfarm, which was designed in a general-purpose way so
that its design and code can be reused in creating other simulations. 


Design Goals


Many current multimedia titles are structured as a series of film clips
stitched together in a static way that sometimes provides a convincing
illusion of dynamic behavior--even though every sequence has been planned,
scripted, and burned into code before execution time. However, we wanted our
characters to interact in a more dynamic fashion, rather than in the usual,
predetermined way. At each clock tick, characters sense, process, and react to
their surroundings. Their reaction depends both on the state of the
environment at a given time, and on their own internal state. This free-form
response is implemented by a character's "activity engine," discussed later.
A related design goal was to create characters with real personalities which
reveal themselves over time. Behavior must be continuous rather than "action
bites." Characters respond to the environment and to the user. When Playfarm
props and characters are added, removed, or otherwise changed, characters must
continue to interact with their surroundings in a manner consistent with their
personalities. 
Another goal was to make the environment extensible, creating an experience
that users could modify and control, if so inclined. An additional goal was to
create a general-purpose, content-independent model for character-based
simulation. Because the abstract model is decoupled from the presentation
layer, it can be used in other contexts. Our approach is
guided by the well-known Model-View-Controller (MVC) paradigm, first used in
Smalltalk-80 and now incorporated into modern class libraries, including
ScriptX.


Playfarm Architecture


The Playfarm architecture consists of three components: simulation
environment, character and prop, and user representation. The
user-representation component is the simplest, consisting of a geometric space
equal in size to the screen display, and an intelligent cursor object.
The simulation-environment component is a general-purpose framework for a
character-based simulation. It is divided into the presentation space and the
model space; see Figure 2. The presentation space is roughly analogous to the
View portion of the Smalltalk MVC. It contains all visible objects (except for
the cursor, which belongs to the user-representation component), as well as
objects that manipulate the visual aspects of the environment--such as the
Panner object, which is responsible for panning or moving around the model
space. Only the section of the farm that is visible at a given time is part of
the presentation space. By contrast, the model space covers the whole span of
the virtual farm. 
The character-and-prop component is, as you might expect, the set of classes
used to instantiate characters and props. Props are inanimate objects in the
space, while characters have animated behavior. Figure 3 illustrates the
structure of a Playfarm character, which consists of an animation object and a
character shell. The animation object contains all the raw animation sequences
for a character. Each of these sequences is named so that the model object can
refer to it. Whenever the model makes an animation request, the name of the
requested animation sequence is added to the animation queue and then played
in order.
The character shell contains the character's repertoire of activities and the
logic describing the relationships between this set of activities and factors
such as time and mood (which in itself has a relationship to the environment
and to other characters). These relationships are modeled in the character's
"activity engine." Excerpts from the Playfarm source code are shown in Listing
One. The complete code (about 2000 lines of ScriptX) is available
electronically; see "Availability," page 3.


The Model Space


Playfarm maps the virtual farm onto a simulation-modeling space. This is a
geometric mapping. As shown in Figure 4, it consists of a web of triangles
whose nodes form a series of equilateral triangles. From each node, a
character can potentially move in six directions: right, down/right,
down/left, left, up/left, and up/right. 
This kind of grid has several advantages over other designs. Unlike a
traditional rectangular grid (with four directions of movement), it is not
immediately obvious to the user that characters can only move in six possible
directions. Also, unlike a three-dimensional model, the flat structure of the
Playfarm grid significantly reduces the complexity of character-animation
sequences: there are only two basic character orientations (left- and
right-facing), obviating the need for four views of a character (back, front,
left, right).
Although the farm is presented on a vertical (visually upright) screen from a
frontal view point, the mapping is of an aerial view. Each row, then,
represents a different z-plane in the virtual farm. This is translated to the
presentation layer by setting the z instance variables of corresponding
presenters in each row to reflect the depth. Thus we can achieve a
two-and-a-half-dimensional perspective.
The class Grid consists of nodes which are instances of class Cell (both are
available electronically). It contains the behavior objects of Playfarm--that
is, prop shells and character shells. The Cell class, among other things,
implements a basic set of services for moving from one cell to any of its six
neighbors, for adding and deleting objects from a cell, and for accessing a
cell's contents and that of its neighboring cells.
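The six-direction cell arithmetic can be sketched in Python. This is a hypothetical stand-in for the ScriptX Grid and Cell classes, not Playfarm's actual API; the direction tables assume an offset-row layout, where the column deltas for the four diagonal moves depend on row parity:

```python
# Sketch of a triangular grid whose nodes allow six directions of movement.
# Rows are offset by half a node width, so diagonal neighbors of even and
# odd rows map to different column deltas.

class Cell:
    def __init__(self, row, col):
        self.location = (row, col)
        self.contents = []          # behavior objects (shells) in this cell

class Grid:
    # direction -> (row delta, col delta), split by row parity
    EVEN = {"right": (0, 1), "left": (0, -1),
            "upright": (-1, 0), "upleft": (-1, -1),
            "downright": (1, 0), "downleft": (1, -1)}
    ODD = {"right": (0, 1), "left": (0, -1),
           "upright": (-1, 1), "upleft": (-1, 0),
           "downright": (1, 1), "downleft": (1, 0)}

    def __init__(self, rows, cols):
        self.cells = {(r, c): Cell(r, c)
                      for r in range(rows) for c in range(cols)}

    def neighbor(self, cell, direction):
        r, c = cell.location
        dr, dc = (self.EVEN if r % 2 == 0 else self.ODD)[direction]
        return self.cells.get((r + dr, c + dc))   # None at the grid edge

    def move(self, obj, src, direction):
        # delete from one cell, add to its neighbor
        dest = self.neighbor(src, direction)
        if dest is not None:
            src.contents.remove(obj)
            dest.contents.append(obj)
        return dest
```

The edge check (`None` at the boundary) plays the role of the real Cell class refusing a move off the farm.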


The Character Component


In designing a Playfarm character, the first step is creating some animation
sequences. You can do this with tools such as Macromedia Director. Cells
created with Director must be converted to ScriptX's internal object
representation, Bento, using ScriptX's import facility. Bento is an
object-storage format designed at Apple and used in other software systems
such as OpenDoc and the Taligent Frameworks. There is a Bento file associated
with each character.
Next, these individual frames must be combined into a meaningful, named
sequence. The Playfarm function makeFrameSequence takes a Bento file, a
direction in which a character moves while performing the animation, the
number of nodes to move (0 or 1 in Playfarm), the bitmaps associated with that
sequence, and an optional sound argument. This function returns a
FrameSequence object that, when properly attached to a movement controller,
animates through all cells of the animation and smoothly moves the character
in the specified direction at a velocity synchronized with the flip animation.
All sequences in Playfarm are created such that they either start and end at
the same location or move the animated object exactly one cell. For more
details on the Animation class, see Listing One.
In SheepPresenter, the presenter class for Irma, sequences are defined for walking,
facing, turning (all these in the six possible directions), as well as for
knitting, eating, complaining, and so on. Then the animation object is told
what the transition sequences are--in this case, turnLeft and turnRight--and
which sequences a character can go to from each transition. The
model-character object is unaware of the names of transition sequences; they
are abstracted from the behavior layer. Thus, when the behavior component of
the character decides to walk left when the character is walking right, the
animation object contains the logic that a transition sequence needs to be
played before the character can walk left. This completes the presentation
aspect of the character.
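The transition bookkeeping can be sketched as follows. The names here are illustrative Python, not the actual Animation class, and the sketch ignores the partial-pivot and recursive cases handled by the real doaction and buildAnimSequence methods:

```python
# Sketch: each transition state lists the sequences playable from it.
# If a requested sequence is not reachable from the last transition,
# the required transition sequence is queued ahead of it.

class AnimationQueue:
    def __init__(self, transition_set, start):
        self.transition_set = transition_set  # transition -> reachable seqs
        self.last_transition = start
        self.queue = []

    def request(self, seq):
        if seq in self.transition_set[self.last_transition]:
            self.queue.append(seq)
        else:
            # find the transition from which seq can be played
            for tname, reachable in self.transition_set.items():
                if seq in reachable:
                    self.queue.append(tname)   # play the pivot first
                    self.queue.append(seq)
                    self.last_transition = tname
                    break
```

So a character walking right that decides to walk left gets a turnleft sequence inserted automatically, exactly the abstraction the model-character object relies on.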


Character Behavior


The class Shell is listed in Listing One, along with its derivative
CharacterShell. Together, these two encapsulate the behavior aspect of a
character: attributes, current state, direction, relation to the model grid,
and relation to a target object with which it might be choreographed. The
Shell class and its subclasses provide services for keeping model shells and
shell presenters in sync with each other. The Shell class also provides
services for adding new shells, with their presenters, to the scene. To create
the behavior shell for Irma, we must inherit from CharacterShell. This wires
the sheep object to the model space and to its presenter.
The meat, if you will, of the sheep shell is its logical actions. The sheep
presenter's animation sequences are combined into actions that make sense by
defining character actions in the init method for the Sheep object. These
actions rely on action classes, which provide the general building blocks for
creating meaningful activities for characters. Each action instance is one
atomic thing a character does, such as "walk left" or "knit a sweater."
Combining these atomic actions into sequences and branches results in
interesting behaviors and coherent interactions between objects in the system.
Actions are the wires that connect a character's activity engine with the
model space and the animation class.

Actions are combined together by two mechanisms: nested actions and action
sequences. In the simplest case, an action simply corresponds to one animation
sequence. The animation object (sheep presenter) is referred to by the
targetAction instance variable of the Action object. But targetAction can also
refer to a nested action or action sequence. In this case, when the action is
invoked, it will invoke yet another action after doing some processing of its
own. Sequencing several actions together in a meaningful context creates an
activity.
Irma possesses a wander-around activity, a knit-sweater activity, and a
complain activity. complainActivity, for example, is a sequence of three
wander actions and a random action, which is nested. Sixty percent of the time
that complainActivity is invoked, it calls complainAction (the rest of the
time it returns with no effect).
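A minimal Python sketch of this composition follows; the class names are illustrative stand-ins for Playfarm's ScriptX action classes (the real ones appear in Listing One):

```python
import random

# Atomic action: one thing a character does.
class Action:
    def __init__(self, name, log):
        self.name, self.log = name, log
    def do(self):
        self.log.append(self.name)

# Nested action invoked only with the given percentage probability.
class RandomAction:
    def __init__(self, target, probability, rng=random.random):
        self.target, self.probability, self.rng = target, probability, rng
    def do(self):
        if self.rng() * 100 < self.probability:
            self.target.do()

# A sequence of actions makes an activity.
class ActionSequence:
    def __init__(self, seq):
        self.seq = seq
    def do(self):
        for action in self.seq:
            action.do()
```

Irma's complain activity then reads as three wander actions followed by a 60-percent complaint: `ActionSequence([wander, wander, wander, RandomAction(complain, 60)])`.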


The Character Engine


The last step in creating a character is to install character activities into
the activity engine. The approach described here was influenced by Pattie
Maes's work at the MIT Media Lab. The character engine is the main driver of the
character component. Character-level decisions define the overall expression
of characters. For example, a character engine may make a character decide to
sleep when it is tired, clean when it is anxious, and knit sweaters when it
encounters a friend. The character engine understands schedules (biological
clocks), moods, and the character's sense of the environment. It weighs all
these different values and makes changes in its activities as needed.
The character engine models a character's traits and then influences the
choice of activities that the character undertakes over time. Each activity is
associated with a weight. This weight changes dynamically through the
experience. Weights are influenced by factors such as the schedule,
motivations, and reactions to the environment. For example, consider a dog
character whose activity repertoire consists of eat, sleep, run around, and
chase a cat. Each of these activities is assigned an initial weight. Then, if
the dog is old and lazy, we create a schedule that favors sleeping and eating
activities most hours of the day (or whatever time unit is abstracted by the
schedule). The tiredness motivation goes up each time the character ends up
running or chasing a cat.
Schedules link the weights of certain behaviors to the passage of time. One
example is a 24-hour clock that influences a different behavior for each hour
of the day. In our dog example, the schedule may read: eat at 8 a.m., run
around at 9, chase cat at 10, eat at 11, and so on. Possible patterns can be
seasonal, circadian, or simple rhythms. At each cycle of the activity engine,
the appropriate activity for that time is selected, and the weight associated
with it is added to that activity's active weight.
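The weighting scheme can be sketched like this. The structure is hypothetical; the real engine also runs in cycles and applies strengtheners and reactions, which this sketch omits:

```python
# Sketch of weight-based activity selection: each activity has a base
# weight; the activity scheduled for the current hour gets its schedule
# weight added, motivations add their boosts, and the highest total wins.

class ActivityEngine:
    def __init__(self, base_weights, schedule, schedule_weight=10):
        self.base_weights = dict(base_weights)  # activity -> initial weight
        self.schedule = schedule                # hour -> scheduled activity
        self.schedule_weight = schedule_weight
        self.motivations = {}                   # activity -> extra weight

    def choose(self, hour):
        active = dict(self.base_weights)
        scheduled = self.schedule.get(hour)
        if scheduled in active:
            active[scheduled] += self.schedule_weight
        for activity, boost in self.motivations.items():
            active[activity] = active.get(activity, 0) + boost
        return max(active, key=active.get)
```

For the old, lazy dog, sleeping gets a high base weight, the schedule favors eating and sleeping for most hours, and a cat sensor can still push chase-cat to the top by raising its motivation.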
Strengtheners can strengthen an activity or a schedule. For example, if the
dog chases a cat whenever one is in the vicinity, we can use a strengthener on
the schedule to check if the current activity is the scheduled one. If it
isn't, the strengthener will magnify the scheduled activity even more, thereby
enforcing the schedule. 
Motivations are externally attached personality factors of the character. This
is how a character can pick up the feeling of a locale or experience.
Motivations are usually modeled as range values that can be lowered or
raised--for example, our dog has a loner motivation, which is triggered by a
too_much_commotion sensor. Whenever the sensor reads that
the dog is not alone (that is, other characters are around), the loner
motivation is pushed higher. In this way the dog might choose behavior based
on its desire to be alone.
Reactions are sets of activities triggered directly by events (signaled by
sensors). This allows characters to be unconditionally responsive to events no
matter what their mood is at the time; for example, when hit by lightning.


Importing a Character


One of Playfarm's most attractive features is that it allows end users to
import characters and props at run time. After a user spends some time with
Playfarm, it can evolve into the user's own creation. 
The process of developing and importing an independent character is very
simple: You must provide the system with the presenter definition, the
character-shell definition, and the object store containing the media. These
steps have all been described here. The only remaining steps are to put the
new character in its initial location, add it to the scene, and start its
activity engine.
Introducing ScriptX
Ray Valdés
ScriptX is a multimedia-oriented development environment created over the last
two years by Kaleida Labs. Unlike packages such as Macromedia Director,
ScriptX is not an authoring tool for creating multimedia titles, although it
does come with a built-in authoring tool. Rather, it is a general-purpose,
object-oriented, multiplatform development environment that includes a
powerful, dynamic language and a well-rounded class library. ScriptX is as
applicable for implementing client/server applications as it is for authoring
multimedia titles. While large, complex, and powerful, ScriptX is designed
from the ground up in an integrated fashion, making it smaller, more
consistent, and easier to learn than equivalent traditional systems (say, a
C++ environment and class library).
With ScriptX's dynamic nature, classes, objects, and their relationships can
be reconfigured during execution. Methods can be redefined and new objects
added at run time. ScriptX code is semi-compiled into a bytecode
representation, similar to that in Smalltalk, that is then interpreted by a
platform-specific virtual-machine interpreter. Multithreading constructs are
built into the language, which blends elements of Smalltalk, Dylan, Hypertalk,
Lisp, Object Logo, C++, and Pascal.
Syntactically speaking, the language is small, about half the size of C and a
fraction of C++ (in terms of BNF rules required to specify the grammar). There
are no statements in ScriptX; every construct is an expression that returns a
value. There are the usual block, iteration, and conditional constructs: if,
then, else, while, and so on. There are also various ways of setting variable
scopes: the local and global keywords for individual variables, and the module
keyword which is a larger-granularity, scope construct roughly equivalent to
the recently added namespace keyword in C++.
Everything in ScriptX is an object, including integers, strings, methods,
classes, and functions. Integer and float objects have an "immediate"
implementation that allows the bytecode interpreter to obtain values without
method dispatching (by using tag bits and a bitshift, as is done in many
Smalltalk implementations). There is no distinction at the level of bytecode
representation between user-defined objects and classes and those supplied by
the system; every entity has equivalent first-class status. In addition to
supporting multiple inheritance, ScriptX lets you specialize behavior not just
with classes, but also at the level of particular objects and individual
methods. The ScriptX Core Classes comprise a class library that provides
integrated general-purpose facilities like those in app frameworks such as
OWL, MFC, zApp, MacApp, and C++/Views. The ScriptX library consists of about
240 classes and perhaps 2000 methods and provides support for text, fonts,
streams, events, menus, scrolling lists, push buttons, scroll bars, files,
properties, error-handling, arrays, B-trees, and more. 
The most important aspect of the ScriptX classes is the use of the classic
Model-View-Controller (MVC) paradigm which, by decoupling an application's
dataset from its presentation, enables the user to simultaneously view and
manipulate an app's data in different ways (graphically, textually, and so
on). 
A key ScriptX feature is a Clock class, which provides facilities for
synchronizing timed sequences of actions required by multimedia apps. Other
classes implement a persistent-object storage facility, a search engine to
retrieve objects within large-scale models, facilities for spawning and
synchronizing multiple threads, a rich set of exception-handling mechanisms,
and (in the future) support for distributing objects across networks.
For more information, send e-mail to kaleida.dev@kaleida.com.
Figure 1 A scene from Playfarm. The hand at the bottom of the image is an
intelligent-cursor object, part of the user-representation component.
Figure 2 The layered structure of the Playfarm project distinguishes the
modeling space from the presentation space.
Figure 3 The structure of a Playfarm character, which spans both modeling and
presentation spaces.
Figure 4 The model space has an underlying triangular structure, allowing six
directions of motion.

Listing One 
------------------------------------------------
-- EXCERPTS FROM ANIMATION CLASS IN PLAYFARM
------------------------------------------------
class Animation(TwoDShape, Projectile, Pannable)
instance variables
 seq
 cell
 animationClock
 blockflag
 animationq
 basicset
 transitionset
 lasttransition
 partialpivots
 movecontroller
 modelshell
end

------------------------------------------------
method startAnimation self {class Animation} ->
(
 startClock self
 self.blockflag.state := @open
)

------------------------------------------------
method doaction self {class Animation} #key anim:(undefined) ->
(
 local allseqs
 if anim = undefined do 
 (
 print "animation called without a sequence"
 return false
 )
 -- if the animation is in the list of the last transitionset's 
 -- animations then add it to the queue
 if (ismember self.transitionset[self.lasttransition] anim) then
 (
 append self.animationq self.basicset[anim]
 )
 -- otherwise find the transition necessary to perform the animation by
 -- checking each transition list to see if animation is contained in it
 else
 (
 -- append the entire animation sequence returned to the queue
 local aq
 aq:= (buildAnimSequence self anim #(self.basicset[anim]))
 addmany self.animationq aq
 )
 self.blockflag.state := @closed 
 gateWait self.blockflag 
)
---------------------------------------------------------------------
-- This method recursively builds the sequence of animations required
-- to perform the requested animation since it was not possible to
-- perform the animation from the last transition. anim is the
-- animation you want to perform; seq is the array that gets built to
-- hold the entire sequence through recursion. When a method (other
-- than this one) calls this method, set seq:= #(anim).
method buildAnimSequence self {class Animation} anim seq ->
(
 local a, akey, atran
 -- get the transition that the animation can be performed from
 a:= select first array in self.transitionset where (ismember it anim)
 -- get the transition token
 akey:= getkeyone self.transitionset a.target
 -- if this transition is not the last transition, recurse to find 
 -- the transition that akey is a member of
 if akey <> self.lasttransition do
 (
 -- add it to the animation sequence
 prepend seq self.basicset[akey]
 buildAnimSequence self akey seq
 atran:= self.partialpivots[akey]

 if atran <> empty then
 self.lasttransition:= atran
 else
 self.lasttransition:= akey
 )
 seq
)
---------------------------------------------------------------
method tick self {class Animation} ->

( 
 if (self.seq = empty) do 
 (
 if ((self.seq := self.animationq[1]) = empty) do return
 self.velocity.x := self.seq.velocityx
 self.velocity.y := self.seq.velocityy
 -- If there is a sound for this frame sequence, play it.
 local s := self.seq.sound
 if (s <> undefined) do
 (
 -- Add time callback so player will stop and go to beginning.
 addTimeCallback \
 s (p -> (stop p; goToBegin p)) s #() s.duration false
 play s
 )
 self.seq
 )
 -- Change the stencil to the next animation cell in the series.
 self.boundary := self.seq[self.cell]
 tickle self.movecontroller self.animationClock
 
 self.cell := ((mod self.cell (size self.seq)) as Integer) + 1
 if self.cell = 1 do 
 (
 if (self.animationq[3] = empty) do 
 (
 self.blockflag.state := @open
 )
 setPresenterX self self.x + self.seq.extrax
 self.y := self.y + self.seq.extray
 deletenth self.animationq 1
 self.seq := empty
 )
)
------------------------------------------------
-- EXCERPTS FROM SHEEP PRESENTER CLASS
------------------------------------------------
class SheepPresenter (Animation, ModelDragger)
end
--------------------------------------------------------
method init self {class SheepPresenter} #rest args ->
(
 apply nextmethod self args
 self.x := 0

 self.y := 180
 self.velocity := new point x:0 y:0
 m := rootHandle "media" SheepStore
 -- Changed: added sound.
 local baaahSound := Sounds["sheep baaah friendly"]
 -- Changed: added sound.
 self.basicset := #(
 @walkright: \
 (makeFrameSequence m 1 1 #(151,152,153,154,155,156,157,158)),
 @walkrightdown: \
 (makeFrameSequence m 2 1 #(151,152,153,154,155,156,157,158)),
 @knitright: \
 (makeFrameSequence m 0 0 #(163,164,165,166,167,168,169,170) \
 sound:baaahSound),

 @exerciseleft: \
 (makeFrameSequence m 0 0 \
 #(63,64,65,66,67,68,69,70,71,72,73,74)),
 --........and so on for the rest of the sequences....
 )
 self.transitionset := #(
 @turnleft: \
 #(@walkleft, @walkleftdown, @walkleftup, @measureleft, \
 @exerciseleft, @complainleft, @tiltheadleft, @eatleft, \
 @dropleft, @knitleft, @faceleft,@turnright),
 @turnright: #(@walkright, @walkrightup, @walkrightdown, \
 @measureright, @exerciseright, @tiltheadright, @eatright, \
 @dropright, @knitright, @faceright,@turnleft)
 )
 self.partialpivots:= #()
 self.lasttransition := @turnright
 self.boundary := m[151]
 return self 
)
------------------------------------------------
-- EXCERPTS FROM SHELL CLASS
------------------------------------------------
class Shell (rootObject)
instance variables
 attributes -- the attributes of the shell, such as category, 
 -- size, moveable
 shelltype -- types in Playfarm are @prop or @character or @fixture
 homecell -- the cell the shell is contained in now
 parentModel -- the model grid 
 parentSpace -- the presentation space 
 shellPresenter -- the animation, or presentation object, for this shell
 direction -- own direction, @left or @right are supported
 targetdirection -- holder for the direction of another shell 
 -- interacting with this one
 busy -- boolean for busy state (when in a choreography
 -- or being dragged a shell's busy state is true)
end
------------------------------------------------------------

-- This method adds a new shell to the model space and creates a
-- representation in presentation space.
method addToScene self {class Shell} model space ->
(
 self.parentModel := model
 self.parentSpace := space
 
 if (self.homecell <> undefined) then
 (
 reconcilecell self
 switchto self.homecell self
 )
 else
 reconcilexy self
 if (self.shellpresenter <> undefined) do
 prepend space self.shellpresenter
)
---------------------------------------------------------------
method celltoxy self {class Shell} ->
(

 -- translate the row/column coordinates to the x,y coordinates 
 -- in presentation space. 
 local node := self.homecell.location
 local x, y
 y := node[1] * NODEHALF + HORIZON - self.shellpresenter.height
 if (mod node[1] 2 = 0) then
 ( x := (node[2] * NODELENGTH \
 - (self.shellpresenter.width / 2) - NODEHALF) as integer
 )
 else
 ( x := (node[2] * NODELENGTH \
 - (self.shellpresenter.width / 2) - NODELENGTH) as integer
 )
 setPresenterX self.shellpresenter x
 self.shellpresenter.y := y
)
----------------------------------------------------------------
-- reconcile presentation and model space coordinates based on
-- presentation coordinates. This method is typically called after a
-- shellpresenter is dragged and dropped.
method reconcilexy self {class Shell} #key x: y: ->
(
 if (x <> unsupplied) do
 (
 setPresenterX self.shellpresenter x
 self.shellpresenter.y := y
 )
 xytocell self
 celltoxy self 
)
------------------------------------------------
-- EXCERPTS FROM CHARACTER SHELL CLASS
------------------------------------------------
class CharacterShell (shell)
instance variables

 belongings -- props (and possible other objects) the character 
 -- is carrying
 charActivityEngine -- the activity engine guiding this character
end
--------------------------------------------------------------
method init self {class CharacterShell} #rest args ->
(
 apply nextmethod self args
 self.belongings := #()
 self.shelltype := @character
 self.charActivityEngine := new ActivityEngine
)
--------------------------------------------------------------------
-- EXCERPTS FROM SHEEP CLASS
-- Specialized behavior for Irma La Sheep. Activities and choreographies.
--------------------------------------------------------------------
class Sheep (CharacterShell)
end
method init self {class Sheep} #rest args ->
(
 apply nextmethod self args
 self.shellpresenter := new SheepPresenter \
 actioneng:self.charActivityEngine

 self.shellpresenter.modelshell := self
 -- Wander Around Activity
 local wander := new CharacterWanderAction actor:self

 -- Knit Sweaters Activity
 local isAnimalNearBy := new AnimalNearbySensor pshell: self
 local measure := new MAction \
 actor:self targetAction: self.shellpresenter \
 anim: #(@measureleft,@measureright)
 -- function that creates a new sweater object. This function 
 -- is passed to the knit action as the effect
 local function knitfunc arg ->
 (
 local b := getnth \
 #(propBitmaps["sweater 1"],propBitmaps["sweater 2"],\
 propBitmaps["sweater 3"]) \
 ((rand 3) + 1)
 local pp := new PropPresenter boundary:b fill:blackbrush \
 stroke:undefined
 local p := new PropShell propPresent: pp
 p.attributes := new SortedKeyedArray \
 keys:#(@category, @size, @moveable) \
 values:#(@item,@small, @true)
 return p
 )
 local knit := new ProduceAction actor:self \

 effect: knitfunc \
 anim: #(@knitleft,@knitright) \
 targetAction: self.shellpresenter 
 local giveGift := new DropAction actor:self \
 targetAction: self.shellpresenter \
 anim: #(@dropleft,@dropright) 
 local hand_item := new MessageAction actor:self message: @hand_item \
 effect: (arg -> \
 arg.target := isAnimalNearBy.searchResult; \
 arg.adata := giveGift.target)
 local knit_seq := new TwoShellChoreog \
 seq: #(measure, knit, giveGift, hand_item) \
 actor: self sensor:isAnimalNearBy
 local wander_loop := new LoopAction \
 targetaction: wander condition: (-> ask isAnimalNearBy)
 local findAnimal2Knit := new ActionSequence actor:self \
 seq: #(wander_loop, knit_seq, wander, wander)
 -- Complain Activity
 local complainAction := new MAction \
 actor:self targetAction: self.shellpresenter \
 anim: #(@complainleft,@complainleft)
 local randComplain := new randomAction \
 actor:self targetAction:complainAction \
 probability: 60
 local complainActivity := new ActionSequence actor:self \
 seq: #(wander, wander, wander, randComplain)
 -- The schedule for the sheep contains two activities:
 -- findAnimal2Knit and complainActivity. The time slot
 -- each would run is 5 cycles. See comments in
 -- activeng.sx. This is a primitive implementation of
 -- schedules and will be upgraded.
 self.charActivityEngine.schedule := #(findAnimal2Knit,complainActivity)

 self.charActivityEngine.timeunit := 5
 self.charActivityEngine.timeCycle := 10
)
--------------------------------------------------------------------
-- EXCERPTS FROM ACTION CLASS
--------------------------------------------------------------------
class MAction (rootObject) 
instance variables
 actor -- the character or prop that owns this action
 effect -- a script to run after the action is complete;
 -- things like increasing or decreasing strengtheners go here
 anim -- token for animation to call
 targetAction -- the presenter (or embedded action) associated with action
end
---------------------------------------------------------------------

method init self {class MAction} #rest args #key targetAction:(undefined) \
 anim: actor: effect: ->
( 
 self.actor := actor
 self.targetAction := targetAction
 self.effect := effect
 self.anim := anim
)
-- This method carries out the targetAction. If target action is
-- an animation, it will animate.
method doAction self {class MAction} #key anim: ->
(
 if ((anim <> unsupplied) and (anim <> undefined)) do self.anim := anim
 if self.targetAction <> undefined do 
 (
 if self.anim <> unsupplied and self.anim <> undefined then
 (
 doAction self.targetAction \
 anim: (if (self.actor.targetdirection = @left) then \
 self.anim[1] else self.anim[2])
 )
 else
 doAction self.targetAction
 )
 if self.effect <> unsupplied do self.effect(self) 
)
---------------------------------------------------------------------
-- This action class encapsulates the behavior of taking an object
-- from the environment and then owning (carrying) it.
class TakingAction (ActionWithOther)
end
method doAction self {class TakingAction} #key anim: ->
(
 nextmethod self anim:anim
 if self.actor <> undefined and self.target <> undefined do
 (
 -- Remove the prop from the environment and add it to the actor.
 -- addProp self.actor self.target
 removeProp self.target.homecell self.target
 deleteOne self.target.parentSpace self.target.shellpresenter
 )
)
---------------------------------------------------------------------

-- This action is the complementary action to taking action. It
-- encapsulates the behavior of adding an object to the environment.
class DropAction (ActionWithOther)
end

method doAction self {class DropAction} #key anim: ->
(
 nextmethod self anim: anim
 

 if self.actor <> undefined and \
 ((getnth self.actor.belongings 1) <> empty) do
 (
 self.target := (getnth self.actor.belongings 1)
 deleteOne self.actor.belongings self.target
 self.target.homecell := self.actor.homecell

 addToScene self.target self.actor.parentModel \
 theSceneSpace
 if ((getOne (attributesGetter self.target) @moveable) = @true) do
 append theDragController self.target.shellPresenter
 )
)







































November, 1994
Building Multimedia Databases


Component software and OLE Automation lead to modular development




Michael Regelski


Michael is director of software development at Lenel Systems International,
290 Woodcliff Office Park, Fairport, NY 14450. He can be contacted on
CompuServe at 71333,622.


Just a few years ago, PCs were capable of handling only alphanumeric
information. But with the power and availability of graphical environments
such as Windows, other types of information--graphics, animation, video, and
audio--are finding their way into mainstream applications. In particular,
database-management systems have benefited from the power of multimedia
technology. A typical employee database, for example, provides a textual
description of the employee, but it doesn't tell you what the employee
looks like--and that's important information in secure areas of a building.
The Multimedia Information Management System (MIMS) is a Windows-based
database system I designed for managing personnel databases. MIMS uses
multimedia technology to capture, display, and print still and motion video.
Furthermore, multimedia technology is combined with relational-database
technology to create, store, and retrieve the multimedia information along
with traditional textual information.
In building MIMS, I used Microsoft's Visual Basic as the primary development
environment, Intersolv's Q&E database VBX for the database development, and
Lenel Systems' MediaDeveloper for the multimedia components. Since the
full-blown MIMS project is large, I'll focus here on how to integrate the
capture and display of still and motion video into an application and how to
store and retrieve multimedia information in a relational database. The
complete source code for a minimal implementation of MIMS is available
electronically; see "Availability," page 3.


MediaDeveloper


MediaDeveloper is a library of multimedia programming objects that enable you
to design stand-alone software or modify existing apps that incorporate video,
sound, animation, graphics, and images. In addition to Visual Basic,
MediaDeveloper works with any Windows software that supports DLLs: Borland's
Paradox for Windows, Informix's HyperScript, and Microsoft's Access. 
MediaDeveloper-based applications can become an OLE Automation Server to other
Windows apps. Thus, you can create reusable software components that can be
accessed programmatically from other applications. Consequently, end
users can cut and paste multimedia data from apps that incorporate
MediaDeveloper to other programs that support OLE. Because MediaDeveloper
consists of a variety of interfaces--OLE Automation Server, DLL, C++ class
library, and VBX--adding multimedia to an existing or new application is
generally straightforward. 
The MIMS employee database uses several MediaDeveloper functions: video
capture from a video-overlay board, digital-video display, and image display.
Each employee record has associated with it both a digital-video clip stored
in Audio Video Interleaved (AVI) format and a still JPEG image.
There are two methods you can use for storing images. The first stores the
image directly into a BLOB (Binary Large Object) field of the employee record.
This method is okay for binding the information directly to the rest of the
employee record, but it relies on the database system to support BLOB fields.
The second method (which I ultimately used in MIMS) is to have a text field
in the data record that contains a path to the employee photo. I chose this
method to allow migration to any database system (some don't support BLOB
fields) and to permit use of the photos outside of MIMS. If an
employee photo is present, it is displayed in the upper-right corner of the
screen via Open, which is called from UpdatePhoto in the file FEMP.FRM
(available electronically). Note that MediaDeveloper objects are very
polymorphic--one object can display graphics, animation, digital video, analog
video, or audio simply by passing the filename or device name in to the Open
method. MediaDeveloper takes care of the rest. This approach saves both
resources and time since you don't have to purchase a separate control for
each media type you want to use.
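The path-field approach takes only a few lines of Visual Basic. The sketch below is illustrative rather than lifted from FEMP.FRM: the text field FldPhoto and the display object g_dispobj are assumed names, and the single Open call relies on the polymorphic behavior just described.

```vb
' Sketch: display an employee photo from a path stored in a text field.
' FldPhoto and g_dispobj are assumed names, not taken from FEMP.FRM.
Sub UpdatePhotoSketch ()
    Dim l_path As String
    Dim l_retval As Integer
    l_path = Trim$(FldPhoto.Text)   'path kept in an ordinary text field
    If Len(l_path) > 0 Then
        'One Open call handles stills, AVI clips, or device names;
        'MediaDeveloper selects the handler from the name passed in.
        l_retval = g_dispobj.Open(l_path)
    End If
End Sub
```

Because the record holds only a path, the same photo files remain usable by tools outside MIMS, and the database itself never needs BLOB support.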
Although the video-capture portion of the MIMS system is not included in the
minimal system I present here, it is an easy addition to the system. Since the
MediaDeveloper Open method can open devices such as video-overlay boards, the
overlay-board driver name can be passed into the method and the video board
can be controlled instantly. MediaDeveloper provides methods to query the
number and names of all video boards present on the system. Example 1
illustrates how you open a video-overlay board and capture a photo.


Adding Digital Video to MIMS


Adding digital-video playback to MIMS is also straightforward. If an employee
record contains a digital-video clip of the employee, a button on the bottom
of the employee search screen is enabled. The AVI digital-video format allows
for 15 frames per second playback with audio on a standard 486 PC. No special
hardware is required for the playback of digital video. A video-capture board
is required for saving digital video to hard disk.
Once a digital-video clip is captured, the filename is stored with the
employee record. The digital-video filename is stored in the field FldThumb,
and all captured video files (still and digital video) are stored in a
directory indicated by the variable g_config.CapDir.
To allow for control buttons such as Play, Rewind, and the like, I developed a
special digital-video playback form to display digital-video files. See
FVIDEO.FRM (Listing One) for complete details for implementing digital video.
When the digital-video playback form loads, a local MediaDeveloper object,
m_dispobj, is created and initialized. The object is centered in the middle of
the display screen. The digital-video file is then opened through the Open
method. The methods Play, Rewind, and Stop are called from the digital-video
playback form to give users control over video-file playback.


OLE Automation 


OLE Automation, Microsoft's implementation of a reusable-component
architecture, allows for libraries of code to be reused and shared by other
tasks in a user's environment without language dependence or compile-time
linking. The binding and communications between the client task and the server
component is handled by the OLE 2.0 DLLs. OLE Automation allows for a smooth
transition to Chicago and Windows NT by providing for 32-to-16 bit
interoperability.
An Automation Server can be either an executable (*.EXE, called a "local
server") or a DLL (*.DLL, or "in-process server"). When compared to a local
server, an in-process server is faster to load and use because a DLL doesn't
contain process-startup overhead. A DLL server also executes faster because no
process-boundary marshaling takes place. The downside is that it can be more
difficult to service many processes and to handle complicated user-interface
details (updating menus and tool bars during process-idle time).
A local server is slower during the initial load of the executable. This can
be minimized by loading the server process during the loading of your client
process. A local server is also slower for parameter passing because the
parameters must be marshaled across the process-boundary space; see Figure 1.
A local server can easily handle multiple client processes and can be used in
a distributed environment. It also has the advantage of not crashing the
client application or affecting multiple clients when a general-protection
fault occurs.
MediaDeveloper is a "local" automation server. This allows applications taking
advantage of MediaDeveloper's user-interface elements to update during
process-idle time.
An Automation Server provides the client application with components
(objects). An automation object is composed of two parts: properties and
methods. A property is analogous to a public data member of a C++ class.
The property can be both set and queried, although the object implementor
may make some or all of the properties read-only--the window-handle
property of an object, for example. A method is
similar to a method in a C++ class, which performs some action on or for the
object.
This object-oriented flavor extends to traditionally nonobject environments
such as Visual Basic. In Visual Basic, Automation Servers are much easier to
use than a VBX. Example 2, for instance, shows how to create an Automation
Server object. The first step is to declare an object variable. Note that
under Visual Basic (as with C++), an object's lifetime is determined by the
scope of the object. VB automatically terminates an automation object once it
is out of scope. In a C++-like environment, a declared object may lose scope
but it is not necessarily freed from memory. VB takes care of this
housecleaning for you.
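A minimal sketch (with a hypothetical procedure name) makes the lifetime rule concrete: the server object created inside the Sub is released by Visual Basic when the Sub returns, with no explicit delete.

```vb
Sub ScopeDemo ()
    Dim l_obj As Object
    Set l_obj = CreateObject("MMDisplay")  'Automation Server activated here
    '... use l_obj's properties and methods ...
End Sub  'l_obj loses scope here; VB releases the object automatically
```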
The first set of statements in Example 2 declares two MediaDeveloper
Automation Server objects for you to use; see BDEMO.VCG (Listing Two). The
second set of statements creates the objects and activates the Automation
Server. The string MMDisplay is the registered name of the MediaDeveloper
automation object you want to create. This name is provided by the
automation-server vendor (Lenel Systems, in this case) and can be retrieved
from the Windows registration database (see also the file BDEMO.VCM, available
electronically).
After you create the object, properties can be set or queried and methods can
be called. Deleting the object is not necessary since Visual Basic deletes the
object after it loses scope. As Example 3 shows, getting and setting property
values for an object is straightforward. The first five statements are
properties for the global object g_dispobj, while the last statement calls a
method which creates the window element of the object. Table 1 lists all of
the properties available to DisplayObject, the MediaDeveloper OLE Automation
Server interface.
With OLE Automation, software components allow you to plug your components
into other components without knowing the details of how those components were
built--you only need to know the capabilities supported by the component in
order to use them. Consequently, the emerging generation of component-based
applications will be more modular, revisable, customizable, and flexible.


Conclusion


Multimedia and software-component technologies are striding into the
mainstream of software development. By combining these two technologies, I
have been able to develop a large-scale security application while, at the
same time, shortening the development cycle. Furthermore, tools such as those
described here let you concentrate on the application content, while
off-the-shelf components address your multimedia needs.

Example 1: Opening a video-overlay board and capturing a photo.
Dim CaptObj As Object
Dim DeviceName As String
' Create MediaDeveloper object
Set CaptObj = CreateObject("MMDisplay")
' Eliminate all user-interface objects
CaptObj.SetUIOptions (UI_OPTIONS_NO_UI)
' Set the parent window as the form
CaptObj.ParentHwnd (Form1.hWnd)
' Set the MediaDeveloper window styles
CaptObj.WindowStyle (WS_CHILD Or WS_VISIBLE)
' Create the MediaDeveloper window
CaptObj.Create
' Get the device name of the first video board available in the system
DeviceName = CaptObj.GetDeviceName(VAL_VIDEO_BOARD, 1)
' Open the video board
CaptObj.Open (DeviceName)
' Save the current image as a JPEG file, photo.jpg
CaptObj.Save ("photo.jpg")
Example 2: Creating an Automation Server object.
Global g_dispobj As Object
Global g_capobj As Object
Set g_capobj = CreateObject("MMDisplay")
Set g_dispobj = CreateObject("MMDisplay")
Example 3: Getting and setting property-object values.
g_dispobj.AutoCfg = False
g_dispobj.NotifyHwnd = PnlPic.hWnd
g_dispobj.ParentHwnd = PnlPic.hWnd
g_dispobj.WindowStyle = WS_CHILD
g_dispobj.UIOptions = UIOPTION_NO_UI
g_dispobj.Create
Figure 1 OLE Automation, local versus in-process server (LRPC = lightweight
remote procedure call).
Table 1: MediaDeveloper OLE Automation Server interface.
Properties Description 
ParentHwnd Window handle of the DisplayObject parent.
NotifyHwnd Window handle for the DisplayObject to send
 notification events to.
WindowStyle Style flags for the creation/alteration of the
 DisplayObject window; see the Windows 3.1
 SDK for applicable values for window styles.
UIOptions User-interface options for the DisplayObject.
NotificationMs Flags for determining which events to monitor.
TimeFormat Time format used by the MCI driver.
Top Top coordinate of DisplayObject.
Left Left coordinate of DisplayObject.
Width Width of DisplayObject.
Height Height of DisplayObject.
Stretch Scales multimedia object to fit in dimensions of 
 DisplayObject.
DragDrop Enables file drag/drop from File Manager.
AutoSize Automatically sizes DisplayObject to show entire 
 object.
AutoCfg Automatically reconfigures DisplayObject user 
 interface for each new object type loaded.
UseSegmentMark Limits the range of the MCI driver to the length 
 of the media/segment.
StartSegment Position of starting point of the defined segment.

EndSegment Position of end point of the defined segment.
PreviewWidth Width of Preview Window within DisplayObject window.
PreviewHeight Height of Preview Window within DisplayObject window.
Visible Shows/hides DisplayObject window.

Listing One 

'FVIDEO.FRM: VBC Version
'Note: Header re-ordering can leave full-line comments out of order

Option Explicit

'======================= 'Form/Module Variables =======================
Dim m_dispobj As Object

Sub ButClose_Click ()
 Unload FVideo
End Sub
Sub ButPlay_Click ()
 If m_dispobj.WindowHandle <> 0 Then
 m_dispobj.Play
 End If
End Sub
Sub ButRwd_Click ()
 If m_dispobj.WindowHandle <> 0 Then
 m_dispobj.Rewind
 End If
End Sub
Sub ButStop_Click ()
 If m_dispobj.WindowHandle <> 0 Then
 m_dispobj.Stop
 End If
End Sub
Sub Form_Activate ()
 Dim l_fname As String
 Dim l_retval As Integer
 FMain.PnlInfo(0).Caption = "Loading Video..."
 DoEvents
 l_retval = m_dispobj.Open(FormatPath(g_config.CapDir, True) & LblFile)
 l_fname = FormatPath(g_config.CapDir, True) & LblFile
 FMain.PnlInfo(0).Caption = "Video loaded!"
End Sub
Sub Form_Load ()

'VBC ADVISORY: The following item(s) were found & handled as described
'unref variables (Removed): l_fname

 Dim l_retval As Integer

 FMain.PnlInfo(0).Caption = "Setting up hardware..."
 FMain.MousePointer = HOURGLASS

 Set m_dispobj = CreateObject("MMDisplay")
 m_dispobj.AutoCfg = False
 m_dispobj.NotifyHwnd = FVideo.hWnd
 m_dispobj.ParentHwnd = FVideo.hWnd
 m_dispobj.Width = PnlButs.Width / Screen.TwipsPerPixelX
 m_dispobj.Height = (Height - PnlButs.Height * 3) / Screen.TwipsPerPixelY
 m_dispobj.Width = 160

 m_dispobj.Height = 120

 m_dispobj.Left = ((FVideo.Width / Screen.TwipsPerPixelX) - m_dispobj.Width) / 2
 m_dispobj.Top = ((FVideo.Height / Screen.TwipsPerPixelY) - m_dispobj.Height) / 2 - 50
 m_dispobj.WindowStyle = WS_CHILD Or WS_BORDER Or WS_VISIBLE
 m_dispobj.UIOptions = UIOPTION_NO_UI
 l_retval = m_dispobj.Create()
 If l_retval = 0 Then
 FMain.MousePointer = DEFAULT
 MsgBox ERRMSG_PREFIX & " Could not create display object! Error [30194001].", MB_ICONSTOP, ERRMSG_TITLE
 Exit Sub
 End If
End Sub
Sub Form_Resize ()
 If Width < PnlButs.Width + 100 Then
 Width = PnlButs.Width + 100
 End If
 If Height < PnlButs.Height + 100 Then
 Height = PnlButs.Height + 100
 End If
 PnlButs.Top = Height - PnlButs.Height * 2
 PnlButs.Left = (Width - PnlButs.Width) \ 2
End Sub
Sub Form_Unload (Cancel As Integer)
 m_dispobj.Destroy
End Sub



Listing Two

'BDEMO.VCG: VBC Consolidated Global File for BDEMO Project
'Note: Consolidation can leave full-line comments out of order

Option Explicit

' Visual Basic Constants used in this application!
'Key codes
Global Const KEY_PRIOR = &H21
Global Const KEY_NEXT = &H22
Global Const KEY_HOME = &H24
Global Const KEY_UP = &H26
Global Const KEY_DOWN = &H28

' Shift parameter masks
Global Const ALT_MASK = 4

' Button parameter masks
' MousePointer
Global Const Default = 0 ' 0 - Default
Global Const HOURGLASS = 11 ' 11 - Hourglass
Global Const APP_TITLE = "MediaDeveloper Application Demo"

'Privilege levels for security
Global Const PRIV_PROHIBITED = 0
Global Const PRIV_ADMINISTRATOR = 2


'Color definitions
Global Const COLR_WHITE = &HFFFFFF
Global Const COLR_LTGRAY = &HC0C0C0

'ASCII Key Codes
Global Const MAXPATHLEN = 256

'Editing modes
Global Const MODE_EDIT = &H1 'Edit mode (fields are editable)
Global Const MODE_VIEW = &H2
Global Const MODE_QFIND = &H20
Global Const MODE_TXT_VIEW = " View"
Global Const MODE_TXT_QFIND = " Search"

'Predefined quick keys or short cut keys for database viewing
Global Const QKEY_SEARCH = 19
Global Const QKEY_CLEAR = 12

'Types of Q+E Operational Errors
Global Const QETYPE_NEW = 1
Global Const QETYPE_ADD = 2
Global Const QETYPE_MODIFY = 3
Global Const QETYPE_DELETE = 4
Global Const QETYPE_DISCARD = 5
Global Const QETYPE_ENDQUERY = 6
Global Const QETYPE_LOCK = 7
Global Const QETYPE_NEXT = 8
Global Const QETYPE_PREV = 9
Global Const QETYPE_QUERY = 10
Global Const QETYPE_TRANPEND = 11
Global Const QETYPE_NOTRANPEND = 12

'Error message constants used with prcTypicalQEErr ()
Global Const MSG_QUERYING = 1
Global Const MSG_QUERYCOMPLETE = 2
Global Const MSG_QUERYFAILED = 3
Global Const MSG_QUERYABORT = 4
Global Const MSG_BROWSEVIEW = 5
Global Const MSG_DETAILVIEW = 6
Global Const MSG_RECORD_ADDING = 7
Global Const MSG_RECORD_ADDED = 8
Global Const MSG_MAIN = 9

'Maximum number of records allowed to be retrieved to use
'delete for single records (in any query!)
'Global Const MAXRECS_SINGLE_DELETE = 32000
'Global Const MAXRECS_IN_BROWSE = 1024
'This is the main query expression for employee records
Global Const EMP_WHERE_LINKS = "EMP.LOC = LOCATION.ID AND EMP.COUNTRY = COUNTRY.ID AND EMP.DIV = DIVISION.ID AND EMP.DEPT = DEPT.ID AND EMP.EMPTYPE = EMPTYP.ID AND EMP.EXT1 = EXT1.ID AND EMP.EXT2 = EXT2.ID AND EMP.ACTIVEBADG = BADGE.ID AND BADGE.STATUS = BADGSTAT.ID AND BADGE.TYPE = BADGETYP.ID"

Global Const UIOPTION_NO_UI = &HFF
Global Const ASCII_LF = 10

'This subroutine closes a Q+E Query, if open. If not open, nothing happens.

Global Const BROWSE_LASTRECNO = 2047999999 'Last possible record# (move to end)

'Default sizes for Browse grid
Global Const BROWSE_READAHEAD = 128
Global Const BROWSE_MAXSIZE = 32730
Global Const BROWSE_BUFSIZE = 300 'Number of rows in a grid before buffering
Global Const BROWSE_FLG_DEL = -1 'Delete flag in delete array of records

'Index positions of specific controls to expect in p_ctrls() array
Global Const BROWSE_CTRLSINDX_QRY = 0 'Index in p_ctrls() of Query control
Global Const BROWSE_CTRLSINDX_FLDS = 2 
 'First Index in p_ctrls() of field controls
'Directions for movement
Global Const BROWSE_BCK = -1 'Move backwards
Global Const BROWSE_FWD = 1 'Move forwards

'Information structure used to associate printers to layouts
'Information structure for badge type association to layout name

Global Const TBL_EMP = "EMP"
Global Const TBL_IDS = "IDS"
Global Const TBL_LOC = "LOCATION"
Global Const TBL_DEPT = "DEPT"
Global Const TBL_DIV = "DIVISION"
Global Const TBL_EMPTYP = "EMPTYP"
Global Const TBL_BADGE = "BADGE"
Global Const TBL_BADGTYP = "BADGETYP"
Global Const TBL_BADGSTAT = "BADGSTAT"
Global Const TBL_EXT1 = "EXT1"
Global Const TBL_EXT2 = "EXT2"
Global Const TBL_COUNTRY = "COUNTRY"
Global Const TBL_PHOTOS = "PHOTOS"

'======================================================================='
' QeLink.txt '
' Q+E Multilink/VB Version 2.0 '
' Global Constants File '
'======================================================================='
' Copyright: 1992-93 Q+E Software, Inc. '
' This software contains confidential and proprietary '
' information of Q+E Software Systems, Inc. '
'=======================================================================' 

Global Const QE_RECORD_NOT_FOUND = 31002
Global Const QE_RECORD_LOCKED = 31004

Global Const QE_NO_LOCK_ON_KEY_PRESS = &H1000 
' Do not lock records whenever a field changes

Global Const QE_NO_COMPARE_AFTER_LOCK = &H2000
' Do not compare fields before

Global Const BITSPIXEL = 12 ' Number of bits per pixel
Global Const PLANES = 14 ' Number of planes

' MessageBox() Flags
Global Const MB_YESNO = &H4


Global Const MB_ICONHAND = &H10
Global Const MB_ICONQUESTION = &H20
Global Const MB_ICONEXCLAMATION = &H30
Global Const MB_ICONASTERISK = &H40

' Dialog Box Command IDs
Global Const IDNO = 7

' Window Styles
Global Const WS_CHILD = &H40000000
Global Const WS_VISIBLE = &H10000000
Global Const WS_BORDER = &H800000

Global Const CHAR_BACKSLASH = "\"

Global Const ERRMSG_TITLE = "Demo Error Message"
Global Const ERRMSG_PREFIX = "ERROR! " 'Used to save space in heap space string declarations
Global Const INFOMSG_TITLE = "Demo Information"
Global Const APP_ABREV = "DEMO"

'dependent definitions
Global Const PRIV_DEFAULT = PRIV_ADMINISTRATOR
Global Const COLR_BG_VIEW = COLR_LTGRAY 'Background color used for fields during view
Global Const COLR_BG_EDIT = COLR_WHITE 'Background color for fields during edit

Global Const MB_ICONINFORMATION = MB_ICONASTERISK
Global Const MB_ICONSTOP = MB_ICONHAND

'================== Type Definitions ==================
'independent definitions
'This is the structure that each T_REC_? record type should
'contain. This structure stores the field data that is to be displayed
'in the browse grid. Not all columns must be used.
Type T_BROWSEROW
 Col01 As String
 Col02 As String
 Col03 As String
 Col04 As String
 Col05 As String
 Col06 As String
 Col07 As String
 Col08 As String
 Col09 As String
 Col10 As String
 Col11 As String
End Type

Type T_SIZEPOS
 Top As Long
 Left As Long
 Width As Long
 Height As Long
End Type
'Security information structure
Type T_SECURITY
 Default As Integer 'Default privilege level
 Logon As Integer 'True if logon required

 ExitPriv As Integer 'Minimum privilege for exiting (0 means logoff anytime)
End Type
Type T_PHOTOINFO
 Original As String 'Original photo in employee record
 LName As String 
 'Employee last name in photo database that is no longer in use 
 '(photo was selected for emp record). Used to find record to 
 'delete once emp record is saved
 FName As String
 'Employee first name in photos database that is no longer in 
 'use (photo was selected for emp record)
 Freeze As String
 'Name of photo that was frozen (in case user changes mind, 
 'needs to be deleted if not used)
 NewPhoto As String 
 'Name of photo user has selected to include in employee record.
End Type
Type T_USERINFO
 FName As String
 LName As String
 id As String
 logonid As String
 Privilege As Integer
End Type
'dependent definitions
Type T_REC_EMP
 BrowseRow As T_BROWSEROW
 AlphaID As String 'BrowseRow.Col01
 SSNo As String 'BrowseRow.Col02
 FName As String 'BrowseRow.Col03
 LName As String 'BrowseRow.Col04
 EmpTyp As String 'BrowseRow.Col05
 addr1 As String
 addr2 As String
 city As String
 state As String
 zip As String
 country As String
 bdate As String
 badgeno As String
 badgetype As String
 BadgeStat As String 'BrowseRow.Col06
 title As String
 dept As String
 div As String
 loc As String
 homephone As String
 ext As String
 Floor As String
 Building As String
 officephone As String
 Photo As String
 HasPhoto As String
End Type
Type T_BROWSEINFO
 SingleDelete As Integer 
 'True if record numbers are maintained in a deleted() array 
 '(Recs in query are less than max array size 32760)
 SameDataset As Integer 'False if new search occurred

 Header As String 'This is the header that goes on the grid
 ReadAhead As Integer 
 'Number of records to read into grid when scrolling past 
 ' buffer size (Must be < BufferSize)
 BufferSize As Integer 
 'Maximum number of records stored in grid at any one time ()
 'AddRecs As Integer 
 'Number of additional records in the m_added () array
 ModifyOK As Integer 
 'This is false when no more room to store modified records 
 'user must research
 DelRecs As Integer 
 'Number of records marked as deleted in m_deleted()
 'RecNoCol As Integer 'Column in grid containing record number
 TopRec As Long 
 'LOGICAL Record number (excluding delete) of top record 
 'in browse grid (may not be the same as Q+E number!)
 Rec As Long 'Current LOGICAL record number - not actual record number 
 '(takes into account deleted records)
 Recs As Long 'Number of records in query (not including deleted)
 'AddsIndx As Integer 'Index into m_added() array - declared by form!
 'Adds As Integer 'Upper bound (size) of m_added() array -declared by form!
 ModsIndx As Integer 
 'Next open element in m_modified () array -declared by form!
 Mods As Integer
 'Upper bound (size) of m_modified() array -declared by form!
 BrowseRecNos(1 To BROWSE_BUFSIZE) As Long 
 'Record numbers corresponding to rows in grid
End Type
Type TYP_CONFIG
 CompanyName As String * 128 'Company name
 Logo As String * 128 'Filename to company logo
 internal As String * 64 'Type of Q+E database connection for internal files
 external As String * 64 'Type of Q+E database connection for external files
 intprefix As String * 256 'Prefix/directory to internal tables
 extprefix As String * 256 'Prefix/directory to external tables
 CapSizePos As T_SIZEPOS
 CapDir As String * 256 'Name of directory to store photo directories
 CapDirLimit As Integer 
 'Maximum number of files in sub photo directory before continuing.
 Security As T_SECURITY 'Security information structure
End Type
'====================== API/DLL Declarations ======================
Declare Function fDoQuery Lib "qelink.vbx" (queryCtl As Control) As Integer
Declare Function fEndQuery Lib "qelink.vbx" (queryCtl As Control) As Integer
Declare Function fRandom Lib "qelink.vbx" (queryCtl As Control, ByVal RecNumber&) As Integer
Declare Function fEnterQBE Lib "qelink.vbx" (queryCtl As Control) As Integer
Declare Function fDelete Lib "qelink.vbx" (queryCtl As Control, ByVal rowIndex%) As Integer
Declare Function fLogon Lib "qelink.vbx" (connectionCtl As Control) As Integer
Declare Function fLogoff Lib "qelink.vbx" (connectionCtl As Control) As Integer
' Extended Window Styles
Declare Function GetModuleHandle Lib "Kernel" (ByVal lpModuleName As String) As Integer
Declare Function GetModuleFileName Lib "Kernel" (ByVal hModule As Integer, ByVal lpFileName As String, ByVal nSize As Integer) As Integer
'GDI Routines....
Declare Function GetDeviceCaps Lib "GDI" (ByVal hDC As Integer, ByVal nIndex As Integer) As Integer
'================== Global Variables ==================
'independent definitions
Global g_retval As Integer
' DrawStyle
'Global Const SOLID = 0 '0 - Solid
'Global Variables (specific global structures in respective modules!)
'NOTE! Globals for paint program are in Paint.BAS
Global g_cnct_int As ConnectClass 'Internal connection control
Global g_cnct_ext As ConnectClass 'External connection control
Global g_name_int As String 'Name of internal connection
Global g_name_ext As String 'Name of external connection
Global g_exe_path As String 'Path (without \) of executable directory
Global g_colors As Long 'Number of colors (current color mode)

Global g_develop As Integer 'Is 1 if development mode (set by commandline)
Global g_exit As Integer 
 '1 if valid exit. 0 if user presses something like [Alt-F4].
Global g_dbinit As Integer '1 if InitDB () successfully called
Global g_formreturn As Integer 'Return value after calling Load form...

Global g_dispobj As Object
Global g_capobj As Object
Global g_mediadev As Integer 
 'True if the OLE capture object has been initialized!
Global g_capobjcreated As Integer 
 'True if the OLE capture object has been created ()
'dependent definitions
Global g_config As TYP_CONFIG
Global g_photoinfo As T_PHOTOINFO
Global g_userinfo As T_USERINFO































November, 1994
PROGRAMMING PARADIGMS


Unnatural-Born Killers




Michael Swaine


This month's column presents several views from the edge of software
development. Mostly these views are problems posed or solutions suggested at a
recent conference on artificial intelligence. But there's also a peek at a
strange and violent world where real robots do real battle for the amusement
of an evolving underculture whose tastes run to that sort of thing. And
there's something about the information superhighway: obligatory, these days,
like car crashes in certain kinds of movies.
In short, this column won't help you to meet last Wednesday's deadline, but it
may satisfy your bloodlust.


Obligatory Car Crash


Let's see hands of those who want to read about a car crash involving a
Microsoft executive.
Uh-huh. That's about what I expected. You Microsoft employees: one hand only,
please. Those Microkids are so eager.
I have to warn you: It's not much of a car crash. I don't want you to be
disappointed. I'll try to work in another one later in the column to make up
in quantity for what they lack in quality. The real carnage and destruction
comes in the robot wars section of the column. First, though, this ...
You hold a software-development conference in Seattle, presumably, to make it
convenient for Microsoft to send over a guest speaker.
It worked well at the Twelfth National Conference on Artificial Intelligence
(and Sixth Innovative Applications of Artificial Intelligence Conference),
henceforth AAAI-94, this August: Tuesday's Invited Talk was delivered
by--Steve Ballmer??
Okay, so Microsoft's executive vice president of sales and support might not
have been the program committee's first choice to address an audience of
highly technical, academically oriented developers, but Steve is always a
dynamic speaker. And he more than made up for any ignorance of just exactly
what the attendees were doing for a living with his laser-sharp vision of what
they ought to be doing.
Port your "cycle-sucking apps" to Windows, Ballmer told the crowd, citing a
44-million/year PC sales rate and a 150--200 million installed base. PCs,
Ballmer said, are selling faster "than cars and small trucks."
He acknowledged that ordinary Windows is, for these AI folks, a "toy"
operating system. He pointed them NT-ward, not dwelling overmuch on just what
fraction of that installed base now uses or ever will use NT. But NT is the
future, he said, adding, "long-term, we don't need to invest in two streams of
R&D." Translation: Windows is future history. A pre-denouncement?
If the Seattle location worked well for the program committee, it didn't work
so well for Steve, who dinged his car in the Convention Center parking garage.
His car. If he had been at, say, COMDEX in Las Vegas, it would have been a
rental.
An AI navigation system that kept him from bumping into things would be nice,
he told the crowd.
At COMDEX in Las Vegas, a comment like that from a Microsoft executive would
send a dozen entrepreneurs back to their suites to write business plans. Not
here, though, I suspected. Not with this crowd.


Riders versus Striders


Who were these people? I decided to do some objective tests to confirm what I
suspected. Suitcounting the crowd at Ballmer's talk, I got a low 20 percent.
Not a suit crowd, then.
Nor were they escalator striders. There are two kinds of people: those who
treat escalators like elevators and those who treat them like stairs. Riders
and striders. I've never known an entrepreneur who wasn't a strider. Here at
AAAI-94, the strider-to-rider ratio was low, lower than at most trade shows.
My conclusion: The attendees at AAAI-94 were neither suits nor striders;
neither blind followers nor tunnel-vision fanatics. This could be an
interesting conference, I thought.
The list of tracks in the program reflected what the program committee took to
be the active areas in AI research today: causal reasoning, spatial reasoning,
nonmonotonic reasoning, model-based reasoning, uncertainty management,
constraint satisfaction, knowledge bases, distributed AI (collaborating
agents, for example), robotics, perception, machine learning (from
reinforcement learning to induction and discovery), natural-language
processing, neural nets and simulated annealing, planning, scheduling, and
search.
In addition to the technical presentations, there were the student poster
sessions expected at any academic conference, exhibits, a video program, a
special program on advances in machine translation, a robot competition, and
an AI art show.
I hope you won't be disappointed if I don't tell you all about the student
papers, the video program, or the art show. I won't say much about the
exhibits, either, because they were--how shall I put it--pitiful. Not the
individual exhibits, just the number of them. A couple dozen companies saw fit
to buy booth space. I'm not including publishers' row: Looking at the number
of publishers who exhibited and the number of AI-related books they have
brought out recently, you'd conclude that AI was an exciting field.


The Exciting Part


Raj Reddy of Carnegie Mellon University delivered the keynote address for the
conference, on the state of AI--no, sorry--on "The Excitement of AI." It's
nice when the title conveys not only the subject but also the bias of the
talk.
What this field needs, Reddy told the crowd, is a few decades of quiet,
sustained progress. Just keep the funding flowing and leave us alone, I guess
he means, and we will show you some exciting stuff. Like AI cruise control in
every new car. (Reddy gave this talk before Ballmer dinged his car.)
Most of the cruise-control work has already been done, Reddy said, and could
be put into action really soon. There's been similarly impressive work in
intelligent automated tutoring and in planning and scheduling, the latter an
outgrowth of Desert Storm. Current applications of this war work include
smarter manufacturing and disaster management, with potentially big payoffs.
There are ten or twenty other AI areas, Reddy said, that have shown just as
much success.
Fine, fine. So why does AI get no respect? Could it be--forty years of effort
at half a billion a year with few publicly visible results? Could that be it?
Probably, Reddy opines. And then there's the academic publish-or-perish
mentality that leads to a focus on short-term results, while interesting AI
problems generally require a large, sustained attack, and the really
interesting problems, he says, probably need a 1,000- to 10,000-person-year
effort. Not that much different from Microsoft's NT development.
This idea of breaking down the big problems of AI and tackling manageable
pieces--not particularly novel, one would think--pervaded the invited lectures
at AAAI-94.


Reddy's Recommendations


So what to do? Reddy recommends:

Talk it up. Produce some concept demos. Show that AI produces results.
Don't try to do it all. This means not only the afore-advised strategy of
peeling off one piece of a large task while others tackle other parts, but
also using agents. For instance, consider tackling difficult problems by
developing agents that work with humans--agents that don't give up but rather
ask people for help and know how to learn from their human partners. Agents
that collaborate with people.
There was more talk of collaborating agents during this week in Seattle than
at a dozen writers' conferences.
Reddy laid down some challenges for laborers in the field of AI:
The (Inter?)National Information Infrastructure (NII, III?) needs a lot of
help if everybody is really going to use it.
Create aids for the disabled. There are people who might be said to have
conceptual disadvantages, as opposed to perceptual or motor disadvantages. AI
can help them. If the field is really artificial intelligence, why can't it
give such people an intellectual prosthesis?
Create scientific-discovery tools. Let us all be Einsteins.
After that, Reddy suggests, we can get on with the real work of AI: the
creation of superhuman intelligence.


Can't We Just Get Along?


Whew. Barbara Grosz, in her AAAI presidential address, picked up the theme of
collaborating agents. Grosz gave some helpful distinctions and definitions:
"Collaboration," as opposed to interaction, a familiar computer-science
paradigm, involves processes (or people, of course) working together rather
than acting on each other. Collaborative activities are characterized by:
A relationship of partners, rather than master/slave.
Collaborators having different knowledge and abilities.
Partial knowledge being the rule rather than the exception.
A need to work together both at the level of action and at the level of
planning.
Beliefs and intentions being inextricable from the collaborative phase of
planning.
Intentions come in two varieties, and these have different roles in
collaborative planning. There is the intention to do something, and there is
the intention that something be the case. Individual plans and collaborative
plans are not the same kind of thing; you don't add up the individual plans to
get the collaborative plan. Individual plans require:
The knowledge of a recipe.
The ability to perform its subtasks.
The intention to perform them.
Collaborative plans, on the other hand, require:
A belief in the recipe.
The intention that the goal be attained.
The individual plans.
Grosz also pointed out that there are problems that one robot can't solve but
that two collaborating robots can.
A lot of presentations at AAAI-94 dealt with collaboration among agents. A
recent issue of Communications of the ACM was also dedicated to the subject.
Must be a fad, I mean, trend.


The Obligatory Information-Superhighway Mention


Reading back over what I've written, I detect a note of cynicism. I regret it,
because AI is important and the AAAI-94 attendees are doing serious,
cutting-edge computer-science work. This attitude of mine is probably due to
my belief that the term "artificial intelligence," like the term "liberal," is
dead, and that those who have embraced it ought to just get a new word.
I also regret the cynicism so far because we have only now arrived at the
point where it is truly deserved.
We have finally come to the obligatory information-superhighway mention.
Depending on the level of cynicism one brings to a discussion of the
information superhighway, it is either the post-cold-war $500 toilet seat,
the moral equivalent of the B-1 bomber, a runaway metaphor, the future home of
mankind, or as good an excuse as any for funding computer-science research.
The information superhighway will certainly need computer-science research,
said ARPA's Kirstie Bellman in her invited AAAI-94 talk.
Actually, the impressively credentialed Bellman, who appears to juggle complex
ideas and humongous projects with equal ease, makes a good case for needing AI
in the infrastructure of the infobahn. The information highway surely will be
a huge heterogeneous network made up of diverse components, since the closest
thing we've got to the information superhighway, the Internet, is that
already. Whenever we are dealing with a complex system, she said, it's
basically a modeling problem. We will need the power of new formalisms, but
they need to be the right formalisms. Even the word formalism is often used
too loosely these days, she said.
That was apparently the point at which I passed out in the rarefied
atmosphere, because my notes jump abruptly to an audience member asking
Bellman if there is any connection between what she's been saying and anything
in the PC universe.
"The government's role in this is to get out of the way," Bellman answers.
"However--"
However, the commercial market is not doing well in this whole area of
infrastructure. Too many commercial companies crassly trying to own the
platform. "We need to keep that from happening," she said, but didn't say how.
She did offer an insight that applies to a lot more than the information
infrastructure, though:
"Don't confuse standards with uniformity. An individual user wants a uniform
interface, but not all users want the same interface."


Janitorial Combat


Intimidated by Bellman's search for the right formalisms, I sought something
more grounded. I found the AAAI-94 robot competition.
AAAI does this every year; last year, the robots moved large boxes around.
This year, they picked up and dropped several kinds of small objects into
waste baskets. They distinguished among these small objects--soda cans,
Styrofoam cups, and paper wads. And they moved through a multiroom environment
cluttered with chairs and tables as they performed their janitorial chores.
It would have been more impressive if all the robots had actually done all
these things, but many took penalty points and virtualized some of the
steps--like picking up the objects and putting them in the waste baskets. But
there was still a lot of computer science going on.
Chip, a robot from Chicago, for example, used texture, size, and color
information in its visual system. It used sonar to ensure that its arm didn't
hit chairs and tables. It used infrared to tell if the object was in its grip
and tactile feedback to keep from using too much pressure in its grip.
Stanford's entry had a distinctive search strategy, moving to the center of
each room and patiently scanning the entire room for objects, then zipping
from object to object to (virtually) pick up and basket the trash. Most robots
searched for new trash after each pickup, although they generally remembered
where the closest waste basket was.


Carnage and Leather



We now jump two weeks ahead and 700 miles south to San Francisco and the first
annual Robot Wars event, the brainchild of Animatronics special-effects design
engineer Mark Thorpe.
The difference between the two competitions is immediately apparent. The AAAI
roboticists wore team t-shirts. At Robot Wars, black leather and camouflage
coloration predominate. A member of the program committee gave the laconic
play-by-play in Seattle; in San Francisco, the M.C. would have been at home in
Thunderdome. There are distinct robotics cultures growing up. The culture at
this San Francisco event owes something to demolition derby and something to
Mad Max.
Understand, Robot Wars was not an academic demonstration in robotic
technology. It was pitched battle in an arena, with no quarter asked or given.
A huge poster at the door warned that attendees, by the act of entering the
building, were accepting the risk of injury from flying robot parts.
The program wasn't too complicated. There were various events, but basically
the (radio-controlled) robots just beat each other to staggering scrap. The
judge cleared the hall for the lunch break with a jet-engine blast that must
have violated a city ordinance or two and certainly wasn't healthy, and after
lunch the carnage resumed.
One crowd pleaser was the robot with the chain saw, which hacked the bodies
and electronics of its opponents, as well as chewing on the arena itself.
Another robot was studded with spikes, apparently to intimidate the other
robots. It didn't seem to work. My favorite was the house robot, supplied by
the sponsors to serve as a neutral nemesis in some of the events. In the final
free-for-all, it was the house 'bot that reduced the others to inert rubble on
the arena floor, triumphing by use of an arm/paddle appendage with which it
swatted opponents and flipped them upside down.
There wasn't much computer science going on. What did seem to be going on was
a new form of entertainment. I dunno, this could be bigger than disc golf. The
main drawback to robot wars as a sport, it seems to me, is that the toys get
fairly wrecked in the game, but I guess the same is true of football.
Oh, yes. The second car crash. The car I was in on the way to Robot Wars was
rear-ended in a four-car pile-up on the Coast Highway. No big deal. Not my
car. Why I was in a rental car in the San Francisco area, where I live, while
Steve Ballmer was in his own car in Seattle, where he lives, is hard to
explain. Probably a karma thing.



November, 1994
C PROGRAMMING


Quincy's Translator and the C++ Library




Al Stevens


There is no better example of software validating hardware than TV Nation, a
news-magazine program created by Michael Moore, the "Me" in Roger & Me.
Because of it, I bought a TV for my office in case I'm working late on any
Tuesday at 8:00 PM. Moore demonstrated how lobbyists work by hiring one to get
a resolution making August 16 national "TV Nation Day" introduced onto the
floors of the Senate and House.
In another edition, Moore visited the corporate headquarters of several
Fortune 500 companies, stood on the sidewalk with a bullhorn, and challenged
the CEOs to come down and demonstrate that they could use the products of
their companies. The CEO of IBM did not come down and show us that he could
format a floppy disk. I would prefer to see him try to install OS/2. Since my
diatribe on that subject in the September issue, I have heard from several
readers about it. Most of them had experienced, seen, or heard about similar
episodes. The consensus is that IBM has not figured out installations yet,
particularly where video drivers are concerned. However, some readers
disagreed with me completely, and one suggested that I should get into a
different line of work. That's funny--I was thinking the same thing during the
whole ordeal.
One of my complaints concerned the number of crashes in OS/2, particularly
when running Windows applications. That situation improved after I installed
some upgrades that I found on a CD-ROM, and OS/2 became much more stable.
There were problems with the upgrade installation, though. Grrr.


Patterns


In September, I also mentioned Jim Coplien's "patterns" discussion at the
Borland International Conference and relayed his concerns that trade books and
CASE tools would abound before anyone really understands the concept. Cope
responded by saying:
I did say it; I meant it; there are examples that illustrate it. My concern is
that readers of the column may come to the conclusion that ALL imminent books
on patterns are trash. I mentioned during my talk that the forthcoming book by
Gamma, Helm, Johnson, and Vlissides (Design Patterns: Elements of Reusable
Object-Oriented Software, Addison-Wesley, ISBN 0-201-63361-2; due out October
14, 1994) is a solid foundation for further patterns work.
I am very interested in this area and plan to review the book in detail when
it becomes available. I've always thought of software development as an infant
craft; it lacks what centuries-old crafts enjoy--the intuitive ability of the
craftsmen to visualize the result before it is designed. One of the problems
is that the tools are part of the product. You don't build a Skil saw so that
you can build a house and then include the Skil saw in the house. That's
probably not a clear analogy, but you know what I mean. The point is, all the
methodologies notwithstanding, we really don't know how to take full advantage
of what we know from experience, and we don't know how to pass wisdom and
experience on to succeeding generations because we don't have a crystal-clear
model for expressing design--one that the designer and builder can see
intuitively, not only on paper but in their heads, too. Structured and
object-oriented design have addressed and improved the matter considerably but
have not solved the problem completely.
An architect designs a structure. A carpenter reads the blueprint and builds
the structure. If they know what they are doing, there are few surprises when
the job is completed. Furthermore, everybody knows when they are finished; we
software developers have none of that.


Quincy: Loosely Coupled Code


The Quincy C-interpreter project continues this month. The discussion focuses
on how the design separates the IDE, the translator, and the interpreter. I've
intentionally kept those three components as loosely coupled as possible. I
might want to use the IDE for a different language or translator, and I might
want to use the translator and interpreter in different environments. Table 1
lists Quincy's C source files, organized by their relative responsibilities
among the three tasks.
The IDE column in Table 1 lists the source-code files that support the D-Flat
IDE. The Translator column lists the files that support translation--preparing
the source code for interpreting. The Interpreter column lists the files that
support run-time interpreting.
There could be times when you would build a program using any one or two of
the components. For example, if you wanted to build a different language into
the IDE, you could compile and link the files in the first column with the
D-Flat library and see what fell out as unresolved references. That would tell
you what the new language module needed to provide or what you could
eliminate. Similarly, if you wanted to build a run-time-only interpreter that
reads and interprets files of translated token streams, you could compile and
link only the source files in the Interpreter column of Table 1.
The source-code files in these columns share a few global references across
the three categories. Linking any one or two of them will report unresolved
functions and variables. These references represent the coupling between the
modules. Depending on your requirements, you can either remove the references
or provide the missing external item. For example, Quincy's IDE provides the
main function. To build a command-line interface, you would delete the IDE
column and add a module with the user interface and a main function. If the
interface does not include a debugger, you would remove the references in the
Translator and Interpreter code to debugger variables and functions. Or, to
preserve the integrity of the source code, you could stub them in. I hesitate
to supply a complex set of compile-time conditional preprocessing directives.
I learned from D-Flat that such directives produce a large number of potential
compile configurations that I cannot possibly test every time I modify the
code.


Quincy's Translator


I discussed the IDE in May, the preprocessor in June and July, the debugger in
August, and the lexical scanner in October. This month I'll begin to describe
the translator, the code that builds an interpretable program from the token
stream built by the lexical scanner.
Quincy interprets the token stream, which encodes source code. While some
interpreters compile the tokens into a pseudocode that, when interpreted,
implements a virtual-machine architecture, Quincy does not. Its translation
consists of scanning the source code into tokens; building and initializing
the global and static declarations; and resolving symbol references to global,
local, and argument identifiers. However, the translator must first establish
the run-time environment and call the preprocessor and scanner. Listing One is
cinterp.c, the code that initiates translation and interpreting. It represents
the interpreter's shell. The debugger calls the qinterpmain function to
run a program, passing the address of the source-code buffer and the argc and
argv command-line arguments.
Listing One declares a number of global variables. It sets off those shared by
the IDE to make them easy to find if I want to split out the components. The
interpreter uses lists of global variables, structures, and functions. The
data structures that define these tables and lists are declared in cinterp.h
(Listing Two), which also provides the prototypes and global declarations for
the translator and interpreter.
The qinterpmain function in Listing One allocates memory for the tokens,
stack, variable definitions, data memory, functions, symbol table, and
function prototypes. The sizes of these allocations are determined by global
integer values that the IDE and the translator share. The IDE has a dialog box
that lets the programmer change these sizes. After allocating the run-time
memory, translation begins. The program uses a setjmp to specify where
translation and run-time errors should return. It calls the preprocessor,
lexical-scanner, and compiler functions in that order to translate the
program. I'll discuss the compiler operation next month.
To execute the program, the translator builds a small, one-line program that
calls the interpreted program's main function, passing the argc and argv
parameters. It calls the lexical scanner to tokenize the statement and then
calls the interpreter's statement function to interpret the statement. The
one-line statement does not need to be compiled because it has no function or
variable declarations to resolve. The only references it has are to its own
two parameters and main, which has already been built. When statement returns,
the program has completed running, and the interpreter cleans up all the
allocated memory.
If an error occurs during translation or run time, the error function in
Listing One is called. It posts the error code that identifies the error and
does a longjmp. If the Watching variable is true, the error occurred when the
user specified a variable to watch or examine from the IDE, and the error
function makes its longjmp to the Watchjmp jmp_buf. Otherwise, the longjmp
goes to the Shelljmp jmp_buf, which the program set just before it started
translation. In this case, the program jumps to where the statement program
would have returned. As far as the translator and interpreter are concerned,
this is a normal completion. It's up to the IDE to recognize that the error
code has been set and report the error to the programmer.
Listing One includes a function named AssertFail, which implements a
D-Flat-friendly variant of the Standard C assert function. Listing Two defines
the Assert macro under control of the NDEBUG compile-time conditional after
the fashion of the Standard-C assert function. There are uses of Assert
throughout the program. The AssertFail function does not abort Quincy the way
that assert would. It uses the IDE's error-reporting mechanism to report the
error instead. Because D-Flat programs hook and chain interrupt vectors,
untimely aborts crash the system.
Listing One also includes the getoken function, which the translator and
interpreter share to retrieve tokens from the token stream. Different tokens
cause different actions beyond being retrieved and returned to the translator
and interpreter. The T_LINENO token posts the current file and line number to
the program's context and then proceeds to retrieve the next token. This
action permits the error-reporting mechanism to report the file and line
number of a translation or run-time error. Space tokens are bypassed. Symbols
update a global current-variable data structure. Functions update a global
current-function data structure. The token retriever recognizes constants and
posts their values to a global current-value data structure. These data
structures are defined in Listing Two.
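The filtering behavior of getoken can be sketched as a loop over the token stream. The token codes and the CONTEXT layout below are invented for the sketch; only the pattern is taken from the column: line-number tokens post to the context and are consumed, space tokens are bypassed, and everything else is handed back to the caller.

```c
/* Sketch of a getoken-style retriever. Token codes and the
   CONTEXT layout are invented; the filtering pattern follows
   the column's description of Quincy's getoken. */
enum { T_EOF, T_SPACE, T_LINENO, T_SYMBOL, T_CONST };

typedef struct {
    int CurrLineno;          /* posted by T_LINENO tokens */
} CONTEXT;

static CONTEXT Ctx;

/* Pull tokens from a stream; each T_LINENO carries its line
   number as the next value in the stream. */
int getoken(const int **stream)
{
    for (;;) {
        int t = *(*stream)++;
        switch (t) {
        case T_LINENO:
            Ctx.CurrLineno = *(*stream)++;  /* post, keep scanning */
            continue;
        case T_SPACE:
            continue;                       /* bypassed */
        default:
            return t;                       /* caller sees this one */
        }
    }
}
```

Folding the bookkeeping into the retriever is what lets the error reporter always know the current file and line without the translator or interpreter ever handling those tokens.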


"C Programming" Column Source Code


Quincy, D-Flat, and D-Flat++ are available to download from the DDJ Forum on
CompuServe and on the Internet by anonymous ftp. See page 3 for details. If
you cannot get to one of the online sources, send a diskette and a stamped,
addressed mailer to me at Dr. Dobb's Journal, 411 Borel, San Mateo, CA 94402.
I'll send you a copy of the source code. It's free, but if you want to support
my Careware charity, include a dollar for the Brevard County Food Bank.


The Draft Standard C++ Library


The Draft Standard C++ Library, by P.J. Plauger (Prentice Hall, 1995, ISBN
0-13-117003-1), follows in the tradition of the author's earlier book, The
Standard C Library, which explains and implements the Standard-C function
library as defined by ANSI X3J11. The new book takes a similar approach,
presenting what the draft C++ Standard says about each of the Standard library
header files, amplifying those terse descriptions, and providing an
implementation in source code of the classes defined by the ANSI X3J16
committee. The book explains the details of the implementation, testing, and
use of the draft Standard classes.

In the preface, Plauger states five purposes for the book:
To present the text of the library portion of the draft Standard, which it
does.
To be a model for implementers of the library, which it certainly is.
To be a tutorial on the library's use, a goal it meets, but only with respect
to the version of the library addressed.
To teach by example how to "design and implement class libraries in general,"
presumably without respect to the language.
To address the issues specific to building C++ class libraries.
These goals are delineated in the preface and addressed in each of the
following chapters--one for each of the library header files defined in the
draft Standard.
This book is based on the Standard C++ library as defined in a February
publication by X3J16 of a draft Standard for public review. This document has
been overtaken by events. For whatever reasons, members of the committee
followed that publication almost immediately with new proposals for language
and library changes that, if accepted, will significantly change the Standard.
Some of what the public saw--particularly with respect to the library--was
incomplete and obsolete shortly thereafter. Plauger's book, then, presents a
snapshot of the library as it existed for one brief moment in the history of
C++. The book's implementation may well be the only one of this momentary
version of the library ever to see the light of day. You're not left in the
dark to wonder about the future, though. Each chapter includes a section
titled "Future Directions" that describes what's changing.
An unstated purpose, but one that the book serves well, is to provide insight
into the complex language that C++ is becoming. Someone who has not
participated in Committee deliberations is likely to reel with the impact of
the changes. For better or worse, Standard C++ will be a much bigger language
than the one implemented by most contemporary compilers. Plauger, an active
participant (he is editor of the library portion of the Standard) and an old
hand at language definition and translator development, understands the
changes well and respects their consequences. He brings a mature perspective
to the implications of some of the new features and is candid about them. From
his comments in the book and from reading parts of the draft, I conclude that
some changes are probably underspecified; their proponents may have developed
the details of new features without benefit of extensive experience in their
use. 
Much of the book's implementation is offset by those changes to the Standard.
The string and stream classes from the February draft are being replaced by
template classes that take advantage of a new language feature, default
template parameters, to provide one-class support for wide-character strings
and streams. The bits<T> template and bitstring class may be replaced by the
Standard Template Library, a proposal made in May for standard template
container classes. Not that the code or the book is without use. You can
develop to this interim standard, and your work will readily port to the next
one. The changes do not affect the library's interface, only its
implementation.
Plauger could not have modified the book to meet the new draft Standard. No
one compiler is available to readers that implements all the new language
features needed to support the library changes. Publishing deadlines and
commitments could have been involved, too. His choices were to delay the
project unduly or to charge ahead with the library as accepted by the
Committee and published in the February draft. This circumstance must have
compromised his influence on the Committee process. To oppose a change for any
reason would have suggested a conflict of interests. The merit of his
arguments might have been overshadowed by the appearance of an outside agenda,
whether real or imagined. That is too bad. Plauger is one of the more
experienced members both in language standardization and in dealing with
committees. On the other hand, his book well achieves its stated goals
considering the erratically shifting target. We have the benefit of that
achievement, and Plauger can always do a second edition (and a third and a
fourth, ad infinitum) as the Committee continues, Sybil-like, in its endless
cycle of innovation.
Table 1: Quincy .c source-code files.

 IDE          Translator   Interpreter
 qnc.c        cinterp.c    stmt.c
 qdialogs.c   preproc.c    expr.c
 qmenus.c     preexpr.c    primary.c
 print.c      scanner.c    func.c
 debugger.c   ccompile.c   stack.c
 watch.c      symbol.c
 break.c      symbols.c
 qconfig.c    sys.c
 errs.c

Listing One 
/* ------------ cinterp.c ------------ */
/* QUINCY Runtime Interpreter */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dos.h>
#include <setjmp.h>
#include <sys\stat.h>
#include <alloc.h>
#include <errno.h>

#include "dflat.h"
#include "cinterp.h"
#include "debugger.h"
#include "quincy.h"

unsigned char *Progstart; /* start of user program */
unsigned char *NextProto; /* addr of next prototype */
int Saw_return; /* return encountered in user program */
int Saw_break; /* break encountered in user program */
int Saw_continue; /* continue encountered in user program */
int Linking; /* set when in linker */
unsigned char *pSrc;
VARIABLE *Blkvar; /* beginning of local block variables */
VARIABLELIST Globals; /* table of program global variables */
char *PrototypeMemory; /* table of prototypes */
jmp_buf Shelljmp;
/* -------- IDE/interpreter common global items ------------ */
int inSystem;
jmp_buf BreakJmp;
CONTEXT Ctx; /* running program's context */
ITEM *Stackbtm; /* start of program stack */
ITEM *Stacktop; /* end of program stack */
SYMBOLTABLE *SymbolTable; /* symbol table */
int SymbolCount; /* symbols in table */

VARIABLE *VariableMemory; /* table of variables */
FUNCTION *FunctionMemory; /* table of functions */
FUNCTION *NextFunction; /* next available function in table */
char *DataSpace; /* data space for autos */
unsigned Progused; /* bytes of program space used */
static int ExecuteProgram(unsigned char *source, int argc, char *argv[]);
static void qprload(unsigned char *SourceCode, char *prog);
/* ----- deallocate memory ----- */
static void ClearMemory(void **buf, void **end, int *count)
{
 free(*buf);
 *buf = NULL;
 if (end)
 *end = NULL;
 if (count)
 *count = 0;
}
/* ----- main entry to compile & interpret program ----- */
int qinterpmain(unsigned char *source, int argc, char *argv[])
{
 int rtn = -1;
 Globals.vfirst = NULL;
 Globals.vlast = NULL;
 Ctx.Curvar = NULL;
 Ctx.Curstruct.vfirst = NULL;
 Ctx.Curstruct.vlast = NULL;
 Ctx.Curfunc = NULL;
 ConstExpression = 0;
 /* Allocate memory for program runtime tokens */
 errno = 0;
 Progstart = getmem(qCfg.MaxProgram);
 Ctx.Progptr = Progstart;
 /* Allocate stack, variables, data, functions, symbols, prototypes */
 Stackbtm = getmem((qCfg.MaxStack+1) * sizeof(struct item));
 Ctx.Stackptr = Stackbtm;
 Stacktop = Stackbtm + qCfg.MaxStack;
 VariableMemory = getmem(qCfg.MaxVariables*sizeof(VARIABLE));
 Ctx.NextVar = VariableMemory;
 if ((DataSpace = malloc(qCfg.MaxDataSpace)) == NULL)
 error(OMERR);
 Ctx.NextData = DataSpace;
 FunctionMemory = getmem(qCfg.MaxFunctions * sizeof(FUNCTION));
 NextFunction = FunctionMemory;
 SymbolTable = getmem(qCfg.MaxSymbolTable * sizeof(SYMBOLTABLE));
 NextProto = PrototypeMemory = getmem(qCfg.MaxPrototype);
 *ErrorMsg = '\0';
 fflush(stdin);
 fflush(stdout);
 /* compile and interpret the program */
 rtn = ExecuteProgram(source, argc, argv);
 /* clean up after the program */
 ClearHeap();
 DeleteSymbols();
 if (ErrorCode && Ctx.CurrFileno)
 sprintf(ErrorMsg+strlen(ErrorMsg), " %s Line %d: ",
 SrcFileName(Ctx.CurrFileno), Ctx.CurrLineno);
 CleanUpPreProcessor();
 ClearMemory((void **)&pSrc, NULL, NULL);
 ClearMemory((void **)&PrototypeMemory, (void **)&NextProto, NULL);
 ClearMemory((void **)&SymbolTable, NULL, &SymbolCount);
 ClearMemory((void **)&FunctionMemory, (void **)&NextFunction, NULL);
 ClearMemory((void **)&DataSpace, (void **)&Ctx.NextData, NULL);
 ClearMemory((void **)&VariableMemory, (void **)&Ctx.NextVar, NULL);
 ClearMemory((void **)&Stackbtm, (void **)&Ctx.Stackptr, NULL);
 ClearMemory((void **)&Progstart, NULL, (int *)&Progused);
 errno = 0;
 return rtn;
}
/* -------- compile and execute the program -------- */
static int ExecuteProgram(unsigned char *source, int argc, char *argv[])
{
 unsigned char Tknbuf[80];
 unsigned char ln[40];
 WINDOW wwnd = WatchIcon();
 if (setjmp(Shelljmp) == 0) {
 /* --- preprocess and lexical scan --- */
 qprload(source, Progstart);
 /* --- compile --- */
 ccompile(&Globals);
 /* ---- execute ----- */
 sprintf(ln, "return main(%d,(char**)%lu);", argc, (unsigned long)argv);
 tokenize(Tknbuf, ln);
 Ctx.Progptr = Tknbuf;
 getoken();
 SendMessage(wwnd, CLOSE_WINDOW, 0, 0);
 wwnd = NULL;
 if (!Stepping)
 HideIDE();
 statement();
 }
 if (wwnd != NULL)
 SendMessage(wwnd, CLOSE_WINDOW, 0, 0);
 TerminateProgram();
 return ErrorCode ? ErrorCode : popint(); 
}
/* ----- preprocess and lexical scan ----- */
static void qprload(unsigned char *SourceCode, char *prog)
{
 /* tokenize program */
 pSrc = getmem(MAXTEXTLEN);
 PreProcessor(pSrc, SourceCode);
 Progused = tokenize(prog, pSrc);
 free(pSrc);
 pSrc = NULL;
 Ctx.Progptr = Progstart;
}
/* ----- compiler and runtime error function ------ */
void error(int errnum)
{
 ErrorCode = errnum;
 if (Watching)
 longjmp(Watchjmp, 1);
 else if (Running)
 longjmp(Shelljmp, 1);
}
/* ----- gets memory for the interpreter ----- */
void *getmem(unsigned size)
{

 void *ptr;
 if ((ptr = calloc(1, size)) == NULL)
 error(OMERR);
 return ptr;
}
#ifndef NDEBUG
/* ------- Quincy's version of assert ------- */
void AssertFail(char *cond, char *file, int lno)
{
 sprintf(errs[ASSERTERR-1], "Assert(%s) %s, Line %d", cond, file, lno);
 error(ASSERTERR);
}
#endif
/* ----- compile and interpret get token ------ */
int getoken()
{
 static int isStruct;
 for (;;) {
 switch (Ctx.Token = *Ctx.Progptr++) {
 case T_LINENO:
 Ctx.CurrFileno = *Ctx.Progptr++;
 Ctx.CurrLineno = *(int*)Ctx.Progptr;
 Ctx.Progptr += sizeof(int);
 break;
 case ' ':
 break;
 case T_EOF:
 Ctx.Value.ival = *Ctx.Progptr--;
 isStruct = 0;
 return Ctx.Token;
 case T_SYMBOL:
 Ctx.Value.ival = *(int*)Ctx.Progptr;
 Ctx.Curvar = SearchVariable(Ctx.Value.ival, isStruct);
 if (!isStruct && Ctx.Curvar == NULL)
 Ctx.Curvar = SearchVariable(Ctx.Value.ival, 1);
 Ctx.Progptr += sizeof(int);
 isStruct = 0;
 return Ctx.Token;
 case T_IDENTIFIER:
 isStruct = 0;
 Ctx.Curvar = MK_FP(FP_SEG(VariableMemory),
 *(unsigned*)Ctx.Progptr);
 Ctx.Progptr += sizeof(int);
 return Ctx.Token;
 case T_FUNCTION:
 Ctx.Curfunction=FindFunction(*(int*)Ctx.Progptr);
 Ctx.Progptr += sizeof(int);
 return Ctx.Token;
 case T_FUNCTREF:
 Ctx.Curfunction = FunctionMemory + *(int*)Ctx.Progptr;
 Ctx.Progptr += sizeof(int);
 return Ctx.Token;
 case T_CHRCONST:
 Ctx.Value.ival = *Ctx.Progptr++;
 return Ctx.Token;
 case T_STRCONST:
 Ctx.Value.cptr = Ctx.Progptr + 1;
 Ctx.Progptr += *Ctx.Progptr;
 return Ctx.Token;

 case T_INTCONST:
 Ctx.Value.ival = *((int *)Ctx.Progptr);
 Ctx.Progptr += sizeof(int);
 return Ctx.Token;
 case T_LNGCONST:
 Ctx.Value.lval = *((long *)Ctx.Progptr);
 Ctx.Progptr += sizeof(long);
 return Ctx.Token;
 case T_FLTCONST:
 Ctx.Value.fval = *((double *)Ctx.Progptr);
 Ctx.Progptr += sizeof(double);
 return Ctx.Token;
 case T_STRUCT:
 case T_UNION:
 isStruct = 1;
 return Ctx.Token;
 default:
 isStruct = 0;
 return Ctx.Token;
 }
 }
}




Listing Two

/* cinterp.h QUINCY Interpreter - header file */

#ifndef CINTERP_H
#define CINTERP_H

#include <setjmp.h>
#include <ctype.h>

#undef isxdigit
#undef isalnum
#undef isdigit
#undef isalpha

#include "dflat.h"
#include "errs.h"
#include "tokens.h"

#define PROGTITLE "The Quincy C Interpreter"
#define QVERSION "4.2"
/* Table size constants */
#define MAXSTACK 256 /* default program stack size */
#define MAXPR (16*1024) /* default user program space */
#define MAXVARIABLES 1024 /* max variables */
#define MAXFUNCTIONS 200 /* max functions in program */
#define DATASPACE (16*1024) /* data space for program */
#define MAXPARMS 10 /* maximum macro parameters */
#define MAXSYMBOLTABLE 1024 /* symbol table space */
#define AVGPROTOTYPES 10 /* avg prototype bytes/func */
#define MAXDIM 4 /* max dimensions for arrays */
#define MAXOPENFILES 15 /* max open FILEs */
#define MAXINCLUDES 10 /* max nested #include files */

#define MAXIFLEVELS 25 /* max nested #if...s */
/* Constants */
#define RVALUE 0 /* a constant */
#define LVALUE 1 /* a variable */
enum Type { VOID, CHAR, INT, LONG, FLOAT, STRUCT, UNION, ENUM };
#define FUNCT 1 /* a function */
#define STRUCTELEM 2 /* structure element */
#define LABEL 4 /* goto label */
#define TYPEDEF 8 /* typedef */
/* ---- storage classes ----- */
#define AUTO 1
#define REGISTER 2
#define VOLATILE 4
#define EXTERN 8
/* Variable table entry -- (one for each declared variable) */
typedef struct variable {
 int vsymbolid; /* variable identifier */
 char vclass; /* its indirection level */
 char vkind; /* kind of variable (func, struct elem, etc.) */
 int vtype; /* type, INT, CHAR, etc. */
 int vsize; /* size of variable */
 int vdims[MAXDIM]; /* lengths (if an array) */
 char vconst; /* 0=read/write, 1=variable is const,*/
 /* 2=pointer -> const, 3=both */
 char vstatic; /* 1 = static */
 char vqualifier; /* 1=auto, 2=register, 4=volatile, 8=extern */
 char islocal; /* 1=local variable, 2=argument */
 char isunsigned; /* 1 = unsigned, 0 = signed */
 char isinitialized; /* 1 = variable is initialized */
 int voffset; /* offset of data fr start buffer */
 int vwidth; /* width of data space */
 int vBlkNesting; /* block nesting level */
 struct variable *vstruct; /* for a struct var, -> definition */
 int fileno; /* file number where declared */
 int lineno; /* line number where declared */
 int enumval; /* integer value for enum constant */
 /* ----- must be same structure as VARIABLELIST below ---- */
 struct {
 struct variable *vfirst;
 struct variable *vlast;
 } velem; /* VARIABLELIST of struct elements */
 struct variable *vprev; /* backward link (1st item ->last) */
 struct variable *vnext; /* forward link */
} VARIABLE;
/* Variable list */
typedef struct {
 VARIABLE *vfirst; 
 VARIABLE *vlast;
} VARIABLELIST;
/* Function definition -- (one for each declared function) */
typedef struct {
 int symbol; /* function symbol id */
 char ismain; /* 1 = main() */
 int libcode; /* > 0 = standard library function */
 char *proto; /* function prototype */
 void *code; /* function code */
 int type; /* return type, INT, CHAR, etc. */
 char class; /* indirection level of func return */
 unsigned char fileno; /* where the function is */

 int lineno; /* line no of function header */
 char fconst; /* 0=read/write, 1=function is const*/
 /* 2=pointer -> const, 3=both */
 int width; /* width of auto variables */
 VARIABLELIST locals; /* list of local variables */
 int BlkNesting; /* block nesting level */
} FUNCTION;
/* Running function table entry 
 * (one instance for each iteration of recursive function) */
typedef struct funcrunning {
 FUNCTION *fvar; /* function variable */
 char *ldata; /* local data */
 int arglength; /* length of arguments */
 struct funcrunning *fprev; /* calling function */
 /* need this so debugger can find correct variables */
 int BlkNesting; /* block nesting level */
} FUNCRUNNING;
/* Stack data item's value */
typedef union datum {
 char cval; /* character values */
 int ival; /* integer values */
 long lval; /* long values */
 double fval; /* floating point values */
 char *cptr; /* pointers to chars */
 unsigned char *ucptr; /* pointers to unsigned chars */
 int *iptr; /* pointers to ints */
 unsigned int *uiptr; /* pointers to unsigned ints */
 long *lptr; /* pointers to longs */
 unsigned long *ulptr; /* pointers to unsigned longs */
 double *fptr; /* pointers to floats */
 FUNCTION *funcptr; /* pointers to functions */
 char **pptr; /* pointers to pointers */
} DATUM;
/* Stack item with attributes */
typedef struct item {
 char kind; /* STRUCTELEM, FUNCT, LABEL, TYPEDEF */
 char isunsigned; /* 1 = unsigned, 0 = signed */
 char class; /* pointer or array indirection level */
 char lvalue; /* 1 == LVALUE, 0 == RVALUE */
 char vconst; /* 0 = read/write, 1,2,3 = const */
 char vqualifier; /* storage class, etc. */
 int size; /* size of the thing on the stack */
 char type; /* type of the thing on the stack */
 int dims[MAXDIM]; /* array dimensions */
 VARIABLE *vstruct; /* for a struct var, -> definition */
 VARIABLELIST *elem; /* structure's element variable list */
 DATUM value; /* the value of the thing */
} ITEM;
/* ----- preprocessor tokens ----- */
enum PreProcTokens {
 P_DEFINE = 1, P_ELSE, P_ELIF, P_ENDIF, P_ERROR,
 P_IF, P_IFDEF, P_IFNDEF, P_INCLUDE, P_UNDEF
};
/* ----- program running context ----- */
typedef struct context {
 unsigned char *Progptr; /* statement pointer */
 unsigned char *svpptr; /* saved statement pointer */
 char svToken; /* saved token value */
 VARIABLE *svCurvar; /* saved variable */

 int CurrFileno; /* current source file */
 int CurrLineno; /* current source file line number */
 VARIABLE *Curvar; /* -> current variable declaration */
 FUNCTION *Curfunction; /* -> current function declaration */
 FUNCTION *Linkfunction; /* -> function being linked */
 ITEM *Stackptr; /* stack pointer */
 DATUM Value; /* value on stack */
 char Token; /* current token value */
 FUNCRUNNING *Curfunc; /* current running function */
 VARIABLE *NextVar; /* next avail stack frame variable */
 VARIABLELIST Curstruct; /* list of current struct members */
 char *NextData; /* next available data space */
 int Looping; /* set inside while or for loop */
 int Switching; /* set inside switch */
} CONTEXT;
/* -------- setjmp buffer ----------- */
typedef struct jmpbuf {
 int jmp_id;
 jmp_buf jb;
 CONTEXT jmp_ctx;
} JMPBUF;
typedef struct symbol {
 char *symbol;
 int ident;
} SYMBOLTABLE;
/* -------- shell prototypes --------- */
void *getmem(unsigned);
void error(int);
/* ------- preprocessor/linker/compiler prototypes ------ */
void PreProcessor(unsigned char*,unsigned char*);
void CleanUpPreProcessor(void);
int FindPreProcessor(char*);

VARIABLE *SearchVariable(int,int);
VARIABLE *InstallVariable(VARIABLE*,VARIABLELIST*,int,int,int,int);
FUNCTION *FindFunction(int);
void InstallFunction(FUNCTION*);
VARIABLE *DeclareVariable(VARIABLELIST*,int,int,int,int);
void Initializer(VARIABLE*,char*,int);
int VariableWidth(VARIABLE*);
void *AllocVariable(void);
void *GetDataSpace(int,int);
int isTypeDeclaration(void);
void ccompile(VARIABLELIST*);
int tokenize(char*,char*);
int istypespec(void);
int SearchLibrary(char*);
int FindKeyword(char*);
int FindOperator(char*);
int SearchSymbols(char*,struct symbol*,int,int);
int FindSymbol(char*);
char *FindSymbolName(int);
int AddSymbol(char*);
void DeleteSymbols(void);
char *SrcFileName(int);
void *DataAddress(VARIABLE*pvar);
void ClearHeap(void);
void PromptIDE(void);
int CBreak(void);

/* -------- interpreter prototypes----------- */
void stmtend(void);
void stmtbegin(void);
int ExpressionOne(void);
int expression(void);
void cond(void);
void assignment(void);
void callfunc(void);
void DeleteJmpbufs(void);
void torvalue(ITEM*);
int getoken(void);
void skip(char,char);
void statement(void);
VARIABLE *primary(void);
void sys(void);
int readonly(ITEM*sp);
char MakeType(char tok);
int TypeSize(char type);
void TestZeroReturn(void);
void OpenStdout(void);
int ArrayElements(VARIABLE*);
int ArrayDimensions(VARIABLE*);
int ItemArrayDimensions(ITEM*);
int ItemArrayElements(ITEM*);
int ElementWidth(ITEM*);
void TypeQualifier(VARIABLE*);
char MakeTypeToken(char,int*);
void TerminateProgram(void);
/* ------- stack prototypes ------- */
int popint(void);
long poplng(void);
double popflt(void);
void store(void*,int,void*,int,char);
void psh(void);
void pop(void);
void popn(int);
void push(char,char,char,char,unsigned,char,VARIABLELIST*,DATUM*,char);
void pushint(int);
void pushlng(long);
void pushptr(void*,char);
void pushflt(double);
int popnint(int);
int popint(void);
void *popptr(void);
long poplng(void);
double popflt(void);
int StackItemisNumericType(void);
void topget(ITEM*);
void topset(ITEM*);
void topdup(void);
void FixStackType(char);
/* -------------- global data definitions ------------------ */
extern unsigned char *Progstart; /* start of user program */
extern unsigned char *NextProto; /* addr of next prototype */
extern int Saw_return; /* return in user program */
extern int Saw_break; /* break in user program */
extern int Saw_continue; /* continue in user program */
extern int Looping; /* inside while or for loop */
extern int Switching; /* inside switch */

extern int Linking; /* in linker */
extern int Linklib; /* linking stdlib */
extern int ConstExpression; /* initializing globals */
extern int SkipExpression; /* skipping effect of expression */
extern FUNCTION *Functions; /* functions */
extern VARIABLE *Blkvar; /* beg of lcl block autos */
extern VARIABLELIST Globals; /* global variables */
extern VARIABLELIST Curstruct; /* current struct members */
extern char *PrototypeMemory; /* prototypes */
extern int inSystem;
extern CONTEXT Ctx; /* running program context */
extern ITEM *Stackbtm; /* start of program stack */
extern ITEM *Stacktop; /* end of program stack */
extern SYMBOLTABLE *SymbolTable; /* symbol table */
extern int SymbolCount; /* symbols in table */
extern VARIABLE *VariableMemory; /* table of auto variables */
extern FUNCTION *FunctionMemory; /* table of functions */
extern FUNCTION *NextFunction; /* next avail func in table */
extern char *DataSpace; /* data space for autos */
extern unsigned Progused; /* program space used */
/* ----------------- configuration items ------------------- */
extern struct QuincyConfig {
 unsigned int MaxProgram; /* user program space */
 unsigned int MaxStack; /* stack size */
 unsigned int MaxVariables; /* number of variables */
 unsigned int MaxFunctions; /* number of functions */
 unsigned int MaxDataSpace; /* data bytes */
 unsigned int MaxSymbolTable; /* symbol table space */
 unsigned int MaxPrototype; /* prototype table space */
 char scrollbars; /* display scrollbars */
 char inTutorial; /* start in tutorial */
 char tutorhelp[11]; /* current tutorial help wnd*/
} qCfg;
/* ----- jmp_bufs ---------- */
extern jmp_buf Shelljmp;
extern jmp_buf PreProcessjmp;
extern jmp_buf Includejmp;
extern jmp_buf BreakJmp;
/* ----- state variables ------- */
extern int Including;
extern int PreProcessing;
extern int ErrorCode;
/* ---------- system-wide macros -------- */
#define rslva(a,l) ((l)?(a):(char*)(&a))
#define rslvs(s,c) ((c)?sizeof(void *):s)
#define NullVariable(var) memset(var, 0, sizeof(VARIABLE))
#define NullFunction(fnc) memset(fnc, 0, sizeof(FUNCTION))
#define alphanum(c) (isalpha(c)||isdigit(c)||c=='_')
#define isSymbol() \
 ((Ctx.Token)==T_SYMBOL||(Ctx.Token)==T_IDENTIFIER)
#define ItemisAddressOrPointer(i) ((i).class)
#define ItemisPointer(i) \
 (ItemisAddressOrPointer(i) && (i).lvalue)
#define ItemisAddress(i) \
 (ItemisAddressOrPointer(i) && !(i).lvalue)
#define ItemisArray(i) ((i).dims[0] != 0)
#define ItemisInteger(i) \
 ((i)->type==INT||(i)->type==CHAR||(i)->type==LONG)
#define StackItemisAddressOrPointer() (Ctx.Stackptr->class)

#define StackItemisPointer() \
 (StackItemisAddressOrPointer() && Ctx.Stackptr->lvalue)
#define StackItemisAddress() \
 (StackItemisAddressOrPointer() && !Ctx.Stackptr->lvalue)
/* (this is a bad test. It returns true for char address, too) */
#define StackItemisString() \
 (StackItemisAddress() && Ctx.Stackptr->type == CHAR)
#define isTypedef(var) (((var)->vkind&TYPEDEF) != 0)
#define isArray(var) ((var)->vdims[0])
#define isPointerArray(var) \
 (((var)->vclass) > ArrayDimensions(var) && isArray(var))
#define isPointer(var) \
 ((((var)->vclass) && !isArray(var))||isPointerArray(var))
#define isAddressOrPointer(var) ((var)->vclass)
#define rslvaddr(addr, lval) (lval ? *addr : (char *)addr)
#define rslvsize(size, class) (class ? sizeof(void *) : size)
/* -------- Quincy's version of assert ------ */
#ifdef NDEBUG
#define Assert(p) ((void)0)
#else
void AssertFail(char*,char*,int);
#define Assert(p) ((p)?(void)0:AssertFail(#p,__FILE__,__LINE__))
#endif

#endif





































November, 1994
ALGORITHM ALLEY


Truly Random Numbers




Colin Plumb


Colin, a student from Toronto, was introduced to modern cryptography by the
Pretty Good Privacy package. He can be contacted at colin@nyx.cs.du.edu.


Introduction 
by Bruce Schneier
Why do we need random numbers? There are a lot of reasons. You might be
designing a communications protocol and need random timing parameters to
prevent system lockups. You might be conducting a massive Monte Carlo
simulation and need random numbers for various parameters. Or you might be
designing a computer game and need random numbers to determine the results of
different actions. 
As common as they are, random numbers can be infuriatingly hard to generate on
a computer. The very nature of a computer--a deterministic, digital Turing
machine--is contrary to the notion of randomness.
One application where random numbers are essential is in cryptography. The
security of a cryptographic system often hinges on the randomness of its keys.
In this month's "Algorithm Alley," Colin Plumb discusses the random-number
generator in the Pretty Good Privacy (PGP) e-mail security program. Colin is
one of the designers and programmers of PGP, and has spent a lot of time
thinking about this problem. His solution is elegant, efficient, effective,
and has applications well beyond e-mail security.
The ANSI C rand() function does not return random numbers. This is not a bug;
it's required by the ANSI C standard. Instead, the values returned are
determined by the seed supplied to srand(). If you run the same program with
the same seed, you get the same "random" numbers. The pattern may not be
obvious to the casual observer, but if Las Vegas ran this way, there'd be
fewer bright lights in the big city.
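This determinism is easy to verify for yourself. The following sketch (it is not part of any listing here; the function name is illustrative) checks that two runs of rand() from the same seed produce identical draws:

```c
#include <stdlib.h>

/* Returns 1 if two rand() runs from the same seed agree on n draws. */
static int same_sequence(unsigned seed, int n)
{
    int i, a[64], ok = 1;

    if (n > 64)
        n = 64;
    srand(seed);                  /* first run */
    for (i = 0; i < n; i++)
        a[i] = rand();
    srand(seed);                  /* reseed with the identical value */
    for (i = 0; i < n; i++)
        ok &= (a[i] == rand());   /* every draw must match */
    return ok;
}
```

On any conforming implementation, same_sequence() returns 1 for every seed: the "random" stream is a pure function of the seed.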
John von Neumann once said that "anyone who considers arithmetical methods of
producing random digits is, of course, in a state of sin." Sometimes you want
truly random numbers. Any number of security and cryptographic applications
require them. When it comes to checking for viruses, for example, CRCs are
convenient and fast, but you can easily fake out a known polynomial. The
Strongbox secure loader from Carnegie Mellon University, however, uses a
random polynomial to achieve security while keeping the speed advantages of a
CRC (see Camelot and Avalon: A Distributed Transaction Facility, edited by
Jeffrey L. Eppinger).
There are ways to produce random bits in hardware; sampling a quantum effect
such as radioactive decay, for instance. However, this is hard to calibrate
and involves special hardware.
On the other hand, software in a properly working computer is
deterministic--the antithesis of random. Still, a computer generally has to
interact and receive input from real-world events, so it is possible to make
use of a very unpredictable part of most computer systems: the person typing
at the keyboard.
Although keystrokes are somewhat random, compression utilities illustrate just
how predictable most text is. While it would be foolish to ignore this
entropy, anticipating what someone types is akin to password guessing:
difficult, but if you have the computational horsepower, conceivable.
A more fruitful source is timing. Many computers can time events down to the
microsecond. And while typing patterns on familiar words or phrases are
repeatable enough to be used for identification, there is still a large window
of available noise. Our basic source for entropy comes from sampling system
timers on every keystroke.
The problem that remains is to turn these timer values, which have a
nonuniform distribution, into uniformly distributed random bits. This is where
the software comes in.


Theory of Operation


The file randpool.c (see Listing Four) uses cryptographic techniques to
"distill" the essential randomness from an arbitrary amount of sort-of-random
seed material. As the file name suggests, the program maintains a pool of
hopefully random bits, into which additional information is "stirred." The
goal here is that if you have n bits of entropy in the pool ("Shannon
information," if you're familiar with information theory), any n bits of the
output are truly random.
The stirring operation (actually nothing more than an encryption pass) is
central. If you know the key and initial vector, you can reverse it to get
back the initial state. Since the encryption is reversible, stirring the pool
obviously does not lose information. So all the information in the initial
state is there, it's just masked by the encryption.
Since we don't need to reverse the encryption, the key is then destroyed. The
information is then reinitialized with data taken from the pool that was just
stirred. This makes it essentially impossible to determine the previous state
of the random-number pool from what is left in memory.
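As a toy illustration only (this is not PGP's randpool.c, and its mixing pass is not cryptographically strong), the stir-then-rekey idea can be sketched like this: a reversible keyed pass masks the pool without losing information, the old key is erased, and a fresh key is drawn from the freshly stirred pool.

```c
#include <string.h>

#define POOLSIZE 64
#define KEYSIZE  16

static unsigned char pool[POOLSIZE];
static unsigned char key[KEYSIZE];

static void stir(void)
{
    unsigned i;
    unsigned char carry = 0;

    /* Reversible keyed pass: nothing in the pool is lost, only masked. */
    for (i = 0; i < POOLSIZE; i++) {
        pool[i] ^= (unsigned char)(key[i % KEYSIZE] + carry);
        carry = pool[i];
    }
    /* Destroy the old key, then rekey from the stirred pool. */
    memset(key, 0, KEYSIZE);
    memcpy(key, pool, KEYSIZE);
}

/* XOR seed material into the key buffer; stir when the buffer fills. */
static void add_byte(unsigned char b, unsigned *addpos)
{
    key[*addpos] ^= b;
    if (++*addpos == KEYSIZE) {
        stir();
        *addpos = 0;
    }
}
```

A real implementation replaces the trivial mix with a cipher pass (PGP uses Gutmann's MD5-based cipher, below); the control flow is the point of the sketch.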
The cipher is Peter Gutmann's Message Digest Cipher using MD5 as a base. This
is fast and simple (as strong ciphers go), especially on 32-bit machines. In
this application, the large key size also helps efficiency. (For another
application of this cipher, see Gutmann's shareware MS-DOS disk encryptor,
"SFS." Every commercial MS-DOS disk encryptor I've seen--Norton Diskreet, for
example--has appalling cryptography. Their only advantage is that you can get
the data back with a few weeks' work if you lose the key. If you lose the key
with SFS, it's lost.) 
The output of the generator is taken from the pool, starting after the 64
bytes used for the next stirring key. If you reach the end of the pool, stir
again and restart. After that, it is theoretically possible to examine the
output and determine the key, which would reveal the complete state of the
generator and let you predict its output forever. That, however, would require
breaking the cipher by deriving the key from the data before and after
encryption, an adequate guarantee of security.
Input is more interesting. To ensure that each bit of seed material affects
the entire pool, the seed material is added (using XOR) to the key buffer.
When you reach the end of the key buffer, stir the pool and start over. The
difficulty of cryptanalysis (deriving the key from the prior and following
states of the pool) ensures that regularities in the seed material do not
produce regularities in the pool.
Adding bytes to the key sets the take position to the end of the pool, so the
newly added data will be stirred in before any bytes are returned. Thus, you
can add and remove bytes in any order.
The code mostly works with bytes, but since MD5 works with 32-bit words, a
standard byte ordering is used. This way you can use it as a pseudorandom
number generator seeded with a passphrase.
If you want to use the hash directly, md5.c (see Listing Two) includes a full
implementation of the MD5 algorithm. (It is similar to the hash presented in
"SHA: The Secure Hash Algorithm," by William Stallings, DDJ, April 1994.) If
you have a large amount of low-grade seed material, you can use MD5 to
pre-reduce it. For example, you can feed mouse-position reports into MD5, then
periodically add the resultant 16-byte digest to the pool. Even faster
algorithms are possible--based on CRCs and scrambler polynomials--if you have
real-time constraints.


Practice of Operation


The file noise.c (see Listing Three) samples a variety of system timers and
adds them to the random-number pool. It also returns the number of
highest-resolution ticks since the previous call, which you can use to
estimate the entropy of this sample. On an IBM PC, only 16 bits are returned;
this underestimates the result if the calls are more than 1/18.2 seconds
apart, but that is not a security problem.
The code also works under UNIX. You may have to find the frequency of a timer
that only returns ticks; noiseTickSize() finds the resolution of a timer (the
gettimeofday() function) that only returns seconds.
The main driver is in randtest.c (see Listing One). A flash effect is provided
by funnyprint(). Of more value is randRange(), which illustrates a way to
generate uniformly distributed random numbers in a range not provided by the
generator. The problem is akin to generating numbers from 1 to 5 using a
six-sided die. The solution amounts to rerolling if you get a 6.
The most interesting part is randAccum(), which accumulates a specified
amount of entropy from the keyboard. It uses the number of ticks returned by
the noise() function to estimate the entropy. It assumes that inter-keystroke
times vary pretty uniformly over a range of 15 percent or so. Thus, it divides
the tick count by 6 to get the fraction of the interval that is random, then
takes the logarithm to get the number of bits of entropy.
The integer number of bits comes from normalizing the number and counting the
shifts. The entropy is kept to four fractional bits using a few iterations of
an integer-logarithm algorithm.
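The integer-logarithm step can be sketched in isolation (illustrative only; the listing subtracts this quantity from a remaining-bits budget rather than returning it). Normalizing gives the integer part of log2; squaring the mantissa doubles its log, so comparing each square against 2^31 yields one fractional bit per iteration:

```c
#define FRACBITS 4  /* count in 1/16-bit increments, as in the listing */

/* Returns floor(16 * log2(delta)) for 0 < delta < 2^32. */
static int log2_frac(unsigned long delta)
{
    int bits = 31 << FRACBITS;
    int c;

    /* Integer part: shift until bit 31 is set, counting the shifts. */
    while (delta < 0x80000000ul) {
        bits -= 1 << FRACBITS;
        delta <<= 1;
    }
    /* Fractional part: squaring doubles the log; compare against 2^31. */
    for (c = 1 << (FRACBITS - 1); c; c >>= 1) {
        delta >>= 16;             /* keep the square inside 32 bits */
        delta *= delta;
        if (delta >= 0x80000000ul)
            bits += c;            /* this fractional bit of the log is set */
        else
            delta <<= 1;          /* renormalize */
    }
    return bits;
}
```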


Weaknesses


I don't know of any exploitable holes in this approach to generating random
numbers, but in cryptography, only a fool is sure he has a good algorithm. I
believe the following points need further examination: 
The divide-by-six approximation in randAccum(). This was chosen so a machine
with only a 60-Hz clock would produce at least one bit per keystroke; not a
very good reason. A much better technique is suggested by Ueli Maurer's paper
from Crypto '90, "A Universal Statistical Test For Random Bit Generators."
However, this technique is slow to decide that the input is trustworthy and
requires large tables.

The "leakage" rate of information from the pool. Because the stirring key is
drawn from the pool itself, collisions are possible. These are states of the
pool which, after stirring, result in the same output state. This reduces the
information content of the pool.
The use of MD5 as a cipher. If you are using this as a cryptographic PRNG and
producing large amounts of output from a smaller seed, the cipher at the heart
of the stirring may be broken. The amount of known plaintext available from
any given stirring is quite low (a few hundred bytes), the key space is
dauntingly large (512 bits), and no such attacks on MD5 have appeared in the
civilian literature; however, MD5 was not designed for use as a cipher and has
had less study in this mode.


References


Eppinger, Jeffrey L., Lily B. Mummert, and Alfred Z. Spector, eds. Camelot and
Avalon: A Distributed Transaction Facility. San Mateo, CA: Morgan Kaufmann,
1991.
Maurer, Ueli M. "A Universal Statistical Test for Random Bit Generators," in
Advances in Cryptology--Crypto '90. Berlin: Springer-Verlag, 1991.
Davis, Donald T., Ross Ihaka, and Philip Fenstermacher. "Cryptographic
Randomness From Air Turbulence in Disk Drives," in Advances in
Cryptology--Crypto '94. Berlin: Springer-Verlag, 1994.
Knuth, Donald E. The Art of Computer Programming, Volume 2: Seminumerical
Algorithms. Reading, MA: Addison-Wesley, 1981.

Listing One 
/* usuals.h -- Useful typedefs */
#ifndef USUALS_H
#define USUALS_H
#include <limits.h>
#if UCHAR_MAX == 0xFF
typedef unsigned char byte; /* 8-bit byte */
#else
#error No 8-bit type found
#endif
#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int word32; /* 32-bit word */
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long word32;
#else
#error No 32-bit type found
#endif
#endif /* USUALS_H */

/* randtest.c -- Test application for the random-number routines */
#include <stdio.h>
#include <string.h>
#include <stdlib.h> /* For rand(), srand() and RAND_MAX */
#include "randpool.h"
#include "noise.h"

/* This function returns pseudo-random numbers uniformly from 0..range-1. */
static unsigned
randRange(unsigned range)
{
 unsigned result, div = ((unsigned)RAND_MAX+1)/range;
 while ((result = rand()/div) >= range)
 /* retry */ ;
 return result;
}
/* Cute Wargames-like random effect thrown in for fun */
static void
funnyprint(char const *string)
{
 static const char alphabet[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890";
 char c, flag[80] = {0}; /* 80 is maximum line length */
 unsigned i, tumbling = 0, len = strlen(string);
 /* We don't need good random numbers, so just use a good seed */
 randPoolGetBytes((byte *)&i, sizeof(i));
 srand(i); /* Sometimes simple PRNGs are useful! */
 /* Truncate longer strings (unless you have a better idea) */

 if (len > sizeof(flag))
 len = sizeof(flag);
 /* Count letters that we can tumble (letters in the alphabet) */
 for (i = 0; i < len; i++) {
 if (strchr(alphabet, string[i])) {
 flag[i] = 1; /* Increase this for more tumbling */
 tumbling++;
 }
 }
 /* Print until all characters are stable. */
 do {
 putchar('\r');
 for (i = 0; i < len; i++) {
 if (flag[i]) {
 c = alphabet[randRange(sizeof(alphabet)-1)];
 if (c == string[i] && --flag[i] == 0)
 tumbling--;
 } else {
 c = string[i];
 }
 putchar(c);
 }
 fflush(stdout);
 } while (tumbling);
 putchar('\n');
}
#include <conio.h> /* For getch() */
#define FRACBITS 4 /* We count in 1/16 of a bit increments. */
/* Gather entropy from keyboard timing. This is currently MS-DOS specific. */
static void
randAccum(int bits)
{
 word32 delta;
 int c, oldc = 0, olderc = 0;
 if (bits > RANDPOOLBITS)
 bits = RANDPOOLBITS;
 bits <<= FRACBITS;
 puts("We are generating some truly random bits by timing your\n"
 "keystrokes. Please type until the counter reaches 0.\n");
 while (bits > 0) {
 printf("\r%4d ", (bits-1 >> FRACBITS) + 1);
 c = getch();
 delta = noise()/6; /* Add time of keystroke */
 if (c == 0)
 c = 0x100 + getch(); /* Handle function keys */
 randPoolAddBytes((byte const *)&c, sizeof(c));
 /* Normal typing has double letters, but discard triples */
 if (c == oldc && c == olderc)
 continue;
 olderc = oldc;
 oldc = c;
 if (delta) { /* Subtract log2(delta) from bits */
 /* Integer bits first, normalizing */
 bits -= 31<<FRACBITS;
 while (delta < 1ul<<31) {
 bits += 1<<FRACBITS;
 delta <<= 1;
 }
 /* Fractional bits, using integer log algorithm */

 for (c = 1 << FRACBITS-1; c; c >>= 1) {
 delta >>= 16;
 delta *= delta;
 if (delta >= 1ul<<31)
 bits -= c;
 else
 delta <<= 1;
 }
 }
 }
 puts("\r 0 Thank you, that's enough.");
}
/* When invoked with the argument "foo", this should start with:
 * Adding "foo\0" to pool. Pseudo-random bytes:
 * 4c 9d 41 ba 44 41 63 a1 db 1c ab 3f 52 a1 a2 84 c3 e5 dc bc 57 4c d9 f3 38
 * d7 45 50 f9 94 36 96 a3 df 90 ff 23 e5 ec 3c 76 1f ce 1c bc d6 79 8b 5e e7
 * aa 97 16 c0 50 c6 95 0b c1 62 42 e5 5b 8f d7 bd d7 70 1f c6 60 6a 5f f3 74
 * 8d 35 ad 51 5a 4a 0c 02 cd d5 36 7e d4 c2 d9 f0 d3 49 ed 2d fa 4e 2b 70 3f
 */
int
main(int argc, char **argv)
{
 int i;
 while (--argc) {
 printf("Adding \"%s\\0\" to the pool.\n", *++argv);
 randPoolAddBytes((byte const *)*argv, strlen(*argv)+1);
 }
 puts("\nPseudo-random bytes:");
 i = 100;
 while (i--)
 printf("%02x%c", randPoolGetByte(), i % 25 ? ' ' : '\n');
 putchar('\n');
 funnyprint("This will be deterministic on a given system.");
 putchar('\n');
 noise(); /* Establish a baseline for the deltas */
 randAccum(800); /* 800 random bits = 100 random bytes */
 puts("\nTruly random bytes:");
 i = 100;
 while (i--)
 printf("%02x%c", randPoolGetByte(), i % 25 ? ' ' : '\n');
 putchar('\n');
 funnyprint("This will be unpredictable.");
 return 0;
}



Listing Two

/* md5.h -- declarations for md5.c */
#ifndef MD5_H
#define MD5_H
#include "usuals.h"
struct MD5Context {
 word32 hash[4];
 word32 bytes[2];
 word32 input[16];
};
void byteSwap(word32 *buf, unsigned words);

void MD5Init(struct MD5Context *context);
void MD5Update(struct MD5Context *context, byte const *buf, unsigned len);
void MD5Final(byte digest[16], struct MD5Context *context);
void MD5Transform(word32 hash[4], word32 const input[16]);
#endif /* !MD5_H */

/* md5.c -- An implementation of Ron Rivest's MD5 message-digest algorithm.
 * Written by Colin Plumb in 1993, no copyright is claimed. This code is in the
 * public domain; do with it what you wish. Equivalent code is available from
 * RSA Data Security, Inc. This code does not oblige you to include legal
 * boilerplate in the documentation. To compute the message digest of a string
 * of bytes, declare an MD5Context structure, pass it to MD5Init, call
 * MD5Update as needed on buffers full of bytes, and then call MD5Final, which
 * will fill a supplied 16-byte array with the digest.
 */
#include <string.h> /* for memcpy() */
#include "md5.h"

/* Byte-swap an array of words to little-endian. (Byte-sex independent) */
void
byteSwap(word32 *buf, unsigned words)
{
 byte *p = (byte *)buf;
 do {
 *buf++ = (word32)((unsigned)p[3]<<8 | p[2]) << 16 |
 ((unsigned)p[1]<<8 | p[0]);
 p += 4;
 } while (--words);
}
/* Start MD5 accumulation. */
void
MD5Init(struct MD5Context *ctx)
{
 ctx->hash[0] = 0x67452301; ctx->hash[1] = 0xefcdab89;
 ctx->hash[2] = 0x98badcfe; ctx->hash[3] = 0x10325476;
 ctx->bytes[1] = ctx->bytes[0] = 0;
}
/* Update ctx to reflect the addition of another buffer full of bytes. */
void
MD5Update(struct MD5Context *ctx, byte const *buf, unsigned len)
{
 word32 t = ctx->bytes[0];
 if ((ctx->bytes[0] = t + len) < t) /* Update 64-bit byte count */
 ctx->bytes[1]++; /* Carry from low to high */
 t = 64 - (t & 0x3f); /* Bytes available in ctx->input (>= 1) */
 if (t > len) {
 memcpy((byte *)ctx->input+64-t, buf, len);
 return;
 }
 /* First chunk is an odd size */
 memcpy((byte *)ctx->input+64-t, buf, t);
 byteSwap(ctx->input, 16);
 MD5Transform(ctx->hash, ctx->input);
 buf += t;
 len -= t;
 /* Process data in 64-byte chunks */
 while (len >= 64) {
 memcpy(ctx->input, buf, 64);
 byteSwap(ctx->input, 16);

 MD5Transform(ctx->hash, ctx->input);
 buf += 64;
 len -= 64;
 }
 /* Buffer any remaining bytes of data */
 memcpy(ctx->input, buf, len);
}
/* Final wrapup - pad to 64-byte boundary with the bit pattern
 * 1 0* (64-bit count of bits processed, LSB-first) */
void
MD5Final(byte digest[16], struct MD5Context *ctx)
{
 int count = ctx->bytes[0] & 0x3F; /* Bytes mod 64 */
 byte *p = (byte *)ctx->input + count;
 /* Set the first byte of padding to 0x80. There is always room. */
 *p++ = 0x80;
 /* Bytes of zero padding needed to make 56 bytes (-8..55) */
 count = 56 - 1 - count;
 if (count < 0) { /* Padding forces an extra block */
 memset(p, 0, count+8);
 byteSwap(ctx->input, 16);
 MD5Transform(ctx->hash, ctx->input);
 p = (byte *)ctx->input;
 count = 56;
 }
 memset(p, 0, count);
 byteSwap(ctx->input, 14);
 /* Append 8 bytes of length in *bits* and transform */
 ctx->input[14] = ctx->bytes[0] << 3;
 ctx->input[15] = ctx->bytes[1] << 3 | ctx->bytes[0] >> 29;
 MD5Transform(ctx->hash, ctx->input);
 byteSwap(ctx->hash, 4);
 memcpy(digest, ctx->hash, 16);
 memset(ctx, 0, sizeof(*ctx)); /* In case it's sensitive */
}
/* The four core functions */
#define F1(x, y, z) (z ^ (x & (y ^ z)))
#define F2(x, y, z) F1(z, x, y)
#define F3(x, y, z) (x ^ y ^ z)
#define F4(x, y, z) (y ^ (x | ~z))
/* This is the central step in the MD5 algorithm. */
#define MD5STEP(f,w,x,y,z,in,s) (w += f(x,y,z)+in, w = (w<<s | w>>(32-s)) + x)
/* The heart of the MD5 algorithm. */
void
MD5Transform(word32 hash[4], word32 const input[16])
{
 register word32 a = hash[0], b = hash[1], c = hash[2], d = hash[3];

 MD5STEP(F1, a, b, c, d, input[ 0]+0xd76aa478, 7);
 MD5STEP(F1, d, a, b, c, input[ 1]+0xe8c7b756, 12);
 MD5STEP(F1, c, d, a, b, input[ 2]+0x242070db, 17);
 MD5STEP(F1, b, c, d, a, input[ 3]+0xc1bdceee, 22);
 MD5STEP(F1, a, b, c, d, input[ 4]+0xf57c0faf, 7);
 MD5STEP(F1, d, a, b, c, input[ 5]+0x4787c62a, 12);
 MD5STEP(F1, c, d, a, b, input[ 6]+0xa8304613, 17);
 MD5STEP(F1, b, c, d, a, input[ 7]+0xfd469501, 22);
 MD5STEP(F1, a, b, c, d, input[ 8]+0x698098d8, 7);
 MD5STEP(F1, d, a, b, c, input[ 9]+0x8b44f7af, 12);
 MD5STEP(F1, c, d, a, b, input[10]+0xffff5bb1, 17);

 MD5STEP(F1, b, c, d, a, input[11]+0x895cd7be, 22);
 MD5STEP(F1, a, b, c, d, input[12]+0x6b901122, 7);
 MD5STEP(F1, d, a, b, c, input[13]+0xfd987193, 12);
 MD5STEP(F1, c, d, a, b, input[14]+0xa679438e, 17);
 MD5STEP(F1, b, c, d, a, input[15]+0x49b40821, 22);

 MD5STEP(F2, a, b, c, d, input[ 1]+0xf61e2562, 5);
 MD5STEP(F2, d, a, b, c, input[ 6]+0xc040b340, 9);
 MD5STEP(F2, c, d, a, b, input[11]+0x265e5a51, 14);
 MD5STEP(F2, b, c, d, a, input[ 0]+0xe9b6c7aa, 20);
 MD5STEP(F2, a, b, c, d, input[ 5]+0xd62f105d, 5);
 MD5STEP(F2, d, a, b, c, input[10]+0x02441453, 9);
 MD5STEP(F2, c, d, a, b, input[15]+0xd8a1e681, 14);
 MD5STEP(F2, b, c, d, a, input[ 4]+0xe7d3fbc8, 20);
 MD5STEP(F2, a, b, c, d, input[ 9]+0x21e1cde6, 5);
 MD5STEP(F2, d, a, b, c, input[14]+0xc33707d6, 9);
 MD5STEP(F2, c, d, a, b, input[ 3]+0xf4d50d87, 14);
 MD5STEP(F2, b, c, d, a, input[ 8]+0x455a14ed, 20);
 MD5STEP(F2, a, b, c, d, input[13]+0xa9e3e905, 5);
 MD5STEP(F2, d, a, b, c, input[ 2]+0xfcefa3f8, 9);
 MD5STEP(F2, c, d, a, b, input[ 7]+0x676f02d9, 14);
 MD5STEP(F2, b, c, d, a, input[12]+0x8d2a4c8a, 20);

 MD5STEP(F3, a, b, c, d, input[ 5]+0xfffa3942, 4);
 MD5STEP(F3, d, a, b, c, input[ 8]+0x8771f681, 11);
 MD5STEP(F3, c, d, a, b, input[11]+0x6d9d6122, 16);
 MD5STEP(F3, b, c, d, a, input[14]+0xfde5380c, 23);
 MD5STEP(F3, a, b, c, d, input[ 1]+0xa4beea44, 4);
 MD5STEP(F3, d, a, b, c, input[ 4]+0x4bdecfa9, 11);
 MD5STEP(F3, c, d, a, b, input[ 7]+0xf6bb4b60, 16);
 MD5STEP(F3, b, c, d, a, input[10]+0xbebfbc70, 23);
 MD5STEP(F3, a, b, c, d, input[13]+0x289b7ec6, 4);
 MD5STEP(F3, d, a, b, c, input[ 0]+0xeaa127fa, 11);
 MD5STEP(F3, c, d, a, b, input[ 3]+0xd4ef3085, 16);
 MD5STEP(F3, b, c, d, a, input[ 6]+0x04881d05, 23);
 MD5STEP(F3, a, b, c, d, input[ 9]+0xd9d4d039, 4);
 MD5STEP(F3, d, a, b, c, input[12]+0xe6db99e5, 11);
 MD5STEP(F3, c, d, a, b, input[15]+0x1fa27cf8, 16);
 MD5STEP(F3, b, c, d, a, input[ 2]+0xc4ac5665, 23);

 MD5STEP(F4, a, b, c, d, input[ 0]+0xf4292244, 6);
 MD5STEP(F4, d, a, b, c, input[ 7]+0x432aff97, 10);
 MD5STEP(F4, c, d, a, b, input[14]+0xab9423a7, 15);
 MD5STEP(F4, b, c, d, a, input[ 5]+0xfc93a039, 21);
 MD5STEP(F4, a, b, c, d, input[12]+0x655b59c3, 6);
 MD5STEP(F4, d, a, b, c, input[ 3]+0x8f0ccc92, 10);
 MD5STEP(F4, c, d, a, b, input[10]+0xffeff47d, 15);
 MD5STEP(F4, b, c, d, a, input[ 1]+0x85845dd1, 21);
 MD5STEP(F4, a, b, c, d, input[ 8]+0x6fa87e4f, 6);
 MD5STEP(F4, d, a, b, c, input[15]+0xfe2ce6e0, 10);
 MD5STEP(F4, c, d, a, b, input[ 6]+0xa3014314, 15);
 MD5STEP(F4, b, c, d, a, input[13]+0x4e0811a1, 21);
 MD5STEP(F4, a, b, c, d, input[ 4]+0xf7537e82, 6);
 MD5STEP(F4, d, a, b, c, input[11]+0xbd3af235, 10);
 MD5STEP(F4, c, d, a, b, input[ 2]+0x2ad7d2bb, 15);
 MD5STEP(F4, b, c, d, a, input[ 9]+0xeb86d391, 21);

 hash[0] += a; hash[1] += b; hash[2] += c; hash[3] += d;
}




Listing Three

/* noise.h -- get environmental noise for RNG */
#include "usuals.h"
word32 noise(void);

/* noise.c -- Get environmental noise.
 * This is adapted from code in the Pretty Good Privacy (PGP) package.
 * Written by Colin Plumb. */
#include <time.h>
#include "usuals.h"
#include "randpool.h"
#include "noise.h"

#if defined(MSDOS) || defined(__MSDOS__) /* Use 1.19 MHz PC timer */
#include <dos.h> /* for enable() and disable() */
#include <conio.h> /* for inp() and outp() */

/* This code gets as much information as possible out of 8253/8254 timer 0,
 * which ticks every .84 microseconds. There are three cases:
 * 1) Original 8253. 15 bits available, as the low bit is unused.
 * 2) 8254, in mode 3. The 16th bit is available from the status register.
 * 3) 8254, in mode 2. All 16 bits of the counters are available.
 * (This is not documented anywhere, but I've seen it!)
 * This code repeatedly tries to latch the status (ignored by an 8253) and
 * sees if it looks like xx1101x0. If not, it's definitely not an 8254.
 * Repeat this a few times to make sure it is an 8254. */
static int
has8254(void)
{
 int i, s1, s2;
 for (i = 0; i < 5; i++) {
 disable();
 outp(0x43, 0xe2); /* Latch status for timer 0 */
 s1 = inp(0x40); /* If 8253, read timer low byte */
 outp(0x43, 0xe2); /* Latch status for timer 0 */
 s2 = inp(0x40); /* If 8253, read timer high byte */
 enable();
 if ((s1 & 0x3d) != 0x34 || (s2 & 0x3d) != 0x34)
 return 0; /* Ignoring status latch; 8253 */
 }
 return 1; /* Status reads as expected; 8254 */
}
static unsigned
read8254(void)
{
 unsigned status, count;
 disable();
 outp(0x43, 0xc2); /* Latch status and count for timer 0 */
 status = inp(0x40);
 count = inp(0x40);
 count = inp(0x40) << 8;
 enable();
 /* The timer is usually in mode 3, but some BIOSes use mode 2. */
 if (status & 2)
 count = count>>1 | (status & 0x80)<<8;

 return count;
}
static unsigned
read8253(void)
{
 unsigned count;
 disable();
 outp(0x43, 0x00); /* Latch count for timer 0 */
 count = (inp(0x40) & 0xff);
 count = (inp(0x40) & 0xff) << 8;
 enable();
 return count >> 1;
}
#endif /* MSDOS || __MSDOS__ */

#ifdef UNIX
#include <sys/types.h>
#include <sys/time.h> /* For gettimeofday() */
#include <sys/times.h> /* for times() */
#include <stdlib.h> /* For qsort() */
#define N 15 /* Number of deltas to try (at least 5, preferably odd) */
/* Function needed for qsort() */
static int
noiseCompare(void const *p1, void const *p2)
{ return *(int const *)p1 - *(int const *)p2; }
/* Find the resolution of the gettimeofday() clock */
static unsigned
noiseTickSize(void)
{
 int i = 0, j = 0, d[N];
 struct timeval tv0, tv1, tv2;
 gettimeofday(&tv0, (struct timezone *)0);
 tv1 = tv0;
 do {
 gettimeofday(&tv2, (struct timezone *)0);
 if (tv2.tv_usec > tv1.tv_usec+2) {
 d[i++] = tv2.tv_usec - tv0.tv_usec +
 1000000 * (tv2.tv_sec - tv0.tv_sec);
 tv0 = tv2;
 j = 0;
 } else if (++j > 10000) /* Always getting <= 2 us, */
 return 2; /* so assume 2us ticks */
 tv1 = tv2;
 } while (i < N);
 /* Return average of middle 5 values (rounding up) */
 qsort(d, N, sizeof(d[0]), noiseCompare);
 return (d[N/2-2]+d[N/2-1]+d[N/2]+d[N/2+1]+d[N/2+2]+4)/5;
}
#endif /* UNIX */
/* Add as much time-dependent random noise to the randPool as possible. */
word32
noise(void)
{
 static word32 lastcounter;
 word32 delta;
 time_t tnow;
 clock_t cnow;
#if defined(MSDOS) || defined(__MSDOS__)
 static unsigned deltamask = 0;

 unsigned t;
 if (deltamask == 0)
 deltamask = has8254() ? 0xffff : 0x7fff;
 t = (deltamask & 0x8000) ? read8254() : read8253();
 randPoolAddBytes((byte const *)&t, sizeof(t));
 delta = deltamask & (t - (unsigned)lastcounter);
 lastcounter = t;
#elif defined(VMS)
 word32 t[2];
 SYS$GETTIM(t); /* VMS hardware clock increments by 100000 per tick */
 randPoolAddBytes((byte const *)t, sizeof(t));
 delta = (t[0]-lastcounter)/100000;
 lastcounter = t[0];
#elif defined(UNIX)
 static unsigned ticksize = 0;
 struct timeval tv;
 struct tms tms;
 gettimeofday(&tv, (struct timezone *)0);
 randPoolAddBytes((byte const *)&tv, sizeof(tv));
 cnow = times(&tms);
 randPoolAddBytes((byte const *)&tms, sizeof(tms));
 randPoolAddBytes((byte const *)&cnow, sizeof(cnow));
 tv.tv_usec += tv.tv_sec * 1000000; /* Unsigned, so wrapping is okay */
 if (!ticksize)
 ticksize = noiseTickSize();
 delta = (tv.tv_usec-lastcounter)/ticksize;
 lastcounter = tv.tv_usec;
#else
#error Unknown operating system
#endif
 cnow = clock();
 randPoolAddBytes((byte const *)&cnow, sizeof(cnow));
 tnow = time((time_t *)0); /* Read slowest clock last */
 randPoolAddBytes((byte const *)&tnow, sizeof(tnow));
 return delta;
}



Listing Four

/* randpool.h -- declarations for randpool.c */
#include "usuals.h"
#define RANDPOOLBITS 3072 /* Whatever size you need (must be > 512) */
void randPoolStir(void);
void randPoolAddBytes(byte const *buf, unsigned len);
void randPoolGetBytes(byte *buf, unsigned len);
byte randPoolGetByte(void);


/* randpool.c -- True random number computation and storage
 * This is adapted from code in the Pretty Good Privacy (PGP) package.
 * Written by Colin Plumb. */
#include <stdlib.h>
#include <string.h>
#include "randpool.h"
#include "md5.h"

#define RANDKEYWORDS 16 /* This is a parameter of the MD5 algorithm */

/* The pool must be a multiple of the 16-byte (128-bit) MD5 output size */
#define RANDPOOLWORDS (((RANDPOOLBITS+127) & ~127) >> 5)
#if RANDPOOLWORDS <= RANDKEYWORDS
#error Random pool too small - please increase RANDPOOLBITS in randpool.h
#endif
/* Must be word-aligned, so make it words. Cast to bytes as needed. */
static word32 randPool[RANDPOOLWORDS]; /* Random pool */
static word32 randKey[RANDKEYWORDS]; /* Next stirring key */
static unsigned randPoolGetPos = sizeof(randPool); /* Position to get from */
static unsigned randKeyAddPos = 0; /* Position to add to */
/* "Stir in" any random seed material before removing any random bytes. */
void
randPoolStir(void)
{
 int i;
 word32 iv[4];
 byteSwap(randPool, RANDPOOLWORDS); /* convert to word32s */
 byteSwap(randKey, RANDKEYWORDS);
 /* Start IV from last block of randPool */
 memcpy(iv, randPool+RANDPOOLWORDS-4, sizeof(iv));
 /* CFB pass */
 for (i = 0; i < RANDPOOLWORDS; i += 4) {
 MD5Transform(iv, randKey);
 iv[0] = randPool[i ] ^= iv[0];
 iv[1] = randPool[i+1] ^= iv[1];
 iv[2] = randPool[i+2] ^= iv[2];
 iv[3] = randPool[i+3] ^= iv[3];
 }
 memset(iv, 0, sizeof(iv)); /* Wipe IV from memory */
 byteSwap(randPool, RANDPOOLWORDS); /* Convert back to bytes */
 memcpy(randKey, randPool, sizeof(randKey)); /* Get new key */
 randKeyAddPos = 0; /* Set up pointers for future use. */
 randPoolGetPos = sizeof(randKey);
}
/* Make a deposit of information (entropy) into the pool. */
void
randPoolAddBytes(byte const *buf, unsigned len)
{
 byte *p = (byte *)randKey+randKeyAddPos;
 unsigned t = sizeof(randKey) - randKeyAddPos;
 while (len > t) {
 len -= t;
 while (t--)
 *p++ ^= *buf++;
 randPoolStir(); /* sets randKeyAddPos to 0 */
 p = (byte *)randKey;
 t = sizeof(randKey);
 }
 if (len) {
 randKeyAddPos += len;
 do
 *p++ ^= *buf++;
 while (--len);
 randPoolGetPos = sizeof(randPool); /* Force stir on get */
 }
}
/* Withdraw some bits from the pool. */
void
randPoolGetBytes(byte *buf, unsigned len)

{
 unsigned t;
 while (len > (t = sizeof(randPool) - randPoolGetPos)) {
 memcpy(buf, (byte const *)randPool+randPoolGetPos, t);
 buf += t;
 len -= t;
 randPoolStir();
 }
 memcpy(buf, (byte const *)randPool+randPoolGetPos, len);
 randPoolGetPos += len;
}
/* Get a single byte */
byte
randPoolGetByte(void)
{
 if (randPoolGetPos == sizeof(randPool))
 randPoolStir();
 return ((byte const *)randPool)[randPoolGetPos++];
}







November, 1994
PROGRAMMER'S BOOKSHELF


Embedded-Systems Development




Douglas Reilly


Doug is the owner of Access Microsystems, a software-development house
specializing in C/C++ software development. He is also the author of the
BTFILER and BTVIEWER Btrieve file utilities. Doug can be contacted at 404
Midstreams Road, Brick, NJ 08724, or on CompuServe at 74040,607.


When people comment about the number of computers I have at home (two working
and two not-quite-working PCs), I remind them that they probably have at least
four--and likely more--computers around their house. What people sometimes
forget (or don't realize at all) is that computers are not limited to the
now-familiar PC-style boxes. Microwaves, stereos, VCRs, and even the family
station wagon are chock full of computers we refer to as "embedded systems."
And where there are computers, there must be programmers.
The two books I'll examine here are explicitly written to help ease the
learning curve for embedded-systems programmers. Interestingly, both books
also shed light on the recent trend of using C rather than assembly language
in embedded-systems programming.


Programming Microcontrollers in C 


Programming Microcontrollers in C, by Ted Van Sickle, is designed to help the
experienced embedded-systems programmer (who is likely experienced in assembly
languages) program embedded systems in C. His C tutorial is thorough,
beginning with the very basic elements of the language (fundamental data
types, control structures, and so on), progressing through pointer usage, and
going all the way to structures, unions, and pointers to functions. Special
emphasis is given to areas of C that are often misunderstood, even by seasoned
C programmers. For example, detailed instructions for decomposing complex
pointer declarations are provided. I wish I'd had access to such a tutorial
when I was learning C many years ago.
Several functional programs of reasonable length are covered, selected for
their ability to teach various elements of C, and not just in handling
embedded-system-type problems. The introduction to C, amounting to just over
one quarter of the book, would alone be worth the price of the book. However,
there is a great deal more. 
Van Sickle next takes a chapter to describe some of the basic functions of a
microcontroller, placing emphasis on some of the features that, while
sometimes used in general programming, take on much greater significance in
embedded systems. For example, timers and analog-to-digital converters are
discussed, as well as some special aspects of memory access in
microcontrollers.
The final sections of Programming Microcontrollers in C are devoted to details
of small 8-bit, large 8-bit, and larger Motorola microcontrollers, giving
details of likely uses of each class of microcontrollers as well as
information on the kinds of restrictions for each in common C compilers.
Appendices offer detailed specifications on many of the microcontrollers
listed, as well as header files that allow for compilation of some of the
programs from the book. A companion diskette is available at an additional
cost of $30.00.
My reservations about Programming Microcontrollers in C are minor. The
discussion of printf() and related functions could have been improved by a
discussion of the overhead they carry. This is, of course, critical in
embedded-systems applications. Another minor lapse is the equating of a FILE *
with a file handle. A FILE * may have a file handle, but a FILE * is not a file
handle. These are minor nits and can easily be forgiven because of the overall
quality of the text. A more serious limitation, acknowledged by the author, is
the fact that coverage is limited to Motorola microcontrollers, with no
discussion of other vendors' products.


Embedded Systems Programming in C and Assembler 


John Forrest Brown's Embedded Systems Programming in C and Assembler takes a
slightly different approach to explaining the integration of C into the
embedded-systems-programming world. Some knowledge of both C and
embedded-systems programming is presumed, and knowledge of the two major
microprocessor architectures (Intel and Motorola) does not hurt, either.
Brown begins with a useful introduction defining embedded-systems programming.
Although he recognizes that much of what embedded systems used to perform can
now be reasonably done using a dedicated PC, he explains the situations in
which the flexibility of the PC might not be the asset it normally is. This is
a good discussion of the virtues and perils of developing embedded systems in
today's ever-changing corporate and technological climate.
After discussing essentially the same issues as Van Sickle with regard to the
major parts of embedded-systems programming (interacting with timers,
interrupt service routines, analog-to-digital converters, and so on), Brown
moves on to several chapters of nuts-and-bolts programming, using examples
from both Intel and Motorola wherever possible. Handling interrupts, an
essential part of embedded-systems programming, is covered in great detail,
showing where C or assembler might be more appropriate as well as the many
differences between Intel and Motorola processors that cannot be easily hidden
from the embedded-systems programmer. In all cases, Brown discusses possible
timing problems, which can be very difficult to diagnose, along with possible
solutions. As is fitting in a book devoted to C and assembler programming,
mixed-language issues are discussed, shedding light on another area where
microprocessor differences cannot be hidden.
Brown moves well beyond the microwave and other appliances in discussions of
issues that surround embedded-systems programming. Multiprocessing and
interprocess synchronization are discussed in some detail. Some previously
discussed topics are revisited in the context of multitasking. The examples in
the appendices have a marked military bent (missile-to-aircraft interface,
pilot control panel, and so on) and include, not surprisingly, Department of
Defense guidelines for defense-system software development (DOD-STD-2167A).
Beyond these rigid guidelines, Brown presents ideas for more careful thought
about the design process, as well as more than the usual "documentation is a
good thing" talk. Embedded Systems Programming in C and Assembler gives you a
start at developing a methodology about the embedded-systems software design
and development process.
As mentioned, most examples are related to the aircraft industry. The examples
and explanations of why they do what they do take up over half the book. The
examples come with explanations that further examine the problems and the
workarounds found in the main body of the text. The code for the examples is
included on a diskette that comes with the book. 
Brown misses only a couple of opportunities to make the best explanation
possible of migrating toward C for embedded-systems programming. For example,
in a discussion of debugging code, he misses an opportunity to explain one of
the few good uses of the C preprocessor that remains, even in the C++ world:
employing #ifdefs to use a single set of source-code modules that can simply
be compiled with different sets of #defines to enable or disable debugging. As
in Van Sickle's book, C++ is mentioned, but not covered. Many of the tasks of
the embedded-systems programmer can be properly handled using C++ classes.
Maybe the second edition of each book will cover C++.


Conclusion


I recommend both books if Motorola embedded-systems work is in your future.
Van Sickle's book will easily pay for itself, and Brown's can be useful if you
think there is any chance that code will be moved to an Intel platform. 
Programming Microcontrollers in C
Ted Van Sickle
HighText Publications, 1994
394 pp., $29.95 
ISBN 1-878707-14-0
Embedded Systems Programming in C and Assembler
John Forrest Brown
Van Nostrand Reinhold, 1994
304 pp., $49.50 
ISBN 0-442-01817-7







November, 1994
SWAINE'S FLAMES


Celebrity Status


I have this software-developer friend, X (not his real name), who doesn't
program in C, isn't on the Internet, and has no social-security number. No,
he's not imaginary. But he's working on it. X is swimming against the current.
The trend is for everybody to become better known. In the future, Andy Warhol
once said, everybody will be famous for 15 minutes. Well, it's the future, and
some of you may be asking, "Is it my turn yet?" Then again, you may be asking,
"Who's Andy Warhol?" If so, I'm saying, "He was a friend of Lou Reed's who
lives on today as a demographic profile in various marketing databases, as
shall we all when our numbers are up." 
I, of course, am famous, but perhaps because I am childless, I sometimes worry
about the impermanence of the impression I will leave behind when my number is
up. Will the subject matter of these columns lose some of its fascination over
the next thousand years? Is DDJ using acid-free paper? Who has custody of
Nixon's enemies list? On the other hand, I don't have to watch Barney, so
maybe it's a wash. 
I worry about other things, like how you reduce crime by making more things
illegal and whether software agents keep 15 percent of the information they
process, but mostly these days I worry about being too wired. 
Wired magazine, and it should know, calls the talking half of Penn and Teller
the most wired person in America, which is interesting given that Penn isn't
even on the Internet. What they mean, I think, is that he's a celebrity who
knows something about computers. 
A celebrity, as somebody said, is somebody well-known for their
well-known-ness. There's probably a fine irony in the fact that I've forgotten
who said that. 
But it should be no surprise that Penn knows something about computers.
Magician is the second geekiest entertainment profession, right behind
ventriloquist. If you think back to high-school talent shows, you'll see that
I'm right. 
Hey, I don't like perpetuating this insulting popular image of the thin,
bespectacled, squeaky-voiced, self-absorbed, boring technoweenie geek, but you
know and I know that the stereotype is going to be with us until Bill Gates's
number is up, and there's not a darned thing we can do about it. 
Penn can't be called a geek anymore, though, because he has hired someone to
geek for him. "Personal Geek" has become a profession, soon to get its own
section in the classifieds in Variety. Celebrities and politicians, used to
getting fit by hiring personal trainers, are now getting wired by hiring
personal geeks. 
But, while my inside line to the celebrity scene has dried up since cousin
Frack retired from the Ice Follies, I wonder if even celebrities and
politicians might not regret getting too wired. Getting wired can make you a
little too well-known. It's hard to recall a persona launched into cyberspace.
Richard Nixon didn't want to be immortalized as the villain in The Haldeman
Diaries: The CD, and I don't want to be immortalized as a demographic profile
in various marketing databases. 
There is something to be said for being more Teller than Penn. 
Michael Swaine
editor-at-large






November, 1994
OF INTEREST
The Jump-Start Client/Server Kit from Powersoft enables Clipper, FoxPro, and
dBase users to convert existing data to a true multiuser RDBMS for developing
client/server applications. The Jump-Start Client/Server Kit consists of
PowerBuilder Desktop (the desktop version of Powersoft's client/server
application development tool), ERwin for PowerBuilder Desktop (a PowerBuilder
Desktop-specific version of Logic Works' data-modeling tool), a three-user
Windows version of the Watcom SQL relational database, and PowerViewer (a
Powersoft's client/server information-access tool used to create queries,
reports, and business graphs). The Jump-Start Client/Server Kit has an
introductory price of $599.00. Reader service no. 20.
Powersoft Corp.
561 Virginia Rd.
Concord, MA 01742
508-287-1500 
The Information Brokerage, an online marketplace for software development and
information, has been announced by the Object Management Group (OMG) and
Connect Inc. The Information Brokerage is an information service and
marketplace for component software that lets vendors list their software,
along with descriptive information and specifications. To purchase software,
Information Brokerage subscribers install Connect client software, which is
available through the Information Brokerage on their desktop, and dial in to
the network. 
Initially, the service will provide individual objects which can be "taken for
a test drive," purchased, and downloaded onto the subscriber's system. Later,
the Information Brokerage will expand to include class libraries as standards
define the architecture of objects and allow true component-level applications
to be architected. Subscribers will also be able to participate in forums with
other users and/or vendors to discuss current problems or successes in their
environments. 
The Information Brokerage will go online in January 1995. Developers
interested in placing software in the Information Brokerage, and subscribers
who want to get connected to the industry's first commercial, online market
should contact the OMG directly. Reader service no. 21.
Object Management Group
492 Old Connecticut Path
Framingham, MA 01701
508-820-4300
LANOpen, a specification for an open-architecture API for Flash PROM firmware,
has been released by McAfee. The software is designed to
facilitate interoperability between different LAN-management applications.
McAfee will publish the LANOpen specification so that any PC LAN Management
vendor can support the interface in their applications. LANOpen will allow,
for example, one vendor's asset-management application to pass information
through a common set of calls to another vendor's help-desk application.
Reader service no. 22.
McAfee
2710 Walsh Ave.
Santa Clara, CA 95051
408-980-3637
Novell has announced the availability of international-language versions of
its LAN WorkPlace 2.4 and LAN WorkGroup 4.2 desktop TCP/IP products. Fully
translated French, Spanish, Portuguese, and German language versions are now
available. A Japanese version is available through Novell K.K. in Japan.
Because TCP/IP allows interoperability among dissimilar systems, it has become
the worldwide standard protocol for distributed, cross-platform networks. More
than four million nodes of TCP/IP were installed worldwide by the end of 1993,
with more than eight million projected by the end of 1994. LAN WorkPlace
connects MS Windows and DOS users to UNIX minis and mainframes. LAN WorkGroup
gives NetWare network users access to UNIX systems from their desktops. Both
products are integral to the right-sizing efforts currently underway in
corporations as IS managers move mission-critical information from mainframes
to distributed computing environments.
LAN WorkPlace has a suggested retail price of $399.00 for a single user,
$1995.00 for 10 users, and $12,995.00 for 100 users. LAN WorkGroup is
available for 5, 10, 20, 50, 100, and 250 users. It has a suggested price
ranging from $1500.00 for 5 users to $12,495.00 for 250 users. Reader service
no. 23.
Novell Inc.
UNIX Systems Group
2180 Fortune Drive
San Jose, CA 95131
408-577-4000
Hewlett-Packard's HP Object-Oriented DCE/9000 (OODCE/9000) is a recently
introduced toolset for distributed-computing (DCE) application development for
its HP 9000 family of workstations. Targeted for C++ programmers, HP
OODCE/9000 encapsulates DCE API commands into C++ classes with default DCE
behavior. HP plans that applications written with HP OODCE/9000 will
eventually interoperate with applications based on the OMG's CORBA 2.0. HP
OODCE/9000 and HP DCE/9000 are integrated with the HP SoftBench
application-development environment. 
OODCE/9000 costs $995.00 per developer; there is no additional run-time fee.
Documentation costs $2000.00. OODCE/9000 requires an HP DCE/9000 and HP C++
compiler. Reader service no. 24.
Hewlett-Packard Co.
3000 Hanover Street
Palo Alto, CA 94304
415-857-1501
Visigenic has entered into a technology-licensing agreement to provide Informix
with its cross-platform ODBC tools. The license will provide Informix with
Visigenic's Windows and UNIX ODBC drivers for Sybase, Oracle, and Informix
databases, and the Visigenic ODBC for UNIX Driver Manager. Consequently,
Informix users will be able to develop applications using a single API that
will transparently access multiple relational database-management systems. 
Visigenic's ODBC SDK for UNIX 2.0, which sells for $995.00 per developer, is
available for Sun, HP, and IBM platforms. Reader service no. 25.
Visigenic Software
951 Mariners Island, Suite 460
San Mateo, CA 94404
415-286-2468
VB HelpWriter Lite, a Windows help-creation software package, is available at
no charge from Teletech Systems. The system features an integrated word
processor and includes a CASE-like facility that automatically generates a
complete help-file template from any Visual Basic source code. The Lite
Edition creates help files of up to 20 topics which can then be distributed
royalty free. VB HelpWriter Professional, a full-featured version of the
software, sells for $99.00. The Lite Edition is available electronically in
Library 5 of CompuServe's MSBASIC forum or via the Internet at
ftp.netcom.com/ftp/pub/vb_helpwriter. Reader service no. 26.
Teletech Systems
750 Birch Ridge Drive
Roswell, GA 30076
404-475-6985


December, 1994
EDITORIAL


Not So Strange Bedfellows


When looking back at this autumn's off-year elections, we may well remember
1994 not as the year one scoundrel was voted out and another in (there's
nothing new about that), but as the year computers began to have an impact on
the electoral process. Sure, computers have had a role in elections for the
last couple of decades, but usually in background mode--printing mailing
labels, cross-tabbing voter preferences, and the like. This year's election
was different, however, and the prospects are that in the upcoming 1996
presidential election we'll see computers play an even greater part in
deciding who stays at home and who goes to the state house.
This fall, politicians realized that the future of politics may well be linked
to computer networks. At the national, state, and local levels, candidates are
roaring down the information highway at breakneck speeds. These days, a
politico without an e-mail address is as naked as, well, Wilbur Mills in the
Tidal Basin. Want to let President Clinton know what you think about
health care? Drop him a note at president@whitehouse.gov. Need to tell the
Democratic party you're willing to go door to door? They'd be glad to hear
from you at 72203.601@compuserve.com. Know the name of a good barber? Newt
Gingrich needs one, and he's only a few keystrokes away at
georgia6@hr.house.gov. All in all, nearly 100 members of the U.S. House and
Senate have e-mail addresses, a mere drop in the ballot box compared with the greater
number of state and local candidates who are online. 
Although e-mail remains the most common form of political dialogue, candidates
and parties are also using computer networks to distribute position papers,
press releases, and related campaign material. In CompuServe's Republican
Forum, for instance, you can download everything from discussions of the crime
bill to comparisons of how the Dole/Packwood approach to health care stacks up
against both the Clinton/Mitchell and Clinton/Gephardt proposals. You can even
get GIF image files of Reagan, Bush, Quayle, and the Republican elephant logo.

One reason behind this surge in online political activity is that electronic
dissemination of information is cheaper than the traditional hardcopy
alternatives. It cost a lot of money to print and mail all those political
brochures you threw away this fall. Computer networks have also become the
rallying point for grass-root activism, as online aficionados download and
share information with their nonnetworked neighbors. Still, the main reason
for all the excitement is that position papers can be made available to the
public unfiltered by what's usually seen as a hostile press. (Political
activist and former DDJ editor Jim Warren initially met indifference and
skepticism when working to bring the proceedings of the California legislature
and other government-related information onto the Internet. Things didn't
start popping until he pointed out to politicians that they could talk
directly to the public without having to go through the press.) 
Both politicians and voters have realized that the press--in particular the
broadcast media, the primary purveyor of political information in the
U.S.--hasn't been doing its job. In his book, Democracy and the Problem of
Free Speech, for instance, Cass Sunstein reports that about 60 percent of the
press coverage in the 1988 national campaign dealt with who was leading the
race on a day-to-day basis, with only 30 percent focusing on substantive
issues and qualifications. More specifically, 75 percent of CBS's coverage of
the 1988 "Super Tuesday" primaries dealt with these "horse-race" issues, while
only 9 percent of the comments had substance. Overall, according to one study
that examined more than 7500 broadcast and print stories, less than 10 percent
of the political stories were on policy issues and less than 20 percent were
on candidate qualifications, while more than 36 percent were again horse-race
oriented. Similarly, network broadcasts presenting uninterrupted blocks of
speech from presidential candidates averaged about 9.8 seconds in 1988, down
from 42.3 seconds in 1968. (Of course, the cynical among us might wonder if
the candidates had much of importance to say.) 
There are numerous reasons why the broadcast media has shifted from its role
as public watchdog to public clown, not the least of which is that everyone from
the FCC to broadcast advertisers believes that what we want is entertainment, not
substantive information. Consider that in 1949, the FCC said that the role of
broadcasting was "the development of an informed public opinion through the
dissemination of news and ideas concerning the vital public issues of the
day." Contrast that to the words of Mark Fowler, who said of his years as FCC
chairman during the 1980s: "It was time to move away from thinking about
broadcasters as trustees. It was time to treat them the way almost everyone
else in society does--that is, as businesses. Television is just another
appliance. It's a toaster with pictures." 
Whatever the reasons, the net result is that the broadcast media has served up
watered-down information that's devoid of real issues. As an alternative to
broadcast journalism, the information highway gives voters the opportunity to
examine facts and form opinions--even though the FCC and media giants still
see the information highway as entertainment. At the recent "Business in the
21st Century" conference in Kansas City, for instance, a speaker from the FCC
explained how the information highway will be defined by "video dial tone" and
entertainment broadcasting.
Concurring with this belief that entertainment is the driving force behind
computer networks are political pollmeisters, the primary (financial)
beneficiaries of the media's move to public-opinion polls. Pollster Richard
Hertz believes that "even in mass-market vehicles like TV and newspapers,
people aren't paying much attention to politics. They're probably buying
CompuServe to meet the cyberperson of their dreams or check out their stocks
or make an airline reservation." He then brushes off online political activity
by simply saying "with a telephone and a 29-cent stamp, you can do pretty much
the same thing."
An October 1994 survey by MacWorld magazine would suggest otherwise, however.
In a national poll that asked 600 adults what interactive capabilities they
wanted, the magazine found that electronic voting in elections was number one
on the list, while entertainment-oriented video-on-demand was tenth. Not only
that, but opportunities for participating in electronic town halls, obtaining
government data, and getting tax information were also preferred over
entertainment.
What started out this fall as a trickle has a chance of building into a flood
in 1996, as voters use computer networks as a political tool. No one expects
every voter to be online, nor will all of those who are tapped into the net be
necessarily interested in politics. Still, more than one national election has
been decided by fewer than a million votes, and with 20 million U.S. citizens
already on the Internet alone, online politics can be a force. If nothing
else, an active, informed public with a voice and vehicle of its own might
force broadcast news to clean up its act. If so, we'll all win--and then maybe
I can turn on my television again.
Jonathan Erickson, editor-in-chief


December, 1994
LETTERS


More on Fractal Rulers 


Dear DDJ, 
I was pleased that you published my letter regarding Fractal Rules (DDJ,
October 1994). However, a typographical error crept into the program listing
(my fault, no doubt). The expression min(x^(unsigned
long)(x+1),ticksPerSegment-1) should have read min(i^(unsigned
long)(i+1),ticksPerSegment-1). My apology to anyone confused by this.
Michael Lee Finney
Lynchburg, Virginia


CISC is Dead--Long Live CISC!


Dear DDJ,
I experienced a bit of déjà vu while reading "Programming Paradigms" (DDJ,
September 1994). All that talk of a new hardware platform with limited
development tools made me think that I was back in 1984 listening to the
latest Apple dogma.
I just don't understand why Apple expects everyone to buy into the RISC
religion. They push RISC's speed but fail to emphasize that the larger
programs will require at least double the memory ($$$), more hard-disk space
($$$), and longer load times (@#%^&*!).
In the meantime, many RISC concepts have found their way back into the CISC
technology. If Apple wanted my 2 cents worth, I'd ask them to build a new line
of machines based upon the Motorola 68060. CISC is not dead.
Neil Rieck
Kitchener, Ontario


OS/2 Woes


Dear DDJ,
I just finished reading Al Stevens' article on C++ and OS/2. I no longer feel
bad for consistently crashing OS/2. I, too, am a software developer and have
to write apps for Windows and OS/2. Some days it is a challenge just to get
OS/2 to stay running long enough to get the project done. I have had many of
the "OS/2 gurus" at my company look at my system to try to fix this problem;
they all say it's not OS/2's fault, it's something I am doing. I do not
subscribe to this point of view at all. Lowly DOS/Windows doesn't crash this
much during development; why does "almost uncrashable OS/2" do it so much more
often? Anyway, thanks for providing a little validation to my OS/2
frustrations. 
Michael Miller
Atlanta, Georgia


OS/2 2.1 Executable File Formats Corrections


Dear DDJ,
I would like to point out some problems with the article, "Examining OS/2 2.1
Executable File Formats," by John Rodley (DDJ, September 1994). In describing
the fixup format, John has confused the meanings of fixup source and fixup
target. On page 73, he states, "The target is the place in the code where a
symbolic reference must be replaced by a real address," and further on, "The
fixup record contains the offset of the target and the object number and
offset of the source." However, on the following page, Figure 2 clearly shows
that it is the source, not the target, which is specified by an offset on a
physical page, and it is the target which is specified by an object number and
offset.
Had John simply reversed the meanings of these terms, the error would be
understandable. But he states that "LX uses the more obvious one-[fixup
record]-per-target strategy." This contradicts his previously stated
definitions of source and target.
Finally, on page 73 John states that "each page in the Object Page Table
contains an index into the Fixup Record Table pointing to the first fixup
record for that page." This is incorrect. It is the Fixup Page Table, not the
Object Page Table, whose records contain pointers into the Fixup Record Table.
Fred Hewett
Watertown, Massachusetts
John responds: Thanks for responding to my article on the LX file format,
Fred. I often wonder if anyone is reading these things-- now I know. 
I have to tell you, my day is ruined. You're right, I reversed the definitions
of source and target, further muddying a subject I was trying to clarify. In
my defense, the Schmitt PC Tech Journal article I used as a reference for the
NE format uses my (reversed) definition for target. The IBM LX format doc uses
the opposite definition. In my head, I was simply switching definitions
depending on which reference I was looking at. I didn't catch that until
sitting down with your letter, the source, and both references.
In the code that attaches sources to targets (lx_exe.c, lines 412-471), I use
the proper (nonreversed) definition, which is why tables 3 and 4 come out
right. What I do is print column headings that say "Target Source" while
actually printing in the column data "source target."
In summary, the definitions of source and target on page 73 should be reversed
along with the column headings on Tables 3 and 4, and the statement about
one-record-per-target should read "LX uses the more obvious
one-fixup-record-per-source strategy." Also, page 74, paragraph 4 should say
"source" where it says "target."
Sigh.
You are also correct about Object Page and Fixup Record Tables. If you look
close enough in the source (lx_exe.c, lines 415-420) that connection is
visible, but I certainly forgot it when I wrote that sentence.
Thanks for the corrections, Fred. Frankly, I'm astonished that anyone caught
this. How do you know so much about executable file formats?


Bar Codes


Dear DDJ,
In regards to the article, "A C++ Class for Generating Bar Codes" by Douglas
Reilly (DDJ, July 1994), readers should know that the AIM Uniform Symbology
Specification Code 128 specification (June 1993) is the definitive reference
source for bar-code character labels. You can contact AIM at 634 Alpha Drive,
Pittsburgh, PA 15238-2802.
The AIM document says that the STOP character shall be 13 units wide, ending
in a double unit bar. Doug's CODE 128 routines only specify the first 11 units
of the STOP character and instead use a printCode128Term() routine to print
the final bar; this yields correct bar codes but seems to me (in a picky way)
to violate the specification's intent.
Also, the codes 28, 30, and 62 have incorrect display values in the DDJ
routine code structure. The bad routine values and correct AIM values are
shown in Table 1.

Harold T. Salive
Auckland, New Zealand
Doug responds: Thanks, Harold, for pointing out the correct codes for 28 and
30. The value for 62 is a space in my reference. Obviously, I should have
referred to the original standard. In any event, I'll obtain the text of the
standard ASAP and post a corrected version of the code.
As to my treatment of the STOP character, while I understand your concern, my
goal was to generalize the creation of individual characters as much as
possible. Placing the final bar of a STOP character outside the normal
character routine seemed (and seems) reasonable to me.
The STOP character is the single character that extends beyond 11 units.
Perhaps it would have been clearer to encapsulate all of the stop code
printing within printCode128Term(). Thanks again for your thoughtful reading
of the code.


Mind and Life


Dear DDJ,
Regarding Michael Swaine's, "Mind and Life as Mechanism" (DDJ, October 1994),
mechanical models of life have always been repugnant to many. However,
throughout history new knowledge about living creatures has only served to
strengthen the hand of those who tout such models. Knowledge now culminates in
the discovery that DNA governs the physical development of living creatures
from birth. The ongoing genome-mapping project is writing a new chapter.
The brain/mind seems to be the final frontier in the search for a mechanical
basis for life. I, for one, am ready to accept it. I've come to think of life
simply as a fifth state of matter--after solid, liquid, gas, and plasma--to
put it in prosaic terms. It seems that living creatures need not be considered
anything more than complex, highly organized systems of molecules. How that
organization continually fine tunes itself in an evolutionary sense is a
central question. I cannot believe it is due strictly to random mutations
triggered by cosmic rays as some say. Behavioral aspects affect and reflect
neural interconnection patterns. Perhaps information of changes in those
patterns tends to feed back into the genes somehow.
The earliest reference to the brain as mechanism in my library is Design for a
Brain, by W. Ross Ashby (John Wiley & Sons, 1952), which predates our modern
computers and is centered around a special-purpose computer called a
"homeostat." It's an interesting book filled with diagrams and equations, not
just text. Chapter 8's title is, "The Animal as Machine." Compare that with
"Mind and Life as Mechanism."
"Roger Penrose," Michael states, "has presented an approach that rests
ultimately on quantum uncertainty." I presume that refers to the Heisenberg
uncertainty principle from quantum physics. If so, here's a late flash: Along
comes Sallhofer (Hans Sallhofer, "Maxwell-Dirac Isomorphism," Z. Naturforsch.,
1986) in Austria to proclaim, with some authority, that the probabilistic
interpretation of quantum theory is a mistake. That it is due simply to a
"lack of clarity" in the theory which is not present in the Dirac formulation.
So that catchall phrase, "quantum uncertainty," may be on its way out.
Homer B. Tilton
Tucson, Arizona


Programming-Language Syntax


Dear DDJ,
Many years ago, Niklaus Wirth proposed a notation for defining the syntax of
programming languages called, "extended Backus-Naur Form," or EBNF. Although
its earliest publication seems to have been in a very short article in the
Communications of the ACM in 1977, it is still alive and well. Recent
ANSI/IEEE and ISO/IEC standards have made use of it.
So why am I writing this letter? Partly because EBNF is a good thing in and of
itself: It is clear, unambiguous, readable, easy to learn, machine
processable, and is not dependent on multiple fonts. I'm also writing this
because I'd like to see a standard combining the various dialects of EBNF that
have cropped up over the years. In fact, I'm probably willing to write the
first draft of the standard. (I also hope to make some money by coming up with
my own EBNF-based products.)
Are you or any DDJ readers aware of any ongoing work on standardizing EBNF?
Perhaps an ANSI, IEEE, IEC, or ISO committee? If nothing is ongoing, then I
might be interested in creating an informal committee for this. Would you be
interested in being involved in the committee? Or would you just like me to
keep you posted?
The first thing I would like to add to EBNF is a way of indicating comments.
Before I invent something, is there any existing technique for this?
Anyone who is interested, or aware of any other EBNF-based software, is
invited to send me e-mail or regular mail.
John Rogers
11604 104th Ave. NE
Kirkland, Washington 98034-6606
CompuServe 72634,2402
DDJ responds: This sounds like a worthwhile project, John, and hopefully,
other readers will work with you on it. Keep us posted on your progress and
let us know how we can help out.
Hause's Method
Dear DDJ,
I read your magazine for the first time in July. This is the best programming
magazine I ever picked up: intelligent, unpretentious, and aware of the
context of software engineering in the world outside. Jack Woehr's interview
with Lotfi Zadeh was a pleasure to read and reread.
Now, talking of fuzzy logic, what was Mr. Williams D. Hause's letter (DDJ, July
1994) trying to prove? I just bought a copy of Schneier's wonderfully
plaintext Applied Cryptography. Mr. Hause was talking about a one-time pad! My
sister and I used to send messages to each other like this. Mathematically
unbreakable, except that our mother would tidy up our messages and one-time
pads from inside the sofa to the kitchen bin.
I couldn't believe that Mr. Hause's nonsensical letter wasn't a joke. Here is
my proof: 
Theorem: Any notion N, when obscured by ridiculous pseudoformal reasoning R,
produces ridiculous nonsense. Proof:
1. Let K be a ridiculous, pseudoformal reasoning of X number of lines.
2. Let P be the plain notion encrypted in Hause's letter, of length X.
3. Let C be the confusion produced by K obx P. (obx is the Hause "obscure
pseudoformal reasoning" operator.)
4. From the theorem I am trying to prove, we know that C is ridiculous
nonsense and is therefore unbelievable.
Corollary: Using Hause's method I have discovered a truly remarkable four-line
proof of the four-color map theorem, but my margin of credibility is too small
to contain it.
Sandy Anderson
London, England


Another Approach to Callbacks


Dear DDJ,
I read with interest the article, "Associations in C++" by Dan Ford (DDJ,
August 1994) because a team member of mine wrote an almost identical class
with a slightly different twist. Instead of indirectly calling a method
through a function, we simply called the method directly. To do this, the user
of the callback registers calls not only the object pointer, but also the
pointer to the method he wishes to have called. The method is then called in
the standard way of calling methods through pointers:
(objectPointer->*methodPointer)(param1, ...);
In this way, objects can directly register their own methods to be called on
specific events. Of course, the obvious constraint on this is that the object
and method pointers that are used are all derived from a common class. This is
because the callback manager object must be able to hold the pointers in a
variable of the appropriate type. In other words, you could not store the
object and method pointers as void pointers and expect it to work (too bad, I
say!). However, it is my opinion that this constraint is not a factor when
properly designing application-domain frameworks. Of course, if you use
Smalltalk, all of these problems that arise from using such a hybrid language
become NULL and void! 
Kevin W. Beam
Omaha, Nebraska
Table 1: Bar codes.

Code    Routine Errval    AIM Value
28      {                 <
30      }                 >
62      (space)           ^


December, 1994
World Wide Web & HTML


Preparing documents for online presentation




Douglas C. McArthur


Douglas is a systems integration engineer for BioData Inc. He can be reached
at douglas@biodata.com.


The World Wide Web--a network of computers providing information and resources
on the Internet--grew out of a hypertext project started at CERN, the European
Laboratory for Particle Physics. As a resource for finding information and
services ranging from protein-databank searches to pizza deliveries, the World
Wide Web (WWW) has become a centerpiece of the information superhighway. To
access information on the WWW, you use hypertext-based "browser" applications
that lead you to the desired documents, then display that information on the
screen. 
At the heart of the WWW is a platform-independent page-description language
called "Hypertext Markup Language" (HTML). Based on the Standard Generalized
Markup Language (SGML), HTML lets you prepare documents for WWW browsing by
embedding control codes in ASCII text to designate titles, headings, graphics,
and hypertext links, making use of SGML's powerful linking capabilities. HTML
is still evolving. The specification is currently at Document Type Definition
(DTD) Level 2, which includes support for forms within standard hypertext
documents. (The DTD describes the logical structure of possible document
instances.) Work is also underway for DTD Level 3, which is focusing on a
"scalable HTML." Undoubtedly, additions to HTML such as those described in the
accompanying text box entitled, "Evolution of the HTML 3.0 Spec," will
continue to be made, especially as commercial WWW browsers and servers become
available.
An understanding of HTML is critical to properly preparing documents for the
WWW. There are, of course, a variety of sophisticated freeware and commercial
tools designed to simplify this process. SoftQuad's (Toronto, ON) HoTMetaL
Pro, for instance, is a word processor for Mosaic and other WWW browsers that
shields you from coding, then checks that documents are valid HTML. Likewise,
Avalanche's (Boulder, CO) FastTag, a filter for use with word processors such
as WordPerfect for Windows and Microsoft Word for Windows, lets you create
documents normally, then execute a conversion macro to transform them into
HTML format. Quarterdeck's (Santa Monica, CA) yet-to-come Normandy project
promises similar capabilities. 
Still, most "automated" HTML tools are narrowly focused for specific tasks
(such as translating previously created documents into HTML format) and
generally not suited for robust WWW development. For instance, you would not
want to create a WWW "home page"--the default hypertext document you see when
you first enter the WWW--with a word processor, then translate it to HTML. In
short, being familiar with HTML coding techniques is fundamental to WWW
development. Thankfully, a wealth of information and tutorials are available
to the HTML designer online. For instance, the NCSA provides the "Beginner's
Guide to HTML"
(http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/d2-htmlinfo.html), while
experienced HTML coders will find Michael Grobe's "HTML Quick Reference"
(http://kuhttp.cc.ukans.edu/lynx_help/HTML_quick.html) valuable. In addition,
James Tilton's "Composing Good HTML" should be required reading
(http://www.williamette.edu/html-composition/strict-html.html). In this
article, I'll cover the basics of HTML coding, then present the complete HTML
source (see Listing One) to the live WWW home page shown in Figure 1 (from
BioData, the company where I work). You can find the BioData home page on the
WWW at http://www.biodata.com/. With a grasp of the HTML fundamentals
presented here and the HTML source to the BioData home page, you can create
your own customized home page. 


WWW Browsers


There are numerous free (and a few commercial) hypertext browsers, the most
well-known being Mosaic, from the National Center for Supercomputing
Applications, a graphical, hypermedia front end to the Internet. Other
graphical browsers include HaL Software's (Austin, TX) Olias Browser and
Spyglass's (Champaign, IL) Enhanced NCSA Mosaic 2.0, which supports links to
Adobe Acrobat 2.0, OLE 2.0, and AppleEvents. Likewise, Mosaic Communication
Corp. (Mountain View, CA) has recently released its Mosaic NetScape browser
for Windows, Macintosh, and the X Window System. At the other end of the
spectrum is Lynx, a popular text-based browser for UNIX and DOS developed at
the University of Kansas.
Obviously, all WWW browsers aren't created equal. For instance, the X Window
System version of NCSA Mosaic supports forms, while the current stable Macintosh version does not
(alpha releases of NCSA Mosaic for the Mac do support forms, however). Lynx,
on the other hand, supports forms, but can't display graphic images. Note that
the freely distributable WWW browsers are regularly updated and usually
available via anonymous ftp as soon as they are released. Keep the ftp sites
handy and check them from time to time. (If you're using Mosaic, put them in
your "hotlist" of documents that you can access via a quick menu selection.)
The capabilities of WWW browsers should be kept in mind when designing HTML
pages. For instance, if you create a fancy GIF image as a hypertext link,
remember that some users will likely be using text-based browsers like Lynx.
Consequently, they'll probably see only a noninformative [IMAGE] label on
their screen. (There's an easy way around this one though, as I'll discuss
shortly.) With a little forethought, however, your HTML documents can retain
both aesthetics and functionality across a variety of WWW browsers.


WWW Servers


WWW server software has been ported to a number of platforms, ranging from
UNIX to Macintosh. For this discussion, I will focus on UNIX-based servers.
Two freely distributable UNIX-based WWW servers are widely available, one from
NCSA, the other from CERN. Although both have the same basic functionality, I
prefer the CERN server because it provides better security, easier
configuration, and has a wealth of online documentation available. In
addition, Mosaic Communication has released two versions of its Mosaic NetSite
server, one for nonsecure communications and the other for secure "commerce"
transactions that require encryption and authentication.
Currently, many of the advanced features of HTML coding, such as image mapping
and utilizing forms data, require modification of server-configuration files.
Thus, in most cases, the system administrator has yet another hat to wear,
that of "webmaster." Of course, the webmaster need not have root privileges.
If security is an issue, I suggest creating a group "www," adding the
appropriate users, and setting group file permissions on WWW configuration
files appropriately.


Writing HTML


If you are familiar with UNIX text-formatting languages such as troff, you can
easily adapt to HTML coding. HTML tags, embedded within the body of the text,
are used to define a document and guide its display. Example 1(a), for
instance, produces the output in Example 1(b). Tags are bracketed by less-than
(<) and greater-than (>) symbols. The slash (/) signifies an end-tag.
Different tags can be embedded to combine type styles. Lines of text are
automatically concatenated when displayed. Some common HTML tags are listed in
Table 1.
All text following a tag is acted upon, up to the occurrence of the end-tag. A
missing slash can have interesting effects! A few tags, like <P>, do not
require an end-tag, since the tag does not act upon anything else.
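Example 1 isn't reproduced here, but a minimal fragment in the same spirit (the content is invented for illustration) shows the pattern of tags and end-tags:

```html
<TITLE>A Minimal Page</TITLE>
<H1>Welcome</H1>
This text is <B>bold</B> and this is <I>italic</I>.
These two lines are concatenated when displayed.<P>
<!-- <P> ends a paragraph and needs no end-tag; <B> and <I> do -->
```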
Text, graphics, or both text and graphics can be defined as a hypertext link.
Note in Example 2, for instance, that the <IMG> tag requires an attribute
(SRC), which defines the graphic file.
The majority of HTML tags are commands for text formatting. The exceptions are
for creating hypertext links, inline graphics, and forms. Let's first examine
creating a hypertext link, or "anchor," which points to other documents; see
Example 3(a). The anchor tag is different from most formatting tags in that it
includes an attribute. In this case the attribute is a hypertext reference
(HREF) to another document. The entire text between the <A_> and </A> becomes
the link. Also note that <IMG_> is one of the few tags, like <P>, that does
not require an end-tag. Additional attributes that can be used with the <IMG_>
tag are ALIGN and ALT. You can use the ALIGN=BOTTOM attribute to display text
at the bottom of an inline image; other values for the ALIGN attribute are TOP
and MIDDLE. 
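Example 3 isn't shown here, but a hypothetical anchor and inline image along these lines (filenames invented) would look like:

```html
<A HREF="products.html">See our product list</A><P>
<IMG SRC="logo.gif" ALIGN=MIDDLE> Welcome to BioData
```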
If graphics cannot be displayed, the ALT attribute will allow you to display
an alternate text label; see Example 4(a). Remember that GIF images can be
large and demand a lot of network bandwidth to download. If you have a large
graphic image, create a smaller version of the image that is a link to the
large image. For instance, in Example 4(b) I've made a hypertext link to a GIF
image instead of another HTML document. This is perfectly valid, as most
browsers will automatically recognize the file type (based either on the .GIF
extension, or via MIME type) and invoke the appropriate viewer. Note also that
the smaller image, which has been defined as a link, will be displayed as an
inline image, while the larger image will probably be displayed by an external
viewer or in a separate window. If the browser does not recognize the file
type or does not know of an appropriate "helper application" to invoke to
display the image, you will be prompted for a filename and the data will be
downloaded to a file.
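Both techniques can be combined (filenames invented): a small inline image, with an ALT label for text-based browsers, serves as the link to the full-size GIF.

```html
<A HREF="lab-large.gif">
<IMG SRC="lab-small.gif" ALT="[Photo of the lab]">
</A>
```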


Uniform Resource Locators


Up to this point, I've demonstrated only HTML documents and GIF images as
hypertext reference (HREF) values. However, any uniform resource locator (URL)
can be a valid HREF. URLs, like that in Example 5(a), let you specify how to
reach an Internet resource. The exact form of URLs depends upon whether you
are using WWW, ftp, Telnet, Gopher, and so on. For the WWW, a URL begins with
the name of the server you want to access (http://www.biodata.com), followed
by the path name of your HTML page (/douglas/people/douglas.html). 
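Assembled from those two parts, the complete reference (link text invented) would read:

```html
<A HREF="http://www.biodata.com/douglas/people/douglas.html">About the author</A>
```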
Because any URL can be used, HTML has the flexibility to reference many types
of data. A URL can point to more hypertext, an image, a sound file, an MPEG
animation sequence, or even a Microsoft Word document. Remember, however, that
not every browser will be capable of handling that particular file type. For
instance, clicking on a link pointing to a Microsoft Word document using NCSA
Mosaic for the Macintosh will successfully launch Word and display the
document. However, someone using NCSA Mosaic for X Windows will be out of
luck. In short, stick to standard file types; see Table 2.
A URL can also specify one of many types of protocols. For instance, to create
a link that will download a file instead of trying to view it, use a URL
specifying file-transfer protocol (FTP) instead of HTTP, as in Example 5(b).
In fact, an HREF value is not limited to being a URL as currently defined.
Some of the more advanced (and impressive) HTML coding tricks are possible via
URL overloading, or tacking extra information onto the end of a URL.


Back to Graphics


One example of URL overloading is demonstrated in image mapping. By clicking
on different parts of a mapped image, you can access other HTML pages, just
like clicking on a hypertext link. An image map is defined in HTML as in
Example 6. Note the added attribute ISMAP in the <IMG> construct and the HREF
value used. While this looks like a valid URL, it is interpreted differently.
The HREF in Example 6 is being used to execute a program called "imagemap"
with the argument "picture" (provided the server is configured with
http://host.domain.com/cgi-bin as the directory for executable files). The
imagemap program returns the URL corresponding to the x,y coordinates of the
point where the GIF image was clicked, based on the map file referenced by the
argument picture.

Image maps with icons or buttons for certain subjects, departments, and the
like add a nice touch to home pages. You can also add the capability of
zooming in on images by linking an area of a mapped image to an enlarged image
(which could be another image map).
Don't forget to provide a separate HTML document with text-only links for
those users who can't view your mapped image. For details about configuring
an NCSA WWW server to handle image mapping (creating the map files and the
like), refer to http://wintermute.ncsa.uiuc.edu:8080/map-tutorial/image-maps.html.


Forms


Before the advent of HTML forms, information flow via the WWW was
unidirectional. Forms provide the means to collect and act upon data entered
by the end user. They also open up a number of possibilities for online
transactions, such as requesting specific news articles or ordering pizza
(Pizza Hut's PizzaNet, http://www.pizzahut.com/, lets you do this).
A number of gadgets are available for building forms, including text boxes,
radio buttons, and check boxes. A user can enter text, select items from a
list, check boxes, and then submit the information to the server. A program on
the server then interprets the data and acts upon it appropriately: returning
information in hypertext form, downloading a file, or (in the PizzaNet
example) electronically notifying the local Pizza Hut restaurant of your
order.
All forms begin with the construct <FORM METHOD=POST
ACTION="cgi-bin/program">...</FORM>. The ACTION should be a URL pointing to
the program that will process the data collected by the form. Do not forget
the METHOD=POST attribute: the default, METHOD=GET, should be avoided,
especially if a large amount of data could potentially be submitted using the
form.
Listing Two, "Prototypical HTML Forms," shows how the <INPUT>, <TEXTAREA>,
and <SELECT> tags and their attributes are used to create a form. Note that
one special <INPUT> attribute must always be included in your form: <INPUT
TYPE=SUBMIT VALUE="Continue">. This button causes the data entered into the
form to be transmitted to the server program specified by the <FORM
ACTION=...> attribute. The VALUE attribute simply defines the name of the
button, usually Continue, Submit, or Do It. For more on HTML-form coding, see
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html.


Common Gateway Interface


Generating forms in HTML is only half the battle. The challenge is how to
decode the data submitted from the form. The data is passed to an executable
program in much the same way as x,y coordinates are passed to the imagemap
program described previously. Programs that use data submitted from an HTML
form must conform to the CGI (Common Gateway Interface) specification. Any
language the server can execute is suitable for writing a CGI program; the
most popular choices are C/C++, Perl, sh, and csh. Information on the CGI
specification can be found at http://hoohoo.ncsa.uiuc.edu/cgi/interface.html.
On a UNIX server, your CGI program will receive the form data via
stdin--provided that the METHOD=POST attribute was used. By the same token,
output produced by the CGI program sent to stdout will be sent back to the
browser. If the METHOD=GET attribute was used (not recommended), all the form
data is encoded in the environment variable QUERY_STRING. Obviously, this
would not work for large strings of data.


Tools


A number of tools facilitate the creation of HTML documents--for instance,
Emacs macro libraries that provide shortcuts for all the HTML tags. On the
Macintosh, there's the BBEdit editor, which lets you highlight blocks of text
and select an HTML formatting command from a pull-down menu. BBEdit will then
automatically insert the start-tag and end-tag in the proper places.
Another useful tool is rtf2html, which converts rich-text documents to HTML,
preserving type styles, footnotes, graphics, and the like. For a list of
useful HTML tools, refer to
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/faq-software.html and
http://info.cern.ch/hypertext/WWW/Tools/Overview.html. 


Server Statistics


It's nice to know that after working long hours producing HTML documents,
someone is actually reading them. Most servers produce a log file that
records when pages are accessed and from which Internet address. Programs are
available that can analyze WWW-server log files and produce reports showing
which pages are accessed the most and which Internet domains access your
server the most. Kevin Hughes' getstats program, for example, produces at
least 12 different reports and bar graphs showing WWW server accesses. For
information on getstats, refer to
http://www.eit.com/software/getstats/getstats.html.


Things to Come


As the number of WWW servers on the Internet continues to increase, so will
the amount of information and services available. At the same time, the HTML
specification will evolve, as will the capabilities of WWW browsers and
servers. In the future, we'll continue to see exciting developments from the
commercial sector, as companies rush to provide WWW servers and browsers that
are faster, more robust, and more feature filled than the freely distributable
versions that paved the way. Despite rapid changes, it is my hope that the
World Wide Web will remain the collaborative environment that it is today. 
Evolution of the HTML Spec
Dan Connolly
Dan, the author of the HTML 2.0 spec, is an engineer at HaL Software. He can
be contacted at connolly@halsoft.com.
The HTML language was first specified in 1991 by Tim Berners-Lee of CERN as
part of the World Wide Web initiative to facilitate communication among
high-energy physicists. The specification continued to evolve quickly to meet
the requirements of the Web community. In 1993, I wrote the first SGML
Document Type Definition (DTD) describing HTML and started assembling a suite
of documents for testing Web browsers.
At the beginning of 1994, the HTML spec was out of date and no longer
reflected current practice. I drafted a revision that reflected many of the
enhancements from NCSA Mosaic, called it the HTML 2.0 specification, and began
organizing a review committee. The first meeting was at the WWW conference at
CERN in May. We met again at the IETF (Internet Engineering Task Force)
meeting in Toronto to form the HTML Working Group. The HTML 2.0 spec is an
attempt to put a boundary around the current usage of HTML. It is out as a
draft for review (see http://www.hal.com/~connolly/html-spec) and the Working
Group expects to publish it by the end of 1994.
The HTML 2.0 specification divides the HTML elements, or tags, into four
categories:
The core set, which consists of features every WWW application must support.
Optional features, including inline images and font changes.
Form elements for designing interactive, GUI-based forms for creating
sophisticated, Web-based applications.
Obsolete elements that are supported but shouldn't be used in new documents,
because they may be phased out in the future.
HTML forms are the most innovative addition to the 2.0 spec. They allow the
author of a Web document to present the reader with an interactive form that
includes type-in fields, radio buttons, and pull-down menus. Forms allow the
implementation of new types of Web applications. The uses range from providing
sophisticated interfaces to search engines to on-line registration and
interactive applications where the input to the form determines the nature and
content of the document displayed.
The HTML specification follows an open development model: A new feature is
proposed, then implemented in some clients and tested in some applications. If
the demand for the new feature is sufficient, the other browser implementors
are encouraged to follow suit, and the new feature becomes widely deployed. In
this process, the design is reviewed and perhaps modified or enhanced.
Eventually, when there is sufficient experience with the new feature, it
becomes part of the standard set of HTML features.
The prime example of this is a set of features called HTML+, originally
proposed in late 1993 by Dave Raggett of Hewlett-Packard Labs in Bristol,
England. The forms elements were first proposed in his HTML+ document. Support
for HTML+ tables will soon be stable enough to become a standard part of HTML.
A simple markup scheme allows the encoding of complex tables, where each cell
can contain text, images, or multiple paragraphs.
The HTML+ proposal also allows the author more typographic control over the
presentation of the document. Many Web users have been brought up using
WYSIWYG word processors and expect the same level of presentation control of
their Web-based documents. HTML+ addresses these needs with features such as
inline images with automatic wraparound.
HTML+ has features that meet the specialized requirements of Web users. It
specifies a MATH element, making it simpler to encode mathematical equations
for display.
At this point, it's not clear how many more of the HTML+ features will become
standard in a future revision of HTML, or if HTML+ will become a separate data
format, supported by sophisticated browsers.
The best place to find out more information about the Web is, of course, on
the Web itself. For instance, see Tim Berners-Lee's original writings on WWW,
HTML, and other topics at http://info.cern.ch/hypertext/WWW/TheProject.html.
The HTML 2.0 review materials are at http://www.hal.com/~connolly/html-spec,
while the HTML+ discussion documents are at
http://info.cern.ch/hypertext/WWW/MarkUp/HTMLPlus/htmlplus_1.html. 
Discussions of HTML and other WWW topics are conducted over e-mail and USENET
news. To join the HTML Working Group discussion, send mail to
html-wg-request@oclc.org; to join the general HTML technical discussion,
contact www-html-request@info.cern.ch; to join the general WWW technical
discussion, contact www-talk-request@info.cern.ch. 
Figure 1: Typical WWW home page.
Table 1: Common HTML Tags. (a) headers; (b) general formatting; (c) logical
styles; (d) physical styles; (e) lists; (f) miscellaneous; (g) hyperlinks or
anchors; (h) graphics; (i) forms.
(a)
<h1>_</h1> Most prominent header
<h2>_</h2>
<h3>_</h3>
<h4>_</h4>
<h5>_</h5>
<h6>_</h6> Least prominent header

(b)
<title>_</title> Specify document title
<p> New paragraph
<br> Forces a line break
<hr> Horizontal line
<pre>_</pre> Preformatted text
<listing>_</listing> Example computer listing
<blockquote>_</blockquote> Quoted text.

(c)
<em>_</em> Emphasis
<strong>_</strong> Stronger emphasis
<code>_</code> Display an HTML directive
<samp>_</samp> Include sample output
<kbd>_</kbd> Display a keyboard key
<var>_</var> Define a variable
<dfn>_</dfn> Display a definition
<cite>_</cite> Display a citation

(d)
<b>_</b> Bold font
<i>_</i> Italics
<u>_</u> Underline
<tt>_</tt> Typewriter font

(e)
<ul>_</ul> Unordered list
<ol>_</ol> Ordered list
<menu>_</menu> Menu list
<dir>_</dir> Directory list
<li> Item in a list
<dl [compact]>_</dl> Definition list/glossary
<dt> Term to be defined
<dd> Definition of term

(f)
<!-- text --> Place a comment in the HTML source
<link href="URL"[rev=][rel=]> Define relationship between documents
<address>_</address> Address information
<isindex> Specify index file
<nextid> Set a variable value
<base> Path of current file

(g)
<a name="target">_</a> Define a target
<a href="#">_</a> Link to target in the same file
<a href="URL">_</a> Link to another file

(h)
<img src="URL"[alt=][align=][ismap]> Include a graphic image

(i)
<form [ACTION=][METHOD=]>_</form> Define a form.
<input [type=text|password|checkbox|radio|submit|reset]
 [name=][value=][checked][size=][maxlength=]> Create a gadget
<select [name=][size=][multiple]> Define a list of options</select>
<option [selected]> Define values within a <SELECT>
<textarea name=""[rows=][cols=]> Define a text area
</textarea>
Table 2: Standard file types.
 Type Description 
 HTML Hypertext markup
 GIF Graphic interchange format
 XBM Bitmapped graphics
 AU Audio (raw) format
 JPEG Graphic format
 MPEG Animation/movie format
 AIFF Audio format
Example 1: (a) produces the output in (b).
(a) This word is in <I>italics</I>.
 <P>
 <U>Underlined, and 
 <B>bold</B> 
 too.</U>

(b) This word is in italics.

Underlined, and bold too.
Example 2: The <IMG> tag requires an attribute (SRC) that defines the
graphic file.
<A HREF="another.html">
Text and
<IMG SRC="graphic.gif">
link.</A>
Example 3: (a) This hypertext link points to other documents; (b) text which
produces the image in Figure 2.
(a)
 <A HREF="index.html">This is a link.</A>

(b)
<IMG ALIGN=BOTTOM SRC="graphic.gif">
This text is at the bottom.
<P>
Example 4: (a) Using the ALT attribute; (b) a hypertext link to a GIF image.
(a)
<IMG SRC="graphic.gif" ALT="Click here [ ]">

(b)
<A HREF="larger_image.gif">
<IMG SRC="smaller_image.gif">
</A>
Example 5: (a) URLs let you specify how to reach Internet resources; (b) a URL
that specifies FTP instead of HTTP.
(a)
 http://www.biodata.com/douglas/people/douglas.html

(b)
 ftp://www.somewhere.edu/file_name
Example 6: Defining an image map.
<A HREF="http://host.domain.com/cgi-bin/imagemap/picture">
<IMG SRC="picture.gif" ISMAP>
</A>

Listing One 

<HTML>

<HEAD>
<!-- ---------------------------------------------------------------- -->
<!-- http://www.biodata.com/index.html, last modified 9/20/94 -=DCM=- -->
<!-- ---------------------------------------------------------------- -->
<TITLE>BioData Navigator</TITLE>
<IMG SRC="biodatabanner.gif" ALT="BioData Information Navigator">
</HEAD>
<BODY>
<P>
<I>Copyright (c) 1994 by BioData, Inc. All rights reserved.</I>
<P>
The BioData Navigator is a
<I>Collaborative Information System</I>
designed to give World Wide Web users two-way access to
bioscience-related information. The
<A HREF="launch.html">BioData Cyberspace Launching Pad</A>
is the most popular part of BioData's web site. Check out
<A HREF="BioData.html">BioData Information</A>
and
<A HREF="WWW.html">WWW Information</A>
for more about BioData and the Navigator technology.
<HR>
<H1>What's New On The BioData Navigator?</H1>
A complete history of BioData
<A HREF="whatsNew.html">What's New notices</A>
is kept in another page.
<P>
<I>August 15, 1994</I>
<P>
The
Democratic Party's version of the
<A HREF="/gov/s2357/index.html">Health Security Act</A>
is on-line at BioData!
<P>
<I>August 6, 1994</I>
<P>
Check out the new
<A HREF="pr/index.html">Press Releases</A>
for information on BioData's latest expansion efforts! New employees --
<A HREF="pr/prVPeng.html">Barry Silver</A>,
VP of engineering and
<A HREF="pr/prSciComp.html">Barr Bauer</A>,
director of scientific computing -- are working together to
deliver innovative computer resources to biotechnology companies worldwide!
<P>
<I>July 21, 1994</I>
<P>
The
<A HREF="jobs.html">BioData Job Board</A>
is now on-line! Check here for information about open positions
at BioData and application instructions.
<P>
<I>July 20, 1994</I>
<P>
A
<A HREF="douglas/darkwater.html">geekhouse</A>
has invaded BioData! Our local Geek House Resident,
<A HREF="douglas/people/douglas.html">Douglas McArthur</A>
has some interesting links to Bay Area and Central Coast California people.

<HR>
<H1>BioData Information Resources</H1>
Check out the following topics to learn more about BioData and how
to use computers in a biotechnology company:
<P>
<UL>
<LI><B>BioData Product and Service Information.</B>
Sections on
<A HREF="BioData.html">Company information</A>
and a
<A HREF="Products.html">detailed product catalog</A>
present basic information about BioData.
<A HREF="pr/index.html">BioData Press Releases and News sightings</A> 
documents significant milestones in BioData history. On-line versions
of our newsletter, the <A HREF="newsletter/index.html">BioData Report</A>,
are available.
<P>
<LI><A HREF="launch.html">The BioData Cyberspace Launching Pad</a> 
is your gateway to the many bioscience, computer-related and fun spots 
on the Internet.
<P>
<LI><A HREF="who/whoswho.html">Who's Who At BioData.</A>
A directory of the special knowledge and skills of BioData employees plus
detailed biographical information.
<P>
<LI><A HREF="vendors/index.html">Vendor And Supplier Information.</A>
Many of BioData's suppliers have Internet resources available. This link puts
you in contact with those resources.
<P>
<LI><A HREF="jobs.html">BioData Job Board.</A>
BioData is a quickly growing company and has immediate
openings for highly motivated individuals. Send resumes to
<A HREF="mailto:jobs@biodata.com">jobs@BioData.COM</A>.
</UL>
<HR>
Send comments to
<A HREF="mailto:webmaster@BioData.COM">webmaster@BioData.COM</A>.
<P>
Have a happy journey through cyberspace!
<P>
<I>All information contained in the BioData Navigator system is
Copyright (c) 1994 by BioData, Inc.</I>
<P>
<ADDRESS>
BioData Home Page /
<A HREF="who/vern.html">vern@BioData.COM</A>
</ADDRESS>
</BODY>
</HTML>


Listing Two

<HTML>
<HEAD>
<!-- ------------------------------------------------------------------- -->
<!-- http://www.biodata.com/douglas/form.html - Modified 9/20/94 -=DCM=- -->
<!-- ------------------------------------------------------------------- -->
<TITLE>Prototypical HTML Forms</TITLE>

<H1>Prototypical HTML Forms</H1>
</HEAD>
This document displays the various form gadgets currently supported.
<P>
<FORM ACTION="http://hoohoo.ncsa.uiuc.edu/htbin-post/post-query"
METHOD="POST">
<HR>
<H1>Text Fields</H1>
Basic text entry field:
<INPUT TYPE="text" NAME="entry1" VALUE=""> 
<P>
Text entry field with default value:
<INPUT TYPE="text" NAME="entry2" VALUE="This is the default."> 
<P>
Text entry field of 40 characters:
<INPUT TYPE="text" NAME="entry3" SIZE=40 VALUE=""> 
<P>
Text entry field of 5 characters, maximum:
<INPUT TYPE="text" NAME="entry5" SIZE=5 MAXLENGTH=5 VALUE=""> 
<P>
Password entry field (*'s are echoed):
<INPUT TYPE="password" NAME="password" SIZE=8 MAXLENGTH=8 VALUE=""> 
<HR>
<H1>Textareas</H1>
A 60x3 scrollable textarea:
<P>
<TEXTAREA NAME="textarea" COLS=60 ROWS=3>NOTE:
Default text can be entered here.
</TEXTAREA>
<HR>
<H1>Checkboxes</H1>
Here is a checkbox
<INPUT TYPE="checkbox" NAME="Checkbox1" VALUE="TRUE">,
and a checked checkbox
<INPUT TYPE="checkbox" NAME="Checkbox2" VALUE="TRUE" CHECKED>. 
<HR>
<H1>Radio Buttons</H1>
Radio buttons (one-of-many selection):
<OL>
<LI>
<INPUT TYPE="radio" NAME="radio1" VALUE="value1">
First choice. 
<LI>
<INPUT TYPE="radio" NAME="radio1" VALUE="value2" CHECKED>
Second choice. (Default CHECKED.)
<LI>
<INPUT TYPE="radio" NAME="radio1" VALUE="value3">
Third choice. 
</OL>
<HR>
<H1>Option Menus</H1>
One-of-many (Third Option selected by default):
<SELECT NAME="first-menu">
<OPTION>First Option</OPTION>
<OPTION>Second Option</OPTION>
<OPTION SELECTED>Third Option</OPTION>
<OPTION>Fourth Option</OPTION>
<OPTION>Last Option</OPTION>
</SELECT>
<P>

Many-of-many (First and Third selected by default):
<SELECT NAME="second-menu" MULTIPLE>
<OPTION SELECTED>First Option</OPTION>
<OPTION>Second Option</OPTION>
<OPTION SELECTED>Third Option</OPTION>
<OPTION>Fourth Option</OPTION>
<OPTION>Last Option</OPTION>
</SELECT>
<P>
<B>NOTE: Hold down CTRL and click to multiple-select.</B>
<!-- You can also assign VALUEs using TYPE="hidden" -->
<INPUT TYPE="hidden" NAME="hidden" VALUE="invisible">
<HR>
<H1>Special Buttons</H1>
Submit button (mandatory):
<INPUT TYPE="submit" VALUE="Submit Form">
<P>
Reset button (optional):
<INPUT TYPE="reset" VALUE="Clear Values">
<P>
</FORM>
<HR>
<H1>References</H1>
Here's a link to
<A HREF="http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html">
a handy HTML forms reference
</A>.
<P>
<HR>
<ADDRESS>
Prototypical HTML Form /
<A HREF="http://www.biodata.com/douglas/people/douglas.html">
douglas@BioData.COM 
</A>
</ADDRESS>
</HTML>

























December, 1994
Digital Speech Compression


Putting the GSM 06.10 RPE-LTP algorithm to work




Jutta Degener


Jutta is a member of the Communications and Operating Systems research group
at the Technical University of Berlin. She can be reached at
jutta@cs.tu-berlin.de.


The good news is that new, sophisticated, real-time conferencing applications
delivering speech, text, and images are coming online at an ever-increasing
rate. The bad news is that at times, these applications can bog down the
network to the point where it becomes unresponsive. Solutions to this problem
range from increasing network bandwidth--a costly and time-consuming
process--to compressing the data being transferred. I'll discuss the latter
approach. 
In 1992, my research group at the Technical University of Berlin needed a
speech-compression algorithm to support our video-conferencing research. We
found what we were looking for in GSM, the Global System for Mobile
telecommunication protocol suite that's currently Europe's most popular
protocol for digital cellular phones. GSM is a telephony standard defined by
the European Telecommunications Standards Institute (ETSI). In this article,
I'll present the speech-compression part of GSM, focusing on the GSM 06.10
RPE-LTP ("regular pulse excitation long-term predictor") full-rate speech
transcoder. My colleague Dr. Carsten Bormann and I have implemented a GSM
06.10 coder/decoder (codec) in C. Its source is available via anonymous ftp
from site ftp.cs.tu-berlin.de in the /pub/local/kbs/tubmik/gsm/ directory. It
is also available electronically from DDJ; see "Availability," page 3.
Although originally written for UNIX-like environments, the library has since
been ported to VMS and MS-DOS. 


Human Speech


When you pronounce "voiced" speech, air is pushed out from your lungs, opening
a gap between the two vocal folds, which is the glottis. Tension in the vocal
folds (or cords) increases until--pulled by your muscles and a Bernoulli force
from the stream of air--they close. After the folds have closed, air from your
lungs again forces the glottis open, and the cycle repeats--between 50 and
500 times per second, depending on the physical construction of your larynx
and how strongly you pull on your vocal cords. 
For "voiceless" consonants, you blow air past some obstacle in your mouth, or
let the air out with a sudden burst. Where you create the obstacle depends on
which atomic speech sound ("phoneme") you want to make. During transitions,
and for some "mixed" phonemes, you use the same airstream twice--first to make
a low-frequency hum with your vocal cords, then to make a high-frequency,
noisy hiss in your mouth. (Some languages have complicated clicks, bursts, and
sounds for which air is pulled in rather than blown out. These instances are
not described well by this model.) 
You never really hear someone's vocal cords vibrate. Before vibrations from a
person's glottis reach your ear, those vibrations pass through the throat,
over the tongue, against the roof of the mouth, and out through the teeth and
lips. 
The space that a sound wave passes through changes it. Parts of one wave are
reflected and mix with the next oncoming wave, changing the sound's frequency
spectrum. Every vowel has three to five typical ("formant") frequencies that
distinguish it from others. By changing the interior shape of your mouth, you
create reflections that amplify the formant frequencies of the phoneme you're
speaking.


Digital Signals and Filters


To digitally represent a sound wave, you have to sample and quantize it. The
sampling frequency must be at least twice as high as the highest frequency in
the wave. Speech signals, whose interesting frequencies go up to 4 kHz, are
often sampled at 8 kHz and quantized on either a linear or logarithmic scale.
The input to GSM 06.10 consists of frames of 160 signed, 13-bit linear PCM
values sampled at 8 kHz. One frame covers 20 ms, about one glottal period for
someone with a very low voice, or ten for a very high voice. This is a very
short time, during which a speech wave does not change too much. The
processing time plus the frame size of an algorithm determine the "transcoding
delay" of your communication. (In our work, the 125-ms frames of our input and
output devices caused us more problems than the 20-ms frames of the GSM 06.10
algorithm.)
The encoder compresses an input frame of 160 samples to one frame of 260 bits.
One second of speech turns into 1625 bytes; a megabyte of compressed data
holds a little more than ten minutes of speech.
Central to signal processing is the concept of a filter. A filter's output can
depend on more than just a single input value--it can also keep state. When a
sequence of values is passed through a filter, the filter is "excited" by the
sequence. The GSM 06.10 compressor models the human-speech system with two
filters and an initial excitation. The linear-predictive short-term filter,
which is the first stage of compression and the last during decompression,
assumes the role of the vocal and nasal tract. It is excited by the output of
a long-term predictive (LTP) filter that turns its input--the residual pulse
excitation (RPE)--into a mixture of glottal wave and voiceless noise.
As Figure 1 illustrates, the encoder divides the speech signal into short-term
predictable parts, long-term predictable parts, and the remaining residual
pulse. It then quantizes and encodes that pulse and parameters for the two
predictors. The decoder (the synthesis part) reconstructs the speech by
passing the residual pulse through the long-term prediction filter and then
passing the output of that through the short-term predictor.


Linear Prediction


To model the effects of the vocal and nasal tracts, you need to design a
filter that, when excited with an unknown mixture of glottal wave and noise,
produces the speech you're trying to compress. If you restrict yourself to
filters that predict their output as a weighted sum (or "linear combination")
of their previous outputs, it becomes possible to determine the sum's optimal
weights from the output alone. (Keep in mind that the filter's output--the
speech--is your algorithm's input.)
For every frame of speech samples s[], you can compute an array of weights
lpc[P] so that s[n] is as close as possible to
lpc[0]*s[n-1] + lpc[1]*s[n-2] + ... + lpc[P-1]*s[n-P] for all sample values
s[n]. P is usually between 8 and 14; GSM uses eight weights. The
levinson_durbin() function in Listing Two calculates these linear-prediction
coefficients.
The results of GSM's linear-predictive coding (LPC) are not the direct lpc[]
coefficients of the filter equation, but the closely related "reflection
coefficients"; see Figure 2. This term refers to a physical model of the
filter: a system of connected, hard-walled, lossless tubes through which a
wave travels in one dimension.
When a wave arrives at the boundary between two tubes of different diameters,
it does not simply pass through: Parts of it are reflected and interfere with
waves approaching from the back. A reflection coefficient calculated from the
cross-sectional areas of both tubes expresses the rate of reflection. The
similarity between acoustic tubes and the human vocal tract is not accidental;
a tube that sounds like a vowel has cross-sections similar to those of your
vocal tract when you pronounce that vowel. But the walls of your mouth are
soft, not lossless, so some wave energy turns to heat. The walls are not
completely immobile either--they resonate with lower frequencies. If you open
the connection to your nasal tract (which is normally closed), the model will
not resemble the physical system very much.


Short-Term and Long-Term Analysis 


The short-term analysis section of the algorithm calculates the short-term
residual signal that will excite the short-term synthesis stage in the
decoder. We pass the speech backwards through the vocal tract by almost
running backwards the loop that simulates the reflections inside the sequence
of lossless tubes; see Listing Three. The remainder of the algorithm works on
40-sample blocks of the short-term residual signal, producing 56 bits of the
encoded GSM frame with each iteration.
The LTP analysis selects a sequence of 40 reconstructed short-term residual
values that resemble the current values. LTP scales the values and subtracts
them from the signal. As in the LPC section, the current output is predicted
from the past output. 
The prediction has two parameters: the LTP lag, which describes the source of
the copy in time, and the LTP gain, the scaling factor. (The lpc[] array in
the LPC section played a similar role as the LTP gain, but there was no
equivalent to the LTP lag; the lags for the sum were always 1, 2, ..., 8). To
compute the LTP lag, the algorithm looks for the segment of the past that most
resembles the present, regardless of scaling. How do we compute resemblance?
By correlating the two sequences whose resemblance we want to know. The
correlation of two sequences, x[] and y[], is the sum of the products
x[n]*y[n-lag] for all n; it is a function of the lag, the time offset between
the two samples multiplied in each product. The lag between 40 and 120 with the
maximum correlation becomes the LTP lag. In the ideal voiced case, that LTP
lag will be the distance between two glottal waves, the inverse of the
speech's pitch.
The second parameter, the LTP gain, is the maximum correlation divided by the
energy of the reconstructed short-term residual signal. (The energy of a
discrete signal is the sum of its squared values--its correlation with itself
for a lag of zero.) We scale the old wave by this factor to get a signal that
is not only similar in shape, but also in loudness. Listing Four shows an
example of long-term prediction.


Residual Signal



To remove the long-term predictable signal from its input, the algorithm then
subtracts the scaled 40 samples. We hope that the residual signal is either
weak or random and consequently cheaper to encode and transmit. If the frame
recorded a voiced phoneme, the long-term filter will have predicted most of
the glottal wave, and the residual signal is weak. But if the phoneme was
voiceless, the residual signal is noisy and doesn't need to be transmitted
precisely. Because it cannot squeeze 40 samples into only 47 remaining GSM
06.10-encoded bits, the algorithm down-samples by a factor of three,
discarding two out of three sample values. We have four evenly spaced 13-value
subsequences to choose from, starting with samples 1, 2, 3, and 4. (The first
and the last have everything but two values in common.)
The algorithm picks the sequence with the most energy--that is, with the
highest sum of all squared sample values. A 2-bit "grid-selection" index
transmits the choice to the decoder. That leaves us with 13 3-bit sample
values and a 6-bit scaling factor that turns the PCM encoding into an APCM
(Adaptive PCM; the algorithm adapts to the overall amplitude by increasing or
decreasing the scaling factor). 
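The grid selection described above can be sketched as follows. This is a simplified illustration with names of my own choosing, working in doubles rather than the bit-exact integer arithmetic of GSM 06.10:

```c
#include <assert.h>

/* Pick the 13-value subsequence (offsets 0..3, stride 3) of the 40
 * residual samples with the highest energy; the returned offset is the
 * 2-bit "grid-selection" index sent to the decoder. */
int grid_select(const double d[40])
{
    int offset, best = 0;
    double best_energy = -1.0;

    for (offset = 0; offset < 4; offset++) {
        double energy = 0.0;
        int i;
        for (i = 0; i < 13; i++) {
            double s = d[offset + 3 * i];
            energy += s * s;        /* energy: sum of squared values */
        }
        if (energy > best_energy) {
            best_energy = energy;
            best = offset;
        }
    }
    return best;
}
```

A single strong glottal pulse in the residual thus steers the choice toward the subsequence that contains it.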
Finally, the encoder prepares the next LTP analysis by updating its remembered
"past output," the reconstructed short-term residual. To make sure that the
encoder and decoder work with the same residual, the encoder simulates the
decoder's steps until just before the short-term stage; it deliberately uses
the decoder's grainy approximation of the past, rather than its own more
accurate version.


The Decoder


Decoding starts when the algorithm multiplies the 13 3-bit samples by the
scaling factor and expands them back into 40 samples, zero-padding the gaps.
The resulting residual pulse is fed to the long-term synthesis filter: The
algorithm cuts out a 40-sample segment from the old estimated short-term
residual signal, scales it by the LTP gain, and adds it to the incoming pulse.
The resulting new estimated short-term residual becomes part of the source for
the next three predictions.
Finally, the estimated short-term residual signal passes through the
short-term synthesis filter whose reflection coefficients the LPC module
calculated. The noise or glottal wave from the excited long-term synthesis
filter passes through the tubes of the simulated vocal tract--and emerges as
speech.
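The decoder's first step can be sketched like this (a hypothetical helper of my own, not code from the implementation): scale the 13 received samples and spread them back over 40 positions using the transmitted grid offset, zero-padding the gaps.

```c
#include <assert.h>

/* Expand 13 decoded sample values back into a 40-sample residual pulse:
 * every third position starting at `offset` receives a scaled sample;
 * the gaps stay zero. */
void rpe_expand(const double sub[13], int offset, double scale,
                double out[40])
{
    int i;
    for (i = 0; i < 40; i++)
        out[i] = 0.0;                      /* zero-pad the gaps */
    for (i = 0; i < 13; i++)
        out[offset + 3 * i] = scale * sub[i];
}
```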


The Implementation


Our GSM 06.10 implementation consists of a C library and a stand-alone
program. Although both were originally written for UNIX-like environments
with at least 32-bit integers, the library has been ported to VMS
and MS-DOS. GSM 06.10 is faster than code-book lookup algorithms such as CELP,
but by no means cheap: To use it for real-time communication you'll need at
least a medium-scale workstation.
When using the library, you create a gsm object that holds the state necessary
to either encode frames of 160 16-bit PCM samples into 264-bit GSM frames, or
to decode GSM frames into linear PCM frames. (The "native" frame size of GSM,
260 bits, does not fit into an integral number of 8-bit bytes.) If you want to
examine and change the individual parts of the GSM frame, you can "explode" it
into an array of 70 parameters, change them, and "implode" them back into a
packed frame. You can also print an entire GSM frame to a file in
human-readable format with a single function call.
We also wrote some throwaway tools to generate the bit-packing and unpacking
code for the GSM frames. You can easily change the library to handle new frame
formats. We verified our implementation with test patterns from the ETSI.
However, since ETSI patterns are not freely available, we aren't distributing
them. Nevertheless, we are providing test programs that understand ETSI
formats.
The front-end program called "toast" is modeled after the UNIX compress
program. Running toast myspeech, for example, will compress the file myspeech,
remove it, and collect the result of the compression in a new file called
myspeech.gsm; untoast myspeech will reverse the process, though not
exactly--unlike compress, toast loses information with each compression cycle.
(After about ten iterations, you can hear high-pitched chirps that I initially
mistook for birds outside my office window.)
Listing One is params.h, which defines P_MAX and WINDOW and declares the three
functions in Listing Two--schur(), levinson_durbin(), and
autocorrelation()--that relate to LPC. Listing Three uses the functions from
Listing Two in a short-term transcoder that makes you sound like a
"speak-and-spell" machine. Finally, Listing Four offers a plug-in LTP that
adds pitch to Listing Three's robotic voice.


For More Information


Deller, John R., Jr., John G. Proakis, and John H.L. Hansen. Discrete-Time
Processing of Speech Signals. New York, NY: Macmillan Publishing, 1993.
"Frequently Asked Questions" posting in the USENET comp.speech newsgroup.
Li, Sing. "Building an Internet Global Phone." Dr. Dobb's Sourcebook on the
Information Highway, Winter 1994.
Figure 1 Overview of the GSM 06.10 architecture.
Figure 2 Reflection coefficients.

Listing One 

/* params.h -- common definitions for the speech processing listings. */

#define P_MAX 8 /* order p of LPC analysis, typically 8..14 */
#define WINDOW 160 /* window size for short-term processing */

double levinson_durbin(double const * ac, double * ref, double * lpc);
double schur(double const * ac, double * ref);
void autocorrelation(int n, double const * x, int lag, double * ac);



Listing Two

/* LPC- and Reflection Coefficients
 * The next two functions calculate linear prediction coefficients
 * and/or the related reflection coefficients from the first P_MAX+1
 * values of the autocorrelation function.
 */
#include "params.h" /* for P_MAX */

/* The Levinson-Durbin algorithm was invented by N. Levinson in 1947
 * and modified by J. Durbin in 1959.
 */
double /* returns minimum mean square error */
levinson_durbin(

 double const * ac, /* in: [0...p] autocorrelation values */
 double * ref, /* out: [0...p-1] reflection coefficients */
 double * lpc) /* [0...p-1] LPC coefficients */
{
 int i, j; double r, error = ac[0];

 if (ac[0] == 0) {
 for (i = 0; i < P_MAX; i++) ref[i] = 0;
 return 0;
 }

 for (i = 0; i < P_MAX; i++) {

 /* Sum up this iteration's reflection coefficient. */
 r = -ac[i + 1];
 for (j = 0; j < i; j++) r -= lpc[j] * ac[i - j];
 ref[i] = r /= error; 

 /* Update LPC coefficients and total error. */
 lpc[i] = r;
 for (j = 0; j < i / 2; j++) {
 double tmp = lpc[j];
 lpc[j] += r * lpc[i - 1 - j];
 lpc[i - 1 - j] += r * tmp;
 }
 if (i % 2) lpc[j] += lpc[j] * r;

 error *= 1 - r * r;
 }
 return error;
}

/* I. Schur's recursion from 1917 is related to the Levinson-Durbin method,
 * but faster on parallel architectures; where Levinson-Durbin would take time
 * proportional to p * log(p), Schur only requires time proportional to p. The
 * GSM coder uses an integer version of the Schur recursion.
 */
double /* returns the minimum mean square error */
schur(
 double const * ac, /* in: [0...p] autocorrelation values */
 double * ref) /* out: [0...p-1] reflection coefficients */
{
 int i, m; double r, error = ac[0], G[2][P_MAX];

 if (ac[0] == 0.0) {
 for (i = 0; i < P_MAX; i++) ref[i] = 0;
 return 0;
 }

 /* Initialize the rows of the generator matrix G to ac[1...p]. */
 for (i = 0; i < P_MAX; i++) G[0][i] = G[1][i] = ac[i + 1];

 for (i = 0;;) {
 /* Calculate this iteration's reflection coefficient and error. */
 ref[i] = r = -G[1][0] / error;
 error += G[1][0] * r;

 if (++i >= P_MAX) return error;


 /* Update the generator matrix. Unlike Levinson-Durbin's summing of 
 * reflection coefficients, this loop could be executed in parallel
 * by p processors in constant time.
 */
 for (m = 0; m < P_MAX - i; m++) {
 G[1][m] = G[1][m + 1] + r * G[0][m];
 G[0][m] = G[1][m + 1] * r + G[0][m];
 }
 }
}

/* Compute the autocorrelation 
 * ,--,
 * ac(l) = > x(i) * x(i-l) for all i
 * `--'
 * for lags l between 0 and lag-1, and x(i) == 0 for i < 0 or i >= n
 */
void autocorrelation(
 int n, double const * x, /* in: [0...n-1] samples x */
 int lag, double * ac) /* out: [0...lag-1] autocorrelation */
{
 double d; int i;
 while (lag--) {
 for (i = lag, d = 0; i < n; i++) d += x[i] * x[i-lag];
 ac[lag] = d;
 }
}


Listing Three

/* Short-Term Linear Prediction
 * To show which parts of speech are picked up by short-term linear
 * prediction, this program replaces everything but the short-term
 * predictable parts of its input with a fixed periodic pulse. (You
 * may want to try other excitations.) The result lacks pitch
 * information, but is still discernible.
 */
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#include "params.h" /* See Listing One; #defines WINDOW and P_MAX */

/* Default period for a pulse that will feed the short-term processing. The 
 * length of the period is the inverse of the pitch of the program's 
 * "robot voice"; the smaller the period, the higher the voice.
 */
#define PERIOD 100 /* human speech: between 16 and 160 */

/* The short-term synthesis and analysis functions below filter and inverse-
 * filter their input according to reflection coefficients from Listing Two. 
 */
static void short_term_analysis(
 double const * ref, /* in: [0...p-1] reflection coefficients */
 int n, /* # of samples */
 double const * in, /* [0...n-1] input samples */
 double * out) /* out: [0...n-1] short-term residual */
{

 double sav, s, ui; int i;
 static double u[P_MAX];

 while (n--) {
 sav = s = *in++;
 for (i = 0; i < P_MAX; i++) {
 ui = u[i];
 u[i] = sav;

 sav = ui + ref[i] * s;
 s = s + ref[i] * ui;
 }
 *out++ = s;
 }
}
static void short_term_synthesis(
 double const * ref, /* in: [0...p-1] reflection coefficients */
 int n, /* # of samples */
 double const * in, /* [0...n-1] residual input */
 double * out) /* out: [0...n-1] short-term signal */
{
 double s; int i;
 static double u[P_MAX+1];

 while (n--) {
 s = *in++;
 for (i = P_MAX; i--;) {
 s -= ref[i] * u[i];
 u[i+1] = ref[i] * s + u[i];
 }
 *out++ = u[0] = s;
 }
}

/* This fake long-term processing section implements the "robotic" voice:
 * it replaces the short-term residual by a fixed periodic pulse.
 */
static void long_term(double * d)
{
 int i; static int r;
 for (i = 0; i < WINDOW; i++) d[i] = 0;
 for (; r < WINDOW; r += PERIOD) d[r] = 10000.0;
 r -= WINDOW;
}

/* Read signed short PCM values from stdin, process them as double,
 * and write the result back as shorts.
 */
int main(int argc, char ** argv)
{
 short s[WINDOW]; double d[WINDOW]; int i, n;
 double ac[P_MAX + 1], ref[P_MAX];

 while ((n = fread(s, sizeof(*s), WINDOW, stdin)) > 0) {

 for (i = 0; i < n; i++) d[i] = s[i];
 for (; i < WINDOW; i++) d[i] = 0;

 /* Split input into short-term predictable part and residual. */

 autocorrelation(WINDOW, d, P_MAX + 1, ac);
 schur(ac, ref);
 short_term_analysis(ref, WINDOW, d, d);

 /* Process that residual, and synthesize the speech again. */
 long_term(d);
 short_term_synthesis(ref, WINDOW, d, d);

 /* Convert back to short, and write. */
 for (i = 0; i < n; i++)
 s[i] = d[i] > SHRT_MAX ? SHRT_MAX
 : d[i] < SHRT_MIN ? SHRT_MIN
 : d[i];
 if (fwrite(s, sizeof(*s), n, stdout) != n) {
 fprintf(stderr, "%s: write failed\n", *argv);
 exit(1);
 }
 if (feof(stdin)) break;
 }
 if (ferror(stdin)) {
 fprintf(stderr,"%s: read failed\n", *argv); exit(1); }
 return 0;
}


Listing Four

/* Long-Term Prediction
 * Here's a replacement for the long_term() function of Listing Three:
 * A "voice" that is based on the two long-term prediction parameters,
 * the gain and the lag. By transmitting very little information,
 * the final output can be made to sound much more natural.
 */
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#include "params.h" /* see Listing One; #defines WINDOW */

#define SUBWIN 40 /* LTP window size, WINDOW % SUBWIN == 0 */

/* Compute n
 * ,--,
 * cc(l) = > x(i) * y(i-l)
 * `--'
 * i=0
 * for lags l from 0 to lag-1.
 */
static void crosscorrelation(
 int n, /* in: # of sample values */
 double const * x, /* [0...n-1] samples x */
 double const * y, /* [-lag+1...n-1] samples y */
 int lag, /* maximum lag+1 */
 double * c) /* out: [0...lag-1] cc values */
{
 while (lag--) {
 int i; double d = 0;
 for (i = 0; i < n; i++)
 d += x[i] * y[i - lag];

 c[lag] = d;
 }
}

/* Calculate long-term prediction lag and gain. */
static void long_term_parameters(
 double const * d, /* in: [0.....SUBWIN-1] samples */
 double const * prev, /* [-3*SUBWIN+1...0] past signal */
 int * lag_out, /* out: LTP lag */
 double * gain_out) /* LTP gain */
{
 int i, lag;
 double cc[2 * SUBWIN], maxcc, energy;

 /* Find the maximum correlation with lags SUBWIN...3*SUBWIN-1
 * between this frame and the previous ones.
 */
 crosscorrelation(SUBWIN, d, prev - SUBWIN, 2 * SUBWIN, cc);
 maxcc = cc[lag = 0];

 for (i = 1; i < 2 * SUBWIN; i++)
 if (cc[i] > maxcc) maxcc = cc[lag = i];

 *lag_out = lag + SUBWIN;

 /* Calculate the gain from the maximum correlation and
 * the energy of the selected SUBWIN past samples.
 */
 autocorrelation(SUBWIN, prev - *lag_out, 1, &energy);
 *gain_out = energy ? maxcc / energy : 1.0;
}

/* The "reduce" function simulates the effect of quantizing,
 * encoding, transmitting, decoding, and inverse quantizing the
 * residual signal by losing most of its information. For this
 * experiment, we simply throw away everything but the first sample value.
 */
static void reduce(double *d) {
 int i;
 for (i = 1; i < SUBWIN; i++) d[i] = 0;
}

void long_term(double * d)
{
 static double prev[3*SUBWIN];
 double gain;
 int n, i, lag;

 for (n = 0; n < WINDOW/SUBWIN; n++, d += SUBWIN) { 

 long_term_parameters(d, prev + 3*SUBWIN, &lag, &gain);

 /* Remove the long-term predictable parts. */
 for (i = 0; i < SUBWIN; i++)
 d[i] -= gain * prev[3 * SUBWIN - lag + i];

 /* Simulate encoding and transmission of the residual. */
 reduce(d);


 /* Estimate the signal from predicted parameters and residual. */
 for (i = 0; i < SUBWIN; i++)
 d[i] += gain * prev[3*SUBWIN - lag + i];

 /* Shift the SUBWIN new estimates into the past. */
 for (i = 0; i < 2*SUBWIN; i++) prev[i] = prev[i + SUBWIN];
 for (i = 0; i < SUBWIN; i++) prev[2*SUBWIN + i] = d[i];
 }
}





















































December, 1994
Intelligent XYModem


Establishing a common ground for XModem/YModem dialects




Tim Kientzle


More on XYModem and other serial protocols can be found in Tim's upcoming book
from Coriolis Group Books. Tim can be contacted at kientzle@netcom.com.


When Ward Christensen upgraded his MODEM program to MODEM2 in 1977, he added a
method for transferring binary files between two CP/M computers. This
protocol, which became known as "XModem," spread rapidly, primarily because of
its simplicity. But this popularity came at a price. New implementations
often incorporated minor changes to the original protocol, resulting in a
babel of incompatible "dialects."
Today, programs that attempt to support a wide variety of file-transfer
protocols sometimes list as many as eight XModem and YModem variations. Worse,
the same dialect can appear under different names: What some systems refer to
as "XModem-1k" is referred to by others as "YModem." This lack of
standardization confuses users; even if the local terminal program and remote
host are using identically named protocols, they still may not be able to
transfer a file!


The XYModem Protocol


About five years ago, I began to develop techniques for automatically
determining which XModem or YModem dialect is being used by the other end. The
result is a single protocol, which I call "XYModem," implemented in the
program XY.C. Applications using this "smart" XYModem protocol can transfer
files with any of the common XModem or YModem dialects. Listing One presents
excerpts from XY.C. The complete implementation is available electronically;
see "Availability," page 3.
The major advantage of this combined protocol over more traditional
implementations is that it simplifies the user interface. A smart
implementation reduces the need for the user to set his or her program to
exactly match the other end. It also significantly reduces the number of
settings and technical terms that the user has to contend with. 
Listing One is organized by layers. Each layer reflects a different level of
functionality, ranging from basic serial I/O to a complete transfer session. 


Packet Layer


The lowest layer is the packet layer. XYModem has data packets (Table 1)
and control packets (Table 2). One of XYModem's weaknesses, compared to
protocols such as Kermit or ZModem, is that control packets do not contain
error-check information. This means that higher layers need to allow for
altered or spurious control packets.


Reliability Layer


The reliability layer is responsible for ensuring that data packets are
correctly transferred. It calls packet-layer functions to send or receive data
and interprets responses and errors. The receiver responds to most errors by
waiting for the line to clear and sending a NAK to request that the packet be
resent.
XYModem is receiver driven, so the XYModem receiver should always time out
before the sender. To simplify the UI, I usually offer the user a short menu
of timeouts. I then subtract one second from the user's choice to get the
receiver timeout and multiply the user's choice by 5 to get the sender timeout.
Thus, the default user setting of 10 seconds results in a 9-second receiver
timeout and a 50-second sender timeout.
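The timeout rule above can be written down directly (the function and its names are mine, not taken from XY.C):

```c
#include <assert.h>

/* Derive receiver and sender timeouts from the user's menu choice so
 * that the receiver always times out before the sender. */
void xymodem_timeouts(int user_seconds, int *receiver, int *sender)
{
    *receiver = user_seconds - 1;   /* e.g. 10 -> 9 seconds  */
    *sender   = user_seconds * 5;   /* e.g. 10 -> 50 seconds */
}
```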


File Layer


The file layer, responsible for transferring a single file, involves a
handshake negotiation and an optional batch-header transfer, followed by the
transfer of the actual file data. 
By convention, text files are converted to a neutral format with CR/LF
terminating each line, and the last packet is filled out with SUB characters
(Ctrl-Z). This corresponds to the CP/M file format used by the first XModem
implementations. For optimal compatibility with older systems, it's advisable
to ensure that text files end with CR/LF, and that there is at least one SUB
character after the end of the file data.
Handshake. The file layer's handling of the initial handshake is a bit
complicated because the handshakes used by different dialects overlap. In
particular, the C handshake is used by both the non-batch XModem-CRC dialect
and the batch YModem dialect. Table 3 lists the handshake sequences that may
be sent by the XYModem receiver. (Forsberg, the inventor of YModem, allows a
YModem transfer to begin with a NAK handshake, specifying Checksum error
detection. However, this is rarely used.) 
The receiver starts with a handshake based on the user's protocol setting,
which should default to YModem. The receiver repeats the initial handshake
three times; if there is no response, it tries a different handshake, as
indicated under "Fallback" in Table 3. After a total of ten timeouts, the
receiver fails.
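That fallback schedule can be sketched as follows. This is my own simplified rendering of the rule (it ignores the ten-timeout limit and XY.C's option locking); the fallback chain matches Table 3:

```c
#include <assert.h>

/* Return the handshake character to send on the n-th attempt (0-based):
 * each handshake is repeated three times before falling back.
 * 'N' stands for the NAK handshake. */
char handshake_for_attempt(char start, int attempt)
{
    char h = start;
    int falls;
    for (falls = attempt / 3; falls > 0; falls--) {
        switch (h) {
        case 'G': h = 'C'; break;   /* YModem-G falls back to C   */
        case 'C': h = 'N'; break;   /* C falls back to NAK        */
        default:  h = 'G'; break;   /* NAK falls back to G        */
        }
    }
    return h;
}
```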
When the sender sees a handshake it recognizes, it responds with one of two
packets--a filename packet (to attempt a batch-mode transfer) or the first
packet of data (for a non-batch-mode transfer)--depending on the receiver's
handshake. After three failures, the sender tries the other. Once the receiver
acknowledges one of these packets, the sender knows whether or not batch mode
will be used.
At this point, the sender and receiver agree on the batch mode and error-check
method. The only remaining possibility for disagreement is the packet size.
Unfortunately, it's dangerous to try sending the same data with different
packet sizes just to discover the receiver's capabilities. For example,
spurious ACKs can cause both sides to believe the file transfer has succeeded,
even though data has been duplicated. (ZModem, which uses file position rather
than packet number to identify data, is immune to this problem.) So the
XYModem sender assumes that all batch receivers can handle long data packets
and relies on the initial handshake to determine if nonbatch receivers can
handle long packets.
To show how these rules work in practice, Figures 1 and 2 illustrate
negotiations between XYModem and non-XYModem implementations. Figure 1
demonstrates an XYModem sender attempting to transfer a file with an
XModem-CRC receiver. The XYModem sender initially sends a batch header, but
after three NAKs changes to a nonbatch transfer, which succeeds. In this case,
the startup is quite fast, since the XModem-CRC receiver is likely to NAK the
unrecognized packets immediately. Other instances of an XYModem sender
transferring data to a non-XYModem receiver work similarly.
Figure 2 demonstrates an XYModem receiver attempting to transfer a file with a
YModem-G sender. The XYModem receiver has been configured to start with plain
YModem. This example shows that, although the XYModem receiver will eventually
manage to negotiate a common protocol, it can be time consuming. If the user
can choose the protocol for the XYModem receiver to try first, the negotiation
is much quicker. Note that if the sender were an XYModem sender, it would
recognize and respond to the first handshake and a normal YModem transfer
would take place immediately.
Nonbatch File Transfer. In nonbatch mode, the file data is transferred
beginning with packet number one. After the receiver acknowledges the last
data packet, the sender sends an EOT and waits for the receiver's
acknowledgment. Figure 3 illustrates this exchange. Notice that because EOT
can conceivably be generated by noise, the receiver sends NAK to challenge the
first EOT. (Older nonbatch senders don't wait for the acknowledgment, so a
receiver using this technique should be prepared to terminate gracefully if
there is no response to the NAK.)
Batch File Transfer. Instead of simply transferring the file contents as
Figure 3 indicates, the sender may attempt a batch transfer. The batch
transfer consists of adding a packet zero with data about the file; see Figure
4. If the receiver fails to acknowledge packet zero, the sender should try
sending the first packet of data and continue in nonbatch mode if the receiver
acknowledges.
Packet zero contains two consecutive, null-terminated strings. The remainder
of the packet is filled with nulls. The first string is the mandatory
filename; the second contains the file size as a decimal number, the file's
modification time (an octal number of seconds since January 1, 1970),
and the file's mode (as a UNIX-style octal number). These three fields are
separated by blanks and are all optional. If any field is omitted, all
following fields must also be omitted. Also, if the file size is unknown
(which is common for text files that require end-of-line conversion), the size
field must be omitted.
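Packet zero might be built like this (a hypothetical sketch following the layout above; the helper name and the 100-byte filename limit are my own choices):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fill a 128-byte packet zero: a null-terminated filename, then an
 * optional "size mtime mode" string (decimal, octal, octal); the rest
 * of the packet stays zeroed. Pass size < 0 to omit all optional
 * fields, as required when the size is unknown. */
void make_packet_zero(char packet[128], const char *name,
                      long size, long mtime, long mode)
{
    size_t n;
    memset(packet, 0, 128);
    strncpy(packet, name, 100);     /* leave room for the info string */
    n = strlen(packet) + 1;         /* info string follows the NUL */
    if (size >= 0)
        sprintf(packet + n, "%ld %lo %lo", size, mtime, mode);
}
```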


Session Layer



The session layer handles multiple files. After the first file is transferred,
the protocol knows if batch mode is being used. If so, the session layer loops
to repeatedly transfer the remaining files and does a final transfer of a file
with no name to terminate the session. Otherwise, it simply terminates after
transferring the first file.


XYModem Implementation


I've implemented the XModem and YModem protocols, both separately and in the
combined form discussed earlier, several times over the last ten years. Each
time I've learned a few more tricks. 
One of the major reasons file-transfer protocols exist is to deal with errors.
Good error handling can't be easily added as an afterthought. The approach I
prefer is for each function to return a status code: 0 if the function
succeeded with no errors, or a code indicating the type of error. A collection
of constants defines the error values. Using this approach, most function
calls end up looking somewhat like Example 1. 
Three basic serial functions simplify the implementation of most file-transfer
protocols, while improving portability. 
ReadBytesWithTimeout accepts a buffer and a timeout interval and returns when
the buffer is filled or the timeout expires.
SendBytes queues bytes to be sent.
WaitForSentBytes waits until the outgoing serial queue is empty.
Each of these returns a status code to indicate one of the following:
The operation was completed successfully.
A user interrupt was detected. This result is eventually returned to the top
layer of the transfer code, which sends the abort sequence.
There was a fatal serial error which cannot be recovered from (loss of
carrier, for example).
There was a nonfatal serial error which invalidates the received data (a
framing or overrun error, for example).
There was a timeout.
There are two reasons why it's useful to implement sending as two separate
functions. First, accurate timing requires that the protocol not begin waiting
for a response until the data has actually been sent. Calling WaitForSentBytes
after each packet is sent helps to ensure this. Second, it allows you to queue
data for sending and then do additional work while the serial queue empties.
In particular, this speeds sending packets, since the CRC calculation can be
started after the data bytes are queued for sending.
Providing good user feedback for file transfers requires balancing user
requirements against efficiency. If you don't update the display for every
transfer event, then at low baud rates a user may easily believe that the
transfer has stopped. If you do update the display, then at high baud rates
the overhead of updating can slow the transfer. (Excessive screen updates have
been known to slow transfers by 20 percent or more.)
My solution is to generate frequent updates to the screen information, but to
limit the frequency at which the screen is actually updated. It is usually
easy to record the time of the last update and only change the screen if there
is some unusual condition or if a minimum time has elapsed.
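A minimal sketch of that throttling idea (the one-second threshold and the names are my own):

```c
#include <assert.h>
#include <time.h>

/* Redraw only when something unusual happened (`force`) or at least
 * one second has passed since the last redraw. */
int should_update(time_t now, time_t *last, int force)
{
    if (!force && now - *last < 1)
        return 0;                   /* too soon: skip this redraw */
    *last = now;
    return 1;
}
```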
Care also must be given when deciding what information to display. Too much
information can be intimidating to nontechnical users. Table 4 lists the
information I have found to be consistently useful.
Many programs overwhelm the user with a wealth of settings. My preference is
to provide only two: a timeout setting with options of 5, 10, or 20 seconds;
and a protocol setting with choices of XModem, YModem, or YModem-G. The other
settings commonly provided by terminal programs are either not useful in
practice, or made largely obsolete by the automatic negotiation of the XYModem
protocol.
The only real trick to implementing the combined protocol is dealing with
ambiguity. For example, if the sender sees a C handshake, then the receiver
may or may not support batch transfers. To help with this, my implementation
maintains a structure with information about which protocol options are
currently active. Each option contains two variables: one to indicate whether
that option is currently enabled, and another to indicate if that option is
locked or not. During the handshake negotiation, the program uses the current
options, changing them after repeated errors. When there is evidence that a
particular option is correct, that option is locked and not changed again
during the session.


Summary


XModem and YModem are often criticized for being difficult to use. Much of
that difficulty is due to unrefined user interfaces which require the user to
know many details of protocol operation. By shifting the burden to the
implementation and using automatic negotiation, the UI complexity can be
significantly reduced.
Table 1: XYModem data packet.
Size Description 
1 Start byte: 1 (SOH) for a 128-byte packet or 2 (STX) for a 1024-byte packet.
1 Packet-sequence number.
1 Complement of sequence number.
128 or 1024 Data.
1 or 2 1-byte checksum or 2-byte CRC-16.
Table 2: XYModem control packets.
Control Packet Description 
ACK Sent by receiver to acknowledge successful receipt
 of a packet; sender responds with next data packet 
 or EOT.
NAK Sent by receiver to request repeat of previous packet.
EOT Sent by sender to indicate end of file data.
CAN CAN Sent by either side to abort transfer.
Table 3: Handshakes.
Handshake Fallback Error Check Batch Long Packets Dialect 
NAK G Checksum No No XModem
C NAK CRC-16 No Yes XModem-1K
C NAK CRC-16 No No XModem-CRC
C NAK CRC-16 Yes Yes YModem
G C CRC-16 Yes Yes YModem-G
Table 4: Progress information.
Relative progress. During sends or during batch receives, the file size should
be available so it's possible to compute the relative progress and display it
using a "thermometer" bar. If the file size is not available, then the number
of bytes transferred can be substituted.

Current filename. The receiver should show the filename under which the file
is being stored, rather than the name specified by the sender. These may
differ if the name given by the sender is illegal on this particular system or
the receiver is modifying names to avoid overwriting existing files.


Estimated time remaining. On some systems, this display can be updated every
second, providing a reassuring, steady countdown to the user. This should be
computed from a decaying average of the measured transfer speed, not from the
serial-port baud rate.

Throughput estimate. Percent-efficiency estimates, while popular, can be
misleading. Using newer, speed-buffering modems, the "efficiency" may appear
very low because the modems are connected at a low speed. I prefer to display
a throughput estimate in bits per second. Since modems are commonly rated in
bits per second, these numbers are more likely to be meaningful to the user.

Status. Detailed status messages are not very useful to the user and can
actually slow down the transfer if too much time is spent frequently updating
the display. I prefer to limit the status during the transfer to:
Negotiating..., Sending..., Receiving..., and Finishing.... When the transfer stops,
the status shows either Finished, Failed, or Aborted.
Figure 1: Handshake example: XModem-CRC receiver, XYModem sender.
 Receiver Sender
Send C handshake
 Send filename packet
 NAK
 Send filename packet
 NAK
 Send filename packet
 NAK
 Send first data packet
 Acknowledge

Nonbatch XModem-CRC transfer proceeds
Figure 2: Handshake example: XYModem receiver, YModem-G sender.
 Receiver Sender
Send C handshake
 Timeout
Send C handshake
 Timeout
Send C handshake
 Timeout
Send NAK handshake
 Timeout
Send NAK handshake
 Timeout
Send NAK handshake
 Timeout
Send G handshake
 Send filename packet

 YModem-G transfer proceeds
Figure 3: XYModem nonbatch file layer.
 Receiver Sender 
Send handshake
 Send packet one
 Acknowledge
 .
 .
 .
 Send last packet
 Acknowledge
 Send EOT
 NAK
 Resend EOT
 Acknowledge
 End of file
Figure 4: XYModem batch file layer.
 Receiver Sender 
 Send handshake
 Send packet zero
 Acknowledge
Repeat handshake
 Send packet one
 Acknowledge

 .
 .
 .
 Send EOT
 NAK
 Resend EOT
 Acknowledge
 End of file
Example 1: XYModem function calls.
int status;
status = function(arguments);
if (status == special value)
 fix problem;
else if (status != 0)
 return status;

Listing One 

/****************************************************************************
** Excerpts from XY.C **
****************************************************************************/

/**** Every function returns a status code. These macros help. ****/
#define StsWarn(s) Sts_Warn(s,__FILE__,__LINE__)
#define StsRet(e) do{int tmp_s; if ((tmp_s = (e)) != xyOK) \
 return StsWarn(tmp_s);}while(FALSE)
static int Sts_Warn(int s,char *file,int line)
{ switch(s) {
 case xyFail: fprintf(stderr,"!?:%s:%d:xyFail\n",file,line); break;
 /* ... other cases deleted ... */ }
 return s;
}
/* A 'capability' has two fields. The 'enabled' field determines current state
of the capability, while the 'certain' field determines whether the capability
can still change. Whenever we have evidence of a capability (for example, we
receive an acknowledge for a long packet), set the 'certain' field to TRUE. */

typedef struct CAPABILITY { int enabled, certain; } CAPABILITY;

/* This structure contains the current state of the transfer. By keeping all 
information about the transfer in a dynamically-allocated structure, we can 
allow multiple simultaneous transfers in a multi-threaded system. */

typedef struct XYMODEM XYMODEM;
struct XYMODEM {
 CAPABILITY crc, /* Does other end support CRC? */
 longPacket, /* Does other end support 1K blocks? */
 batch, /* Does other end support batch? */
 G; /* Does other end support `G'? */
 int timeout; /* Number of seconds timeout */
 int retries; /* Number of times to retry a packet */
 int userAbort; /* Set when user asks for abort */
 int packetNumber; /* Current packet number */
 PORT_TYPE port; /* Port to use */
 FILE * f; /* Current file */
 char fileName[128]; /* Name of file being transferred */
 long fileSize; /* Size of file being transferred, -1 if not known */
 long fileDate; /* Mod time, GMT, in seconds after 1/1/1970 */
 long fileMode; /* Unix-style file mode */

 long transferred;/* Number of bytes transferred so far */
};

/**************************** Packet Layer *********************************/
/* XYSendPacket. Send an XYModem data packet. Length must be 1024 or less.
 Packets shorter than 128 bytes, or between 128 and 1024 bytes, are padded
 with SUB characters up to the next legal packet size. */

static int XYSendPacket(XYMODEM *pXY, BYTE *pBuffer, int length)
 /* ... body deleted ... */

/* XYReceivePacket. The receiver must be able to receive three different types
of packets: data packets start with a recognizable 3-byte sequence; EOT packets
consist of a single EOT character; CAN packets consist of two consecutive CAN
characters. We use a shorter timeout between bytes within a single packet than
we do between packets. This helps speed error recovery. */

static int XYReceivePacket(XYMODEM *pXY, int *pPacketNumber,
 BYTE *pBuffer, int *pLength)
{
 BYTE startOfPacket = 0;
 BYTE packet = 0;
 BYTE packetCheck = 0;

/* This loop searches incoming bytes for a valid packet start, which reduces */
/* our sensitivity to inter-packet noise. It also checks for EOT and CAN */
/* packets. */
 StsRet(XYReadBytesWithTimeout(pXY,pXY->timeout,&packetCheck,1));
 if (packetCheck == EOT) return xyEOT;
 do {
 startOfPacket = packet; packet = packetCheck;
 StsRet(XYReadBytesWithTimeout(pXY,pXY->timeout,&packetCheck,1));
 if ((packetCheck == CAN) && (packet == CAN)) return StsWarn(xyFailed);
 } while ( ( (startOfPacket != 0x01) && (startOfPacket != 0x02) )
 || ( ((packet ^ packetCheck) & 0xFF) != 0xFF ) );
 /* ... rest of body deleted ... */
}
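The start-of-packet search above can be mirrored in a self-contained sketch that scans a buffer instead of a serial port. FindPacketStart is my name for the helper; the validity test is the same: SOH (0x01) or STX (0x02), then a packet number, then that number's one's complement.

```c
/* Slide a 3-byte window over the bytes in s[0..n-1] until it holds a
   plausible packet header: SOH (0x01) or STX (0x02), a packet number, and
   the packet number's one's complement. Returns the offset of the SOH/STX
   byte (storing the packet number through pPacketNum), or -1 if no valid
   header is found. */
static int FindPacketStart(const unsigned char *s, int n, int *pPacketNum)
{
    unsigned char startOfPacket = 0, packet = 0, packetCheck = 0;
    int i;
    for (i = 0; i < n; i++) {
        /* shift the window by one byte */
        startOfPacket = packet; packet = packetCheck; packetCheck = s[i];
        if ((startOfPacket == 0x01 || startOfPacket == 0x02)
            && ((packet ^ packetCheck) & 0xFF) == 0xFF) {
            *pPacketNum = packet;
            return i - 2;  /* index of the SOH/STX byte */
        }
    }
    return -1;
}
```

Leading noise bytes simply fall out of the window, which is exactly why the loop form is less sensitive to inter-packet garbage than a one-shot read would be.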

/************************* Reliability Layer ******************************/
/* XYSendPacketReliable. Repeatedly sends a packet until it is acknowledged. 
Note that only a slight change is required to handle the YModem-G protocol. */

static int XYSendPacketReliable(XYMODEM *pXY, BYTE *pBuffer, int length)
{
 int err;
 BYTE response = ACK;
 do {
 StsRet(XYSendPacket(pXY, pBuffer, length));
 if (pXY->G.enabled) return xyOK;
 do { /* Read ACK or NAK response */
 err = XYReadBytesWithTimeout(pXY, pXY->timeout, &response, 1);
 if (err == xyTimeout) return StsWarn(xyFail);
 StsRet(err);
 } while ((response != ACK) && (response != NAK));
 } while (response != ACK);
 pXY->crc.certain = TRUE; /* Checksum/crc mode is now known */
 if (length > 128)
 pXY->longPacket.enabled = pXY->longPacket.certain = TRUE;
 return xyOK;
}


/* XYReceivePacketReliable. Handles NAK of data packets until a valid packet
is received. The next layer up is responsible for sending the ACK and dealing
with packet sequencing issues. EOT packets are handled here by the following 
logic: An EOT is considered reliable if it is repeated twice by the sender 
or if we get three consecutive timeouts after an EOT. Cancel packets are 
already reliable. We don't ACK packets here so the next layer up can do some
work (opening files, etc.) before the ACK is sent. That way, we can avoid
having to deal with the issue of overlapped serial and disk I/O. */

static int XYReceivePacketReliable(XYMODEM *pXY, int *pPacket,
 BYTE *pBuffer, int *pLength)
 /* ... body deleted ... */

/************************* File Layer *************************************/
/* Read and interpret receiver's handshake */
static int XYSendReadHandshake(XYMODEM *pXY, BYTE *pResponse)
{
 int err;
 do { /* Read handshake */
 err = XYReadBytesWithTimeout(pXY, pXY->timeout, pResponse, 1);
 if (err == xyTimeout) return xyFail;
 if (err != xyOK) return err;
 /* Interpret the receiver's handshake */
 switch(*pResponse) {
 case 'G': if (pXY->G.enabled) return xyOK;
 if (!pXY->G.certain) pXY->G.enabled = TRUE;
 if (!pXY->G.enabled) break;
 return xyOK;
 case 'C': /* ... deleted ... */
 case NAK: /* ... deleted ... */
 case ACK: return xyOK;
 default: break;
 }
 } while (TRUE);
}
/* Send packet zero or one, depending on batch mode. If there are too many
 failures, we swap batch mode. The buffer is passed as a parameter to reduce 
 our stack requirements. */ 
static int XYSendFirstPacket(XYMODEM *pXY, BYTE *pBuffer, int bufferLength )
{
 int err;
 int totalRetries = pXY->retries;
 int retries = pXY->retries / 2;
 BYTE response = ACK;
 int dataLength = 0;
 /* Get initial handshake */
 do {
 StsRet(XYSendReadHandshake(pXY,&response));
 } while (response == ACK);
 do {
 /* Send packet 0 or 1, depending on current batch mode */
 if (pXY->batch.enabled) { StsRet(XYSendPacketZero(pXY));
 if (pXY->G.enabled) return xyOK;
 } else {
 if (dataLength == 0) { /* Get packet 1 */
 dataLength = (pXY->longPacket.enabled)?1024:128;
 err = XYFileRead(pXY, pBuffer, &dataLength);
 }

 pXY->packetNumber = 1;
 StsRet(XYSendPacket(pXY,pBuffer,dataLength));
 }
 /* Get response or repeated handshake */
 StsRet(XYSendReadHandshake(pXY,&response));
 /* Count down number of retries */
 if (response != ACK) {
 if (retries-- == 0) {
 if (!pXY->batch.certain)
 pXY->batch.enabled = !pXY->batch.enabled;
 if (!pXY->longPacket.certain)
 pXY->longPacket.enabled = pXY->batch.enabled;
 retries = 2;
 }
 if (totalRetries-- == 0) return xyFail;
 }
 } while (response != ACK);
 if ( (pXY->packetNumber == 0) && (dataLength > 0) ) {
 pXY->packetNumber++;
 StsRet(XYSendPacketReliable(pXY, pBuffer, dataLength));
 pXY->transferred += dataLength;
 }
 pXY->G.certain = pXY->batch.certain = TRUE; /* batch mode is now known */
 return xyOK;
}
/* Send EOT and wait for acknowledgement */
static int XYSendEOT(XYMODEM *pXY) /* ... body deleted ... */

static int XYSendFile(XYMODEM *pXY) /* ... body deleted ... */

/* Shuffles capabilities for falling back to a lower protocol. Note that we
 actually `fall' from basic XModem back up to YModem-G, to handle obstinate
 senders that may be looking for only a `G' or `C' handshake. */
static int XYReceiveFallback(XYMODEM *pXY)
{
 if (pXY->G.enabled) pXY->G.enabled = FALSE;
 else if (pXY->crc.enabled) pXY->crc.enabled
 = pXY->batch.enabled = pXY->longPacket.enabled = FALSE;
 else pXY->G.enabled = pXY->batch.enabled
 = pXY->longPacket.enabled = pXY->crc.enabled = TRUE;
 return xyOK;
}
/* Send the correct handshake for the current capabilities. */
static int XYReceiveSendHandshake(XYMODEM *pXY) /* ... body deleted ... */

static int XYReceiveFile(XYMODEM *pXY)
{
 BYTE data[1024];
 int dataLength, err = xyOK, packetNumber;
 int retries = pXY->retries/2 + 1;
 int totalRetries = (pXY->retries * 3)/2+1;
 /* Try different handshakes until we get the first packet */
 XYNewFile(pXY);
 XYProgress(pXY,stsNegotiating);
 do { 
 if (--retries == 0) {
 XYReceiveFallback(pXY);
 XYProgress(pXY,stsNegotiating);
 retries = (pXY->retries/3);

 }
 if (totalRetries-- == 0) return xyFail;
 StsRet(XYReceiveSendHandshake(pXY));
 err = XYReceivePacket(pXY, &packetNumber, data, &dataLength);
 if (err == xyEOT) /* EOT must be garbage... */
 StsRet(XYGobble(pXY, pXY->timeout/2));
 if (err == xyBadCRC) /* garbaged block */
 StsRet(XYGobble(pXY,pXY->timeout/3));
 } while ( (err == xyTimeout) || (err == xyBadCRC) || (err == xyEOT) );
 StsRet(err);
 if ((packetNumber != 0) && (packetNumber != 1)) return xyFail;
 /* The first packet tells us the sender's batch mode */
 /* If batch mode is certain, then a mismatch is fatal. */
 /* Note that batch mode is never certain for the first file. */
 if (packetNumber == 0) pXY->batch.enabled = TRUE;
 else if (pXY->batch.enabled && pXY->batch.certain) return xyFail;
 else pXY->batch.enabled = FALSE;
 pXY->batch.certain = TRUE;
 /* Open the file and make sure `data' contains the first part of file */
 if (packetNumber == 0) {
 if (data[0] == 0) { /* Empty filename? */
 StsRet(XYSendByte(pXY,ACK)); /* Ack packet zero */
 return StsWarn(xyEndOfSession);
 }
 StsRet(XYFileWriteOpen(pXY, data, dataLength));
 StsRet(XYSendByte(pXY,ACK)); /* Ack packet zero */
 StsRet(XYReceiveSendHandshake(pXY));
 err = XYReceivePacketReliable(pXY, &packetNumber, data, &dataLength);
 } else
 StsRet(XYFileWriteOpen(pXY, NULL, 0));
 pXY->packetNumber = 1; pXY->transferred = 0;
 XYProgress(pXY,stsReceiving);
 /* We have the first packet of file data. Receive remaining packets and
 write it all to the file. Note that we're careful to ACK only after
 file I/O is complete. */
 while (err == xyOK) {
 if (packetNumber == (pXY->packetNumber & 0xFF)) {
 if ( (pXY->fileSize >= 0)
 && (dataLength + pXY->transferred > pXY->fileSize) )
 dataLength = pXY->fileSize - pXY->transferred;
 StsRet(XYFileWrite(pXY,data,dataLength));
 pXY->transferred += dataLength;
 pXY->packetNumber++;
 XYProgress(pXY,stsReceiving);
 StsRet(XYSendByte(pXY,ACK)); /* Ack correct packet */
 } else if (packetNumber == ((pXY->packetNumber-1) & 0xFF))
 StsRet(XYSendByte(pXY,ACK)); /* Ack repeat of previous packet */
 else return xyFail; /* Fatal: wrong packet number! */
 err = XYReceivePacketReliable(pXY, &packetNumber, data, &dataLength);
 }
 /* ACK the EOT. Note that the Reliability layer has already */
 /* handled a challenge, if necessary. */
 if (err == xyEOT) {
 XYProgress(pXY,stsFinishing);
 err = XYSendByte(pXY,ACK);
 }
 StsRet(XYFileWriteClose(pXY));
 StsRet(err);
 if (!pXY->batch.enabled) return StsWarn(xyEndOfSession);

 return xyOK;
}
/************************ Session Layer *********************************/
static int XYSend(XYMODEM *pXY)
{
 int err;
 XYNewFile(pXY);
 err = XYFileReadOpenNext(pXY);
 while (err == xyOK) {
 err = XYSendFile(pXY);
 XYFileReadClose(pXY);
 XYNewFile(pXY);
 if (err == xyOK) err = XYFileReadOpenNext(pXY);
 }
 if (err == xyEndOfSession) {
 err = xyOK;
 if (pXY->batch.enabled)
 err = XYSendSessionEnd(pXY); /* Transfer empty filename */
 }
 if (err == xyFail) {
 static BYTE cancel[] = {CAN,CAN,CAN,CAN,CAN,8,8,8,8,8};
 XYSendBytes(pXY,cancel,sizeof(cancel)/sizeof(cancel[0]));
 return xyFailed;
 }
 if (err == xyOK) XYProgress(pXY,stsFinished);
 else if (pXY->userAbort) XYProgress(pXY,stsAborted);
 else XYProgress(pXY,stsFailed);
 return xyOK;
}
/* The session layer simply receives successive files until end-of-session */
static int XYReceive(XYMODEM *pXY) /* ... body deleted ... */

/******************* Public Interface Layer ******************************/
/* Initialize the XY structure from the user preferences. Protocol codes are 
defined in xy.h, timeout is in seconds. */
int XYModemInit( void **ppXY, int protocol, int timeout, PORT_TYPE port)
 /* ... body deleted ... */

/* The public interfaces are simply stubs that call the internal XYSend/
XYReceive functions. We isolate the interface so it can be easily changed. */
int XYModemSend(void *pXY_public, char *filenames[], int count)
 /* ... body deleted ... */
int XYModemReceive( void *pXY_public)
 /* ... body deleted ... */




















December, 1994
Error-Recovery Codes


Powerful codes for reliable data communications




Bart De Canne


Bart is project manager for digital video communications in the Broadcast &
Cable division of Barco nv., Kortrijk, Belgium. He can be reached at
bdc@barco.be.


The need for reliable, high-speed transmission of video, audio, and data
applications has never been greater. One of the keys to reliable
communications is error recovery--and Bose-Chaudhuri-Hocquenghem (BCH) and
Reed-Solomon (RS) codes are two powerful approaches to error-control coding.
RS, in fact, is specified in virtually every proposed digital-TV transmission
standard that tolerates less than one uncorrected error event per hour. At a
multiplexed transmission rate of about 30 Mbits/sec, this corresponds to a
bit-error rate (BER) on the order of 10^-11. With transmission channels
typically having a BER of 10^-3 to 10^-4, powerful error-correction
techniques are obviously required.
In this article, I'll examine BCH and RS codes, using them to encode data in
such a way that data errors can be recovered. In doing so, I'll present
GALOIS.EXE, a C program for constructing Galois fields and BCH/RS generators
and for encoding/decoding of arbitrary files. The program also includes a
feature to "scramble" an encoded file with a specified number of symbol errors
per block to simulate error correctability. The complete package, including
programmer notes, is available electronically; see "Availability," page 3. 


Polynomials and Galois Fields


As Figure 1 illustrates, cyclic-linear block codes, like those discussed in
this article, convert blocks of L symbols at the encoder input into blocks of
N symbols (N>L). They're called "cyclic" because each shift of a valid
N-length code word yields another code word. For these codes, the
input/output relation can easily be expressed using polynomials. The
"generator polynomial" g(D) completely characterizes a particular code.
A finite field (or Galois field) with p elements is denoted GF(p). The
field's elements are denoted by 0, 1, ..., p-1. On each pair of elements,
both addition and multiplication operations are defined. If the field has a
prime number of elements, addition and multiplication are performed the
normal way, with the result taken modulo p. The implied modulo operation
keeps the result of adding or multiplying two field elements within the
same GF.
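The prime-field case can be sketched in a few lines of C. The names and the choice p=5 are mine; the operations are just ordinary integer arithmetic reduced modulo p, with the multiplicative inverse found by exhaustive search (fine for small p).

```c
/* GF(p) arithmetic for prime p; here p = 5. */
enum { P = 5 };

static int gf_add(int a, int b) { return (a + b) % P; }
static int gf_mul(int a, int b) { return (a * b) % P; }

/* Multiplicative inverse by exhaustive search; 0 has no inverse. */
static int gf_inv(int a)
{
    int x;
    for (x = 1; x < P; x++)
        if (gf_mul(a, x) == 1)
            return x;
    return 0;
}
```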
However, there exist fields with a non-prime number of elements. In such
cases, the number of GF elements is a power of a prime number, so the field
can be expressed as GF(p^n) with p prime. Here I'll refer to GF(p) as the
"symbol field" and GF(p^n) as the "locator field." You can express an element
of GF(p^n) as a row of n GF(p) elements and hence as a polynomial of a degree
no larger than n-1, with each coefficient value belonging to GF(p).
How do you construct GF(p^n)? The addition of two polynomials of degree n-1
yields another polynomial of degree n-1 or less. This is not the case for
multiplication. The resulting polynomial may not be a field element. However,
if you reduce the polynomial by performing the modulo operation using a
degree-n polynomial p(D), the result will be of degree n-1 or less. To
construct a valid field, you choose p(D) to be a "primitive" polynomial. For
GF(3^2), such a polynomial is D^2+D+2. Table 1 lists the representation of
the elements of this Galois field. Note that you can represent a GF(p^n)
element a^i (i=0..p^n-2) by the polynomial D^i modulo g(D), g(D) being the
primitive polynomial for primitive element a. Since the primitive polynomial
is of degree n and is irreducible, you can find a polynomial for generating
GF(p^n) by taking an irreducible polynomial of this degree and calculating
D^i mod g(D) for i=0,...,p^n-2. If this yields p^n-1 distinct values (all
elements of GF(p^n) except 0), g(D) is primitive.
All operations are performed on coefficient values belonging to the
already-constructed symbol field. Listing One is the code for searching for
such a primitive polynomial (there's always at least one for each GF).
Afterwards, addition and multiplication operations on each pair of GF(p^n)
elements are defined by means of the polynomial representation of these
operands: by adding or multiplying the corresponding polynomials modulo g(D).
The result is defined to be the polynomial representation of the GF element
corresponding to the sum or product of both operands. GALOIS.EXE stores this
value into a table defining the field's operations. Listing Two provides the
syntax used in the Galois program to store these and additional
one-dimensional tables for additive and multiplicative inverses. Table 2
shows addition and multiplication tables for GF(2^2), constructed out of
GF(2) using D^2+D+1 as a primitive polynomial over GF(2). Note that
operations in GF(4) are not ordinary addition and multiplication modulo 4 (4
is not prime!). You could apply this procedure, for instance, to repeatedly
generate GF(4^2) out of the previously created GF(4) by using D^2+D+2 over
GF(4). Table 3 shows the corresponding elements of GF(16).
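The GF(4) case is small enough to sketch directly. Below, each element is a 2-bit polynomial in D; addition is coefficient-wise XOR and multiplication is carry-less polynomial multiplication reduced modulo D^2+D+1. The function names are mine, and the checks confirm exactly the point made above: 1+1 is 0 (not 2), so this is not arithmetic modulo 4.

```c
/* Elements of GF(2^2) as 2-bit polynomials in D (values 0..3), with
   coefficients in GF(2). Addition is coefficient-wise XOR. */
static int gf4_add(int a, int b) { return a ^ b; }

/* Multiplication: carry-less polynomial multiply, then reduce modulo the
   primitive polynomial D^2 + D + 1 (bit pattern 0b111). */
static int gf4_mul(int a, int b)
{
    int prod = 0, i;
    for (i = 0; i < 2; i++)          /* carry-less (GF(2)) multiply */
        if (b & (1 << i))
            prod ^= a << i;
    if (prod & 0x4)                  /* reduce the D^2 term: D^2 = D + 1 */
        prod ^= 0x7;
    return prod;
}
```

With a = D (value 2), the powers a, a^2, a^3 run through 2, 3, 1: the root of D^2+D+1 generates all nonzero elements, which is what "primitive" means.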


Generating BCH/RS codes


BCH codes are designed for powerful error recovery--if you need more
robustness than Hamming provides, BCH is the way to go. The first step in
generating a specific BCH or RS code is the choice and construction of both
GF(p) and GF(p^n) fields. GF(p) contains the symbols and defines operations
on these symbols. In the case of GF(2), a symbol is just a bit; if p is equal
to 16, the code deals with 16-ary symbols, each one represented by four bits.
The block length N of the code depends on your choice of n in GF(p^n), since
N=p^n-1 for BCH and RS codes. So if you choose p=16 and n=2, the coder will
generate blocks of 255 symbols, each symbol consisting of four bits. Just one
parameter remains to be determined: the number of symbol errors (not bit
errors) per block N that can be recovered; this is denoted by t. Up to four
bit errors cause only one symbol error if all the erroneous bits fall within
the same symbol boundary.
Coding theory proves that RS codes maximize error correctability for a set of
possible code words having a specific minimum distance (the minimum Hamming
distance between two valid N-length code words). The family of BCH codes
approximates this theoretical limit. The generators for BCH and RS codes are
given in Figure 2. RS codes actually constitute a subset of BCH codes: If
both symbol and locator field are the same (say, GF(p) and GF(p^1),
respectively), the code generated is RS. In this case, all locator-field
elements also belong to the symbol field. The minimal polynomial of a
locator-field element a^i then simply reduces to (D-a^i), yielding the RS
canonical form. Note also that the number of recoverable symbol errors per
block t always equals one-half the number of parity symbols (N-L) added to
the data block, the latter indicated by the generator's degree. Listing Three
shows how GALOIS.EXE constructs the minimal polynomial for GF element a^i and
how to get to the generator polynomial for a specific BCH or RS code.
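In the RS canonical form, the generator can be built numerically rather than symbolically: g(D) is just the product of (D - a^i) for i=1..2t. The sketch below does this over GF(2^3) with primitive polynomial D^3+D+1, giving the t=2 RS(7,3) code; it uses exp/log tables instead of the polynomial machinery of GALOIS.EXE, and all names are mine. Since the field has characteristic 2, -a^i equals a^i. The code checks itself: g must vanish at a^1..a^4 and nowhere else.

```c
/* GF(2^3) exp/log tables built from the primitive polynomial D^3 + D + 1. */
enum { Q = 8, FIELD_N = Q - 1, T = 2 };   /* RS(7,3): N = 7, 2t = 4 parity */
static int exp_[2 * FIELD_N], log_[Q];

static void gf_init(void)
{
    int i, x = 1;
    for (i = 0; i < 2 * FIELD_N; i++) {
        exp_[i] = x;
        log_[x] = i % FIELD_N;
        x <<= 1;
        if (x & 0x8)
            x ^= 0xB;                     /* reduce by D^3 + D + 1 */
    }
}
static int gfm(int a, int b)              /* field multiply via logs */
{ return (a && b) ? exp_[log_[a] + log_[b]] : 0; }

/* g[k] multiplies D^k. Build g(D) = (D + a^1)(D + a^2)...(D + a^2t). */
static int g[2 * T + 1];
static void build_generator(void)
{
    int deg = 0, i, k;
    g[0] = 1;
    for (i = 1; i <= 2 * T; i++) {        /* multiply in factor (D + a^i) */
        deg++;
        for (k = deg; k >= 1; k--)
            g[k] = g[k - 1] ^ gfm(g[k], exp_[i]);
        g[0] = gfm(g[0], exp_[i]);
    }
}
static int eval_g(int x)                  /* Horner evaluation of g at x */
{
    int acc = 0, k;
    for (k = 2 * T; k >= 0; k--)
        acc = gfm(acc, x) ^ g[k];
    return acc;
}
```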


Decoding BCH/RS Codes


How can a receiver know about the occurrence of one or more errors during
transmission or storage of the encoded block, given only the generator
polynomial g(D) and the received code word y, represented by its polynomial
y(D)? Assuming that code word x(D) is transmitted, you can write
y(D)=x(D)+e(D), introducing the error polynomial e(D). Remember, x(D) equals
u(D)g(D), and g(a^i)=0 (i=1..2t) for BCH/RS codes by construction of the
generator. Hence, calculating y(a^i) when receiving the block yields a
nonzero result only when errors have occurred. Because their nonzero status
points to an erroneous condition, these 2t values y(a^i) are called
"syndromes" and are denoted S(i).
Now you know, on a per-block basis, whether or not errors have occurred, but
you don't know how many symbols are affected, nor the error positions. You
can find these positions by constructing yet another polynomial: the "error
locator" s(D). The inverses of its roots indicate the positions in error.
Figure 3 defines s(D).
The key to decoding BCH/RS codes is the construction of this polynomial using
the Massey-Berlekamp algorithm, which more generally synthesizes the
linear-feedback shift register (LFSR) of minimum length that generates a
predefined output sequence. Figure 4 shows the algorithm. Table 4
illustrates the procedure in terms of LFSR generation for decoding a
triple-error-correcting binary (15,5) BCH-encoded block with three positions
in error. Note that an LFSR generating the 2t syndromes is constructed in
this way. Each time the register fails to generate a syndrome, its length is
incremented. When the algorithm ends, the LFSR will generate the 2t syndromes
correctly if the number of errors in the block is less than or equal to t. As
you can see from the example, the LFSR is readily described by the polynomial
s(D). Solve s(D) for its roots, and invert these to obtain the error locators.
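For the binary case the register synthesis is compact enough to show in full. The sketch below is my rendering of the Figure 4 procedure over GF(2), where subtraction is XOR: whenever the current register fails to predict the next sequence bit (nonzero discrepancy), the connection polynomial is revised, and the register lengthens when 2L <= i.

```c
#include <string.h>

enum { MAXN = 64 };   /* assumes n <= MAXN */

/* Berlekamp-Massey over GF(2): find the shortest LFSR generating the bit
   sequence s[0..n-1]. On return, c[] holds the connection polynomial
   (c[0] = 1; the feedback taps are the nonzero c[j], j = 1..L) and the
   return value is the register length L. */
static int massey_gf2(const int *s, int n, int c[MAXN])
{
    int b[MAXN] = {1}, t[MAXN];
    int L = 0, m = 1, i, j, d;

    memset(c, 0, MAXN * sizeof(int));
    c[0] = 1;
    for (i = 0; i < n; i++) {
        d = s[i];                       /* discrepancy: actual ^ predicted */
        for (j = 1; j <= L; j++)
            d ^= c[j] & s[i - j];
        if (d == 0) { m++; continue; }  /* register already generates s[i] */
        memcpy(t, c, sizeof(t));        /* save c before revising it */
        for (j = 0; j + m < MAXN; j++)  /* c(D) ^= D^m * b(D) */
            c[j + m] ^= b[j];
        if (2 * L <= i) {               /* the register must lengthen */
            L = i + 1 - L;
            memcpy(b, t, sizeof(b));
            m = 1;
        } else
            m++;
    }
    return L;
}
```

For the alternating sequence 0,1,0,1,... the algorithm settles on L=2 with c(D) = 1 + D^2, i.e., s[i] = s[i-2], as expected.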

To decode binary BCH codes, where the symbol field is GF(2), you only have to
know the positions of symbol errors. In nonbinary codes, however, you also
need to know the error magnitudes. You can get around this by modifying the
error locator (to incorporate error values, not only error positions) and by
defining an extra error-evaluator polynomial. GALOIS.EXE incorporates the
code for doing this.


Conclusion


While these error-correcting codes have been known for some time, their
application in high-speed digital transmission has only recently come into
vogue because of the complex decoding involved. Chips performing RS
encoding/decoding using 256-ary symbols are now commercially available at bit
rates of up to 40 Mbits/sec and higher. There's little doubt that they would
be a useful building block for the construction of error-resilient, safe file
systems.


References


Anderson, J.B. and S. Mohan. Source and Channel Coding: An Algorithmic
Approach. Boston, MA: Kluwer Academic Publishers, 1991.
Massey, J.L. "Shift-Register Synthesis and BCH Decoding." IEEE Transactions
on Information Theory (January 1969).
Figure 1 Input/output relation for cyclic-linear block codes.
Figure 2 Generator polynomials for BCH and RS codes.

Figure 3 The error-locator polynomial s(D).
Figure 4 The Massey-Berlekamp algorithm for BCH decoding: synthesizing s(D)
as a linear-feedback shift register of minimum length.
Table 1 Representation of GF(32) elements.
Table 2 Operations in GF(4).
Table 3 Representation of GF(42) elements.
Table 4 Massey-Berlekamp algorithm in progress.

Listing One 

/* Finds a primitive polynomial of GF(p^n) */
void FindPrimitivePolynomial(void) {
 int k,d;
 int *pi1; /* array of (n+1) integers, each indicating */
 /* a number between 0 and p-1 -> generation */
 /* of all polynomials of degree n with */
 /* coeff. in GF(p) */
 int *pi2; /* idem for polynomials of degree <=n-1 */
 bool rem0;
 d=pg_l->Exp; /* degree of primitive polynomial */
 ... (allocate and init to zero pp_prim, pp_aux, pi1 & pi2) ...
 pi1[d]=1; /* force it to be a degree-d polynomial */
 pp_prim->Deg=d; 
 do{ /* loop for generation of all degree-d polynomials */
 for(k=d;k>=0;k--)
 pp_prim->c[k]=pi1[k];
 if(d!=1){ /* all degree 1 polynomials are irreducible */
 memset((void*)pi2,0,sizeof(int)*d);
 pi2[1]=1; /* degree 1 polynomial */
 do{ /* loop for generation of all degree <= d-1 pol. */
 for(k=d-1;k>=0;k--) 
 pp_aux->c[k]=pi2[k];
 DEGREE(pp_aux);
 rem0=DivPoly(pp_prim,pp_aux,NULL,NULL,po_s);
 if(rem0) break; /* not irreducible */
 }
 while(update(pi2,po_s->MaxNo,d-1));
 }
 else 
 rem0=0;
 if(!rem0) /* found an irreducible polynomial */ 
 if(is_primitive())
 break;
 }
 while(update(pi1,po_s->MaxNo,d));
 ... (free pp_aux, pi1, pi2) ...
 }
/* Gets increment for (d+1) integers, each having maximum value p-1 */
static bool update(int *pi,int p,int q) {
 int i,j,max;
 max=p-1;
 j=0;
 while(pi[j]!=max && j<q) j++;
 i=j;
 while(i<q && pi[i]==max && pi[i+1]==max) i++;
 if(i==q && j==0)
 return FALSE;
 if(j)
 pi[0]++;
 else {

 pi[i+1]++;
 for(j=0;j<=i;j++)
 pi[j]=0;
 }
 return TRUE;
 }
/* Checks for an irreducible polynomial to be primitive. If it is found to be 
primitive, the function returns TRUE and <pg> contains all GF-elements. */
static bool is_primitive(void) {
 int p,i,k,l;
 pPOLYNOM pp_rem=NULL;
 bool is_prim=TRUE;
 p=pp_prim->Deg;
 ...(allocate pp_aux, pp_rem) ...
 ...(init pg_l->pe[0] to 0
 pg_l->pe[1] to 1
 remaining entries to 0) ...
 for(i=2;i<pg_l->MaxNo && is_prim;i++) { /* `^1,..,`^(p^q-2) */
 SET_POLY(pp_aux,i-1,1); /* D^i <-> `^i */
 DivPoly(pp_aux,pp_prim,NULL,pp_rem,po_s);
 for(k=p-1;k>=0;k--)
 pg_l->pe[i].Alpha[k] = (byte)pp_rem->c[k];
 for(l=0;l<i;l++) {
 for(k=0;k<p;k++) {
 if(pg_l->pe[l].Alpha[k] != pg_l->pe[i].Alpha[k])
 break;
 }
 if(k==p) { /* then pe[l] == pe[i] -> not primitive */
 is_prim=FALSE;
 break;
 }
 }
 }
 ...(free pp_aux, pp_rem) ...
 return(is_prim);
 }



Listing Two 

/* Constructs table of operations <po> of locator field <pg>, given symbol
** field operations <op>. Returns <po>. Since 2D operations commute, only 1/2
** of the NxN tables are necessary. Proper indexing into 2D-arrays is done
** using ADD/MULT macros. */

#define I(i,j) ((i>j) ? j : i)
#define J(i,j) ((i>j) ? (i-j) : (j-i))
#define MULT(i,j) Mult[I(i,j)][J(i,j)]
#define ADD(i,j) Add[I(i,j)][J(i,j)]

pOPTABLE ConstructOpTable(pOPTABLE po,pGALOIS pg,pOPTABLE op) {
 unsigned i,j;
 for(i=0;i<po->MaxNo;i++) {
 po->InvAdd[i] = (byte) i_add_inv(i,pg,op);
 po->InvMult[i] = (byte) i_mult_inv(i,pg,op);
 for(j=i;j<po->MaxNo;j++) {
 po->Add[i][j-i] = (byte) i_add(i,j,pg,op);
 po->Mult[i][j-i]= (byte) i_mult(i,j,pg,op);

 }
 }
 return po;
 }
/* Note: All references to GF-elements denote indexes to the <pg>-structure 
** containing these elements. The convention is:
** index = 0 -> GF-element = 0
** 1 -> 1
** 2 -> `
** 3 -> `^2
** ...
** p^q - 1 -> `^(p^q-2)
*/ 

/* `^k = `^i * `^j */
int i_mult(int i,int j,pGALOIS pg,pOPTABLE po) {
 int res;
 int mod=pg->MaxNo;
 if(po==NULL) /* symbol field created -> modulo */
 return( (i*j)%pg->Base);
 if(i==0 || j==0) /* 0 */
 return 0;
 res=i+j-1;
 while(res>=mod)
 res -= (mod-1);
 return res;
 }
/* `^k = `^i + `^j */
int i_add(int i,int j,pGALOIS pg,pOPTABLE po) {
 byte res[8];
 int k,l;
 if(po==NULL) /* symbol field created -> modulo */
 return( (i+j)%pg->Base);
 for(k=pg->Exp - 1;k>=0;k--) 
 res[k]=po->ADD(pg->pe[i].Alpha[k],pg->pe[j].Alpha[k]);
 for(l=0;;l++) {
 for(k=0;k<pg->Exp;k++) {
 if( res[k] != pg->pe[l].Alpha[k] )
 break;
 }
 if(k==pg->Exp)
 break;
 }
 return l;
 }
/* `^k = (`^i)^n */
int i_pow(int i,int n,pGALOIS pg,pOPTABLE po) {
 int res=1;
 while (n--)
 res=i_mult(res,i,pg,po);
 return res;
 }
/* `^k = - (`^i) */
int i_add_inv(int i,pGALOIS pg,pOPTABLE po) {
 int j=0;
 while( 0 != i_add(i,j,pg,po) ) j++;
 return j;
 }
/* `^k = (`^i)^(-1) */

int i_mult_inv(int i,pGALOIS pg,pOPTABLE po) {
 int j=0;
 if(!i) return 0; /* no multiplicative inverse for 0 */
 while( 1 != i_mult(i,j,pg,po) ) j++;
 return j;
 }



Listing Three

/* Gets the minimal polynomial for each field element (except for 0 and 1),
** given locator field's elements and operations. */ 
void FindMinimalPolynomials(void) {
 int i,j,k,order;
 pPOLYNOM pp_min;
 ...(allocate pp_aux and init it to D) ...
 for(i=2;i<pg_l->MaxNo;i++) {
 /* We first get the order of the element since this */
 /* will determine the degree of the minimal polynomial */
 for(order=1;
 i_pow(i,(int)pow((double)pg_l->Base,(double)order),pg_l,po_l) != i;
 order++) ;
 ...(allocate pp_min to this degree and init to 1)...
 /* Now we'll actually start constructing */
 /* it : m`i(D)=(D-`i)(D-`i^2)...(D-`i^q) */
 /* where q equals the order of `i. */
 for(j=0;j<order;j++) {
 pp_aux->c[1]=1;
 pp_aux->c[0]=
 po_l->InvAdd[
 i_pow(i,(int)pow((double)pg_l->Base,(double)j),pg_l,po_l)];
 pp_min=MultPoly(pp_min,pp_aux,pp_min,po_l);
 }
 /* coeff. of <pp_min> in SYMBOL field, but */
 /* operations are done in the locator */
 /* field, so remap these values */
 for(k=order;k>=0;k--)
 pp_min->c[k]= (int)pg_l->pe[ pp_min->c[k] ].Alpha[0];
 DEGREE(pp_min);
 }
 ...(free pp_aux)... 
 }
/* Generates BCH/RS generator polynomial in symbol and locator field set up
** previously. */ 
extern int t; /* error correctability */
extern pPOLYNOM pp_gen; /* generator polynomial */ 
extern bool ComparePoly(pPOLYNOM p1,pPOLYNOM p2);
 /* returns true if p1==p2 */ 
void GenerateBCH(void) {
 bool common_factor;
 int deg,i,j;
 ...(init pp_gen to 1) ...
 deg=1+(t<<1); /* 2t+1 */
 for(i=2;i<=deg;i++) {
 common_factor=FALSE;
 for(j=2;j<i;j++) {
 if(ComparePoly(pg_l->pe[j].pp,pg_l->pe[i].pp)) {
 common_factor=TRUE;

 break;
 }
 }
 if(!common_factor) 
 pp_gen=MultPoly(pp_gen,pg_l->pe[i].pp,pp_gen,po_s);
 }
 DEGREE(pp_gen);
 }
END.





















































December, 1994
Sharing Peripherals Intelligently, Part 2


Multimegabyte networking via SCSI-2




Ian Hirschsohn


Ian holds a BS in mechanical engineering and an MS in aerospace engineering.
He is the principal author of DISSPLA and cofounder of ISSCO. He can be
reached at Integral Research, 249 S. Highway 101, Suite 270, Solana Beach, CA
92075.


In my article "Sharing Peripherals Intelligently" (DDJ, November 1994), I
described a system called the STAR Peripherals Manager, which furnishes a pool
of high-speed devices to a computer workgroup. In this installment, I'll
examine how the peripherals manager (PM) uses SCSI-2 as an ultra high-speed,
inexpensive network (up to seven Mbytes/sec per client). The STAR PM consists
of a standard 486 PC, which handles all the devices; the client workstations
use SCSI-2 cables to connect to STAR. The clients do not interface to the
devices themselves; this is handled by 32-bit assembly software in the STAR
PC.
SCSI was never intended to be a networking protocol. It was designed to
standardize the interface to disks, tapes, and other peripherals; see the
accompanying text box entitled, "A SCSI Primer." However, the SCSI message
architecture lends itself to networking. Theoretically, you could string up to
eight clients together (16 with Wide SCSI-2) by daisy chaining SCSI cables
between them: The SCSI-2 spec allows hosts to communicate via the Processor
Device or Communications Device protocols. The difficulty is that almost all
SCSI-2 adapters expect to connect to a disk, tape, or other device (target),
not another host (initiator). To be a network node, the adapter must function
interchangeably as both initiator and target. 
Since almost all current SCSI-2 devices--adapters and peripherals--use one of
a handful of SCSI-2 microprocessors (such as the NCR 720), the limitation lies
with the adapter-card firmware, not the hardware. Furthermore, most operating
systems catering to SCSI-2 do not provide for operation as a target, so any
direct use of SCSI-2 as a receptor to another desktop requires writing custom
host SCSI-2 drivers and APIs. Since drivers and APIs are specific to each OS
and even each platform, the direct use of SCSI-2 as a universal network is
impractical. A practical solution requires minimal impact on any platform or
its OS. 


Distance


A well-known SCSI drawback is its limited cable length. Normal cables, which
use a single wire per line (single-ended SCSI), are limited to six meters. But
differential SCSI, in which each line employs the polarity of a wire pair, can
span up to 25 meters. If there are only two devices on the bus, these limits
can be relaxed with good-quality cable. This is nowhere near the hundreds of
meters permissible with networks, but adequate for the typical intimate
workgroup. Since STAR favors just one client per adapter, the distance
limitation is not as severe. Applied Concepts (Wilsonville, OR) and Paralan
(San Diego, CA) market "SCSI extenders," which convert the parallel SCSI
signals into a proprietary serial protocol, and then transmit them via copper
or fiber-optic cable. An extender box is placed at each end of the SCSI
interconnect, enabling it to traverse well over 1000 meters; see Figure 1. The
coming SCSI-3 standard will address high-speed serial transmission. Thus, the
SCSI distance limitation can be overcome.


Circumventing Adapters


The speed of SCSI-2 makes it practical for huge files and sharing
high-performance peripherals, rather than e-mail and small files. This can be
accomplished via a central-hub STAR that appears as a target to all the
client-host initiators. This is a configuration compatible with all SCSI-2
host adapters and systems. Interhost communication is possible with STAR
acting as an intermediary. Although this solves interconnection, there remains
the question of protocol. SCSI-2 does not recognize files and defines no
interactive protocol; for example, the only commands for disk-data transfer
are read/write fixed-length "logical blocks." Names, directories, and even
file-allocation tables are alien, so how could a file be specified?
It's possible to implement a protocol on existing adapters via vendor-specific
SCSI-2 commands such as Send Diagnostic and Receive Diagnostic Results, which
enable arbitrary blocks of data containing file specification and other dialog
to be sent as "diagnostic data." Again, the shortcoming is that it would
require custom client-driver and API modifications to accommodate the
unsupported commands. To be universally viable, the protocol should be sent
via device commands supported by almost every platform: disk, CD-ROM, and
tape. However, disk can be ruled out because high-level disk I/O is modified
and encapsulated by almost every OS, so the protocol data is buried somewhere
in the final SCSI-2 read/write requests and may even be cached to memory (RAM
disk). CD-ROM shares these shortcomings with the additional caveat that writes
are frowned upon. This leaves tape.
Most systems have tape utilities such as TAR and SYTOS, and almost all support
direct tape-record read/write from high-level software (SunOS permits tape I/O
via C read/write to /dev/mt0). Data records can be passed unadulterated to the
"tape" device without modifying the client system; caching to memory, if
present, can usually be disabled or circumvented. Tape I/O as a vehicle is
employed by STAR and Springboard, a PC-based image processing system.


Virtual Tape


Clients pass commands to STAR via tape writes and read back information via
tape reads. The protocol is open architecture with all commands as ASCII text
records to simplify implementation on diverse platforms. The commands can be
sent directly from client app code or from a C-based, interactive utility.
There is no need for STAR to possess an actual tape drive; as long as STAR
mimics a tape, the client OS is none the wiser. "Virtual tape" becomes the
mechanism by which information is passed between high-level client software
and STAR.
Figure 2 illustrates a protocol session. The sequence queries the devices
available on STAR, assigns a tape drive, reads an index of the tape contents
from the last record, and then positions to tape file 53. Command mode is
entered by writing the text record mAgIc, which also flags that subsequent
records are commands, not actual tape I/O. The session is terminated via the
Exit command, whereupon all further read/writes are considered legitimate tape
I/O (until the next mAgIc record). The protocol presupposes that an actual
tape write with the 5-byte text mAgIc is extremely unlikely--a reasonable
assumption. Each tape-write command is always followed by a tape-read command,
which returns at least status, if not data. A tape write followed by an
immediate read causes most systems grief. (How can you read blank tape?) The
OS usually issues a write filemark, then a rewind command prior to the read;
alternatively the STAR interface code may have to insert them to placate the
OS. STAR ignores these superfluous SCSI-2 commands until the exit, whereupon
they are considered legitimate tape ops.
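The mode switching just described--ordinary tape I/O until a mAgIc record arrives, then command records until an EXIT--can be sketched as a small state machine on the STAR side. This is an illustrative reconstruction: the sentinel and command spellings come from Figure 2, but the function and type names are hypothetical, not STAR's.

```c
#include <assert.h>
#include <string.h>

enum tape_mode { TAPE_IO, COMMAND };
static enum tape_mode mode = TAPE_IO;

/* Classify one incoming tape-write record. Returns 1 if the record is
 * protocol traffic (consumed by STAR), 0 if it is legitimate tape data. */
int star_filter_record(const char *rec)
{
    if (mode == TAPE_IO) {
        if (strcmp(rec, "mAgIc") == 0) {  /* sentinel enters command mode */
            mode = COMMAND;
            return 1;
        }
        return 0;                         /* pass through as tape data */
    }
    /* In command mode, every record is a protocol command. */
    if (strncmp(rec, "EXIT", 4) == 0)     /* EXIT or EXIT NO_POSN leaves it */
        mode = TAPE_IO;
    return 1;
}
```

The same filter, run against the session in Figure 2, would consume the QUERY/ASSIGN/READ records and pass everything after the EXIT through as tape data.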
EXIT NO_POSN causes STAR to ignore any OS or app tape-movement commands,
providing a mechanism for the client user to override device requests coming
from vendor software over which he or she has no control. For example, Adobe
Photoshop may issue rewind or other unwanted tape-positioning commands. STAR's
ability to override or change SCSI-2 commands (transparent to the client app
or OS) is a powerful, client-platform independent mechanism for modifying the
I/O handling of popular apps without banging on the app vendor. (I/O and
device handling are generally the areas where customization is most needed.)
The protocol enables the customizing information to be passed to STAR prior to
executing the app; STAR can then modify subsequent app I/O as instructed.
Since the STAR interface utility can generally be executed from within a local
shell, protocol sessions can be conducted from a standard app without
terminating it. The ability to filter and transform data coming from the
actual medium provides additional control over standard apps.
The actual protocol commands are not as important as the fact that an
extensive dialog can transpire between a user and STAR transparent to UNIX,
Windows NT, OS/2, or any other system--or even the client app itself. Since
the client OS has no influence over the content of the command records or data
returned by STAR, the scheme has complete flexibility. The commands are used
to specify disk files, interrogate disk directories, install
data-transformation algorithms, and perform other device activities unrelated
to tape. Actual tape I/O complicates the STAR end because legitimate I/O is
interspersed with protocol sessions. But the burden of implementation rests on
the STAR PC, not the client, because STAR must detect the special records
before caching actual tape I/O, discard superfluous client SCSI-2 commands,
and decode and execute the protocol requests. The client needs only to send
the virtual tape records and read the STAR responses. Without an intelligent
server programmed to respond at the SCSI-2 interface level, the virtual-tape
scheme is not viable.
There is no limit to the uses for a virtual-tape protocol; network functions
such as interclient communication, even e-mail, can be implemented by storing
messages from one client in STAR and forwarding them to the recipient. The
only requirement is that the STAR PC be programmed to recognize and execute
enhancements to the command set. 
A mechanism is available on STAR to recognize user custom commands and divert
them to user-supplied STAR-based code. Virtual tape provides communication
flexibility without sacrificing ultra high-speed SCSI-2 data transfer,
impacting existing client SCSI-2 devices, or requiring TSRs and other system
mods.


Reference


"Small Computer System Interface (SCSI-2)" Working Draft, Project 375D, ANSI
X3T9.2 Committee, March 1994.


A SCSI Primer


Small Computer Systems Interface (SCSI) has evolved into a
platform-independent specification for interfacing a wide range of peripherals
and has become the standard for almost all desktops. The original 1985 ANSI
specification, generally referred to as "SCSI-1," afforded latitude to device
manufacturers, inevitably leading to incompatibility between platforms and
peripherals. The current ANSI SCSI-2 specification nails down the hardware
timings and eliminates most of the different software interpretations. SCSI-2
provides a rich command set and versatile options to link multiple commands
(for speed). It enables time-consuming commands (rewind, for example) to
release the bus. SCSI-2 is so extensive it typically requires a dedicated
microprocessor, such as the NCR 720. But SCSI-2 is designed for device
interface. Its sophistication is concentrated on diagnostics, integrity, and
device-control functions--there is no provision for data files, names, or
other high-level concepts.
The 8-bit SCSI bus consists of eight data, one parity, and nine signal lines.
This is increased to 16 and 32 data lines by 16-bit (Wide) and 32-bit SCSI-2,
but all other aspects of the specification remain unchanged. The devices are
connected to one another by running a cable from one device to the next as a
"daisy chain." SCSI devices communicate via message packets analogous to
networks. These 6-, 10-, or 12-byte messages each contain a command and
related parameters; for example, to format a specific disk track, skip a given
record count, and so on. If the command involves data transfer, the data
length is contained as one of the parameters. In the case of a write, the data
bytes follow the message; for a read, they are input. Subsequent to the
message and data, the command sender (initiator) listens for a status byte
from the receiver (target). According to SCSI-2, any device can be an
initiator or target interchangeably even between commands; however, most host
adapters are configured exclusively as initiators, and most peripherals are
targets.
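To make the 6-byte message concrete, here is one common shape of a 6-byte command block in C. The field layout shown follows the widely used READ(6)-style arrangement and is offered only as an illustration, not as an extract from the SCSI-2 draft:

```c
#include <assert.h>
#include <stdint.h>

/* One plausible 6-byte command descriptor: an opcode, a 21-bit logical
 * block address split across three bytes, a transfer length in blocks,
 * and a control byte. */
struct scsi_cdb6 {
    uint8_t opcode;
    uint8_t lba_high;   /* top 5 bits of the logical block address */
    uint8_t lba_mid;
    uint8_t lba_low;
    uint8_t length;     /* transfer length in blocks */
    uint8_t control;
};
```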
Selection of which devices communicate with one another takes place during the
arbitration and selection phases preceding the message phase; see Figure 3.
Any device wishing to use the SCSI bus activates one of the data lines at the
start of the arbitration phase, each device being unique to one of the data
lines (during this phase). If more than one data line is activated, the device
with the highest priority wins. Thus each line has a priority 0--7 (typically
the host takes 7). Once a device gains control of the bus, it becomes the
Initiator, whereupon it activates both its own line and the line for the
target (selection phase). The target acknowledges via a signal line and the
message phase commences. This scheme allows any device to become a bus master
and target any other device. The maximum number of devices on the SCSI bus is
limited by the count of data lines: 8, 16, or 32 for normal, Wide, and 32-bit
SCSI-2. Unlike single-line networks, 8-bit SCSI-2 has 17 lines to use, so its
collision resolution is more efficient and consumes fewer bus cycles, giving
it a substantial advantage over a network with the same nominal bytes/sec.
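The arbitration rule--highest asserted data line wins--reduces, for 8-bit SCSI, to finding the highest set bit in the mask of contending lines. A minimal sketch (function name is illustrative):

```c
#include <assert.h>

/* Arbitration sketch: each device drives its own data line (bit 0-7 of
 * 'contenders'); the device on the highest asserted line wins the bus. */
int scsi_arbitrate(unsigned char contenders)
{
    int id;
    for (id = 7; id >= 0; id--)   /* line 7 carries the highest priority */
        if (contenders & (1u << id))
            return id;
    return -1;                    /* nobody is arbitrating */
}
```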
One SCSI myth is the inflated data rate quoted by many vendors, for example,
ten Mbytes/sec for synchronous 8-bit (Fast) SCSI-2 devices. This data rate
refers only to the data clocking following the Message Phase and does not
account for the overhead of the arbitration, selection, and other phases. Once
a device gains control of the SCSI bus, it and its target can dally about
setting up the data transmission, resulting in a much-reduced actual SCSI
throughput. Performance depends on the amount of data accompanying the
message; for example, if the arbitration overhead is 5 ms and only one byte is
sent with each message, the rate is only 200 bytes/sec, whereas for 256-Kbyte
blocks the realized performance is 8.4 Mbytes/sec. Clearly, it is desirable to
send as much data as each device can accommodate with a single message, to
minimize delays and utilize command linking and other SCSI-2 features that cut
overhead. This favors separating real-world host (client) and peripheral I/O
into overlapped, independent streams buffered through a high-speed
intermediary.
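The back-of-the-envelope figures above follow from a simple model: realized rate = bytes / (overhead + bytes/clock rate). A sketch that reproduces both of the article's data points (5-ms overhead, 10-Mbyte/sec data clock):

```c
#include <assert.h>

/* Realized throughput = bytes moved / (per-command overhead + time to
 * clock the bytes across the bus). */
double effective_rate(double overhead_sec, double bytes, double clock_rate)
{
    return bytes / (overhead_sec + bytes / clock_rate);
}
```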
Although SCSI-2 is more rigorous than SCSI-1, many of the SCSI-2 commands are
optional, and it is still possible for a device to be SCSI-2 compliant while
not honoring certain host-driver requests. On the other hand, many of the
powerful features of a vendor's peripheral can go unused by an OS geared to
the lowest common denominator; for example, fast search on tapes and scatter
read/write on disks. By separating the host platforms from the peripherals,
STAR is able to take full advantage of advanced, or even vendor-specific
SCSI-2 features of a given device, while catering to the minimal expectations
of most host systems.
--I.H.
Figure 1 Extending distance between SCSI-2 devices with extender modules at
each end of SCSI-2 cable run. This overcomes SCSI-2 distance limitations.
Figure 2: Sample STAR protocol sequence to read an index from the last file of
a tape. The tape reads and writes are generic and can be executed by any tape
I/O library on the client platform.

Write: 'mAgIc' Enter protocol mode
Read : Status (OK) STAR returns status on each command
Write: 'QUERY' Query devices available on STAR
Read : Devices list
Write: 'ASSIGN 3' Device 3 on STAR list received
Read : Status
Write: 'TAPE_POSN LAST' Position to last file on tape
Read : Status
Write: 'READ 32000' Read record up to 32000 bytes
Read : Contents+length
Write: 'FILE1 53' Position to file 53 of tape
Read : Status (first file is 1)
Write: 'EXIT NO_POSN' Terminate protocol mode. Ignore SCSI-2
Read : Status position commands until 1st Read.
 <Further Read/Writes now legitimate tape I/O>
Figure 3 Phase sequence on each SCSI-2 command. Note similarity with network
message-packet protocol.














































December, 1994
Real-Time Scheduling Algorithms


Achieving predictability for critical applications




Alberto Daniel Ferrari


Alberto is on leave from the Laboratorio de Controle e Microinformatica at the
Universidade Federal de Santa Catarina in Florianopolis, Brazil. He can be
reached at alberto@uncu.edu.ar.


In a real-time computer system, correctness depends on the time at which the
results are given. In practice, this means that for real-time systems to
behave properly, some critical subset of a system's tasks should complete
their processing before their deadlines. Failure to do so can lead to human,
environmental, and/or economic damage. Compounding the problem, emerging
systems that are distributed, dynamic, and adaptive will put even more
stringent demands on real-time systems. The success of teams of robots
working in hazardous environments, on-board space-shuttle systems, and
underwater or outer-space autonomous vehicles, for instance, will all be
strongly dependent on the timeliness of their computational results.
The predictability of the real-time system is fundamental to achieving this
temporal correctness. A predictable system has known temporal bounds for all
its actions and can therefore be formally analyzed. Automated techniques can
be applied in advance to guarantee the system's critical set. This way, you
can have an early warning of a system's inability to meet its timing
requirements, and so take appropriate corrective actions. In other words,
real-time is not synonymous with "fast," but with "predictable."
Most commercial real-time executives are based on a priority-driven scheme.
Current real-time practice for uniprocessor multitasking systems is to assign
priorities according to the importance the designer subjectively perceives
tasks to have. However, there's no guarantee that critical tasks will meet
their deadlines just because the "ready" process with the highest priority is
the one that executes at every moment; consequently, the temporal correctness
of the system remains uncertain.
Furthermore, ad hoc priority schemes are not effective for predictable access
to shared resources due to the unbounded priority inversion phenomenon, in
which a higher-priority task is prevented from executing by a lower-priority
task for an indefinite period of time. 
To alleviate these problems, real-time scheduling algorithms have been devised
that force the system's scheduler to base its decisions explicitly on the
temporal characteristics of the task set. For most of these algorithms, simple
mathematical conditions verify the schedulability of the critical set;
satisfying the conditions guarantees the set, even without the designer
knowing precisely when any task will be run. This way, you isolate temporal
from logical system correctness. 
With real-time algorithms, you trade additional information about your system
for temporal predictability. To get a more dependable system, you must know
about the temporal qualities of the target environment, application, and
tasks--then make this information available to the scheduler.


Algorithms for Real-Time Use


When applying real-time algorithms, you must distinguish the algorithm's
heuristic--used for scheduling the system--from its schedulability test, which
guarantees a given task set. That is, you can use any heuristic that comes to
mind to schedule a task group, but you won't have any guarantee of meeting the
set's temporal constraints unless you apply a schedulability test.
The algorithm heuristic is the criterion according to which we define a figure
of merit for a task. The task with highest-merit value among all ready tasks
at every moment is the task that executes (ties are broken arbitrarily).
Schedulability tests are valid for a well-defined task model. In this article,
I'll examine the rate-monotonic (RM), earliest deadline (EDF), least-laxity
dynamic (LLF), and maximum-urgency-first (MUF) algorithms. The temporal model
for the algorithms discussed in this article assumes that each task:
Repeatedly executes at a known fixed rate (its "period").
Must end before the beginning of its next period (its "deadline").
Does not need to synchronize with others in order to execute.
Can be interrupted at any point in time and replaced by another task in the
CPU.
Does not suspend voluntarily.
Has zero preemption cost (task-switch times and scheduling-algorithm execution
load are neglected).
Is ready while its assigned processing time is not exhausted. After running
out of execution units, the task blocks until its next period.
Since the task set is periodic, the base timeline will repeat itself after the
LCM (least-common multiple) of the periods of the involved tasks. 
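This hyperperiod is straightforward to compute; for the example task set used later in the article (periods 6, 8, and 12), it comes to 24 time units, which is why those timelines repeat every 24 ticks:

```c
#include <assert.h>

/* The timeline's hyperperiod is the LCM of the task periods. */
long gcd(long a, long b) { return b ? gcd(b, a % b) : a; }
long lcm(long a, long b) { return a / gcd(a, b) * b; }
```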
A scheduling algorithm is said to be static if the task's merit value is fixed
throughout its execution, and dynamic otherwise. Static algorithms have lower
run-time costs but lack flexibility for adapting to a changing environment;
they also need temporal information from all tasks before they're actually
run.
Rate monotonic (RM). Developed in 1973 by Liu and Layland, this is one of the
best-known and most often used algorithms for real-time applications. Its
scheduling heuristic is shortest-period-first, so it always runs the ready
task with shortest period. Since the period of a task is fixed, RM is a static
algorithm. The condition for a task set to always meet its deadlines is shown
in Figure 1, where Ci is task i's execution time, Ti is its period, and n is
the number of tasks. The left term is the sum of the individual task loads;
the right term defines a limit that goes from 100 percent for one task, to 69
percent for a large number of tasks. In practice, 88 percent is a more
realistic value, although when task periods are harmonically related, it could
still reach 100 percent. Even when the total task load is higher than the
limit, the set may still be scheduled using RM: You just have to prove that
every task meets its first deadline when all tasks are started simultaneously.
RM is said to be "stable" because one subset of tasks remains guaranteed when
the processor is overloaded. Having ordered the task set by increasing
periods, its stable set is composed of the first n tasks whose combined load
is below the limit. The RM analytical model has been extended to handle
complex situations, such as task synchronization (priority-ceiling algorithms)
and aperiodic task service (through appropriate servers). RM is also optimum
in the sense that it can always schedule a task set if another static
algorithm can.
Earliest deadline (EDF). This algorithm schedules the task with the closest
deadline first. Because the task with the nearest deadline varies with time,
this is considered a dynamic algorithm. In this case, a task set is
schedulable if the total task load is under 100 percent. Its task model has
been extended to handle the same cases as RM's. EDF is optimal in that if a
task set can be scheduled by any algorithm, it can also be scheduled using
EDF. It also features the lowest number of task switches but is not stable.
Least laxity (LLF). Under the least-laxity (LLF) dynamic algorithm, a task's
merit is its laxity, its remaining "flexibility" to be scheduled in time; the
ready task with the least laxity runs. This laxity is measured as
the difference between the time to deadline and the remaining computation time
to finish. A task with negative laxity won't meet its deadline, so laxity
provides early detection of temporal failures. This is important if specific
actions (other than aborting the faulty task or warning the user) should be
taken due to a timing fault: You must ensure that the exception handler has
time to execute its code within a given deadline. Again, for a task set to be
schedulable, its total load must be under 100 percent; LLF is also optimal, in
the same sense as EDF.
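The laxity computation itself is one subtraction, which is what makes the early failure detection cheap. A minimal sketch:

```c
#include <assert.h>

/* Laxity = time left until the deadline minus computation time still
 * needed; a negative value signals a deadline failure in advance. */
long laxity(long now, long deadline, long remaining)
{
    return (deadline - now) - remaining;
}
```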
Maximum-urgency-first (MUF). Developed at Carnegie Mellon University, this
mixed-scheduling algorithm combines the best features of the others:
predictability under overload conditions and a scheduling bound of 100 percent
for its critical set. The static part of a task's urgency is a user-defined
criticality (high/low), which has higher precedence than its dynamic part.
First, tasks are ordered by increasing periods: The first n highly critical
tasks with combined load under 100 percent form the critical set. The highly
critical ready task with the least laxity is chosen to run at every moment. If
no critical tasks are ready, then tasks with low criticality are selected,
again with the least-laxity heuristic. Ties are broken through an optional
user priority.
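The selection order just described (criticality first, then least laxity, then user priority) can be captured in a single comparator. The struct and field names here are illustrative, not taken from the article's listings:

```c
#include <assert.h>

/* MUF urgency comparison: criticality dominates, then least laxity,
 * then user priority (lower value = listed earlier in the
 * configuration file). Returns nonzero if 'a' is more urgent. */
struct muf_task { int critical; long laxity; int user_prio; };

int more_urgent(const struct muf_task *a, const struct muf_task *b)
{
    if (a->critical != b->critical) return a->critical > b->critical;
    if (a->laxity   != b->laxity)   return a->laxity   < b->laxity;
    return a->user_prio < b->user_prio;
}
```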
The period of any task i may be modified without dropping it from the critical
set, provided its load (Ci/Ti) remains unchanged. This is useful in dynamic
environments, where the system needs flexibility to adapt to changing
situations.
Table 1 provides a summary of the RM, EDF, LLF, and MUF algorithms. All four
may accept new periodic tasks at run time, but a reschedule operation is
necessary to guarantee the former critical tasks.
Figure 2 shows execution timelines for a task set scheduled by each of these
algorithms, with a critical load of 58.3 percent. The resulting schedules
differ in the number of (potentially costly) context switches--13, 11, 13, and
13, respectively. Any timeline repeats itself every 24 time units.
For nonstable algorithms such as EDF and LLF, any overload that raises the
total load above 100 percent will cause at least one task to miss its
deadline. We don't know which one may fail; it could even be a critical task.
In RM (as in MUF), tasks outside the critical set may safely overload without
any critical task losing a deadline; highly critical tasks may overload the
example by up to (78.0/58.3 - 1) = 33.8 percent and still remain guaranteed
for RM. In the case of MUF, this increases to (100/58.3 - 1) = 71.5 percent, a
substantially higher value. The actual RM-scheduling limit for this case is
higher than the value yielded by the formula in Figure 1 (78.0 percent).
Increasing Task B's CPU time from 2 to 5 raises the critical load by 64.3
percent (95.8/58.3 - 1); see Figure 3. RM cannot handle the critical set
anymore, but MUF still guarantees it. Since the timeline is periodic with
period 24, no task belonging to the critical set (A and B) ever loses a
deadline. 


Implementation


I've included with this article a program that implements a generic execution
machine that simulates a system's temporal behavior under a specified
scheduling algorithm. Configuration files describe the intended task set and
system parameters; see Listing One. The program is composed of a series of
modules, which are available electronically; see "Availability," page 3.
Listing Two is the code for the simulation machine. The selected algorithm
module is called upon entry and exit of the program, and on every time tick
of the system; in the latter case, it returns the task chosen to run next.
The system is not priority driven but managed through linked lists. The
deadline_list holds current instances ordered by increasing deadlines and is
searched for failed tasks: When a deadline is reached, its owner should be
idle; otherwise, there is a deadline failure and the instance is aborted.
Request_list contains the time for future task invocations; since the tasks
are periodic, a request for the next instance is made upon starting. Finally,
merit_list contains current instances ordered by merit value.
Default_dispatcher(), the sched_alg() function for RM and EDF, returns the
first ready task from merit_list. If the selected task is different from the
current task but has the same merit, the current task continues to execute, to
save one context switch.
The first three algorithms are in Listing Three. All relevant processing for
RM is done during its initialization; after that, the generic dispatcher is
called, consistent with the algorithm's static nature. Since EDF's merit list is
deadline_list, it is always modified as instances develop and execute. As the
tasks' laxities vary with time, I preferred not to reorder the merit list of
LLF on every tick; instead, merit_list is linearly searched for the task with
least value. Apart from this, least_laxity() has almost the same code as
default_dispatcher. 
Finally, Listing Four shows the code of the MUF algorithm. Tasks are first
ordered by increasing periods; then the critical set is defined. The code
returns the highly critical task with the least laxity, if it exists, or a
task with lower criticality and least laxity otherwise. Ties are broken
through user priority, which is determined by the order in which tasks appear
in the configuration file (preceding tasks have higher user priority than a
given task). 

Figure 4 shows a sample timeline, plus the program output generated. When a
task executes continuously, you capture just its beginning and not its
subsequent time periods: This is the case for a, which executes from the
beginning of 5 to the end of 7. There exists a special task, idle_task, whose
execution represents the processing of background tasks or tasks without hard
deadlines. When a task terminates normally, it blocks until the beginning of
its next period. 
For linked-list management, I adapted Pugh's skip lists, described and coded
in the article "Skip Lists" by Bruce Schneier (DDJ, January 1994). The program
supports an interface to this library that modifies some calls, mainly because
the original doesn't have search-by-value functions for duplicated items.
Consequently, I combined the key and the task_id into one new key and didn't
allow for duplicates.
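Combining the key and the task_id into one new key works because the scheduling key occupies the high bits, so list ordering is preserved while equal keys stay distinct. A sketch of the idea (the 16/16 bit split is an assumption for illustration, not the program's actual widths):

```c
#include <assert.h>
#include <stdint.h>

/* Composite skip-list key: scheduling key in the high half preserves
 * ordering; task id in the low half disambiguates duplicates. */
uint32_t make_key(uint16_t key, uint16_t task_id)
{
    return ((uint32_t)key << 16) | task_id;
}
```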


Conclusion


The main difficulties in applying real-time scheduling techniques stem from
the restricted task model for which they're valid. Those models often miss
communication relations, precedence orders, and mutual-exclusion problems
among tasks that exist in real-world situations. As tasks interact, integrated
resource scheduling is also necessary. Algorithms exist that support special
cases, in which decisions deal with imprecise results, task-completion value,
and so on. However, no algorithm is good for all cases: You'd have to look for
the one most suitable to your system's task model.
Additionally, you must know the precise temporal attributes of each process in
advance. State-of-the-art real time lacks tools for getting them
automatically. The problem is exacerbated when modules synchronize or share
resources (such as data). Modern hardware, with cache, pipelining, and the
like, also adds temporal uncertainty to the system. 
Real-time scheduling may seem unnecessary, but as the project's complexity and
size increase, it's the only way to guarantee proper system behavior. It is
certainly more predictable than ad hoc techniques. If you're still not
convinced, just imagine scheduling 20--30 processes by hand. Even worse,
imagine modifying the resulting schedule to accommodate a change in the
specifications of one or two processes. Remember, however, that the benefits
of real-time scheduling depend on the quality of the temporal information
(actual execution-time upper limit of each task, among others) on which you
base your analysis.


Bibliography


Liu, C. and J. Layland. "Scheduling Algorithms for Multiprogramming in a
Hard-Real-Time Environment." Journal of the ACM (January 1973).
Natarajan, S. and W. Zhao. "Issues in Building Dynamic Real-Time Systems."
IEEE Software (September 1992).
Sha, L. and J. Goodenough. "Real-Time Scheduling Theory and Ada." IEEE
Computer (April 1990).
Stankovic, J. "Misconceptions about Real-Time Computing." IEEE Computer
(October 1988).
Stewart, D. and P. Khosla. "Real-Time Scheduling of Sensor-Based Control
Systems." Proceedings of Eighth IEEE Workshop on Real-Time Operating Systems
and Software (May 1991).
Figure 1 RM schedulability test for a set of n periodic, independent,
preemptive tasks. The right term of the equation defines a limit which the
total load should not surpass for the task set to be guaranteed.
Figure 2 Task set scheduled using: RM, EDF, LLF, MUF. Although total load (83
percent) is higher than the RM limit for this case (78 percent), RM can
schedule the set satisfactorily.
Table 1: Summary of RM, EDF, LLF, and MUF scheduling algorithms. 
Algorithm Type Schedulability Comment
 Test
Rate monotonic Static Total load under limit defined Stable
 (RM) by algorithm task model.
 Meeting of first deadline.
Earliest deadline Dynamic Total load under 100% Least number
 (EDF) of context
 switches
Least laxity Dynamic Total load under 100% Early
 (LLF) detection 
 of timing
 failures
Maximum urgency Mixed Critical load under 100% Stable;
 first (MUF) 100%
 schedule
 bound
Figure 3 The same as Figure 2, but with a critical overload of 64.3 percent.
Note that MUF is the only algorithm that gets the critical set (A and B)
scheduled in time. 
Figure 4 Sample timeline and its program's representation. Background
processing (or soft) real-time tasks are symbolized by (Z); (b) program output
for the example in Figure 2, scheduled with RM.

Listing One 

configuration file: TASK DESCRIPTION FILE
; Task set descriptions, from which tasks are instantiated. Keywords 
; are at the line's beginning, and end with ':'. 
; Everything in the line after the keywords or values is ignored. Lines 
; beginning with '*' are also ignored. No line can be 
; longer than MAX_STRING characters, and no name longer than 
; MAX_NAME_LENGTH. Please, maintain the order of the parameters in the 
; task descriptions.
; start:

test set: Article's Example

MAXTIME= 27 /* timeline's upper limit (starting at 0) */
Number of Application Tasks=3


APPLICATION TASKS DESCRIPTION:
Name Criticality Period Execution_time 
Task A, HIGH, 6, 2.
Task B, HIGH, 8, 2.
Task C, LOW, 12, 3.
end.




Listing Two

/***** RTALGS.C -- Algorithms for Real-Time Scheduling *****
 ***** Alberto Ferrari -- aferrari@uncu.edu.ar *****/
/* Simulates the execution of a task set in a multitasking environment, 
under several real-time scheduling algorithms. The objective is to 
obtain a timeline of the execution, and to show if tasks meet their 
deadlines or not. Tasks are assumed to be hard real-time, preemptive, 
periodic, with deadline equal to the next instance's arrival 
time, and independent (they do not need to synchronize with others in 
order to execute). They also do not suspend their execution voluntarily. 
All tasks start execution at the same time in the simulation. 
Available algorithms for scheduling are Rate Monotonic (RM), 
Earliest-Deadline-First (EDF), Least-Laxity-First (LLF), and 
Maximum-Urgency-First (MUF), selected in command line. Also selected 
in command line is the configuration file, containing the task set 
description and the system parameters. Usage: 
rtalgs -[eElLmMrR] <testvect.name> where: e, 
Earliest-Deadline-First (EDF); l, Least-Laxity-First (LLF); m, 
Maximum-Urgency-First (MUF); r, Rate Monotonic (RM) *****/
void main( int argc, char *argv[])
{ 
 node n; 
 task_t *task, *new;
 init( argc, argv);
 printf( "\nSelected Scheduling Algorithm: %s,\n", labels[ alg]);
 (sched_alg_init)();
 /* select which task to run next */
 for( sys_time=0;
 (merit_list->header->forward[0]!=NIL ||
 request_list->header->forward[0]!=NIL)
 && sys_time <= max_time;
 sys_time++){

 /* and if current task emptied its allocated time... */
 if( current!= idle_task && -- current->remaining == 0){
 current->state=DEAD;
 current->cycles++;
 delete_task( deadline_list, current->deadline, current);
 current= idle_task;
 }
 /* Look out for deadline failures */
 while( key_of( n=first_node_of( deadline_list)) <= sys_time){
 if( (task= n->v)->state != DEAD){
 printf( "At %d: task %c (\"%s\"), instance %d, Deadline Failure%s\n",
 sys_time, task->sys_id, task->name,
 task->instance, bell);
 }

 delete( deadline_list, n->key);
 }
 /* if it is time to launch a task... */
 while( key_of( n=first_node_of( request_list)) <= sys_time){
 task_init( (task= n->v) );
 delete( request_list, n->key);
 insert_task( deadline_list, task->deadline,task);
 insert_task( request_list, task->deadline, task);
 }
 new = (sched_alg)();
 /* swap and register who's using the processor */
 if( current!=new){
 context_switches++;
 current->state=READY;
 current=new;
 current->state=RUNNING;
 }
 timeline.history[ sys_time]= current->sys_id;
 #ifdef DEBUG
 printf( "%d: %s\n", sys_time, timeline.history);
 #endif
 }
 (sched_alg_end)();

 draw_timeline();
}
 ......
task_t *default_dispatcher( void)
{
 task_t *task;
 if( (task=first_ready( merit_list))==NULL)
 return( idle_task);
 else if( current== idle_task)
 return( task);
 else /* current task prevails over other tasks with same merit */
 return( (*task->merit == *current->merit)? current : task);
}



Listing Three

/***** Rate Monotonic (RM) Algorithm *****/
void monotonic_rate_init( void)
{
 int i;
 node n;
 task_t *task; /* 'task' is the task with 'lesser' period */
 float task_load=0.0, critical_task_load=0.0, schedulability_bound;
 /* in RM case, 'deadline_list' is different from the 'merit_list' */
 deadline_list= newList(); deadline_list ->sys_id= 'D';
 /* calculate n*(2^1/n - 1) */
 schedulability_bound= num_tasks * ( pow( 2.0, 1.0/num_tasks) -1.0);
 printf( "which has a schedulability bound of %.1f%% for %d tasks.\n",
 100.0 * schedulability_bound, num_tasks);
 /* insert tasks in merit_list by increasing periods */
 /* If two tasks with equal period, order them by original sequence */
 for( i=1; i<=num_tasks; i++){
 task= task_set+i;
 task->merit= &task->period;

 insert_task( merit_list, *task->merit, task);
 insert_task( request_list, 0, task);
 }
 puts( "Critical set is composed of");
 for( task= (n= merit_list->header->forward[0])->v; n!=NIL; 
 task= (n= n->forward[0])->v){
 task_load+= (float )task->cpu_time / (float )task->period;
 if( task_load <schedulability_bound){
 critical_task_load= task_load;
 printf( "\t%s,\n", task->name);
 }
 }
 printf( "which accounts for a critical load of %.1f%%, over a total "
 "system load of %.1f%%\n", 100.0 * critical_task_load, 100.0 * task_load);

 if( task_load<=schedulability_bound) printf( "So, the whole task set IS");
 else if( task_load>1.0) printf( "WARNING: the whole task set IS NOT");
 else printf( "WARNING: the whole task set MAY NOT be");
 printf( " schedulable under RM\n\n");
}
void monotonic_rate_end( void){}
/***** Earliest-Deadline-First (EDF) algorithm *****/
void earliest_deadline_init( void)
{
 task_t *task;
 float task_load=0.0;
 int i;
 printf( "which has a schedulability bound of 100%%\n");
 /* in the EDF case, 'deadline_list' is the same as 'merit_list' */
 deadline_list= merit_list;
 /* insert tasks in merit_list by increasing deadlines */
 for( i=1; i<=num_tasks; i++){
 task= task_set+i;
 task->merit= &task->deadline;
 task_load+= (float )task->cpu_time / (float )task->period;
 insert_task( request_list, 0, task);
 }
 printf( "Total system task load = %.1f%%\n", 100.0 * task_load);
 if( task_load<=1.0) printf( "So, the whole task set IS");
 else printf( "WARNING: the whole task set IS NOT");
 printf( " schedulable under EDF\n\n");
}
void earliest_deadline_end( void) {}
/***** Least Laxity Algorithm (LLA) *****/
void least_laxity_init( void)
{
 task_t *task;
 float task_load=0.0;
 int i;
 printf( "which has a schedulability bound of 100%%\n");
 /* in the LLF case, 'deadline_list' is not the same as 'merit_list' */
 deadline_list= newList(); deadline_list ->sys_id= 'D';
 for( i=1; i<=num_tasks; i++){
 task= task_set+i;
 task->merit= &task->laxity;
 task_load+= (float )task->cpu_time / (float )task->period;
 insert_task( merit_list, *task->merit, task);
 insert_task( request_list, 0, task);
 }
 printf( "Total system task load = %.1f%%\n", 100.0 * task_load);
 if( task_load<=1.0) printf( "So, the whole task set IS");

 else printf( "WARNING: the whole task set IS NOT");
 printf( " schedulable under LLF\n\n");
}
task_t *least_laxity( void)
{
 task_t *least;
 /* all tasks (except 'current') now have one less 'laxity' unit */
 if( (least=update_laxity_and_get_least( merit_list)) ==idle_task)
 return( idle_task);
 else if( current== idle_task)
 return( least);
 else /* current task prevails over other tasks with same merit */
 return( (*least->merit == *current->merit)? current : least);
}
void least_laxity_end( void){}
/* returns idle_task if 'l' is empty */
task_t *update_laxity_and_get_least( list l)
{
 task_t *task, *least;
 node n;
 least= idle_task;
 for( task= (n= l->header->forward[0])->v; n!=NIL; 
 task= (n= n->forward[0])->v){
 /* task->laxity(t) = task->deadline - t - task->remaining(t);
 * but now(t)= now(t-1)+1,
 * and task->remaining(t)= task->remaining(t-1), if task!=current,
 * ==> task->laxity(t) = task->laxity(t-1) -1;
 *****************************************************************/
 /* look out! task->laxity is decremented only if its state is 
 READY, because of && */
 if( task->state ==READY && -- task->laxity<0){ 
 /* if it's eligible... */
 printf( "At %d: task %c (\"%s\"), instance %d, will "
 "lose its deadline at %d%s\n", 
 sys_time, task->sys_id, task->name, 
 task->instance, task->deadline, bell);
 task->state=BLOCKED;
 }
 if( (task->state ==READY || task->state ==RUNNING) 
 && task->laxity < least->laxity)
 least=task;
 }
 return( least);
}



Listing Four

/***** Maximum-Urgency-First (MUF) Algorithm *****/
list high_crit_l, low_crit_l;
task_t *first;
void maximum_urgency_first_init( void)
{
 node n;
 list temp_list;
 task_t *task;
 float critical_task_load=0.0, task_load=0.0, temp=0.0, load;
 int i, critical_set=TRUE;

 printf( "which has a schedulability bound of 100%%\n");
 /* in the MUF case, 'deadline_list' is not the same as 'merit_list' */
 deadline_list= newList(); deadline_list->sys_id= 'D';
 temp_list= newList(); temp_list->sys_id= 'T';
 high_crit_l= merit_list; high_crit_l->sys_id= 'H';
 low_crit_l= newList(); low_crit_l->sys_id= 'L';
 for( i=1; i<=num_tasks; i++){
 task=task_set+i;
 task->merit= &task->laxity;
 /* use temp_list to order tasks by increasing periods */
 insert_task( temp_list, task->period, task);
 insert_task( request_list, 0, task);
 }
 /* insert tasks in both (high_crit_l and low_crit_l) lists */
 puts( "Critical set is composed of"); /* the first 'n' tasks in 
 'high_crit_l' with combined load less than 100% */
 for( task= (n= temp_list->header->forward[0])->v; n!=NIL; 
 task= (n=n->forward[0])->v){
 task_load+=(load=(float )task->cpu_time/(float )task->period);
 if( task->criticality ==HIGH){
 if( (temp+=load)<=1.0 && critical_set==TRUE){
 critical_task_load= temp;
 printf( "\t%s,\n", task->name);
 insert_task( high_crit_l, task->period, task);
 }else{
 critical_set= FALSE;
 printf( "WARNING at %d: Highly critical task "
 "%c (\"%s\"), found NOT Schedulable!!%s", 
 now(), task->sys_id, task->name, bell);
 printf( "\nContinue anyway? (y/[N]) ");
 if( (i=getchar()) == '\n' || i=='n' || i=='N') 
 exit(0);
 else insert_task(low_crit_l,task->period,task);
 }
 }else{ /* task->criticality ==LOW */
 insert_task( low_crit_l, task->period, task);
 }
 }
 freeList( temp_list);
 printf( "which accounts for a critical load of %.1f%%, over a total "
 "system load of %.1f%%\n", 
 100.0 * critical_task_load, 
 100.0 * task_load);
 if( task_load<=1.0) printf( "So, the whole task set MAY BE");
 else printf( "WARNING: the whole task set IS NOT");
 printf( " schedulable under MUF\n\n");
}
task_t *maximum_urgency_first( void)
{
 task_t *least, *leasth, *leastl;
 /* all tasks (except 'current') now have one less 'laxity' unit */
 leasth= update_laxity_and_get_least( high_crit_l);
 leastl= update_laxity_and_get_least( low_crit_l);
 least= (leasth==idle_task)? leastl : leasth;
 if( least==idle_task)
 return( idle_task);
 else if( current== idle_task)
 return( least);

 else /* current task prevails over other tasks with same merit */
 return( (*least->merit == *current->merit)? current : least);
}
void maximum_urgency_first_end( void){}


























































December, 1994
Writing Serial Drivers for UNIX


Dialing in and out on the same serial line




Bill Wells


Bill, a longtime UNIX programmer, can be contacted at bill@twwells.com.


If you've ever butted heads with UNIX serial ports, you've noticed that things
often don't work just right. One reason is that most serial drivers are
derived from a few ancestral sources, which have been hacked and kludged until
they work--more or less. (Indeed, I've read two books on writing device
drivers that start out by saying that the only practical way to write a
terminal-like driver is by starting from someone else's driver.)
This is certainly the case with the serial driver that is part of FreeBSD, the
UNIX-like operating system I run on my machine. That driver started out as the
HP dca driver, was modified into the com.c driver, and was finally used to
build the sio.c driver. It wouldn't surprise me at all to discover other
transformations in there, too. The FreeBSD system is based on the 386BSD
UNIX-like operating system developed by William Jolitz and described in a
series of DDJ articles entitled "Porting UNIX to the 386" (January 1991--July
1992). FreeBSD is available from a number of sources, including Walnut Creek
CD-ROM and the ftpmail server at gatekeeper.dec.com. 
While the FreeBSD serial driver has several minor annoyances, its major defect
is that your system can't dial in or out on the same serial line without
manual intervention. After mulling over the options of further hacking the
FreeBSD driver, waiting for developers to fix the problem, or porting an
existing driver, I opted to write a new driver altogether. (This driver has
since been ported to NetBSD, another 386BSD descendant.)
To successfully write a serial driver, you must have an understanding of
concurrency and control flow in the device driver, the kernel interface, and
the serial device itself. Good software-engineering practice dictates that the
various parts of the driver be distinguished and separate; my driver has five
sections: declarations, debugging and statistic facilities, hardware
manipulation, state changes, and system-call interface.


Coding the Serial Driver


In my serial driver, the declarations section contains type and variable
declarations specific to the driver itself. In particular, the LINE_STATE
enumeration describes the overall state of a line. One essential design
decision was to describe the line-state concept using a scalar rather than a
host of variables scattered through various data structures. By categorizing
and characterizing the four primary states of a line before I began coding, I
avoided most of the problems that beset other drivers. I'll discuss this
further when I talk about the driver's open routine.
If you've ever written code that involved asynchronous events, you'll know
just how difficult it can be to debug--bugs tend to depend on timing, so they
are often very difficult to reproduce, much less track down. Ideally, you want
a complete trace of the events that occurred just before the bug bit. In
reality, such a trace requires that expensive hardware or CPU time be spent
executing otherwise useless debugging code. 
Each function entry, function exit, and significant driver action gets
recorded in a circular buffer. This is moderately expensive in time, but a
judicious use of inline assembly and functions in the debugging code keeps
cost down. An external program takes a snapshot of the circular buffer, then
interprets and prints it out so that you can make sense of it. If the driver
locks up the machine, the debugging code can be configured to record events on
the console screen where, by no coincidence, the event codes result in
distinguishable, readable characters on the screen.
Another useful item in this section is a "status print" routine. FreeBSD's
kernel debugger, ddb, can call an arbitrary kernel function. At any time, you
can break into the debugger and call this routine for a dump of most of a
line's variables.
The hardware-manipulation section is the only part of the driver that knows
the details of the hardware. Separating this part of the code let me implement
a "virtual UART," which separates the bit-twiddling code from the primary
driver logic. This also makes it relatively easy to support different sorts of
hardware.
One difficulty of writing a serial device driver is that UNIX has a long
interrupt latency. Consequently, serial drivers coded the "standard" way lose
characters. One solution, "pseudo-dma," available in FreeBSD and used in my
driver, replaces the standard interrupt-handler code with a much simpler one.
Instead of establishing a normal UNIX execution thread with all the overhead
that entails, the serial device's interrupt goes directly to the interrupt
handler. Instead of calling standard UNIX functions to transfer data to and
from the device (which is rather slow), data is transferred to and from the
driver's control structures. The handler signals the UNIX kernel when it needs
data or when it has data available; this functions much like an interrupt
except that it is software generated. The "soft-interrupt" handler does the
same work as a standard interrupt handler, except that instead of reading and
writing to a device, it reads and writes to the driver's control structures.
The state-transition section moves data from here to there and keeps track of
the state of the driver. Most of the tricky logic goes here. System calls
eventually result in calls into this section, and the soft-interrupt handler
is here. This part of the driver knows that it is dealing with a serial device
but it relies on the hardware routines to do the actual device manipulation.
In many drivers, an attempt is made to propagate changes to driver variables
throughout all the other variables that might be affected. This usually
results in extremely complex code, full of incomprehensible conditionals that
never quite work right. In my driver, wherever this might be a problem, I use
a different approach. I centralize the computation of the variable in a single
routine, and then whenever anything might require a change to the variable, I
call that routine. Instead of a computation from cause to effect that
(hopefully!) considers exactly the effect that the change causes, I fully
recompute the variable. The variable always has the right value, and if this
is less efficient than the alternative, it isn't measurably so.
The system-call interface primarily handles system calls from user processes.
This is where your open/close/read/write calls end up in the driver. This is
fairly straightforward code, except for the open routine in Listing Two . 
Suppose you want to dial out on a modem. You need to communicate with the
modem to get it to dial. When a connection is established, you then need to
let the dial-out application gain control of the line. If the carrier goes
away, you want the application to receive a hang-up signal.
Dialing in, on the other hand, requires that a front-end program (typically
getty) monitor the line for a connection indicated by the presence of a
carrier, then execute the application once the connection exists. If you were
only dialing in on the line, this would be easy: Just wait for input and
proceed. However, if you dial both in and out on the same line, things get
trickier. Sure, you could make everything work with the help of
application-level interlocking. This involves finicky code that every single
application must get right. Rather than do this, it's better to get the driver
to help out.
One approach is to prevent an open dial-in device from completing until a
carrier is present. Then getty can simply attempt to open the line; once the
open is successful, it knows there is a connection on the other side.
While you could have two varieties of opens--blocking for dial-in and
nonblocking for dial-out--this isn't enough. What happens to an open that
occurs while the line is open? There's a carrier, so both types of open will
complete. If that open is a getty while a dial-out is in progress, it could be
made to work, but this, too, would involve ugly interlocking code in each
dial-in and dial-out application.
A better approach is to have the driver distinguish between dial-out and
dial-in opens; when either one has completed, the other is prevented from
completing. This leaves open how the driver is to distinguish the two sorts of
opens. There are a number of ways to do this; the one I chose is to
distinguish the type of open by a bit in the minor device number.
Devices in the range 0--127 are dial-in devices; those in the range 128--255
are dial-out. A getty tries to open a dial-in device, but it cannot succeed
until there is a carrier and no dial-out device is open. A dial-out program,
such as uucico, tries to open a dial-out device. This succeeds unless the
corresponding dial-in device is open, in which case the open fails
immediately.
It would be nice if the UNIX kernel kept track of who is open and who is
waiting to open; unfortunately, it doesn't. First, the kernel has no notion
that the dial-in and dial-out devices are related. Second, the kernel does not
try to keep track of opens and those waiting for opens; for most other types
of drivers, it just isn't needed. Keeping track is thus left to the driver
itself.
The primary parameter for line state is whether opens are waiting or active
for the line. When there are none, no one is using or trying to use the
driver. DTR is not sent by the UART, and the UART is ignored. Once an open is
attempted, one of the open wait counts goes nonzero. In that state, DTR is
turned on to let a modem know that a process is preparing to use the line. The
modem-status lines are monitored for a carrier. When the open completes, the
wait count is decremented and the appropriate active flag is set. Modem
controls are used to manipulate the modem, modem status is used for flow
control and for discovering loss of carrier, and data is transferred to and
from the line.
These conditions are summarized in a single LINE_STATE variable. Each time one
of the open wait counts or active flags changes, this variable is recomputed;
see Listing One . If it changes, various bits in the driver are changed
appropriately and the UART is programmed for that state.
(There is one more state: When the last close for a line happens, the driver
is placed in a "shutdown state" while it does the things needed to clean up
the line. During the shutdown state, no opens are allowed on the line, and, as
the last open has been closed, the line is entirely under the control of the
driver. That lets it do things such as manipulate the modem-control lines to
hang up a modem without interference.)


The open Routine


Having a state variable makes coding the open a snap. The heart of the open is
a single loop. On entry to the loop, the open wait count is incremented.
The loop is only exited when the open is to fail with an error or succeed. In
either case, the open wait count is decremented, because the open is not
waiting anymore. Then, the open either returns an error or sets the
appropriate open active flag and actually opens the line. 
The loop is, conceptually, very simple. At the top of the loop, the line
status is tested and the driver set to the appropriate state. Then a series of
tests are performed. A given test says either that this open may not succeed,
in which case the loop is exited with an error, or that this open is to
succeed right now, in which case the loop is exited without an error. If all
the tests fall through, the open goes to sleep and stays asleep until
something relevant changes. When it wakes up, it goes right back up to the top
of the loop, where it does all of the tests again.
Coding this loop was, in fact, my main reason for writing a new driver. Most
drivers simply get this wrong. Typically, they use two loops, which test for
different open conditions, both of which have to be true before the open may
succeed. However, if you have two loops, there is either a sequence of events
that causes the open to succeed when it shouldn't or a sequence that causes an
open to deadlock, thus preventing the open from succeeding when it should. 
About Ring Buffers
One problem facing driver developers is efficiently moving the data between
the interrupt handlers and the rest of the driver. For small amounts of data,
this isn't difficult; but the characters sent and received are a large amount
of data that must be moved quickly. Functionally, you need a queue: characters
placed in one end, where they sit until extracted from the other. One of the
most time- and space-efficient ways to implement a queue is the "ring buffer,"
or circular queue.
A ring buffer is an array of elements with two pointers: "write" and "read." A
character is added to the ring buffer by storing it at the write pointer, then
advancing the write pointer. A character is removed from the ring buffer by
loading it from the read pointer, then advancing that pointer. When the
pointer advances past the end of the ring buffer, it is made to point to the
start of the buffer.
There is a "gotcha" when writing ring-buffer code: When the ring buffer is
empty, the read and write pointers are equal; the same is true when it is
full. A full ring buffer must be distinguished from an empty one, so the ring
buffer must not be allowed to become full. The solution I use involves having
the read pointer trail one behind the actual read position. Instead of read
and advance, I advance and read. As the write pointer is not allowed to
advance past the read pointer, it can never actually reach the read position,
so the buffer cannot become full.
While writing the ring-buffer code in Listing Three , I discovered that if you
are careful, it is not necessary to do any sort of interlocking between buffer
readers and writers--you can interleave the execution of a read routine and a
write routine, and things will still work. The trick is that at the front of
the read routine, the read pointer is accessed once and stored in a temp. This
value is then compared with the write pointer to check for an empty buffer.
Once data is retrieved, the updated pointer value is stored back in the ring
buffer's pointer. A similar procedure is followed in the write routine. To
enforce this access/compare sequence, the pointers are declared volatile. This
tells the compiler not to do things like optimizing those operations into
something unexpected.
The end result is that the top half of the driver can add characters to the
write buffer while the interrupt routine reads from it, without any special
precautions being taken. Similarly, characters read by the interrupt routine
can be placed in the read ring buffer, without worrying if the top half of the
driver was reading from it.
The ring-buffer code in the driver is stand-alone, implemented as a C include
file, and not restricted to character elements. It is intended to be portable
and useful in applications other than the driver.

--B.W.

Listing One 

STATIC void
sio_change_line_state(SIO_CTL *ctl)
{
 LINE_STATE new_state;
 LINE_STATE old_state;
 sio_record_call(EV_CHANGE_LINE_STATE, ctl->sc_unit, 0);
 /* What should the new state be? Return if no change. */
 if (ctl->sc_shutdown != CL_NONE) {
 new_state = ST_SHUTDOWN;
 } else if (ctl->sc_actin || ctl->sc_actout) {
 new_state = ST_ACTIVE;
 } else if (ctl->sc_winc || ctl->sc_woutc) {
 new_state = ST_WOPEN;
 } else {
 new_state = ST_INACT;
 }
 old_state = ctl->sc_lstate;
 if (old_state == new_state) {
 sio_set_modem_control(ctl);
 sio_record_return(EV_CHANGE_LINE_STATE, 0, 0);
 return;
 }
 sio_record_event(EV_LSTATE, new_state, 0);
 ctl->sc_lstate = new_state;
 if (new_state == ST_ACTIVE) {
 sio_flush_input(ctl);
 }
 if (new_state == ST_INACT) {
 ctl->sc_rtsline = ctl->sc_dtrline = 0;
 ctl->sc_carrier = 0;
 }
 sio_set_interrupt_state(ctl);
 if (old_state == ST_INACT) {
 ctl->sc_rtsline = ctl->sc_dtrline = 1;
 }
 sio_set_modem_control(ctl);
 sio_wake_open(ctl);
 sio_record_return(EV_CHANGE_LINE_STATE, 1, 0);
}



Listing Two

int
sioopen(dev_t dev, int flag, int mode, PROC *p)
{
 SIO_CTL *ctl;
 bool_t callout;
 dev_t unit;
 spl_t x;
 TTY *tp;
 const char *reason;
 error_t error;
 SIO_IF_DEBUG(static ucount_t onum;)

 sio_record_call(EV_SIOOPEN, minor(dev), flag);
 /* Extract the unit number and callout flag. */
 SIO_IF_DEBUG(++onum;)
 unit = UNIT(dev);
 if ((u_int)unit >= NSIO || !(ctl = sio_ptrs[unit])) {
 sio_record_return(EV_SIOOPEN, 0, ENXIO);
 return (ENXIO);
 }
 callout = CALLOUT(dev);
 dev = makedev(major(dev), UNIT(dev));
 tp = ctl->sc_tty;
 /* Record that we're waiting for an open. */
 if (callout) {
 ++ctl->sc_woutc;
 } else {
 ++ctl->sc_winc;
 }
 sio_set_wopen(ctl);
 error = 0;
 x = spltty();
 while (1) {
 /* Get the device set up as necessary. */
 sio_change_line_state(ctl);
 /* If the line is set to exclude opens, and if the line is
 actually open, forbid anyone but root from opening it. */
 if ((tp->t_state & TS_XCLUDE)
 && (ctl->sc_actout || ctl->sc_actin)
 && p->p_ucred->cr_uid != 0) {
 error = EBUSY;
 break;
 /* Shutdown temporarily prevents all opens. */
 } else if (ctl->sc_lstate==ST_SHUTDOWN) {
 reason = "sioocls";
 /* A dialout open succeeds unless there is an active
 dialin open, in which case it fails. */
 } else if (callout) {
 if (!ctl->sc_actin) {
 break;
 }
 if (!(tp->t_cflag & CLOCAL)) {
 error = EBUSY;
 break;
 }
 reason = "sioinw";
 /* A dialin open will not succeed while there are active or 
 pending dialout opens. It also requires a carrier or clocal. */
 } else {
 if (ctl->sc_actout || ctl->sc_woutc) {
 reason = "sioout";
 } else if (!(tp->t_cflag & CLOCAL)
 && !(tp->t_state & TS_CARR_ON)) {
 reason = "siocar";
 } else {
 break;
 }
 }
 /* If we're here, either the line was in shutdown or a dialin
 open is going to wait. If this is a nonblocking open,
 return. Otherwise, sleep. */

 if (flag & O_NONBLOCK) {
 error = EWOULDBLOCK;
 break;
 }
 sio_record_event(EV_SLEEP, onum, 0);
 error = tsleep((caddr_t)ctl, TTIPRI | PCATCH,
 reason, 0);
 sio_record_event(EV_WAKE, onum, error);
 if (error != 0) {
 break;
 }
 }
 /* The open has succeeded. We're no longer waiting for open.*/
 if (callout) {
 --ctl->sc_woutc;
 } else {
 --ctl->sc_winc;
 }
 sio_set_wopen(ctl);
 /* If the open errored, reset the device and return the error. */
 if (error != 0) {
 sio_change_line_state(ctl);
 splx(x);
 sio_record_return(EV_SIOOPEN, 1, error);
 return (error);
 }
 /* Next, set up the tty structure. */
 tp->t_oproc = siostart;
 tp->t_param = sioparam;
 if (!(tp->t_state & TS_ISOPEN)) {
 tp->t_dev = dev;
 ttychars(tp);
 if (!tp->t_ispeed) {
 tp->t_iflag = 0;
 tp->t_oflag = 0;
 tp->t_cflag = CREAD | CS8 | HUPCL;
 tp->t_lflag = 0;
 tp->t_ispeed = tp->t_ospeed = TTYDEF_SPEED;
 }
 ttsetwater(tp);
 (void)sioparam(tp, &tp->t_termios);
 }
 /* Do the line discipline open. This marks the line open. */
 error = (*linesw[tp->t_line].l_open)(dev, tp, 0);
 if (error != 0) {
 sio_change_line_state(ctl);
 splx(x);
 sio_record_return(EV_SIOOPEN, 2, error);
 return (error);
 }
 /* The line is now open. Let it rip. */
 if (callout) {
 ctl->sc_actout = 1;

 /* Dialout devices start by pretending they have
 carrier. */

 tp->t_state |= TS_CARR_ON;
 } else {

 ctl->sc_actin = 1;
 }
 sio_set_wopen(ctl);
 sio_change_line_state(ctl);
 splx(x);
 sio_record_return(EV_SIOOPEN, 3, error);
 return (error);
}



Listing Three

#include <stdlib.h>

#if !defined(RB_PREFIX)
#define RB_PREFIX rb_
#define RB_TYPE char
#define RB_CONTROL RBUF
#define RB_QUAL static
#endif

#define RB_GLUE1(x,y) x ## y
#define RB_GLUE(x,y) RB_GLUE1(x,y)
#define RB_NAME(x) RB_GLUE(RB_PREFIX, x)

#if !defined(RB_SET)
#define RB_SET(buf,rh,wh) (\
 (buf)->rb_rhold = (buf)->rb_start + (rh),\
 (buf)->rb_whold = (buf)->rb_start + (wh))
#define RB_GET(buf,rh,wh) (\
 (rh) = (buf)->rb_rhold - (buf)->rb_start,\
 (wh) = (buf)->rb_whold - (buf)->rb_start)
#endif
#if !defined(RB_OVERHEAD)
#define RB_OVERHEAD (1)
#endif

typedef struct {
 RB_TYPE *volatile rb_rhold;
 RB_TYPE *volatile rb_whold;
 RB_TYPE *rb_start;
 RB_TYPE *rb_end;
 size_t rb_size;
} RB_CONTROL;
/* This initializes a ring buffer. */
RB_QUAL void
RB_NAME(init)(RB_CONTROL *p, RB_TYPE *d, size_t n)
{
 p->rb_start = d;
 p->rb_end = d + n;
 p->rb_size = n;
 p->rb_whold = p->rb_start;
 p->rb_rhold = p->rb_end - 1;
}
/* Returns the size of the ring buffer. Note that this is the maximum number
 of elements that may be placed in it, not the size of allocated area. */
RB_QUAL size_t
RB_NAME(size)(const RB_CONTROL *p)

{
 return (p->rb_size - 1);
}
/* This writes one datum to the ring buffer. The return value is the
 number of items written, 0 or 1. */
RB_QUAL size_t
RB_NAME(putc)(RB_CONTROL *p, const RB_TYPE *d)
{
 RB_TYPE *wp;
 wp = p->rb_whold;
 if (wp == p->rb_rhold) {
 return (0);
 }
 *wp++ = *d;
 if (wp == p->rb_end) {
 wp = p->rb_start;
 }
 p->rb_whold = wp;
 return (1);
}
/* This writes an arbitrary number of elements to the ring buffer. The
 return value is the number of items written. */
RB_QUAL size_t
RB_NAME(puts)(RB_CONTROL *p, const RB_TYPE *d, size_t n)
{
 RB_TYPE *rh = p->rb_rhold;
 RB_TYPE *wp;
 size_t c;
 size_t r;
 /* If the data in the buffer is wrapped, the hole into which data may
 be placed is not wrapped. This makes a wrapped buffer be the easy case. */
 wp = p->rb_whold;
 if (wp < rh) {
 c = rh - wp;
 if (n < c) {
 c = n;
 }
 if (!c) {
 return (0);
 }
 r = c;
 do {
 *wp++ = *d++;
 } while (--c);
 p->rb_whold = wp;
 return (r);
 }
 /* This next case handles an unwrapped buffer where the data
 fits before the end of the buffer. */
 c = p->rb_end - wp;
 if (c >= n) {
 c = n;
 if (!c) {
 return (0);
 }
 r = c;
 do {
 *wp++ = *d++;
 } while (--c);

 if (wp == p->rb_end) {
 wp = p->rb_start;
 }
 p->rb_whold = wp;
 return (r);
 }
 /* Finally, deal with the case where data will wrap. Since the write
 pointer is never at the end of the buffer, there is always one 
 element in the buffer. So, this copy doesn't require testing. */
 r = c;
 n -= r;
 do {
 *wp++ = *d++;
 } while (--c);
 /* Next, copy data to the start of the buffer. This might not
 copy any data if rhold is at the start of the buffer. */
 wp = p->rb_start;
 c = rh - wp;
 if (n < c) {
 c = n;
 }
 if (c) {
 r += c;
 do {
 *wp++ = *d++;
 } while (--c);
 }
 p->rb_whold = wp;
 return (r);
}
/* Returns the number of elements that may be put into the buffer. It is, in
 effect, a put routine, which means that you can't call it in a context where
 it might overlap with a put of the ring buffer. However, asynchronous gets 
 may occur, which would increase the number of available elements to above 
 what this routine returns. */
RB_QUAL size_t
RB_NAME(pcount)(const RB_CONTROL *p)
{
 RB_TYPE *rp = p->rb_rhold;
 RB_TYPE *wp = p->rb_whold;
 return (rp < wp ? p->rb_size - (wp - rp) : rp - wp);
}
/* This reads one datum from the ring buffer. The return value is the
 number of items returned, 0 or 1. */
RB_QUAL size_t
RB_NAME(getc)(RB_CONTROL *p, RB_TYPE *d)
{
 RB_TYPE *rp;

 rp = p->rb_rhold + 1;
 if (rp == p->rb_end) {
 rp = p->rb_start;
 }
 if (rp == p->rb_whold) {
 return (0);
 }
 *d = *rp;
 p->rb_rhold = rp;
 return (1);

}
/* This reads an arbitrary number of items from the ring buffer. The
   return value is the number of items returned. */
RB_QUAL size_t
RB_NAME(gets)(RB_CONTROL *p, RB_TYPE *d, size_t n)
{
    RB_TYPE *wh = p->rb_whold;
    RB_TYPE *rp;
    size_t c;
    size_t r;

    /* Handle the easy case, where the buffer is not wrapped. */
    rp = p->rb_rhold + 1;
    if (rp == p->rb_end) {
        rp = p->rb_start;
    }
    if (rp <= wh) {
        c = wh - rp;
        if (n < c) {
            c = n;
        }
        if (!c) {
            return (0);
        }
        r = c;
        do {
            *d++ = *rp++;
        } while (--c);
        p->rb_rhold = rp - 1;
        return (r);
    }
    /* The buffer is wrapped, which means that the data to be returned
       might span the end of the buffer. This block handles the case where
       the data wanted will not span the end of the buffer. */
    c = p->rb_end - rp;
    if (n <= c) {
        c = n;
        if (!c) {
            return (0);
        }
        r = c;
        do {
            *d++ = *rp++;
        } while (--c);
        p->rb_rhold = rp - 1;
        return (r);
    }
    /* The buffer is wrapped and so is the data that is to be returned.
       First, copy the data at the end of the buffer. */
    r = c;
    n -= r;
    do {
        *d++ = *rp++;
    } while (--c);
    /* There might be nothing left to copy if whold is at the start of
       the buffer. */
    rp = p->rb_start;
    c = wh - rp;
    if (n < c) {
        c = n;
    }
    if (c) {
        r += c;
        do {
            *d++ = *rp++;
        } while (--c);
        p->rb_rhold = rp - 1;
    } else {
        p->rb_rhold = p->rb_end - 1;
    }
    return (r);
}
/* This routine returns the number of data elements in the buffer. It
   is, in effect, a get routine, which means that you can't call it in
   a context where it might overlap with a get of the ring buffer.
   However, asynchronous puts may occur, which would increase the
   number of elements to above what this routine returns. */
RB_QUAL size_t
RB_NAME(gcount)(const RB_CONTROL *p)
{
    RB_TYPE *rp = p->rb_rhold + 1;
    RB_TYPE *wp = p->rb_whold;

    return (wp >= rp ? (wp - rp)
            : rp == p->rb_end ? wp - p->rb_start
            : p->rb_size - (rp - wp));
}
/* This clears all data from a ring buffer. */
RB_QUAL void
RB_NAME(gclear)(RB_CONTROL *p)
{
    RB_TYPE *wp = p->rb_whold;

    p->rb_rhold = wp == p->rb_start ? p->rb_end - 1 : wp - 1;
}
#undef RB_PREFIX
#undef RB_TYPE
#undef RB_CONTROL
#undef RB_QUAL
#undef RB_GLUE1
#undef RB_GLUE
#undef RB_NAME


December, 1994
Adding Animation to Windows Help


A look at two new animation toolkits 




Peter Kent


Peter is a consultant specializing in creating WinHelp files and training new
WinHelp authors, and has written 11 computer books on various topics. He can
be contacted at 303-989-1869 or on CompuServe at 71601,1266.


As Al Stevens pointed out in his article, "Help for Windows Help Authors"
(DDJ, April 1994), a Windows app must provide online help if it is to be taken
seriously by users. In fact, the demand for sophisticated, flashy Windows help
systems including everything from animation to audio has grown since Al
examined help tools and techniques. Animation in particular is useful for
recording a short tutorial that demonstrates how to use an application.
However, none of the tools Al looked at provided a direct way to add animation
to a help file. Movie Development Kit, from Lantern, and DEMOquick, from AMT,
are two toolkits that let you add animation to Windows help files. You can
link animations to buttons or hot spots, record user actions for interactive
tutorials, and create animations by capturing individual frames in a graphics
program. You can also run animations independently of Windows Help files.
Another tool similar to the two discussed here will be available from Blue Sky
Software, makers of RoboHelp; however, at this writing, this tool has not yet
been released.


Movie Development Kit


The Movie Development Kit provides a simple, low-cost way of recording
animations and embedding them into other applications. In addition to the
ability to record frames from either a window or the entire screen, the
toolkit's features include capture and playback tools; a device-independent
file format for displaying 16, 256, or more colors; and a scripting language
that allows you to create self-running demos, slide shows, and so on.
The Movie Development Kit provides a number of ways to record frames. You can
click on the Snap button in the program's remote-control dialog box, use a
keyboard shortcut, or click on the Movie icon with the right-hand mouse
button. You can also configure Movie to take snapshots automatically. You do
this either by setting up a time interval, or configuring Movie to take a
snapshot when you click the mouse button, press a key, or immediately before
or after the capture window repaints (Movie watches for the WM_PAINT message).

One useful feature of Movie is its ability to record the mouse pointer in each
frame. If you're creating a tutorial, for instance, you'll probably make use
of this. However, you may want to omit the pointer from the frames if you are
creating other types of animation. Once you've recorded the animation, you can
play it back frame by frame or all together, but there are no tools for
deleting frames or adding notes to the animation. Movie provides a facility
for creating distribution disks containing the animations.
To use a Movie animation with a Windows Help file, you must place the
animation in an embedded window inside a Help topic. In WinHelp, embedded
windows are created using the {ew} command in the .RTF file. (If you are using
an authoring tool such as ForeHelp or Doc-to-Help, you may never see the .RTF
codes; nonetheless, the authoring tool is entering codes such as these into
the .RTF file for you.) For instance, to run an animation when a topic opens,
add the command {ewc VoyEwh.dll, VoyEwhMovie, ;ID=1 ;File=filename.mov
;AutoPlay} to the .RTF file. In this command, ew is the embedded-window
command, c places the embedded window in the center of the topic, VoyEwh.dll
is the Movie .DLL file that runs the animation, VoyEwhMovie is the class name,
ID=1 is an identification number for this instance of the movie,
File=filename.mov is the name of the file you created when you recorded the
animation, and AutoPlay is a parameter that tells the .DLL to play the
animation the first time the topic is entered. There are a number of other
parameters: You can display a border around the embedded window; play specific
frames; play in ping-pong mode (so it plays one way and then reverses); create
sliders, Play buttons, and Stop buttons; control the frame number on the
embedded window; and modify the speed at which the animation plays.
Movie also includes a number of macros that allow the user to control a movie
from text and graphic hot spots. These macros can be used to modify animation
speed, copy a frame to the clipboard, display a particular frame, play the
animation, stop the animation, or place a frame number in the animation. You
register Movie macros using WinHelp's RegisterRoutine macro. This is placed in
the [CONFIG] section of the project's .HPJ file with a command such as
RegisterRoutine(`VoyEwh.dll', `macroname', `parameterspec'), where VoyEwh.dll
is the .DLL containing Movie's macros, macroname is the macro you want to use,
and parameterspec is the parameters passed to the macro, usually ui. Once
registered, the macros can be used in the same way you use WinHelp's standard
macros--you can run them from hot spots, buttons, or when a topic or the Help
project opens.
Movie also lets you store the .MOV file as baggage within the .HLP file.
Baggage is added to a Help project by simply naming the baggage file in the
[Baggage] section of the project's .HPJ file. When you compile the .HLP file,
the .MOV file will be stored inside. Note, however, that several supporting
files can't be placed as baggage inside the .HLP file; they must be installed
in the \WINDOWS\SYSTEM directory before the animation can be run.
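The two pieces just described, macro registration and baggage, both live in the
project's .HPJ file. The fragment below is a sketch; the macro name, parameter
spec, and file names are illustrative stand-ins, not taken from Lantern's
documentation:

```ini
; Hypothetical .HPJ fragment -- names are illustrative.
[CONFIG]
; Make a Movie macro callable from hot spots and buttons.
RegisterRoutine(`VoyEwh.dll', `macroname', `ui')

[Baggage]
; Store the recorded animation inside the compiled .HLP file.
filename.mov
```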


DEMOquick


DEMOquick provides an easier way to record animations and lets you quickly and
easily add notes to your tutorials. In fact, you can add voice if you want to
get fancy. You'll start by using Mimic, an accessory program, to record your
animation or tutorial in an .AVI video file. You have several recording
options. You can record a frame on any mouse action--button up or button down
on any specified button. You can also record on a key-up or key-down event,
when you press a hot key that you have defined, or when you click on a button
in a control box. Mimic also lets you filter out double mouse clicks and the
Shift, Alt, and Ctrl keys, so double clicks and keyboard combinations won't
give you two frames.
As with Movie, you can also use Mimic to record the mouse pointer, but you
won't usually want to do so, because DEMOquick can simulate smooth pointer
movements. The only reason to capture the mouse pointer in Mimic is if you are
creating single frames for some purpose other than creating an animation and
want to show the pointer position.
Setting up and starting the recording takes just a minute or two. You can
capture a full screen or the active window, and while you can't change the
capture area during a recording session, you can change other capture options,
such as the key and mouse triggers.


Editing the Animation


The real benefits of DEMOquick become clear once you've finished recording the
animation and begin editing. You can group several .AVI files together into
one sequence, then create pop-up messages and insert them into the animation.
You can also use a variety of other options, such as inserting music and
voice, bit-mapped graphics, or other .AVI files; specifying the type of mouse
pointer you want to use; or adding a border and specifying its color.
Adding popups to an animation is straightforward: Use the scroll bar at the
top of the DEMOquick control panel to move to the frame at which you want to
add the popup, then click on the Add Popup button. DEMOquick places the box on
the screen. Click on the PopUp Options button and you can modify the box--add
title and scroll bars, modify the text and background color, and select a font
size. You can also drag the pop-up border and title to modify its size and
position. Then, to add text to your popup, just place the cursor in the box
and type. (DEMOquick lets you export all your pop-up-note text, check the
spelling in your word processor, then reimport the corrected text.) When
you've finished creating the first pop-up note, you can continue moving
through the animation, adding more pop-up notes as desired.
When you are editing, the popups contain Cut, Paste, Copy, and Delete buttons;
in the final, compiled animation, the popups have buttons that let the user
move around in the animation, quit, or modify the animation speed. Using
DEMOquick's Other Options dialog box you can even add a button of your
own--this can be used to run an external program from within the animation.
Adding other components to the animation is also fairly simple. You can place
a .BMP, .WMF, or .ICO file within a frame, and position it exactly. You could
use this feature to add arrows, for instance, pointing from the popup to the
part of the screen to which you wish to direct your audience's attention. You
can also insert a single frame from an .AVI video file, play the entire file,
or crop a frame so that only a part of it appears.
DEMOquick lets you add Waveform and MIDI files. Again, adding them is a very
simple process. You can even record the .WAV sound directly into the animation
file, providing a simple way to add a voice-over to your animation to explain
the next step in your tutorial. DEMOquick also provides a number of run
options. You can modify the animation's border color (if you recorded an area
less than full screen), define how quickly the animation will run, select the
type of mouse pointer to be used (or turn the pointer off between specified
popups), and set the location of the animation on the screen.


Virtual-Help


To use macros to link your DEMOquick tutorials to your help file you'll need a
$300 accessory application called "Virtual-Help." Instead of placing the
animation in an embedded window within the Help file, a WinHelp macro creates
a DDE link to the DEMOrun program, which then takes over and displays a
full-screen animation outside the Help file. The user is returned to the help
file when the animation is closed. The person using the Help file can actually
switch between the animation and the Help file using normal Windows
task-switching procedures, such as Alt-Tab. This is similar to the way in
which the Help system in Word for Windows 6.0 runs its demos--outside WinHelp,
in a full-screen application. (With Word 6.0 demos, though, the user can't
leave the demo without closing it.)
You can attach WinHelp macros to buttons and graphic or text hot spots. For
instance, to allow the user to run a demo by clicking on hot-spot text, you'd
use the macro !ExecProgram(`DDELINK RUN /AUTOSTART=filename', 0), where
ExecProgram is the WinHelp macro used to run external programs from within
WinHelp, DDELINK RUN is the command that starts the DDE link (which in turn
runs DEMOrun), /AUTOSTART= is the parameter that tells DEMOrun to start a
demo, filename is the name of the demo you want to start, and 0 is the
ExecProgram parameter that defines the application's display state (for
DEMOrun, you would always use 0). The hidden text is placed after the hot-spot
text in the .RTF file and the tutorial will run when the user clicks on this
text. The user is returned back to the Help file when the tutorial is closed.
While the tutorial is running, the user has a few options. The pop-up windows
have forward and back buttons--the forward button runs the next few frames,
stopping at the next popup, while the back button simply displays the previous
popup and its associated frame, allowing the user to run through the step
again. There's also a Jump button which lets the user return to the start of
the tutorial or adjust the speed. 


Experiences



Although placing Movie animations in a Help file is a simple procedure, I
found recording animations a little awkward. Defining the area of the
animation is difficult. For instance, clicking the clipboard mouse pointer
often grabs the window without the title bar, though you can define an area by
rubber-banding around it with the pointer. In addition, the mouse and key
triggers don't work well: In some applications, you can't use them at all; in
others, the mouse may work but not the keyboard; and in all cases, they only
work in a single application. If your tutorial spans two or more applications,
the mouse and key triggers stop working when you enter the second program.
These problems make recording more difficult than it should be, and getting a
smooth mouse-pointer motion in the tutorial takes some effort.
Another major drawback in Movie is its inability to add notes directly to your
tutorial. To overcome this, you could break the tutorial down into several
small animations, using a macro to run each piece from the Help file and
interspersing the bites with notes built into the Help topic itself.
Alternately, you could simply record frames with text inside some kind of text
box--Notepad, for instance--within the animation. But that's a lot of trouble.
DEMOquick and DEMOrun are also not without their problems. Norton Desktop
interferes with recording animations, and if you are running PC Tools for
Windows Desktop, the animation may not play properly. In some cases, an
incompatible sound driver might cause an error message to appear each time the
animation tries to play a sound (AMT plans to fix this by disabling the
animation's sound the first time the problem occurs). DEMOrun also has a few
memory problems. When system resources are below about 35 percent, DEMOrun may
not start. Additionally, DEMOrun's DR_PRO.EXE program stays in memory when you
quit; you have to use a macro from the Help file to remove the program from
memory before the Help file is closed. Occasionally, you can't get DEMOrun to
start if one instance of the program is already open. The problems are
compounded by documentation that is unclear in places.


Conclusion


The Movie Development Kit is fine for creating simple animations that contain
little or no additional text or audio information. Some of its problems are a
little troubling, though. DEMOquick, on the other hand, costs a bit more, but
its stronger editing features facilitate more advanced tutorials with text
notes, smooth mouse scrolling, voice, and music.


For More Information


Movie Development Kit 4.01.
Retail Price: $299.00
Lantern Corp.
63 Ridgemoor Drive
Clayton, MO 63105
314-725-6125
DEMOquick 2.13 with Virtual-Help option
Retail Price: $790.00
AMT Learning Solutions Inc.
183 Guggins Street
Boxborough, MA 01719
508-263-3030




December, 1994
Building an E-mail Manager


PowerBuilder Desktop meets QmodemPro




Michael Floyd


Michael is DDJ's executive editor. He can be reached at mfloyd@mfi.com, on
CompuServe at 76703,3047, or through the DDJ offices.


Rapid application development (or RAD) tools emphasize drag-and-drop visual
programming and client/server development (even though most of the tools I've
seen address only the client side). To a large degree, the "rapid" part of the
application development stems from a focus on building user interfaces using
software "components." Among the RAD tools are Microsoft's Visual Basic,
Powersoft's PowerBuilder (both based on Basic), IBM's Visual Age, Symantec's
Enterprise Developer, and Borland's yet-to-come Pascal-based Delphi95. 
To explore RAD-based development, I've created a minimal communications engine
which allows me to automatically log on and exchange e-mail using a variety of
online services such as CompuServe, MCI Mail, Internet, and DDJ Online. In
this article, I'll use PowerBuilder Desktop to build the front end. In the
future, I'll revisit this project using other RAD front-end tools.
On the communications side, I use QmodemPro for Windows from Mustang Software.
While there are a number of asynchronous communications libraries
available--CommLib 5.0 from Greenleaf Software or Asynch Professional from
Turbo Power Software, among them--I decided to use QmodemPro for Windows
because of its built-in scripting language, SLIQ (Script Language Interface
for Qmodem). This powerful scripting language does much more than simply
automate communications tasks. SLIQ scripts, which are compiled by Qmodem's
built-in script compiler, let you create windows and dialog boxes and call
Windows DLLs; the language also provides a number of useful functions for
handling communications. In addition, the script compiler includes a debugger
facility that lets you set breakpoints, step through code, and set variable
watches. 
While the engine handles sending and receiving mail from each of the different
services, a front-end Windows application handles the storage and retrieval of
messages in the mail database, displaying of messages, and so on. The
front-end application simply invokes QmodemPro by calling the appropriate
script. 


The Qmodem Connection


The first time I wrote publicly about Qmodem was in a 1984 issue of the CCF
Bulletin, a newsletter for users of the Central Computing Facility at
NASA/Ames Research Center in Mountain View, CA. Written by John Friel of the
Forbin Project, Qmodem was a pioneer in many areas. Qmodem was, at the time,
arguably the best PC communications program around, and it was freeware
("shareware" had not yet taken hold). Qmodem was also an early example of what
could be done with Turbo Pascal. In fact, Friel wrote freely about his use of
Turbo Pascal in the doc files.
Since then, Qmodem has been acquired by Mustang Software, well known for its
Mustang BBS, and turned into a commercial toolkit. John Friel continues to
work on the DOS version. The Qmodem family now includes a Windows version,
although it is under separate development. Because of these separate
development paths, there are significant differences between the DOS and
Windows versions, and little compatibility between their scripting languages,
phone books, and so on.
QmodemPro for Windows provides full support for asynchronous connection, a
long list of terminal-emulation options, file-transfer protocols (including
several variations of XModem and YModem, Kermit, CIS B+, and ASCII), fax and
host-mode capabilities, and SLIQ. SLIQ is a Basic-like scripting language
which includes some rather powerful features for Windows developers. For
starters, the scripting language provides the ability to create Windows dialog
boxes. To do so, you create a dialog-box template like that shown in Figure 1.
The dialog-box definition includes definitions for controls such as check
boxes, radio buttons, combo boxes, group boxes, list boxes, justified text,
edit controls, pushbuttons, and so on. Once the dialog is defined, you simply
create an instance of the dialog box based on the template. Note in Figure 1
that the width and height of the dialog are specified in "dialog units."
Because they are based on the size of the font used in the dialog, dialog
units are used to avoid scaling problems when changing fonts.
Listing One presents a script to parse mail headers in CompuServe Mail and
determine whether a message is new. If the message is new, the script
downloads the message to a file. A feature I don't like in mail-reader
programs such as ConnectSoft's E-Mail Connection is that they automatically
delete online mail after it has been downloaded. Although you have the option
of doing so, the scripts I present here do not automatically delete mail.
Instead, a mail header file (MAIL.HDR) is opened in capture mode and the
CompuServe SCAN command is issued. SCAN displays all mailbox information
without disturbing a message's "read" status. The presence of the string
"Expire:" within the header indicates that the message has already been read;
headers not containing the string are new. So, the mail script parses this
file to determine which messages are new and downloads them using the Download
function. The filename is constructed based on the service from which it was
downloaded and the message number assigned to it by the service. For example,
message number 23 on CompuServe is given the filename CIS.23. This is a
temporary file which will be deleted when the message is stored in the
database by the front-end application.
As you can see in Listing One, SLIQ provides a number of built-in functions
like Dial, Send, Waitfor, Delay, and HangUp that make communication with the
online service painless. SLIQ also includes a LogFile function used to create
a record of the session. Although I haven't used the log file on the
PowerBuilder side, it's useful in determining the success or failure of a
message transfer.
Because the transfer uses CompuServe's B+ file-transfer protocol, it doesn't
matter to the script whether the file is a text message or a binary file such
as a .ZIP file. If the file transfer is successful, an appropriate message is
displayed and the script logs off the system. For the sake of simplicity,
however, I didn't implement error handling with this version. If you plan to
use these scripts, error handling will be the first enhancement you'll want to
make. Complete scripts for sending and receiving e-mail via DDJ Online, MCI
Mail, and CompuServe are provided electronically; see "Availability," page 3.


Database and Interface


PowerBuilder is an object-oriented development environment that supports
features such as inheritance, encapsulation, and user-defined objects. It
includes an enriched set of database portability and management functions; the
ability to support large-scale projects, including report generation and
object libraries with check-in/check-out procedures; and a complete
implementation of Windows objects, events, functions, and
communications--including OLE, MDI, DDE, and DLL calls. PowerBuilder also
comes with the Watcom SQL database and a complete set of ODBC drivers covering
virtually all PC databases.
When developing an application, you work with various painters, much like you
work with wizards in Visual C++. You start by creating an application object
using the application painter. Database tables and connections are painted
using the database painter, and views into the database are created using a
data window painter. The Window painter, not to be confused with the data
window painter, is the portion of PowerBuilder that is most like Visual Basic.
There, you create windows and controls, set their properties, and create
handlers for events. PowerBuilder includes a scripting language called
"PowerScript" for this purpose. Scripts are primarily used to handle events
associated with an object or control (such as a click event for a button).
PowerScript is Basic-like in syntax and includes a sizable function library.
Additionally, PowerScript allows you to embed SQL statements within a script,
as well as create user-defined functions. Scripts are written using the script
painter, which includes a debug facility that lets you set breakpoints; set
watches for local, global, and shared variables; and step through code. When
you finish building the script, it is compiled and placed in temporary
storage. When you save the object, the script is saved in an application
library (.PBL file) with the object.
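To give a flavor of what an event script looks like, here is a hypothetical
clicked-event sketch showing PowerScript's embedded-SQL feature. The table and
control names (mail_messages, st_status) are invented for illustration and are
not part of the article's application:

```powerscript
// Hypothetical clicked-event script for a button. PowerScript lets you
// mix SQL statements with ordinary script code; results of the SQL are
// reported through the SQLCA transaction object.
long msgCount

SELECT count(*) INTO :msgCount FROM mail_messages;

IF sqlca.sqlcode = 0 THEN
    st_status.text = "Messages in database: " + String(msgCount)
ELSE
    MessageBox("Mail", "Query failed: " + sqlca.sqlerrtext)
END IF
```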
The e-mail app consists of a main window with two data windows and several
control buttons; see Figure 2. Buttons are used to initiate communications and
manage the mail database. The two data windows are different views into the
same table. The top window provides a summary of messages in the database,
including message status and header information. The user can scroll through
the database and click on any row in the table. When clicked, the lower data
window receives an update message and displays the mail for that row. If the
mail is plain ASCII, the message itself is displayed. If, however, the mail is
binary in nature (such as a .ZIP file), then a message (containing the fully
qualified path and filename) is displayed.
The bulk of the work takes place in the script for the click event of the
Send/Receive button; see Listing Two. The first task is to invoke QmodemPro,
passing the name of the script in Listing One as a parameter. The Qmodem
script downloads the mail to temporary files and creates a temporary
mail-header file. Listing Two reads the mail-header file, parses the mail
messages, and stores them in the database. At the same time, the two data
windows are updated with the new mail messages.
One minor problem I encountered involves the formatting of date and time
strings. Date strings in particular are presented in different formats
depending on the service from which the message was received. And even though
PowerBuilder can display date strings in any format, it expects to receive
them in the form YYYY-MM-DD. Listing Two determines from which system the
message originated and formats the date string accordingly. Once the string is
parsed, the new date string is constructed using PowerBuilder's replace()
function and then stored in the DateStr variable. Later, DateStr will be
concatenated to a larger string containing the complete formatted record for
the mail message.
A simple method to import records into the database is to create a delimited
string and call PowerBuilder's ImportString() function. However, I found a few
quirks using this function, particularly with large bodies of text such as
those associated with e-mail. The problem arises from the fact that
ImportString() uses carriage returns, line feeds, and EOF markers to separate
fields upon importing. However, you cannot specify which of these are used. To
get around this, Listing Two opens the same file twice: once in line mode and
again in stream mode. In line mode, FileRead() reads until either a carriage
return, line feed, or EOF is encountered. In stream mode, FileRead() will read
the entire file until an EOF is encountered or the maximum 32,765 bytes are
read.
Once the import string is formatted, InsertRow() is called to create a new row
in the table. Next, dw_2.ImportString(ImportStr) imports the string stored in
ImportStr to the database. The dw_2 object qualifier references data window 2.
Finally, dw_2.update() adds the new record to the table and dw_1.retrieve()
makes the change visible in the top data window.


Wrapping It Up


PowerBuilder includes many other features which I have not explored, including
the ability to communicate via DDE and OLE. And because PowerScript supports
calls to DLLs, you can extend the environment. In fact, the communications
engine could be rewritten in, say, C, and called directly from PowerScript.
That way, you could add support for network mail systems such as cc:Mail.
Incidentally, PowerBuilder also supports MAPI: it supplies a system
object called MailSession, and some mail-related structures, enumerated data
types, and object-level functions including MailLogon, MailGetMessages,
MailAddress, and MailSend. One enhancement you might consider is creating a
mail server for the e-mail application. The server could be used to handle
requests from other applications for e-mail services.
PowerBuilder 4.0 is due out by the end of this year, according to Powersoft.
While company officials are reluctant to give out details of this new release,
they have indicated the new version will support both Windows and Windows NT,
and that Macintosh support will be available in early 1995. A Powersoft
spokesperson also indicated plans for several UNIX implementations with Sun
Solaris at the top of the list.
Finally, using Qmodem over a roll-your-own approach saves the headaches of
implementing terminal emulations and the sometimes-obscure file-transfer
protocols. And on the PowerBuilder side, you no longer have to agonize over
user-interface and database-connectivity issues. You may not have thought of
putting these two tools together in a programming project; nevertheless, it is
surprising what you can accomplish in a short time when you use RADical tools.


For More Information


QmodemPro For Windows 1.1
Mustang Software

6200 Lake Ming Road
Bakersfield, CA 93306
805-395-2500
Price: $99.00
Requirements: Windows 3.1
PowerBuilder Desktop 3.0
Powersoft Corp.
561 Virginia Rd.
Concord, MA 01742-2732
508-287-1500
Price: $695.00
Figure 1: Creating a dialog-box template using SLIQ.
DIALOG dialogtype x, y, w, h
 [CAPTION caption]
 [FONT size, fontname]
 [integer-field AS CHECKBOX title, id, x, y, w, h]
 [integer-field AS COMBOBOX id, x, y, w, h]
 [CTEXT title, id, x, y, w, h]
 [DEFPUSHBUTTON title, id, x, y, w, h]
 [string-field as EDITTEXT id, x, y, w, h]
 [GROUPBOX title, id, x, y, w, h]
 [integer-field AS LISTBOX id, x, y, w, h]
 [LTEXT title, id, x, y, w, h]
 [PUSHBUTTON title, id, x, y, w, h]
 [integer-field AS RADIOBUTTON title, id, x, y, w, h]
 [RTEXT title, id, x, y, w, h]
 ...
END DIALOG
Figure 2: Main window of the e-mail application.

Listing One 

'
' Script to parse mail headers in CIS Mail and determine whether a message
' is new. If new, then download
'

DIM Str1 as string
DIM Str2 as string
DIM Str3 as string
DIM FileName as string
DIM I, N, J as Integer
DIM MsgNum as string
DIM DownLoadStr as string

LogFile ON

If Exists("cis.hdr") then
 del "cis.hdr"
End If

dial manual "9,434-1580"

striphibit on
DELAY 1
SEND "^C";
WAITFOR "User ID: "
SEND "76703,4057"
WAITFOR "Password: "

SEND "elder*wholly"

WAITFOR "!"
SEND "go mail"

WAITFOR "!"
CAPTURE "cis.hdr"
SEND "scan"
WAITFOR "!"
CAPTURE OFF
SEND
WAITFOR "!"

OPEN "cis.hdr" for input as #1

For I = 1 to 4
    INPUT #1, Str1
Next I

Do While not eof(1)
    INPUT #1, Str1
    INPUT #1, Str2
    If Str2 = "" then
        Exit Do
    End If
    INPUT #1, Str3

    MsgNum = " "
    I = InStr(Str3, "Expire:")
    If I = 0 then
        While MsgNum = " "
            MsgNum = LEFT(Str1, 1)
            N = Len(Str1) - 1
            Str1 = RIGHT(Str1, N)
        WEnd

        FileName = "cis." + MsgNum
        DownLoadStr = "download/pro:b " + MsgNum

        Send DownLoadStr
        Waitfor ":"
        Send FileName
        If download("c:\apps\qmwin\download", bplus) = 0 Then
            'PRINT "file transfer OK"
        End If
        Waitfor "!"
        SEND
    End If
    If Str3 <> "" then
        INPUT #1, Str1
    End If
Loop
CLOSE #1

SEND "BYE"
Waitfor "!"
Send "N"
DELAY 5
HANGUP

LogFile OFF



Listing Two

string Fname, TmpStr, DateStr, FromStr, SubjectStr, ImportStr
string StatusStr, SystemStr, MsgStr, FExtension
int Fnum, HdrNum, retrn, HdrFileRet, I
long StrPos, RowNum
Boolean OK

run("c:\apps\qmwin\qmwin c:\apps\qmwin\scripts\email.scr")

// Construct filename from mail.hdr
Fname = "c:\apps\qmwin\cis.hdr"
OK = FileExists(Fname)
If OK then
    HdrNum = FileOpen(Fname)
    For I = 1 to 5
        HdrFileRet = FileRead(HdrNum, TmpStr)
    Next
    Do While HdrFileRet > 0
        FExtension = mid(TmpStr, 3, 1)
        FName = "c:\apps\qmwin\download\cis."
        FName = replace(FName, Len(FName)+1, Len(FExtension), FExtension)

        // Determine if file is binary
        For I = 1 to 3
            HdrFileRet = FileRead(HdrNum, TmpStr)
        Next
        StrPos = pos(TmpStr, "* Binary *")

        // If file is text then open mail and process
        If StrPos = 0 Then
            OK = FileExists(Fname)
            If OK then
                Fnum = FileOpen(Fname)

                // Get and format the date time string
                retrn = FileRead(Fnum, TmpStr)
                If retrn > 0 then
                    StrPos = pos(TmpStr, ": ")
                    StrPos = StrPos + 2
                    DateStr = mid(TmpStr, StrPos)
                    DateStr = replace(DateStr, Len(DateStr)+1, Len("~t"), "~t")
                End If

                // Get and format the "From" string
                retrn = FileRead(Fnum, TmpStr)
                If retrn > 0 then
                    StrPos = pos(TmpStr, ": ")
                    StrPos = StrPos + 2
                    FromStr = mid(TmpStr, StrPos)
                    FromStr = replace(FromStr, Len(FromStr)+1, Len("~t"), "~t")
                End If

                // Get and format the Subject string
                retrn = FileRead(Fnum, TmpStr)
                If retrn > 0 then
                    StrPos = pos(TmpStr, ":")
                    StrPos = StrPos + 2
                    SubjectStr = mid(TmpStr, StrPos)
                    SubjectStr = replace(SubjectStr, Len(SubjectStr)+1, Len("~t"), "~t")
                End If

                // Get the mail message
                Do While retrn >= 0
                    retrn = FileRead(Fnum, TmpStr)
                    MsgStr = replace(MsgStr, Len(MsgStr)+1, Len(TmpStr), TmpStr)
                Loop
                FileClose(Fnum)
            End If

        Else // Process binary file: skip the rest of this header entry
            MsgStr = "Binary file copied to download directory~t"
            FromStr = TmpStr
            SubjectStr = TmpStr
            DateStr = "1994-01-01"
            HdrFileRet = FileRead(HdrNum, TmpStr)
            HdrFileRet = FileRead(HdrNum, TmpStr)
        End If

        StatusStr = "received~t"
        SystemStr = "Compuserve~t"
        RowNum = dw_2.InsertRow(0)
        ImportStr = String(RowNum)
        ImportStr = replace(ImportStr, Len(ImportStr)+1, Len("~t"), "~t")
        ImportStr = replace(ImportStr, Len(ImportStr)+1, Len(DateStr), DateStr)
        ImportStr = replace(ImportStr, Len(ImportStr)+1, Len(FromStr), FromStr)
        ImportStr = replace(ImportStr, Len(ImportStr)+1, Len(SubjectStr), SubjectStr)
        ImportStr = replace(ImportStr, Len(ImportStr)+1, Len(StatusStr), StatusStr)
        ImportStr = replace(ImportStr, Len(ImportStr)+1, Len(SystemStr), SystemStr)
        ImportStr = replace(ImportStr, Len(ImportStr)+1, Len(MsgStr), MsgStr)

        dw_2.ImportString(ImportStr)
        dw_2.update()
        dw_1.retrieve()
    Loop
    FileClose(HdrNum)
End If




















December, 1994
PROGRAMMING PARADIGMS


The Pizza Clerk, the Bookmaker, and the UPS Truck




Michael Swaine


This is not the month in which we learn how to get rich developing software.
This is also not the month in which I pick the best erotic QuickTime movies on
CompuServe. This is the month in which I relate strange tales of Internet
access, critique Newton development tools, and describe some things I found in
my driveway. I wanted you to know that up front in case these subjects were
not at the top of your must-research list.


Tales of the On Ramp


Every year for eight years, the Santa Cruz Operation has hosted a bash called
SCO Forum on the University of California, Santa Cruz campus, just down the
road from me. This year for the first time I bestirred myself early one
morning to grace the festivities with my bleary-eyed presence. It wasn't
primarily an interest in SCO specialties like multiuser system support and
fast-food service that drew me to this romp in the redwoods, although back
when I was a working programmer, I was heavily into both.
No, I was there chiefly as working press--that is, to suss the vibes and nab
the perks and bennies. I scarfed up a bottle of official SCO wine, two nifty
tie tacks, one SCO Forum shoelace, and a Frisbee for the lab. I didn't stay
till the last day, so I didn't get to take my seat cushion home with me.
The seat cushions were to ease the pain of sitting on the hard bleachers in
the quarry, listening to--to name perhaps the least painful player in the
motley troupe--Electronic Frontier Foundation cofounder John Perry Barlow, on
whose thoughts as then expressed I hope to report in depth within the next
month, but not now.
From players other than Barlow I heard that SCO just had its best year ever,
is profitable, and has money in the bank some 15 months after going public. I
learned that new SCO CFO Alok Mohan considers the UNIX wars over now that SCO
has a 37 percent market share, and that SCO tells the press it ain't afraid of
no NT, while admitting to its stockholders that SCO was feeling the NT pinch
at the end of fiscal '93.
I heard more than I ever wanted to hear about Lotus Notes for SCO. SCO
customers apparently couldn't hear enough about one of the two big themes of
SCO Forum: its new OS version that is more friendly to Windows clients.
And I got the real story on Internet pizza delivery.
SCO is big with fast-food folks. Pizza Hut and its parent, Pepsi, are major
customers, as is Dr. Pepper, which I think is part of Pepsi, isn't it? So is
KFC, which announced that it was installing SCO back-of-house management
software in 800 of its chicken joints, but that's not the big news, oh no;
that would be PizzaNet.
It probably shouldn't come as a surprise that the first pizza order over the
Internet occurred at SCO Forum. The other big theme at SCO Forum, and the
theme of the August issue of SCO World magazine, was access to the Internet.
Now there's a no-brainer. Access to the Internet is probably a theme at
Tupperware parties these days. As Ray Valdés reported in the August Dr. Dobb's
Developer Update, the Icon Bar & Grill in San Francisco has a TCP/IP link to
the Internet. It's everywhere. Tell me about it; my mom Veronicas. I'd
paraphrase John Lennon and say that the Internet is bigger than Jesus, but a
clause in my contract holds that my pay gets docked for all copies of the
magazine burned by Christian fundamentalists. It's big, though. And SCO
noticed. And they thought, "We could use it to order pizza."
A big draw at SCO Forum was the pizza-ordering station. The idea was, you log
on to the WWW via NCSA Mosaic and custom "PizzaNet" software from SCO, and you
enter your name and address and your order (including drinks). The PizzaNet
server at Pizza Hut headquarters in Wichita, Kansas, takes the order and
modems it back to your local Pizza Hut, which calls you back (voice) to verify
the order and then dispatches a driver. To the local Pizza Hut's back-of-store
SCO software it looks no different from an order from a point-of-sale terminal
in the front of the store.
The main drawbacks seem to be: 1. You still have to talk to your local store;
2. it's only available in Santa Cruz, so far; 3. delivery was unimpressive in
the test run my friend Jürgen did; and 4. er, well, I don't know, maybe you
actually like Pizza Hut pizza.
What's interesting about this whole thing is that they already had a back-end
system and only needed to attach a user interface, and the one they chose was
Mosaic. In the competition for On Ramp of Choice to the Infobahn, the leading
contender is a noncommercial program (take that, commercializers of the net),
and it's a noncommercial program that pretty much requires a 56-Kbps connection
(take that, Joe Average User).
Of course, Mosaic has been licensed to various developers for commercial
versions.
How does anyone not already into computers and telecommunications and UNIX
make any sense of the Internet? How do such people even find out about service
providers, or find out that there are different kinds of service providers, or
learn that service providers are anything to ask about?
Access is a puzzle for anyone not already on the Internet. And that's still
most of the world: Americans tend to take terms like World Wide Web literally,
but most Internet users today are U.S. residents. (Second place goes to
"unknown," which says something about Internet culture, I suppose.) It's only
been for about a year now that the general public in Japan has had any access,
which they now get through IIKK via TWICS, no doubt using TCP/IP and SLIP or
PPP, unless they want WWW access, in which case maybe they should talk to SCO
about PizzaNet.
Okay, it's not just the acronym soup that confuses neophytes. Access to the
Internet is a puzzle. So you'd think that solving that puzzle would be the
business of any introductory Internet book. But my latest visit to a good
computer bookstore showed that the Internet neophyte has a bigger access
problem than that. There was a wall of probably 500 books on the Internet, and
there didn't seem to be more than four of any one title. There should be a
book to tell people what book to read.
DDJ has offered some advice on Internet books (and does so again in this
issue), but the problem with any book about a topic like the Internet is that
it is guaranteed to be out of date by the time it's published. I suspect the
best advice you can give anyone interested in the Internet is to read Howard
Rheingold's The Virtual Community for the history and culture and sociology and
philosophy of the Net, get some actual person to set you up, and then start
surfing. The only reliable source regarding the Internet is the Internet. The
best information on Internet access is accessed through the Internet. Joseph
Heller could have designed this.


Making Book on Newton


With used Newton MessagePad 100s selling for $350 now, improvements in the
handwriting recognition in place and more coming, and with cellular-phone
Newton devices probably available by the time this sees print, you might be
tempted to pick one up. A MessagePad is not a bad appointment calendar, to-do
list manager, and phone list, and it justifies being more expensive than a
dedicated device of this sort by being, well, nondedicated. There are hundreds
of Newton applications out there now, mostly freeware or shareware, and mostly
pretty good stuff. And if you don't like the built-in applications, there are
third-party alternatives. There's even a shareware program that turns the
MessagePad into a Sony remote, with separate screens for controlling your Sony
TV, CD, VCR, or whatever.
So you just might possibly be tempted to pick up at least a used Newton. What
you probably won't be tempted to do is to plunk down $800 for the
developer's kit just to program the thing for your own purposes. There are,
however, some alternatives.
There are shareware and freeware products that let you write Newton code on
the Newton. By and large, this is not something you would ever do if you were
in your right mind. Handwriting code or typing on an on-screen keyboard with
the stylus is not a fun way to program. (There is a shareware product,
Typomatica, that lets you turn any computer with serial communications
software into a keyboard for the Newton, though.)
Electronic books are actually catching on for the Newton platform. Till now,
the only way to make them has been via the Bookmaker component of the
Developer's Kit (NTK). Now there's an alternative, David Fedor's PaperBack.
It's fine for doing simple electronic books, but what Apple really needs to do
is to spin Bookmaker off as a separate product. Bookmaker actually produces
executables and allows you to embed NewtonScript code in them, so a standalone
Bookmaker and a book on NewtonScript could be a low-cost approach to a limited
programming system for the Newton.
There are also a number of tools for knocking out Newton data-entry
applications quickly and cheaply. These are form-creation programs that sell
for anywhere from $49 to $200. Some even support development under Windows.
The ones I can recommend from personal experience are FilePad from HealthCare
Communications (Lincoln, NE); PowerForms from Sestra Inc. (Omaha, NE); a
powerful product, its name still unsettled, from Fulcrum Software (Austin,
TX); and, for quick-and-dirty creation of simple data-entry apps with good
data-type screening, Flash-Data from ISIS International (Sherman Oaks, CA).


Hurled from the UPS Truck


Generations of UPS drivers have concluded that our dog, whom we consider a
frisky, Frisbee-catching lab, is in truth a bloodthirsty beast who makes it
unsafe to set foot out of the truck anywhere within the gates of stately
Swaine Manor. Quivering in fear, they throw their packages from the truck as
they squeal through the drive. Okay, they turn around slowly in the drive, and
from my garret window I can tell neither whether they quiver nor whether they
throw the boxes out while moving or set them gently on the ground while
stopped. I just know they don't get out of the truck.
FedEx drivers and Express Mail drivers do: They walk their packages past the
lab to the front door, and if she sets the Frisbee at their feet they
sometimes even throw it for her. Having collected data over the years across
several drivers, several services, and four dogs (counting neighbors'), I have
concluded that big brown trucks make dogs angry. No other conclusions are
supported by the data, but I suggest you don't send me glassware.
Among recent UPS (or FedEx or Express Mail) deliveries to the stately Swaine
vestibule (or driveway, as the case may be) are Prograph CPX, VIP-C, and a
book by Scott Jarol, all of which I am in the process of evaluating.
There's another reason for looking at these particular products.
Coincidentally, all these products deal, in one way or another, with visual
programming. Also coincidentally, I've heard the suggestion from a couple of
respected sources just recently that what OOP really needs is visible objects,
a programming environment that gives a concreteness to software objects. Some
coincidences are meaningful, so I'm looking at these products.
Prograph CPX from Prograph International (Halifax, NS) and VIP-C from Mainstay
(Camarillo, CA) are visual programming systems for the Macintosh. They reflect
a lot of experience in visual programming and a lot of thinking about how it
ought to be implemented in an object-oriented development universe, and I hope
to write about one or both of them in the coming months as I get more deeply
into them. Jarol's book has something to do with visual programming, not on
the Mac, but it's really about multimedia development.
There are already a lot of books out on multimedia development, and Scott
Jarol's Visual Basic Multimedia Adventure Set (Coriolis Group Books, 1994) has
a couple of strikes against it before it even gets to the plate. There's the
title. Adventure set? Please. And there's the dog-ear announcing a $2500
multimedia contest, which, in addition to being gimmicky, turns out to be
$2500 worth of software, not $2500 cash.
But I think the cover blurb is not far off the mark when it calls Jarol's
approach "The best way to develop multimedia with animation, sound, video,
music, and more." I am of the opinion, not original with me, that developing
multimedia titles is something that should be done as cost-effectively as
possible. Two reasons: multimedia is inherently costly and you can't expect to
price it enough higher than analogous monomedia products to offset the cost
differential, and you need to plan for failures.
Failures, fer sure. Book publishers have been at it for much longer than
multimedia publishers and they have come up with nothing better than a system
in which a few best-sellers support a big backlist of losers. If you know
something about computer-book publishing or science fiction or some other
niche, you may question that, but it's the reality in general fiction and
nonfiction-book publishing. Can you expect multimedia publishers to be any
better at reading the market?
Myst, a CD-ROM-based adventure game distributed by Brøderbund and produced by
Rand and Robyn Miller, has sold over 400,000 copies at around $50 per. Myst is
deep, rich, and beautiful with thousands of professional-quality images
produced just for it, music composed just for it, live actors, and a complex
game structure--an intriguing, well-developed virtual world. The production
that went into Myst is comparable to what goes into a movie, and if it had
been developed on a movie budget the $20 million gross to date would not have
been very impressive.

In fact, Myst was produced by the multitalented workaholic genius Miller
brothers and team, who created the art, the music, the game, the world, and
even acted in scenes in Myst, so we can assume that the Millers got a pretty
good return on their investment of time. But how many Miller-and-Miller teams
are there out there?
And the situation is worse than that. Although CD-ROM hardware sales numbers
are high and growing, an alarming number of these devices are being left on
the shelves after they're bought, according to some observers. Most multimedia
development is for presentations or training, not for producing CD-ROM titles.
CD-ROM title sales are encouragingly high if you look at total volume and its
growth, but the number of titles is increasing faster, and reports on user
satisfaction indicate that a lot of CD-ROMs are sitting on the shelf after one
viewing. I haven't seen this particular trend plotted, but my guess is that
user perception of value of CD-ROM titles is falling. People are spending
money on these things and, far too often, are being disappointed in what they
get.
This is a picture of a market poised for a consumer backlash. So if you're
going to develop CD-ROM titles you should not be surprised if the first one is
not a commercial success, or if fewer than half your publisher's titles are
successful, or if the whole market goes through some painful times in the near
future. You should write for Windows and only then, maybe, for the Mac. And
you should keep costs, especially your time, down.
This leads me to think that Visual Basic or some Windows-targeted authoring
system is the way to go. That's Jarol's position, but he points out that no
authoring system is going to give you the kind of flexibility you'll probably
want. Thus Visual Basic. Visual Basic is well documented and supported. The
question is, can you find the add-ons to make it a multimedia development
system?
That's what his book is about. Jarol argues against using a lot of extensions,
instead showing how to get there chiefly via Windows API calls. And he bundles
up all his solutions in a sort of multimedia kit.
Overall, I like the book. In addition to building the kit and a multimedia
title in the process, the book includes chapters on animation, audio, and MIDI
that contain the kind of material you'd expect to find in appendices but that
don't read (shudder) like appendices.
He also gives sound advice, like, for Pete's sake don't waste time upgrading a
PC to MPC yourself. Just buy an upgrade package.























































December, 1994
C PROGRAMMING


Myst, CD-ROMs, and CEnvi




Al Stevens


Okay, the column's late again. Blame it on two brothers named Robyn and Rand
Miller. They built Myst, a computer game marketed by Brøderbund Software. I am
not usually a game player, preferring to spend my idle hours away from the
computer. But someone told me to look at Myst to get an idea of the potential
for computer graphics and visual simulation using contemporary hardware. I did
and got drawn into Myst's strange universe of islands, ages, sights, and
sounds. It has interfered with my dreams and altered my usually relaxed view
of reality. Try this thing only if you have plenty of spare time. It is
compelling. I think I'll go play it now.


Information Retrieval


My attention returns to static information-retrieval engines. Long an interest
of mine, the subject is particularly relevant now because of the plethora of
information-based CD-ROMs. There are good ones and bad ones, both with respect
to the information itself and to the search-and-display engines. As a
developer, I am more concerned with the engines.
The Microsoft Developer Network (MSDN) CD-ROM is a particularly good
research-and-development tool, and every DOS and Windows programmer should
have it. It uses a search-and-display engine that resembles the Windows Help
system, complete with graphics and hypertext links. It contains most of
Microsoft's developer documentation along with the full text of several
Microsoft Press books. There are free tools and source code, too.
The Borland C++ compiler CD-ROM contains all its documentation in Adobe
Acrobat format, which displays the pages of books on the screen in their
printed format along with search functions. The display engine is excellent.
The search engine is primitive. There are no hypertext links like the ones in
the MSDN CD-ROM. I searched Borland's online DOS manual for the single word
"TSR," and the program took about a minute to find the keep function
documentation. The same search of the MSDN CD-ROM takes less than a second to
find and list 65 topics. That's the way a text-search engine is supposed to
work.
Recently I got the J.F.K. Assassination: A Visual Exploration CD-ROM, which
contains the text of two books, the complete Warren Commission Report, several
animated simulations, and film clips, including the Zapruder film. The search
and display engine seems to be the same one or similar to that of the MSDN
CD-ROM.
The JFK product has a proconspiracy bias, which is obvious when you compare
its treatment of conspiracy books to its treatment of Case Closed, by Gerald
Posner (Random House, 1993). Case Closed concludes that Oswald acted on his
own and was such a flake that no respectable Federal agent, Communist, or
mafioso would have associated with him, much less trusted him with such a
responsibility. Posner's work is a monumental piece of research that delivers
its conclusions with sound technical and literary responsibility and without
emotion, hysteria, or agenda. The book does not appeal to closed minds. It
does, however, deserve more respect than the JFK CD-ROM gives it.
Bias notwithstanding, the JFK CD-ROM is a model of information retrieval. I
use it here as an example because it has everything--text, sound, video,
animation, graphics, hypertext, and an intuitive user interface. It uses big
graphic buttons to jump between major topics, and the graphics, animations,
and photos are excellent. The search engine is comprehensive. I did a search
for the word "Monroe" and instantly got references to four topics: the Monroe
doctrine, Kennedy's alleged relationship with Marilyn, a reading test that
Oswald took, and an acknowledgment in the Warren Commission Report. Hypertext
links are everywhere and are comprehensive. A lot of preparation went into
this work. If you are interested in electronic publishing, this product from
Medio Multimedia Inc. (Redmond, WA) is one to study.
Several years ago in this column, I published the source code to a
text-indexing system that included a Boolean query engine. More recently, I
applied those concepts to a shareware product for an associate. We implemented
a Windows program that retrieves and displays verses of the King James version
of the Bible. His interest in the project was to distribute the program as
shareware. Mine was to further my research in static text storage and
retrieval. The Bible project was particularly well suited to these objectives
because the text is in the public domain and the Bible is organized into a
hierarchical structure of testaments, books, chapters, and verses. The largest
verse is relatively small (less than 600 characters), and the text is
certainly static--it won't be changing anytime soon.
The project included tasks common to any such project--development of the
system itself, a Windows Help database, and a Windows Setup program. To build
a text database, you must apply a certain amount of data analysis to determine
the organization of the database and its presentation, the distribution and
frequency of words, and the best methods for compression, indexing, and
retrieval. I wrote C programs that analyzed the raw text and subsequent C
programs to build the database. The retrieval engine is a Windows DLL written
in C. The user interface is written in Visual Basic. The shareware venture did
not pan out, and the program is now in the public domain. I'll be discussing
the software aspects of this project and providing source code over the next
few columns.


Quincy 


Last month, I described Quincy's architecture and the front end of Quincy's
translator. It is time to get into the hard parts--the compiler and
interpreter. It is time, yes, but I demur. Quincy's techniques for compiling
and interpreting source code defy description. They do not follow traditional
translator logic, and they are difficult to work with and explain. When I
added the ANSI extensions to the original K&R interpreter, I found out just
how true that was. A language translator needs to be driven by an unambiguous
grammar, but Quincy's parsing logic is brute force, following a thread that
reflects my understanding of the language rather than a grammar.
Quincy's overhaul was to prepare it for use in my C tutorial book (Al Stevens
Teaches C, M&T Books, 1994), and that work is done. Throughout the project a
persistent notion nagged at me. When Visual Basic came out, I was sure that
Visual C would soon follow. It did not. All contemporary C- and C++-based
visual development environments that I have seen launch compilers. They are
not interpreters that work like Visual Basic. It seems to me that such a
program would be useful and, if done properly, wildly successful. First you
would need a reasonably efficient C interpreter upon which to base the visual
tool. Quincy is not reasonably efficient, sacrificing execution speed for a
rapid development cycle to support its tutorial role. Quincy could, however,
be improved.
Lately I have been working on a more conventional translator/interpreter using
the grammar in the ANSI Standard C specification as a guide. The same grammar
is published in the second edition of Kernighan and Ritchie. Do not interpret
this as an announcement of a visual C project--but that notion keeps nagging.
There are some things about the Standard C grammar that I do not understand.
Why does the parameter-declaration specification consist of
declaration-specifiers, which permit storage-class-specifiers, instead of the
specifier-qualifier-list, which does not? According to the grammar, you can
code typedef, extern, static, auto, and register as part of the parameter
declarations in a function prototype or header. Of course, no compiler allows
that. The grammars in Kernighan and Ritchie's second edition and the C++
Annotated Reference Manual have the same seeming anomaly, so there must be
something that I am missing.
I shall defer further discussion of Quincy's existing translator and
interpreter until I decide its future. For now, the source code is on
CompuServe, DDJ Online, and ftp.mv.com, and, as usual, I am available on
CompuServe to answer questions.


CEnvi


This is the December issue, coming when programmers are looking for stocking
stuffers, and I've found a good one. Every now and then a programming tool
serves a particular need better than anything else. CEnvi, a shareware product
from Nombas (P.O. Box 875, Medford, MA 02155, 617-391-6595, bsn@world.std.com,
$45.00 for the registered version) is such a tool. It is a language
interpreter that implements a subset of C.
The subset language is called "Cmm," standing for C-minus-minus, which implies
that some parts of C are missing. They are, indeed. Cmm does not have type
declarations or pointers. Imagine a C program with no declarations, no
pointers, pointer notation, or address constants, and you have a Cmm program,
which CEnvi interprets. Cmm variables are implicitly typed by the context in
which the program uses them. Basic programmers are familiar with such
implicit, dynamic typing. Cmm programs do not have pointers and addresses
because arguments are passed by reference unless you specifically tell CEnvi
to pass an argument by value. You can even have structures without structure
declarations. Structures evolve as the program assigns values to implicitly
declared structure members. CEnvi's author maintains that keeping track of
memory is what makes C hard to understand for the newcomer, so he has removed
everything from the language that deals with memory. The programmer is unaware
of the address or format of anything. Even so, CEnvi supports arrays,
structures, and strings through implicit typing.
Now, what good is it? Although designed primarily as a program-development
environment, Cmm is better as a batch language. CEnvi executes command-line
batch files written in the Cmm dialect of C. That is good news for C
programmers. Until now, OS/2 programmers had to learn ReXX to build complex
command files, and DOS programmers were stuck with the clumsy BAT language.
CEnvi provides almost full batch capabilities but with a subset of C. The only
thing I can see that it is not good for is loading TSRs. CEnvi is not a shell
program. Its executable terminates when the interpreting completes. Any TSRs
that it spawned would be loaded above CENVI.EXE.
There are three versions of the interpreter: DOS, Windows, and OS/2. Having
used the DOS and OS/2 versions, I rarely use the DOS batch language or ReXX. I
got the OS/2 shareware version from a CD-ROM and learned that CEnvi nicely
solves one of my OS/2 problems. I prefer to open DOS and OS/2 command-line
windows with a specific font and cannot find a way to override the defaults
from the OS/2 desktop. CEnvi includes access to the PM kernel and examples
that open DOS and OS/2 sessions with different positions, window sizes, and
fonts. CEnvi uses features of both batch environments that allow you to embed
Cmm code into .BAT and .CMD files. Most of the file is Cmm code, but users
execute the command file just as they would any other. By using this feature
and the examples that Nombas provides, I was able to build a command file that
opens DOS and OS/2 command-line windows just the way I like them.
CEnvi's shareware version has an annoying nag screen that begs you to pay up
and register. It runs too often and takes too long. Nombas says that they have
eased up on it in newer versions. I suspect that the policy has caused
potential registered users to scrap the program in exasperation without
exploring it enough to see its potential. On the other hand, programmers are
notorious for not registering shareware unless there is a compelling reason to
do so.
CEnvi can generate a stand-alone executable for distribution to users who do
not have the run-time interpreter. To enhance its batch script capabilities,
CEnvi uniquely supports operating-system environment variables. An identifier
of all uppercase letters is treated as an environment variable. If you use a
special version of the run-time module, your program can set
environment-variable values that persist beyond the execution of the program.
I was curious about how easy it was to convert a running C program into the
Cmm dialect. One of the tutorials that Quincy runs is a simple tic-tac-toe
program. Since it uses a number of C idioms to demonstrate language features,
including multiple-dimensioned arrays, I decided to port it to Cmm. The job
took about ten minutes. Mostly what I did was remove declarations and pointer
notation and substitute CEnvi's screen and cursor library functions for my
own. The only problem I had was one that the CEnvi documentation warns about.
(As usual, I waded in without RTFM.) If a function modifies one of its
parameter variables, by default it modifies the caller's argument variable if
the argument is an lvalue. After I realized that, I fixed all such parameter
references, and the program ran fine. Listing One is ttt.cmm, the Cmm version
of the game program.
CEnvi has a substantial subset of the Standard C library and some extra
functions to support writing Windows programs and DOS and OS/2 command-line
programs. The documentation is typical shareware with a cottage-industry look
and feel, but with enough information to get programming underway in short
order. The package installs easily in any of the three environments and comes
with an abundance of example programs. When you call Nombas, you speak to
Brent Noorda, the programmer, CEO, and guy who sweeps floors at Nombas. His
mother has never seen his name in print, so here it is.
Naturally, Cmm code is incompatible with C. The name, pronounced "C-Envy,"
reflects the developer's reason for building the interpreter. He says that he
was envious working on small systems that do not have big C development
environments. (Others might see an analogy to a well-known Freudian complex,
but we at DDJ are above making such tawdry observations, so I shall decline to
do so, thank you very much.) I'm not sure that the argument holds up if you
still have an old copy of Turbo C 2.0. I used to run the tcc command-line
compiler on a slow 8088 laptop with 640K and only one 720K diskette drive, and
I had room left over for an editor and some source code.
CEnvi certainly does not need as much disk space as contemporary C/C++
compilers. There are no huge libraries or long lists of header files.
Everything is built into the interpreter. It may be the smallest development
environment for Windows programming ever, although the Windows programs that
it runs tend to be slow even when compared to those of Visual Basic. Besides
not having the typical bloat, CEnvi also does not have an integrated editor or
debugger. You have to return to the old ways, using the editor of your choice
and inserting getchar and printf calls in the code to form breakpoints and
examine variables.
The future for CEnvi lies not so much in its role as a substitute batch
language, at which it shines, or as a tiny development environment, for which
there may be no real market, but in its potential as an application script
language. Nombas
plans to release a programmer's toolkit that lets you link the interpreter
into your application and use Cmm as a script language very much along the
lines of Word Basic or the Brief macro language. Its success will depend on
whether users can be expected to use a C dialect for macros. Certainly
programmers can do that, but if the example text editor is any indication of
the interpreter's performance, CEnvi needs to get a lot faster before you
would use it to run macros of any consequence. I understand the problem.
Quincy is about as slow, although that was a conscious design decision to
trade off the tutorial's run time for compile-time efficiency. Nombas is aware
of the problem and is looking into it. In the meantime, CEnvi has virtually
replaced REXX and the DOS batch language in my office.


"C Programming" Column Source Code


Quincy, D-Flat, and D-Flat++ are available to download from the DDJ Forum on
CompuServe, DDJ Online, and on the Internet by anonymous ftp. See page 3 for
details. If you cannot get to one of the online sources, send a diskette and a
stamped, addressed mailer to me at Dr. Dobb's Journal, 411 Borel, San Mateo,
CA 94402. I'll send you a copy of the source code. It's free, but if you want
to support the Careware charity, include a dollar for the Brevard County Food
Bank.


Listing One 

/* ------------------------------ ttt.cmm -------------------------- */
/* A simple game of tic-tac-toe written in Cmm */
/* ------------------------------------------------------------------*/
#define TRUE 1
#define FALSE 0
#define BELL 7
/* ---- board markers ---- */
#define PLAYER 'X'
#define COMPUTER 'O'
#define FREE ' '
/* --- game position on screen --- */
#define LEFT 10
#define TOP 5
/* --- game board --- */
board = "         ";    /* nine blanks, one per square */
/* --- winning combinations --- */
wins = {
    /* --- winning rows --- */
    { 1,2,3 },
    { 4,5,6 },
    { 7,8,9 },
    /* --- winning columns --- */
    { 1,4,7 },
    { 2,5,8 },
    { 3,6,9 },
    /* --- winning diagonals --- */
    { 1,5,9 },
    { 3,5,7 }
};
main()
{
    ch = 'y';
    while (ch == 'y') {
        memset(board, FREE, 9);
        displayboard();
        /* --- get player's first move --- */
        if ((mv = getmove()) == 0)
            break;
        /* --- set computer's first move --- */
        if (mv != 5)
            setpiece(5, COMPUTER);   /* center if available   */
        else
            setpiece(1, COMPUTER);   /* upper left otherwise  */
        moves = 2;
        while (moves < 9) {
            getmove();               /* player's next move */
            moves++;
            if (won()) {
                message(1, "You win");
                break;
            }
            if (moves == 9)
                message(1, "Tie");
            else {
                /* --- find computer's next move --- */
                if ((mv = canwin(COMPUTER)) != 0)
                    /* --- win if possible --- */
                    setpiece(=mv, COMPUTER);
                else if ((mv = canwin(PLAYER)) != 0)
                    /* --- block player's win potential --- */
                    setpiece(=mv, COMPUTER);
                else
                    nextmove();
                if (won()) {
                    message(1, "I win");
                    break;
                }
                moves++;
            }
        }
        message(2, "Play again? (y/n) ");
        ch = getch();
    }
}
/* --- find next available open space for a dumb move --- */
nextmove()
{
    lmv = -1;
    for (i = 0; i < 9; i++)
        if (board[i] == FREE) {
            lmv = i+1;
            setpiece(=lmv, COMPUTER);
            if (canwin(COMPUTER))
                return;
            setpiece(=lmv, FREE);
        }
    if (lmv != -1)
        setpiece(=lmv, COMPUTER);
}
/* --- get the player's move and post it --- */
getmove()
{
    mv = 0;
    while (mv == 0) {
        message(0, "Move (1-9)? ");
        mv = getch();
        mv -= '0';
        if (mv < 1 || mv > 9 || board[mv-1] != FREE) {
            putchar(BELL);
            mv = 0;
        }
    }
    setpiece(=mv, PLAYER);
    return mv;
}
/* ------ test to see if the game has been won ------- */
won()
{
    for (i = 0; i < 8; i++) {
        pl = wins[i][0]-1;
        if (board[pl] == FREE)
            continue;
        for (k = 1; k < 3; k++)
            if (board[pl] != board[wins[i][k]-1])
                break;
        if (k == 3)
            return TRUE;
    }
    return FALSE;
}
/* --- test to see if a player (n) can win this time;
       return 0 or winning board position --- */
canwin(n)
{
    for (i = 0; i < 8; i++)
        if ((w = trywin(n, i)) != 0)
            return w;
    return 0;
}
/* ---- test a row, column, or diagonal for a win;
       return 0 or winning board position --- */
trywin(n, wn)
{
    nct = 0;
    zct = 0;
    for (i = 0; i < 3; i++) {
        pl = wins[wn][i]-1;
        if (board[pl] == FREE)
            zct = i+1;
        else if (board[pl] == n)
            nct++;
    }
    if (nct == 2 && zct)
        return wins[wn][zct-1];
    return 0;
}
/* ------ display the tic-tac-toe board ------ */
displayboard()
{
    ln1 = "   \xb3   \xb3   ";
    ln2 = "\xc4\xc4\xc4\xc5\xc4\xc4\xc4\xc5\xc4\xc4\xc4";
    ScreenClear();
    for (y = 0; y < 5; y++) {
        ScreenCursor(LEFT, TOP+y);
        printf((y&1) ? ln2 : ln1);
    }
}
/* ---- set a player's mark (O or X) on the board ---- */
setpiece(pos, mark)
{
    board[--pos] = mark;
    row = pos / 3;
    col = pos % 3;
    ScreenCursor(LEFT+col*4+1, TOP+row*2);
    putchar(mark);
}
/* ---- message to opponent ---- */
message(y, msg)
{
    ScreenCursor(LEFT, TOP+8+y);
    printf(msg);
}
































































December, 1994
ALGORITHM ALLEY


The Gosselink Ditherer




Pieter Gosselink


Pieter is a programmer located in The Netherlands. He can be reached by fax at
+31-73-443635. 


Introduction 
by Bruce Schneier
Dithering is a trick. It's a way to make video output look better than its
data. It relies on your eye not being able to see certain things as well as
certain other things. Dithering techniques are used with halftoning methods to
smooth the edges of a displayed object. Basically, you add some kind of dither
intensity to the calculated intensity of various pixels. Sometimes this dither
intensity is calculated randomly; sometimes the calculation is based on the
coordinate position of the point. Adding dither tends to break up the contours
of objects, which improves the overall appearance of a scene.
In this article, Pieter reviews the two basic approaches to dithering--error
distribution and ordered dither--then presents a new technique he's developed.
For additional information on both techniques, see "Converting Dithered Images
Back to Gray Scale," by Allen Stenger (DDJ, November 1992), and "Differential
Image Compression," by John Bridges (DDJ, February 1991). Dithering is a cool
subject, one not likely to be overcome by technology anytime soon.
With the introduction of full-color pictures and full-motion video on
present-day computer displays, it is obvious that video hardware isn't keeping
pace with requirements. At least for the time being, there remains a need for
fast, high-quality color-dithering algorithms.
Most currently available monitors display 256 colors at a fairly high
resolution. To display pictures with a few million colors with these
relatively limited resources, some clever algorithms have been suggested, two
of which have proven to be very popular.
The first algorithm is "error distribution" (ED), which is almost always
implemented in the Floyd-Steinberg variant; see "An Adaptive Algorithm for
Spatial Gray Scale," by R. Floyd and L. Steinberg, Proceedings of the
International Symposium on Digital Technology, 1975. The second is "ordered
dither" (OD), or "Bayer;" see "An Optimum Method for Two-Level Rendition of
Continuous-Tone Pictures," by B.E. Bayer, Proceedings of the International
Conference on Communication, 1973. In this article, I present a new algorithm
that is a combinatorial variant of these two algorithms.


Error Distribution versus Ordered Dither


Table 1 compares the ED and OD algorithms on a number of key factors.
Colorset, for instance, is a collection of colors which can be displayed
simultaneously. Most of the time it is constructed by selecting 256 colors
from a larger set. This colorset is used to approximate the colors in the
picture. For ED, this set can be chosen arbitrarily from the larger set. This
means that the choice of one color has no implications for the choice of the
other colors. For OD, the situation is somewhat more complicated. The
available bits for a palette entry (eight bits are needed for 256 colors) have
to be divided between the three primary colors (red, green, and blue) because
the algorithm has to be applied to these colors independently. A common
division is three bits for red, three bits for green, and two bits for blue.
This takes into account the fact that the eye is more sensitive to changes in
red and green than in blue. I call a colorset constructed in this way a
"linear colorset."
Most of the time, the output quality of the ED algorithm is higher because it
preserves the information better than OD. The error which is made when
approximating an RGB value in ED is distributed to the neighboring pixels,
whereas with OD this error is preprogrammed in a matrix. The Floyd-Steinberg
variant of ED reduces the noise further by distributing the error to several
pixels instead of one.
OD is much faster because it makes approximately 1/5 to 1/10 as many
calculations per pixel as ED.
ED distributes errors to neighboring pixels, which leads to a picture that is
somewhat more blurred. OD instead takes an approximation that is independent
of the color of a pixel's neighbors. This may be a less accurate approximation
than is possible with ED, but it preserves the contrast better. 
Finally, if the RGB value of a pixel changes, the color of that pixel (chosen
as an approximation of the RGB value) has to be changed. In ED, this results
in recalculating the whole picture, because all the pixels are dependent on
each other. In OD, only the color of the matching pixel has to be changed.
This presents a great advantage for applications in which pixels locally
change color, for example, in full-color bitmap editors and full-motion video.
The algorithm presented here, which I call the "Gosselink Dither," combines
the positive characteristics of both the ED and OD algorithms.


The Gosselink Dither


To illustrate the Gosselink Dither, I'll use a 4X4 OD matrix (although the
algorithm is applicable to matrices of any size). For clarity, I've
implemented the algorithm in Basic (see Listing One). For speed, however, I've
coded the algorithm in assembly. The matrix used for approximating an RGB
value consists of 16 colors, the mean value of which more or less corresponds
to the original RGB value. The key is that the colors in the matrix are in
order of brightness. The darkest color has an index of 0, and the brightest
has an index of 15.
A combination of 16 colors approximating an RGB value does not have to be
based on a linear colorset. Any combination corresponding to the RGB value
that has to be approximated will do. 
As soon as a set of 16 colors for a particular RGB value is created, it is
sorted on brightness and arranged in a table; this table is used when
dithering a picture. With the RGB value used as an address, a combination of
16 colors is selected from the table. The x- and y-coordinates (modulo 4) are
used to select 1 of the 16 colors from this combination.
The difference with the original OD is that the palette entry can no longer be
calculated from the RGB value, but is looked up in a table.
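The per-pixel work is therefore just an indexed load. A C sketch, assuming a flat table layout of 16 bytes per quantized RGB value (the layout and names are my own, not from the article's assembly implementation):

```c
#define MATRIX 4   /* 4X4 ordered-dither matrix, as in the article */

/* Each quantized RGB value owns 16 precomputed palette entries, sorted by
   brightness and pre-arranged by the OD matrix; the screen coordinates
   (modulo 4) select one of them. */
unsigned char gosselink_pixel(const unsigned char *table,
                              unsigned rgb_index, unsigned x, unsigned y)
{
    return table[rgb_index * MATRIX * MATRIX
                 + (y % MATRIX) * MATRIX + (x % MATRIX)];
}

/* Tiny demonstration: a table whose entries equal their own offsets. */
int demo_lookup(void)
{
    unsigned char t[32];
    for (int i = 0; i < 32; i++)
        t[i] = (unsigned char)i;
    return gosselink_pixel(t, 1, 3, 2);   /* 16 + 2*4 + 3 */
}
```

No arithmetic on the RGB value is needed at dither time, which is where the speed comes from.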
Because the 16 RGB values are all equal, the combination can be calculated
with a special form of error distribution. The simple form of error
distribution adds the error to the neighboring pixel, but because the RGB
values are all equal, there is no blurring. Consequently, the error can be
distributed to more RGB values. The logical way to do this is to distribute
the error made with the first approximation evenly among the remaining 15 RGB
values so they each get 1/15 of the error. This is repeated for the second RGB
value but now the error is divided by 14 and distributed to the remaining 14
RGB values, and so on. The error made on the last RGB value is lost, but this
would also be the case for ED or Floyd-Steinberg.
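This table-building step can be sketched in C for a single channel (Listing One does the same thing for red, green, and blue together; the function names and the two-entry demonstration palette are mine):

```c
#include <math.h>

/* Build one table cell: repeatedly pick the nearest palette value to a
   running target, then push the quantization error onto the values still
   to be chosen (divide by 15, then 14, and so on). */
void fill_cell(double target, const double *palette, int ncolors, int out[16])
{
    double want = target;
    for (int i = 0; i < 16; i++) {
        int best = 0;
        for (int j = 1; j < ncolors; j++)
            if (fabs(palette[j] - want) < fabs(palette[best] - want))
                best = j;
        out[i] = best;
        if (i < 15)                        /* spread error over the rest */
            want += (want - palette[best]) / (15 - i);
    }
}

/* Demonstration: with a black/white palette and a half-grey target, the
   mean of the 16 chosen values should sit close to the target. */
double demo_mean(void)
{
    const double pal[2] = { 0.0, 1.0 };
    int out[16];
    double sum = 0.0;
    fill_cell(0.5, pal, 2, out);
    for (int i = 0; i < 16; i++)
        sum += pal[out[i]];
    return sum / 16.0;
}
```

Because all 16 targets are identical, the diffusion happens entirely inside the cell and no blurring reaches neighboring pixels.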


Conclusion


One disadvantage of the Gosselink Dither is that the table size can become
large. This size depends on the number of bits (c) per primary color, and the
size (s) of the OD matrix (equal to 2n for a 2^n X 2^n matrix). There is a
functional relationship between these parameters and the number of bits used
for creating the palette (p): c<=p+s. By experimenting with different values,
I determined the following:
For large (full screen) pictures, five bits per primary color is necessary;
the size of the matrix may vary, but 2X2 should be satisfactory in most cases.
For pictures requiring a smooth grading of colors, a 4X4 matrix should be
used.
For small pictures, four bits per primary color should be used. A matrix of
2X2 again is sufficient in most cases. In special cases, the number of bits
for red, green, and blue may not be equal.
The formula for calculating the size of the table is size = matrixsize
X 2^(redbits+greenbits+bluebits). Therefore, for a table with five bits per
primary color and a 4X4 matrix, the size is 16 X 2^15 = 512 Kbytes. For a
more-common table with five bits per primary color and a 2X2 matrix, the size
is equal to 128 Kbytes.
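The arithmetic is simple enough to fold into a helper (one byte per table entry, as in Listing One's BPUT output; the function name is mine):

```c
/* Table size in bytes: matrixsize X 2^(redbits + greenbits + bluebits). */
unsigned long dither_table_bytes(unsigned matrixsize,
                                 unsigned rbits, unsigned gbits, unsigned bbits)
{
    return (unsigned long)matrixsize << (rbits + gbits + bbits);
}
```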
The calculation of a table of 128 Kbytes is comparable to dithering 128,000
pixels with ED (not counting the sorting of the four colors). An
implementation coded in assembly language running on a 36-MHz, ARM3-powered
Acorn Archimedes reached 200,000 pixels per second, so the calculation of the
table should not really be a problem, considering that this has to be done
only once. The quality of the result is almost as good as that of ED, except
for pictures with very smooth grades of color, in which the limited number of
bits per primary color becomes clear.
In summary, ED compromises on speed, OD compromises on quality, and the
Gosselink Dither compromises on memory usage.
Ordered Dither and Error Distribution
With ordered dither, combinations of black and white are used to suggest 16
levels of grey. A matrix containing the numbers 0--15 in a specific order is
used to tile the image; see Figure 1. If the grey value of a pixel is greater
than the value in the matrix, the color white is used; otherwise the color
black is used. An image consisting of only half grey is represented by a
pattern with an equal number of black and white pixels. The values in the
matrix are placed in such an order that annoying, nonrandom patterns are
avoided. The size of the matrix can also be 2X2; see Figure 2.
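The black/white case reduces to a threshold test per pixel. A C sketch, using the same 4X4 index order that Listing One assigns to grid%:

```c
/* Black/white ordered dither: the 4X4 matrix tiles the image, and a pixel
   is white when its grey value (0..15) exceeds the matrix entry. */
static const int bayer4[4][4] = {
    {  0, 12,  3, 15 },
    {  8,  4, 11,  7 },
    {  2, 14,  1, 13 },
    { 10,  6,  9,  5 }
};

int od_pixel(int grey, int x, int y)      /* returns 1 for white */
{
    return grey > bayer4[y & 3][x & 3];
}

/* Demonstration: at half grey (8), one 4X4 tile should come out
   half white. */
int demo_half_grey_whites(void)
{
    int n = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            n += od_pixel(8, x, y);
    return n;
}
```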
With error distribution, the image is scanned from top-left to bottom-right
while dithering a picture. When a color is chosen from the colorset to
represent an RGB value, an error is made. The amount of red, green, and blue
will not be the requested amount. This error can be added to the RGB value of
the next pixel. The color combination of the pixels is a better approximation
of the RGB value than the first approximation.
The error can be divided among more neighboring pixels (which have not been
scanned yet). This will result in a better image because the noise of the
error is more evenly spread. This is what Floyd and Steinberg suggested: they
added 3/8 of the error to the pixel to the right, 3/8 to the pixel below, and
1/4 to the pixel below and to the right; see Figure 3.
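A single-channel C sketch of that scan, quantizing to black and white with the three weights just described (the function names and the in-place convention are my own):

```c
/* Error distribution over a row-major image with values in 0..1, scanned
   top-left to bottom-right. The input image is modified in place as the
   error is pushed onto unscanned neighbors:
   3/8 right, 3/8 below, 1/4 below-right. */
void ed_dither(double *img, int *out, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double v = img[y * w + x];
            int q = v >= 0.5;               /* nearest of black(0)/white(1) */
            double err = v - q;
            out[y * w + x] = q;
            if (x + 1 < w)
                img[y * w + (x + 1)]       += err * 3.0 / 8.0;
            if (y + 1 < h)
                img[(y + 1) * w + x]       += err * 3.0 / 8.0;
            if (x + 1 < w && y + 1 < h)
                img[(y + 1) * w + (x + 1)] += err / 4.0;
        }
}

/* Demonstration on the 1X2 image {0.6, 0.2}: the first pixel rounds to
   white, and its negative error keeps the second pixel black. */
int demo_pair(void)
{
    double img[2] = { 0.6, 0.2 };
    int out[2];
    ed_dither(img, out, 2, 1);
    return out[0] * 10 + out[1];
}
```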
--P.G.

Figure 1 A matrix used to tile an image.
Figure 2 A 2X2 matrix.
Figure 3 Floyd-Steinberg matrix.
Table 1: Error distribution versus ordered dither.

              ED               OD
Colorset      arbitrary (+)    linear (--)
Quality       high (+)         medium (--)
Speed         slow (--)        fast (+)
Contrast      medium (--)      high (+)
Update        global (--)      local (+)

Listing One 

REM Number of bits per primary color
rbits%=5
gbits%=5
bbits%=5

REM The size of the ordered dither matrix
accurate%=4

REM The number of colors in the palette ( <=256)
num_of_colors = 256-1

tot% = (1<<accurate%)-1
tr% = (1<<rbits%)-1
tg% = (1<<gbits%)-1
tb% = (1<<bbits%)-1
max = 4*(tot%+1)

REM Reserve the necessary memory
DIM bytes%(tot%)
DIM red(num_of_colors),green(num_of_colors),blue(num_of_colors)
DIM grey(num_of_colors)
DIM grid%(tot%)
DIM table%(tr%,tb%,tg%,tot%)

REM The indices in the ordered dither matrix
CASE accurate% OF
WHEN 2:grid%()=0,3,2,1
WHEN 4:grid%()=0,12,3,15,8,4,11,7,2,14,1,13,10,6,9,5
OTHERWISE : PRINT "Invalid size":END
ENDCASE

REM Palette specification
REM The file consists of records with the RGB value and the grey value
REM of each palette entry
f%=OPENIN("Palette")
IF f%=0 THEN PRINT "Couldn't open palette file for reading":END
FOR i%=0 TO num_of_colors
 INPUT#f%,red(i%),green(i%),blue(i%),grey(i%)
NEXT
CLOSE#f%

FOR ir%=0 TO tr%
 fr=ir%/tr%
 FOR ig%=0 TO tg%
 fg=ig%/tg%
 FOR ib%=0 TO tb%
  fb=ib%/tb%
  sr = fr
  sg = fg
  sb = fb
  FOR i%=0 TO tot%
 a=max
 FOR j%=0 TO num_of_colors
 d = (red(j%)-sr)*(red(j%)-sr)
 d += (green(j%)-sg)*(green(j%)-sg)
 d += (blue(j%)-sb)*(blue(j%)-sb)
 IF d<a THEN a=d:c%=j%
 NEXT
 bytes%(i%)=c%
 IF i%<tot% THEN
 f=tot%-i%
 sr+=(sr-red(c%))/f
 sg+=(sg-green(c%))/f
 sb+=(sb-blue(c%))/f
 ENDIF
 NEXT
 FOR i%=0 TO tot%-1
 FOR j%=i%+1 TO tot%
 IF grey(bytes%(i%))>grey(bytes%(j%)) THEN SWAP bytes%(i%),bytes%(j%)
 NEXT
 NEXT
 FOR i%=0 TO tot%
 table%(ir%,ig%,ib%,i%)=bytes%(grid%(i%))
 NEXT
 NEXT
 NEXT
NEXT

a$="RGB"+STR$(rbits%)+STR$(gbits%)+STR$(bbits%)+STR$(accurate%)
F%= OPENOUT(a$)
IF F%=0 THEN PRINT "Couldn't open table file for writing":END
FOR r%=0 TO tr%
 FOR g%=0 TO tg%
 FOR b%=0 TO tb%
 FOR c% = 0 TO tot%
 BPUT#F%,table%(r%,g%,b%,c%)
 NEXT
 NEXT
 NEXT
NEXT
CLOSE#F%
END















December, 1994
PROGRAMMER'S BOOKSHELF


Net Surfing on the Printed Page




Ray Duncan


Ray Duncan is a software developer, neonatologist, and author of several
programming books. He can be reached at duncan@cerf.net.


A lot of water has passed over the dam, or, if you prefer, a great many
terabits have been routed through the backbone, since I wrote my "Programmer's
Bookshelf" columns on Internet books (February, April, and August 1993). In
that short time, the Internet has continued to grow exponentially, in effect
becoming the data conduit and mail switch for the entire civilized world.
Concurrently, it has become the darling of both politicians and the media, the
presumptive cornerstone of the National Information Initiative, a marketing
weapon for the commercial online services, a preferred medium of exchange for
soft pornography and stolen software, and fodder for hype masters of every
persuasion.
Book publishers have jumped onto the Internet bandwagon with a vengeance. Two
years ago, it was difficult to find a dozen books total about the Internet.
Then O'Reilly & Associates released Ed Krol's breakthrough book, The Whole
Internet User's Guide and Catalog, and much to their surprise sold 250,000
copies--a megahit, as technical books go. Now, the problem is rather
different--every publishing house has a dozen Internet books on the racks and
more on the way; the real challenge has become winnowing out the few books
with intrinsic value amid the deluge of clones, drones, and blatant rip-offs.
In this month's "Programmer's Bookshelf" I'll provide thumbnail sketches of
the subset of Internet-related books that I've read and believe are worth your
attention for one reason or another. For reasons of space (and to protect my
sanity), I have omitted the excruciatingly basic and tiresome books of the
Internet for Dummies or Idiot's Guide to the Internet genre. However, I have
included a few introductory books that are appropriate for the technically
sophisticated Dr. Dobb's audience, or have other special attributes.


Internet "How-To" Books


Connecting to the Internet, by Susan Estrada (O'Reilly & Associates, 1993,
$15.95, ISBN 1-56592-061-9), focuses entirely on the sometimes nontrivial
chore of obtaining Internet connectivity at a suitable speed and price point.
It includes useful explanations of the different types of network connections,
performance trade-offs, and an international list of network providers.
Internet: Getting Started, by April Marine, Susan Kirkpatrick, Vivian Neou,
and Carol Ward at the Network Information Systems Center, Stanford Research
Institute (Prentice Hall, 1994, $28.00, ISBN 0-13-289596-X), assembles lots of
useful reference material: an index to Requests for Comments (RFCs, the
Internet consensus technical-standard documents), a list of public providers,
overseas-contact information, and the like. The book takes a bit too much for
granted to be used alone by the average Internet beginner, but makes a good
companion to the Krol book.
Internet CD, by Vivian Neou at SRI International (Prentice Hall, 1994, $49.95,
ISBN 0-13-123852-3), is a book/CD-ROM combination made up of public-domain or
shareware TCP/IP software for MS-DOS, Microsoft Windows, and UNIX. The book
itself consists largely of patched together program documentation files with a
minimum of editing and is cryptic to the point of unusability in several
sections. However, the package is well worth its price for the CD-ROM, which
contains UUCP, SLIP, packet drivers, gopher, FTP, and mail software for DOS
and Windows; gopher and WAIS software for UNIX; and the complete LINUX Version
1.2 operating system with source code. Inexplicably (and disappointingly), the
CD does not contain the Mosaic clients for Windows or UNIX, or any of the
excellent public domain/shareware Macintosh TCP/IP software that is available.
The Internet Companion Plus, by Tracy LaQuey and Jeanne C. Ryer
(Addison-Wesley, 1993, $19.95, ISBN 0-201-62719-1), is a pocket-sized guide to
basic Internet terminology, utilities, and resources. Although it takes a
relatively low-tech, gee-whiz approach, it is a good choice for your
technically challenged friends. The latest printing includes a disk with
communications software by Intercon Systems.
The Internet Navigator, Second Edition, by Paul Gilster (John Wiley & Sons,
1994, $24.95, ISBN 0-471-05260-4), is detailed, readable, and with its fine
production values, could have been one of the best introductory books
available. Unfortunately, the first edition of the book was severely retro: It
was strictly oriented to dial-up, character-based accounts on UNIX hosts and
completely ignored non-UNIX connectivity issues and graphical clients such as
Mosaic. The second edition, however, includes more on ftp and other topics,
which may make the book more useful to Mac and PC users with Internet
connectivity.
The Whole Internet User's Guide and Catalog, Second Edition, by Ed Krol
(O'Reilly & Associates, 1994, $24.95, ISBN 1-56592-063-5), is an excellent
mid-level introduction to the Internet, the first of its kind and still the
book to beat. It explains why to connect, how to connect, how to use basic
network tools, how to troubleshoot networking problems, and what cool things
you can do with your Internet connectivity once you've got it. The second
edition has expanded coverage of the World Wide Web (WWW) and its graphical
clients, including Mosaic, and an updated resource guide. Beautifully written,
illustrated, edited, and produced, this book is a model for technical
publishers everywhere.
USENET: Netnews for Everyone, by Jenny A. Fristrup (Prentice Hall, 1994,
$24.95, ISBN 0-13-123167-7), is a brief introduction to what USENET is and how
it works, news-reader programs, terminology, and netiquette. It includes a
comprehensive list of USENET newsgroups sorted by category and topic. Although
it is oriented to beginners, DDJ readers may still find it useful.
Using UUCP and UseNet, by Grace Todino and Dale Dougherty (O'Reilly &
Associates, 1986, $21.95, ISBN 0-937175-10-2), is UNIX-centric and its
sections on UUCP are increasingly irrelevant, but its explanations of the use
and abuse of Internet "news" will be helpful to any Internet novice.
Zen and the Art of the Internet: A Beginner's Guide, Third Edition, by Brendan
P. Kehoe (Prentice Hall, 1994, $23.95, ISBN 0-13-121492-6), is a brief,
well-focused introduction to Internet terms, electronic mail, utilities, and
resources. Previous editions of this book were rather dismal, but the third
edition is attractive and easy to read. A good starting point for any would-be
Internet user.


Internet Resource Guides and Lists


!%@:: Addressing and Networks, Fourth Edition, by Donnalyn Frey and Rick
Williams (O'Reilly & Associates, 1994, $9.95, ISBN 0-56592-046-5), starts out
with an introduction to addressing conventions, continues with a brief
description of each major network throughout the world that has connectivity
to the Internet, and finishes up with a comprehensive list of known Internet
addressing domains and subdomains. This is a great coffee-table book for
Internet nerds and self-professed "net surfers," but it also has some
practical utility for network managers.
EMAIL Addresses of the Rich & Famous, by Seth Godin (Addison-Wesley, 1994,
$7.95, ISBN 0-201-40893-7), is divided into various categories: Academics,
Authors, Celebrities, Government Mandarins, Philosophers and Deep Thinkers,
Reporters, Military-Industrial Complex, Rich Americans (such as Bill Gates and
Ross Perot), World Peace Creators, and so on. Not a particularly useful book
in the strictest sense, but makes a great conversation piece.
Internet Worlds On Internet '94, edited by Tony Abbott (Mecklermedia, 1994,
$34.95, ISBN 0-887-369-294), is a hashed-over version of the famous (and free)
Internet "List of [mailing] Lists," augmented with some additional sections on
electronic journals, community and campus information services, FTP sites, and
WAIS servers. I mention this book mainly to steer you away from it; it is
ridiculously expensive, poorly organized and edited, and the production values
stink.
Internet: Mailing Lists, edited by Edward T.L. Hardie and Vivian Neou at the
Network Information Systems Center, Stanford Research Institute (Prentice
Hall, 1993, $29.00, ISBN 0-13-289661-30), is a comprehensive guide to mailing
lists, many of which are reflected to USENET (or vice versa). Especially
valuable for Internet users who have dial-up e-mail access only. 
New Riders' Official Internet Yellow Pages, by Christine Maxwell and Czeslaw
Jan Grycz (New Riders Publishing, 1994, $29.95, ISBN 1-56205-306-X), is an
extensive list of Internet mailing lists, FTP archives, telnet query
interfaces, gopher and WWW servers, and other resources, categorized and
sorted alphabetically by topic (for example, medieval history, Medline,
Melrose Place, memorabilia, and so on).
The Internet Directory, by Eric Braun (Ballantine Books, 1994, $25.00, ISBN
0-449-90898-4), is another comprehensive list of Internet access providers,
mailing lists, newsgroups, library catalogs, FTP archives, and archie, gopher,
WAIS, and WWW servers.
The Internet Yellow Pages, by Harley Hahn and Rick Stout (Osborne McGraw-Hill,
1994, $27.95, ISBN 0-07-882023-5), is similar to New Riders' Yellow Pages, but
is not as detailed or as well organized. Somewhat slanted toward that category
of Internet users who download GIF files of girls in swimsuits and hang out in
the goofier newsgroups.


Internet Technology Books


Exploring the Internet: A Technical Travelogue, by Carl Malamud (Prentice
Hall, 1992, $26.95, ISBN 0-13-296898-3), defies classification and pretty much
defies description as well. The author recounts his jaunts around the world to
meet Internet wizards and taste exotic foods. You'll learn about some strange
fruits in this book, both the kind you might find in a market and the kind you
might find administering a national network.
Interconnections: Bridges and Routers, by Radia Perlman (Addison-Wesley, 1992,
$49.50, ISBN 0-201-56332-0), is completely dedicated to the obscure and
somewhat magical topic of bridges, routers, data-packet handling, and routing
algorithms.
Internet System Handbook, edited by Daniel C. Lynch and Marshall T. Rose
(Addison-Wesley 1993, $61.25, ISBN 0-201-56741-5), is a massive collection of
technical essays and overviews by various prestigious network architects,
gurus, and programmers. Tough to digest technically in some areas, and
suffering from uneven editing and style, but nonetheless a valuable resource.
Internetworking with TCP/IP, Volume I: Principles, Protocols, and
Architecture, by Douglas E. Comer (Prentice Hall, 1991, $60.00, ISBN
0-13-468505-9), and Internetworking with TCP/IP, Volume II: Design,
Implementation, and Internals, Second Edition, by Douglas E. Comer and David
L. Stevens (Prentice Hall, 1994, $50.00, ISBN 0-13-125527-4), constitute the
most detailed and most structured technical course on TCP/IP networking that
the average programmer could need or want. As you might guess from the titles,
the first volume is mainly descriptive, while the second focuses on
implementation techniques with plenty of example source code. The second
volume is also probably the only book ever dedicated to a specific IP address.
Internetworking: A Guide to Network Communications, by Mark A. Miller (M&T
Books, 1991, $34.95, ISBN 1-55851-143-1), is a somewhat abstract overview of
internetworking and protocols, both LAN and WAN.
Open Systems Networking: TCP/IP and OSI, by David M. Piscitello and A. Lyman
Chapin (Addison-Wesley, 1993, $49.50, ISBN 0-201-56334-7), is a unique book
that explains, compares, and contrasts in parallel the OSI- and TCP/IP-based
network layers, routing, directory services, and management. Includes much
interesting historical perspective and editorialization. For reasons that
escape me, the authors seem to feel that OSI still has some hope of being a
significant force in internetworking.
SNMP, SNMPv2, and CMIP: The Practical Guide to Network-Management Standards,
by William Stallings (Addison-Wesley, 1993, $49.50, ISBN 0-201-63331-0), is a
rather dry textbook about the so-called "Simple Network Management Protocol"
(SNMP) protocols, including technical overviews of SNMP-based network
monitoring, analysis, and management.
How to Manage Your Network Using SNMP: The Networking Management Practicum, by
Marshall T. Rose and Keith McCloghrie (Prentice Hall, 1995, $48.00, ISBN
0-13-141517-4), discusses the basic principles and technology of the Simple
Network Management Protocol (SNMP), then illustrates the power of SNMP with
code examples and actual output from network sniffers and browsers. One of the
authors (Rose) essentially invented SNMP and is well-known as an Internet
super-wizard. The book appears to be aimed mostly at the do-it-yourself types,
but may also be of interest to users of commercial network-management
packages.
Stacks: Interoperability in Today's Computer Networks, by Carl Malamud
(Prentice Hall, 1992, $42.00, ISBN 0-13-484080-1), is a succinct overview of
the competing network protocols and transports: OSI, TCP/IP, ISDN, X.25, and
so on.

TCP/IP Illustrated, Volume 1: The Protocols, by W. Richard Stevens
(Addison-Wesley, 1994, $47.50, ISBN 0-201-63346-9), is focused entirely on the
implementation and operation of various TCP/IP protocols, with byte-by-byte
examples of actual protocol transactions obtained with network-monitoring
utilities. Anyone who is writing low-level network code will want to keep this
book handy. Strangely enough, Addison-Wesley's publicist wanted me to sign a
nondisclosure agreement before looking at galleys of this book. I'm still
trying to figure out why, considering the protocols themselves are in the
public domain.
TCP/IP: Architecture, Protocols, and Implementation, by Sidnie Feit
(McGraw-Hill, 1993, $45.00, ISBN 0-07-020346-6), has a textbook approach,
which is thorough and detailed, but not very friendly. It includes chapters on
the physical networking layers; IP routing; the IP, TCP, FTP, telnet, and mail
protocols; NFS; SNMP-based network management; and the socket programming
interface.
The Internet Message: Closing the Book with Electronic Mail, by Marshall T.
Rose (Prentice Hall, 1993, $50.00, ISBN 0-13-092941-7), provides an overview
of Internet mail protocols by one of the most famous Internet gurus. This book
makes it look easy.
The Simple Book: An Introduction to Internet Management, Second Edition, by
Marshall T. Rose (Prentice Hall, 1994, $55.00, ISBN 0-13-177254-6), is a
highly readable (but relentlessly technical) explanation of the Simple Network
Management Protocol Version 2 (SNMPv2) by one of its inventors.


Network Administration


DNS and BIND, by Paul Albitz and Cricket Liu (O'Reilly & Associates, 1992,
$29.95, ISBN 1-56592-010-4), has very helpful explanations of DNS, BIND,
sendmail configuration, and the like, although coverage of SunOS peculiarities
is spotty and somewhat outdated.
Firewalls and Internet Security: Repelling the Wily Hacker, by William R.
Cheswick and Steven M. Bellovin (Addison-Wesley, 1994, $26.50, ISBN
0-201-63357-4), is written by the researchers responsible for protecting the
internal network at AT&T Bell Labs. This book is both entertaining and scary.
The authors show you exactly how to set up a near-bulletproof firewall against
hackers, industrial saboteurs, graduate students with too much time on their
hands, and other nasties that go bump in the night. Then they provide you with
plenty of motivation in the form of horror stories about past attacks on their
own network.
Managing UUCP and UseNet, Tenth Edition, by Tim O'Reilly and Grace Todino
(O'Reilly & Associates, 1992, $27.95, ISBN 0-937175-93-5), is a general
discussion of e-mail and news servers and clients.
Practical UNIX Security, by Simson Garfinkel and Gene Spafford (O'Reilly &
Associates, 1991, $29.95, ISBN 0-937175-72-2), is UNIX-centric but includes
discussions of passwords, gateways, firewall machines, and the like, that will
be valuable to any system administrator.
Sendmail, by Bryan Costales with Eric Allman and Neil Rickert (O'Reilly &
Associates, 1993, $32.95, ISBN 1-56592-056-2), describes the sendmail daemon,
which controls mail flow on BSD UNIX workstations. Sendmail is pervasive but
poorly understood--mostly because it is configured via scripts that are
cryptic and obscure beyond imagining. Marshall Rose has written: "It is a
tribute to the Internet mail system that it works so well, given that sendmail
behaves so poorly." This unique, hefty book tames the savage sendmail and
should be within arm's length of every network manager at all times.
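To give a flavor of why sendmail configuration has such a fearsome reputation, here is an illustrative fragment written in the style of a sendmail.cf ruleset. (This is a made-up sketch for flavor, not taken from the book; real rulesets are tab-sensitive and considerably longer.)

```
# Ruleset 3: canonicalize an address (illustrative sketch only)
S3
R$*<$*>$*	$1$2$3		strip outer angle brackets
R$+@$+		$:$1<@$2>	focus on the domain part
```

Each R line is a rewriting rule: the left-hand pattern ($* matches zero or more tokens, $+ one or more) is matched against the address, and the right-hand side rewrites it using the matched pieces ($1, $2, ...). Multiply this by several hundred lines and the "cryptic and obscure beyond imagining" verdict starts to seem generous.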
TCP/IP Network Administration, by Craig Hunt (O'Reilly & Associates, 1992,
$29.95, ISBN 0-937175-82-X), is a clearly written, extremely helpful overview
of TCP/IP from protocol basics to configuration of gateways, DNS, and
sendmail. It also includes nice discussions of network troubleshooting and
security considerations. I have found this book an invaluable resource while
configuring and managing all types of workstations.
TCP/IP: Running a Successful Network, by K. Washburn and J.T. Evans
(Addison-Wesley, 1993, $48.50, ISBN 0-201-62765-5), is a highly technical,
comprehensive, practical treatment of TCP/IP network implementation and
administration at all levels. Protocols are explained in detail with
occasional source-code examples. It includes material on NetBIOS, NFS,
routing, gateways, and network testing and debugging.
Troubleshooting TCP/IP: Analyzing the Protocols of the Internet, by Mark A.
Miller (M&T Books, 1992, $44.95, ISBN 1-55851-268-3), is an interesting and
highly readable guide to implementation, maintenance, and debugging of TCP/IP
networks and internetworks, with many practical examples and extensive
references. Miller pays particular attention to multiprotocol environments.





December, 1994
SWAINE'S FLAMES


What's in a Name?


The question in the title is Shakespeare's. The bard was surprisingly down on
names. "A rose by any other name would smell as sweet." "Honor doth forget
men's names." "No profit but the name." "Refuse thy name." "I cannot tell what
the dickens his name is." But then, the bard had a little trouble with his own
name. What's that about, Will? Methinks thou dost protest too much.
Names are regarded as having power, a fact exploited by science-fiction writer
Vernor Vinge in the novel True Names and proven by the Vietnam Memorial. "The
name of a man," Marshall McLuhan said, "is a numbing blow from which he never
recovers." It was as lucid as anything else the man ever said.
Hypertext visionary Ted Nelson knows about the power of names and chooses
names of his inventions with great care. His magnum opus, Xanadu, is named
after the poetic dream of a dope fiend started in 1798 and not finished yet.
The greatest virtue of the next version of Microsoft Windows is its power over
names. It will let you name a file Inventory rather than Inventor.y. This
version has been known as Chicago, but that's only its code name, not its true
name. We learned recently that its true name is Windows 95.
It's a good name, like most Microsoft names. Windows, Works, Word, Office,
Access. Microsoft names usually sound almost generic for the category. Excel
was a departure, for which Microsoft apparently paid: I gather that a license
agreement says that Microsoftians must always refer to the product as
Microsoft Excel. Like they mind.
And now IBM has announced that it will not make waves if Microsoft wants to
trademark Windows. IBM couldn't care less. IBM's not into names; its only good
product name was Personal Computer. The latest brainstorm from IBM's name
generator: Aptiva. Right.
Apple used to be good at names, although its approach was always riskier than
Microsoft's. Apple, Macintosh, eWorld. Apple's latest offering is the name for
its operating-system software. If you're gonna license it, you'd better have
an official name for it. And they came up with--Mac OS? Hmm. Well, it's an
honest name.
Windows 95 is an honest name, but I can't help but think that it would have
avoided a lot of confusion and false alarms if Microsoft had shared this
information with us two years ago.
Well, enough of that. Here's a puzzle about names. I'd call it an
"emoticontest," but then you'd think there was a prize. The only prize I'm
offering is that I'll mention the name of the first person who gets a correct
answer to me by any means. Since I don't print my phone number in this column,
the rules favor those who can reach me electronically, but that seems
appropriate in this case. Besides, there's no real prize involved.
Here it is. Emoticons, or smileys, are used in electronic communications to
express emotions. They are also sometimes used to identify, or anyway to refer
to, people. Below are two smileys that are intended to represent two people
often seen together. Name the people.
&8-) 7
(:-\ L
Michael Swaine
editor-at-large
MikeSwaine@eWorld.com





December, 1994
OF INTEREST
Advanced Computing Labs has introduced Neural++, a collection of neural
networks implemented as a C++ class library. Among the neural nets included
are BP, CPN, Kohonen, and Outstar. Data scaling and Z-score data preprocessing
are transparently automated. Available for DOS/Windows, Neural++ sells for
$269.00 and includes Math++, a set of numerical classes that provide access to
matrices, vectors, linear algebra, random numbers, regression, simulation, and
data analysis. Reader service no. 20.
Advanced Computing Labs
P.O. Box 1547
West Chester, OH 45069
513-779-2716
XVT-Architect 1.0, from XVT Software, is a new visual programming tool for
cross-platform development. XVT-Architect provides a graphical means of
developing C++ programs using point-and-click interaction with the XVT-Power++
application framework. Integrated into XVT-Architect is the Rogue Wave
Tools.h++ library, which consists of more than 100 classes for data, string,
and character manipulation.
The XVT-Architect package is composed of three modules: Blueprint, which is a
hierarchy browser; Drafting Board, which is a graphical layout tool for
designing windows, menus, strings, and the like; and Object Strata, which is a
tool that lets you view the class inheritance of an object selected from
either the Object Strata or Blueprint module.
Once the program design is complete, XVT-Architect generates code separately
for both the user-code shells and the GUI. XVT-Architect is included as part
of the XVT Development System for C++ 3.0 and is available for Windows,
Windows NT, Macintosh, OS/2, OSF/Motif, and Open Look. The development system
sells for $1950.00 per developer for PCs, and $6300.00 for workstations.
Reader service no. 21.
XVT Software
4900 E. Pearl Circle
Boulder, CO 80302
303-443-4223
Novell has begun shipping its Visual AppBuilder 1.0, an object-oriented visual
programming environment. The package includes the AppWare Loadable Module
(ALM) Builder library consisting of more than 80 component objects for tasks
ranging from messaging and authentication to directory services. Additionally,
with the ALM Builder you can write your own C/C++ ALMs for use in the Visual
AppBuilder development environment. The package also includes the AppWare Bus,
an engine which manages and coordinates the interaction of the ALMs. Visual
AppBuilder sells for $495.00 and includes a royalty-free run-time license.
Reader service no. 22.
Novell
122 East 1700 South
Provo, UT 84606-6194
801-429-7000
Image Format Library 5.0, an imaging library from AccuSoft, includes tools for
reading the Kodak Photo CD format and incorporating high-performance
raster-imaging capabilities into your software. In addition to Photo CD, the
library supports JPEG, TIFF, PCX, DIB, TGA, GIF, WMF, PICT, DCX, WPG, EPS, and
BMP formats. Version 5.0 also provides automatic thumbnails and new
compression algorithms. The Visual Basic library sells for $495.00 and the
Windows/DOS library for $795.00. Version 5.0 32-bit libraries for Windows, NT,
OS/2, Visual Basic, UNIX, and Macintosh are priced at $995.00 each. Reader
service no. 23.
AccuSoft
P.O. Box 1261
Westborough, MA 01581
508-898-2770
The SQL-Sombrero suite of database-development tools has been announced by
SFI. The suite provides an interface to Sybase's Client Library and
Microsoft's DB-Library without having to resort to C/C++. Additionally, the
toolset supports OLE Automation, VBXs, and OLE Custom Controls, allowing
direct access to Sybase and Microsoft SQL Server data. 
The SQL-Sombrero family, which works with all versions of SQL Server
(including Sybase Server 10) on all platforms, sells for $249.00. Reader
service no. 24. 
SFI 
880 Boulevard de la Carrière, Suite 120
Hull, PQ Canada J8Y 6T5
819-778-5045
The CodeSafe Protection Engine from EliaShim Microcomputers is a
copy-protection system that prevents "protected" files on diskettes from being
copied. However, the protected information is transferable from a CodeSafe
diskette to a local hard drive without the need for a key disk. In other
words, the unique serial number assigned by the CodeSafe system cannot be
duplicated. 
One protection is based on an envelope method, which adds a protection shell
and encrypts existing EXE or COM files. Encryption algorithms are included to
prevent debugging. Another method uses calls embedded in the source code to a
function contained in object files that are supplied for most programming
languages. The engine, which supports DOS 3 through 6.x and Windows 3.x, is
priced on a per-unit-of-protection basis starting at $1.00 per copy. Reader
service no. 25.
EliaShim Microcomputers
4005 Wedgemere Drive
Tampa, FL 33610
813-744-5177
The Guide Reader DLL, a dynamically linked library that lets you embed
hypertext-structured documents into a Windows-based host application, has been
announced by InfoAccess (formerly OWL International). By embedding hypertext
display capabilities into applications, developers avoid writing their own
hypertext engines, yet still provide users with interactive access to text,
graphics, and other multimedia data. Built on top of the Guide Reader, this
DLL also provides bookmarks, annotations, full-text search, and other
document-related features. Reader service no. 26.
InfoAccess
2800 156th Ave. SE
Bellevue, WA 98007
206-747-3203
Visual/Recital 1.0 for MS-DOS, an object-oriented development environment for
client/server applications, has been released by Recital. The environment
provides character-mode, DOS-based systems with window user interfaces that
sport dialog boxes, pull-down menus, push buttons, and the like. The system
also provides an integrated data dictionary and RDBMS engine. 
Applications developed with Visual/Recital 1.0 for MS-DOS (which is fully
xBase compatible with dBase IV and 5, Microsoft's FoxPro, and Computer
Associates' Clipper) are portable to VAX/VMS, OpenVMS, HP MPE, and more than
70 UNIX platforms. Reader service no. 27.
Recital 
85 Constitution Lane
Danvers, MA 01923
508-750-1066
Software developers creating localized applications for international markets
might want to investigate Wintertree Software's Sentry Spelling-Checker
Engine, which now provides multilingual support. By adding support for the
8-bit ANSI character set, the engine works with languages such as French,
German, Italian, and Spanish. 
Versions of the royalty-free Sentry Spelling-Checker, which include American
English and British English dictionaries, are available with source code (ANSI
C) for $599.00; in binary form for MS-DOS it sells for $99.00, or for Windows,
$169.00. Dictionaries for other languages sell for $199.00 each. Reader
service no. 28.
Wintertree Software
43 Rueter Street
Nepean, ON Canada K2J 3Z9
613-825-6271
DT Software has released Version 3.0 of its text-retrieval software: dtSearch
for DOS and dtSearch for Windows. The package performs indexed, unindexed, and
combination searches across multiple indexes, directories, and drives.
Searches include Boolean, proximity, phrase, wildcard, segment, and
filename/date.
Version 3.0 has been enhanced to support network servers, fuzzy searching,
phonic searching, unlimited indexing, scrolling word list, stemming,
multiple-file tagging, annotation, and so on. dtSearch for DOS sells for
$149.00 (single user) and dtSearch for Windows for $199.00 (single user).
Reader service no. 29.
DT Software
2101 Crystal Plaza Arcade, Suite 231
Arlington, VA 22202 703-521-9427

IBM has announced that the next version of OS/2 will include TCP/IP
communications software that provides comprehensive Internet access.
Additionally, OS/2 will include utilities such as Gopher, ftp, and Telnet,
along with e-mail capabilities and its own graphical Web browser, called
"WebExplorer." All utilities will be available via point-and-click
access. Reader service no. 30.
IBM
1133 Westchester Ave.
White Plains, NY 10604
914-642-3000
PageAhead Software has started shipping Version 2.1 of its SimbaEngine SDK,
which provides an SQL engine and other components for creating ODBC drivers
for any non-SQL data source. In particular, the Version 2.1 query-optimization
engine supports push-down filters and joins so that ODBC drivers can "push"
views or joins down to the application for maximum query performance; column
caching, so information is retrieved only once and is stored in memory;
transaction processing that allows users to undo updates, inserts, or deletes;
and both session- and table-based security models. Other improvements include
support for decimal data up to a precision of 60 decimal places and hashed
indexes. Reader service no. 31.
PageAhead Software 
2125 Western Ave., Suite 301
Seattle, WA 98121
206-441-0340
Among recent books released by Butterworth-Heinemann are The PowerPC: A
Practical Companion, by Steve Heath, and ISO 9000 Quality Systems Handbook,
Second Edition, by David Hoyle. In his PowerPC book, Heath covers the
programming model, instruction set, memory management, exception processing,
and practical programming techniques. The 388-page book sells for $24.95. ISBN
0-7506-1801-9. 
The new edition of Hoyle's ISO 9000 book includes recent clarifications and
amendments to the standard, as well as annotations detailing rationale and
compliance suggestions. The 420-page book sells for $29.95. ISBN
0-7506-2130-3. Reader service no. 32.
Butterworth-Heinemann Publishers
313 Washington Street
Newton, MA 02158-1626
617-928-2500
PCMS*CTS from SQL Software is a process-configuration management package that
uses e-mail to alert interested parties that changes have been made.
Process-configuration management combines the traditional elements of version
control with defect tracking and problem-management systems. PCMS*CTS is built
on standard SQL and targets heterogeneous networks. Consequently, organizations
can change software, hardware, or documentation in a well-defined and visible
fashion. Reader service no. 33.
SQL Software
8500 Leesburg Pike, Suite 405
Vienna, VA 22182
703-760-0448
Mosaic Communications Corp. has unveiled Mosaic NetScape, a Mosaic browser for
the World Wide Web, and Mosaic NetSite, a Mosaic server. The Mosaic NetScape
Network Navigator, a browser optimized to efficiently run over 14.4-Kbit/sec
modems, is available for Windows, Macintosh, and the X Window System. The
browser provides encryption and server authentication and supports the JPEG
image format. The UNIX-based Mosaic NetSite server is designed for users who
want to set up and maintain servers for distributing information and
conducting commercial operations. The NetSite Communications server is for
nonsecure applications, while the NetSite Commercial server, which
incorporates RSA data security, is for applications where security is
important. Reader service no. 34.
Mosaic Communications Corp.
650 Castro Street, Suite 500
Mountain View, CA 94041
415-254-1900
Building Better Applications: A Theory of Efficient Software Development, by
Michael R. Dunlavey, has been published by Van Nostrand Reinhold. In his book,
Dunlavey, who wrote the article, "Performance Tuning: Slugging It Out!" (DDJ,
November 1993), provides tools for more effective software development,
including high- and low-level methods of increasing application speed,
differential techniques for simplifying data and source code, and reusable
diagnostics for call-stack sampling and time-line analysis. The 176-page book
sells for $39.95. ISBN 0-442-01740-5. Reader service no. 35.
Van Nostrand Reinhold
115 Fifth Ave.
New York, NY 10003
800-842-3636
Dart Communications recently released PowerTCP Tools for Windows, a set of
protocol libraries that provide turnkey TCP/IP protocols for a flat license
fee. The SDK, which includes both 16- and 32-bit DLLs and a 16-bit VBX, will
run in any environment that provides a Windows Sockets interface. The DLLs
provide C/C++ (Microsoft name-mangling only) interfaces. The first protocols
offered include TCP, Telnet, ftp, VT-220, and SMTP. The PowerTCP development
license sells for $598.00 (single-user) or $998.00 (five-user license).
Run-time licenses range from $1600.00 to $3000.00. Reader service no. 36.
Dart Communications
6 Occum Ridge Road
Deansboro, NY 13328-1008
315-841-8106
OnCmd xBase for OS/2, recently released by On-Line Data, provides multiuser
xBase functionality to OS/2. The software makes it possible for you to convert
FoxPro, Clipper, or dBase programs to native OS/2 32-bit Presentation Manager
applications. The toolkit sells for $695.00 but is available at an
introductory price of $149.00. Network licenses are also available. Reader
service no. 37.
On-Line Data
5 Hill Street
Kitchener, ON
Canada N2G 3X4
519-579-3930




Special Issue, 1994
EDITORIAL


Skyrocketing into the Future


Just when you thought it was safe to go net surfing, who should pop up in the
bit stream but Madonna, for goodness sakes. I don't know what ARPA researchers
had in mind when they started building the Internet in the late '60s, but I'm
here to tell you, it wasn't a 30-second sound clip from a torch-singer
wannabe's newest single. But then, that's why technology is like a box of
chocolates--when you throw something out there, you never quite know what
people will do with it.
There's little question that 25 years ago no one could have predicted what's
going on today with the "information highway." From the Internet and
Microsoft's upcoming "Marvel" online service to single-user BBSs run by
high-school students, millions of homes, schools, and businesses are
communicating digitally like never before. Even those who should be able to
predict the future didn't expect all of this. At a recent stockholders'
meeting, Henry Bloch, chairman of H&R Block, the parent company of CompuServe,
said that "when we acquired CompuServe in 1980, we could see that the computer
age was dawning, but we didn't dream how far-reaching the changes might be."
He went on to add that "what looked like a potentially nice fit became a
skyrocket to the future."
The numbers are staggering. In just a few years, for instance, the number of
Internet hosts has shot from about 1000 to over 2 million, servicing, say some
estimates, more than 30 million users. For its part, CompuServe has rocketed
to more than 2.25 million subscribers in little over a decade, growth mirrored
by other online services.
All of these people aren't sitting around listening to Madonna sound bytes.
Sure, entertainment is an online staple, but mainly in the form of multiplayer
network games. Of course, if the entertainment industry has its druthers,
you'll be downloading movies, music, and more "real soon now," and if you want
a snack while surfing the net, you can already order a pizza over the Internet
and have it delivered to your door.
That's not to say that joyriding on the information highway is all fun and
games. From income-tax filing to work-at-home telecommuting, people are
finding new ways to cope with old problems. In what has to be an enviable
position, H&R Block's tax-preparation and online-service operations will
shortly begin collaborating, offering direct, electronic tax filing through
CompuServe. According to Henry Bloch, three million people currently use PCs
to prepare tax returns, so the step to electronically filing those returns is
a logical one. On the downside, the IRS is considering charging an $8.00 user
fee to anyone who files electronically, even though it saves the IRS money to
process electronic returns--so much for the government's push for the
paperless office.
In all likelihood, many of the people filing taxes electronically will be
telecommuters who work remotely using PCs and modems. According to some
reports, the number of U.S. employees who telecommute, either from home or
satellite offices, has grown from 7.6 million people in 1993 to 8.8 million in
1994. Paving the way in most cases are telecommunications companies. AT&T, for
instance, recently declared a "telecommute day," urging employees to work from
any place except their office desk.
Politicians have seen the digital light, too. Just about every political party
and politician has some sort of online presence, from President Clinton
(president@whitehouse.gov) on down. In addition to communicating with
constituents, politicos are using the information highway to distribute press
releases and position papers. 
Of course, the result of this online explosion is that network infrastructure
can't expand rapidly enough to support it. We're already seeing instances of
multimedia, electronic mass mailings, and similar large-scale data transfers
bogging down performance. While solutions range from fiber-optic communication
lines to self-modifying protocols, I'd like to make a proposal that will cut
down the electronic equivalent of junk mail and raise revenues to expand the
infrastructure. Instead of levying electronic tax-return fees, let's charge
politicians every time they jump onto the information highway. On second
thought, they'd go along with it, then double our taxes to cover their costs.
Jonathan Erickson
Editor-in-chief




Special Issue, 1994
The Economics of the Internet


Technology is only part of the challenge




Jeffrey K. MacKie-Mason and Hal Varian


The authors are faculty members in the University of Michigan's department of
economics. This article is adapted from "Economic FAQs about the Internet,"
which was first published in the Journal of Economic Perspectives (Summer
1994). The authors can be contacted at Hal.Varian@umich.edu and jmm@umich.edu.


The Internet is a world-wide network of computer networks that use a common
communications protocol, TCP/IP (Transmission Control Protocol/Internet
Protocol). TCP/IP provides a common language for interoperation between
networks that use a variety of local protocols (NetWare, AppleTalk, DECnet,
and others).
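The "common language" role that TCP/IP plays can be sketched with a minimal socket exchange over the loopback interface. This example is not from the article (and uses a modern Python API rather than the C sockets of the era), but the point it illustrates is the one made above: the same TCP calls work no matter what link-level technology carries the packets underneath.

```python
# A minimal TCP echo exchange over loopback: one thread accepts a
# connection and echoes back whatever it receives; the main thread
# connects, sends a message, and reads the echo.
import socket
import threading

def echo_once(server: socket.socket) -> None:
    conn, _addr = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))  # echo the client's bytes back

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
threading.Thread(target=echo_once, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
client.sendall(b"hello, internet")
reply = client.recv(1024)
client.close()
server.close()
print(reply)  # b'hello, internet'
```

Everything above the socket interface is oblivious to whether the bytes cross Ethernet, a leased line, or (as here) never leave the machine at all.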
In the late 1960s, the Advanced Research Projects Agency (ARPA), a
division of the U.S. Defense Department, developed the ARPAnet to link
universities and high-tech defense contractors. TCP/IP technology was
developed to provide a standard protocol for ARPAnet communications. In the
mid-1980s, the National Science Foundation (NSF) created the NSFNET to provide
connectivity to its supercomputer centers and other general services. The
NSFNET adopted the TCP/IP protocol and provided a high-speed backbone for the
developing Internet.
From 1985 to January 1994, the Internet grew from about 200 networks to
well over 21,000 and from 1000 hosts (end-user computers) to over 2 million.
Of U.S. sites, about 640,000 of these hosts are at educational sites, 520,000
at commercial sites, and 220,000 at government/military sites; most of the
other 700,000 hosts are elsewhere in the world. NSFNET traffic has grown from
85 million packets in January 1988 to 46 billion packets in December 1993. (A
packet is about 200 bytes.) This is more than a 500-fold increase in only six
years. The traffic on the network is currently increasing at a rate of 6
percent a month. (Current NSFNET statistics are available by anonymous ftp
from nic.merit.edu.)
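The 500-fold figure can be checked with a little arithmetic. This sketch (mine, not the authors') also derives the average monthly growth rate implied by the two endpoints:

```python
# Back-of-the-envelope check of the NSFNET traffic figures cited above.
packets_jan_1988 = 85e6   # packets carried in January 1988
packets_dec_1993 = 46e9   # packets carried in December 1993

growth_factor = packets_dec_1993 / packets_jan_1988
print(f"overall growth: {growth_factor:.0f}-fold")  # roughly 540-fold

# Average compound growth per month over the 71 intervening months.
months = 71
monthly = growth_factor ** (1 / months) - 1
print(f"implied average monthly growth: {monthly:.1%}")
```

The implied six-year average works out to roughly 9 percent a month, somewhat higher than the 6 percent current rate quoted above, consistent with growth having slowed a bit by the end of the period.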
Probably the most frequent use of the Internet is e-mail, followed by file
transfer and remote login. In terms of traffic, about 42 percent of total
traffic is file transfer, 17 percent e-mail, and 24 percent other
services--including information-retrieval programs such as gopher, Mosaic, and
World Wide Web. People search databases (including the catalogs of the Library
of Congress and scores of university research libraries), download data and
software, and ask (or answer) questions in discussion groups on numerous
topics.
In terms of organization, the Internet is a loose amalgamation of computer
networks run by many different organizations in over 70 countries. Most of the
technological decisions are made by small committees of volunteers who set
standards for interoperability.


Internet Structure 


The Internet is usually described as a three-level hierarchy. At the bottom
are local area networks (LANs); for example, campus networks. Usually the
local networks are connected to a regional, or mid-level network. The
mid-levels connect to one or more backbones. The U.S. backbones connect to
other backbone networks around the world. There are, however, numerous
exceptions to this structure.
Regional networks provide connectivity between end users and the NSFNET
backbone. Most universities and large organizations are connected by leased
line to a regional provider. There are currently about a dozen regional
networks, some of which receive subsidies from the NSF; many receive subsidies
from state governments. A large share of their funds is collected through
connection fees charged to organizations that attach their local networks to
the mid-levels. A large university, for example, will typically pay
$60,000--$100,000 per year to connect to a regional.
The regionals are generally run by a state agency, or by a coalition of state
agencies in a given geographic region. They are operated as nonprofit
organizations.
As of January 1994, there are four public fiber-optic backbones in the U.S.:
NSFNET, Alternet, PSInet, and SprintLink. The NSFNET is funded by the NSF, and
is the oldest, having evolved directly out of ARPAnet, the original TCP/IP
network. The other backbones are private, for-profit enterprises.
Due to its public funding, the NSFNET has operated under an Acceptable Use
Policy that limits use to traffic in support of research and education. When
the Internet began to rapidly grow in the late 1980s, there was an increasing
demand for commercial use. Since Internet services are unregulated, entry by
new providers is easy, and the market for backbone services is becoming quite
competitive. (Transport of TCP/IP packets is considered to be a value-added
service, and as such, is not regulated by the FCC or state public-utility
commissions.)
Nowadays the commercial backbones and the NSFNET backbone interconnect so that
traffic can flow from one to the other. Given that both research and
commercial traffic is now flowing on the same fiber, the NSF's Acceptable Use
Policy has become pretty much a dead letter. The charges for these
interconnections are currently relatively small lump-sum payments, but there
has been considerable debate about whether usage-based settlement charges will
have to be put in place in the future.
Currently the NSF pays Merit Inc. (Michigan Educational Research Information
Triad) to run the NSFNET. Merit, in turn, subcontracts the day-to-day
operation of the network to Advanced Network Services (ANS), a nonprofit firm
founded in 1990 to provide network-backbone services. The initial funding for
ANS was provided by IBM and MCI.
It is difficult to say how much the Internet as a whole costs, since it
consists of thousands of different networks, many of which are privately
owned. However, it is possible to estimate how much the NSFNET backbone costs,
since it is publicly supported. As of 1993, NSF paid Merit about $11.5 million
per year to run the backbone. Approximately 80 percent of this is spent on
lease payments for the fiber-optic lines and routers (computer-based
switches). About 7 percent of the budget is spent on the Network Operations
Center, which monitors traffic flow and troubleshoots problems.
To give some sense of the scale of this subsidy, add to it the approximately
$7 million per year that NSF pays to subsidize various regional networks, for
a total of about $20 million. With current estimates at 20 million Internet
users (most of whom are connected to the NSFNET in one way or another), the
NSF subsidy amounts to about $1 per person per year. Of course, this is
significantly less than the total cost of the Internet; indeed, it does not
even include all of the public funds, which come from state governments,
state-supported universities, and other national governments. No one really
knows how much all this adds up to, although research projects are underway to
try to estimate the total U.S. expenditures on the Internet. It has been
estimated (read "guessed") that the NSF subsidy of $20 million per year is
less than 10 percent of the total U.S. expenditure on the Internet.
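The per-user subsidy figure quoted above follows from simple division; spelling it out (the article rounds $18.5 million up to "about $20 million"):

```python
# The NSF subsidy arithmetic from the article, spelled out.
nsfnet_backbone = 11.5e6    # NSF payment to Merit per year, 1993
regional_subsidies = 7e6    # NSF subsidies to regional networks per year

total_subsidy = nsfnet_backbone + regional_subsidies  # $18.5M, "about $20M"
users = 20e6                # estimated Internet users

per_user = total_subsidy / users
print(f"NSF subsidy per user per year: ${per_user:.2f}")  # just under $1
```

Using the rounded $20 million total against 20 million users gives the article's "$1 per person per year" exactly; the unrounded figures land a few cents under.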
The NSFNET backbone will likely be gone by the time you read this, or soon
thereafter. With the proliferation of commercial backbones and
regional-network interconnections, a general-purpose, federally subsidized
backbone is no longer needed. In contracts awarded earlier this year, the NSF
will only fund a set of Network Access Points (NAPs), which will be hubs to
connect the many private backbones and regional networks. The NSF will also
fund a service that will provide fair and efficient routing among the various
backbones and regionals. Finally, the NSF will fund a very-high-speed
backbone-network service (vBNS) connecting its six supercomputer sites, with
restrictions on the users and traffic that it can carry. Its emphasis will be
on developing capabilities for high-definition remote visualization and video
transmission. The new U.S. network structure will be less hierarchical and
more interconnected. The separation between the backbone and regional network
layers of the current structure will become blurred, as more regionals are
connected directly to each other through NAPs, and traffic passes through a
chain of regionals without any backbone transport.
Most users access the Internet through their employer's organizational
network, which is connected to a regional. However, in the past few years a
number of for-profit independent providers of Internet access have emerged.
These typically provide connections between small organizations or individuals
and a regional, using either leased lines or dial-up access. Starting in 1993
some of the private computer networks (such as Delphi and World) have begun to
offer full Internet access to their customers. (CompuServe and the other
private networks have offered e-mail exchange to the Internet for several
years.)
Other countries also have many backbone and mid-level networks. For example,
most western European countries have national networks attached to EBone, the
European backbone. The infrastructure is still immature and quite inefficient
in some places. For example, the links between countries are often slow or of
low quality, so it is common to see traffic between two foreign countries
routed through the U.S. via the NSFNET.


Internet Technology


Since most backbone and regional network traffic moves over leased phone
lines, there's little difference at a low level between the Internet and
telephone networks. However, there is a fundamental distinction in how the
lines are used by the Internet and phone companies. The Internet provides
connectionless packet-switched service, whereas telephone service is
circuit-switched. The difference may sound arcane, but it has profound
implications for pricing and the efficient use of network resources.
Circuit switching requires that an end-to-end circuit be set up before the
call can begin. A fixed share of network resources is reserved for the call,
and no other call can use those resources until the original connection is
closed. This means that a long silence between two teenagers uses the same
resources as an active negotiation between two fast-talking lawyers. One
advantage of circuit-switching is that it enables performance guarantees such
as guaranteed maximum delay, essential for real-time applications like voice
conversations. It is also much easier to do detailed accounting for
circuit-switched network usage.
In packet switching a data stream is divided into packets of about 200 bytes
(on average), which are then sent out onto the network. Each packet contains a
header with information necessary for routing the packet from origin to
destination. Thus, each packet in a data stream is independent.
The main advantage of packet switching is that it permits statistical
multiplexing on the communications lines. That is, the packets from many
different sources can share a line, allowing for very efficient use of the
fixed capacity. With current technology, packets are generally accepted onto
the network on a first-come, first-served basis. If the network becomes
overloaded, packets are delayed or dropped.
Internet technology is connectionless, meaning there is no end-to-end setup
for a session; each packet is independently routed to its destination. When a
packet is ready, the host computer sends it on to another computer, known as a
"router," which examines the destination address in the header and passes the
packet along to another router, chosen by a route-finding algorithm. A packet
may go through 30 or more routers in its travels from one host computer to
another. Because routes are dynamically updated, it is possible for different
packets from a single session to take different routes to the destination.
Along the way, packets may be broken up into smaller packets, or reassembled
into bigger ones. When the packets reach their final destination, they are
reassembled at the host computer. The instructions for doing this reassembly
are part of the TCP/IP protocol.
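A toy sketch makes the hop-by-hop idea concrete. The three-node topology and forwarding-table entries below are invented for illustration; real routers build and update their tables with route-finding protocols:

```python
# Toy illustration of connectionless, hop-by-hop forwarding. Each
# node knows only the next hop toward a destination; no end-to-end
# circuit is ever set up. Topology and names are hypothetical.
ROUTING_TABLES = {
    "host-A":   {"host-B": "router-1"},
    "router-1": {"host-B": "router-2"},
    "router-2": {"host-B": "host-B"},
}

def forward(packet, node):
    """Follow next-hop entries until the packet reaches its destination."""
    path = [node]
    while node != packet["dst"]:
        node = ROUTING_TABLES[node][packet["dst"]]
        path.append(node)
    return path

packet = {"dst": "host-B", "payload": b"hello"}
print(forward(packet, "host-A"))
# ['host-A', 'router-1', 'router-2', 'host-B']
```

Because every packet consults the tables independently, a table entry that changes mid-session simply sends later packets along the new route.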
Some packet-switching networks are connection oriented (notably, X.25
networks, such as Tymnet and frame-relay networks). In such a network, a
connection is set up before transmission begins, just as in a circuit-switched
network. A fixed route is defined, and information necessary to match packets
to their session and defined route is stored in memory tables in the routers.
Thus, connectionless networks economize on router memory and connection set-up
time, while connection-oriented networks economize on routing calculations
(which have to be redone for every packet in a connectionless network).
Most of the network hardware in the Internet consists of communications lines
and switches or routers. In the regional and backbone networks, the lines are
mostly leased telephone trunk lines, which are increasingly fiber optic.
Routers are computers; indeed, the routers used on the NSFNET are modified
commercial IBM RS/6000 workstations, although routers custom-designed by
companies such as Cisco, Wellfleet, 3Com, and DEC probably have the majority
of market share.
Modem users are familiar with recent speed increases from 300 bps (bits per
second) to 2400, 9600, and now 19,200 bps. Leased-line network speeds have
advanced from 56 Kbps (kilo, or 10^3 bps) to 1.5 Mbps (mega, or 10^6 bps,
known as T-1 lines) in the late '80s, and then to 45 Mbps (T-3) in the early
'90s. Lines of 155 Mbps are now available, though not yet widely used.
Congress has called for a 1-Gbps (giga, or 10^9 bps) backbone by 1995.
The current T-3 45-Mbps lines can move data at a speed of 1400 pages of text
per second; a 20-volume encyclopedia can be sent coast to coast on the NSFNET
backbone in half a minute. However, it is important to remember that this is
the speed on the superhighway--the access roads via the regional networks
usually use the much slower T-1 connections.
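These figures can be verified with a little arithmetic. The calculation below assumes roughly 4,000 bytes per page of text, a figure the article does not state but which makes its numbers come out:

```python
# Checking the T-3 throughput figures. The 4,000-bytes-per-page
# figure is an assumption, not stated in the text.
t3_bps = 45e6
bytes_per_sec = t3_bps / 8          # 5.625 million bytes per second
page_bytes = 4000
pages_per_sec = bytes_per_sec / page_bytes

print(f"{pages_per_sec:.0f} pages of text per second")     # 1406
print(f"{bytes_per_sec * 30 / 1e6:.0f} MB in 30 seconds")  # 169 MB
```

About 169 MB in half a minute is indeed in the right range for a multivolume text encyclopedia.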
Economics can explain the preference for packet switching over circuit
switching in the Internet and other public networks. Circuit networks use many
lines to economize on switching and routing--once a call is set up, a line is
dedicated to it, regardless of its rate of data flow, and no further routing
calculations are needed. This network design makes sense when lines are cheap
relative to switches.
The cost of both communications lines and computers has been declining
exponentially for decades. However, around 1970, switches (computers) became
relatively cheaper than lines. At that point, packet switching became
economical: Lines are shared by multiple connections at the cost of many more
routing calculations by the switches. This preference for using many
relatively cheap routers to manage few expensive lines is evident in the
topology of the backbone networks. In the NSFNET, for example, any packet
coming on to the backbone has to pass through two routers at its entry point
and again at its exit point. A packet entering at Cleveland and exiting at New
York traverses four NSFNET routers but only one leased T-3 communications
line.
At present there are many overlapping information networks (telephone,
telegraph, data, cable TV, and the like), and new networks are emerging
rapidly (such as paging or personal-communications services). Each of the
current information networks is engineered to provide a particular type of
service, and the added value provided by each of the different types was
sufficient to overcome the fixed costs of building overlapping physical
networks.
However, given the high fixed costs of providing a network, the economic
incentive to develop an integrated-services network is strong. Further, now
that all information can be easily digitized, the need for separate networks
for separate types of traffic is no longer necessary. Convergence toward a
unified, integrated-services network is a basic feature in most visions of the
much-publicized information superhighway. The migration to integrated-services
networks will have important implications for market structure and
competition.
The international telephone community has committed to a future network design
that combines elements of both circuit and packet switching to enable the
provision of integrated services. The CCITT (an international standards body
for telecommunications) has adopted a cell-switching technology called "ATM"
(asynchronous transfer mode) for future high-speed networks. Cell switching
closely resembles packet switching in that it breaks a data stream into
packets, which are then placed on lines shared by several streams. One major
difference is that cells have a fixed size, while packets can have different
sizes. This makes it possible in principle to offer bounded delay guarantees
(since a cell will not get stuck for a surprisingly long time behind an
unusually large packet).
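The fixed-size-cell idea can be sketched directly. ATM cells carry 48 bytes of payload behind a 5-byte header; the padding below is a simplification of what the real adaptation layers do:

```python
# Fixed-size cells versus variable-size packets: a data stream is
# chopped into equal 48-byte payloads, so a small, urgent cell can
# never be stuck behind one huge packet. Padding the last cell is a
# simplified stand-in for the real ATM adaptation-layer mechanics.
CELL_PAYLOAD = 48

def to_cells(data: bytes):
    """Split a byte stream into fixed-size cell payloads."""
    cells = []
    for i in range(0, len(data), CELL_PAYLOAD):
        chunk = data[i:i + CELL_PAYLOAD]
        cells.append(chunk.ljust(CELL_PAYLOAD, b"\x00"))
    return cells

cells = to_cells(b"x" * 200)   # one average-sized 200-byte packet
print(len(cells), "cells of", CELL_PAYLOAD, "bytes each")
# 5 cells of 48 bytes each
```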
An ATM network also resembles a circuit-switched network in that it provides
connection-oriented service. Each connection has a set-up phase, during which
a virtual circuit is created. The fact that the circuit is virtual, not
physical, provides two major advantages. First, it is not necessary to reserve
network resources for a given connection; the economic efficiencies of
statistical multiplexing can be realized. Second, once a virtual-circuit path
is established, switching time is minimized, allowing much-higher network
throughput. Initial ATM networks are already being operated at 155 Mbps, while
the non-ATM Internet backbones operate at no more than 45 Mbps. The path to
1000-Mbps (gigabit) networks seems much clearer for ATM than for traditional
packet switching.

The federal High-Performance Computing Act of 1991 targeted a gigabit per
second (Gbps) national backbone by 1995. Six federally funded testbed networks
are currently demonstrating various gigabit approaches. To get a feel for how
fast a gigabit is, note that most small colleges or universities today have
56-Kbps Internet connections. At 56 Kbps, it takes about five hours to
transmit one gigabit!
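The five-hour figure follows directly:

```python
# How long does one gigabit take at 56 Kbps?
gigabit = 1e9   # bits (decimal giga, as in the text)
rate = 56e3     # bits per second
hours = gigabit / rate / 3600
print(f"{hours:.1f} hours")  # 5.0 hours
```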
Efforts to develop integrated-services networks are also on the rise. Several
cable companies have already started offering Internet connections to their
customers. (Because most cable networks are one-way, these connections usually
use an asymmetric network connector that brings the input in through the TV
cable at 10 Mbps, but sends the output out through a regular phone line at
about 14.4 Kbps. This scheme may be popular since most users tend to download
more information than they upload.) AT&T, MCI, and all of the Regional Bell
Operating Companies (RBOCs) are involved in mergers and joint ventures with
cable TV and other specialized network providers to deliver new integrated
services such as video-on-demand. ATM-based networks, although initially
developed for phone systems, ironically have been first implemented for data
networks within corporations and by some regional and backbone providers.


Internet Pricing Schemes


Until recently, nearly all users faced the same pricing structure for Internet
usage. A fixed-bandwidth connection was charged an annual fee, which allowed
for unlimited usage up to the physical maximum-flow rate (bandwidth). We call
this "connection pricing." Most connection fees were paid by organizations
(universities, government agencies, and so on), with users paying nothing
directly themselves.
Simple connection pricing still dominates the market, but a number of variants
have emerged. The most notable is "committed information-rate" pricing,
whereby an organization is charged a two-part fee: one based on the bandwidth
of the connection, which is the maximum feasible flow rate, the other based on
the maximum guaranteed flow to the customer. The network provider installs
both sufficient capacity to simultaneously transport the committed rate for
all of its customers and flow regulators on each connection. When some
customers operate below the committed rate, the excess network capacity is
available on a first-come, first-served basis for the other customers. This
type of pricing is more common in private networks than in the Internet
because a TCP/IP flow rate can be guaranteed only network by network, greatly
limiting its value unless many of the 20,000 Internet networks coordinate on
offering this type of guarantee.
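A hypothetical bill under this scheme might look like the sketch below. The two-part structure follows the text, but all prices and rates are invented for illustration:

```python
# Hypothetical committed-information-rate bill: one part for the
# connection bandwidth (the maximum feasible flow rate), one for the
# guaranteed flow rate. All prices here are invented.
def cir_annual_fee(port_kbps, committed_kbps,
                   price_per_port_kbps=10.0,        # $/Kbps of bandwidth
                   price_per_committed_kbps=25.0):  # $/Kbps guaranteed
    assert committed_kbps <= port_kbps
    return (port_kbps * price_per_port_kbps
            + committed_kbps * price_per_committed_kbps)

# A 1.5-Mbps (T-1) port with a 256-Kbps guaranteed rate:
print(f"${cir_annual_fee(1536, 256):,.2f} per year")  # $21,760.00 per year
```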
Networks that offer committed information pricing generally have enough
capacity to meet the entire guaranteed bandwidth. This is a bit like a bank
holding 100 percent reserves, but is necessary with existing technology since
there is no commonly used way to prioritize packets.
For most usage, the typical packet placed on the Internet is priced at zero.
There are a few exceptions at the outer fringes. For example, some private
networks (such as CompuServe) provide e-mail connections to the Internet.
Several of these charge per message above a low threshold. The public networks
in Chile and New Zealand charge customers by the packet for all international
traffic. 


Coping with Congestion 


However, most of the Internet does not price by the packet. Organizations pay
a fixed fee in exchange for unlimited access up to the maximum throughput of
their particular connection. This is a classic problem of the commons--the
externality exists because a packet-switched network is a shared-media
technology. When the network is busy, each packet I send imposes a cost on all
other users, because the resources my packets consume are not available to
theirs. This cost can come in the form of delay or lost (dropped) packets.
Without an incentive to economize on usage, congestion can become quite
serious. Indeed, the problem is more serious for data networks than for many
other congestible resources because of the tremendously wide range of usage
rates. On a highway, for example, at a given moment, a single user is more or
less limited to either putting zero or one cars on the road. In a data
network, however, a single user at a modern workstation can send a few bytes
of e-mail or put a load of hundreds of Mbps on the network. Within a year, any
undergraduate with a new Macintosh will be able to plug in a video camera and
transmit live videos home to mom, demanding as much as 1 Mbps. Since the
maximum throughput on current backbones is only 45 Mbps, it is clear that even
a few users with relatively inexpensive equipment could bring the network to
its knees.
Congestion problems are not just hypothetical. For example, congestion was
quite severe in 1987 when the NSFNET backbone was running at much slower
transmission speeds (1.5 Mbps). Users running interactive, remote-terminal
sessions experienced excessive delays. As a temporary fix, the NSFNET
programmed the routers to give terminal sessions (using the telnet program)
higher priority than file transfers (using the ftp program). More recently,
large ftp archives, Web servers at the National Center for Supercomputing
Applications, the original Archie site at McGill University, and many other
services have had serious problems with overuse. 
If everyone just stuck to ASCII e-mail, congestion would not likely become a
problem for many years--if ever. However, new multimedia services such as
Mosaic and Internet Talk Radio are consuming ever-larger amounts of bandwidth,
and although the supply of bandwidth is increasing, so is the demand. If usage
remains unpriced, it is likely that in the foreseeable future, the demand for
bandwidth will sometimes exceed the supply.
Administratively, assigning different priorities to different types of traffic
is appealing. As a long-term solution to congestion costs, however, it is
impractical due to the usual inefficiencies of rationing. More importantly, it
is technologically impossible to enforce. From the network's perspective, bits
are bits, and there is no certain way to distinguish between different types
of uses. By convention, most standard programs use a unique identifier
included in the TCP header (the port number); this is what NSFNET used for its
priority scheme in 1987. However, it is a trivial matter to put a different
port number into the packet headers; for example, to assign the telnet number
to ftp packets to defeat the 1987 priority scheme. To avoid this problem,
NSFNET kept its prioritization mechanism secret, but that is hardly a
long-term solution.
What other mechanisms can be used to control congestion? The most obvious
approach is to charge some sort of usage price. To date, however, usage
pricing for backbone services has not been considered seriously, and even
tentative proposals have met with strong opposition. 
Many proposals rely on voluntary efforts to control congestion. Numerous
participants in congestion discussions suggest that peer pressure and user
ethics will be sufficient to control congestion costs. For example, recently a
single user started broadcasting a 350--450-Kbps audio-video test pattern to
hosts around the world, blocking the network's ability to handle a scheduled
audio broadcast from a Finnish university. When a network engineer sent a
strongly worded message to the user's site administrator, the offending
workstation was taken off the network. This illustrates one problem with
relying on peer pressure: The signal was not terminated until after it had
caused serious disruption. Also, it apparently was caused by a novice user who
did not understand the impact of what he had done; as network access becomes
ubiquitous, an ever-increasing number of unsophisticated users will have
access to applications that can cause severe congestion if not used properly.
And of course, peer pressure may be quite ineffective against malicious users
who want to intentionally cause network congestion.
One recent proposal for voluntary control is closely related to the 1987
method used by the NSFNET. This proposal would require users to indicate a
priority level for each of their sessions. Routers would be programmed to
maintain multiple queues, one for each priority class. Obviously, the success
of this scheme would depend on users' willingness to assign lower priorities
to some of their traffic. However, as long as one or a few abusive users can
create crippling congestion, voluntary priority schemes may be largely
ineffective.
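In outline, such a router would keep one queue per declared priority class and always serve the highest nonempty one. A minimal sketch, with class labels and packet names invented:

```python
# Sketch of a multi-queue priority router: one FIFO queue per
# user-declared class, higher classes served first. The declared
# labels are voluntary, which is exactly the scheme's weakness.
from collections import deque

class PriorityRouter:
    def __init__(self, num_classes=3):
        self.queues = [deque() for _ in range(num_classes)]

    def enqueue(self, packet, priority):
        self.queues[priority].append(packet)

    def dequeue(self):
        # Serve the highest-priority nonempty queue.
        for q in reversed(self.queues):
            if q:
                return q.popleft()
        return None

r = PriorityRouter()
r.enqueue("bulk-ftp", priority=0)
r.enqueue("telnet", priority=2)
r.enqueue("email", priority=1)
print(r.dequeue(), r.dequeue(), r.dequeue())
# telnet email bulk-ftp
```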
In fact, a number of voluntary mechanisms are in place today. They are
somewhat helpful, partly because most users are unaware of them and partly
because defeating them requires some programming expertise. For example, most
implementations of the TCP protocol use a slow-start algorithm, which
throttles the rate of transmission based on the current state of delay in the
network. But nothing prevents users from modifying their TCP implementation to
send at full throttle.
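The core of slow start can be sketched in a few lines. This omits the details of real implementations (thresholds, congestion avoidance, and the actual loss-detection machinery):

```python
# Simplified slow start: the congestion window (packets allowed in
# flight) starts at one and doubles each round trip until loss is
# detected, then restarts from one. Real TCP is more elaborate.
def slow_start_windows(loss_at, rounds):
    cwnd, history = 1, []
    for _ in range(rounds):
        history.append(cwnd)
        cwnd = 1 if cwnd >= loss_at else cwnd * 2
    return history

print(slow_start_windows(loss_at=16, rounds=8))
# [1, 2, 4, 8, 16, 1, 2, 4]
```

A user who simply deleted this throttle would transmit at the full line rate regardless of congestion, which is the vulnerability the text describes.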
A completely different approach to reducing congestion is purely
technological: overprovisioning, or maintaining sufficient network capacity to
support the peak demands without noticeable service degradation. (The effects
of network congestion are usually negligible until usage is very close to
capacity.) This has been the most important mechanism used to date in the
Internet. However, overprovisioning is costly, and with both
very-high-bandwidth applications and near-universal access fast approaching,
it may become too costly. In short, will the cost of capacity decline faster
than the growth in capacity demand?
Given the explosive growth in demand and the long lead time needed to
introduce new network protocols, the Internet may face serious problems very
soon if productivity increases do not keep up. Therefore, we believe it is
time to seriously examine incentive-compatible allocation mechanisms, such as
various forms of usage pricing.


Choosing the Right Level of Service


The current Internet offers a single service quality: best-efforts packet
service. Packets are transported first-come, first-served with no guarantee of
success. Some packets may experience severe delays, while others may be
dropped and never arrive.
However, different kinds of data place different demands on network services.
E-mail and file transfers require 100 percent accuracy, but can easily
tolerate delay. Real-time voice broadcasts require much higher bandwidth than
file transfers and can only tolerate minor delays, but they can tolerate
significant distortion. Real-time video broadcasts have very low tolerance for
delay or distortion.
Because of these different requirements, network-routing algorithms should
treat different types of traffic differently--giving higher priority to, say,
real-time video than to e-mail or file transfer. But the user must truthfully
indicate what type of traffic is being sent. If real-time-video bit streams
get the highest quality service, why not claim that all of your bit streams
are real-time video?
The trick is to design a pricing mechanism that gives the users the right
incentive to ask for the kind of services they really need. If a user wants to
send high-priority traffic, then he will have to pay a first-class fare;
low-priority traffic, like e-mail, can travel tourist class. Economists have
come up with pricing mechanisms that, in theory, give users the right
incentives to reveal their true priorities. However, some of these pricing
mechanisms are very complicated; ordinary users--or even computer
hackers--would probably not want to spend a lot of time and effort figuring
out the cheapest way for transferring a file. But just as we use travel agents
to figure out the cheapest airfare, we could use "artificial
agents"--intelligent computer programs--to figure out the cheapest way to send
information across the network. If the network pricing mechanism presents the
right incentives to these "artificial agents" then every computer on the
network could be working together to optimize network use.
One of the first necessary steps for implementing usage-based pricing (either
for congestion control or multiple service-class allocation) is to measure and
account for usage. Accounting poses some serious problems. For one thing,
packet service is inherently ill-suited to detailed usage accounting because
every packet is independent. As an example, a one-minute phone call in a
circuit-switched network requires one accounting entry in the usage database.
But in a packet network, that one-minute phone call would require around 2500
average-sized packets; complete accounting for every packet would then require
about 2500 database entries. On the NSFNET alone, over 40 billion packets are
being delivered each month. Maintaining detailed accounting by the packet in a
way similar to phone-company accounting may be too expensive.
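The packet count can be reproduced by assuming a standard 64-Kbps digitized voice stream alongside the article's 200-byte average packet; the 64-Kbps figure is our assumption:

```python
# Packets needed for a one-minute digitized phone call, assuming a
# standard 64-Kbps voice stream (our assumption) and 200-byte packets.
bits_per_min = 64e3 * 60       # 3.84 million bits
packet_bits = 200 * 8
packets = bits_per_min / packet_bits
print(f"{packets:.0f} packets for a one-minute call")  # 2400
```

That is roughly the 2500 figure cited; either way, per-packet accounting multiplies one billing record into thousands.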
Another accounting problem concerns the granularity of the records.
Presumably, accounting detail is most useful when it traces traffic to the
user. Certainly, if the purpose of accounting is to establish prices as
incentives, those incentives will be most effective if they affect the person
actually making the usage decisions. But the network is, at best, capable of
reliably identifying only the originating host computer (just as phone
networks only identify the phone number that placed a call, not the caller).
The host computer will need another layer of expensive, complex authorization
and accounting software in order to track packets to specific user accounts.
Imagine, for instance, trying to account for student e-mail usage at a large,
public computer cluster.
The higher the level of aggregation, the more practical and less costly
accounting becomes. For example, the NSFNET already collects some usage
information about each of the subnetworks that connect to its backbone
(although this data is based on a sample, not an exhaustive accounting for
every packet). Whether accounting at lower levels of aggregation is worthwhile
depends on cost-saving innovations in internetwork accounting methods.


Network Usage and Public Funding


Excess capacity (or overprovisioning) has been subsidized heavily--directly or
indirectly--through public funding. Providing network services at a zero usage
price probably made sense during the research, development, and deployment
phases of the Internet. However, as the network matures and becomes widely
used by commercial interests, it is harder to rationalize. Why should
data-network usage be free even to universities, when telephone and postal
usage are not? (Many university employees routinely use e-mail rather than the
phone to communicate with friends and family at other Internet-connected
sites. Likewise, a service is now offered that transmits faxes between cities
over the Internet for free, paying only the local phone-call charges to
deliver them to the intended fax machines.)
Indeed, Congress has required that the federally developed, gigabit-network
technology must accommodate usage accounting and pricing. Furthermore, because
the NSF will no longer provide backbone services, the general-purpose public
network will be left to commercial and state-agency providers. As the net
becomes increasingly privatized, competitive forces may necessitate the use of
more-efficient allocation mechanisms. So there are both public and private
pressures for serious consideration of pricing. The trick is to design a
pricing system that minimizes transactions costs.


Pros and Cons of Pricing


Standard economic theory suggests that prices should be matched to costs.
There are three main elements of network costs: connecting to the net,
providing additional network capacity, and congestion. Once capacity is in
place, direct-usage cost is negligible, and is almost surely not worth
charging for by itself, given the accounting and billing costs.
Charging for connections is conceptually straightforward: A connection
requires a line, a router, and some labor effort. The line and the router are
reversible investments and can reasonably be charged for on an annual lease
basis (though many organizations buy their own routers). This, essentially, is
the current scheme for Internet connection fees.
Charging for incremental capacity requires usage information. Ideally, we need
a measure of the organization's demand during the expected peak period of
usage over some period of time, to determine its share of the
incremental-capacity requirement. A reasonable approximation might be to
charge a premium price for usage during predetermined peak periods, as is
routinely done for electricity. However, casual evidence suggests that
peak-demand periods are much less predictable for the Internet than for other
utility services. One reason is that it is very easy to schedule some
activities for off-peak hours, leading to a shifting-peaks problem. (The
single, largest current use of network capacity is file transfer, much of
which is distribution of files from central to local archives. Just as some
fax machines allow faxes to be transmitted at off-peak times, large data files
could easily be transferred at off-peak times--if users had appropriate
incentives to adopt such practices.) In addition, so much traffic traverses
long distances around the globe that time-zone differences are important. 


Pricing Congestion



When the network is near capacity, a user's incremental packet imposes costs
on other users in the form of delay or dropped packets. Our scheme for
internalizing this cost is to impose a congestion price on usage that is
determined by a real-time auction, or "smart market."
The basic idea is simple. Much of the time the network is uncongested, and the
price for usage should be zero. When the network is congested, packets are
queued and delayed. The current queuing scheme is FIFO. We propose instead
that packets should be prioritized based on the value that the user puts on
getting the packet through quickly. Each user would assign his or her packets
a bid that measures willingness-to-pay for immediate servicing. At congested
routers, packets would be prioritized based on willingness-to-pay. The packets
with the highest bids would be admitted first. If the router can handle all
the packets arriving in a given time slice, then there is no congestion and no
reason to charge any packets for access. However, if the router reaches
capacity, only packets with bids higher than some cutoff value would be
admitted to the network. Each admitted packet is then charged a price of
admission equal to the cutoff value--which is guaranteed to be lower than any
admitted packet's bid. It can be shown that this pricing system provides the
right incentives to the users to reveal their true priorities.
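The market-clearing rule for a single router time slice can be sketched as follows; the capacity and bids are invented for illustration:

```python
# Smart-market sketch for one time slice: rank the bids, admit as
# many packets as there is capacity, and charge every admitted
# packet the cutoff (the highest rejected bid). Bids and capacity
# are invented for illustration.
def smart_market(bids, capacity):
    """Return (admitted bids, price charged per admitted packet)."""
    ranked = sorted(bids, reverse=True)
    if len(ranked) <= capacity:
        return ranked, 0.0          # no congestion: price is zero
    admitted = ranked[:capacity]
    cutoff = ranked[capacity]       # highest bid NOT admitted
    return admitted, cutoff

bids = [0.0, 0.5, 0.1, 3.0, 0.0, 1.2, 0.7]
admitted, price = smart_market(bids, capacity=3)
print(admitted, "each pay", price)
# [3.0, 1.2, 0.7] each pay 0.5
```

Because the price is set by a rejected bid, no user can lower his own charge by shading his bid, which is what gives everyone the incentive to bid true willingness-to-pay.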
This scheme has a number of nice features. In particular, not only do those
with the highest cost of delay get served first, but the prices also send the
right signals for capacity expansion in a competitive market for network
services. If all of the congestion revenues are reinvested in new capacity,
then capacity will be expanded to the point where its marginal value is equal
to its marginal cost.
Prices in a real-world smart market cannot be updated continuously. The
efficient price is determined by comparing a list of user bids to the
available capacity and determining the cutoff price. In fact, packets arrive
not all at once, but over time. It would be necessary to clear the market
periodically based on a time-slice of bids. The efficiency of this scheme,
then, depends on how costly it is to frequently clear the market and on how
persistent the periods of congestion are. If congestion is exceedingly
transient, the state of congestion may have changed by the time the market
price is updated.
Some network specialists have suggested that many customers--particularly
not-for-profit agencies and schools--will object because they will not know in
advance how much network utilization will cost them. We believe that this
argument is partially a red herring, since the user's bid always controls the
maximum network-usage costs. Indeed, since we expect a zero congestion price
for most traffic, it should be possible for most users to avoid ever paying a
usage charge by simply setting all packet bids to zero. (Since most users are
willing to tolerate some delay for e-mail, file transfer, and so forth, most
traffic should be able to go through with acceptable delays at a zero
congestion price. Time-critical traffic will typically pay a price.) When the
network is congested enough to have a positive congestion price, these users
will pay the cost in units of delay rather than cash, as they do today.
We also expect that in a competitive market for network services, fluctuating
congestion prices would usually be a wholesale phenomenon, and that
intermediaries would repackage the services and offer them at a guaranteed
price to end users. Essentially, this would create a futures market for
network services.
Problems with auctions must also be solved. Our proposal specifies a single
network entry point with auctioned access. In practice, networks have multiple
gateways, each subject to differing states of congestion. Should a smart
market be located in a single, central hub, with current prices continuously
transmitted to the many gateways? Or should a set of simultaneous auctions
operate at each gateway? How much coordination should there be between the
separate auctions? These problems need not only theoretical models, but also
empirical work to determine the optimal rate of market-clearing and
interauction information sharing, given the costs and delays of real-time
communication.
Another serious problem for almost any usage-pricing scheme is how to
correctly determine whether the sender or receiver should be billed. With
telephone calls, it is clear that, in most cases, the caller should pay.
However, in a packet network, both sides originate their own packets, and in a
connectionless network there is no mechanism for identifying which of party
B's packets were solicited as responses to a session initiated by party A.
Consider a simple example: A major use of the Internet is file retrieval from
public archives. If the originator of each packet were charged for that
packet's congestion cost, then the providers of free public goods (the file
archives) would pay nearly all of the congestion charges induced by a user's
file request. (Public file servers in Chile and New Zealand already face this
problem: Any packets they send in response to requests from foreign hosts are
charged by the network. Network administrators in New Zealand are concerned
that this blind charging scheme is stifling the production of
public-information goods. For now, those public archives that do exist have a
sign-on notice pleading with international users to be considerate of the
costs they are imposing on the archive providers.) Either the public-archive
provider would need a billing mechanism to charge requesters for the (ex post)
congestion charges, or the network would need to be engineered to bill the
correct party. In principle, this problem can be solved by schemes like 800 or
900 numbers and collect phone calls, but the added complexity in a packetized
network may be too costly.
Consider the average cost of the current NSFNET: about $1,000,000 per month,
for about 42,000x10^6 packets per month. This implies a cost per packet
(around 200 bytes) of about 1/420 cents. If there are 20 million users of the NSFNET
backbone (10 per host computer), then full cost recovery of the NSFNET subsidy
would imply an average monthly bill of about $0.05 per person. If we accept
the estimate that the total cost of the U.S. portion of the Internet is about
ten times the NSFNET subsidy, we come up with 50 cents per person per month
for full cost recovery. The revenue from congestion fees would presumably be
significantly less than this amount. (If revenue from congestion fees exceeded
the cost of the network, it would be profitable to expand the size of the
network.)
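The back-of-envelope arithmetic above is easy to verify directly (a sketch; the figures are the article's own estimates):

```python
# Reproducing the back-of-envelope NSFNET figures from the text.
monthly_cost = 1_000_000          # dollars: the NSFNET subsidy, ~$10^6/month
packets = 42_000 * 10**6          # packets per month
users = 20_000_000                # ~10 users per host computer

cents_per_packet = monthly_cost * 100 / packets
per_user_backbone = monthly_cost / users       # backbone subsidy only
per_user_internet = 10 * monthly_cost / users  # U.S. Internet ~10x backbone

print(round(1 / cents_per_packet))   # 420, i.e. 1/420 cents per packet
print(per_user_backbone)             # 0.05 dollars per person per month
print(per_user_internet)             # 0.5 dollars per person per month
```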
The average cost of the Internet is so small today because the technology is
so efficient: The packet-switching technology allows for very cost-effective
use of existing lines and switches. A video e-mail message could easily use
10^4 times more bits than a plaintext ASCII e-mail with the "same" information
content, and providing this amount of incremental bandwidth could be quite
expensive. Well-designed congestion prices would not charge everyone the
average cost of this incremental bandwidth, but instead charge those users
whose demands create the congestion and need for additional capacity. 


Pricing Information


Our focus thus far has been on the technology, costs, and pricing of network
transport. However, most of the network value lies not in the transport
itself, but in the value of the information being transported. For the full
potential of the Internet to be realized, it will be necessary to develop
methods to charge for the value of the information services available on the
network.
Vast troves of free, high-quality information (and probably equally large
troves of dreck) are currently available on the Internet. Historically, there
has been a strong base of volunteerism to collect and maintain data, software,
and other information archives. However, as usage explodes, volunteer
providers are learning that they need revenues to cover their costs. And of
course, careful researchers may be skeptical about the quality of any
information provided for free.
Charging for information resources is quite a difficult problem. A service
like CompuServe charges customers by establishing a billing account. This
requires that users obtain a password, and that the information provider
implement a sophisticated accounting-and-billing infrastructure. However, one
of the advantages of the Internet is that it is so decentralized: Information
sources are located on thousands of different computers. It would simply be
too costly for every information provider to set up an independent billing
system and give out separate passwords to each of its registered users. Users
could end up with dozens of different authentication mechanisms for different
services.
A deeper problem for pricing information services is that traditional pricing
schemes are not appropriate. Most pricing is based on the measurement of
replications: We pay for each copy of a book, each piece of furniture, and so
forth. This usually works because the high cost of replication generally
prevents us from avoiding payment. If you buy a table we like, we generally
have to go to the manufacturer to buy one for ourselves; we can't simply
copy yours. With information goods, the pricing-by-replication scheme breaks
down. This has been a major problem for the software industry: Once the sunk
costs of software development are invested, replication costs are essentially
zero. The same is especially true for any form of information transmitted over
the network. Imagine, for example, that copy shops begin to make course packs
available electronically. What is to stop a young entrepreneur from buying one
electronic copy and selling it at a lower price to everyone else in the class?
This is an even greater problem than the one publishers face from
unauthorized photocopying, since the cost of electronic replication is
essentially zero.
A small body of literature on the economics of copying examines some of these
issues. However, the same network connections that exacerbate the problems of
pricing information goods may also help to solve some of these problems. For
example, Brad Cox describes the idea of superdistribution of information
objects, in which accessing a piece of information automatically sends a
payment to the provider via the network. (See "Superdistribution and
Electronic Objects," DDJ, October 1992.) 


Electronic Commerce and the Internet


Some companies have already begun to advertise and sell products and services
over the Internet. Home shopping is expected to be a major application for
future integrated-services networks that transport sound and video. Electronic
commerce could substantially increase productivity by reducing the time and
other transaction costs inherent in commerce, much as mail-order shopping has
already begun to do. One important requirement for a complete
electronic-commerce economy is an acceptable form of electronic payment. (In
our work on pricing for network transport, we have found that some form of
secure electronic currency is necessary for the transaction costs of
accounting and billing to be low enough to justify usage pricing.)
Bank debit cards and automatic-teller cards work because they have reliable
authentication procedures based on both a physical device and knowledge of a
private code. Digital currency over the network is more difficult because it
is not possible to install physical devices and protect them from tampering on
every workstation. (Traditional credit cards are unlikely to receive wide use
over a data network, though there is some use currently. It is very easy to
set up an untraceable computer account to fraudulently collect credit-card
numbers; fraudulent telephone mail-order operations are more difficult to
arrange.) Therefore, authentication and authorization will most likely be
based solely on the use of private codes. Another objective is anonymity, so
individual buying histories cannot be collected and sold to marketing agencies
(or Senate confirmation committees).
A number of recent computer-science papers have proposed protocols for digital
cash, checks, and credit. Each of these has some desirable features, yet none
has been widely implemented thus far. The seminal paper "Security Without
Identification: Transaction Systems to Make Big Brother Obsolete," by D. Chaum
(Communications of the ACM 28[10], 1985) proposed an anonymous form of digital
cash that requires a single, central bank to electronically verify the
authenticity of each "coin." In their paper, "Netcash: A Design for Practical
Electronic Currency on the Internet" (Proceedings of the First ACM Conference
on Computer and Communications Security, ACM Press, 1993), Medvinsky and
Neuman propose a form of digital check that is not completely anonymous, but
is much more workable for widespread commerce with multiple banks. Similarly,
Low, Maxemchuk, and Paul suggest a protocol for anonymous credit cards in
their paper "Anonymous Credit Cards" (AT&T Bell Laboratories Technical Report,
1994, available at ftp://research.att.com/dist/anoncc/anoncc.ps.Z).
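Chaum's anonymous cash rests on blind signatures, which can be illustrated with textbook-sized RSA numbers (a deliberately insecure sketch; the tiny key and the variable names are ours, not Chaum's):

```python
# Toy RSA blind signature in the spirit of Chaum's digital cash.
# Textbook-sized numbers -- utterly insecure, for illustration only.
n, e, d = 3233, 17, 2753    # bank's RSA modulus, public and private exponents

coin = 1234                 # serial number of Alice's "coin"
r = 7                       # Alice's secret blinding factor, gcd(r, n) == 1

# 1. Alice blinds the coin so the bank can't see its serial number.
blinded = (coin * pow(r, e, n)) % n

# 2. The bank signs the blinded value (and debits Alice's account).
signed_blinded = pow(blinded, d, n)

# 3. Alice unblinds. The signature is now valid on the *original* coin,
#    yet the bank never saw which coin it signed -- that is the anonymity.
signature = (signed_blinded * pow(r, -1, n)) % n

# 4. Any merchant can verify the bank's signature with the public key.
assert pow(signature, e, n) == coin
```

The unblinding works because (coin * r^e)^d = coin^d * r (mod n), so multiplying by r's inverse leaves exactly the bank's signature on the coin itself.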


Regulatory Issues


The growth of data networks like the Internet is an increasingly important
motivation for regulatory reform of telecommunications. A primary principle of
the current regulatory structure, for example, is that local phone service is
a natural monopoly, and thus must be regulated. However, local phone companies
face ever-increasing competition from data-network services. For example, the
fastest-growing component of telephone demand has been for fax transmission,
but fax technology is better suited to packet-switching networks than to voice
networks, and faxes are increasingly transmitted over the Internet. As
integrated-services networks emerge, they will provide an alternative for
voice calls and video conferencing, as well. This "bypass" is already
occurring in the advanced, private networks that many corporations, such as
General Electric, are building.
As a result, the trend seems to be toward removing the barriers against
cross-ownership of local phone and cable-TV companies. The RBOCs have filed a
motion to remove the remaining restrictions of the Modified Final Judgment
that created them (with the 1984 breakup of AT&T). The White House, Congress,
and the FCC are all developing new models of regulation, with a strong bias
towards deregulation.
Internet transport itself is currently unregulated. This is consistent with
the principle that common carriers are natural monopolies, and must be
regulated, but services provided over those common carriers need not be.
However, this principle has never been consistently applied to phone
companies: Services provided over phone lines are regulated. Many
public-interest groups are now arguing for similar regulatory requirements for
the Internet.
One issue is universal access--the assurance of basic service for all citizens
at a very low price. But what is "basic service"? Is it merely a data line, or
a multimedia, integrated-services connection? And in an increasingly
competitive market for communications services, where should the money to
subsidize universal access be raised? High-value uses which traditionally
could be charged premium prices by monopoly providers are increasingly subject
to competition and bypass.
A related question is whether the government should provide some data-network
services as public goods. Some initiatives are already underway. For instance,
the Clinton administration has required that all published government
documents be available in electronic form. Another current debate concerns the
appropriate access subsidy for primary and secondary teachers and students.


The Market Structure of the Information Highway


If different components of local phone and cable-TV networks are deregulated,
what degree of competition is likely? Similar questions arise for data
networks. For example, a number of observers believe that by ceding backbone
transport to commercial providers, the federal government has endorsed
above-cost pricing by a small oligopoly of providers. Looking ahead,
equilibrium market structures may be quite different for the emerging
integrated-services networks than they are for the current specialized
networks.
One interesting question is the interaction between pricing schemes and market
structure. If competing backbones continue to offer only connection pricing,
would an entrepreneur be able to skim off high-value users by charging usage
prices, but offering more-efficient congestion control? Alternatively, would a
flat-rate-connection price provider be able to undercut usage-price providers,
by capturing a large share of low-value base-load customers, who prefer to pay
for congestion with delay rather than cash? The interaction between pricing
and market structure may have important policy implications, because certain
types of pricing may rely on compatibilities between competing networks that
will enable efficient accounting and billing. Thus, compatibility regulation
may be needed, similar to the interconnect rules imposed on RBOCs.


References


Bohn, R., H.W. Braun, K. Claffy, and S. Wolff. "Mitigating the Coming Internet
Crunch: Multiple Service Levels via Precedence." Technical Report, San Diego
Supercomputer Center and NSF, 1993.
Braun, H.W., and K. Claffy. "Network Analysis in Support of Internet Policy
Requirements." Technical Report, San Diego Supercomputer Center, 1993.
Chaum, D. "Security Without Identification: Transaction Systems to Make Big
Brother Obsolete." Communications of the ACM, 28(10), 1985.
Cocchi, R., D. Estrin, S. Shenker, and L. Zhang. "A Study of Priority Pricing
in Multiple Service Class Networks." Proceedings of Sigcomm '91 (available
from ftp://ftp.parc.xerox.com/pub/net-research/pricing-sc.ps).
------. "Pricing in Computer Networks: Motivation, Formulation, and Example."
Technical Report, University of Southern California, 1992.
de Prycker, M. Asynchronous Transfer Mode: Solution for ISDN. New York: Ellis
Horwood, 1991. 
Goffe, W. "Internet Resources for Economists." Technical Report, University of
Southern Mississippi. Journal of Economic Perspectives Symposium. Fall, 1994
(available at gopher://niord.shsu.edu).
Huberman, B. The Ecology of Computation. New York: North-Holland, 1988.
Low, S., N.F. Maxemchuk, and S. Paul. "Anonymous Credit Cards." Technical
Report, AT&T Bell Laboratories, Murray Hill, NJ, 1994 (available at
ftp://research.att.com/dist/anoncc/anoncc.ps.Z).
MacKie-Mason, J.K., and H. Varian. "Some Economics of the Internet." Technical
Report, University of Michigan, 1993.
------. "Pricing the Internet." in Brian Kahin and James Keller, Public Access
to the Internet. Englewood Cliffs, NJ: Prentice Hall, 1994. 
Markoff, J. "Traffic Jams Already on the Information Highway." New York Times
(November 3, 1993).
Medvinsky, G., and B.C. Neuman. "Netcash: A Design for Practical Electronic
Currency on the Internet." Proceedings of the First ACM Conference on Computer
and Communications Security. New York: ACM Press, 1993 (available at
ftp://gopher.econ.lsa.umich.edu/pub/Archive/netcash.ps.Z).
Partridge, C. Gigabit Networking. Reading, MA: Addison-Wesley, 1993.
Shenker, S. "Service Models and Pricing Policies for an Integrated Services
Internet." Technical Report, Palo Alto Research Center, Xerox Corp., 1993.
Tanenbaum, A.S. Computer Networks. Englewood Cliffs, NJ: Prentice Hall, 1989.























































Special Issue, 1994
E-Mail Security


Maintaining privacy in a world of public data transfer




Bruce Schneier


Bruce is president of Counterpane Systems, a cryptography and data-security
consulting company. He is the author of Applied Cryptography and E-Mail
Privacy, both published by John Wiley & Sons. Bruce is also a contributing
editor to Dr. Dobb's Journal and can be reached at schneier@chinet.com.


The world of electronic mail is the world of postcards. Messages travel from
machine to machine in the open, just like the messages on the backs of
postcards. These messages can easily be read, altered, forged, or
deleted--without anyone's knowledge. 
Cryptography provides an easy and effective solution to these problems, even
though few people take advantage of it. 
Imagine that Alice sends a message to Bob over the Internet. The message flows
through the system, going from one machine to another. When a machine gets the
message, the machine reads the header, figures out if the message is for
anyone who has an account on that machine, and then sends it off to another
machine if it isn't. The other machine does the same, and so on. Eventually
the message reaches the correct machine and is placed in Bob's electronic-mail
file. The next time Bob logs in, he reads the message.
In reality, electronic mail doesn't bounce randomly from one machine to
another, hoping to find its destination. The different computers on the
Internet have routing tables. If a computer receives a piece of electronic
mail that is for someone on another computer, it knows enough to look up that
computer on the routing table and to send it in the general direction of that
computer. Even so, look at the routing information next time you receive a
piece of e-mail; it probably passed through quite a few intermediaries between
its source and its destination.
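Those intermediaries are visible in the message itself: each machine that handles a piece of mail prepends a "Received:" header. A few lines of Python pull them out (the hostnames below are invented for the example; real Received: lines carry more detail):

```python
# List the machines a message passed through, most recent hop first,
# by reading its "Received:" headers with the standard email module.
from email import message_from_string

raw = """\
Received: from relay.example.net by mailhost.example.edu; 1 Jan 1994
Received: from alice-pc.example.com by relay.example.net; 1 Jan 1994
From: alice@example.com
To: bob@example.edu
Subject: hello

Just like a postcard.
"""

msg = message_from_string(raw)
for hop in msg.get_all("Received"):
    print(hop.split(";")[0])  # each line names one machine that saw the mail
```

Every machine listed (and any others on the path that didn't add a header) had the whole message in the clear.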
Any of these intermediaries can easily read Alice's mail. Imagine that Eve, an
eavesdropper, is sitting at one of these intermediary machines on the
Internet. She can be the system administrator, a clever hacker, or, if the
security on the machine is poor enough, a regular user. In any case, the world
is an open book to her. She can sit at her terminal and see every
electronic-mail message that passes through the machine, no matter who it is
addressed to. She can print a message out and show it to her friends. She can
send it to the New York Times. Alice and Bob have no control over who reads
their mail in transit. It doesn't matter if their mail is marked
"confidential," if their computers are in locked rooms, or if they both have
been subjected to rigorous psychological screening and have been selected for
their discretion. By sending a message over the Internet to Bob, Alice is
trusting the security of every machine the message will pass through--without
even knowing which machines they will be. The only real security they have is
the honesty of those machines.


Envelopes for E-Mail


If e-mail messages are like postcards, what we want are letters in envelopes.
Like electronic-mail messages, letters are routed through a network. Alice
drops a letter in a mailbox and postal workers send it via a variety of post
offices and transport vehicles to Bob's mailbox. A dozen different people
might handle a letter as it travels through the system, but none of them can
read the letter. The envelope protects it.
You can mirror this process with cryptography, using strong encryption as an
"envelope." By encrypting her electronic mail so that only Bob can read it,
Alice ensures that Eve cannot--even if Eve intercepts it in transit. The
addition of digital signatures to the electronic mail lets Bob verify that
Alice sent the message and that it was not altered in transit.
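The envelope-plus-signature pattern can be sketched with a toy shared-secret scheme (real systems such as PGP use public-key algorithms; this stdlib-only sketch, with function names of our own invention, illustrates only the idea that encryption hides content and an authentication tag detects alteration):

```python
# Toy "envelope": encrypt so only the key holder can read the message,
# then tag it so any alteration in transit is detected. Illustrative
# shared-secret construction, NOT a secure design.
import hashlib, hmac

def keystream(key, length):
    """Derive a pseudorandom byte stream from the key (toy construction)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def seal(key, plaintext):
    ciphertext = bytes(a ^ b for a, b in
                       zip(plaintext, keystream(key, len(plaintext))))
    tag = hmac.new(key, ciphertext, hashlib.sha256).digest()  # "signature"
    return ciphertext, tag

def open_sealed(key, ciphertext, tag):
    expected = hmac.new(key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message was altered in transit")
    return bytes(a ^ b for a, b in
                 zip(ciphertext, keystream(key, len(ciphertext))))

key = b"alice-and-bob-shared-secret"
ct, tag = seal(key, b"Meet me at noon.")
assert open_sealed(key, ct, tag) == b"Meet me at noon."  # Bob reads it;
# Eve, holding only ct and tag, sees gibberish and cannot alter it unnoticed.
```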


Who Wants to Read Your Mail?


Anyone who wants to can read your e-mail, remote login sessions, ftp
downloads, real-time conversations, and anything else you do on the net. But
who would want to? The answer depends on who you are, what you are doing, and
who may be interested in it.
The military-intelligence organizations of major governments are the most
sophisticated of potential eavesdroppers. Reading people's mail is their
business. Since the beginning of the Cold War, intelligence organizations have
spent fortunes collecting, compiling, and analyzing intelligence data on each
other. Just because the Cold War is over, don't think that these organizations
are not still at it. Computer transmissions are just a part of that overall
collection effort, but it is an important part.
This is not to say that if you are not involved with the government, you are
safe from military-intelligence collection efforts. The lines between military
and corporate espionage are fuzzy; many commercial technologies have military
applications. Several countries routinely target foreign companies for spying,
and think nothing of passing the information they collect on to companies in
their own country. France and Japan are the most well-known offenders, but
there are undoubtedly others. The NSA has been accused of, in at least one
instance, intercepting a telephone call between two European countries and
passing on marketing information to a U.S. competitor. As the post-Cold-War
world continues to evolve, large military-intelligence communities need new
reasons to justify their existence. So industrial spying by
military-intelligence organizations is likely to increase.
Governments are often interested in spying on their own citizens, as well.
This is certainly true in totalitarian regimes such as China, North Korea, and
Cuba, but it exists in other countries, too. The government of France
prohibits encryption on civilian communications circuits unless a copy of the
encryption key and algorithm is given to the authorities. Both the governments
of Taiwan and South Korea have been known to request that companies remove
encryption from voice, data, and facsimile telephone connections. Even the
United States has a long, sordid history of conducting illegal wiretaps. Any
organization that would, without a court order, tap the telephones of Martin
Luther King, Jr. could easily justify reading its citizens' electronic-mail
messages. And since electronic mail doesn't yet have the same Constitutional
protection as paper mail, it's easier.
Several U.S. government organizations might be interested in reading private
e-mail. The FBI might be looking for criminals, people starting fringe
political parties, people who don't floss regularly, or other unsavory
characters. Pornographers are particularly popular targets. The DEA might be
looking for drug dealers. (The "war on drugs" has at times been used as an
excuse for questionable law-enforcement ideas.) The IRS might be looking for
tax cheats. There's also the Treasury Department, the BATF, the CIA: If you're
doing something even remotely mysterious, somewhere in the bowels of
Washington there is a government acronym that wants to know about it.
Businesses might use espionage against rival companies. They could be
interested in customer lists, employee directories, marketing plans, financial
data--almost anything. Coca-Cola might pay dearly to know Pepsi's new
advertising plan; Ford might be similarly interested in the designs for next
year's Chevrolet models. Stockbrokers might be very interested in data about a
company that may eventually affect its stock price. A salesman might be very
interested in the customer database of a rival salesman, perhaps even a rival
salesman in the same company.
Investigative reporters might be interested in private e-mail conversations
between public individuals: politicians, corporate leaders, entertainers, and
other public citizens. Remember when Washington, D.C.'s City Paper
collected and published data on Supreme Court nominee Robert Bork's
videotape-rental records? Or when Prince Charles's telephone conversations
made it into the British tabloids? What about when reporters broke into Tonya
Harding's electronic mailbox? Although there has not yet been any public
instance of reporters actually going so far as to publish someone's
electronic-mail messages, it is bound to happen. How would a Senate candidate
feel if his college-era postings to alt.beer.belch were published?
Criminals can get valuable data from electronic mail, as well. Police have
long known that people monitor cellular phone channels, listening for
credit-card numbers. There's no reason why they can't look for the same thing
amongst the electronic-mail messages moving back and forth across the
networks. Some companies are already opening up shop on the Internet, offering
various consumer goods for sale by credit card. It would be easy to set up an
automatic program to scan the mail feed for credit-card numbers. If commerce
on the Internet ever takes off, this practice will become widespread.
And finally, colleagues, friends, and family are possible spies. These are not
sophisticated spies, but they may very well be the most interested. Some
companies explicitly reserve the right to read all electronic mail sent by
their employees, whether work-related or personal. A worker in an office might
be very interested in the personal electronic-mail correspondence of a
coworker, for no other reason than nosiness. A family member might be carrying
on an illicit love affair. E-mail messages have already shown up in divorce
court.


The Collection Problem


The biggest problem in reading someone's e-mail is finding it amongst the sea
of other electronic-mail messages. It's a small needle inside an enormous
haystack.
One of the National Security Agency's jobs, for instance, is to monitor
computer data flowing into and out of the United States, as well as data
flowing between other countries. This is a task of Herculean proportions. At
least a gigabyte of computer data flows in and out of our borders every day.
This includes e-mail, Internet newsgroups, remote logins, ftp downloads,
real-time "chat" conversations, and everything else. Storing the data on
computer tape is a massive problem, let alone reading and analyzing it.
The NSA uses computers to sift through the data in real time, looking for
interesting information. Maybe the computers look for certain key words. An
electronic-mail message containing the words "nuclear," "cryptography," or
"presidential assassination" might be stored on tape for further analysis.
They might look for data from particular people, or from particular
organizations. They might look for data with a particular structure. They
might have artificial-intelligence software that does things I can't even
comprehend. The NSA has a lot of money to throw at this problem, and they've
been working on it for a long time.
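Such a keyword sieve is conceptually simple (a minimal sketch; the word list and messages are invented):

```python
# Minimal sketch of a real-time keyword sieve: scan a stream of messages
# once and keep only those containing words on a watch list.
KEYWORDS = {"nuclear", "cryptography", "assassination"}

def interesting(message):
    words = set(message.lower().split())
    return bool(words & KEYWORDS)   # any overlap with the watch list?

stream = [
    "lunch on tuesday?",
    "my review of that nuclear assassination thriller",
    "notes on export rules for cryptography software",
]
kept = [m for m in stream if interesting(m)]
# Both keyword-bearing messages are flagged -- including the innocuous
# movie review, which is exactly the false-positive problem that human
# analysts must then resolve.
```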
The important point is that they have to do this in real time. There is just
too much data to save. The best they can hope for is to collect only as much
data in a day as can be analyzed in a day. They can't collect any more, because
more is coming the next day. Data collection is like a never-ending treadmill;
if they fall behind, they will never catch up.
Collection is useless without analysis, and that is a far more complicated
process. Unless the NSA has more advanced computing resources than I can
imagine, they need people for this part. People have to read those
"interesting" electronic-mail messages to determine if they are really
interesting. Maybe the message with the words "nuclear" and "assassination"
was really about a science-fiction movie. For a while it was popular to add a
string of interesting words at the end of all messages, just to frustrate
these collection efforts. Maybe the mention of blowing up the UN was
frivolous, and maybe it was a message from one terrorist to another,
discussing their plans. Maybe that message from an American high-tech company
was innocuous, and maybe it was a foreign spy passing information back home.


Encryption as a Defense


Encryption makes the NSA's job difficult on several fronts. The most obvious
is that they cannot read the various e-mail messages. This is only true, of
course, if the encryption method is secure enough that the NSA can't break it;
if it isn't, reading a message is just a matter of allocating the necessary
resources.
However, this is only really true if encryption is not widespread. Remember
the collection problem. There is an enormous amount of data flowing through
the Internet every day; far too much to examine it all. The NSA's
interesting-stuff checkers could easily collect encrypted messages and then
route them to another computer program for further analysis, but this is only
feasible if a small percentage of messages are encrypted. If 80 percent of all
e-mail traffic, ftp downloads, remote login sessions, and so on are routinely
encrypted, the NSA's computers will not have the time to break them--even if
they are all breakable.

And even worse, the interesting-stuff checkers have a much harder time
deciding which messages to ignore and which are worth breaking. If only a few
messages a day are encrypted, then those are obviously interesting messages.
If everyone routinely encrypts their messages--even people chatting with their
friends--the NSA can't tell the interesting encrypted messages from the
innocuous encrypted messages. There's too much data flowing through the
network, and not enough computing power to bring to bear on the problem.
Encryption, even poor encryption, can quickly make the collection problem
intractable.


Traffic Analysis


The NSA isn't out of work yet. Even if it can't read your e-mail, it can
collect some pretty impressive data on you through traffic analysis.
Traffic analysis is the analysis of who you send electronic mail to, who you
receive electronic mail from, how long those electronic-mail messages are, and
when they are sent. There's a lot of good information buried in that data if
you know where to look.
Most European countries don't have itemized telephone bills. European
telephone bills list how many "message units" were used from a particular
phone, but not where and when these message units were used. American
telephone bills list every long distance call made from the telephone number:
date and time, number called, and duration of the call. The American system
makes it easier to spot errors and to catch your children making hundreds of
calls to 1-900-HI-SANTA, but it also allows the telephone company to learn
information about your calling patterns. Do you make a lot of long-distance
calls to Montana? Then maybe you are interested in these Montana vacation
packages? Do you order from catalogues frequently? Then you should be on this
mailing list. Do you call the Suicide Prevention Hotline regularly? Then maybe
a prospective employer should hire someone else.
During World War II, the Germans used detailed calling records to round up the
friends of suspected enemies of the state. Many European countries believe it
is worth the loss of a detailed telephone bill to prevent that from ever
happening again.
E-mail messages can yield the same information. Even if the message is
encrypted, the header clearly states who the message is from, who it is to,
when it was sent, and how long it is. There are anonymous remailing services
on the Internet that purport to hide who a message is from, but while services
such as these may prevent the average Internet user from knowing where a
particular piece of electronic mail came from, they are unlikely to fool a
sophisticated eavesdropper such as the NSA.
Imagine that Eve is interested in Alice, a suspected terrorist. Alice encrypts
all her electronic mail, so Eve can't read the contents of her messages.
However, Eve collects all the information she can on Alice's traffic patterns.
Eve knows the e-mail addresses of everyone Alice regularly corresponds with.
Alice often sends long messages to someone called Bob, who always immediately
responds with a very short message. Perhaps she is sending him orders, and he
is confirming receipt of those orders. One day there is a big jump in the
volume of electronic mail traffic between Alice and her correspondents.
Perhaps they are planning something. Then, there is silence. No mail flows
between Alice and her correspondents. The next day a government building is
bombed. Is this enough evidence to arrest the whole bunch of them? Perhaps not
in the United States, but certainly in countries with weaker concepts of
personal freedom.
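Eve's analysis needs nothing more than header data (a sketch; the addresses and message sizes are invented):

```python
# Traffic analysis on encrypted mail: Eve never reads a byte of content,
# only the (sender, recipient, size) data visible in every header,
# yet correspondence patterns emerge.
from collections import defaultdict

log = [  # (sender, recipient, bytes) -- visible even when the body is encrypted
    ("alice@x.org", "bob@y.org",   9000),
    ("bob@y.org",   "alice@x.org",  120),
    ("alice@x.org", "bob@y.org",   8500),
    ("bob@y.org",   "alice@x.org",   95),
    ("alice@x.org", "carol@z.org",  300),
]

volume = defaultdict(int)
count = defaultdict(int)
for sender, recipient, size in log:
    volume[(sender, recipient)] += size
    count[(sender, recipient)] += 1

for pair in sorted(volume):
    avg = volume[pair] // count[pair]
    print(pair, count[pair], "msgs,", avg, "bytes avg")
# Alice sends long messages to Bob; Bob answers with short ones --
# the "orders and acknowledgments" pattern Eve noticed above.
```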
Terrorists are not the only ones who fear traffic analysis. Would the FBI
start investigating people for drug use simply because they corresponded over
electronic mail with a convicted--or even just a suspected--drug dealer? Would
a company, after receiving information that an employee is regularly
corresponding with an electronic-mail address in a competitor's offices, have
grounds to fire that employee? What would a jealous person think after
learning that his or her spouse was corresponding regularly with a potential
rival? Traffic analysis is an important intelligence tool, and its
implications for personal privacy are significant.


Spoofing 


Spoofing is one person impersonating another. Whether the impersonation is
intended as a joke, a means to discredit or disgrace, or a means to defraud,
it is a problem.
Every month or so a particular type of message is posted to a variety of
newsgroups on the Internet. The message might have a header that reads "I am a
child molester and I'm proud of it," or maybe a racist slogan; the text isn't
any better. Then, anywhere from ten minutes to a day later, there is another
message from the same person apologizing for the first message. "It was a
forgery," the second posting insists, "don't believe any of it."
This may be true, but damage is already done. People see the first message
and, not knowing that it is a forgery, believe the purported sender to be
whatever the message claims him to be. They reply angrily. They write a
scathing letter to his system administrator demanding he be removed from the
network. They report him to the police, or to some political-action group. If
they know him, they may avoid his presence. They may even further damage his
reputation by spreading the story to even more people. (This is particularly
damaging, because those other people are even less likely to see the
retraction.)
Maybe Eve wants to smear Alice. She writes an incriminating e-mail message,
puts Alice's name on the bottom, forges Alice's header on top (it's not hard
for a skilled hacker to fake a message header), and sends it to a public
forum. Then she sends a copy to the print media.
There are other, less overt, ways to do damage by impersonating someone else.
Imagine that Alice and Bob are collaborating via electronic mail on some
project. Eve, purporting to be Bob, sends a message to Alice. In it, "Bob"
claims that he has moved, and that this is his new electronic-mail address.
Alice doesn't know better and changes her address directory. Now Eve can
correspond with Alice, pretending to be Bob. If Eve is really clever, she can
simultaneously convince Bob that she is Alice. Then, Eve can have
conversations with both of them, passing messages through most of the time and
only changing them on occasion. Eve can thwart whatever project Alice and Bob
are working on through judicious use of misinformation. If Alice and Bob don't
communicate face to face or over the telephone regularly, Eve can keep this
ruse up for a long time.
Eve also could get an account in the name of a known, but not too well-known,
reporter. She could promise Alice publicity in exchange for some information.
Alice trusts the name on the "From" line of the mail header, and is tricked
into revealing whatever information Eve wants.
Spoofing can be prevented with something called a "digital signature." Just as
a written signature provides proof of authorship of (or, at least, agreement
with) a physical document, a digital signature provides the same for an
electronic document. With this sort of digital authentication, Alice can
always check to make sure a document is actually from the person it is
purported to be from. No one can send an incriminating posting purporting to
be from Alice. Alice can always check who really sent a piece of electronic
mail she received. And no one can pretend to be Alice to someone else and hope
to get away with it.


Making Security Work


There are several things we can do to keep our digital connections secure. The
simplest and easiest is to regularly use encryption and digital signatures.
This means all the time. Encrypt and sign all of your correspondence, even
when the content doesn't warrant secrecy. To do otherwise only invites
trouble.
Currently, almost no one encrypts their e-mail because doing so is a nuisance.
Unfortunately, the side effect is that anyone who does immediately arouses
suspicion. If everyone regularly uses encryption, then encryption is not
suspicious. No one in the post office stares at a sealed envelope, wondering
what is so private that it can't be written on a post card. If encrypted
electronic mail were the rule, no one would assume an encrypted message has
something to hide.
Likewise, digital signatures should be the rule. Almost no one signs their
digital correspondence, so a digitally signed message is sure to arouse
suspicion: What is so important about this piece of electronic mail that it
has to be signed? If everyone routinely signs their messages, it won't even
warrant a comment.
The way to make security ubiquitous is to make it transparent. If it is just
as easy to send a signed and encrypted piece of electronic mail as it is to
send an unsigned and unencrypted piece, then people are more likely to choose
the former. Signed and encrypted correspondence would become commonplace,
yet another example of the triumph of personal privacy over Big Brother
government.


RSA for Encryption and Digital Signatures


Public-key cryptography is based on the idea of a key pair. One key remains
private, and the other one is public. With this tool, you can both encrypt
files and create digital signatures.
If Alice wants to send a message to Bob, she encrypts it with Bob's public
key. The key is public, so she can get it off the net somewhere. Bob, after he
receives the encrypted message, decrypts it with his private key. The key is
private, so only he can decrypt it.
If Alice wants to sign a message, she encrypts it with her private key. Bob,
or anyone else, can verify the signature by decrypting the message with
Alice's public key. The key is public, so anyone can verify the signature. But
Alice's private key is private, so only she can sign messages.
The reason this kind of cryptography is so useful for e-mail security is that
Alice and Bob do not have to meet somewhere secret and exchange keys. If they
were using a conventional algorithm, they would have to agree on a secret key
before they could communicate securely. With public-key cryptography, all
communication can be out in the open over an insecure channel.
RSA is the most common public-key algorithm. Its security is based on the
difficulty of factoring large numbers. To generate a public-key/private-key
pair, first choose two large (500 bits or more) prime numbers, p and q. Then
n = p*q. Choose e such that it has no factors in common with (p-1)*(q-1).
Then compute d such that e*d = 1 mod ((p-1)*(q-1)). The public key is e and
n; the private key is d and n. Destroy p and q, and do not reveal them to
anyone.
To encrypt a message m, compute c = m^e mod n. To decrypt, compute
m = c^d mod n. To sign a message m, compute c = m^d mod n. To verify the
message, compute m = c^e mod n.
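To make the arithmetic concrete, here is a toy sketch in C using the textbook
example primes p = 61 and q = 53 (so n = 3233, e = 17, d = 2753, since
17*2753 mod 3120 = 1). The function names are mine, and this is illustration
only: real RSA uses primes of hundreds of bits, a big-number library, and
proper message padding.

```c
/* Toy RSA arithmetic with tiny primes: p = 61, q = 53, so n = 3233 and
   (p-1)*(q-1) = 3120, with e = 17 and d = 2753. Illustration only --
   real keys are hundreds of bits long and need a bignum library. */

/* square-and-multiply modular exponentiation: returns b^ex mod m */
unsigned long long modpow(unsigned long long b, unsigned long long ex,
                          unsigned long long m) {
    unsigned long long r = 1;
    b %= m;
    while (ex > 0) {
        if (ex & 1)
            r = (r * b) % m;   /* fold in this bit of the exponent */
        b = (b * b) % m;
        ex >>= 1;
    }
    return r;
}

/* encrypt: c = m^e mod n (public key); decrypt: m = c^d mod n (private) */
unsigned long long rsa_encrypt(unsigned long long m) { return modpow(m, 17, 3233); }
unsigned long long rsa_decrypt(unsigned long long c) { return modpow(c, 2753, 3233); }

/* sign: c = m^d mod n (private key); verify: m = c^e mod n (public) */
unsigned long long rsa_sign(unsigned long long m)   { return modpow(m, 2753, 3233); }
unsigned long long rsa_verify(unsigned long long c) { return modpow(c, 17, 3233); }
```

For the message m = 65, rsa_encrypt(65) yields 2790 and rsa_decrypt(2790)
recovers 65; likewise rsa_verify(rsa_sign(65)) returns 65.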
There are more subtleties to this, and a complete system is quite a bit more
complicated. Any modern book on cryptography covers this in more detail.
--B.S.



















Special Issue, 1994
MUD Games on the Internet


Multiplayer entertainment across the net




Dennis Cronin


Dennis, a.k.a. AmrekTrentrak, works for Central Data Corp. developing UNIX
drivers for SCSI Terminal Servers. He can be reached at denny@cd.com.


Notable among the wonders of the Internet are the games--not the single-player
computer games that you download to your PC, but the multiuser ones you play
online with real, live people from all over the world. Some are highly
competitive, some are more social in nature, and some are a little of both.
These games are commonly referred to as "MUDs," short for "Multiple User
Dungeons" since many of the prototypical games were (and are) based on
Dungeons and Dragons (D&D) type themes. Within the general category of MUDs
there are a number of common and popular variants, many of which bear no
relation whatsoever to their D&D precursors. Nevertheless, I'll continue to
use the term "MUD" generically.
Although played on computers, with very few exceptions these games are not the
flashy, graphics-oriented, animation-intensive games you might expect.
Instead, they are ASCII-text-based games that don't even require any
particular screen-formatting capabilities. The text simply scrolls by.
At first you might wonder how these games could possibly be fun. What is a
computer game without bitmapped graphics? Without explosions and custom
controllers? Surprisingly, these games can be extremely engaging, with the
entertainment value derived more from multiple-player interaction than from
what the computer itself is doing. In these cases, the computer merely serves
as the vehicle, or medium.
In this article, I'll survey some popular types of MUDs and provide background
history on their development. I'll tell you how to connect to MUDs and how to
get started. Finally, I'll discuss how MUDs work and give you a few pointers
to MUD code bases and online resources should you decide to run or write your
own MUD.


What is a MUD?


A MUD is a multiplayer game program that usually runs on a UNIX-based computer
and talks directly to the Internet. Since it is connected directly to the
Internet, anyone from anywhere in the world with Internet access can connect
to this game and spend as long as they want playing for free. This immediately
distinguishes MUDs from BBS-based games, where users must play within their
local area or else be prepared to fork over the big bucks to the phone-line
providers.
Traditionally, these games are also completely noncommercial and are run and
administrated entirely by volunteers. They may be running on hardware and
Internet accounts belonging to commercial concerns or educational
institutions, but no fee is extracted, and no service guarantees are provided.
They are generally provided strictly as a labor of love by your fellow
"netizens."
Interaction. As a user on a MUD, you may play individually, partner up, or
even join groups. By cooperating with one or more other players, you may be
able to accomplish more within the construct of the game than you could
operating on your own.
While playing, you can "talk" with other users (via typing, of course). People
who are continents apart can gather in the same virtual MUD room and chat
away. Particularly on MUDs based overseas, users from half a dozen or more
countries may be on at the same time!
Ranking Systems. MUDs often rank players according to experience level. The
names of the ranks are often indicative of that MUD's culture. When you first
start playing a MUD, expect to be assigned an initial rank like "Apprentice,"
"Novice," or even "Clueless Newbie." As you progress, you'll achieve
more-impressive (or at least less-demeaning) titles.
The highest-ranking characters are, of course, those who run or administer the
MUD. Often called "Gods," they can modify game code, change the database, and
otherwise shape the reality of the MUD. Below them there's usually a "Wizard"
or "ArchWizard." To earn this rank, you must have effectively "won" the game
and possibly received a special promotion from the Gods. At this level, your
character is considered "immortal" and is allowed nearly god-like powers on
the MUD. Gods are often happy to unload some of the busy work of running the
MUDs onto these highly experienced regular players.
Playing a MUD. Actual game play on a combat MUD usually involves killing
monsters called "mobiles" (or "mobs" for short), exploring, collecting
valuable or useful items, and solving quests or puzzles. Killing other players
is usually strictly forbidden, and is dealt with very harshly. But some MUDs
do permit player killing (PK) with certain restrictions, or during certain
free-for-all time periods. Definitely check the rules of the MUD you're on
before you decide to whack some other player up the side of the head with the
ElvenSword (or even the king of all swords, the ElvisSword).
A more socially oriented MUD might involve nothing more than just hanging out,
talking, and occasionally playing funny tricks on each other. Or you might be
able to build your own rooms or regions, and furnish these areas with gadgets
or little puzzles of your own.
While the specifics of play vary from MUD to MUD, they all encourage user
interaction and communication. It is this that makes them so much more
compelling than their older, single-user UNIX relatives. Let's take a quick
look at the foundations of the multiuser-game experience and at where it's
going.


The MUD Family Tree


Computer games have supported multiple players interacting in various fashions
since at least the mid-1970s (see Figure 1 for details of the MUD family
tree). A multiplayer game called "Oubliette" featuring dungeons, levels, and a
chat area was in heavy use on the PLATO system at the University of Illinois
in 1978. This was quickly followed by an improved version called "Avatar" that
enjoyed enormous popularity during the early '80s and is still going on today.
Around 1979, a game called simply "MUD" appeared in the UK. It, too, was based
on a D&D theme. It is probably this game that spawned the most direct and
indirect MUD progeny in the form of the "AberMUDs" and their descendants.
AberMUDs--named for the University of Aberystwyth, where they were
written--were the first MUDs to become popular on a grand scale. They were
highly portable to different platforms, hence their rapid spread throughout
Europe. Trans-Atlantic links weren't as reliable and fast back then as they
are now, so these didn't catch on right away in the US. Around 1988, AberMUDs
started appearing at American universities. Shortly thereafter the first
TinyMUDs appeared. These MUDs are more socially oriented and noncombative; the
"Tiny" part of the name reflects the fact that the base code is much smaller
than that of the AberMUDs.
At this point, MUDs came to be divided into two broad categories: the
so-called "Hack'n'Slash" combat MUDs, and the less competitive social flavors.
And things began to happen fast. MUD variants began sprouting up all over.
Today, on the combat side, the LPMUDs are currently the most numerous, but the
DikuMUDs are coming on strong. Both of these types draw from
AberMUDs but add multiple classes (or "races") of players and typically
feature more sophisticated combat systems. On the social side, there is
probably more variety of both games and themes, and it's less clear which form
is the most popular. Probably, it would be the MUSHes--"Multi-User Shared
Hallucinations." 
Much MUD development has been evolutionary rather than revolutionary, simply
because it takes so long to write a full-featured MUD. Most people wanting to
open a new MUD prefer to start with an existing source base and modify it to
suit their tastes.
It also takes a while for a major new variant to catch on. Becoming proficient
at a particular type of MUD often requires many hours of practice and
exploration. Players then tend to continue to play games of the same family to
leverage some of their hard-earned experience.


Connecting to a MUD


To connect to a MUD, you must have Internet access. Then you need to know the
MUD's Internet address and its port number. The Internet address may be in
either symbolic or raw numeric form; the port number is usually a four-digit
number such as "2222" or "4000." Then you simply invoke telnet, specifying the
address and port number. For example, to access PrairieMUD, a traditional
AberMUD, you type telnet 192.17.3.3 6715 or telnet firefly.prairienet.org
6715. 
Most MUDs then give you a sign-on screen where you select a character name and
choose a password. The name you choose will, of course, affect everyone else's
perception of you on the MUD, so choose carefully. Many MUDs have a rather
pronounced culture. Selecting a name like "Neutron," while acceptable on a
space-theme MUD, might not look so good on a medieval D&D MUD.
Most MUDs allow you to just sign right on and play, but some enforce a
registration process. This enables their administrators to restrict access to
certain folks who have demonstrated a tendency towards bad manners or
antisocial behavior. To join in these MUDs, you will need to first send e-mail
requesting a particular character name and arranging for a password.
When you initially connect to a MUD, you will be a "newbie." Most MUDs provide
learning areas or online information to help you learn something about the
game. You will want to spend some time reading and learning before you start
to pester the more-experienced players. While many are willing to help
beginners get started, they are much happier to provide aid if you've done
your basic homework and familiarized yourself with the rudiments of their
game. One way to make the learning phase more fun and interesting is to
connect with a friend. You can learn together, and share tips and tricks as
you become more familiar with the game.
Table 1 provides a sampler of MUDs you can try connecting to. There are
hundreds more. You can get their addresses in the rec.games.mud.announce
newsgroup and by watching for the MUD lists occasionally posted there.
As you begin to learn the MUD etiquette appropriate to the particular style of
MUD you are playing, you will definitely want to communicate with the other
players on the MUD. Other players can provide advice, give you equipment, even
help rescue you from sticky situations. And the good-natured joking around
that goes on adds to the fun.
As you interact with people, remember that they are, in fact, people. Don't be
pushy or rude. If someone seems less receptive to your questions, maybe they
are in the middle of doing something themselves and don't want to be bothered.
And while the Wizards and other players on a game may sometimes perform
spontaneous acts of generosity towards you, don't depend on it, and don't go
whining every time you're in a little bit of a sticky situation. Those with
more experience arrived at it the hard way, and they'll expect you to make
your bones the same way. In other words, be patient.



A Note about MUD Addiction


It has been demonstrated that these MUD games can lead to near-addictive
behavior in some personality types. People have lost their jobs or flunked out
of school because they didn't know when to quit out of a MUD. I doubt that
MUDs deserve to be blamed directly for someone's failure to deal with
non-computer-generated reality, but if a person is a little lacking in
self-discipline or direction, MUDs can be an easy distraction.
So just a quick admonition: Keep an eye on how much time you spend online
playing these games. That quest will always be there tomorrow. That next level
will still be around. Most importantly, if you find you're walking up to real
doors and saying "open" out loud, it's time to ease up!


How MUDs Work


The program that delivers MUD reality to all who connect is called the
"server." At the heart of a MUD server is the machinery that supports the
connections and communications. This part of a MUD is surprisingly
straightforward, with most MUDs using the same basic approach. It is possible
to have a simple multiuser chat program up and running on the Internet with
less than 100 lines of code--elegantly simple!
The quick tour looks like this. First, establish a socket with the socket(2)
system call. Then associate a port number with the socket using the bind(2)
system call. This is where the MUD gets the last four-digit portion of its
address. Then use the listen(2) system call to listen for connection attempts
to the socket. 
Now you are ready to enter a loop, using the select(2) system call to monitor
activity at the socket. When you detect a new connection, you use the
accept(2) system call to establish a new file descriptor and add it to the
list you are monitoring for activity with the select(2) call. That's all there
is to it. Now you can use read(2) and write(2) on the file descriptors
returned by accept(2), and away you go. For more details and example code you
might want to refer to the text, UNIX Network Programming, by W. Richard
Stevens (Prentice-Hall, 1990). And of course, you could study some of the
existing MUD code to see how other subtleties of communication are handled.
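The loop just described can be sketched in a few dozen lines. Here is a
minimal select(2)-based chat-style server, assuming BSD sockets; the function
names are my own and error handling is pared down for brevity:

```c
/* Minimal select(2)-based chat server along the lines described above.
   Sketch only: function names are mine, error handling is pared down. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/select.h>
#include <unistd.h>
#include <string.h>

/* socket -> bind -> listen: returns a listening fd, or -1 on error.
   Port 0 asks the system to pick any free port. */
int make_listener(unsigned short port) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(s, 5) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* The select() loop: accept new connections, then echo each player's
   input to every other connected player. */
void serve(int listener) {
    fd_set master, readable;
    int maxfd = listener;
    FD_ZERO(&master);
    FD_SET(listener, &master);
    for (;;) {
        readable = master;
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            break;
        for (int fd = 0; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, &readable))
                continue;
            if (fd == listener) {               /* new player connecting */
                int c = accept(listener, NULL, NULL);
                if (c >= 0) {
                    FD_SET(c, &master);
                    if (c > maxfd) maxfd = c;
                }
            } else {                            /* input from a player */
                char buf[512];
                ssize_t n = read(fd, buf, sizeof(buf));
                if (n <= 0) {                   /* player disconnected */
                    close(fd);
                    FD_CLR(fd, &master);
                } else {                        /* broadcast to the rest */
                    for (int out = 0; out <= maxfd; out++)
                        if (FD_ISSET(out, &master) &&
                            out != listener && out != fd)
                            write(out, buf, (size_t)n);
                }
            }
        }
    }
}
```

A real MUD server adds command parsing and a world database on top of this
skeleton, but the connection machinery is essentially the above.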
The limitation of the above approach is that eventually you will run into the
system limit on open file descriptors as more users come online. To get around
this, write a front-end multiplexor program that listens for connects and then
funnels traffic for multiple sessions through a single channel to the main MUD
program. MUD servers have handled loads of well over 200 players this way,
although response times get a bit sluggish.
MUD server implementations diverge rapidly from this point. Some MUDs have
very simple string matching and lookup-table-driven command processors; others
use a more grammar-driven parser. Some MUDs load the database into memory for
play, others keep it on disk. A disk-based approach has the disadvantage that
a server crash can leave the main database scrambled, whereas when a
memory-based server goes awry, at least the starting database is left intact.


Client Programs


You can attach to most MUDs and have plenty of fun using just the standard
telnet program. However, special programs called "clients" exist to make your
stay on a MUD more enjoyable and entertaining. They provide features to help
make better sense of the quickly scrolling screens on a busy MUD. They prevent
a line of input you are typing from being mangled on the screen by output from
other players. They can even completely screen out certain types of unwanted
noise from other players. They may provide quick, little, cute actions and
personalized phrases, and even help you navigate.
Some MUDs are beginning to incorporate simple polygon graphics using a system
called "BSX graphics." A dedicated BSX client is required to take advantage of
the added graphics effects provided by these MUDs.
If you are going to spend much time on a MUD, you might ask around and see
what clients other folks on the MUD are using. TinyFugue and LPmudr are a
couple of examples--the MUD ftp sites have many more.


Running Your Own MUD


As you play MUDs, you almost inevitably begin to have your own ideas about how
you think things should be run. This is particularly true in the competitive
games; after you've achieved the highest mortal ranking, you still want more.
To enter the ranks of the Gods and run your own game, you need at least basic
knowledge of the C language. To make major changes to an existing MUD's
functionality or write your own MUD from scratch, you need significant coding
experience.
First, you locate a source base for the type of MUD you are interested in.
Even if you plan to write your own, it's advisable to study some existing code
so you'll know more in advance about what types of problems you might
encounter. Table 2 provides a list of online sources for MUD code and other
resources.
To compile the MUD code, you may need to do a little tool gathering. Many MUD
code bases expect and/or support the GNU C compiler.
I brought down several varied types of MUDs and tried building them on a
SPARCstation running SunOS. All were fairly easy to build, but most required
at least some tweaking. In some cases, more-significant code changes were
necessary to convince the compiler that things were okay. Some MUD code bases
contain several hundred Kbytes of source, which the C neophyte might find a
bit daunting.
Once I got the various MUDs to compile and sorted through setting up
directories, all of them came up and ran just fine (as evidenced by the fact
that productivity on our office network dropped to nil). All in all, it was
pretty easy to set up a MUD from an existing code base. This brings us to the
next problem. The most frequent cry in the MUD newsgroups seems to be "site
needed." Would-be Gods have MUD code running but don't have a direct Internet
connection.
Many are students who have accounts on an Internet machine but are restricted
to rather small disk quotas or explicitly forbidden to run MUDs on the
school's machines. MUDs tend to be big, fairly hefty processes, requiring
large amounts of memory and/or disk space. Some sidestep the
limited-disk-space issue by establishing a CSLIP or PPP link to a home machine
running Linux. A 14.4-Kbaud link and a reasonably equipped PC will support at
least a handful of users.
A better solution, of course, is to find a sympathetic site with a direct
Internet link and offer to provide the hardware and administration. MUDs tend
to become more fun as more users participate; you don't want to have your
response times degrading just when more people start to get interested.
But whatever you do, don't put up a MUD without consulting your system
administrator. You get annoyed when some clueless newbie does something stupid
on a MUD you're playing; don't you go and act like a "clueless newbie" on
somebody else's UNIX system!


Conclusion


MUDs are beautiful in their simplicity of format and universality. They
epitomize the concept of entertainment by the people and for the people. Since
they are not run by big business and no money changes hands, they are free to
be whatever they want to be. There are no commercials.
From these simple, text-based MUDs which currently predominate, we will
certainly see the emergence of games featuring more graphics and live speech
capabilities. But don't wait for the future--get online now. You'll have some
fun, and if nothing else, you'll be amazed at how quickly your typing speed
increases! 


Acknowledgments


Many thanks to Jennifer "Moira" Smith for putting together the great MUD FAQ
(Frequently Asked Questions) for Usenet (required reading for all MUDders).
Also, thanks to all of you who've talked to me on the various MUDs.
Figure 1 MUD family tree.
Table 1: Typical MUDs.
Name Symbolic address Numeric address Port
PrairieMUD firefly.prairienet.org 192.17.3.3 6715
Basic traditional AberMUD based on Dirt 3. 
3-Kingdoms marble.bu.edu 128.197.10.75 5000
Very active LPMUD.
Nuclearwar NuclearWar.Astrakan.HGS.SE 130.238.206.12 23 (opt)
Cyberpunk LPMUD.
FurryMUCK sncils.snc.edu 138.74.0.10 8888
Anthropomorphic theme--
everyone is a furry critter; registration required.
PernMUSH cesium.clock.org 130.43.2.43 4201
Themed role-playing MUSH based on the Anne McCaffrey 
book, Dragonriders of Pern.
Foothills marble.bu.edu 128.197.10.75 2010
Very busy talker.
DeepSeas a.cs.okstate.edu 139.78.9.1 6250
Underwater theme; registration required; be prepared 
for a playful and feisty bunch.
Carrion Fields neoteny.eccosys.com 199.100.7.5 9999
Blood and guts LP; player kill; player steal; cabals you 
can join for power and protection.
Table 2: Online MUD sources and resources.
tmi.ccs.neu.edu (129.10.114.86) 5555
LP-building support available through an LPMUD called "TMI-2."
ftp.luth.se (130.240.18.2) in /pub/misc/aber/code
Aber Dirt 3.1 server source, the most popular Aber flavor.
ftp.lysator.liu.se (130.236.254.153) in /pub/lpmud
LPMUD server and client source.
ftp.math.okstate.edu (139.78.10.6) in /pub/muds
MUD FAQ in /pub/muds/misc/mud-faq, also Diku FAQ, clients, 
servers for Merc,
MacMerc, TinyMUD, TeenyMud, CircleMUD, TinyMUCK, TinyMUSH, 
UberMUD, and more.
ftp.tcp.com (128.95.44.29) in /pub/mud
Clients, servers for CoolMUD, Diku, TinyMAGE, UnterMUD, and 
more.
Newsgroups:
alt.mud Redundant MUD newsgroup, superseded by 
 groups below; still has some activity, 
 however.
rec.games.mud.admin Ideas and information forum for those 
 who run MUDs.
rec.games.mud.announce Announcements for new MUDs, changes of 
 address, source-code releases, and so on.
rec.games.mud.diku Discussion related to the Diku family of 
 MUDs.
rec.games.mud.lp Discussion related to the LP family of 
 MUDs.
rec.games.mud.misc MUD topics not covered in more-specific 
 groups.
rec.games.mud.tiny Forum for discussing MUSH, MUSE, MOO, and 
 so on.















Special Issue, 1994
An Online Conferencing System Construction Kit


Bob learns to talk




David Betz


David, the author of XLisp, XScheme, AdvSys, Bob, and many other programming
systems, is a DDJ contributing editor. He can be contacted through the DDJ
offices.


A few years ago, I presented Bob, a language with a C-like syntax and an
object system that looked a bit like C++, in the article, "Your Own Tiny
Object-Oriented Language" (DDJ, September 1991). Bob is a dynamic language
with automatic storage management and an interactive interface. It is written
in ANSI C and has been ported to several platforms, including the Macintosh,
MS-DOS, and UNIX.
Recently, I decided to build an extensible computer-conferencing system, and
Bob looked like a good choice for an embedded language. One advantage of
inventing your own language is that you don't need anyone else's permission to
change it. The basic Bob syntax (see Table 1) seemed like it would work well,
but I needed a new object system. I decided to go with an object system I had
first used in AdvSys, a language designed for writing text adventure games.
(See "Dave's Recycled OO Language," DDJ, October 1993). This became the object
system for the new version of Bob. The complete source code for this
conferencing-enhanced implementation is available electronically; see
"Availability," page 3.


The Bob Object System


If you have spent any time using computer-conferencing systems, you know that
one of the primary types of data objects is the message. A message usually has
an author, a subject, and some text. It may also have a creation date, a
reference to another message if it is a reply, or various other data, all of
which generally takes the form of a list of tags like "subject" or "author"
and values like "Hiking" or "Anne." In fact, a message can be thought of as a
collection of tag/value pairs, the text itself being the value of a tag such
as "text."
This maps well onto the new Bob object model. In Bob, all objects consist of a
collection of properties. A property has a name and a value. An object can
inherit from another object. In this case, the object it inherits from is
called its "class." There really isn't a distinction between classes and
objects in Bob, though. A class is just another object with its own collection
of properties.
The only operations supported on objects are getting and setting property
values. To get the value of a property, you use an expression like
message.subject. If message is a variable whose value is an object, this
expression will get the value of the subject property of that object. If the
object doesn't have a subject property, Bob will look for a subject property
in the class of the object. This search continues until a property is found
or an object with no parent class is reached.
Setting a property works in a similar way, as in message.subject =
"Swimming". There is a difference, though. When setting the value of a
property, Bob only considers the properties local to the specified object.
There is no search up the inheritance chain. If Bob doesn't find the property
in the specified object, it adds a new property to the object with the
specified value. This allows an object to share values with other objects that
have the same class, but also to override those values when necessary.
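The asymmetry between reads and writes is the whole trick, and it is easy to
model in a few lines of C. This sketch is my own model of the lookup rules
described above, not Bob's actual implementation:

```c
/* Sketch of Bob-style property lookup (my own C model, not Bob's actual
   implementation): reads search up the class chain, while writes always
   create or update a property on the object itself. */
#include <stdlib.h>
#include <string.h>

typedef struct Prop {
    const char *name;
    const char *value;
    struct Prop *next;
} Prop;

typedef struct Obj {
    Prop *props;          /* this object's own properties */
    struct Obj *class_;   /* the object it inherits from, or NULL */
} Obj;

/* Get: search this object, then its class, and so on up the chain. */
const char *get_prop(Obj *o, const char *name) {
    for (; o != NULL; o = o->class_)
        for (Prop *p = o->props; p != NULL; p = p->next)
            if (strcmp(p->name, name) == 0)
                return p->value;
    return NULL;          /* not found anywhere in the chain */
}

/* Set: only the object's own properties are considered -- no search up
   the chain; a missing property is simply added to the object. */
void set_prop(Obj *o, const char *name, const char *value) {
    for (Prop *p = o->props; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0) {
            p->value = value;
            return;
        }
    Prop *p = malloc(sizeof(Prop));
    p->name = name;
    p->value = value;
    p->next = o->props;
    o->props = p;
}
```

With a message whose class carries author = "Anne", getting message.author
finds the inherited value; setting message.author = "Bob" overrides it
locally while leaving the class, and its other instances, untouched.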
A conferencing system isn't much good if all of the messages have to fit in
memory all the time. Bob solves this by providing an object store so that
objects can reside on disk. Updates to the object store are protected by file
locking so that multiple users can access the same store at the same time.
The Bob object store consists of a collection of numbered slots where data can
be stored. Each slot is a variable-length stream of bytes. Figure 1 shows the
Bob functions for accessing an object store. The functions for creating,
opening, and closing object stores are fairly straightforward.
The CreateObject function creates a new slot and returns a stream you can use
to write data into the slot. Once you're done writing the slot data, you use
the Close function to close the stream.
To read from an existing slot, use the OpenObject function. It opens the
specified object and version and returns a stream you can use to read data
from the slot. The version parameter allows you to specify which version of
the object you want to open. Specifying 0 gets the most recent version.
Specifying a negative number gets older versions relative to the most recent
version; for instance, -1 will get the next to last version. Specifying a
positive number gets that version. Version numbers always start with 1.
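The version-number convention is easy to get wrong, so here is a small sketch of a resolver for it (the function is my own illustration, not part of Bob): 0 means the latest version, negative numbers are relative to the latest, and positive numbers select an absolute version starting at 1.

```python
# Resolve a requested version number against the latest stored version,
# following the convention described in the text.

def resolve_version(latest, requested):
    if requested == 0:
        return latest               # 0 means the most recent version
    if requested < 0:
        return latest + requested   # -1 is the next-to-last version
    return requested                # positive numbers are absolute

assert resolve_version(5, 0) == 5
assert resolve_version(5, -1) == 4
assert resolve_version(5, 3) == 3
```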
To create a new version of an object, use the UpdateObject function, which
creates a new version of the specified object and returns a stream you can use
to write the object data just like the stream returned by CreateObject. You
can use ObjectNumber and ObjectVersion to determine the object number or
version number of an object opened by OpenObject, CreateObject, or
UpdateObject.
The cursor functions provide a means of iterating through the slots in an
object store. A cursor points to a particular slot. The slot it points to can
be changed by SetCursorPosition, GetNextObject, and GetPreviousObject. The
current slot can be opened with OpenCurrentObject.
This, in itself, would probably be sufficient to build a conferencing system.
Each conference would be a separate object store. Each message would be stored
in a slot in the object store. The message would consist of a few lines at the
start with the header fields like "author" and "subject" followed by the text
of the message.
As you probably guessed, I called these "object stores" because I intend to
store more than just text in them. Bob provides a facility for writing a
representation of any Bob data type to a stream that can be reconstructed
later. This process is called "flattening" and is recursive. Flattening an
object writes out each property tag and value as well as the class of the
object. Of course, the values of properties can themselves be objects, vectors
or any other Bob data type.
The class of an object is handled a little differently. Calling the flattener
recursively to write out an object's class would cause that class object (and
its class, and so on) to be written out every time an object with that class
was flattened. This would waste disk space and cause problems when the objects
were unflattened, since objects that shared the same class would end up with
separate but identical classes when read back in. This would destroy any data
sharing between objects of the class.
To get around this problem, Bob requires that all objects used as classes
contain a className property. When the flattener writes out an object, it
writes out the value of the className property of its class object instead of
the class object itself. When the object is unflattened, the value of the
global variable with that name is used to initialize the class field of the
unflattened object. This means that every class object should be stored as the
value of the global variable with the same name as its className property
value.
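The className convention can be pictured with a small sketch (the dict-based encoding below is purely illustrative, not Bob's actual flattened format): the flattener writes the class's name string instead of the class object itself, and the unflattener looks that name up among the globals so every unflattened object shares one class.

```python
# Sketch of flattening with the className convention: classes are
# written by name and recovered from a global table on unflattening.

globals_table = {}                  # name -> shared class object

def flatten(obj):
    return {"className": obj["class"]["className"],
            "props": dict(obj["props"])}

def unflatten(data):
    # look up the shared class object by name instead of duplicating it
    return {"class": globals_table[data["className"]],
            "props": dict(data["props"])}

Message = {"className": "Message"}
globals_table["Message"] = Message
msg = {"class": Message, "props": {"subject": "Hiking"}}
copy = unflatten(flatten(msg))
assert copy["class"] is Message     # both objects share one class
```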
Since any Bob data type can be flattened to a stream and since object-store
slots are streams, any Bob data type, for instance, an object, can be written
to an object-store slot. In fact, several objects can be written to the same
slot.
I've implemented messages as Bob objects with properties for each of the
fields of the message. This makes it very easy to create and manipulate
messages in Bob. For instance, you can create a message using the code in
Example 1(a), and then write this message to an object store using the code in
Example 1(b). Finally, Example 1(c) lets you read it back. The flattening
facility makes it very easy to read and write messages without getting into a
lot of complicated parsing of the header fields.
This may, however, turn out to be too simple an approach to storing messages.
Most messages will have text considerably longer than that in Example 1,
making it impractical to store the text as the value of one of the properties
of the message object. This is where the ability to write more than one
flattened data object into an object-store slot can be handy. You can split
the message into a header part and a text part. The header can be stored as an
object just as I have described, while the text could be stored following the
header object as either a string or as a sequence of bytes, using the PutC
function; see Figure 2. This would allow the header to be read quickly to
display a summary of the message without having to read the potentially large
text part.
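One way to picture the header/text split is the sketch below (the stream layout is my own assumption, standing in for Bob's Flatten and PutC calls, not its actual byte format): the flattened header goes first, then the raw text, so a reader that only wants a summary can stop after the header.

```python
# Sketch of storing a message as a header object followed by raw text
# in a single slot, so the header can be read without the text.
import io
import pickle

def write_message(stream, header, text):
    pickle.dump(header, stream)     # stands in for Flatten(header, stream)
    stream.write(text.encode())     # raw text follows, like PutC in a loop

def read_header(stream):
    return pickle.load(stream)      # stop here; the large text is untouched

buf = io.BytesIO()
write_message(buf, {"author": "Jonathan", "subject": "Hiking"},
              "We went on a nice hike yesterday.")
buf.seek(0)
assert read_header(buf)["subject"] == "Hiking"
```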
Since the flattening functions read and write from streams, you can also write
Bob objects to regular files. This could be used to keep information about
conferences like the moderator, a description, membership lists, and access
restrictions. Files with flattened objects could also be used for user
information like passwords, lists of conferences the user has joined,
addresses, phone numbers, and personal comments.


E-Mail


Most conferencing systems support electronic mail as well as message
databases. This can be done with Bob by using a single object store, the Mail
store, to hold all of the e-mail messages in the system and by using one
object store per user Inbox to hold references to the messages received by
that user. The references are just the object number in the mail object store
of the message received. Why don't I just store the messages themselves in the
user's mailbox? Because a mail message might be addressed to several
recipients. If the message itself is stored in each user's mailbox, the data
for that message will be duplicated for each user. Putting the message in a
common mail store and putting a reference in the user's mailbox solves this
problem and allows all recipients to share the same copy of the message. The
message reference also provides a handy place to keep other information, such
as whether the message has been read or not.
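The shared-mail-store idea can be sketched as follows (the data structures are illustrative, not the real object-store API): the message body lives once in the common Mail store, and each recipient's inbox holds only a reference carrying per-user state.

```python
# Sketch of e-mail delivery via references: one stored message,
# one reference per recipient, with per-user read flags.

mail_store = []                     # the common Mail object store

def post(message, inboxes):
    mail_store.append(message)
    number = len(mail_store) - 1    # slot number in the Mail store
    for inbox in inboxes:
        # one reference per recipient, with room for per-user flags
        inbox.append({"object": number, "read": False})

alice, bob = [], []
post({"subject": "Meeting"}, [alice, bob])
assert mail_store[alice[0]["object"]] is mail_store[bob[0]["object"]]
alice[0]["read"] = True             # marking it read is per-user state
assert bob[0]["read"] is False
```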


Distributed Conferencing 


It is often desirable to distribute conferences across a number of machines.
This makes it possible to place machines near to where users will be accessing
them so that they are accessible by a local phone call. In order to distribute
a conference across multiple machines, it is necessary to distribute the
messages posted to that conference, as well. The flattening facility of Bob
makes this fairly easy. Since a message (or any object) can be flattened to an
arbitrary output stream, there is no reason it can't be flattened to a
communications link to another computer. On the other end, the unflattener
would reconstruct the object ready to be placed in the parallel conference on
the target machine. This could be done to forward e-mail messages as well.


Active Messages


So far, I've only talked about fairly traditional text messages. By taking
advantage of Bob's object-oriented nature, it's easy to allow for different
types of messages with different behaviors. Assume that whenever you display a
message, you do that by sending a displayMessage message to the message
object. For text messages, the method for handling the displayMessage message
would simply display the text. However, you could also invent a new type of
message like a FormMessage, where the displayMessage method would display a
form, allow the user to fill in fields in the form, and then automatically
construct a response to the form message and send it back to the sender of the
original message.

Beyond just having several built-in classes of messages like TextMessage and
FormMessage, it would be possible for any message to carry with it its own
method for the displayMessage message. This is because any object can override
any method with its own version of that method and because the flattener can
flatten any Bob data type, including a compiled method. Because Bob compiles
methods to machine-independent bytecodes, these compiled methods can be run on
any machine that can run Bob. You can write a message on a Macintosh that
includes its own displayMessage method and send it to an MS-DOS user, and it
will execute just fine.
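The per-message method override can be sketched like this (Python classes and names stand in for Bob objects; a real active message would carry flattened bytecode rather than a lambda): a message with its own displayMessage behavior shadows the class default.

```python
# Sketch of active messages: an instance-level method overrides
# the class default, just as a message carrying its own compiled
# displayMessage method would override TextMessage's version.

class TextMessage:
    def __init__(self, text):
        self.text = text

    def display_message(self):      # the class-level default method
        return self.text

m = TextMessage("Hello")
assert m.display_message() == "Hello"

# attach an instance-level override, as a method travelling
# with a flattened message would
m.display_message = lambda: "[form] please fill in and reply"
assert m.display_message() == "[form] please fill in and reply"
```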


Command Handling


It doesn't do much good to be able to store messages if it's impossible to
read them. Any conferencing system needs some sort of user interface. I know
it's the fashion these days to have fancy GUIs for everything, but I've chosen
to initially use a text-based interface. 
Bob contains a number of functions to support parsing commands, which are
summarized in Figure 3. A line buffer is simply an array of characters with a
pointer to the next character to fetch. The FillLineBuffer function reads
characters from the specified stream and places them in the line buffer until
it encounters an end-of-line character and sets the buffer's pointer to the
first character of the line. The NextToken function returns the next
space-delimited sequence of characters starting at the buffer pointer after
skipping any leading spaces. The RestOfLine function returns the rest of the
characters in the buffer, starting at the pointer after skipping leading
spaces. The IsMoreOnLine function is a predicate that returns TRUE if there
are more nonspace characters left in the buffer. These functions make it easy
to write the simple command parsers used in text-based conferencing systems.
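These operations map onto a plain-Python sketch (the class below is my own stand-in for the functions in Figure 3): a line buffer is just characters plus a pointer, and a token is a space-delimited run starting at the pointer.

```python
# Sketch of the line-buffer parsing functions: NextToken, RestOfLine,
# and IsMoreOnLine over a character array with a fetch pointer.

class LineBuffer:
    def __init__(self, line):
        self.chars, self.pos = line, 0

    def _skip_spaces(self):
        while self.pos < len(self.chars) and self.chars[self.pos] == " ":
            self.pos += 1

    def next_token(self):           # like NextToken
        self._skip_spaces()
        start = self.pos
        while self.pos < len(self.chars) and self.chars[self.pos] != " ":
            self.pos += 1
        return self.chars[start:self.pos]

    def rest_of_line(self):         # like RestOfLine
        self._skip_spaces()
        return self.chars[self.pos:]

    def is_more_on_line(self):      # like IsMoreOnLine
        return self.chars[self.pos:].strip() != ""

b = LineBuffer("post Swimming at the lake")
assert b.next_token() == "post"
assert b.rest_of_line() == "Swimming at the lake"
```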
To handle a command, you must associate a command keyword with a function for
carrying out the actions indicated by the command. Bob does this by using
string addressing of object properties. A command table is simply an object
where each property is a command name, and the property value is a method for
carrying out the command. Example 2, for instance, points out another feature
of Bob. Property names can be any Bob data type. In this case, we are using
strings as property names because NextToken returns a string; this makes it
easy to find the method for handling the command indicated by the string
token.
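The command-table idea maps directly onto a dictionary keyed by command strings, as in this hedged sketch (the handlers and the fallback message are invented for illustration):

```python
# Sketch of a command table: each key is a command name, each value
# a handler, and dispatch is a dictionary lookup on the first token.

top_menu = {
    "help": lambda args: "No help available.",
    "bye":  lambda args: "Sorry to see you go.",
}

def handle(line):
    token, _, rest = line.partition(" ")   # like NextToken / RestOfLine
    handler = top_menu.get(token)
    return handler(rest) if handler else "Unknown command: " + token

assert handle("help") == "No help available."
assert handle("quit") == "Unknown command: quit"
```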


Conclusion


Bob is still very much under development. Some of the features I'm considering
for future versions are ways of handling the flattening of circular
structures, distributed data storage on a LAN, using an object-store server,
and a graphical interface. I'll probably add compressed bit vectors as a Bob
data type to support the tracking of which messages a user has read in a
conference. With an index facility for object stores, message threads should
be easier to implement. There's lots of room for improvement.
You may have noticed that Bob is not so much a conferencing system as a
language for writing conferencing systems. It provides basic
programming-language features along with a facility for the persistent storage
of objects and some simple command-parsing functions. With these basic tools,
you should be able to build your own conferencing system tailored to your
specific needs.
Table 1: Bob syntax: (a) function definition; (b) formal arguments; (c) method
definition; (d) formal argument list; (e) block; (f) local declaration; (g)
local variable; (h) statement; (i) expression; (j) property expression; (k)
message-sending expression; (l) literal function; (m) literal symbol; (n)
literal list; (o) literal vector; (p) literal object; (q) Lvalue; (r) global
variables; (s) run-time routines.
(a)
 define <function-name>( <formal-arguments> )
<block>
(b)
 <argument-name> [ , <argument-name> ]...
(c)
 define '[' <object> <keyword> ']'
 <block>
 define '[' <object> [ <keyword> : <argument-name> ]... ']'
 <block>
(d)
 <argument-name> [ , <argument-name> ]...
(e)
 { [ <local-declaration> ]... <statement>... }
(f)
 local <local-variable> [ , <local-variable> ]... ;
(g)
 <variable-name> [ = <initial-value> ]
(h)
 if ( <test-expression> ) <then-statement> [ else <else-statement> ] ;
 while ( <test-expression> ) <body-statement>
 do <body-statement> while ( <test-expression> ) ;
 for ( <init-expression> ; <test-expression> ; <increment-expression> )
 <body-statement>
 break ;
 continue ;
 return [ <result-expression> ] ;
 [ <expression> ] ;
 <block>
 ;
(i)
 <expression> , <expression>
 <lvalue> = <expression>
 <lvalue> += <expression>
 <lvalue> -= <expression>
 <lvalue> *= <expression>
 <lvalue> /= <expression>
 <lvalue> %= <expression>
 <lvalue> &= <expression>
 <lvalue> |= <expression>
 <lvalue> ^= <expression>
 <lvalue> <<= <expression>
 <lvalue> >>= <expression>
 <test-expression> ? <true-expression> : <false-expression>
 <expression> || <expression>
 <expression> && <expression>
 <expression> | <expression>
 <expression> ^ <expression>
 <expression> & <expression>
 <expression> == <expression>
 <expression> != <expression>
 <expression> < <expression>
 <expression> <= <expression>
 <expression> >= <expression>
 <expression> > <expression>
 <expression> << <expression>
 <expression> >> <expression>
 <expression> + <expression>
 <expression> - <expression>
 <expression> * <expression>
 <expression> / <expression>
 <expression> % <expression>
 - <expression>
 ! <expression>
 ~ <expression>
 ++ <lvalue>
 -- <lvalue>
 <lvalue> ++
 <lvalue> --
 <expression> ( [ <arguments> ] )
 <expression> . <property-expression>
 <expression> '[' <expression ']'
 <message-sending-expression>
 ( <expression> )
 <variable-name>
 <literal-function>
 <literal-symbol>
 <literal-list>
 <literal-vector>
 <literal-object>
 <number>
 <string>
 nil
(j)
 <symbol>
 <expression>
(k)
 '[' <object> <keyword> ']'
 '[' <object> [ <keyword> : <argument> ]... ']'
(l)
 function [ <name-string> ] ( <formal-arguments> ) <block>
(m)
 \ <symbol-name>
(n)
 \ ( [ <expression> ]... )
(o)
 \ '[' [ <expression> ]... ']'
(p)
 \ { <object> [ <property-name> : <expression> ]... }
(q)
 <variable-name>
 <vector-or-string-expression> '[' <expression> ']'
 <expression> . <property-expression>
(r)
 stdin The standard input file pointer
 stdout The standard output file pointer
 stderr The standard error file pointer
(s)
 obj = New(class);
 obj = Clone(object);
 list = List(element,...);
 value = Cons(car,cdr);
 car = Car(cons);
 SetCar(cons,car);
 cdr = Cdr(cons);
 SetCdr(cons,cdr);
 symbol = Intern(printName);
 class = Class(object);
 value = GetLocalProperty(object,propertyName);
 Print(value[,stream]);
 Display(value[,stream]);
 DecodeMethod(method[,stream]);
 stream = FOpen(name,mode);
 Close(stream);
 ch = GetC(stream);
 PutC(ch,stream);
 Load(filename);
 Quit();
Figure 1: Object-store functions.
CreateObjectStore(name);
store = OpenObjectStore(name);
CloseObjectStore(store);
count = ObjectCount(store);
stream = OpenObject(store,objectNumber,versionNumber);
stream = CreateObject(store);
stream = UpdateObject(store,objectNumber);
DeleteObject(store,objectNumber);
objectNumber = ObjectNumber(stream);
versionNumber = ObjectVersion(stream);
cursor = CreateCursor(store,objectNumber);
CloseCursor(cursor);
SetCursorPosition(cursor,position);
objectNumber = GetNextObject(cursor);
objectNumber = GetPreviousObject(cursor);
stream = OpenCurrentObject(cursor,versionNumber);
Figure 2: Bob flattening functions.
Flatten(value,stream);
value = Unflatten(stream);
Figure 3: Bob parsing functions.
buffer = NewLineBuffer();
FillLineBuffer(buffer,stream);
token = NextToken(buffer);
line = RestOfLine(buffer);
more = IsMoreOnLine(buffer);
Example 1: (a) Creating a message; (b) writing a message to an object store;
(c) reading back the message.
(a)
 myMessage = New(Message);
 myMessage.author = "Jonathan";
 myMessage.subject = "Hiking";
 myMessage.text = "We went on a nice hike yesterday.";

(b)
 myStore = OpenObjectStore("Messages");
 stream = CreateObject(myStore);
 Flatten(myMessage,stream);
 Close(stream);
(c)
 myStore = OpenObjectStore("Messages");
 stream = OpenObject(myStore,1,0); // message one, current version
 myMessage = Unflatten(stream);
 Close(stream);
Example 2: Bob command table.
topMenu = \{ Menu
 "help": function() { Display("No help available.\n"); },
 "bye": function() { Display("Sorry to see you go.\n"); }
 };

Special Issue, 1994
Very High-Speed Networks: HiPPI and SIGNA


Enabling technology for the information highway




William F. Jolitz and Lynne G. Jolitz


Bill and Lynne are the authors of the 386BSD CD-ROM and can be contacted
through the DDJ offices.


Very high-speed networking has become a key component in the race to rapidly
and economically deliver large amounts of information. The high visibility of
the information highway, the increasing interest in multimedia applications,
and the demands of high-profile public-policy issues (such as rapid and
confidential access to medical information on a national scale) ensure that
very high-speed gigabit networks will be implemented within the next few
years, current technology notwithstanding.
In "Very High-Speed Networking" (DDJ, August 1992), we outlined a number of
hardware and software approaches which could be useful in achieving the
required gigabit rates. Unfortunately, very little work of practical substance
has been forthcoming. Many hardware solutions--including protocol
engines--have recently fallen out of vogue, primarily due to cost constraints.
Changing to different transmission technologies (FDDI and SONET, for instance)
has also proved difficult (replacing the infrastructure is costly), so the
focus is back on improving rates on existing, copper transmission lines.
Popular software solutions (such as header prediction) have been successful,
but generally have been mined out. 
Even though there has been a great deal of talk about gigabit testbeds, the
relative lack of interface hardware has been a stumbling block. Most projects
have assembled a gigabit platform by using banks of T3 (45-Mbit) or FDDI
(100-Mbit) interfaces, since few interface standards exist in this rarefied
area, but these testbeds tend to be beyond the economic reach of most software
and hardware application-development groups. However, one extant standard
that may be within reach, HiPPI (High-Performance Parallel Interface), allows
for an 880-Mbit link to supercomputers; see "HiPPI and High-Performance LANs"
by Andy Nicholson (DDJ, June 1993).
In this article, we'll examine two HiPPI-based projects--the PC-Supercomputer
HiPPI Project and Project SIGNA--both of which utilize the 386BSD publicly
accessible research software. However, any system using TCP/IP (Windows NT,
for instance) can also be so modified, assuming you have the patience and
access to kernel source code. 


Hardware Gigabit Testbeds: HiPPI in the Labs


The Los Alamos National Laboratory (LANL) views supercomputer resources as a
kind of "numerical" science laboratory of simulation, and PCs and workstations
as the "visualization" devices (terminals) which provide rapid access to these
shared resources. By placing all these computer resources (such as oddball
supercomputer architectures, massive data stores, and tape/optical backup) on
the same high-speed network, LANL can effectively "remove" the bottlenecks
which occur in managing an information system that deals in extremely large
objects (for example, as in a plasma-reaction simulation, where data is
shipped to the facility needed at the moment--even between clusters of
supercomputers).
To accomplish this, Richard Thomsen, Michael McGowan, and Craig Idler of LANL
developed a special HiPPI-based interface to connect these supercomputers and
high-speed storage devices to the Internet at very high rates. Since these
devices could not work with the Internet protocols directly, a PC running
386BSD is used to interpret the protocol headers stripped off the incoming
packet, with the remaining data payload redirected to a separate HiPPI link
for reliable delivery to the target hardware. The combination of dual HiPPI
interfaces, 486 PC, and software effectively produces a TCP/IP protocol engine
running at HiPPI rates (see the accompanying text box entitled, "The LANL
HiPPI Protocol Engine Hardware").


Software Gigabit Testbeds: Project SIGNA


Hardware solutions, while intriguing, are usually out of reach for most
software programmers. Still, the prospect of developing a scalable
network-interface technology is very desirable. Even though hardware
interfaces are still evolving, the software technology, coupled with fast
(100-MIPS), inexpensive processor technologies and memory systems (greater
than 512K write-back caches), is now available. 
SIGNA (short for "simplified Internet gigabit networking architecture") is
designed as a guide to exploring Internet gigabit-networking technologies
inexpensively by running extremely high-speed protocols on an ordinary PC via
386BSD software.
The SIGNA approach currently emphasizes the most minimal of gigabit networking
applications: client operation of a PC with a single application. However,
when gigabit hardware interfaces become available, a SIGNA platform could
allow client PCs to access supercomputer "servers" (as, for example, during
image uploading and downloading) and other client PCs (such as in video
teleconferencing). By dedicating PC resources to a single "bursty"
application, you can essentially create "gigabit-terminal equipment."
Key considerations in the 386BSD SIGNA design included: 
Reducing the high overhead of protocol processing.
Speeding up memory allocation to adequately buffer the network application.
Improving application delivery of data to reduce bandwidth bottlenecks.
Reducing response time in real-time applications.
Current TCP/IP implementations reduce protocol-processing overhead by managing
the protocol header via header prediction. However, the overhead produced by
managing the data payload of a packet is also a significant limitation not
addressed in current TCP/IP designs. As in the case of the LANL HiPPI protocol
engine, this delivery and checksum overhead must be minimized to create
high-speed protocols.
To guarantee real-time application response, it is necessary to add a limited
real-time mechanism to the 386BSD kernel. This mechanism allows a special
single process to preempt the kernel on demand. This special case carefully
"violates" the UNIX model of restricted preemption to achieve a rapid response
to data delivery; it is not intended as a general-purpose mechanism for
real-time programs.
Extant device and driver interfaces which place the burden of buffer
allocation and packet extraction on the device driver are not appropriate for
gigabit-network interfaces. Gigabit-networking interfaces must cope with the
fact that while processor speed is increasing, memory-system bandwidth is not
keeping pace. Operations involving the most bandwidth (the packet-data payload)
are costly; if you require more than a single pass over the packet, you
overload the memory-system bandwidth and "get behind" in processing a packet.
One way to avoid this is to use extensive amounts of memory (arranged as frame
buffers) to assemble and present the link-layer packets in transit. Such
memory-based devices require novel device-driver interfaces. 
Finally, Internet Core protocol structures (TCP, UDP, IP, ICMP) must
themselves be modified to eliminate copies and reduce checksum overhead. By
operating on descriptors instead of copying the packet around during
processing, you can reduce the average passes required per packet from three
to one--a significant reduction in memory overhead; see Figure 3. This is done
by combining the copy and checksum operations directed to protocol headers and
data. The descriptors selectively reference header/data portions of the packet
in place in the interface's buffer.
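The combined copy-and-checksum step can be sketched as a single loop that both moves the payload and accumulates the ones'-complement sum (a simplified illustration of the one-pass idea, not the actual 386BSD code):

```python
# Sketch of a one-pass copy-and-checksum: the payload is moved and
# the Internet-style ones'-complement sum is folded in simultaneously,
# so no second pass over the data is needed.

def copy_and_checksum(src, dst):
    total = 0
    for i in range(0, len(src), 2):
        word = (src[i] << 8) | (src[i + 1] if i + 1 < len(src) else 0)
        total += word               # checksum folds in as bytes are copied
        dst.extend(src[i:i + 2])
    while total > 0xFFFF:           # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

payload = bytes([0x45, 0x00, 0x00, 0x1C])
out = bytearray()
csum = copy_and_checksum(payload, out)
assert bytes(out) == payload        # one pass produced both copy and sum
assert csum == 0xBAE3
```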
Header prediction can also be enhanced through a "clustering" mechanism, which
synchronizes a half-duplex stream of packets. This effectively locks out other
system activity during peak-rate transfers.


The Future of the LANL HiPPI and SIGNA Projects


The LANL HiPPI project, exhibited at the Supercomputer '92 and '93
conferences, is possibly the only successful protocol-engine design ever put
into operation. Even software-testbed designs (including Project SIGNA) cannot
match the current speed of good protocol-engine designs due to the limitation
in the memory system used by the processor itself. As such, anyone interested
in getting a hands-on, operational, protocol-engine testbed should look at
this design carefully. It could save a company years in design and development
costs and also bring very high-speed networking that much closer to reality.
Because gigabit hardware technologies are still a matter of speculation,
software-only approaches (such as SIGNA) and testbeds are more than just
interesting. Both 600-Mbit ATM (MAN) and 100-Mbit Ethernet might offer
affordable desktop bandwidth in the near future, while SONET scaled to
multi-gigabit levels offers the possibility of metropolitan-network
interconnections. Even HiPPI, originally a supercomputer mass-storage
interface, has been demonstrated as a network-interconnect standard. With the
recent standardization of the HiPPI serial standard, the cost of
implementation has lowered drastically. 


Very High-Speed Networks


While gigabit networking is considered solely the province of the data
industry, knowledge of telephony techniques provides insight into design
considerations and constraints. In fact, both the SIGNA and LANL HiPPI
testbeds could be viewed simply as gigabit-terminal equipment. In addition,
new gigabit-networking technologies must rely on switching technologies
instead of routing technologies, since the data rates required prohibit the
delay imposed by the interim retransmission of a packet. 
The inevitable reunion of the data-networking and telecommunications
industries will be spurred on by the demand for global very high-speed gigabit
networking, although probably not in the manner either of these industries
has separately forecast. Ironically, the experts most suited to leading the
charge are at risk of being most blind to these new possibilities, since they
are used to seeing them only in terms of their respective disciplines.

In the meantime, hardware projects like LANL's HiPPI project and
software-testbed engines like SIGNA will provide us with the knowledge and
experience needed when very high-speed networking solutions become available.
Perhaps they will encourage entrepreneurs from both industries to take the
initiative and offer ad hoc solutions, creating a whole new information
industry. In any case, the demand for very high-speed networks is real, and
that demand will be satisfied--one way or another. 
The LANL HiPPI Protocol Engine Hardware
The LANL protocol engine (see Figure 1) consists of two CBI (crossbar
interface) cards attached to an ordinary EISA PC. Each CBI card has two
unidirectional HiPPI ports (one input, one output), each used to manage one
half circuit of the communications between an Internet network and a
non-Internet-capable application host. Only data and requests for Internet
service flow across the application link, and only Internet-protocol (IP)
datagrams appear on the network link. It is the sole responsibility of the PC
to handle the transformation of the application's requests into appropriate
Internet-protocol operations without ever seeing the application's data (just
handling pointers to the data only). In this case, the PC is the actual
Internet host which operates on behalf of the external host computer.
The key to this architecture is the design of each CBI (see Figure 2), which
is built around a large (4-Mbyte) block of video RAM (VRAM). The VRAM has
three ports: two serial (one in and one out) for receiving and transmitting
HiPPI, and one parallel, bidirectional port that allows the PC to access
TCP/IP header and HiPPI Link Layer information. Each board has a port on the
network and a port connected to the application host (which runs the network
application connected to the network). The data is buffered between the
network and the application host solely in the VRAM while the PC arranges the
details of the network transfer.
While the roles of application and network are split between two hosts, you
could design a delivery mechanism to the application running on the same PC
(sort of a "socket protocol engine" for the particular application program) if
necessary. This approach can also be used on a single PC or workstation.
By stratifying the design of protocol processing into scalable sections, you
can cope with any degree of bandwidth on a networking implementation. Given
the rate of technology change, switching a gigabit per second between
computers will be routine in less than a decade.
The choice of a PC/supercomputer connection presented some novel problems
which had to be resolved to make the LANL HiPPI project fly. One of the most
critical issues dealt with the rate of information itself: While a
supercomputer has no trouble churning out TCP/IP in order to source a HiPPI
link, how could a PC handle it? The secret was to decouple the overhead of the
data payload from the protocol processing so that the overhead per packet is
fixed, regardless of the size of the packet. Assuming a maximum packet size of
64 Kbytes (2^19 bits, or 512 Kbits), a packet rate of 2^11, or 2048, packets
per second would be necessary to support a data rate of a gigabit (2^30 bits
per second).
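This arithmetic can be restated directly (pure arithmetic, using only the figures in the text):

```python
# The packet-rate arithmetic from the text: 64-Kbyte packets at about
# 2**11 packets per second sustain a gigabit (2**30 bits per second).

packet_bits = 64 * 1024 * 8        # 2**19 bits = 512 Kbits per packet
gigabit = 2 ** 30                  # bits per second
packets_per_second = gigabit // packet_bits

assert packet_bits == 2 ** 19
assert packets_per_second == 2 ** 11 == 2048
```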
Since these packet rates are achievable with a carefully tuned PC Internet
implementation, the real key to high-speed networking is to find ways to scale
packet-data payload delivery. The LANL CBI project addresses this through
clever hardware design. The TCP protocol has two requirements on its data
payload: a delivery requirement and a checksum across the span of both the
payload and a special, pseudo-protocol header. A hardware-checksum mechanism
offloads from the networking implementation a portion of the protocol
processing that increases with packet payload.
A second hardware mechanism eliminates the remaining payload overhead from
delivering the data to the application. Essentially, the PC never touches the
data inside the packets--it merely manages the association of hardware
data-buffer pointers between the two interfaces. The PC simply does the
bookkeeping of the protocol, which is the same whether the packet is 64 bytes
or 64 Kbytes. 
--W.F.J.


For More Information 


More information on the LANL HiPPI Project, including documentation on the
CBI, is available via ftp at the Internet site ftp.lanl.gov in the /pub/cbi
directory. For information on Project SIGNA, 386BSD, or pointers to further
information about the LANL HiPPI Project, please send e-mail to
wjolitz@cardio.ucsf.edu.
Figure 1 The LANL protocol engine.
Figure 2 The design of each CBI.
Figure 3 Reducing the average passes required per packet: (a) three passes;
(b) one pass.

Special Issue, 1994
sGs: A Simple Gopher Server 


A menu-based searching utility--written in Perl!




Bob Kaehms and Jonny Goldman


The authors are programmers in the San Francisco Bay Area. Bob can be
contacted at cames@well.sf.ca.us and Jonny at jonny@synopsys.com.


If you've followed the explosive growth in network activity, you know that
three predominant applications--Gopher, Wide Area Information Servers (WAIS),
and the World Wide Web (WWW)--have transformed the use of the Internet.
WAIS, developed by Thinking Machines Corp. (Cambridge, MA) is a client/server
full-text search system based on a standard library-science network protocol
(ANSI Z39.50-1988). Gopher, developed by the microcomputer laboratory of the
University of Minnesota, is a menu-based Campus Wide Information System (CWIS)
that simplifies the dissemination of information by presenting a uniform user
interface to the campus network (and the Internet as a whole). The World Wide
Web, developed at the European Laboratory for Particle Physics (CERN) in
Geneva, is a distributed hypertext system designed to provide a way for
physicists to collaborate on research in high-energy physics.
Along with networking tools such as ftp, telnet, usenet, and mail, Gopher,
WAIS, and WWW each have a place in the overall design of an information
infrastructure. In this article, we'll present a simple Gopher server written
in Perl and discuss the basic Gopher protocol. We'll also show how this server
can be extended and combined with other Internet tools to provide more
sophisticated network information systems.


About Gopher


The Gopher protocol is a file-system-based model extended to allow "files" to
reside on many computers. Clients (often, but not exclusively, represented as
user interfaces) run on a local computer connected to a server which manages
the information (the files). The Gopher protocol was designed to be
human-readable, so all transactions through the protocol are done using the
U.S. ASCII character set and consist of lines separated by carriage
return/line feed (CR/LF). The first character on the line is a tag that
informs the client that the remaining information on the line is of a
particular, well-defined type. The types specified by the original protocol
were minimal: just enough to provide a simple interface to a distributed file
system. This base set included ASCII files (tag=0), directory listings
(tag=1), and searchable indexes (tag=7). Table 1 presents the complete list of
types.
The Gopher protocol can be extended by defining additional tags, although 0
through Z have been reserved by the original developers. Gopher+, a new Gopher
protocol, extends this idea, passing additional type information both through
a filename extension (such as movie.mpg), and by appending information to the
original Type-1 response returned by the server.


Type 1: A Directory Listing


Much of the work in Gopher is done through Type 1 responses, which consist of
a set of lines, each beginning with a single-character type tag. The tag is
followed by a tab-separated 4-tuple: a string to be displayed as the heading
by the client, a selector string that can be returned to the server for
subsequent processing, a host name (usually the server's), and a TCP port.
Figure 1, which is a typical Type 1 response, shows that an item is an ASCII
file. The client would display the string following the "0" up to the first
tab as a menu item (About this Gopher Article). The remaining information
(0/gopher-data/About<tab>gopher.host<tab>70<CR><LF>) is used to retrieve
the next piece of information. The client parses this substring to get the
selector, the host, and the port. In this example, the client makes another
connection to the host gopher.host via a socket connection at port 70 and
sends the string 0/gopher-data/About<CR><LF> to the Gopher server to
request the item referred to in the previous response. Each request from the
client is performed in a single TCP/IP connection and is terminated by a
CR/LF. The server responds to the request and closes the connection.
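To make the parsing concrete, here is a short sketch in Python (illustrative only--sGs itself is Perl) that splits a Type 1 menu line into its tag and 4-tuple, using the values from Figure 1:

```python
def parse_menu_line(line):
    """Split one Type 1 menu line into its parts.

    The first character is the type tag; the rest of the line is a
    tab-separated 4-tuple: display string, selector, host, port.
    """
    line = line.rstrip("\r\n")
    tag, rest = line[0], line[1:]
    display, selector, host, port = rest.split("\t")
    return tag, display, selector, host, int(port)

line = "0About this Gopher Article\t0/gopher-data/About\tgopher.host\t70\r\n"
print(parse_menu_line(line))
```

A client would then open a new connection to the returned host and port and send the selector followed by CR/LF.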


Why sGs?


Faced with uncertain commercial-use licensing for the University of
Minnesota's Gopher server, we decided to develop a simple prototype server to
help bootstrap Lockheed onto the information highway. This code is a
modification of the Perl program waismail.pl (written by Jonny Goldman), which
is a gateway to the WAIS systems through Internet electronic mail. 
In providing a systems-engineering organization with a graphical front end to
existing, internally written requirements-traceability software, we saw an
opportunity to explore solutions to information processing available on the
Internet. We chose to implement a client/server system based on the emerging
technologies of WAIS and Gopher.
We leveraged our efforts with other pockets of Internet development within
the corporation. Our project, known as the "Technology Broker System" (TBS),
aimed to simplify and validate the terminology of the Internet within the
Lockheed corporate community. sGs quickly took off inside
the corporation because it was highly portable and easy to configure. (See
Information Week, June 27, 1994, for more information on Lockheed and the
Internet.) 


sGs Code


As Listing One illustrates, sGs is an excellent example of a hacker's program.
It was my second attempt at a real Perl program, with a little help from "the
net," especially Jonny Goldman, who was still actively supporting the Public
Domain WAIS software at the time. My first program was a modification of
Jonny's public-domain waismail.pl program so that it could be used for
handling additions, modifications, and deletions to WAIS databases.
sGs is built upon the simple client/server socket example in Programming Perl,
by Larry Wall and Randal Schwartz (O'Reilly & Associates, 1991). As you
examine the code, notice that the subroutine init_socket() looks similar to
the sample server code of their book. Wall notes how concisely the socket,
bind, listen, and accept calls can be written in Perl; see Example 1. For more
information on
Perl, we recommend both Wall's book and "Networking with Perl," by Oliver
Sharp (DDJ, October 1993). The client/server example in Wall's book provides a
firm foundation on how sockets work. In Example 1, the subroutine creates and
initializes a socket to which clients can connect.
We extended Wall's example in two ways: First, instead of echoing the lines
sent by the client as Wall did, the server interprets the string (up to the
CR/LF) and acts upon it in some way. Second, we reap the child processes
spawned to handle each request that would otherwise become "zombies." 
All of this fits neatly into the trap_gophers subroutine, which forms the core
event loop in the server; see Example 2. trap_gophers simply sits and waits for
connections on the open socket. (Imagine a cat sitting patiently on the back
lawn waiting for gophers. When it sees one, it has a kitten that chases and
eats the gopher. When the kitten is finished, the parent cat consumes the
kitten.)
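The same fork-and-reap idiom can be sketched outside Perl. The following Python fragment (an illustrative, POSIX-only sketch, not part of sGs) forks a short-lived child and collects it with the waitpid/WNOHANG loop the server uses to keep kittens from becoming zombies:

```python
import os
import time

def reap_children():
    """Collect any exited children so none linger as zombies."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:      # no children left at all
            break
        if pid == 0:                   # children exist, but none have exited yet
            break
        reaped.append(pid)
    return reaped

child = os.fork()
if child == 0:
    os._exit(0)                        # the child finishes immediately

reaped = []
deadline = time.time() + 5
while child not in reaped and time.time() < deadline:
    reaped += reap_children()          # poll until the child is collected
    time.sleep(0.05)
print(reaped)
```

The WNOHANG flag is what keeps the parent from blocking: it returns immediately whether or not a child has exited, so the accept loop can continue serving connections.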
This completes the basic mechanics of the server. For requests made to the
server by the client, we'll start with the simplest request: a "Type 0"
request which is used to retrieve a plain text (ASCII) file. The section of
Example 2 beginning with while (<NS>) { illustrates how this is implemented.
In effect, this code works as follows:
1. Wait for input on the socket.
2. Check to make sure the request is valid. This has the side-effect of
removing the first character from the input line and putting the result in the
variable $request.
3. Check to see if the first character of the line is the number 0.
4. If so, execute the subroutine sendfile.
Perl provides an easy mechanism for handling lines. We have created a stream
called "NS" from the socket using the accept call and processed it one record
at a time with the while(<NS>) call. By default, a record in Perl ends at a
newline, which conveniently coincides with the CR/LF that ends a Gopher
request. If this
subroutine fails to recognize a valid type--the first character of the
line--it simply closes the socket.
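The dispatch on the first character of the request amounts to a small lookup. Here is a minimal Python sketch (illustrative only; the handler names mirror the subroutines in Example 2, and returning a name stands in for calling the handler):

```python
def dispatch(request):
    """Return the name of the handler sGs would run for a raw request line."""
    if not request:
        return None
    first = request[0]
    if first == "\r":                 # bare CR/LF: send the root directory
        return "senddir"
    if first == "1":                  # directory listing
        return "senddir"
    if first in "049gh":              # ASCII, BinHex, binary, GIF, HTML files
        return "sendfile"
    if first == "7":                  # keyword search
        return "wa2go"
    return None                       # unrecognized: close the connection
```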
This routine does some simple preprocessing of the request string, which is
passed to the subroutine through the Perl variable $request. The subroutine
sendfile (see Example 3) prepends the Gopher root-level data directory to the
request, opens the file, and sends it back to the client one line at a time.
If the file doesn't exist, the connection simply closes.
The Type 1 request is similar. We prepend the root-level Gopher directory to
the request. If the directory is dynamic, we create a directory listing using
the UNIX ls command and process the output, tagging the resulting lines with
the base Gopher types and processing any link files that might be found. If
the directory is static, we just open the "cachefile" and send it back to the
client; see Example 4. Note that the server sets the type using filename
extensions and the standard UNIX tests for binary, text, and directory.
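Those typing rules can be expressed compactly. The Python sketch below is illustrative, not the code in Example 4; the null-byte test is a crude stand-in for Perl's -T and -B file tests:

```python
import os

EXTENSION_TAGS = {".src": "7", ".gif": "g", ".hqx": "4", ".html": "h"}

def gopher_type(path):
    """Assign a Gopher type tag the way senddir does: directories are '1',
    known extensions get their special tags, and remaining files are
    classified as text ('0') or binary ('9')."""
    if os.path.isdir(path):
        return "1"
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSION_TAGS:
        return EXTENSION_TAGS[ext]
    with open(path, "rb") as f:
        head = f.read(512)
    return "9" if b"\0" in head else "0"   # crude stand-in for Perl's -T/-B
```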
In the Type 7 request section, the server implements a gateway to a search
tool, in this case, WAIS. The subroutine wa2go (see Example 5) serves two
functions: First, it takes a list of keywords, which it hands to the WAIS
server. It receives the list of results from the WAIS server and rewrites them
to conform to the Gopher protocol before returning them to the client. Second,
it receives a result from a previous search, rewrites it to the WAIS protocol,
and passes that back to the WAIS server. It receives the result from the WAIS
server and passes that back to the client. It would be relatively simple to
replace the WAIS gateway with a gateway to some other search tool (like UNIX
grep, for instance).
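As a sketch of such a replacement, the following illustrative Python fragment searches an in-memory set of documents for keywords (standing in for the external search tool) and rewrites the hits as Gopher menu lines, terminator included:

```python
def keyword_search(docs, words):
    """docs maps selector -> document text; return selectors matching all words."""
    wanted = [w.lower() for w in words]
    return [sel for sel, text in sorted(docs.items())
            if all(w in text.lower() for w in wanted)]

def hits_to_menu(hits, host, port):
    """Rewrite search hits as Type 0 menu lines, ending with the '.' terminator."""
    lines = ["0%s\t0%s\t%s\t%s\r\n" % (sel, sel, host, port) for sel in hits]
    return "".join(lines) + ".\r\n"

docs = {"/docs/gopher": "the gopher protocol",
        "/docs/wais": "wide area information"}
menu = hits_to_menu(keyword_search(docs, ["gopher"]), "gopher.host", 70)
```

The point is the rewriting step: whatever the back-end search returns, the gateway's job is to dress the results up as ordinary menu lines the client already knows how to follow.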

There are several ways that sGs can be configured. Modifications can be made
directly to the code inside the subroutine init_program_vars. Configurable
parameters can also be passed to the program as command-line options or
through a configuration file. Perl allows a simple syntax for processing
the command line. In Example 6, @ARGV is an array containing the command-line
arguments. Elements are shifted off the array into the default variable $_ and
processed until the array is empty. The variables themselves can be easily
understood by looking at a configuration file such as Example 7, where the
server would be started either from a command line, or through an rc file as:
sGs -c sGs.cnf.
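The option format is simple enough to sketch in a few lines. This illustrative Python fragment (not the code in init_program_vars) reads the same one-option-per-line format, skipping comments:

```python
def parse_config(text):
    """Turn a config file's text into an option dict.
    Each non-comment line is '-flag value', exactly as on the command line."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        flag, _, value = line.partition(" ")
        options[flag] = value.strip()
    return options

sample = "# sGs.cnf\n-p 1492\n-d /my-gopher/data\n-m d\n"
print(parse_config(sample))
```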
If the server is running with dynamic menuing, link files can also be placed
in any directory below Gopher root and are identified as "dot" files. The
server assumes that any such dot file is a link and tries to process it
accordingly. If it finds any data within the file that it doesn't understand,
it exits the subroutine, ignoring the file; see Example 8. These link files
are what make Gopher so powerful, allowing a Gopher administrator to tunnel
far and wide throughout the Internet. Example 9 is an example link file
conforming to sGs's format.
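Parsing such a link file is straightforward. The following Python sketch (illustrative; sGs's own logic is in process_link) turns Example 9's key=value lines into a single menu line and, like the server, gives up on a file containing anything it doesn't understand:

```python
KNOWN_KEYS = ("Name", "Type", "Port", "Path", "Host")

def link_to_menu_line(text):
    """Parse a Name=/Type=/Port=/Path=/Host= link file into one menu line,
    or return None if the file contains anything unrecognized."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or key not in KNOWN_KEYS:
            return None                   # funny garbage in link file
        fields[key] = value
    if len(fields) != len(KNOWN_KEYS):
        return None                       # a required field is missing
    return "%s%s\t%s\t%s\t%s\r\n" % (fields["Type"], fields["Name"],
                                     fields["Path"], fields["Host"],
                                     fields["Port"])

link = ("Name=The Original Gopher at UofM\nType=1\nPort=70\n"
        "Path=1/\nHost=boombox.micro.umn.edu\n")
print(link_to_menu_line(link))
```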
Provided that you were "on the net," putting such a link into your Gopher data
directory would let you jump to the place where it all started, where you
would find plenty of additional information and support.


Conclusion


By using this type of server, an organization can take advantage of other
freely available software--in particular, clients for most hardware platforms.
Table
2 is a short list of what's currently available.
Also, keep in mind that the official Gopher software is always available from
boombox.micro.umn.edu and should not be overlooked when considering an
information architecture. The Gopher team provides excellent support and
constant improvements to their code, such as the three-dimensional-space user
interface for Gopher that's currently under development.
As a development platform, Perl provides a rich, portable language for
developing network-based services. Organizations need access to the huge
amounts of information available on the net, and they need to provide
customized servers to their users. sGs is a marriage of two useful tools that
provide such a solution.


References


Wall, Larry and Randal Schwartz. Programming Perl. O'Reilly & Associates,
1991.
Anklesaria, F. et al. RFC 1436, "The Internet Gopher Protocol," available from
ftp.internic.net.
"Guide to Network Resource Tools." EARN Association, May 20, 1994, Document
Number: 3.0. Available in electronic form from LISTSERV@EARNCC.EARN.NET (or
LISTSERV@EARNCC.BITNET). Send the command: GET filename where the filename is
either: NETTOOLS PS or NETTOOLS TXT. 
Kahle, Brewster. "Wide Area Information Servers." April, 1991. One-page
overview of Internet release of WAIS. Available via anonymous ftp:
/pub/wais/wais-discussion/wais-overview.text@quake.think.com or WAIS server
wais-discussion-archive.src.
Sharp, Oliver. "Networking With Perl." Dr. Dobb's Journal (October 1993).
Figure 1: A Type 1 response.
0About this Gopher
Article<tab>0/gopher-data/About<tab>gopher.host<tab>70<CR><LF>
* <tab> = ASCII 9
 <CR> = ASCII 13
 <LF> = ASCII 10
Example 1: This code creates and initializes a socket to which clients can
connect. The variable $WNOHANG is used to collect child processes that would
otherwise result in "zombie" processes.
sub init_socket {
 $AF_INET = 2;
 $SOCK_STREAM = 1;
 $sockaddr = 'S n a4 x8';
($name, $aliases, $proto) = getprotobyname('tcp');
 if ($port !~ /^\d+$/) {
 ($name, $aliases, $proto) = getservbyport($port, 'tcp');
 }
 $this = pack($sockaddr, $AF_INET, $port, "\0\0\0\0");
 select(NS); $| = 1; select(stdout);
 socket(S,$AF_INET, $SOCK_STREAM, $proto) || die "socket: $!";
 bind(S,$this) || die "bind: $!";
 listen(S,5) || die "listen: $!";
 select(S); $| = 1; select(stdout);
 $WNOHANG =1;
}
Example 2: The trap_gophers subroutine forms the core event loop in the
server.
sub trap_gophers {
 for($con = 1; ; $con++) {
 ($addr = accept(NS,S)) || die $!;
FORK:
 if (($pid = fork()) != 0) { # parent
 close(NS);
 while (1) { last if (waitpid(-1,$WNOHANG) < 1);}
 } elsif (defined $pid) { # child
 ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
 @inetaddr = unpack('C4',$inetaddr);
 while (<NS>) {
 if (! &valid_request($_)) {close(NS);exit(-1);}
 if (/^\r/) {&log_request("CONNECT\n");&senddir();}
 if (/^1/) {&senddir();}
 if (/^0|^4|^9|^g|^h/) {&sendfile();}

 if (/^7/) {&wa2go();}
 close(NS);
 exit(0);
 }
 } elsif ($! =~ /No more process/) { #EAGAIN is recoverable
 sleep 2;
 redo FORK;
 } else { # weird fork error
 die " could not fork child to handle connection!!!: $!\n";
 }
 }
 close(NS);
}
Table 1: Gopher types.
 Type Description 
 0 A readable ASCII file.
 1 A simple directory listing.
 2 A CSO phonebook (a special kind of directory listing).
 3 An Error was detected by the server.
 4 A BinHexed Macintosh file.
 5 A DOS binary archive.
 6 A UNIX uuencoded file.
 7 A keyword searchable index.
 8 A telnet session.
 9 A binary file.
Example 3: The subroutine sendfile prepends the Gopher root-level data
directory to the request, opens the file, and sends it back to the client one
line at a time.
sub sendfile {
 &log_request("FILE:$request");
 open(REPLY, "<$gopher_root/$request");
 while (<REPLY>){send(NS,"$_",0);}
}
Example 4: Processing a Type 1 request.
sub senddir {
 &log_request("DIR:$request");
 if ($menutype eq "d") {
 open(REPLY, "ls -a1 '$gopher_root/$request' |");
 while (<REPLY>){
 chop $_;
 $file= $_;
 if (/^\./) { &process_link($_);}
 else {
 $type="0" if -T "$gopher_root/$request/$file";
 $type="9" if -B "$gopher_root/$request/$file";
 $type="1" if -d "$gopher_root/$request/$file";
 $type="7" if "$gopher_root/$request/$file" =~/\.src$/;
 $type="g" if "$gopher_root/$request/$file" =~/\.gif$/;
 $type="4" if "$gopher_root/$request/$file" =~/\.hqx$/;
 $type="h" if "$gopher_root/$request/$file" =~/\.html$/;
 if ($type == 0 || $type == 1 || $type eq "g" || $type eq "9" || $type eq "4" ||
 $type eq "h") {
 send(NS,"$type$file\t$type$request/$file\t$thishost\t$thisport\r\n",0);
 }
 $waissourcedir = ""; $ENV{'WAISCOMMONSOURCEDIR'} = $waissourcedir;
 if ($type == 7 && $wais_op) {
 $waissourcedir = "$gopher_root/$request"; #chop $waissourcedir;
 $ENV{'WAISCOMMONSOURCEDIR'} = $waissourcedir;
 send(NS,"$type$file\t$type::search::$waissourcedir::$file::\t$thishost
 \t$thisport\r\n",0);
 }

 }
 }
 send(NS,".\r\n",0);
 } else { #menutype is static
 open (CACHE, "< $gopher_root/$request/$cachefile") || print "error opening
 $cachefile $!\n";
 while (<CACHE>){send(NS,"$_",0); }
 }
}
Example 5: The subroutine wa2go serves two functions: It takes a list of
keywords, hands them to the WAIS server, and rewrites the results to conform
to the Gopher protocol; it also takes a retrieval request for a previous
result, rewrites it to the WAIS protocol, and returns the document to the
client.
# do a WAIS search
sub wa2go { #Modified from Jonny Goldman's waismail.pl <jonathan@think.com>
 [...]
 if (/^7::search|^7::Search|^7::SEARCH/) {
 [...]
 &dosearch();
 }
 if (/^7::retrieve|^7::Retrieve|^7::RETRIEVE|^[ \t]{0,}DocID: /) {
 [...]
 &doretrieve();
 }
 [...]
}
Example 6: @ARGV is an array containing the command line.
while (@ARGV) {
 $_=shift @ARGV;
 if (/-c/) {
 $c_file=shift @ARGV;
 [...]
 }
 if (/^-l/) { $logfile=shift @ARGV;}
 [...]
Example 7: A configuration file.
# sGs.cnf
# port to run on.
# NOTE: official gopher port is 70, but requires root privilege
-p 1492
# gopher root level data dir
# this is where you put the data you want to publish
-d /my-gopher/data
# where you keep your wais binaries (remove or comment the following
# line if no wais)
-w /wais/bin
# gopher log file
-l ./sGs.log
# how to set up menus... if "-m d" then they are dynamic
# if "-m s" then they are static, and cache files must be created
#  manually. sGsCache.pl can be used
#
# NOTE:
# static menus allow finer control over what a gopher menu will look like,
# more control and security, but in general may be harder to maintain.
# Properly constructed gopher filesystems on a unix system utilizing long
# filenames and whitespace allow for very readable client menus.
-m d
# the hostname you wish to have created in your gopher replies
# normally the default is fine "`hostname`.`domainname`".
-H gopher.chaser.com
Example 8: Gopher link files.

sub process_link {
[...]
 if (-T "$gopher_root/$request/$file") {
 open(LINK,"< $gopher_root/$request/$file") || die "can't...";
 while (<LINK>) {
 [...]
 if(/^Name|^Type|^Port|^Path|^Host/) {
 [...]
 }
 else { return } # funny garbage in link file
 }
Example 9: An example link file that conforms to sGs's format.
Name=The Original Gopher at UofM
Type=1
Port=70
Path=1/
Host=boombox.micro.umn.edu
Table 2: Freely available software samplers.
Platform Application Author ftp Site Comments 
PC gopher_p.exe Martyn Hampson Gopher+ client
PC gophbook.exe Kevin Gamiel Cute client
 that uses a
 book metaphor for
 displaying data.
Mac GopherApp Don Gilbert xxxxxxxxxxTKxxxxxxxxxx Uses MacApp
 extensible Macintosh
 programming framework.
 Highly reliable.
 Some problems with
 32K text-file limits.
UNIX Xgopher Allan Tuchman xxxxxxxxxxTKxxxxxxxxxx

Listing One 

#!/usr/local/bin/perl
$program="sGs.pl"; # (a simple Gopher server)
$revision="2.0 ";
#
############################################################################
$maintainer_person="Your Name Here";
$maintainer_address="Your Email Address Here";
&register();
#############################################################################
# DESCRIPTION
# A simple gopher server which handles type 0,1, g, and 7 gopher requests.
#
############################## MAIN #########################################
#
&init_program_vars();

#############################################################################
# RUN AS A DAEMON
fork && exit;
setpgrp(0,$$);

#############################################################################

&init_socket();


&trap_gophers();

############################# END ##########################################
sub init_program_vars {
 $host=`hostname`;chop($host);$domain=`domainname`;chop($domain);
 if ($domain) { $thishost="$host.$domain"; }
 else { $thishost=$host; }
 $port="1470";$thisport=$port;$gopher_root="/";
 $wais_op="0";$waisq="waisq";$menutype="d";$cachefile=".cache";
 $maxres = 200;
 $errorlog = "./sGs.err";
 $logfile = "./sGs.log";

 # process the command line
 while (@ARGV) {
 $_=shift @ARGV;
 if (/^-c|^-C|^-l|^-h|^-p|^-d|^-m|^-u|^-v|^-w|^-H/) { #good arguments

 if (/-c/) {
 $c_file=shift @ARGV;
 if ( -T "$c_file" ) { &process_config_file(); }
 else { die "-c: improper filename $c_file\n";}
 }

 if (/^-l/) { $logfile=shift @ARGV;}
 if (/^-p/) { $port=shift @ARGV;$thisport=$port;}
 if (/^-d/) { $gopher_root=shift @ARGV;
 if (! -d "$gopher_root"){die "Not a valid directory: $gopher_root\n";}}

 if (/^-m/) { # menu type (d)ynamic (default) or (s)tatic
 $menutype=shift @ARGV;
 if (!($menutype eq "s" || $menutype eq "d"))
 {die "-m: bad option $menutype (use d or s) \n"}
 }

 if (/^-C/) { $cachefile=shift @ARGV;}

 if (/^-u/) { # setuid to user (default whoever starts it)
 print "-u option not implemented yet\n";
 }

 if (/^-v/) {&print_version(); die "\n";}
 if (/^-w/) {
 $wais_op="1";
 $waisq = shift(@ARGV)."/waisq";
 if (! -x "$waisq") {die "$waisq... not executable\n"}
 }

 if (/^-H/) {
 $host=shift @ARGV;$thishost=$host;
 }

 if (/^-h|^-\?/) { &print_help(); die "\n";}
 } else {&print_help(); die "\n";} # bad arguments
 }

 if ($logfile){ #we do this once to make sure we can.
 open (LOG,">>$logfile") || die "can't open logfile: $logfile: $!\n";
 close (LOG);

 }
 open(ELOG,">>$errorlog") || die "can't open error log\n";
 close (ELOG); #just checking...

 $with_options="-h $thishost -p $port -d $gopher_root -m $menutype -w 
 $wais_op ";
 $start_mess="$timestamp Starting sGs: $with_options";
 &print_version(); sleep 2;
 print "\n";
 print "$start_mess\n";
 sleep 4;system("clear");print "Welcome to sGs....\n";
 &log_request("sGs Started $with_options\n");
}

#############################################################################
sub process_config_file {
#
# a config file is just a bunch of command line options put into a file
# one line at a time.
# Example:
# -d /users/gopher/gopher-data
# -p 1500
# -w /users/wais/w8b5bio/bin
# -l /your/gopher/log


 open (CONFIG, "<$c_file") || die "can't open $c_file";
 while (<CONFIG>) {
 @op= split(/\s/, $_);
 $_=shift(@op);
 if (/^-H/) { $host=shift(@op); $thishost=$host;}
 if (/^-l/) { $logfile=shift(@op); }
 if (/^-p/) { $port=shift(@op);$thisport=$port } # port
 if (/^-d/) { # gopher directory
 $gopher_root=shift(@op);
 if (! -d "$gopher_root" ) { die "Not a valid directory: $gopher_root\n";}
 }
 if (/^-m/) { # menu type (d)ynamic (default) or (s)tatic
 $menutype=shift(@op);
 if (!($menutype eq "s" || $menutype eq "d"))
 {die "-m: bad option $menutype (use d or s) \n"}
 }

 if (/^-C/) { $cachefile=shift(@op); } # default is .cache
 if (/^-u/) { # setuid to user (default whoever starts it)
 print "-u option\n";
 }
 if (/^-w/) { # WAIS SEARCH OPTION default nowais
 $wais_op="1";
 $waisq=shift(@op)."/waisq";
 if (! -x "$waisq") {die "$waisq... not executable\n"}
 }
 }
}

###############################################################################
sub print_version {


 system("clear");
 print "\n\n";
 print "

##############################################################################
 # 
 # sGs 
 # 
 # Gopher 
 # simple server 
 # 
 # Version: $revision 
 # 
 ################################# For Support
################################
 # 
 contact: $maintainer_person
 e-mail: $maintainer_address
 # 

##########################################################################\n";
 }

#############################################################################
sub print_help {

print "
sGs [-c <configfile>] [-p <port>] [-d <gopher-data-dir>] [-l <logfile>]
 [-m <s|d>] (static or dynamic {default} menus) [-u <user>]
 [-h|-H|-?] (prints this file) [-v] (prints version) [-w] (allow WAIS)
 [-C <cachefile>] (when running with static menus. Default .cache)
 \n";
}

#############################################################################
sub init_socket {

 $AF_INET = 2;
 $SOCK_STREAM = 1;
 $sockaddr = 'S n a4 x8';

($name, $aliases, $proto) = getprotobyname('tcp');
 if ($port !~ /^\d+$/) {
 ($name, $aliases, $proto) = getservbyport($port, 'tcp');
 }
 $this = pack($sockaddr, $AF_INET, $port, "\0\0\0\0");

 select(NS); $| = 1; select(stdout);

 socket(S,$AF_INET, $SOCK_STREAM, $proto) || die "socket: $!";
 bind(S,$this) || die "bind: $!";
 listen(S,5) || die "listen: $!";

 select(S); $| = 1; select(stdout);
 $WNOHANG =1;
}

#######################################################################
sub trap_gophers {

 for($con = 1; ; $con++) {
 ($addr = accept(NS,S)) || die $!;

FORK:
 if (($pid = fork()) != 0) { # parent
 close(NS);
 while (1) { last if (waitpid(-1,$WNOHANG) < 1);}
 } elsif (defined $pid) { # child

 ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
 @inetaddr = unpack('C4',$inetaddr);
 while (<NS>) {
 if (! &valid_request($_)) {close(NS);exit(-1);}
 if (/^\r/) {&log_request("CONNECT\n");&senddir();}
 if (/^1/) {&senddir();}
 if (/^0|^4|^9|^g|^h/) {&sendfile();}
 if (/^7/) {&wa2go();}
 close(NS);
 exit(0);
 }
 } elsif ($! =~ /No more process/) { #EAGAIN is recoverable
 sleep 2;
 redo FORK;
 } else { # weird fork error
 die " could not fork child to handle connection!!!: $!\n";
 }
 }
 close(NS);
}

######################################################################
sub sendfile {
 &log_request("FILE:$request");
 open(REPLY, "<$gopher_root/$request");
 while (<REPLY>){send(NS,"$_",0);}
}

######################################################################
sub senddir { #NEED TO PUT IN A FLAG FOR STATIC/DYNAMIC
 &log_request("DIR:$request");

 if ($menutype eq "d") {
 open(REPLY, "ls -a1 '$gopher_root/$request' |");
 while (<REPLY>){
 chop $_;
 $file= $_;
 if (/^\./) { &process_link($_);}
 else {
 $type="0" if -T "$gopher_root/$request/$file";
 $type="9" if -B "$gopher_root/$request/$file"; 
 $type="1" if -d "$gopher_root/$request/$file";
 $type="7" if "$gopher_root/$request/$file" =~/\.src$/;
 $type="g" if "$gopher_root/$request/$file" =~/\.gif$/;
 $type="4" if "$gopher_root/$request/$file" =~/\.hqx$/;
 $type="h" if "$gopher_root/$request/$file" =~/\.html$/;

 if ($type == 0 || $type == 1 || $type eq "g" || $type eq "9" ||
 $type eq "4" || $type eq "h") {
 send(NS,"$type$file\t$type$request/$file\t$thishost\t$thisport\r\n",0);
 }
 $waissourcedir = ""; $ENV{'WAISCOMMONSOURCEDIR'} = $waissourcedir;


 if ($type == 7 && $wais_op) {
 $waissourcedir = "$gopher_root/$request"; #chop $waissourcedir;
 $ENV{'WAISCOMMONSOURCEDIR'} = $waissourcedir;
 send(NS,"$type$file\t$type::search::$waissourcedir::$file::
 \t$thishost\t$thisport\r\n",0); 
 }
 }
 }
 send(NS,".\r\n",0);
 } else { #menutype is static
 open (CACHE, "< $gopher_root/$request/$cachefile") || print "error opening 
 $cachefile $!\n";
 while (<CACHE>){send(NS,"$_",0); }
 }
}

############################################################################
# do a WAIS search
sub wa2go { #Modified from Jonny Goldman's waismail.pl <jonathan@think.com>
 $tmpfile = "/tmp/sGs.$$";
 $sfile = "sGs.$$.src";
 $outfile = "/tmp/sGs.out.$$";
 $errfile = "/tmp/sGs.err.$$";
 $goph_string=$_;
 ($gophertype, $action, $wais_src_dir, $source, @words) = split(/::/,$_);
 if (/^maxres (\d+)/) { $maxres = $1;}
 
 if (/^7::search|^7::Search|^7::SEARCH/) {
 ($gophertype, $action, $wais_src_dir, $source, @words) = split(/::/,$_);
 $search=1;
 @sources=split(".src",$source);
 $ENV{'WAISCOMMONSOURCEDIR'} = $wais_src_dir;
 $maxres = 200;
 $waissourcedir=$wais_src_dir;
 &dosearch();
 }

 if (/^7::retrieve|^7::Retrieve|^7::RETRIEVE|^[ \t]{0,}DocID: /) {
 ($gophertype, $action, $docid) = split(/::/,$_);
 $retrieve = 1; $indocid = 1; chop($docid); chop($docid);
 &log_request("RETRIEVING: $docid\n");
 }
 
 if ($indocid == 1) {
 $indocid = 0;
 &doretrieve();
 }

 open(RESPONSE,"<$outfile");
 while (<RESPONSE>){
 if ($retrieve) {
 send(NS,"$_",0);
 }

 if ($search) {
 $/ = ""; #paragraph mode
 ($result,$heading,$DOCID) = split(/\n/,$_);
 if ($heading =~/Headline/){


 if ($DOCID =~/GIF/) {
 send(NS,"g$heading\t7::retrieve::$DOCID\t$thishost\t$thisport\r\n",0);
 }
 else {
 send(NS,"0$heading\t7::retrieve::$DOCID\t$thishost\t$thisport\r\n",0);
 }
 }
 }
 }
 send(NS,".\r\n",0);

 unlink $outfile;
 unlink $tmpfile;
 unlink $errfile;
 unlink $sfile;
}

###############################################################################
sub dosearch {
 foreach $source (@sources) {
 if(!(-f "$waissourcedir/$source.src")) {
 &logerror("could not find source: $waissourcedir/$source.src");
 }
 }

 open(TMP, ">$tmpfile");
 printf TMP "(:question :version 2\n :seed-words \"";
 foreach $w (@words) { printf TMP "$w ";};
 printf TMP "\"\n :relevant-documents\n ( ";

 if ($relevant) {
 foreach $rel (@reldocs) {
 $_ = $rel;
 /@/ && ($_ = $`) && (/:/) && ($id = $`) && ($db = $');
 printf TMP "\n (:document-id \n :document \n (:document \n 
 :doc-id \n";
 printf TMP " (:doc-id \n :original-database %s \n 
 :original-local-id %s\n)\n",
 &stringtoany($db), &stringtoany($id);
 printf TMP " :source (:source-id :filename \"$source.src\" )\n";
 printf TMP " ) )\n";
 }
 }

 printf TMP " )\n";
 printf TMP " :sourcepath \"$waissourcedir/:\" \n";
 printf TMP " :sources (\n";
 
 foreach $source (@sources) {
 printf TMP " (:source-id :filename \"$source.src\" )\n";
 }

 printf TMP " )\n";
 printf TMP " :maximum-results %d )\n", $maxres;
 close(TMP);
 system("cp $tmpfile /tmp/TESTSEARCH");
 &log_request("WAISSEARCH: @sources, words: @words");

 if ($relevant) {

 foreach $rel (@reldocs) {
 $_ = $rel;
 { &log_request("RelDocID: \"$rel\" ");}
 }
 }
 open (OUT, ">>$outfile");
 printf OUT "Searching: ";
 foreach $source (@sources) {
 printf OUT "$source ";
 }

 printf OUT "\nKeywords: ";
 foreach $w (@words) { printf OUT "$w "; };
 if ($relevant) {
 foreach $rel (@reldocs) {
 $_ = $rel;
 { printf OUT "\nRelDocID: \"$rel\"";}
 }
 }
 printf OUT "\n";
 system("$waisq -f $tmpfile -m $maxres -g >> /dev/null 2> $errfile");
 open(ERR, "$errfile");

 while (<ERR>) {
 if (/Connect to socket did not work:/) {
 &log_request("Error Searching @sources for @inetaddr: Bad connect 
 (source down?)");
 &log_request("Error: $_");
 printf OUT "\n**** Error Searching @sources: not responding ****\n";
 printf OUT "\tPlease send mail to the maintainer.\n";
 }
 }
 close(ERR);
 #unlink($errfile);
 open(TMP, "$tmpfile");
 $inres = 0;

 while(<TMP>) {
 /:result-doc/ && ($inres = 1);
 if ($inres == 1) {
 /:score\s+(\d+)/ && ($score = $1);
 ((/:headline "(.*)"$/ && ($headline = $1)) ||
 (/:headline "(.*)$/ && ($headline = $1))); # one missing "" and
 my formatter dies.
 /:number-of-bytes\s+(\d+)/ && ($bytes = $1);
 /:type "(.*)"/ && ($type = $1);
 /:filename "(.*)"/ && ($sourcename = $1);
 /:original-local-id "(.*)"/ && ($docid = $1);
 /:original-local-id (\(:any.*\))/ && ($docid = &anytostring($1));
 /:original-database "(.*)"/ && ($database = $1);
 /:original-database (\(:any.*\))/ && ($database = &anytostring($1));
 /:date "(\d+)"/ && ($date = $1, &docdone);
 }
 }
 printf OUT 
"\n______________________________________________________________________\n\n";
 close(TMP);
 close(OUT);
 $relevant = ''; @reldocs = '';

# unlink($tmpfile);
}

##############################################################################
sub doretrieve {
 $port = "0";
 $_ = $docid;
 s/^DocID: //g;
 if (/%/) {
 $docid = $`;
 $type = $';
 #print "in doretrieve type = :$type:...\n";
 }
 $_ = $docid;
 /:/ && ($id = $`) && ($db = $');
 /@/ && ($_ = $`) && (/:/) && ($id = $`) && ($db = $');
 $_ = $docid;
 /@/ && ($_ = $') && (/:/) && ($host = $`) && ($port = $');
 open(SRC, ">/tmp/$sfile");
 printf SRC "(:source :version 3 \n";
 printf SRC " :database-name \"$db\"\n";
 if ($port != 0) {
 printf SRC " :ip-name \"$host\" :tcp-port $port\n";
 }
 printf SRC ")\n";
 close(SRC);
 open(TMP, ">$tmpfile");
 printf TMP "(:question :version 2 :result-documents \n";
 printf TMP " ( (:document-id :document (:document :doc-id\n";
 printf TMP " (:doc-id :original-database %s\n", &stringtoany($db);
 printf TMP " :original-local-id %s )\n", &stringtoany($id);
 printf TMP " :number-of-bytes -1 :type \"$type\"\n";
 printf TMP " :source (:source-id :filename \"$sfile\") ) ) ) )\n";
 close(TMP);
 $timestamp = &date() . " " . &time() . ":";
 &log_request("WAISSEND:\"$docid%%$type\" to @inetaddr\n");
 open(OUT, ">>$outfile");
# printf OUT "______________________________________________________________________\n" if ! ($type=~/GIF/);
 close(OUT);
 $docid = $docid."%".$type;
 if ($type eq "" || $type eq "TEXT" || $type eq " TEXT" || $type eq "WSRC" ||
 $type eq "GIF" || $type eq "HTML" || $type eq "html") {
 $exres = system("$waisq -s /tmp/ -f $tmpfile -v 1 >> $outfile 2> $errfile");
 }
 else {
 $exres = system("($waisq -s /tmp/ -f $tmpfile -v 1 | uuencode WAIS.res >> 
 $outfile) 2> $errfile");
 }
 unlink("/tmp/$sfile");
 open(OUT, ">>$outfile");
 open(ERR, "$errfile");
 while (<ERR>) {
 if (/Missing DocID in request|Could not find Source/) {
 s/done//g;
 printf OUT "Error getting document:\n $_\n";
 printf OUT "(This is usually a bad DocID,\n or the server has deleted the 
 document since you ran the search)\n";

 $timestamp = &date() . " " . &time() . ":";
 &log_request("Error Sending \"%s\" to @inetaddr: Bad DocID,\n $docid");
 }
 }
 close(ERR);
 #unlink($errfile);
# printf OUT "______________________________________________________________________\n" if ! ($type=~/GIF/);
 close(OUT);
}

############################################################################
sub docdone {
 open(SRC, "$waissourcedir/$sourcename");
 while(<SRC>) {
 /:ip-name[ \t]{0,}"(.*)"/ && ($ipname = $1);
 /:database-name[ \t]{0,}"(.*)"/ && ($databasename = $1);
 /:tcp-port[ \t]{0,}"(.*)"/ && ($tcpport = $1);
 /:tcp-port[ \t]{0,}(\d+)/ && ($tcpport = $1);
 /:maintainer[ \t]{0,}"(.*)"/ && ($maintainer = $1);
 }
 close(SRC);
 select(OUT); $num++;
 printf "\nResult #%2d Score:%4d lines:%3d bytes:%7d Date:%6d Type: %s\n", 
 $num, $score, $lines, 
 $bytes, $date, $type;
 printf "Headline: %s\n", $headline;
 printf "DocID: %s:%s", $docid, $database;
 if ($tcpport != 0) { printf "@%s:%d", $ipname, $tcpport; }
 printf "%%$type\n";
 $score = $headline = $lines = $bytes = $type = $date = '';
 select STDERR;
}

############################################################################
# a couple of WAIS utility functions
sub anytostring {
 local($any) = pop(@_);
 $res = '';
 $_ = $any;
 if (/:bytes #\((.*)\)(.*)\)/ && ($string = $1)) {
 @chars = split(' ', $string);
 foreach $c (@chars) {
 $res = $res.sprintf("%c", $c);
 }
 }
 $res;
}

sub stringtoany {
 local($str) = pop(@_);
 $len = length($str);
 $res = sprintf("(:any :size %d :bytes #( ", $len);
 for ($i = 0; $i < $len; $i++) {
 $res = $res.sprintf("%d ", ord(substr($str,$i,1)));
 }
 $res = $res.") )";
 $res;
}

############################################################################
# error logging
sub logerror {
 $timestamp = &date() . " " . &time() . ":";
 open(ELOG,">>$errorlog") || die "can't open error log\n";
 printf ELOG "$timestamp @_\n"; close (ELOG);
 system("echo \"$timestamp @_\n\" | /usr/lib/sendmail $maintainer_address");
}
############################################################################
# date and time functions
sub date {
 local ($when) = `date '+%m/%d/%y'`; chop $when;
 return $when;
}

sub time {
 local ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst);
 ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
 $mon = $mon + 1;
 return sprintf("%02d:%02d:%02d", $hour,$min,$sec);
}

###########################################################################
# general logging
sub log_request {
 local ($request)=@_;
 $timestamp = &date() . " " . &time();
 open (LOG,">>$logfile") || &logerror("can't open logfile: $logfile: $!\n");
 printf LOG 
 "%s::%s.%s.%s.%s::%s",
 $timestamp,
 @inetaddr[0],@inetaddr[1],@inetaddr[2],@inetaddr[3],$request;
 close (LOG);
}

###########################################################################
sub process_link {
 local ($lname,$ltype,$lport,$lpath,$lhost);
 if (-T "$gopher_root/$request/$file") {
 open(LINK,"< $gopher_root/$request/$file") || die "can't open $gopher_root/$request/$file: $!\n";
 while (<LINK>) {
 @L=split("="); chop(@L);#print " @L\n";
 if(/^Name|^Type|^Port|^Path|^Host/) {
 if (/^Name/) {$lname=@L[1]}
 if (/^Type/) {$ltype=@L[1]}
 if (/^Port/) {$lport=@L[1]}
 if (/^Path/) {$lpath=@L[1]}
 if (/^Host/) {$lhost=@L[1]}
 }
 else { return } # funny garbage in link file
 }

 if ($ltype == 0 || $ltype == 1 || $ltype eq "g" ) {
 print "$ltype$lname\t$lpath\t$lhost\t$lport\r\n";
 send(NS,"$ltype$lname\t$lpath\t$lhost\t$lport\r\n",0);
 }
 if ($ltype == 7 && $wais_op) {
 $waissourcedir = "$gopher_root/@tmp"; #chop $waissourcedir;

 $ENV{'WAISCOMMONSOURCEDIR'} = $waissourcedir;
 send(NS,"$ltype/$lname\t$ltype::search::$waissourcedir::$lpath::\t$lhost\t$lport\r\n",0);
 }
 }
}
###########################################################################
# validity check
sub valid_request {
 $request=$_;chop $request;chop $request;
 substr($request,0,1)="";
 if ($request=~/\.\./) { return 0;}
 else { return 1}
}

##############################################################################
# registration functions
sub register {
 $c="./.sgsc";
 if ( ! -r $c) { &cop();&reg();&sen();return}
 if (&bf() == 1) {&sen(); &cop(); &reg();&sen();return}
}

sub cop {
 print"
 sGs.pl

 (C) COPYRIGHT
 1993 Bob Kaehms
 cames@well.com

This software is provided free, AS IS, and neither the author, nor any person
or entity associated with the author in producing this software is responsible
for the condition of the software, its use, or any damage to a computer or
the information therein, from using this software.

In short, LET THE USER BEWARE. If you plan on running this software you should
be familiar with TCP/IP and network security.

You may do what you want with the program as long as the original copyright
and the following notice remain attached.

 Press <RETURN> when you've read and accept the above caveat";
 $OK=<STDIN>; system("clear");

 print"

 A. BECAUSE THE PROGRAM IS AVAILABLE FREE OF CHARGE, THERE IS NO WARRANTY
 FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
 OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDER AND/OR OTHER PARTIES
 PROVIDE THE PROGRAM \"AS IS\" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
 OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
 TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
 PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF NECESSARY SERVICING,
 REPAIR OR CORRECTION.

 B. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
 WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR

 REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
 INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
 ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT
 NOT LIMITED TO THE LOSS OF DATA BEING RENDERED INACCURATE OR
 LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO
 OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS
 BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

 Press <RETURN> if you've read and accept the above; otherwise press Ctrl-C";
 $OK=<STDIN>; system("clear");
}

sub reg {
 $hdr="/tmp/HDR";
 local($d) = `date`;
 local($h) = `uname -a`;local($da)=`domainname`;local ($w)=`whoami`;
 open(H,"> $hdr");printf H "From: $w@$hTo: $maintainer_address\nSubject: Gopher Registration\n\n";close H;
 print "Please enter the following:\n\n";
 print "NAME COMPANY PHONE e-mail address\n";
 $user_contact = <STDIN>;
 print "thanks.....\n";
 open(F,">$c");
printf F " sGs.pl\n $revision\n $d HARDWARE\n $h $da $w CONTACT PERSON\n $user_contact";
 close F;
}

sub sen {
 system("cat $hdr $c | /usr/lib/sendmail '$maintainer_address'");
}

###########################################################################
# special check to see if we've registered once
#
sub bf {
 local ($h) = `uname -a`;local ($da)=`domainname`;local ($w)=`whoami`;
 $/ = ""; #Enable paragraph mode
 open (F,"<$c");
 while (<F>){
 if (/$h/ && /$da/ && /$w/){$/ = "\n";close F;return "0";}
 }
 close F;$/ = "\n";return "1";
}

Special Issue, 1994
Building an Internet Global Phone


Two-way voice over the Internet




Sing Li


Sing, a products architect with microWonders in Toronto, specializes in GUI
portability, UNIX systems programming, digital-convergence technologies, and
device drivers. You can contact him on CompuServe at 70214,3466.


The Internet connects millions of computers worldwide for the exchange of
information. For the most part, this information is text data, typically in
the form of e-mail. However, digitized data such as voice, photos, and video
can also be exchanged over the same medium, although bandwidth limitations,
the lack of global standards, and the scarcity of appropriate software have
generally stood in the way. 
Voice communication in particular is an important key to achieving a true
global information highway. While English is the primary language used to
exchange information over the Internet, it is not the first language of
millions of Internet users. Of course, hardware and software systems for
translating Arabic, Far Eastern languages, and others can be built, but doing
so is currently prohibitively expensive. Direct voice communications can
overcome this barrier.
In this article, I'll present the "Internet Global Phone" (IGP), an
application that enables two-way voice communication across the Internet.
Although Windows-hosted, IGP is compatible with similar utilities available
for UNIX workstations. IGP, written in Visual C++ 1.5 using the Microsoft
Foundation Classes (MFC 2.5), is implemented via Windows Sockets 1.1 and
allows LAN communications over most TCP/IP stacks, including Windows for
Workgroups, Novell LAN WorkPlace, Windows NT, and the like. It performs
real-time, on-the-fly voice compression on 80486 processors, delivering
serviceable voice communications over SLIP or PPP links at speeds as low as
14,400 bps. With its reasonably low bandwidth requirements, IGP is a good
Internet citizen and not a resource hog. 


IGP Design and Implementation


A number of widely circulated voice applications are available for UNIX
workstations, most of which are implemented using Berkeley Sockets. All follow
similar design architectures; see Figure 1. 
Defining the problem in terms of client and server components greatly
simplifies the design and coding tasks. Example 1(a) is the generic pseudocode
for a client, while Example 1(b) is the generic pseudocode for the server. 
Coding falls out directly from the design, which relies heavily on access to
built-in hardware DSP, the high processing speed of most UNIX workstations,
the preemptive multitasking nature of UNIX, and the ease of use of Berkeley
Sockets. In short, it is a simple yet functional design.
I began coding IGP with Visual Basic. Using an assortment of VBX components,
voice input is digitized through the high-level Media Control Interface (MCI),
then sent through Windows Sockets to the destination. The destination receives
the data through Windows Sockets and then plays it back via MCI. 
I encountered a couple of problems with this approach. For one thing, the huge
size of voice files created by MCI clogged up all available bandwidth over
slow links. I solved this problem by incorporating compression into the
software. A second problem involved synchronization when receiving data while
at the same time recording/sending information. It quickly became obvious that
the serial procedure of recording-compression-transmission took too much time.
Even with real-time compression algorithms, it took more than three times the
recording duration before the voice was received at the other end. Figure 2
describes these time delays. CPU time hogging also occurred during the
compression process, which ultimately led to the user interface being unusable
for various periods.
Because of these low-level problems, I made a number of implementation
changes. For one thing, I switched the development platform from Visual Basic
to C/C++. Next, I began calling low-level audio routines instead of using the
MCI. Finally, I directly programmed the Windows Sockets instead of using VBXs.
In short, the event-driven, non-preemptive nature of Windows precluded the use
of the elegantly simple UNIX-based design. (Building the implementation in
Windows is more akin to embedded-systems programming than to designing
high-level applications.)
All user code execution within Windows applications is in response to specific
operating-system events or messages. Since Windows is not a preemptive
multitasking system, servicing the messages holds up the entire Windows
session. This is analogous to handling hardware interrupts with all interrupts
disabled. If the application does not have other real-time messages to handle,
the occasional hogging of the CPU may not be a major problem. Unfortunately in
IGP, there are plenty of real-time events to handle.
IGP needs to be able to receive messages while recording, during compression,
and while users are running other applications (background ftp, for instance).
This means that IGP must do as little as possible between processing messages.
Protocol management and audio-compression algorithms expressed in a linear
fashion must be segmented to get incremental work done between successive
messages.
Finally, the Windows cooperative multitasker does not keep state information
for logically different threads of execution. Therefore, the application must
itself carry state information for each logical thread: recording/buffer
management, compression, protocol management, decompression, and
playback/buffer management.
Conventional wisdom says that any naturally multithreaded application can be
coded within a single thread/task via a huge switch mechanism and some real
ugly coding. For the sake of implementability and efficiency, IGP is
programmed in this less-than-elegant fashion.


Real-Time Voice Handling


Using the low-level audio routines in Windows involves the calls in Table 1.
The application allocates buffers of memory (via GlobalAlloc), initializes
them (via waveInPrepareHeader), and feeds them to the waveIn digitizer device
(via waveInAddBuffer). As digitized audio becomes available after a
waveInStart command, the system sends a message (MM_WIM_DATA) to a specified
window with the currently filled buffer of audio. The application makes sure
that the digitizer has additional memory buffers to work with. If buffers are
not supplied to the digitizer in time, there will be dropouts in the resulting
audio capture. The application also has the responsibility of deinitializing
(waveInUnprepareHeader) and freeing all allocated memory buffers.
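The buffer cycle above can be simulated without any Windows calls (names here are illustrative only): the application queues buffers to the digitizer, each data-ready event consumes one, and if the application fails to requeue buffers promptly, audio is lost.

```cpp
#include <cassert>
#include <deque>

// Pure simulation of the waveIn buffer cycle: queue buffers, consume one per
// data-ready event, and count a dropout whenever the queue runs dry.
struct Digitizer {
    std::deque<int> queued;   // buffer ids the device may fill next
    int dropouts = 0;

    void addBuffer(int id) { queued.push_back(id); }

    // One block of audio is ready: hand back a filled buffer, or lose audio.
    int onDataReady() {
        if (queued.empty()) { ++dropouts; return -1; }
        int id = queued.front();
        queued.pop_front();
        return id;
    }
};
```

As long as every consumed buffer is requeued before the next event, capture is gapless; starve the queue once and a dropout is recorded, mirroring the behavior the text warns about.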
Programming the audio-playback hardware is similar to programming audio input.
Buffers are allocated (via GlobalAlloc), filled with audio data, initialized
(via waveOutPrepareHeader), and sent to the playback hardware (waveOut device)
via the waveOutWrite call. When the system is finished playing back a buffer,
a message (MM_WOM_DONE) is sent to the application-specified window that used
the audio-data block. The application must then deinitialize the buffer
(waveOutUnprepareHeader) and free the memory.
There is one caveat: Although most audio digitization and playback takes place
via the DMA/interrupt mechanism of the audio hardware, some audio drivers
still choke the playback if too much foreground time is consumed. Therefore,
take care not to work the Windows message pump too hard during audio-playback
periods. 
In Windows 3.1, the only currently documented and universally supported audio
format is WAVE_FORMAT_PCM. This format is somewhat of an anomaly in that 8-
and 16-bit samples are stored inconsistently. The 8-bit data is an unsigned
linear sample, and the 16-bit data is a signed sample; see Figure 3.
Consequently, you can compile the IGP source either with the HI_FIDELITY
switch set for 16-bit sound cards (#define HI_FIDELITY), or with the #define
commented out for 8-bit sound cards.
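The inconsistency is easy to handle once spelled out: 8-bit samples are unsigned with 128 as silence, 16-bit samples are signed with 0 as silence, so converting between them is a recenter-and-scale. A minimal helper pair (not from the IGP source) makes this concrete:

```cpp
#include <cassert>
#include <cstdint>

// WAVE_FORMAT_PCM: 8-bit samples are unsigned linear (128 = silence),
// 16-bit samples are signed (0 = silence). Recenter and scale to convert.
int16_t pcm8to16(uint8_t s) {
    return static_cast<int16_t>((static_cast<int>(s) - 128) << 8);
}

uint8_t pcm16to8(int16_t s) {
    return static_cast<uint8_t>((s >> 8) + 128);   // keep high byte, re-bias
}
```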
Ideally, the voice-compression algorithm should be able to execute in real
time. This will allow on-the-fly compression and minimize the delay between
recording and actual playback at the receiving end. 
Standard waveform-coder (compression) technologies such as adaptive
differential pulse code modulation (ADPCM) produce high bit rates and thus
require high-speed links. Using mathematical models to simulate the
human-voice production system, technologies based on voice coders/decoders
(vocoders) can provide acceptable speech at reasonable bit-stream rates.
However, vocoders are usually very computationally intensive.
Most available vocoder algorithms are designed for DSP-based implementations
rather than for general-purpose processors. However, Jutta Degener and Carsten
Bormann of TU Berlin have developed a 32-bit C implementation of the European
GSM 06.10 speech-compression standard (prI-ETS 300 036), which is what IGP
uses. Consequently, compression on a 33-MHz 486 CPU is very close to real time.
Windows makes it necessary to recompile the algorithm as 16-bit code. The
current coding has some round-off artifacts, but the resulting sound is
intelligible and satisfactory for our purposes.
The actual GSM algorithm compresses blocks of 160 samples (13 bits) at 8-kHz
sampling into 260 bits. Degener and Bormann's implementation takes 160 16-bit,
linear 8-kHz samples and compresses them into a 33-byte frame, generating a
bit-stream rate of 13 Kbits per second after compression. This bit rate is
close to the raw throughput of 14,400-bps modems and well within the
capability of the new crop of V.34 (V.Fast) modems operating at 28,800 bps.
Using GSM will allow even slow-link (modem-connect) users of the Internet to
participate in IGP activities.
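The numbers above check out with simple arithmetic: 160 samples at 8 kHz is a 20-ms frame, 50 frames fit in a second, and 33 bytes per frame gives roughly 13 Kbits per second.

```cpp
#include <cassert>

// Back-of-envelope check of the GSM 06.10 figures quoted in the text.
constexpr int kSampleRate   = 8000;   // samples per second
constexpr int kFrameSamples = 160;    // samples per GSM frame (20 ms)
constexpr int kFrameBytes   = 33;     // compressed frame size

constexpr int framesPerSecond() { return kSampleRate / kFrameSamples; }
constexpr int compressedBps()   { return framesPerSecond() * kFrameBytes * 8; }
constexpr int rawBps(int bitsPerSample) { return kSampleRate * bitsPerSample; }
```

The 13,200-bps result against a 128,000-bps raw 16-bit stream is nearly a 10:1 reduction, which is why modem-connected users can participate at all.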


The Protocol


A protocol is necessary to coordinate the transmission and receipt of voice
and status information. For IGP, I selected a protocol compatible with current
UNIX utilities, allowing PC users to talk to the large community of UNIX users
on the Internet.
This protocol is simple and extensible, as Table 2 illustrates. Both sides
take turns being the initiator and receiver. Note that the initiator and
receiver are totally independent of the role of client and server. This can be
confusing since the different roles are all played by separate, logical
threads of the same process or task.
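The lock-step exchange in Table 2 can be sketched as a sequence of alternating messages: the receiver acknowledges each field (type, length, data) before the initiator sends the next one. The code below is illustrative only; strings stand in for wire messages.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simulate the Table 2 handshake: open, then ack/field pairs, then final ack.
std::vector<std::string> runExchange() {
    std::vector<std::string> wire;
    wire.push_back("open");                      // initiator opens the link
    for (std::string f : {"type", "length", "data"}) {
        wire.push_back("ack");                   // receiver: go ahead
        wire.push_back(f);                       // initiator: next field
    }
    wire.push_back("ack");                       // final ack; both terminate
    return wire;
}
```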
Figure 4 shows the logical threads of operation at a higher level. The model is
for all threads to perform their tasks concurrently. During actual
implementation, however, there are obvious difficulties in realizing the model
fully. The half-duplex nature of Windows audio drivers will not allow audio
playback during recording. More than one active server thread causes trouble,
since there is no means of simultaneously opening or mixing multiple channels
of digital audio output.


The Winsock Reality



Handling TCP/IP-based client/server configurations on a UNIX system involves
the execution of a daemon (background) tcpd process. This "listening manager"
process listens at various well-known ports for network-client connections.
Once a connection is detected, tcpd consults a services system-configuration
file for the actual server to invoke. It then spawns the necessary server
process and connects the client-server pair with a new socket for
communications. A single tcpd started during system boot will handle tcp
connections for all available system servers.
Naturally, this straightforward, elegant solution has no parallel in the
Windows world. No system-level support for TCP/IP-based client/server
connections exists. So you have to emulate this behavior using
application-level coding.
The equivalent of Berkeley Sockets on Microsoft Windows is Windows Sockets
1.1, a binary-compatible API designed to support TCP/IP protocol stacks from a
variety of vendors. (See "Untangling the Windows Sockets API," by Mike
Calbaum, Frank Porcaro, Mark Ruegsegger, and Bruce Backman, DDJ, February
1993.) While it is a workable standard, programming with the current Winsock
1.1 specifications is not without pain. Due to the underlying operating-system
architecture and the need to be compatible with a variety of TCP/IP tools, the
functions defined in Windows Sockets 1.1 are minimally adequate. Performing
even the most common data transmit or receive requires a substantial amount of
coding.
Until Windows Sockets 2.0 arrives on the scene or Microsoft provides
system-level extensions, Mark Clouden's WSNETWRK library substantially eases
the pain of programming raw Winsock by providing a higher-level abstraction of
the socket paradigm. This implementation of IGP uses WSNETWRK, placing a C++
wrapper around the library that provides the much-needed mapping from events
to virtual functions for all messages relevant to IGP. The resulting
class hierarchy is much cleaner and considerably more maintainable. (For more
on WSNETWRK, see the text box entitled, "The WSNETWRK Library.")


Doing it with Classes


To achieve some extensibility, it is a good idea to organize Winsock coding in
a rational manner. C/C++ and MFC make the job somewhat easier. By carefully
mapping system events and messages to C++ virtual functions, you can make the
coding understandable. Some commercial libraries, notably MFC, use a
macro-based hash table instead of C++ virtual functions to map messages. The
construct preserves the clean nature of the virtual-function concept and
greatly improves execution speed. I avoided this construct, however, since the
events and messages that IGP deals with occur infrequently compared to the
core Windows messages that MFC handles. The overhead of a formal virtual
function call is negligible in this case.
The new model is based on an abstract CSocketOwner class; see Figure 5. As an
abstract class, CSocketOwner cannot be instantiated. This class is
derived from CObject and is designated as the base class of any object that
may contain and/or own a socket. CSocketOwner contains member functions that
have an almost one-to-one mapping to all required functions in Winsock 1.1. It
also contains the default implementation of the Notify and Callback functions.
These functions intercept the messages from WSNETWRK and map them into
corresponding virtual functions OnConnected, OnSendCompleted,
OnReceiveCompleted, OnTimerExpired, and OnDisconnected.
From CSocketOwner, you derive three instantiable classes, CSockListenServer,
CSockServer, and CSockClient. CSockListenServer, or a class based on it, has
the basic functionality of tcpd in UNIX and can be asked to monitor at a
certain port. Once a client connection is received, CSockListenServer will
dynamically create a specific CSockServer object and create a new socket for
the client/server pair. After this, CSockListenServer returns to monitor the
port until it is explicitly terminated.
The CSockServer class contains the common data structures and handles the
common functions of a socket server, independent of protocol. The CSockClient
class does the same for socket clients. 
Any protocol-specific handling is encapsulated in C<protocol
specific>ListenServer, C<protocol specific>Server, and C<protocol
specific>Client classes. This normalization, however, has not yet been
implemented in IGP. IGP's talksock.cpp and talksock.h modules contain
CTalkSockListenServer, which is derived from CSockListenServer. Both the
CTalkSockServer and CTalkSockClient classes are directly derived from
CSocketOwner. Propagating common functions and data up to the CSockServer and
CSockClient classes makes the hierarchy reusable for new protocol
implementations.


IGP Source Code


IGP is developed using Microsoft Visual C++ 1.5. It makes use of Microsoft
Foundation Classes (MFC 2.5) to generate the application framework. The three
source files--mainframe.cpp (Listing One), phonedoc.cpp (Listing Two), and
mphone.cpp (Listing Three)--are typical of code generated by Microsoft C's
AppWizard. An SDI (Single Document Interface) document-view-based structure is
selected during code generation. Phoneview.cpp contains most of the audio
handling and user-interface management logic. Talksock.cpp contains the
Winsock and protocol-handling components. IGP uses the medium memory model.
The library file GSM.LIB is a 16-bit version of Degener and Bormann's GSM
compression module and is available electronically; see "Availability," page
3. The original source-code distribution is included as GSMSRC.ZIP.
SOCKLIB.LIB consolidates Mark Clouden's WSNETWRK.LIB and the low-level
CSocketOwner and CSockListenServer implementations; these are also available
electronically. To run IGP, you will need the included WSNETWRK.DLL and a
WINSOCK.LIB and WINSOCK.DLL from your Winsock vendor to link the IGP source
code.


Building Future Global Phones


As a technology prototype, IGP shows what is possible through a first-cut
functional implementation. There is obviously room for improvement.
First, the user interface could use a face-lift. Management of callee
information, prerecorded-message functions, call taping, call screening, and
the like could all be readily implemented. Error handling could be made more
robust (the current code merely pops up message boxes).
At the engine level, it is possible to implement full packet-based parallel
record-compress-transmit with the current architecture. However, this will
require extensive testing with boundary and timing conditions. Out-of-band
transmission is also necessary to indicate the length of voice segment to the
receiver. Once the parallel functions are in place, new application
areas--such as voice routers or soft-phone switchboards--will open up.
From the server perspective, parallel receive-decompress-play is possible.
However, actual tests over slow links show that synchronization becomes a
major issue here. It is difficult to avoid unnatural breaks between received
chunks. This undesirable effect offsets any perceived performance benefits.
Simultaneous multiclient handling can be added using the available
software-based Wave-Mixer module from Microsoft or third parties. Full-duplex
operation (recording and playback at the same time) is not yet possible
because of current Windows driver limitations. 
Easily adaptable system-level support for audio compression is on the horizon.
Looking further ahead, standardization of the still-video-capture API should
allow transfer of still, JPEG-compressed frames at the beginning and/or end of
audio-segment transmissions. Still, it is important that you not sacrifice the
built-in UNIX compatibility. The existing protocol should be adaptable to
handle still-frame video in a backward-compatible way.
With the arrival of 32-bit multitasking, multithreaded Windows and the
much-needed programmer-oriented enhancements to Winsock beyond the 1.1 specs,
most of the complexities encountered in the design of IGP may disappear. Until
then, IGP provides a workable two-way voice solution over the Internet for
Microsoft Windows.
The WSNETWRK Library
Mark Clouden
Mark is a systems programmer and independent software developer. For more
information or to purchase a copy of WSNETWRK, contact Mark at 2632 South
Miller Drive, #104, Lakewood, CO 80227.
WSNETWRK is a Microsoft Windows DLL that encapsulates and expands upon the
functionality of the Windows Sockets (Winsock) API. I designed the library
while working on the development of several nonblocking, Winsock-based
applications. Tired of redundant coding involving the handling of network
events, I created WSNETWRK as a common base of code that could be maintained
independently of the caller. After nailing down the initial functionality, I
decided to expand the library. Eventually, what started out as a small
collection of standard routines for my own personal use became a full-blown
SDK.
WSNETWRK comprises a core module, providing base-level socket support,
and additional APIs that use this core module to provide Internet
standard-protocol support. The current base library contains over 55 routines,
including standard operations such as send, receive, connect, and accept. Also
included are more high-level functions, such as receives with timeouts, user
timer services, socket memory management, out-of-band data handling, and
connects incorporating host-name look-up.
The WSNETWRK SDK supports ICMP echo, ECHO, and Finger. FTP client and server
support is expected to be available by the time this article goes to print.
Also under development for near-term completion are SMTP, POP3, and NNTP. In
the long run, all protocols defined as "standard" for the Internet will be
included. 
A brief discussion of nonblocking sockets and the FD_WRITE event is a good way
to show how WSNETWRK works. In order to make true use of a nonblocking socket
using Winsock calls, the programmer must change his or her view of how an
event drives the application. If a send() is initiated and cannot be completed
in a single call, the application must be prepared to request and then process
the FD_WRITE event. To process this event, the caller must remember how much
of the buffer has been transmitted so far, and then continue the send with
another call. The caller must also remember how many bytes were to be sent in
order to know when the operation is complete. In most of the applications I
have seen, these difficult steps are avoided, and instead the send is
completed in a single, tight loop, thereby negating most of the benefits of a
nonblocking socket.
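The bookkeeping described above can be sketched generically. This is not WSNETWRK's code; transport_send is a stand-in for a real nonblocking send() that may accept only part of the buffer, and the caller must track the offset and resume on each writable event.

```cpp
#include <cassert>
#include <algorithm>
#include <cstddef>

// State the caller must carry across FD_WRITE-style events for one send.
struct SendState {
    const char* buf;
    std::size_t total;
    std::size_t sent = 0;
    bool done() const { return sent == total; }
};

// Stand-in for a nonblocking send() that takes at most `cap` bytes per call.
std::size_t transport_send(const char*, std::size_t len, std::size_t cap) {
    return std::min(len, cap);
}

// Call on each writable event; resume from the saved offset.
std::size_t on_writable(SendState& s, std::size_t cap) {
    std::size_t n = transport_send(s.buf + s.sent, s.total - s.sent, cap);
    s.sent += n;
    return n;
}
```

Collapsing this into one tight loop, as many applications do, is exactly what defeats the purpose of a nonblocking socket; WSNETWRK keeps the state for you and signals completion with a message instead.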
Using WSNETWRK to send a buffer requires only a single call to WSSend() (or
WSSendTo() for unconnected sockets). WSSend() will return immediately, even
though the data has yet to be sent. When the buffer has been completely
transmitted (or the send failed with an error), then the application will
receive a WSSENDCOMPLETE notification message. To accommodate long operations,
WSNETWRK will periodically send a WSSENDUPDATE event to the caller, detailing
the number of bytes transmitted at that point. This allows the caller to
implement a progress meter or some other user-interface item. An operation in
progress may be canceled at any time using the WSCancelIO() function.
WSNETWRK makes use of Winsock's requirement that all network events be sent to
the window associated with a socket. WSNETWRK internally creates and manages
this window, using it as a method of obtaining the network events. Events
received are then processed according to the current state of the socket. As
WSNETWRK operations complete, the caller is notified through the WSNETWRK
mechanism of a callback function or callback window.
By using the socket's window this way, WSNETWRK can also track when its parent
(the calling application) is closed. This allows WSNETWRK to do some garbage
collection, destroying sockets and other resources owned by the now-closed
application.
Instead of requiring that the caller use the Winsock WSAStartup() and
WSACleanup() functions, WSNETWRK instead tracks how each calling task uses it.
As a task opens and closes sockets, or makes database requests, WSNETWRK will
remember which resources are allocated so that they may be released if the
caller exits without closing them itself.
WSNETWRK can be used instead of or in conjunction with raw Winsock function
calls. Access to a Winsock development kit is not required. The product saves
significant coding time and energy and places no artificial limitations on
programs. WSNETWRK is presently implemented as a C-based DLL. Shortly, all
modules will be available as VBX interfaces, as well as a C++ class library. 
Figure 1 Typical voice/talk application.
Figure 2 IGP two-way voice-transmission time delays.
Figure 3 Wave-format PCM coding.
Figure 4 Logical threads of operation.
Client Thread
 Become Initiator
 Digitize Voice and send voice
Server Thread
 Become Receiver
 Receive voice and play voice
Listener Thread
 Listen on well-known port for connection
 Spin off Server Thread upon connection

Figure 5 Hierarchy for C++ Winsock client/server classes.
Example 1: (a) Pseudocode for the IGP client; (b) pseudocode for the IGP
server.
(a)
while (not end conversation)
 grab a small buffer from built-in dsp
 compress the small buffer
 connect to server tcp socket
 establish protocol parameters
 send the small buffer
 disconnect
end while
(b)
while true
 listen for connection
 if connection then
 establish protocol parameters
 receive a small buffer
 decompress the small buffer
 play the result on built-in dsp
 endif
end while
Table 1: Windows low-level audio routines.
waveInOpen
waveInClose
waveInAddBuffer
waveInPrepareHeader
waveInUnprepareHeader
waveInStart
waveInStop
waveInReset
waveOutOpen
waveOutClose
waveOutPrepareHeader
waveOutUnprepareHeader
waveOutWrite
waveOutReset
Table 2: Protocol states for IGP initiator and receiver.
 Receiver             Initiator
 send ack             open link
 get type of data     wait ack
 send ack             send type of data
 get length of data   wait ack
 send ack             send length of data
 get data itself      wait ack
 send ack             send data itself
 terminate            wait ack
                      terminate

Listing One 

//////////////////////////////////////////////////////////////////////////////
// Internet Global Phone Project
// mainfrm.cpp : implementation of the CMainFrame class
//
// Very little modification to the AppWizard generated MFC skeleton.
//////////////////////////////////////////////////////////////////////////////
// Copyright (c) 1993-1994 microWonders Inc. All rights reserved.
// 
// AN OPEN INVITATION TO BUILD UPON AND CONTRIBUTE TO THE PUBLIC TECHNOLOGY POOL:

// You are encouraged to redistribute, and build upon the technologies 
// presented in this source module and the accompanying article in 
// Dr. Dobb's Journal provided all the conditions listed in the MUSTREAD.TXT 
// file, included with this distribution, are met.
//////////////////////////////////////////////////////////////////////////////

#include "stdafx.h"
#include "mphone.h"

#include "mainfrm.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

/////////////////////////////////////////////////////////////////////////////
// CMainFrame

IMPLEMENT_DYNCREATE(CMainFrame, CFrameWnd)

BEGIN_MESSAGE_MAP(CMainFrame, CFrameWnd)
 //{{AFX_MSG_MAP(CMainFrame)
 // NOTE - the ClassWizard will add and remove mapping macros here.
 // DO NOT EDIT what you see in these blocks of generated code !
 ON_WM_CREATE()
 //}}AFX_MSG_MAP
 // Global help commands
 ON_COMMAND(ID_HELP_INDEX, CFrameWnd::OnHelpIndex)
 ON_COMMAND(ID_HELP_USING, CFrameWnd::OnHelpUsing)
 ON_COMMAND(ID_HELP, CFrameWnd::OnHelp)
 ON_COMMAND(ID_CONTEXT_HELP, CFrameWnd::OnContextHelp)
 ON_COMMAND(ID_DEFAULT_HELP, CFrameWnd::OnHelpIndex)
END_MESSAGE_MAP()

/////////////////////////////////////////////////////////////////////////////
// arrays of IDs used to initialize control bars

// toolbar buttons - IDs are command buttons
static UINT BASED_CODE buttons[] =
{
 // same order as in the bitmap 'toolbar.bmp'
// ID_FILE_NEW,
 ID_FILE_OPEN,
// ID_FILE_SAVE,
// ID_SEPARATOR,
// ID_EDIT_CUT,
// ID_EDIT_COPY,
// ID_EDIT_PASTE,
// ID_SEPARATOR,
// ID_FILE_PRINT,
// ID_APP_ABOUT,
// ID_CONTEXT_HELP, 
 ID_SEPARATOR,
 ID_PHONE_RECORD,
 ID_PHONE_SEND
};

static UINT BASED_CODE indicators[] =
{
 ID_SEPARATOR, // status line indicator
 ID_INDICATOR_CAPS,
 ID_INDICATOR_NUM,
 ID_INDICATOR_SCRL,
};

/////////////////////////////////////////////////////////////////////////////
// CMainFrame construction/destruction

CMainFrame::CMainFrame()
{
 // TODO: add member initialization code here
}

CMainFrame::~CMainFrame()
{
}

int CMainFrame::OnCreate(LPCREATESTRUCT lpCreateStruct)
{
 if (CFrameWnd::OnCreate(lpCreateStruct) == -1)
 return -1;

 if (!m_wndToolBar.Create(this) ||
 !m_wndToolBar.LoadBitmap(IDR_MAINFRAME) ||
 !m_wndToolBar.SetButtons(buttons,
 sizeof(buttons)/sizeof(UINT)))
 {
 TRACE("Failed to create toolbar\n");
 return -1; // fail to create
 }

 if (!m_wndStatusBar.Create(this) ||
 !m_wndStatusBar.SetIndicators(indicators,
 sizeof(indicators)/sizeof(UINT)))
 {
 TRACE("Failed to create status bar\n");
 return -1; // fail to create
 }

 return 0;
}

/////////////////////////////////////////////////////////////////////////////
// CMainFrame diagnostics

#ifdef _DEBUG
void CMainFrame::AssertValid() const
{
 CFrameWnd::AssertValid();
}

void CMainFrame::Dump(CDumpContext& dc) const
{
 CFrameWnd::Dump(dc);
}

#endif //_DEBUG


/////////////////////////////////////////////////////////////////////////////
// CMainFrame message handlers




Listing Two

// phonedoc.cpp : implementation of the CPhoneDoc class
//
// no change from MFC generated skeleton
//
#include "stdafx.h"
#include "mphone.h"

#include "phonedoc.h"

#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

/////////////////////////////////////////////////////////////////////////////
// CPhoneDoc

IMPLEMENT_DYNCREATE(CPhoneDoc, CDocument)

BEGIN_MESSAGE_MAP(CPhoneDoc, CDocument)
 //{{AFX_MSG_MAP(CPhoneDoc)
 // NOTE - the ClassWizard will add and remove mapping macros here.
 // DO NOT EDIT what you see in these blocks of generated code!
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

BEGIN_DISPATCH_MAP(CPhoneDoc, CDocument)
 //{{AFX_DISPATCH_MAP(CPhoneDoc)
 // NOTE - the ClassWizard will add and remove mapping macros here.
 // DO NOT EDIT what you see in these blocks of generated code!
 //}}AFX_DISPATCH_MAP
END_DISPATCH_MAP()

/////////////////////////////////////////////////////////////////////////////
// CPhoneDoc construction/destruction
CPhoneDoc::CPhoneDoc()
{
}
CPhoneDoc::~CPhoneDoc()
{
}
BOOL CPhoneDoc::OnNewDocument()
{
 if (!CDocument::OnNewDocument())
 return FALSE;

 // TODO: add reinitialization code here
 // (SDI documents will reuse this document)

 return TRUE;

}

/////////////////////////////////////////////////////////////////////////////
// CPhoneDoc serialization

void CPhoneDoc::Serialize(CArchive& ar)
{
 if (ar.IsStoring())
 {
 // TODO: add storing code here
 }
 else
 {
 // TODO: add loading code here
 }
}

/////////////////////////////////////////////////////////////////////////////
// CPhoneDoc diagnostics

#ifdef _DEBUG
void CPhoneDoc::AssertValid() const
{
 CDocument::AssertValid();
}

void CPhoneDoc::Dump(CDumpContext& dc) const
{
 CDocument::Dump(dc);
}
#endif //_DEBUG

/////////////////////////////////////////////////////////////////////////////
// CPhoneDoc commands
//////////////////////////////////////////////////////////////////////////////
// Internet Global Phone Project
// mphone.cpp : implementation file for the IGP application CWinApp class
//
// Very little modification to the AppWizard generated MFC skeleton. Note only
// the mapping of the CWinApp Idle loop to the CPhoneView Idle loop to handle
// the background audio compression/de-compression threads.
//////////////////////////////////////////////////////////////////////////////
// Copyright (c) 1993-1994 microWonders Inc. All rights reserved.
// 
// AN OPEN INVITATION TO BUILD UPON AND CONTRIBUTE TO THE PUBLIC TECHNOLOGY POOL:
// You are encouraged to redistribute, and build upon the technologies 
// presented in this source module and the accompanying article in 
// Dr. Dobb's Journal provided all the conditions listed in the MUSTREAD.TXT 
// file, included with this distribution, are met.
//////////////////////////////////////////////////////////////////////////////


#include "stdafx.h" 
#include "mmsystem.h"
#include "mphone.h"

#include "mainfrm.h"
#include "phonedoc.h"
#include "wsmin.h"

#include "socket.h" 
#include "talksock.h"
#include "phonevw.h"


#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif

/////////////////////////////////////////////////////////////////////////////
// CPhoneApp

BEGIN_MESSAGE_MAP(CPhoneApp, CWinApp)
 //{{AFX_MSG_MAP(CPhoneApp)
 ON_COMMAND(ID_APP_ABOUT, OnAppAbout)
 // NOTE - the ClassWizard will add and remove mapping macros here.
 // DO NOT EDIT what you see in these blocks of generated code!
 //}}AFX_MSG_MAP
 // Standard file based document commands
 ON_COMMAND(ID_FILE_NEW, CWinApp::OnFileNew)
 ON_COMMAND(ID_FILE_OPEN, CWinApp::OnFileOpen)
END_MESSAGE_MAP()

/////////////////////////////////////////////////////////////////////////////
// CPhoneApp construction

CPhoneApp::CPhoneApp()
{
 // TODO: add construction code here,
 // Place all significant initialization in InitInstance
}

/////////////////////////////////////////////////////////////////////////////
// The one and only CPhoneApp object

CPhoneApp NEAR theApp;

/////////////////////////////////////////////////////////////////////////////
// CPhoneApp initialization

BOOL CPhoneApp::InitInstance()
{

 // Standard initialization
 // If you are not using these features and wish to reduce the size
 // of your final executable, you should remove from the following
 // the specific initialization routines you do not need.

 SetDialogBkColor(); // Set dialog background color to gray
 LoadStdProfileSettings(); // Load standard INI file options 
 // (including MRU)

 // Register the application's document templates. Document templates
 // serve as the connection between documents, frame windows and views.

 CSingleDocTemplate* pDocTemplate;
 pDocTemplate = new CSingleDocTemplate(
 IDR_MAINFRAME,
 RUNTIME_CLASS(CPhoneDoc),
 RUNTIME_CLASS(CMainFrame), // main SDI frame window
 RUNTIME_CLASS(CPhoneView));
 AddDocTemplate(pDocTemplate);

 // create a new (empty) document
 OnFileNew();

 if (m_lpCmdLine[0] != '\0')
 {
 // TODO: add command line processing here
 }


 return TRUE;
}

/////////////////////////////////////////////////////////////////////////////
// Idle Processing for Compression and Decompression

BOOL CPhoneApp::OnIdle(LONG lCount)
{
 BOOL bMore = CWinApp::OnIdle(lCount);
 // do compression or decompression every moment we have 
 if (m_OnlyView)
 {
 bMore = ((CPhoneView *)(m_OnlyView))->DoIdleProcessing(); // neat!
 }
 return bMore;
 // return TRUE as long as there are more idle tasks

}



/////////////////////////////////////////////////////////////////////////////
// CAboutDlg dialog used for App About

class CAboutDlg : public CDialog
{
public:
 CAboutDlg();

// Dialog Data
 //{{AFX_DATA(CAboutDlg)
 enum { IDD = IDD_ABOUTBOX };
 //}}AFX_DATA

// Implementation
protected:
 virtual void DoDataExchange(CDataExchange* pDX); // DDX/DDV support
 //{{AFX_MSG(CAboutDlg)
 // No message handlers
 //}}AFX_MSG
 DECLARE_MESSAGE_MAP()
};

CAboutDlg::CAboutDlg() : CDialog(CAboutDlg::IDD)
{

 //{{AFX_DATA_INIT(CAboutDlg)
 //}}AFX_DATA_INIT
}

void CAboutDlg::DoDataExchange(CDataExchange* pDX)
{
 CDialog::DoDataExchange(pDX);
 //{{AFX_DATA_MAP(CAboutDlg)
 //}}AFX_DATA_MAP
}

BEGIN_MESSAGE_MAP(CAboutDlg, CDialog)
 //{{AFX_MSG_MAP(CAboutDlg)
 // No message handlers
 //}}AFX_MSG_MAP
END_MESSAGE_MAP()

// App command to run the dialog
void CPhoneApp::OnAppAbout()
{
 CAboutDlg aboutDlg;
 aboutDlg.DoModal();
}

/////////////////////////////////////////////////////////////////////////////
// CPhoneApp commands


Listing Three 




Special Issue, 1994
Creating Your Own Multiplayer Game Systems


A flexible engine for network-game development




Rahner James and Linus Sphinx


Rahner and Linus are programmers in the Sacramento, California area. They can
be reached on CompuServe at 71450,757, the Channel-D BBS at 916-722-1984, or
voice at 916-722-1939.


Over the years, I've come to believe that one of the greatest challenges
programmers can take on is the design of a computer-based game. When the scope
of the game is limited only by the hardware and your imagination, there can be
no higher peak than creating the ultimate multiplayer game. 
In this article, I'm presenting the development tools you need to build your
own complete, multiplayer game system. At the heart of this development
platform is an "engine" for creating a variety of server-based games such as
adventure games or space-type arcade games.
My focus here is on the design and features of the engine, not on details of
specific game genres. I'll zero in on the game server (as well as its
underlying database), the player nodes, and the terminal software that enables
communication between the server and nodes. The complete game-development
system is available electronically from DDJ (see "Availability," page 3) as
well as from our Channel-D BBS.


Underlying Principles


Games require performance. While average computer users may wait patiently for
dBase to sort a list of customers who have blue eyes and four warts, they
start fuming when their 486/66 isn't fast enough to provide the windshield
glare in "Aces of the Persian Gulf." Performance does require hardware, but
more importantly, performance requires design and attention to those minor
details that most programmers typically don't have to deal with.
Real-time games, whether arcade or incremental, have "turns." In a sequential
game such as Monopoly, a turn is defined by the player upon completion of an
arbitrary set of actions. Real-time games, however, define a turn by some
external time constant, without regard to the desires of the participants;
consequently, the performance of the computer is important. With arcade games,
a turn is generally qualified by the time it takes to show the current state
of the game (video vertical refresh, for example). Incremental games, on the
other hand, are real-time games with an extended, fixed turn time--generally
on the order of seconds. The extended time allows the computer to make more
complex decisions and tends to equalize the differences in the physical
reflexes of the participants.
One purpose of games is to provide the participants with a microcosm of life
in compressed time. (Believe me, no one would want to play my life in real
time; but, if it were provided in 15-minute bursts, maybe someone would stay
awake.) It's been said that "the difference between a toy and a game is that a
game has a goal and a toy does not; therefore life is a toy." In keeping with
that concept, I will diverge from life and just refer to a reality. Life is
absolute; reality is merely perception. We will create realities that can have
goals; the life will be left up to the player.


Overall Structural Design


The game-development system presented here has three major logical structures:
the server, nodes, and terminals. Each is logically independent of the others,
but may reside within the same physical computer. There are direct
communication paths from the server to the nodes and from the nodes to the
terminals. For the sake of security, there is no direct path between nodes or
from the terminals to the server.
The server is at the center of the entire system. Its purpose is to receive
requests from the nodes, prioritize those requests according to some
properties of the reality, and provide responses to the nodes in accordance
with the state of the reality at the end of the turn. The server's purpose
requires it to have complete control over an entire database; therefore, a
majority of the storage resides within the server. The ability of the server
to communicate with the nodes is essential, so consideration must be given to
the message structure and the carrier to facilitate data transactions.
The node is either a gateway to the player's terminal or the source for
actions of the nonhuman players that exist within the reality. As a gateway,
the node receives request packets from the terminal, qualifies those requests,
and possibly sends them to the server. The node provides all the security and
secondary parsing services for the server. To this end, the connection between
the node and server is assumed to be secure and the packets passed are
incorruptible. As a source of nonhuman player actions, the node provides
another reduction in server-CPU overhead. This node service does not
necessarily eliminate the processing of nonhuman actions by the server.
The terminal lets the human player interact with the reality. Unlike most
terminal programs, this one is graphics based, supports animation, and
requires a fast CPU and significant disk storage. The node communicates with
the terminal program with free-format packets that are similar to function
calls made by the node to a terminal "process." Although the terminal program
can reside within a node as a separate task, it more typically is a remote
computer terminal linked through a modem. The packet structure is designed to
ensure data integrity and assumes that the terminal has a large, local
database that can be referenced. The general philosophy, with regard to the
clarity of the physical connection between a remote terminal and the node, is
that the game should not be crippled by the one bad connection. Volatile
request packets are rejected with minimal retries. The terminal provides the
initial syntax/range checking and parsing. Request packets are built and sent
to the node. Because of the abilities and motivations of some gamers, the
terminal is considered unsecure. 


Inside the Server


When we thought about the ideal game server, two major components to consider
were the hardware and the operating system. Although, collectively, we had
experience with almost all microprocessors, microcontrollers, and RISC
processors (and their necessary development tools), we chose the x86 platform.
Currently, the server is a 486/66-MHz with 32 Mbytes of RAM, a 1.2-gigabyte
SCSI drive, a 5-gigabyte tape drive for historical information, and a WORM
drive for terrain information. The three storage peripheral types are on
separate bus masters to eliminate the data/command bandwidth problems we
encountered when they were all on the same adapter.
Although there were several options in regard to the server's operating
system, the one major concern was performance--everything else was
subordinate to raw speed. Among the OSs we considered were NT, MS-DOS, RMX,
and NetWare/386. Since processing performance requires RAM and MS-DOS doesn't
let us access all of it unless we use a DOS extender, DOS hit the road. I have
used RMX only long enough to decide that I don't like it, so it followed DOS.
Pretty pictures and crippled CPU power wouldn't do, so NT was out. Ultimately,
we chose NetWare/386.
As it turns out, NetWare has several advantages as a game server. It can
handle a large amount of RAM. It has excellent communications, not to mention
disk and tape support. It is non-preemptive, so the operating system will get
out of the program's way when needed. It runs applications (NLMs) in Ring 0,
so there isn't any instruction-virtualization nonsense. On the downside,
debugging can be a problem and the user interface is poor.
We used Watcom C and Microsoft's MASM 6.0 assembly language package to build
the system. Watcom C is great for generating NetWare NLMs, 286 executables for
real-mode MS-DOS, and 386 protected-mode executables for DOS extenders. Value
passing between functions was done using registers (as opposed to the stack)
for the sake of performance. Where it mattered, we used pragmas to select the
specific registers used for passing.


General-Database Considerations


The database associated with the game server is large and dynamic. As the game
is played, the database grows, and we had to plan for the database spanning
hard disks and peripherals. As with most databases, there are static and
dynamic portions. Rather than treating everything in some general manner, we
chose access methods geared toward each case. The database can only be
accessed through a single process on the game server, for instance, so
multiple access concerns are virtually eliminated.
The hard-disk-file data is stored three ways: fixed-data files; hashed,
binary-indexed files; and Btrieve. The access method for the database depends
upon which delivers the best performance.
The fixed-data files are those that contain fixed data that can easily be
accessed directly. These are generally configuration files that are read once
and ignored during the operation of the program. Access time and complexity
are not real issues in this case.
The hashed, binary-indexed files contain data in which the index table is
infrequently altered, and no importance is placed upon the access order of
data elements. This method is the fastest we could implement, so most of the
databases use a hashed index. The index for this method is an array of 32-bit
hash entries and associated file offsets in RAM or on disk. The hash is
generated by performing a table-driven, 32-bit CRC on the index tokens for a
record. The hash is then appropriately placed in the index array. When tested
against a dictionary with 450,000 unique words, this hash method had only two
collisions. The index search function (written in assembly) returns a worst
case, noncollision matching entry in less than 8.0 microseconds when tested
with a one-million-record index running on a 486/33. Since binary hashing does
have some performance drawbacks when adding and deleting records, it wasn't
universally applied.
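The article's actual index table and assembly-language search aren't reproduced here, but a table-driven, 32-bit CRC hash of the kind described might be sketched as follows. This assumes the standard CRC-32 polynomial (0xEDB88320, as used by IEEE 802.3); the authors don't say which polynomial they chose.

```cpp
#include <cstdint>
#include <string>

// 256-entry lookup table for the standard (bit-reversed) CRC-32 polynomial.
static uint32_t crc_table[256];
static bool table_ready = false;

static void build_table() {
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[i] = c;
    }
    table_ready = true;
}

// Hash an index token to a 32-bit key: one table lookup per byte.
uint32_t crc32_hash(const std::string& token) {
    if (!table_ready) build_table();
    uint32_t c = 0xFFFFFFFFu;
    for (unsigned char ch : token)
        c = crc_table[(c ^ ch) & 0xFF] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}
```

The resulting 32-bit hashes are stored sorted alongside their file offsets, so a lookup is a binary search over fixed-size entries rather than a string comparison per probe.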
Any database that did not fit within the previous two classes was indexed
using Btrieve. We don't yet know whether this is the highest-performance
index; when we compared several B-tree methods, no particular one emerged as
the clear champ. The comparative performance of B-trees depends upon a variety
of factors such as the content and size of the data, the order that data is
entered, how thrashed the disk is, and the phase of the moon. The arbitrary
nature of the data that will be filed by this program and the NetWare-specific
target led us to Btrieve. If you know of a demonstrably better index than
Btrieve, we would be very interested in hearing from you.


Server Communications



In a network game, there are two communication philosophies: client/server or
web. Initially, we considered a network web, where every node is a peer,
sharing the overall processing load. However, this approach quickly turned
into a headache. The requirement for intercommunication in the web is
considerable. In certain easily attainable scenarios, the horrendous network
traffic caused lost packets, which, in turn, increased the network traffic for
retransmissions. In a web, there is a major synchronization problem.
Additional communication packets are required to make sure every node sticks
to a common time scale. Also, with increased complexity comes an increased
vulnerability to failure. One computer out of one is less likely to go down
than one out of many. A client/server arrangement tends to be self-limiting.
With this in mind, we ultimately settled on the client/server model.
With client/server architectures, communication between nodes and the server
can be a major bottleneck. Currently, network communication is via IPX packets
in a raw 802.3 format on 10-Mbit/sec Ethernet. This yields about 500 Kbytes/sec
gross throughput--adequate for a low-volume system. We'll likely switch to
100-Mbit/sec Ethernet as it becomes stable.
From an installation standpoint, we've eliminated two more layers by not using
LSL. We found a slight performance improvement over LSL/ODI by using the old,
bound IPX.COM driver. Of course, this performance difference may be a function
of the network-interface manufacturer we use. We have not done extensive
multivendor testing to see if this performance enhancement generally holds.
To facilitate the initial access to the server, we have followed Novell's
Service Advertising Protocol (SAP). A game server appears on the network as a
type 0x8900 server. Beyond making a call to NetWare's SAP function, no further
consideration was given to this portion of the server. The communication
socket is whatever NetWare decides to dynamically allocate for our process.
The remote node pings for the server on startup using SAP. SAP delivers the
logical network address and socket number for the node to start a connection.
Except for the initial SAP query, no broadcast packets are used.
The data-transaction protocol we chose is similar to Novell's own NetWare Core
Protocol (NCP). Novell does provide access to its NCP layer, but we decided
that we would need something a little more loosely coupled, and we were less
concerned about network security at this level. By using our own communication
construct, we are not constrained by the connection limit imposed on the
five-user versions of NetWare.
All transactions are initiated by the remote node. The node either requests a
response or transmits a connection-good packet. The requests are processed by
the server and responses are returned upon completion of their execution
within the server. Simple status requests are returned within a millisecond;
turn requests are returned on completion of the next game turn (50
milliseconds to 5 seconds). Because of the extended time required for turn
resolution, the server sends an immediate acknowledgment packet for turn
requests. This allows the node to maintain a tight time-out on turn requests
and resend that packet if the need arises. Even so, the game running on the
server must contend with turn packets not being submitted due to lost packets.
The basic request-response packet is shown in Table 1. 
Although a majority of the request packets have only one request, the
transmission packet may contain multiple requests in the form of "tuples" (an
AppleTalk term that I've shamelessly plagiarized). The length field of the IPX
header is used to determine the number of tuples (that is, to read through
each of the tuples until the length gives out). The format of a single tuple
is listed in Table 2.
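Reading tuples "until the length gives out" amounts to a simple loop over the packet body. The field layout below is hypothetical (the real one is defined in Table 2): a sequence byte, a request-type byte, a 2-byte payload length, then the payload.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical tuple layout; the article's Table 2 defines the real one.
struct Tuple {
    uint8_t seq;                   // sequence number assigned by the node
    uint8_t type;                  // request type
    std::vector<uint8_t> payload;  // request-specific body
};

// Peel tuples off the packet body until the IPX length field gives out.
std::vector<Tuple> parse_tuples(const uint8_t* body, size_t body_len) {
    std::vector<Tuple> out;
    size_t pos = 0;
    while (pos + 4 <= body_len) {
        Tuple t;
        t.seq  = body[pos];
        t.type = body[pos + 1];
        uint16_t plen = static_cast<uint16_t>(body[pos + 2] | (body[pos + 3] << 8));
        if (pos + 4 + plen > body_len) break;  // truncated tuple: stop
        t.payload.assign(body + pos + 4, body + pos + 4 + plen);
        out.push_back(t);
        pos += 4 + plen;
    }
    return out;
}
```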
The request tuples made by the node are numbered sequentially, and the
server's responses are given the same number. The server may answer all
requests from a node with separate response packets or may respond in kind
with a multiple tuple packet. The server does not keep track of the sequence
order; that is the responsibility of the requesting node. If a response has
not been received by the node for some request within the appropriate time,
the node can either resend the command or send a packet-status query request.
If the node resends the command, either an identical or new sequence number
can be used. If the same sequence number is used, the packet-resend bit must
be set to inform the server not to process the packet twice if the original
request is sitting in some process queue. If the packet-resend bit is not set
and the original packet has not been processed, the server will treat the new
packet as a replacement for the old, which is then discarded. If a new
sequence number is used, the node must be careful that those requests do not
have a detrimental effect if both requests were processed.
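The server's duplicate-handling rule can be sketched as a small decision routine. The bit position of the packet-resend flag is an assumption here; the article doesn't say which bit it occupies.

```cpp
#include <cstdint>
#include <set>

const uint8_t FLAG_RESEND = 1 << 0;  // hypothetical bit position

// Server-side duplicate handling: if the resend bit is set and the same
// sequence number is already queued, drop the duplicate; if the bit is
// clear, the new packet replaces the unprocessed original.
struct PendingQueue {
    std::set<uint8_t> queued;

    // Returns true if the packet should be accepted for processing.
    bool accept(uint8_t seq, uint8_t flags) {
        bool already = queued.count(seq) != 0;
        if (already && (flags & FLAG_RESEND))
            return false;        // original still queued: ignore the resend
        queued.insert(seq);      // new packet, or replacement of the old one
        return true;
    }
};
```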
The first request made by a node to the server is for a valid connection
number. This is accomplished by sending a packet to the server with the
connection number field set to 0xFFFF and bit 5 of the flags field set to 1.
Any tuples encapsulated in this packet will be ignored by the server. The node
must set the destination-network address and socket in the IPX header
according to the information found in the server SAP packet. The source socket
must be set to the socket number to which the node will be listening for
responses from the server. If the node's listening address and socket number
are the same as an existing connection, the previous connection will be closed
in the server and a new connection entry will be generated. If multiple
processes are accessing the server from the same node, they must all use
socket numbers that are distinct from the others being used on that physical
node. An establish-connection request does not require any sequence number.
The server will respond with a valid connection number (0-0xFFFD) and bit 5 of
the flags field set to 1. The node may send this request repeatedly until it
receives a good response. If bit 3 of the flags field is set in the response,
the server does not have room for an additional connection.
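The connection handshake described above reduces to a few flag and range checks. This sketch assumes a minimal header of just a connection number and a flags byte; the article's Table 1 defines the full packet.

```cpp
#include <cstdint>

// Flag bits described in the article.
const uint8_t FLAG_CONNECT = 1 << 5;  // bit 5: connection transaction
const uint8_t FLAG_NO_ROOM = 1 << 3;  // bit 3: server table is full

// Hypothetical minimal header; Table 1 defines the real layout.
struct RequestHeader {
    uint16_t connection;  // 0xFFFF = "give me a connection"
    uint8_t  flags;
};

// Build the very first packet a node sends to the server.
RequestHeader make_connect_request() {
    RequestHeader h;
    h.connection = 0xFFFF;    // no connection number yet
    h.flags = FLAG_CONNECT;   // bit 5 marks a connection request
    return h;
}

// Interpret the server's reply: a valid number is 0..0xFFFD with bit 5
// set, unless bit 3 says the server has no room for another connection.
bool connection_granted(const RequestHeader& resp, uint16_t* conn_out) {
    if (!(resp.flags & FLAG_CONNECT)) return false;
    if (resp.flags & FLAG_NO_ROOM) return false;
    if (resp.connection > 0xFFFD) return false;
    *conn_out = resp.connection;
    return true;
}
```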
When the initial connection is established, requests may be made of the server
in any order. The server performs a check of all of its active connections
every five minutes. If any connection has not made a request of the server
within that five-minute period, that connection is closed. It is up to the
node to make simple time requests every minute just to guarantee that the
server does not close the connection.
If a node makes a request using connection number 0xFFFE and bit 5 of the
flags field set, the server will return the connection number currently being
used for that node ID/socket combination. This transaction was implemented to
reduce the damage done if the node fails/reboots/dies, but is able to recover
its composure enough to carry on the connection. Nothing's worse than being
involved in a real-time game when a machine crashes and the player has to
start over--except when everyone has to start over. This transaction should
not be used routinely (preferably only once during the life of a connection),
because it takes more server CPU cycles than a time request or a
connection-status request.


Server Requests


Requests by a node can be directed at the server or a game being played on the
server. Each request type is differentiated numerically by the first byte of
the request body. Each request type has a subtype byte that immediately
follows the first. The request types are listed in Table 3.
A request-type number with bit 7 set is intended for a specific game. The
lower seven bits are used for a game handle to determine for which game the
request has been made. The packet is sent to that game function for
processing. Each game has its own request structures.
No visuals are associated with the request/response transactions. The game
server is only aware of spatial relationships. The time and processing
required to generate graphics and interact with the user are unnecessary for
the operation of the game server. The data required for visual and audio can
easily overwhelm the bandwidth of any communication channel; therefore, those
two components have been left out of game-server requests.


The Node


The application (called "NODE") running on the node would generally be started
from some BBS as a DOOR (an executable that expects to get its user
interaction from a serial port). As a DOOR, we had a choice of accessing the
serial port through a FOSSIL driver or accessing it directly. We chose the
direct-access method, more because the FOSSIL driver did not provide certain
capabilities than for any performance reason.
NODE and the terminal program require a certain amount of multitasking, or at
least a close approximation. Our target for the node and terminal was a
386-based DOS PC primarily because there are a lot of them out there and we
could buy them cheap. Mixing multitasking and DOS can be done in several ways.
While we looked at DesqView (too slow), AMX (too expensive for little gain),
and others, we eventually wrote our own polled-event manager that works fine.
The major operational parts of the event manager are written in assembly
language. Drivers were written for all the communication events that need to
be processed--network, serial I/O, keyboard, and timer. NODE's main() simply
initializes the screen and the event manager; then it calls the event-polling
function, which loops continually, accepting and processing messages from the
event manager. The event poll does not return to main() until the program is
ready to return to DOS.
Upon initialization, NODE attempts to establish a connection with the remote
computer in order to assure that the program is running a current version of
the terminal program (named TERM). If the remote is not running TERM, a simple
text message is sent telling the user to download it, then NODE exits to DOS.
If TERM is running on the remote but is not a current version or is missing
resource files, NODE triggers a download of the current environment. Once
everything is found to be valid, NODE begins its transaction process with
TERM.
To establish a connection, the following steps are taken:
1. NODE begins transmission of a BREAK sequence.
2. TERM begins transmission of a responding BREAK sequence within 250
milliseconds.
3. NODE stops the BREAKing and transmits the message "RYUNODExxxxx". The first
seven bytes (RYUNODE) are the signature and are mandatory, but any other data
may follow terminated by a 0. The full signature string must be received
within one second of the end of the BREAK.
4. TERM stops BREAKing when it receives the first non-BREAK character and
transmits its message "RYUTERMxxxxx" in response. The first seven bytes are
the signature; anything may follow, terminated with a 0. The full response
string must be received within one second of the end of the BREAK.
5. NODE sends the first command for TERM's version and resource status.
6. NODE starts any file transfers that are required.
The BREAK signal was chosen because it exists outside of the 8-bit data set
and generates its own interrupt. By having a signal outside the data stream,
we can resynch a
command-based communication interlock without resorting to artificial data
sequences that can be easily lost in a burst of noise. The BREAK resynching
procedure (steps #1--4) can be used whenever either side finds that it has
lost command track.
Where the server is the id, NODE is the ego of the game. It determines how the
user will communicate with the server. It decides what will be displayed on
the terminal. It handles any security processing, and it performs
communications with the other nodes. 
As mentioned earlier, NODE communicates with the server using request/response
transactions. That same application communicates with the user's terminal
using command/acknowledgment transactions. The request/response transaction is
a loose interaction, where the volatility of the data makes the importance of
lost transactions moot after the turn boundary; therefore, a missing
communiqué is generally shrugged off. Conversely, the command/acknowledgment
transaction views lost packets as a reason for suicide, or at least motivation
for extreme mid-life crisis. The direct link between NODE and TERM complements
the command/acknowledgment because there are no other connections to usurp
control of that data conduit as there are with Ethernet. Additionally, the
cumulative nature of the visuals and the fact that memory buffer references
are passed between NODE and TERM require a qualified data handshake.
The structure of the command sent by NODE is shown in Table 4. The commands
are sent from NODE to TERM in any sequence order. A sequence number will be
accepted only if there is no previous command that contains that sequence
number. A command does not require any acknowledgment before the next command
is sent, but each command must be acknowledged within 50 milliseconds of the
transmission of the last byte of that command. A command can be acknowledged
in three ways: ACK with status, ACK with an indefinite delay, or a command
resend.
An acknowledgment with status has the structure described in Table 5. If the
status byte is 0xFF, the status body will contain a time-out (in system
ticks), during which NODE waits for the actual acknowledgment. Statuses from 1
to 254 can be used to indicate any condition other than complete success in
fulfilling the command request.
An acknowledgment with indefinite delay would have the structure listed in
Table 6. This would tell NODE to wait forever for the command to complete.
Needless to say, if too many of these ACKs are pending, the connection will
require a resynchronization.
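The three acknowledgment forms can be told apart by inspecting the first bytes of the packet. The following C sketch follows Tables 5 and 6; the enum and function names are hypothetical:

```c
enum ack_kind { ACK_OK, ACK_STATUS, ACK_DELAY, ACK_INDEFINITE };

/* pkt[0] = ACK length, pkt[1] = sequence number (bit 7 set), and
   pkt[2] = status byte when the length allows one (Table 5). A length
   of 3 means sequence number plus CRC only: wait forever (Table 6). */
enum ack_kind classify_ack(const unsigned char *pkt)
{
    if (pkt[0] == 3)
        return ACK_INDEFINITE;   /* Table 6: no status byte at all */
    if (pkt[2] == 0)
        return ACK_OK;           /* status 0 is always good */
    if (pkt[2] == 0xFF)
        return ACK_DELAY;        /* status body holds a tick time-out */
    return ACK_STATUS;           /* 1..254: command-dependent status */
}
```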
Because each command/ACK packet starts with a length byte, this length will
only contain values from 3 to 255. This leaves 0, 1, and 2 for special
circumstances. 
0 is a common side effect of the BREAK sequence, so it is ignored by both
sides. 
1 can be sent by either side to force the other to resend all unacknowledged
packets. 
2 has been reserved for future use and is currently ignored.
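A receiver's handling of the leading length byte can be sketched like this (the names are ours, not from the actual source):

```c
enum pkt_action { PKT_IGNORE, PKT_RESEND_ALL, PKT_RESERVED, PKT_NORMAL };

/* Interpret the leading length byte of a command/ACK packet. */
enum pkt_action classify_length(unsigned char len)
{
    switch (len) {
    case 0:  return PKT_IGNORE;      /* side effect of a BREAK sequence */
    case 1:  return PKT_RESEND_ALL;  /* resend all unacknowledged packets */
    case 2:  return PKT_RESERVED;    /* reserved, currently ignored */
    default: return PKT_NORMAL;      /* 3..255: an ordinary packet */
    }
}
```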
If the CRC shows that any packet is incorrect, the receiver sends a resend (1)
request, after any current transmission is completed. This requires the ISR on
the receiver to calculate the CRC as the packet bytes are received. Because
packet length precedes any data, the entire packet does not have to trigger an
event until it has been completely received and qualified. The ISR is also
responsible for keeping track of any ACK time-outs. 
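The article leaves the CRC polynomial unspecified, so for illustration here is a byte-at-a-time update using the common CRC-16/ARC form (polynomial 0x8005, reflected as 0xA001, initial value 0), the kind of routine an ISR can call as each byte arrives:

```c
/* Update a 16-bit CRC with one received byte. Run from the ISR so the
   CRC is complete the moment the last packet byte arrives. CRC-16/ARC
   is our assumption for illustration; the article does not name the
   polynomial actually used. */
unsigned short crc16_update(unsigned short crc, unsigned char byte)
{
    int i;
    crc ^= byte;
    for (i = 0; i < 8; i++)
        crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
    return crc;
}
```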


NODE/TERM Commands


Both NODE and TERM are event-based processes. After they do their respective
initializations, they jump to the event manager, returning only to exit to the
operating system. Commands sent to TERM by NODE are fashioned after a C
function call. By convention, all remote function calls are differentiated by
an RM_ prefix. These "functions" are actually macros that pass data to the
serial-communication routines before releasing to the event manager. The event
manager attaches some task values to that remote call and continues polling
the event queue for event packets to process. This simple, no-frills
multitasking system works well when there is an unknown amount of time between
the command and its response.
The major commands are shown in Table 7. The minor-command numbers depend upon
the major command. The command takes one byte. The subcommand number is also
one byte. No command is processed by NODE until it has been fully received and
qualified. Once that command has been completed by TERM, a completion response
is returned to NODE.
For example, assume you want to move the graphics cursor to the x,y pixel
position 100,230. The program in NODE would have the command line
rv=RM_gotoxy( 100, 230 );. This, in turn, would be expanded by the C compiler
to rv=send_command( CMD_GRAPHICS, RM_GOTOXY, 100, 230 );. The send_command()
function would package and transmit the information in Figure 1 to TERM. 
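That packaging can be sketched in C. The little-endian byte order for the 16-bit arguments and the stubbed-out CRC are our assumptions; compare Figure 1:

```c
#define CMD_GRAPHICS 5    /* major command: graphics primitives */
#define RM_GOTOXY    11   /* minor command, per the worked example */

/* Package a two-argument remote call as laid out in Figure 1. */
int package_command(unsigned char *out, unsigned char seq,
                    unsigned short x, unsigned short y)
{
    out[0] = 9;                          /* command length */
    out[1] = seq;                        /* sequence number, 0-127 */
    out[2] = CMD_GRAPHICS;
    out[3] = RM_GOTOXY;
    out[4] = (unsigned char)(x & 0xFF);  /* X coordinate, low byte */
    out[5] = (unsigned char)(x >> 8);
    out[6] = (unsigned char)(y & 0xFF);  /* Y coordinate, low byte */
    out[7] = (unsigned char)(y >> 8);
    out[8] = out[9] = 0;                 /* 16-bit CRC of bytes 0-7
                                            would be computed here */
    return 10;                           /* bytes written */
}
```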
All remote-memory references are done with 16-bit handles. This allows TERM to
use any form of local-memory management. Actual data transfers between NODE
and TERM are rare, because it is expected that TERM has a complete database of
images, icons, and fonts. If an image does not currently exist in TERM's local
database, a general group image is used as a placeholder until an
image-request command from TERM is serviced by NODE. Although the
command/acknowledgment transaction can go both ways, NODE does not support any
commands except status and query for security reasons.


Implementation



Both NODE and TERM are written with Watcom C and MASM 6.0 assembler. Both
programs are run in protected mode using Rational System's royalty-free DOS4GW
DOS extender. One of the pluses of this environment is that it has tons of
memory. The extender runs on most 386 (and up) machines without a hitch. It
almost supports the DPMI 0.9 specification and has a flat memory model that
directly maps the value of a pointer to any physical address--no
segment-register diddling.
On the downside, the environment doesn't fully support the DPMI 0.9
specification--real-mode callbacks to protected-mode functions are not
supported (DOS4GW supports only the mouse callback). Interrupt-service
routines (ISRs) written to be serviced in protected mode run as slow as
molasses. Certain CPU instructions trigger CPU faults, which cause a fault
manager to be run; this can make STI and CLI instructions take several
microseconds to execute.
Whenever an interrupt occurs while running DOS4GW, the CPU is switched to
real-mode, and the real-mode ISR is run first. After the real-mode ISR has
returned (even if it was just an IRET), the CPU switches to protected mode to
process any protected-mode ISR. The CPU then switches back to whatever mode
was running at the moment of the interrupt. This entire process takes a bit of
time. Our solution was to implement device drivers as .COM programs and
dynamically load them into low memory from files at run time. This required
placing the event manager's memory and certain input functions in low memory
as well. By putting the drivers in low memory (and making them real-mode
drivers), we solved other problems beyond the slow ISR. Because the drivers
and event manager are the major users of privileged instructions (STI, CLI,
OUT, IN), being run in real mode means that those instructions do not generate
a fault. Also, because event-manager input functions are written in real mode,
the mouse and IPX event vectors do not require a real-to-protected-mode
callback. The real-mode drivers support the keyboard, timer, serial ports,
mouse, sound I/O, and network interface. I suppose there could be others, but
what else that generates an interrupt is useful for a game?
The implementation of each of the viewing styles presented to the user is
beyond the scope of this article. Whether the user gets a straight overhead
view, a tilted overhead view, or a straight-on three-dimensional, Doom-style
view is immaterial to the actual inner workings of the game being played.
Whatever display template is implemented, the internal relational data
structures are the same.
The implementation of the low-level graphics engine is important. A large
portion of TERM's CPU time is spent updating the screen and building the
visuals. Some of that time is apparent to the casual observer: all those
images require the movement of a lot of data. Other demands on CPU time are more
obscure. Compared to normal RAM, accessing video memory is like running in
loose sand. One reason is that video memory is on the bus and CPU caches do
not deal with that memory. Another reason is that dual-ported video RAM
(VRAM) will delay a second write to video memory until the previous write has
been completed. We came up with a couple of solutions to reduce the effect
felt by the slow video. Any references made here to video assume a 16-bit VGA
adapter in Mode 13 or Mode X. 
The delay of the CPU by the VRAM between successive writes is significant. To
find out how much, we wrote the test functions in Example 1. All of these
programs executed in the same amount of time. What this shows us is that we
can reduce the amount of time it takes to perform graphics functions in
several ways. First, most video accesses are repetitive manipulations of the
same areas. By creating a virtual video buffer in RAM to redraw the screen at
every video refresh (or timer tick), we reduced the delays associated with
continually accessing the same areas; see Listing One, page 63. We also kept
track of the region in which the updates were made, so only the general
regions that had been updated since the previous video refresh were written.
Each change enhanced the system's performance, but we still had not dealt with
the time wasted between video accesses. We created a jump vector to be called
after every video access. A jump vector would allow small program snippets to
run, getting some use out of the CPU while the video came back on line. We
created a second virtual buffer, which represented the current video state.
Statistically, many of the colors did not change in regions, even though the
images were different. By comparing the update regions between the two virtual
buffers, we eliminated many unnecessary accesses. Performing only 16-bit
writes gained additional time slots to do comparisons. Cycles are still being
lost, but the comparisons tend to take up a bit of that slack.
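The buffer comparison can be sketched like this; the names are illustrative, and the word-at-a-time granularity follows the 16-bit writes described above:

```c
/* Copy only changed 16-bit words from the working buffer to the shadow
   buffer that mirrors the current video state, counting the writes the
   real code would actually send to VRAM. */
int flush_dirty_words(unsigned short *work, unsigned short *shadow,
                      int words)
{
    int i, writes = 0;
    for (i = 0; i < words; i++) {
        if (work[i] != shadow[i]) {  /* the colors actually changed */
            shadow[i] = work[i];     /* real code also writes VRAM here */
            writes++;
        }
    }
    return writes;                   /* unnecessary accesses eliminated */
}
```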


Conclusion


As you might guess, this project is an ongoing effort. Anyone can take part;
your ideas, suggestions for changes, and help are welcome. To participate,
dial up the Channel-D BBS at 916-722-1984, 916-722-1985, and 916-722-7223.
Table 1: Basic request response packet.
Packet Name Description 
00--29 IPX header Information associated with the IPX packet.
00--31 Connection number Number used to define the client/server connection.
30--31 Packet flags Bits used to define the packet that follows.
 B0: 1 indicates that the packet is a heartbeat or an ACK.
 B1: 1 indicates that this is a resent packet.
 B2: 1 indicates that this packet must be ACKed immediately.
 B3: 1 indicates that the connection is being terminated.
 B4: 1 indicates that all enqueued packets should be flushed.
 B5: 1 indicates that the node is establishing a connection.
 B6: 1 indicates that the node is using an invalid connection.
32--i Request tuple 1 Encapsulated request 1.
i+1--j Request tuple 2 Encapsulated request 2.
 ...
y+1--z Request tuple n Encapsulated request n.
Table 2: Format of a single tuple.
 Packet Name Description 
 0--1 Request length Length of request tuple to
 follow, including this length field.
 2--3 Sequence number Number used to differentiate requests
 (0--2047 valid).
 4--n Request body Body of the request.
Table 3: Server request types.
Type Name 
 00 Status or General Information
Covers the status or information for the server, any connection, a player, or
a game.
 01 Control
Allows the node to establish a player and attach to or drop from a game.
 02 Communicate
Allows a connection to communicate with another player or connection. This is
used to send or receive disk- or server-based e-mail. If direct communication
is desired, each receptive node will have listening sockets open in addition
to the socket it uses to communicate with the server. That socket
information is gleaned from a status request.
 03 Query an Entity
Allows the connection to get information on an entity in the main database. An
entity is defined as an existing player or nonplayer thing or object that
currently exists within a reality. This can be a person, place, or thing. The
entity must exist and have a name or a handle.
 04 Query a Species
Allows the connection to get information on a species that exists within the
main database. A species is defined as a template from which entities are
created. Species can be animal, vegetable, or mineral.
Table 4: Structure of the command sent by NODE.
 Byte Name Description 
 0 Command length Length of command from 1 to n+2.
 1 Sequence number Number used to differentiate commands, 0--127.
 2 to n Command body Body of the command.
 n+1 to n+2 CRC 16-bit CRC of bytes 0 through n, inclusive.
Table 5: Structure of acknowledgment with status.
 Byte Name Description 
 0 ACK length Length of packet to follow, from 1 to n+2.
 1 Sequence number Command # ACKing with bit 7 set, 128--255.

 2 Status byte Status for the command, 0 is always good.
 3 to n Status body Additional status bytes that are command dependent.
 n+1 to n+2 CRC 16-bit CRC of bytes 0 through n, inclusive.
Table 6: Structure of acknowledgment with indefinite delay.
 Byte Name Description 
 0 ACK length Length of packet to follow, 3.
 1 Sequence number Command # ACKing with bit 7 set, 128--255.
 2 to 3 CRC 16-bit CRC of bytes 0 and 1.
Table 7: NODE/TERM commands.
 Command Description 
 0 Status.
 1 Query.
 2 Watcom C library access.
 3 Memory management.
 4 Hardware access.
 5 Graphics primitives.
 6 Menuing access.
 7 Window access.
 8 Animation.
 9 Data manipulation.
 10 Database manipulation.
 11 Messages (mail, chat, and sound).
 254 Continuation of previously sent packet.
 255 Extended major command, byte that follows is extended-command number.
Example 1: Three test functions to determine CPU delay.
(a)
 mov ecx, 32000
 mov edi, 0A0000h
 xor eax, eax
 rep stosw
(b)
 mov ecx, 16000
 mov edi, 0A0000h
 xor eax, eax
 rep stosd
(c)
 mov edx, 32000
 mov edi, 0A0000h
 xor eax, eax
 loop10:
 stosw
 dec edx
 jz done
 mov ecx, 20
 loop20:
 add eax, eax
 loop loop20
 jmp loop10
Figure 1: Information transmitted by the send_command() function.
0 9 Command length
1 xx Sequence number, managed by send_command()
2 5 (Graphics primitives)
3 11 (Function RM_GOTOXY)
4-5 100 X coordinate
6-7 230 Y coordinate
8-9 xxxx 16-bit CRC of bytes 0 through 7, inclusive

Listing One 


; ****************************************************************************
; * Title: GKERNEL.ASM
; * Copyright (c) March 1994, Ryu Consulting
; * Written by Rahner James
; * Code to move the virtual graphics buffer to the physical buffer
; * Very important note: These functions are designed for performance, so 
; * very little (if any) range checking is performed on the data unless 
; * the RANGE_CHECK label is defined
; ****************************************************************************

 .386
 .model small, syscall

;RANGE_CHECK equ 0 ; Defined if range checking is enabled
MAX_VIRTUAL_WIDTH equ 320
MAX_VIRTUAL_HEIGHT equ 200
VIDEO_START equ 0A0000h
TIMER_INTERRUPT equ 1CH

.data

Display_Width dd 320
Display_Height dd 200

Virtual_Display_Ptr dd 0 ; -> virtual display buffer
Virtual_Display_Bytes dd 0 ; Size of virtual display in bytes
Virtual_Display_Words dd 0 ; Size of virtual display in 16-bit words
Virtual_Display_Dwords dd 0 ; Size of virtual display in 32-bit words
Virtual_Display_End dd 0 ; End of the virtual display
Lowest_X dd 0
Lowest_Y dd 0
Highest_X dd 0
Highest_Y dd 0

Video_Access_Count dd 0 ; Number of video accesses since refresh
Update_Display_Flag dd 0 ; Set to !0 if the virtual buffer should 
 ; be moved to the display


Line_Start_Table dd MAX_VIRTUAL_HEIGHT dup(0) 
Old_Timer_Vector dd 0,0 ; -> old timer ISR, !0 if installed

Old_Video_Mode dd 0FFh

.code
 extern malloc_:near, free_:near, __GETDS:near

; ****************************************************************************
; * void MOVE_VIRTUAL_TO_DISPLAY( void )
; * Moves the virtual buffer to the display buffer
; * Given: Video_Access_Count = number of accesses made to the video
; * Returns: Virtual buffer copied to the display
; * Lowest_X, Lowest_Y = lowest point in virtual display accessed
; * Highest_X, Highest_Y = highest point in virtual display accessed
; * Video_Access_Count = number of accesses made to the video
; * Update_Display_Flag set to 0
; ****************************************************************************
move_virtual_to_display_ proc uses eax ebx ecx edx edi esi


 cmp Video_Access_Count, 0 ; See if we need to do this
 jz short done ; Quit if not

; * Setup the registers for the movement
 mov esi, Lowest_Y ; ESI = lowest Y value
 mov esi, Line_Start_Table[esi*4] ; ESI -> virtual line start
 mov eax, Lowest_X ; EAX = left X
 and al, not 3
 add esi, eax ; ESI -> upper left pixel in 
 ; virtual buffer
 mov edi, esi
 sub edi, Virtual_Display_Ptr ; EDI = offset from start of table
 add edi, VIDEO_START ; EDI -> upper left pixel in display

 mov edx, Highest_X ; EDX = right most side
 mov ebx, Display_Width
 sub edx, eax ; EDX = width in bytes - 1
 sub ebx, edx ; EBX = display width remainder + 1
 dec ebx
 and bl, not 3 ; Clear the LSBits

 add edx, 4
 shr edx, 2 ; DL = width/4 (will work up to 1023
 ; pixels wide)
 mov eax, Highest_Y
 sub eax, Lowest_Y
 inc eax

; * Loop through the rectangle, putting it down
@@: mov ecx, edx
 rep movsd
 add edi, ebx
 add esi, ebx
 dec eax
 jnz @B

done: mov Lowest_X, -1
 mov Lowest_Y, -1
 mov Highest_X, 0
 mov Highest_Y, 0

 mov Video_Access_Count, 0 
 mov Update_Display_Flag, 0 ; Stop any updates
 ret

move_virtual_to_display_ endp

; ****************************************************************************
; * void far TIMER_ISR( void )
; * Timer tick ISR
; * Given: nothing
; * Returns: 0 if all went well
; ****************************************************************************
timer_isr proc

 push ds
 push es
 pushad


 call __GETDS

 inc byte ptr ds:[0B009Eh]

 popad
 pop es
 pop ds

 iretd

timer_isr endp

; ****************************************************************************
; * int GR_INIT( int EAX )
; * Sets the graphics screen to a mode
; * Given: EAX = mode to set the graphics monitor to
; * 0 = normal EGA, mode 10h, 640x350, 16-color
; * 1 = normal VGA, mode 13h, 320x200, 256-color
; * Returns: 0 if all went well
; ****************************************************************************
gr_init_ proc near uses ebx ecx edx edi

 mov Update_Display_Flag, 0 ; Stop any updates
 mov Video_Access_Count, 0 
 mov Lowest_X, -1
 mov Lowest_Y, -1
 mov Highest_X, 0
 mov Highest_Y, 0

 mov word ptr ds:[0B009Eh], 730h

; * Make sure we are using a fresh memory buffer
 cmp Virtual_Display_Ptr, 0 ; See if already setup
 jz short gr10_init ; Skip if not

 push eax
 xor eax, eax
 xchg eax, Virtual_Display_Ptr
 call free_
 pop eax

gr10_init:
; * Get the old video mode, if need be
 cmp byte ptr Old_Video_Mode, 0FFh ; been here before?
 jne short gr20_init

 push eax
 mov ah, 0fh
 int 10h
 mov byte ptr Old_Video_Mode, al
 pop eax


gr20_init:
; * Revector the timer if we need to
 cmp Old_Timer_Vector, 0 ; timer already installed?
 jnz short gr30_init

 push eax

 mov eax, 204h ; EAX = DPMI Get Protected-mode
 mov bl, TIMER_INTERRUPT ; Interrupt Vector command
 int 31h
 mov Old_Timer_Vector, edx
 mov Old_Timer_Vector+4, ecx
 mov eax, 205h ; EAX = DPMI Set Protected-mode
 mov bl, TIMER_INTERRUPT ; Interrupt Vector command 
 mov cx, cs
 mov edx, offset timer_isr
 int 31h
 pop eax

gr30_init:
 cmp eax, 1 ; See if it's us
 jne derr ; Quit with an error if not

; * VGA mode 13h, 320x200, 256 colors
; * Initialize all variables before memory allocation
 mov Display_Width, 320
 mov Display_Height, 200
 mov Virtual_Display_Bytes, 320*200
 mov Virtual_Display_Words, (320*200)/2
 mov Virtual_Display_Dwords, (320*200)/4
 mov eax, 13h
 int 10h

; * Allocate memory and setup the pointers
gr100_init:
 mov eax, Virtual_Display_Bytes
 call malloc_
 or eax, eax ; See if allocation error
 jz short derr
; mov eax, VIDEO_START ; debugging only: makes the virtual buffer the VRAM
 mov Virtual_Display_Ptr, eax

 mov ecx, Display_Height ; ECX = number of lines
 mov ebx, offset Line_Start_Table ; EBX -> line start table
@@: mov [ebx], eax
 add eax, Display_Width
 add ebx, 4
 loop @B
 mov Virtual_Display_End, eax ; -> the end of the buffer
dood: xor eax, eax ; Clear out the virtual buffer
 mov edi, Virtual_Display_Ptr
 mov ecx, Virtual_Display_Dwords
 rep stosd

 mov Video_Access_Count, 1 
 mov Update_Display_Flag, 1 ; Starts any updates

done: ret
derr: or eax, -1
 jmp done

gr_init_ endp


; ****************************************************************************
; * int GR_STOP( void )

; * Stops all graphics library processing
; * Given: nothing
; * Returns: 0 if all went well
; ****************************************************************************
gr_stop_ proc near uses ebx

 mov Update_Display_Flag, 0 ; Stop any updates
 mov Video_Access_Count, 0 
 mov Lowest_X, -1
 mov Lowest_Y, -1
 mov Highest_X, 0
 mov Highest_Y, 0

; * Free any allocated memory
 xor eax, eax
 xchg Virtual_Display_Ptr, eax ; See if already setup
 or eax, eax
 jz short @F ; Skip if not

 call free_

; * Restore timer interrupt
 cmp Old_Timer_Vector, 0 ; timer already installed?
 jz short @F

 mov edx, Old_Timer_Vector
 mov ecx, Old_Timer_Vector+4
 mov eax, 205h ; EAX = DPMI Set Protected-mode
 mov bl, TIMER_INTERRUPT ; Interrupt Vector command
 int 31h

; * Get the old video mode, if need be
@@: cmp byte ptr Old_Video_Mode, 0FFh ; been here before?
 je short @F

 mov eax, Old_Video_Mode
 int 10h
 mov byte ptr Old_Video_Mode, 0FFh

@@: xor eax, eax
 ret

gr_stop_ endp

; ****************************************************************************
; * int GR_SET_PIXEL( int EAX, int EDX, BYTE BL )
; * Sets a pixel in the virtual buffer
; * Given: EAX = X coordinate
; * EDX = Y coordinate
; * BL = pixel color
; * Returns: EAX = 0 if all went well only if range checking is enabled
; ****************************************************************************
gr_set_pixel_ proc near

ifdef RANGE_CHECK
 cmp Virtual_Display_Ptr, 0 ; See if we are enabled
 jz derr
 cmp Display_Width, eax
 jbe derr

 cmp Display_Height, edx
 jbe derr
endif

 cmp Lowest_X, eax
 jb short @F
 mov Lowest_X, eax
@@: cmp Highest_X, eax
 ja short @F
 mov Highest_X, eax
@@: cmp Lowest_Y, edx
 jb short @F
 mov Lowest_Y, edx
@@: cmp Highest_Y, edx
 ja short @F
 mov Highest_Y, edx
@@:

 mov edx, Line_Start_Table[edx*4] ; EDX -> start of display line
 add edx, eax
 mov [edx], bl ; Put the pixel

 inc Video_Access_Count
 ret

ifdef RANGE_CHECK
derr: or eax, -1
 ret
endif

gr_set_pixel_ endp


; ****************************************************************************
; * int GR_RECT( int ECX, int EAX, int EDX, int ESI, BYTE BL )
; * Set a rectangle to a color
; * Given: ECX,EAX = X1,Y1 of upper left corner
; * EDX,ESI = X2,Y2 of lower right corner, must be > EAX,EDX
; * BL = color
; * Returns: EAX = 0 if all went well only if range checking is enabled
; ****************************************************************************
gr_rect_ proc near

 push edi

ifdef RANGE_CHECK
 cmp Virtual_Display_Ptr, 0 ; See if we are enabled
 jz derr
 cmp Display_Width, ecx
 jbe derr
 cmp Display_Width, edx
 jbe derr
 cmp Display_Height, eax
 jbe derr
 cmp Display_Height, esi
 jbe derr
endif

 cmp Lowest_X, ecx

 jb short @F
 mov Lowest_X, ecx
@@: cmp Highest_X, edx
 ja short @F
 mov Highest_X, edx
@@: cmp Lowest_Y, eax
 jb short @F
 mov Lowest_Y, eax
@@: cmp Highest_Y, esi
 ja short @F
 mov Highest_Y, esi
@@:

 inc Video_Access_Count
 mov edi, Line_Start_Table[eax*4] ; EDI -> start of display line
 add edi, ecx ; EDI -> upperleft corner of screen

 sub esi, eax ; ESI = height - 1
 inc esi

 mov bh, bl ; EAX = the color in all four bytes
 mov eax, ebx
 shl eax, 16
 mov ax, bx

 sub edx, ecx ; EDX = width of the rectangle - 1
 inc edx
 mov ebx, Display_Width ; EBX = width of the display in bytes
 sub ebx, edx ; EBX = wrap around value to add to 
 ; EDI after each STOS
 ror dx, 2 ; DL = width / 4 (will work up to 1023 pixels wide)
 shr dh, 6 ; DH = width % 4 for remainder

 xor ecx, ecx ; Clear the MSWord of ECX

@@: mov cl, dl
 rep stosd
 mov cl, dh
 rep stosb
 add edi, ebx
 dec esi
 jnz @B

 pop edi
ifdef RANGE_CHECK
 xor eax, eax
endif
 ret

ifdef RANGE_CHECK
derr: or eax, -1
 ret
endif

gr_rect_ endp

 end



Special Issue, 1994
Civic Networking with Geographic Information Systems


Public participation in the local decision-making process




Richard Civille and R.E. Sieber


Richard is executive director of the Center for Civic Networking in
Washington, DC. Renée is a PhD candidate in urban planning at Rutgers
University and can be contacted at sieber@zodiac.rutgers.edu.


Civic networking is the use of telecommunications by the general public for
community and economic development, nonprofit service delivery, and citizen
participation in government. Computerized telecommunications such as e-mail
and the Internet expand the possibilities for civic networking.
Rapid development of geographic information systems (GISs) is creating new
ways for the general public to participate in local planning decisions
affecting land and water use, roads, and development. This new level of
maturity makes civic networking with GIS a possibility, characterized by
hardware and software integration, object-oriented programming,
internetworking of distributed spatial data sets, and federal efforts that
require government agencies to deliver spatial data to the public at marginal
cost. These factors will generate demand for new GIS applications that enhance
community-based planning, and can empower small organizations and the general
public in new ways.
As applications mature, GIS will be perceived by its users as a front end
through which, unconcerned with the underlying technology, they can visually
navigate their communities. Increasingly, those users will be private
citizens--not just city officials and staff members of planning agencies.
Public users will navigate their communities using PCs in libraries, schools,
and even at home. Today, municipal GIS applications are typically based on a
single database located on a central host computer. If networked, such
applications are connected through Ethernet-based, local-area networks to
terminals or workstations in the same building. In the near future, such
applications will become increasingly distributed and will be networked
through TCP/IP protocols that can exploit the Internet. Coaxial cable used for
television delivery is simply the physical layer of what can be configured as
a two-way broadband network. It is technically feasible to build
internetworked GIS applications that operate through cable-television systems
and thus become available anywhere in town.
This trend towards linking a common GIS front end to a back end of
internetworked, physically distributed spatial-data sets parallels efforts of
the Federal Geographic Data Committee (FGDC). With the clout of an Executive
Order from the President, this committee, chaired by Secretary of the Interior
Bruce Babbitt, is developing standards and procedures to make the government's
large investment in spatial data widely available to the public over the
Internet beginning in 1995.


What is GIS and How is it Used?


A GIS is a computer system that assembles, stores, manipulates, and displays
data identified according to geographic location. The total system also
includes critical success factors such as trained personnel, support from key
stakeholders, and adequate operating budgets.
A GIS is often built from a set of topological data layers describing land
forms, infrastructure, and property lines. Such maps are often created by
combining vector-based Census TIGER files and USGS digital line graph (DLG)
files with more-detailed local data, collected with the aid of handheld
transmitters linked to geo-positioning satellites. Vector data may then be
integrated with Landsat raster images. The process of transforming maps
conforming to the curvature of the earth to a two-dimensional plane is very
difficult. Also, local topography must often be hand-drawn on digitizing
tablets.
Once work is done for a region, however, it becomes possible to use geographic
coordinates to link tabular data from external relational databases. This
could include U.S. Census Bureau tract data, the National Wildlife Inventory,
effluent-discharge permits from the National Pollutant Discharge Elimination
System (NPDES), county soil maps from the Soil Conservation Service, zoning
districts, aquifers, rail lines, sewage, roads and other infrastructure,
archeological and historical sites, wildlife and heritage trusts, locations of
threatened and endangered species, and so on.
Geographic information systems are widely used. Businesses want to find good
locations for retail outlets. Watchdog groups want to know how public money is
distributed into different neighborhoods in a city. Politicians want to know
where their constituencies are. Public agencies use GIS to manage physical
infrastructures such as roads. Oil companies use it to improve exploration.
GIS has become a powerful tool in managing environmental emergencies, helping
to calculate responses to oil spills and predict impacts of toxic-waste sites.
Public health and safety officials use GIS to map the spread of a disease or
shifting patterns of urban crime. Maintaining accurate, quickly accessible
real-property inventories that include maps and photographs is important to
both tax authorities and real-estate agencies. And, of course, GIS is
increasingly recognized as a tool to provide urban and regional planners with
rapid access to spatial data. The list is endless--but the general public is
not often regarded as a class of user for whom to develop such applications.


How Has GIS Changed?


Mapping software is rapidly following trends familiar from word processing,
spreadsheets, desktop publishing, and database-development systems. These
trends are hardware and software integration, improvements in
application-development tools, networking capability and, in the case of GIS,
new federal policies towards public access to spatial data.
In the past, GIS applications tended to be designed for a particular
purpose--that is, they were likely to be built to serve the needs of one
particular agency. The database was built, managed, and located at that agency
as well. This is inefficient, considering the effort required to build the
base layers. Now, GIS is beginning to provide integrated services across
agencies. Ultimately, a GIS designed for a specific region, such as a county,
should be equally useful to an emergency-response planner, a real-estate
agent, or a neighborhood volunteer crime-watch group.
Because development of the base topological layers is so difficult and labor
intensive, the same system should be accessible simultaneously to the county
planner's office, the Red Cross, and the public library. The Red Cross and the
county planner both may maintain tabular datasets that are of interest to each
other. Both datasets must be visible to a user looking at a county base map on
a computer terminal in another part of town. In this configuration, a general
system that leverages resources across agencies and community organizations
needs to integrate physically distributed datasets across a network
infrastructure. 


Hardware and Software Integration


Powerful GIS packages such as ArcInfo from Environmental Systems Research
Institute (Redlands, CA) and MapInfo (Troy, NY) are available on both Windows
and Macintosh platforms. Both Microsoft and Lotus are developing mapping
engines that will integrate spatial datasets developed on proprietary systems
with their respective Excel and 1-2-3 spreadsheets. 
In May, 1994, the Department of Housing and Urban Development (HUD) began to
distribute a mapping application to all of its field offices to assist
communities in planning proposals for the agency. A suite of Windows and
Macintosh applications built with FoxPro and MapInfo greatly automates a
complicated proposal-application process that requires the generation of
detailed demographic maps using Census data.
The toolkit is distributed with the intent of providing citizens with a copy
of the planning proposal, readable on a standard PC, for public review.
While these applications are not designed to be networked in this initial
release from HUD, both the FoxPro database-development system and the MapInfo
GIS development system are designed to work in networked environments. In the
future, such applications may become increasingly networked, becoming
available to the general public at libraries, schools, and even at home--via
broadband Internet services through cable-television systems.


Improvements in Application Development Tools


Object-oriented programming is making its way into GIS applications. In a
spatial dataset, an object can be defined as a map feature--a bridge, for
example--along with nongeographic attributes such as load stress or road
traffic collected from on-site sensors. Data and instructions associated with
map features have "intelligence" and can act with knowledge under certain
conditions without the programmer writing additional code for each case. The
object can also understand that it needs to link to a remote dataset to access
data under certain conditions, an important requirement in a networked
application. For example, the bridge object may understand that it needs to
update traffic data periodically from a remote sensor embedded in the road.
Objects can be arranged hierarchically into classes, where subclasses inherit
attributes associated with their parent class. Software for GIS that takes
advantage of this important characteristic could further reduce application-
and database-development time. The Argus mapping package from Munro Garrett
International (Calgary, AB) is a Windows-based client/server package that
employs an object-oriented approach that can link clients simultaneously to
multiple databases. This approach reduces the need for a proprietary, separate
spatial database--an ongoing concern of developers and public-access advocates
for years.
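The intelligent-bridge example above can be sketched as a small class
hierarchy. Everything here is a hypothetical illustration--the class names,
the sensor callback, and the numbers are invented, not part of any GIS
product mentioned in this article:

```python
# Hypothetical sketch of an "intelligent" map feature: subclasses inherit
# geographic attributes from a common MapFeature parent, and the object
# itself knows when to link to a remote dataset.

class MapFeature:
    """Base class: every feature carries a name and a location."""
    def __init__(self, name, lat, lon):
        self.name = name
        self.lat = lat
        self.lon = lon
        self.attributes = {}          # nongeographic attributes

class Bridge(MapFeature):
    """A bridge knows it must refresh traffic data from a remote sensor."""
    def __init__(self, name, lat, lon, sensor):
        super().__init__(name, lat, lon)
        self.sensor = sensor          # callable standing in for a remote link

    def update(self):
        # The object decides for itself when to poll the remote sensor;
        # no per-feature code is written by the application programmer.
        self.attributes["traffic"] = self.sensor()

def fake_sensor():
    return 412                        # vehicles/hour, stand-in reading

bridge = Bridge("River Road Bridge", 42.37, -71.11, fake_sensor)
bridge.update()
```

Because Bridge inherits from MapFeature, the application- and
database-development savings described above follow directly: the location
and attribute machinery is written once, in the parent class.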


Multimedia and Hypermedia Enhancements


A GIS sufficiently comprehensive for civic networking and policy analysis
cannot simply portray the spatial dimension of printed reports. This is
because public information exists in a variety of formats, from full-text
documents, hand-drawn elevation maps, and 35-mm slides, to full-motion video
and recorded sound. Additionally, related data exists in other applications
such as statistics, computer-aided design (CAD), 3-D visualization, and
modeling software. Increasingly, GISs are being designed to handle
nonconventional data. Nodes on data layers once referred only to tabular data;
they can now be dynamically linked to bitmapped images of historic homes or
analog oral histories of neighborhoods. Moreover, as software converges,
many geographically referenced applications will become available through a
single front-end GUI offering multiple representations. And as
data-compression techniques for these large datasets improve, multimedia
becomes increasingly practical over LANs.

As an example, various information about proposed waterfront developments can
be displayed for citizen comment. Simultaneously, multiple windows can show an
aerial photograph of the site, an architectural rendering of the buildings, a
3-D simulation of the buildings in context, the numerical results of a model
of generated traffic patterns, the sound generated by the traffic, textual
histories of the waterfront, and names and addresses of contact people. (See
"Planning with Hypermedia," by Lynn L. Wiggins and Michael J. Shiffer in
Journal of the American Planning Association, Spring 1990.)
Hypermedia would also extend associative data structures to GIS. It would
allow one not only to relate concepts, images, and digital video to one
another, but to relate them geographically as well. Pointers could be to
filenames or to latitude/longitude coordinates, for example. Authoring tools,
which have long been graphically based, are ideally suited to scripting this
integration.
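A hypermedia link whose pointer may be either a filename or a coordinate
pair can be sketched as follows. The record fields, targets, and coordinates
are all invented for illustration:

```python
# Hypothetical hypermedia links in a GIS: a pointer is either a filename
# or a latitude/longitude pair, and the viewer dispatches accordingly.

links = [
    {"anchor": "waterfront history", "target": "histories/waterfront.txt"},
    {"anchor": "1912 photograph",    "target": "images/pier_1912.tif"},
    {"anchor": "site location",      "target": (42.362, -71.084)},  # lat/lon
]

def resolve(link):
    """Coordinates open a map view; anything else opens a document."""
    if isinstance(link["target"], tuple):
        return "map", link["target"]
    return "file", link["target"]

kind, where = resolve(links[2])
```

An authoring tool would script exactly this kind of dispatch, so that
following a link from an oral history lands the reader on the map, and
vice versa.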


Networking Capabilities


Recent experiments using the World Wide Web and Mosaic suggest that the
Internet and a growing base of "resource-discovery" products may be able to
work together as a suite of tools developers can use to build front ends into
an internetworked GIS application. Indeed, a "map server" developed at Xerox
PARC (Palo Alto Research Center) displays a map of the world to a Mosaic user,
who can then select regions with a mouse, successively calling up
more-detailed maps. 
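The drill-down interaction of such a map server can be sketched as
successive lookups in a hierarchy of region maps. The region names and the
hierarchy below are invented; this is only the shape of the interaction, not
the PARC implementation:

```python
# Hypothetical sketch of successive map refinement: each mouse selection
# narrows the view, as with a Mosaic user walking down from a world map.

regions = {
    "world":         ["north-america", "europe"],
    "north-america": ["usa", "canada"],
    "usa":           ["california", "illinois"],
}

def drill_down(path):
    """Follow a sequence of selections from the world map, returning the
    final region and the choices available at that level (empty when the
    most-detailed map has been reached)."""
    current = "world"
    for choice in path:
        if choice not in regions.get(current, []):
            raise ValueError("no such subregion: " + choice)
        current = choice
    return current, regions.get(current, [])
```

Each call corresponds to one round trip to the server, which returns the
more-detailed map for the selected region.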
The Wide Area Information Server (WAIS) is a query tool that works in concert
with an emerging international data-retrieval protocol known as Z39.50. Both
are in the public domain. Z39.50 has been upgraded to permit queries using
geographic coordinates, and enables transactional processing for payment, if
needed. The World Wide Web and client-based browsing tools such as Mosaic
provide gateways to WAIS servers and allow graphic- or forms-based queries for
spatial data.
Structured query language (SQL) is increasingly being used to link client GIS
applications to remote databases in real-time across networks. Previous
versions of SQL were limited in their ability to work with spatial data;
however, recent implementations have resolved some of the earlier technical
drawbacks. Similar to the development of macro languages for word processing
and spreadsheet software, high-level scripting languages are becoming
available to facilitate the coding of such links across networks to
distributed datasets. One example is Atlas GIS for Windows from Strategic
Mapping (Santa Clara, CA), which employs Visual Basic as a scripting language.
MapInfo also supports SQL and includes a version of Basic to develop vertical
applications as well.
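The kind of SQL link described above can be sketched with any relational
database. Here Python's built-in sqlite3 module stands in for the remote
database; the table, its columns, and the tract IDs are invented for
illustration:

```python
import sqlite3

# Stand-in for a remote tabular dataset keyed by a geographic identifier
# (here, a census-tract ID the GIS front end would select from a map).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tracts (tract_id TEXT, population INTEGER)")
db.executemany("INSERT INTO tracts VALUES (?, ?)",
               [("3501", 4200), ("3502", 3100), ("3503", 5150)])

def attributes_for(tract_id):
    """The query a GIS client would issue after the user picks a region.
    A scripting language wires this call to the map display."""
    row = db.execute("SELECT population FROM tracts WHERE tract_id = ?",
                     (tract_id,)).fetchone()
    return row[0] if row else None

print(attributes_for("3502"))   # prints 3100
```

A scripting language such as MapInfo's Basic or Visual Basic in Atlas GIS
plays the role of the glue here, issuing the query when a map feature is
selected and painting the result back onto the display.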
Eliminating the requirement for centralization through networking reduces the
risk of becoming locked into proprietary standards. At the same time,
improving data integrity through metadata standards increases the confidence
level for using data that is not necessarily under centralized control.
Database development has long been the most arduous component of a robust GIS
application, and not a few developers have concerns over networked
applications where data integrity is an issue. Here, the Federal government
may be providing some solutions in a policy calling for a metadata standard
for spatial datasets. The metadata standard would require "data about the
data" to be included in a dataset, in a uniform manner. In this way,
developers would have important details about the source and structure of
remote data they might wish to link into. The metadata standard will also
provide a coherent way for the government to mount huge spatial datasets on
the Internet. These databases could be queried and downloaded for local
applications, reducing development times.
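A "data about the data" record of the kind the policy calls for can be
sketched as a simple structure. The field names below are illustrative
stand-ins, not the actual element names of the federal metadata standard,
and the agency, dates, and coordinates are invented:

```python
# Hypothetical metadata record: enough "data about the data" for a remote
# developer to judge the source and structure of a dataset before linking.

metadata = {
    "title":        "County base map, road centerlines",
    "originator":   "Example County Planning Office",   # invented agency
    "date":         "1994-06",
    "coordinates":  {"system": "latitude/longitude", "datum": "NAD83"},
    "bounding_box": {"west": -71.16, "east": -71.05,
                     "south": 42.35, "north": 42.40},
    "format":       "vector, topologically structured",
}

def covers(md, lat, lon):
    """A clearinghouse-style query: does this dataset cover a point?"""
    b = md["bounding_box"]
    return b["south"] <= lat <= b["north"] and b["west"] <= lon <= b["east"]
```

A clearinghouse server could answer spatial queries against thousands of
such records without ever touching the datasets themselves, which is what
lets the metadata be maintained separately from the data.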


National Geospatial-Data Clearinghouse


The National Spatial Data Infrastructure is a federal initiative to bring
together the technology, policies, standards, and human resources necessary to
acquire, process, store, distribute, and improve utilization of spatial data.
On April 11, 1994, President Clinton signed Executive Order 12906,
"Coordinating Geographic Data Acquisition and Access: The National Spatial
Data Infrastructure," which instructs Federal agencies to document spatial
data beginning in 1995 and to provide this metadata to the public through an
electronic clearinghouse within one year. The Internet is the most
cost-efficient way to deliver such data to the public. 
According to the Federal Geographic Data Committee (FGDC), 
The National Geospatial Data Clearinghouse is a distributed,
electronically-connected network of geospatial data producers, managers, and
users. The clearinghouse will allow its users to determine what geospatial
data exist, find the data they need, evaluate the usefulness of the data for
their applications, and obtain or order the data as economically as possible.
This Internet-based clearinghouse requires providers to supply metadata
describing their data. A provider may also offer access to the geospatial
data itself; a federal agency may be required to do so. Metadata available
through the clearinghouse will be distributed and physically maintained by
providers all over the country.
Thus, the general public will be able both to locate government-collected
spatial data over the Internet and to query information servers directly and
acquire the data at marginal cost. This could dramatically reduce the entry
costs to small entrepreneurs, community-based organizations, and private
individuals.
In September, 1994 the FGDC announced awards for a new program called the
National Spatial Data Infrastructure (NSDI) Competitive Cooperative Agreements
Program, available to state and local governments, universities, and the
private sector. A group of innovative projects awarded through the $250,000
fund will help develop the Clearinghouse itself, and the use of FGDC-endorsed
standards in data collection, documentation, transfer, and search and query
over the Internet. The FGDC plans to continue this cooperative program in
fiscal year 1995.


What is Internetworked GIS Good For?


Local-government planning agencies have begun development of internetworked,
distributed GIS applications. For example, InfoWorld recently reported on a
project of the Metropolitan Water District in Southern California. A
distributed GIS front-end application linked to a central relational database
helps planners monitor water-usage patterns among 16 million customers across
six counties (see "Water District Fights Drought with Data Technology," by
David Baum, InfoWorld, July 25, 1994). The system is expected to reliably
predict water demand through the year 2000. Previously, large census files had
to be processed manually to gain an understanding of where customers were
located and their present and future water needs.
ArcInfo is used to process census tract data and provide demographic maps to
users connected by Sun SPARCstations over an Ethernet network. However,
ArcInfo is inefficient at maintaining the huge set of demographic records
itself. Analysts using Sun workstations running a local GIS application link
to a central database for subsets of needed census records, process this data
and generate reports--usually maps--or pass results down the network to
technicians using less powerful personal computers for spreadsheet analysis.
The tabular census data, containing attributes such as housing and population,
is maintained on a central Oracle database and can be linked, many-to-one, to
the local GIS application. The Oracle database contains about 3 million
records and resides on a SPARCstation 10.
Future plans for this system involve development of an object-oriented
database system independent of any particular relational-database product.
Using a metadata approach similar to that advocated by the federal government,
the system will store the logical structure of many individual datasets
without being tied to any particular physical database. This will make it
possible to develop front-end GIS applications independent of the relational
databases they may be linked to. In this way, such applications can link among
any number of dissimilar relational databases containing tabular data without
changing the application-program code.
Using an object-oriented design with global metadata structures, a local
government will be able to develop a general-use, front-end GIS that can
establish near-ad hoc linkages with many physically distributed relational
databases. For example, place-based data maintained by a neighborhood
crime-watch group could be linked into general GIS base maps maintained by a
local government and made available through public libraries to citizens. The
crime-watch database could be a Foxbase application, maintained on a 386, in a
volunteer's home office.


Community-Information Infrastructure


How can a community-information infrastructure make a distributed GIS
application useful to any neighborhood group or small organization in town?
Glenview, a Chicago suburb of 35,000, was the first community in the United
States--and perhaps the world--to establish broadband Internet connectivity to
its civic institutions through a cable-television system. In Cambridge,
Massachusetts, Continental Cablevision and Performance Systems, a commercial
Internet provider, announced a joint venture in September of 1993, shortly
after the system in Glenview, Illinois became fully operational. By March,
1994 commercial Internet service was introduced in Cambridge using many of the
same components proven in Glenview. While Cambridge had the distinction of
hosting the first commercial offering and receiving much press notice in the
bargain, the two systems are nearly identical in technology and capability.
Similar ventures are proposed or underway in a number of other cities around
the country.
In Glenview, implementation of broadband Internet service over the cable
system began in the late fall of 1992. By the summer of 1993 schools,
libraries, and government agencies were connecting to a 4-megabit TCP/IP
service to the Internet. This fall, according to John Mundt, Director of
Administrative Computing for Glenview School #34, "3500 students will have
broadband Internet access in class." Moreover, the school has installed a
Cisco terminal server to enable SLIP/PPP connections for students at home.
Unlike the commercial service in Cambridge, the Glenview model connects to the
Internet over an I-Net ("institutional network"). An I-Net is a common
provision in many local cable-franchise agreements, which require
noncommercial services in exchange for use of the public right-of-way.
A cable-television plant is typically configured as a passive star. For
example, a cable operations facility may receive satellite broadcast
television signals at a head end, then retransmit them down a set of branches,
where each branch serves a number of residential drops. Signal amplifiers are
installed at fixed distances, but these amplifiers generally transmit in only
one direction.
However, in building an I-Net, a cable operator agrees to connect civic
institutions for what amounts to closed-circuit television as part of the
public, education, and government (PEG) requirements often included in the
franchise. Thus, the I-Net had already been installed as a bidirectional
network in Glenview. 
In Cambridge, broadband Internet service is priced at what the market will
bear. In Glenview, access is jointly procured by civic institutions that share
costs with a contracted Internet service provider. The joint procurement
contract can be reviewed periodically, and a competitive bid process can
leverage price. On the other hand, the commercial partnership in Cambridge
may be in a better position to invest in upgrading the entire cable system to
provide residential Internet access beyond what Glenview's noncommercial
I-Net could offer.
According to Glenview's Mundt, the equipment needed to build a community
network over broadband cable is straightforward. "Each site has two pieces, a
router and a broadband converter. In addition, a single repeater is required
at the cable company head end."
The implementation project was undertaken by TCI, Zenith, Compatible Systems,
and netILLINOIS. The Zenith ChannelMiser device bidirectionally converts
Ethernet to broadband, using two frequencies for send and receive. The
ChannelMiser can process TCP/IP, LocalTalk, or IPX packets. The Glenview
system uses TCP/IP; in certain situations, LocalTalk packets are
embedded within TCP/IP packets. A head-end repeater receives data on one
channel and retransmits to another, with each channel separated by about 150
MHz. At several locations, a RISCRouter 3000E, manufactured by Compatible
Systems (Boulder, CO) was installed. The router bridges between Ethernet-based
TCP/IP networks and AppleTalk networks. Finally, an Internet connection was
established by netILLINOIS, which provided a router and associated equipment.
The project began nearly ten years after the original I-Net had been
installed. Much of the plant was old and untested, with many of the amplifiers
inoperative. Neither TCI nor Zenith had outdoor broadband-equipment experience
in a real-life cable-television environment. After some experience was gained,
the most common failure point appeared to be the broadband amplifiers
installed throughout the community. They are sensitive to temperature, and
some have incurred damage from snow plows and automobile accidents during the
winter. As the copper-based infrastructure gives way to fiber, reliability is
expected to improve. Despite the technical flaws, the network is said to be
operational 98 percent of the time.
TCI has indicated interest in further developing the system to provide
Internet connectivity to the home through the cable plant. Zenith is
developing the "Homework card," a lower-speed device card for PCs that is
expected to provide throughput of about 200K. In Cambridge, Continental and
PSI have offered a future residential service that will use either the
Homework card or a similar device.


A Community-Based Application for Internetworked GIS


What critical success factors in Glenview made this community network
possible? How can community groups in Cambridge take advantage of broadband
Internet access through the cable system? What kind of public access GIS
application could emerge in towns such as these?
In Glenview, a number of civic committees and organizations were aware of the
I-Net capability. In exploring ways to utilize it, they decided that Internet
connectivity was essential. Zenith, which manufactures bidirectional CATV
amplifiers, is headquartered in Glenview. A new Internet service provider,
netILLINOIS, was seeking public schools to connect to the Internet. These were
all critical factors for success that came together, and the system has grown
to the extent that over 3000 school children will have broadband Internet
access in their classrooms in the 1994--95 school year.
Cambridge, with its universities and high-tech knowledge industries, seems a
perfect place to launch a flagship commercial Internet service. Yet, without
community-based content, such a service may wind up being underutilized.
Continental Cablevision provided a one-year grant of access to the Cambridge
Public Library. In July, 1994, the library, in association with a number of
community-based organizations, installed Macintosh public terminals linked to
the Cambridge Civic Network, an Internet-based community network.
A community-planning initiative in Cambridge called the "Civic Forum" has held
open public meetings for over a year addressing quality-of-life issues for a
sustainable future. Recognizing that planning for a sustainable future is an
information-intensive activity, organizers of the Civic Forum have begun to
consider applications for the Cambridge Civic Network. Will a GIS be such an
application? The factors for success are present. The city has plans for
developing a GIS capability, and many businesses in the area have core
capabilities. It is too early to tell what could emerge from a test-bed such
as this, but as this article makes clear, GIS will soon no longer be an
exclusive tool for planning agencies, direct-marketing firms, political
consultants, or oil companies. It will be used by the general public to keep
a finger on the pulse of their community.
What could a neighborhood crime-watch group do if a volunteer could connect a
PC-based Foxbase application containing locally collected incident data, to a
MapInfo-based community GIS application accessible over the cable system using
a several-hundred-dollar Zenith Homework card? Will anyone change City Hall by
sitting in the public library studying electronic maps to compare how the
police are doing in different parts of town--on an hour-by-hour basis? In a
few short years, we may know the answer.





Special Issue, 1994
EDITORIAL


Into the Future


If you asked ten longtime Byte readers to name the magazine's best issue,
chances are that nine of them would recall "the Smalltalk issue," the seminal
August '81 edition with a magical Robert Tinney cover featuring the word
"Smalltalk" emblazoned across a hot-air balloon.
What's historic about that particular issue was that it introduced the
Smalltalk language and the notion of object orientedness to a generation of
programmers. At that time, the Smalltalk-80 system was (in the words of then
Byte editor Chris Morgan) "the culmination of ten years of research by the
Xerox Learning Research Group located at the Xerox Palo Alto Research Center
(PARC)." With articles by pioneers of the caliber of Adele Goldberg, Larry
Tesler, Peter Deutsch, Dan Ingalls, and others, the issue said just about all
there was to say about Smalltalk.
Unfortunately, Smalltalk never fully achieved the success and acceptance its
designers envisioned. Yet the concept of object-oriented programming is
clearly in the mainstream, particularly as embodied in C++ (which itself is
over ten years old). But just as clearly, languages such as C++ and Smalltalk
don't solve every programming problem and, in the coming years, will therefore
be pushed aside by programming languages which better address the challenges
of the day.
What will be the nature of the languages which will define the programming
tools and techniques you'll be using in the next century? Will they be visual?
Object oriented? Will they be integrated into a system as with Smalltalk or
Oberon? Will they be bigger than a bread box (like C++), or small enough to
fit into a PDA (like Dylan)? Unless your crystal ball has higher resolution
and greater bandwidth than most, you're not likely to find out the answers in
the near future. At best, you can observe trends and make guesses based on
some of the lesser-hyped programming languages on the horizon--like those
languages discussed in this special issue of Dr. Dobb's Journal.
What do forward-looking languages such as Dylan, Sather, Oberon, Bob, and
Parasol have in common? For one thing, they're almost all object oriented. For
another, these alternative programming languages tend to be more specialized
than general-purpose languages such as C or C++. Parasol, for instance, is
designed for parallel systems; Tcl, as a command or macro language for making
applications programmable; Perl, for network system administration; and
Sather, for numerical programming. Finally, alternative languages can be hard
to find. In most cases, you have to grab them off the Internet or BBSs,
although it's becoming increasingly common to find them collected on CD-ROM
libraries. Programming languages are, as Bob Jervis (creator of Parasol) says
in this issue, a manifesto of what their creators see as good and bad in
programming. This was certainly the case when Xerox PARC researchers set out
to design Smalltalk. It's no less so today. In each of the languages described
here, the designers identified a problem, recognized that conventional
approaches didn't address the problem, then came up with solutions that
adapted the good and discarded the bad.
Twenty years from now, a new generation of programmers will be using tools and
developing applications we can only imagine. But just as today's tools grew
from programming languages of a quarter century ago, those tools may have
their roots in one or more of the alternative languages described in these
pages. 
Jonathan Erickson
Editor-in-chief





Special Issue, 1994
The Parasol Programming Language


An OO language that supports network and parallel computing




Robert Jervis


Bob is an independent consultant and can be reached at Wizard Consulting
Services Inc., 17645 Via Sereno, Monte Sereno, CA 95030 or
bjervis!rbj@uunet.uu.net.


A programming language is a manifesto from its creator declaring what's good
and bad in programming. The good becomes a feature, the bad, an error.
I created Parasol (Parallel Systems Object Language) to implement an operating
system. In 1982 I tried to build an extensible version of UNIX, using a spare
PDP-11 at work after hours. I started coding the system in C and got a
multitasking kernel running, but extensibility proved to be a problem. I set
aside the project until I owned my own computer and had the time to go
further. By the time I resumed the project in 1989, I had become convinced
C wasn't up to the task, so I designed Parasol.
Although C and Smalltalk are its primary sources, Parasol's design was
influenced by many languages, including C++, CLU, Algol, and Turbo Pascal.
Parasol had to be as efficient as C, while incorporating some aspect of the
object-oriented capabilities of Smalltalk. When I designed Parasol I was
working on a C++ compiler project, so I knew that C++ implemented classes in a
way that avoided the performance issues of Smalltalk.
I made two decisions at the outset which determined the general outline of
Parasol: While using C as a starting point for ideas, Parasol did not have to
accept ANSI C code; and secondly, instances of classes did not have to be
"first class" objects.
This latter point is important. In making a programming language, it would be
nice if all objects could be treated as uniformly as possible. In APL, arrays
are considered first class because almost any operator that can be applied to
a scalar value can be applied to an array. In Smalltalk, all objects are first
class because they have a type derived from a single common ancestor and,
other than some necessary magic glue in some of the low-level classes to do
arithmetic and control flow efficiently, all Smalltalk classes are written in
Smalltalk itself.
Smalltalk lets you add classes that are just as capable as built-ins because
the language syntax is very simple. By contrast, C has a complex type
declaration and expression syntax and a rich set of scalar numeric types and
operators. So to make user-defined classes first class, a language such as C++
must add many features like references, operator overloading, constructors,
and destructors. C++ is made more complicated by the need for all that syntax
to define new types.
I avoided this with Parasol. All structures in Parasol are considered
"classes" and can have method functions defined for them. Since they're
structures, they can't be used with arithmetic operators. Consequently, a
minimum of new concepts are needed in Parasol beyond those already found in C.
In the last three years I've added distributed and parallel programming
constructs to Parasol, including interprocess messages and multithreaded
applications.
The Parasol language itself (including the name) is in the public domain. The
current implementation is for a 32-bit stand-alone operating system that uses
DOS disk files. It is available as unsupported shareware and includes the
operating system and all source code for the compiler and libraries. This
implementation is still a research project, so no promises about bugs. It does
run on most desktop 80386/486 DOS systems, but it doesn't recognize DOS 6
compressed disk partitions. A SPARC/UNIX implementation is in the works.
Parasol 0.5 is available electronically from DDJ (see "Availability," page 3).
Alternatively, registered versions can be purchased directly from me.


Declaration Syntax


Parasol declaration syntax is closer to Pascal than to C. Example 1(a) is a
simplified version of the general form of a Parasol declaration. Functions are
declared in almost the same way; see Example 1(b). A new type name can be
introduced with the code in Example 1(c). For objects, you can declare more
than one name in a single declaration by simply using a list of identifiers
separated by commas; see Example 1(d).
Like many Algol-like languages, Parasol is block structured (although more in
the spirit of C than Pascal, since functions can't be nested). All symbols
must be unique within their own scope of definition, but symbol names can have
distinct declarations in any number of different scopes. A reference in an
expression always refers to the symbol definition in the "closest" enclosing
scope. If you refer to a symbol defined in the same scope, it doesn't matter
whether the definition occurs before or after the reference. Thus the pairs of
statements in Examples 1(e) and 1(f) are equivalent in the body of a function.

Local declarations in a function body don't have to occur at the top of each
block. As in C++, they can occur anywhere in the block.
Exposure determines how accessible a symbol is outside its own scope. A
Parasol exposure can be public, private, or visible. For example, an object
declared at file scope can be global (accessible from other modules) by using
public or visible exposure, or local to its own file by using private
exposure. In most circumstances, the exposure of an object will default to
private if you don't specify otherwise. This encourages encapsulation, since
you must explicitly decide which symbols are public to the outside world.
Public objects can be read or written anywhere, but visible objects can only
be modified from within their own scope. Private objects can neither be read
nor written outside their own scope. A public integer might be declared as in
Example 2(a). 
Parasol's numeric types include two integer types, signed and unsigned, and
one floating-point type, float. These are the only truly built-in Parasol
numeric types. You can specify a size in bits for each of these types. A
variety of type synonyms are predefined for commonly used sizes. The actual
amount of memory an object consumes is left up to the compiler, however. For
example, Example 2(b) shows some of the predefined type names and their sizes
for the Intel 32-bit implementation.
Note that if you omit a size, the compiler picks a default. For example, plain
signed, unsigned, and float objects all happen to be 32 bits wide on the Intel
implementation.
You should use the type synonyms, especially for the floating-point types,
because the exact sizes will vary from one machine to another. The actual
compiled sizes are chosen to be at least as large as declared, but otherwise
an efficient fit for the performance of the machine. Thus, int would typically
be either 16, 32, or 64 bits wide. The long type must be the
widest integer size available on that machine and is typically either 32 or 64
bits wide.
You can declare integral bit fields with exact sizes by defining them within a
packed structure. You should only resort to exact bit sizes in declarations
for externally specified data formats, like system control blocks.
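The motivation for exact bit widths--matching an externally specified
layout--can be illustrated with ordinary masks and shifts. This is a
language-neutral sketch in Python, not Parasol syntax, and the control-block
layout below is invented for illustration:

```python
# Decoding an externally specified control block, the use case for exact
# bit widths in a packed structure. Invented layout: bits 0-3 device type,
# bits 4-11 unit number, bit 12 ready flag.

def unpack_control_word(word):
    return {
        "device_type": word & 0xF,          # 4-bit field
        "unit":        (word >> 4) & 0xFF,  # 8-bit field
        "ready":       (word >> 12) & 0x1,  # 1-bit flag
    }

word = (1 << 12) | (7 << 4) | 3   # ready, unit 7, device type 3
fields = unpack_control_word(word)
```

A packed structure with exact bit sizes lets the compiler generate this
masking and shifting for you, which is why the feature exists, and why it is
best reserved for fixed external formats rather than ordinary program data.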
The ref keyword means "pointer-to." Parentheses enclose the arguments to a
function. Unlike in C, empty parentheses mean that no arguments are allowed.
For example, an integer absolute-value function might be declared as in
Example 2(c).
Square brackets, on the other hand, declare an array. A buffer of 512 bytes
might be declared as in Example 2(d).
More complex data types are constructed by stringing declarators together in
left-to-right order. For example, Example 2(e) declares x as a pointer to a
function returning a pointer to an array of ten singles.


Classes


You declare a class by enclosing a list of declarations inside curly braces,
with some optional modifiers in front of the curly braces. This enclosed list
creates a new structure or class type; see Example 3(a). 
Example 3(b) declares a structure named point (for a 2-D graphics package). In
this example, the symbols x and y are members of the class. The whole
declaration gives this new type a name: point. Objects of type point can now
be declared and manipulated.
Class modifiers are union, packed, or inherit. The union keyword declares a
C-like union, where all object members overlap one another. The packed keyword
signals that bit-sized members should be packed into words as densely as
possible. The inherit keyword (followed by the name of a class type) declares
that this is a derived class.
You can declare anything inside a class, including other classes. There are
some differences between declarations made inside a class and outside. Objects
declared inside a class are not static by default, but instead are fields of a
structure. Functions can be declared inside a class as well, where they are
called "methods." Methods are not called in quite the same way as normal
functions.
You call a method by designating an object of its class in the call
expression. The syntax requires that you name the object to the left of the
method name; see Example 3(c). Listing One shows an example of an object, O1,
with one public method, func. The main routine contains a call to that method.


Inheritance


Parasol supports only single inheritance so that when you declare a derived
class, you can name only one base class. The memory for an object is laid out
fairly simply, with the memory for the members of the base class first,
followed by the memory for the derived class members; see Figure 1. Space is
allocated in a derived class for newly defined members, even if they have the
same name as a member of the base class.
Parasol is like C++ in that you can refer to exposed (public or visible)
members of a class using the dot or arrow operators (depending on whether you
have an object or a pointer to the object). You can also refer to members from
within method function code.

Just as local block scopes nest within the body of a function, in Parasol the
body of a class forms a scope, which all of the enclosed methods share.
Listing Two illustrates a method, hypot, which computes the hypotenuse of a
right-triangle for the coordinates of the point object. This method refers to
the two members, x and y, of the enclosing class. Remember that members are
just fields in a structure, so these references must be to some object. Since
you must mention an object in a call to any method, it is this object that you
are actually referring to. In effect, the address of the mentioned object is
passed as a hidden parameter in a call to a method. 
In more complicated situations (such as when you have derived classes), the
chain of base classes forms a set of nested scopes as well. Thus, when
matching names to variables, after the compiler has exhausted all the local
block scopes inside a function, it next looks in the list of members of the
enclosing class. If the desired symbol is not found there, the compiler looks
along the chain of base classes before moving on to the scope enclosing the
type (usually file scope). The effect of this is that you can redefine
methods in subclasses. 
You can use two keywords to access the hidden object parameter. The self
keyword is a pointer to the object passed to the method. Its type is a pointer
to the enclosing class. In a derived class, you can also use the keyword super
to refer to the same object. This keyword has the same value as self but is a
pointer to the base class of the object.
The super keyword comes in handy if you want to call methods in the base class
that have been redeclared in the subclass.
Inheritance provides an excellent way to help organize and document
interfaces. By exploiting the redefinition of methods in derived classes, you
can design a much more structured and well-organized program than with C.


Polymorphism


The Parasol windowing library defines a set of common capabilities that all
windows share. Thus all windows have a redraw function that gets called when a
window is moved or resized. In C, you could implement such a capability in
one of a couple of ways.
In one approach, you would write a master redraw function built around a huge
switch statement. For every type of window in the program, you would have a
case that controls the redraw of that window. This means that every time you
add a new window type to your program, you must modify this function. Since
the window library has several functions besides redraw, there are several
different switch statements in different places, each requiring updating
whenever a new window type is added.
A better way, in C, is to define a set of function pointers that you store
with each window. At run time, when you actually create a window, the
appropriate function pointers are copied to the object. That way, when you
need to redraw a window, you simply call through the pointer. This has the
advantage of allowing you to cleanly add window types without having to change
existing code.
Parasol (like C++) provides convenient syntax to make the function-pointer
solution easy to implement. In Parasol, you simply declare a method with the
keyword dynamic. Then the compiler arranges for an object of the type
containing the method to use a run-time pointer in all calls of the method. In
Listing Three, a window class is defined, containing a redraw function that
accepts no arguments and produces no return value. Then, an editor class is
created that defines a new version of redraw that does the specific redrawing
operations for an editor.
There are some restrictions on dynamic functions in Parasol. First of all, for
statically bound methods, there are no restrictions on how you redefine the
arguments or the return value of the function. They can be arbitrarily changed
in a subclass. A dynamic function, on the other hand, must be redeclared with
the same arguments and return type as it was originally defined in the base
class.
If you look again at Listing Three, the function at the bottom is passed a
pointer to a window object, and it calls the redraw function. If the argument
passed actually points to an editor and not a window, the call will
automatically go to the version of the redraw method for an editor at run
time. Because the caller doesn't know the actual object it is calling, at
compile time, the arguments and return type must be fixed for all versions of
the redraw function.
Note also that the body of the editor's redraw function uses super seemingly
to call itself. In fact, because this call uses super, the code generated is a
call to the redraw of the window class (the base for editor ). This is a
common construct in the windowing library.
C++ provides essentially the same capabilities as dynamic functions, but just
calls them "virtual" functions instead. Parasol does allow one element of
flexibility that I haven't found in C++ compilers: If you assign a pointer to
a subclass object to a pointer to the base class, C++ rejects this assignment.
Parasol accepts it. As long as you are copying from a more specific subclass
to the more general base, this copy is considered legal in Parasol (though not
in the other direction, of course). In Listing Three, you can assign the
address of an editor object to a pointer to window, but not the address of a
window object to a pointer to editor.


Units


One of the real shortcomings of C++ is that class definitions are written in
header files and included in each compilation unit. Consequently, common
information describing a class is replicated in all these separate modules.
Methods have to be written outside the class so that they can be placed
outside the headers. A great deal of effort has been put into C++ compilers to
overcome the inefficiencies and complexities that arise from these
constraints.
Parasol overcomes these limitations by changing the program structure. In
Parasol, you simply write source units (analogous to modules in Modula-2 or
units in Turbo Pascal). Declaring an object public in a unit makes that
object's definition available to other compilations. To gain access to a
unit's public declarations (types, objects, and functions), another unit must
explicitly include the unit; see Example 4(a).
One advantage of this scheme is that public symbols can be duplicated in
different units of the same program. This means that libraries obtained from
different sources won't have public symbols that clash. If two units sharing a
common public symbol are both included into a third source, you can still
disambiguate references using the :: operator; see Example 4(b). 


Messages and Threads


Parasol allows you to define special objects that can exchange messages with
other processes. An object receiving messages must be defined as a subclass of
the built-in class called "external." The subclass then defines a set of
methods, each marked with the gate keyword. A client can send messages to the
object by first obtaining a special far pointer to it from the operating
system or messaging library. You actually send a message by simply calling a
gate method using the far pointer. The Parasol compiler generates code to send
the arguments (as the body of the message) and waits for a reply, which then
becomes the return value.
These capabilities make for a natural scheme for defining client/server
applications. The server is simply an object subclassed from external, and the
client is any Parasol program.
External objects are designed to operate as separate processes. The Parasol
library includes facilities to start and control these threads. The
object-oriented capabilities of Parasol have proven useful even for
multithreaded applications. Since Parasol's libraries are designed to use
objects, there are few if any static variables (which tend to cause trouble
for multithreaded applications).


Conclusion


I began designing Parasol with the idea of fixing some syntax problems in C
and adding a minimum of new features. The changed syntax certainly presents a
barrier for people with large bodies of C code, but the binary import/export
mechanism of units and the object-oriented extensions are real enhancements to
C. Parasol is simpler than C++, so I spend my time coding solutions, not
exploring exotic features.
Parasol has a number of features I haven't even mentioned here, including
exceptions, but altogether it is still a fairly compact language. The compiler
I've written is fast (compiling over 60,000 lines per minute on my 486/66MHz),
and the code is as good as unoptimized C. Parasol should optimize at least as
easily as C; I just haven't had time to write an optimizer. I'm now working on
a Parasol-to-C translator to make the language more readily available.
Parasol began as yet another object-oriented language, and while it has
advantages, who needs another one of those? Now that Parasol has messages and
threads, it is more than just another OO language. Network and parallel
computing need new languages that give the programmer some help. I think
Parasol does just that.
Example 1: (a) A typical Parasol declaration; (b) declaring a function; (c)
introducing a new type name; (d) declaring more than one name in a single
declaration; (e) and (f) are equivalent in the body of a function.
(a)
name: exposure type-declaration = initializer;
(b)
name: exposure type-declaration = { statements; }
(c)
name: exposure type type-declaration;
(d)
i, j, k: int;
(e)
i: int;
i = 7;
(f)
i = 7;
i: int;
Example 2: (a) Declaring a public integer; (b) predefined type names and their
sizes; (c) declaring an integer absolute-value function; (d) declaring a
buffer of 512 bytes; (e) declaring a pointer to a function and returning a
pointer to an array.
(a)

i: public int;
(b)
byte: public type unsigned[8];
short: public type signed[16];
int: public type signed[32];
long: public type signed[64];
single: public type float[32];
double: public type float[64];
extended: public type float[80];
(c)
abs: (x: int) int = {
 return x >= 0 ? x : -x;
 }
(d)
buffer: [512] byte;
(e)
x: ref () ref [10] single;
Figure 1 Memory layout for a Parasol object.
Example 3: (a) Creating a new structure or class type; (b) declaring a
structure named point; (c) Parasol syntax requires that you name the object to
the left of the method name.
(a)
 class-modifiers {
 declarations;
 }
(b)
 point: type { public:
 x, y: short;
 };
(c)
 object method ( arguments );
Example 4: (a) Using a symbol defined in another unit; (b) referring to
multiply defined symbols from different units.
(a)
Unit a:
 xyz: public int;
Unit b:
include a;
... xyz ...
(b)
Unit aa:
 xyz: public int;
Unit bb:
 xyz: public double;
Unit cc:
 include aa, bb;
 ... aa::xyz ...
 ... bb::xyz ...

Listing One 
// O1 is an object with class type. 
// The type of O1 is anonymous.
O1: {
 hidden: int;
 public:
 record: (i: int) = {
 hidden = i;
 }
 func: (i: int) int = {
 return i * 3 + hidden;
 }
 };

main: entry () = {
 x: int;
 O1 record(3);
 x = O1 func(5); // Method call
 printf("Value is %d\n", x); // Prints 'Value is 18'
 }


Listing Two
include math;
point: type {
 x, y: short;
 hypot: () single = {
 f, g: short;
 f = x;
 g = y;
 return sqrt(f * f + g * g);
 }
 };


Listing Three
window: type { public:
 redraw: dynamic () = { ... }
 };
editor: type inherit window { public:
 redraw: dynamic () = {
 super redraw();
 ...
 }
 };
func: (p: ref window) =
 {
 p redraw();
 }



























Special Issue, 1994
The Perl Programming Language


Perl scripts can simplify network communication




Oliver Sharp


Oliver is a graduate student at the University of California at Berkeley,
researching parallel programming environments. He can be reached at
oliver@cs.berkeley.edu.


Wary of becoming entangled, many programmers never try to write networked
applications. While connecting computers together can be difficult and
complex, you don't necessarily have to master the alphabet soup of standards
and the wide array of specialized hardware just to get started writing
programs that work over networks. There are software interfaces that hide many
of these details from you.
Although no single interface is supported everywhere, the one that's almost
universally available under UNIX is Berkeley sockets. Perl, a language
designed to handle many system-administration tasks, makes handling the socket
protocol easier still. This article shows how to write Perl scripts that
communicate across networks of UNIX machines. 
Perl, developed by Larry Wall, is a tool for solving all the irksome,
automatable file-management tasks that bedevil the system manager and computer
user. I've come to depend on some of the Perl scripts I've written; one
reformats BibTeX bibliography entries to print out on 7x2 sheets of labels.
Another connects to every machine in our local subnet and lists, with their
location, the people who have been active within the past hour. A useful
script (called "zap") that appears in Programming Perl lists currently
executing processes that match a search criteria and lets the user kill them
if desired.
Experienced UNIX users know that many tools are available for handling these
kinds of tasks: grep for searching files, awk for scanning and modifying
files, various shell languages for writing scripts, and so on. Perl can work
with these tools, but it really subsumes many of them. Because Perl scripts
are precompiled at startup time, they are much more efficient than shell
scripts. This is particularly noticeable if your script does any computation;
unlike shell scripts, Perl has built-in computational operators and does
not need to rely on external programs (such as test).
The philosophy behind Perl is not minimalist--Larry Wall was trying to build a
language that makes it easy to solve problems with a minimum of fuss. Perl
syntax is based on C syntax, with many additions and modifications. You don't
absolutely need to know C first, but you will learn Perl much more quickly if
you do.
Perl provides a number of useful features beyond those available in C; one of
the best is associative arrays. An associative array is like a normal array,
except that it is indexed with a string. Suppose you want to count the number
of times that a word appears in a file. If you are reading the words, keeping
them in a variable called $word, you can count them using an associative array
called %count, like this: $count{$word}++;. The curly braces tell Perl that
you are using an associative array; it uses the contents of the $word variable
as an index. Since you are applying a numerical operation to the array
location (that is, increment), Perl knows that this array contains numbers. If
there has not yet been any reference to the location, Perl initializes one,
gives it an initial value of 0, and then increments it. If that key has been
used before and the array already has a location, its contents are
incremented. By taking care of all these issues for you, Perl lets you write
very compact code that does a lot of work.
Perl is available for almost every computing platform in common use, but it
can't communicate over networks unless your operating system supports Berkeley
sockets. Perl can still be used for file manipulation under MS-DOS, MacOS,
AmigaDOS, and other systems that do not natively support socket-based
networking.
The Berkeley socket protocol was developed to allow communication between
networked computers. After examining sockets in a general way, I'll present a
Perl application that takes advantage of them. The application, called
"PostIt," allows users on different machines to leave notes for each other,
tagged by a keyword. Of course, I could have written PostIt in C, but Perl
simplifies the socket interface, making the code shorter and easier to
understand. Also, since the names of the socket routines are the same, you can
later scale up to C without much difficulty.


Sockets


A socket is an abstraction of the communication link between two machines. The
easiest way to understand it is to think of it as an extension of a UNIX pipe.
Through a socket, two processes can communicate with each other, whether or
not they are on the same computer. The socket is a two-way link, so each
process can read or write on it just the way that it would use a file
descriptor. In fact, to the standard I/O library, sockets look like file
descriptors and you can pass a socket to any library routine that expects a
descriptor (such as read and write).
Generally, the easiest way to use sockets is to set them up in stream mode,
where they'll act like you'd expect: If you send two messages, they're
guaranteed to arrive in the same order that you sent them. PostIt uses stream
mode because such guarantees make the programmer's life much simpler.
There are, however, alternatives to stream mode. Datagram mode, for example,
models the way the underlying network acts: You send data back and forth in
discrete packets of some particular size. If you want to send more information
than fits in one packet, you divide it up. Packets can arrive in any order and
they can also get lost, often requiring a protocol layer on top of the socket
interface. The advantage to datagrams is that they are more efficient, since
fewer layers of software lie between you and the network; as always, higher
levels of abstraction impose a cost.


PostIt


There are two parts to the PostIt program: a server sitting on a designated
machine that accepts commands, and a client program invoked by the user to
send the commands. Once up and running, the server waits for clients to call
it up with commands, of which there are three kinds: set, get, and die. The
server keeps track of messages, each of which has an associated key word. A
set has a tag and a string value, telling the server to associate the string
with that tag. A get has a tag, and the server returns the string (if any)
associated with that tag. For a get with the tag alltags, the server sends
back a list of all the tags that have information associated with them. The
die command tells the server to terminate itself. Figure 1 shows a simple
dialog using PostIt.
The first thing that happens is that somebody runs the server. In Figure 1,
the third machine runs the server and fields queries from the others. Real
servers (such as the ftp and finger daemons) are, of course, started
automatically when the system boots up; for now, however, I'll start PostIt by
hand.
The next step is that Joe on Machine #2 asks if anyone has left a message
under the tag lunch; the system responds that there's no message. Joe wants to
tell anyone who's interested that he is at the Sandwich House, and leaves a
message. The next person to come along is Mary on Machine #1, who looks to see
what messages are available. She sees the tag lunch, gets the message, and
decides to join Joe. She changes the lunch message to let the rest of their
group know where to go. There is one message per tag, so if somebody sets a
new one, it replaces the old. Since the example is simple, there's no message
protection, message history, or any of the other elaborations you'd want in a
real messaging program.


PostIt Implementation


Perl and C syntax are quite similar. I tried to avoid using the more exotic
features of Perl in PostIt to keep the code straightforward for C programmers.
The server uses a simple strategy for storing data: It creates a file for each
PostIt note in the directory where the server was invoked. The name of the
file is the tag for that note. The server sits in a loop, waiting for clients
to get in touch with it and either leave messages or request them.
Listing One is the server code. The first line tells the UNIX shell that this
is a script to be run by handing it to Perl. The script
starts by stashing away its process ID into the variable $parent_pid. (All
string variables in Perl start with a dollar sign.) The variable $$ is a
built-in Perl variable that contains the process ID--Perl has many of these
variables, and they are useful (if rather cryptic) shortcuts.
The next two lines check if the user specified a port number when invoking the
server. If so, I set the variable $port to that number; otherwise, I use 2001.
Port numbers are a simple way for two processes to get in touch with each
other, somewhat like a phone number. If the server process is running on a
machine called "green," it tells green that it will handle any requests to a
given port (2001, say). Any client, whether it is on green or not, can "dial"
to machine green, port 2001, and the system will notify the server that
somebody called.
The problem with port numbers is that you don't know which are available. A
variety of services already use the lower numbers; 21, for example, is the
port used by the file-transfer utility ftp. You can request a specific port,
or you can ask the system to pick any available one (by asking for port number
0). If the server doesn't use the default value, the user will have to specify
the port number to the client. Just as with a phone, if you don't know the
right number, you can't get in touch with the server you are looking for.
The next line tells Perl that if an interrupt signal comes in, it should call
the subroutine suicide (which appears at the bottom of the script) and close
the socket. It is important to close sockets, because they won't be shut down
when a process exits. Some systems have a time-out, but many versions of UNIX
won't recover the socket until a system reboot.
Next, set up some variables that will be used in calling the socket interface
routines. The first is $family, which is set to 2 to indicate that I want
Internet protocols. There are several different protocol families, including
the Xerox NS and internal UNIX protocols. Since I'm going to be communicating
via TCP/IP to other UNIX machines, I'll stick to the Internet protocol.
The next variable, $sock_type, is set to 1 for stream mode. The last variable,
$sockaddr, is a character encoding of some network ID information; unless you
are doing something fancy, you can usually stick to this standard value.
To call socket routines, start by calling getprotobyname, which takes the name
of a protocol and returns three identification keys used by other socket
routines. Perl's string parsing comes in handy, letting you separate the
protocol information into three variables and then recombine them with pack.
Before actually creating the socket, I tell Perl to set up a stream called
NEW_S that flushes on every output. Otherwise, socket buffering causes
problems: in a conversation, the processes want to send a message, wait for a
response, and so on. Without automatic flushing, a sent message may sit in a
socket buffer because the system doesn't think it is long enough to be worth
sending yet. The select command sets a stream to be the current one, and $| is
a special variable that controls whether flushing is done automatically.
Next, the call to socket creates a disembodied socket--it has buffering set
up, but isn't connected to anything. The call to bind gives the system some
more information about how the socket will be used. Now you're ready to do
something with the socket, depending on whether you will be calling somebody
or they'll be calling you. In the case of a server, you want to wait for
incoming calls, so you use listen. The second argument tells the system to
allocate enough space for five processes to wait to get in touch with you.
With the socket setup finished, you can get down to the business of being a
server--sitting in a loop, waiting for clients to connect. Each call to accept
returns when somebody calls, creating a new socket (NEW_S) for that particular
connection. The socket library makes extra sockets to allow servers to be more
responsive. A simple strategy would be to handle incoming clients in order,
forcing everyone to wait until the socket was free. Instead, the original
socket is only used to make the connection. Once a client attaches to it, a
new socket is created for that conversation and will be closed after it is
over. 
PostIt uses the typical server strategy of spawning a child process to handle
each conversation; that leaves the parent server free to handle the next
client who comes along. Now you see the point of the argument to listen--it
tells the socket library how many clients should be able to wait until the
server accepts them. Choose a number based on the number of requests you
expect clients to make and the delay between calls to accept. If the line for
server access is too long, the next client will be turned away.
The call to fork creates a child process which looks just like the parent and
has the same streams, variables, and so forth. The two processes can figure
out which they are by looking at the return value from fork. The one that gets
a 0 is the child and is responsible for handling the client. The parent gets a
non-zero return value; since it doesn't need to talk to the client, it closes
the temporary socket NEW_S and loops, calling accept to get the next client.
The child reads the first line from the socket into $command and uses the Perl
command split to peel off the instruction and the tag. PostIt can handle get,
set, and die; anything else is ignored.
To shut down, PostIt uses the Perl kill command, sending the parent UNIX
signal SIGINT. This tells Perl it will handle that signal in the routine
suicide, which closes the main server socket if it is open, prints out a death
message, and exits.

To handle a get, the program first checks for the special tag alltags. If so,
it just sends the names of all the files in the server's directory. One way
to do this in UNIX is with the command echo *. By putting the command in
backquotes, you tell Perl to execute it and replace it on the line with its
output. When print is given a stream as its first argument (NEW_S in this
case), the other argument strings are written to that stream. That's all it
takes to send the list of files back to the client. If you're asked about
some other tag, use the UNIX cat command to send the file. If the file
doesn't exist, nothing is sent.
A set is equally simple. You open a file with the value of $tag as its name,
write the message from the client, and close the file. Return OK to the
client if you succeed, Nope if you fail.
On the client side of the connection (see Listing Two), PostIt uses a standard
UNIX trick: Because most of the work is the same regardless of the message you
send, PostIt uses the same source code for different commands. In this case,
the script can be invoked in one of three ways: getinfo <info-tag>
[server-machine server- port]; setinfo <info-tag> <value> [server-machine
server-port]; and kill-server [server-machine server-port]. Under UNIX, you
can save some disk space by having three separate names for the same file
(using links). Alternatively, you can just make three copies.
The first version of the client asks the server for a message with the given
tag; the user can optionally specify a machine and a port to connect to.
Remember that a port is like a telephone number within a given machine. To
connect to the server, the client "rings" on that number; if a server wants to
answer requests, it will have called accept and a connection will be set up.
If no server information is given, we will use the default machine
(master.euphoria.edu) and port number (2001). The second kind of invocation
sets up a note with the given tag and value, again optionally specifying the
server information. The third one tells the server to close up and exit.
First, the client parses the arguments. The variable $0 contains the name used
to invoke the script, so PostIt uses it to figure out whether to get, set, or
die. If it's not just killing the server, you get the tag argument using the
Perl command shift, which takes an array and returns the first element,
removing it and shifting the rest down. If no array is specified, shift uses
the arguments to the script as its default.
For setinfo, you get the contents of the note and make sure that the user
isn't trying to assign a value to the special tag alltags. Having read the
required arguments (the tag, if we are getting info, both tag and value, if we
are setting), you can check for server information. Put the next two
arguments, if they exist, into $machine and $port. Use the defaults if you
didn't get anything.
The next several lines are similar to the server, though the arguments to the
socket functions are a bit different. The main difference is that you call
connect, specifying the host and port number of the server. Remember that in
the server you called listen, to tell the system that we were waiting for
clients to call us. The connect call to a listening socket creates a
connection. Another difference is that the client uses its original socket to
communicate with, unlike the server, which got a new one for each connection
from accept.
Once connected, PostIt sets the socket to flush I/O and sends the message. For
a setinfo, send the set request and see if you get back an OK. To get info,
send the get request and wait for a response. If you get back an empty string,
there wasn't a note with that name, so print out a message. Otherwise, print
out the note's contents. Once you have gotten the response from the server,
you're done, so close the socket and exit.


Conclusion


For more complete discussions of UNIX network-programming issues, I recommend
UNIX Network Programming, by W. Richard Stevens (Prentice-Hall, 1990) and
Internetworking with TCP/IP, by Douglas Comer (Prentice-Hall, 1988). 
For more information on Perl, turn to Programming Perl, by Larry Wall and
Randal L. Schwartz (O'Reilly and Associates, 1990). Larry invented the
language, so this book is the authoritative word. In fact, like many Perl
programs, the skeleton of PostIt comes from an example in the book. Both the
authors and many others are active participants in the Usenet group,
comp.lang.perl, where you can get exhaustive answers to almost any conceivable
question about Perl.
Figure 1 Sample use of PostIt on a network.

Listing One
#! /usr/local/bin/perl
# Usage: PostIt [port-number]
# This sets up a server, which sits around waiting for requests. There are
# three kinds: "set <tag> <value>" - stash this away
# "get <tag>" - get a value associated with tag, or return the
# list of tags if asked for "alltags"
# "die" - commit suicide
# If we get a SIGINT signal, close the socket and exit.
$parent_pid = $$; # stash our pid so we can be killed by a child
($port) = @ARGV; # see if we have a port argument
$port = 2001 unless $port; # if not, use 2001
$SIG{'INT'} = 'suicide'; # route SIGINT signal to subroutine suicide
$family = 2; # set up some protocol parameters
$sock_type = 1;
$sockaddr = 'S n a4 x8';
($name,$aliases,$proto) = getprotobyname('tcp');
$me = pack($sockaddr, $family, $port, "\0\0\0\0");
# make the socket, bind it to the protocol, and tell system to start listening
socket(S, $family, $sock_type, $proto) || die "socket: $!\n";
bind(S,$me) || die "Tried to bind socket, got: $!\n";
listen(S,5) || die "Tried to listen, got: $!\n";
select(NEW_S); $| = 1; select(STDOUT); # set auto-flush mode for sockets
select(S); $| = 1; select(STDOUT);
for (;;) {
 ($addr = accept(NEW_S,S)) || die $!; # wait for incoming request
 if (($id = fork()) == 0) { # fork a child to handle request
 $command = <NEW_S>;
 ($whattodo,$tag,$rest) = split(' ',$command,3);
 chop($rest);
 if ($whattodo eq 'get') {
 if ($tag eq 'alltags') {
 print NEW_S `echo *`;
 }
 else {
 print NEW_S `cat $tag`;
 }
 }
 elsif ($whattodo eq 'set') {
 if (open (TAG,">$tag")) {
 print TAG "$rest\n";
 close(TAG);

 print NEW_S "ok\n";
 }
 else {
 print NEW_S "nope\n";
 }
 }
 elsif ($whattodo eq 'die') {
 print "got a kill\n";
 kill 'SIGINT',$parent_pid;
 }
 else {
 print "got unknown request $whattodo";
 }
 close(NEW_S);
 exit;
 }
 close(NEW_S);
}
# when a SIGINT signal comes in, close the socket and exit
sub suicide {
 close S if S;
 print "Suiciding now\n";
 exit;
}


Listing Two
#! /usr/local/bin/perl
# Usage: getinfo <info-tag> [server-machine server-port]
# setinfo <info-tag> <value> [server-machine server-port]
# killserver [server-machine server-port]
# This script tries to contact an info server on the given machine and port.
# If the latter aren't specified, it uses defaults. If no server is found, it
# complains. Otherwise, it either gets info about specified tag (if invoked as
# "getinfo") sets it (if invoked as "setinfo"), or kills server (if invoked as
# "killserver"). For getinfo, it returns whatever info server returns. The 
# magic info-tag "alltags" returns the list of tags that server has information
# about. You aren't allowed to setinfo the word "alltags".
if ($0 ne 'killserver') { # if we are getting or setting info, grab tag
 $tag = shift;
}
if ($0 eq 'setinfo') { # get value, if we are doing setinfo
 $value = shift;
 die "That's a magic tag ..." if ($tag eq 'alltags');
}
($machine,$port) = @ARGV; # get info about server, if specified
$machine = "master.euphoria.edu" unless $machine;
$port = 2001 unless $port;
$family = 2; # set up protocol parameters
$sock_type = 1;
$sockaddr = 'S n a4 x8';
chop($hostname = `hostname`);
($name,$aliases,$proto) = getprotobyname('tcp');
($name,$aliases,$type,$len,$myaddr) = gethostbyname($hostname);
($name,$aliases,$type,$len,$serveaddr) = gethostbyname($machine);
$me = pack($sockaddr, $family, 0, $myaddr);
$server = pack($sockaddr, $family, $port, $serveaddr);
# create socket, bind to protocol, and try to connect to server
socket(S, $family, $sock_type, $proto) || die "Failed to make socket\n";

bind(S,$me) || die "Failed to bind socket\n";
connect(S,$server) || die "Failed to connect to $machine\n";
select(S); $| = 1; select(STDOUT); # set socket to autoflush
if ($0 eq 'setinfo') {
 print S "set ",$tag," ",$value,"\n";
 $result = <S>;
 if ($result eq "ok\n") {
 print "Succeeded\n";
 }
 else {
 print "Failed\n";
 }
}
elsif ($0 eq 'getinfo') {
 print S "get ",$tag,"\n";
 $result = <S>;
 if ($result eq "") {
 print "Sorry, no info available about $tag\n";
 }
 else {
 print $result;
 }
}
elsif ($0 eq 'killserver') {
 print S "die";
}
else {
 die "I was invoked with strange name: $0\n";
}
close(S);
































Special Issue, 1994
The Sather Programming Language


Efficient, interactive, and object oriented




Stephen M. Omohundro


Sather is an object-oriented language which aims to be simple, efficient,
interactive, safe, and nonproprietary. One way of placing it in the "space of
languages" is to say that it aims to be as efficient as C, C++, or Fortran; as
elegant and safe as Eiffel or CLU; and as supportive of interactive
programming and higher-order functions as Common Lisp, Scheme, or Smalltalk.


Sather has parameterized classes, object-oriented dispatch, statically checked
strong typing, separate implementation and type inheritance, multiple
inheritance, garbage collection, iteration abstraction, higher-order routines
and iters, exception handling, constructors for arbitrary data structures, and
assertions, preconditions, postconditions, and class invariants. This article
describes a few of these features. The development environment integrates an
interpreter, a debugger, and a compiler. Sather programs can be compiled into
portable C code and can efficiently link with C object files. Sather has a
very unrestrictive license which allows its use in proprietary projects but
encourages contribution to the public library.
The original 0.2 version of the Sather compiler and tools was made available
in June 1991. This article describes version 1.0. By the time you read this,
the combined 1.0 compiler/interpreter/debugger should be available on
ftp.icsi.berkeley.edu, and the newsgroup comp.lang.sather should be activated
for discussion.


Code Reuse


The primary benefit object-oriented languages promise is code reuse. Sather
programs consist of collections of modules called "classes" which encapsulate
well-defined abstractions. If the abstractions are chosen carefully, they can
be used over and over in a variety of different situations.
An obvious benefit of reuse is that less new code needs to be written. Equally
important is the fact that reusable code is usually better written, more
reliable, and easier to debug because programmers are willing to put more care
and thought into writing and debugging code which will be used in many
projects. In a good object-oriented environment, programming should feel like
plugging together prefabricated components. Most bugs occur in the 10 percent
or so of newly written code, not in the 90 percent of well-tested library
classes. This usually leads to simpler debugging and greater reliability.
Why don't traditional subroutine libraries give the same benefits? Subroutine
libraries make it easy for newly written code to make calls on existing code
but not for existing code to make calls on new code. Consider a visualization
package that displays data on a certain kind of display by calling
display-interface routines. Later, the decision is made that the package
should work with a new kind of display. In traditional languages, there's no
simple way to get the previously written visualization routines to make calls
on the new display interface. This problem is especially severe if the choice
of display interface must be made at run time.
Sather provides two primary ways for existing code to call newly written code.
"Parameterized classes" allow the binding to be made at compile time, and
"object-oriented dispatch" allows the choice to be made at run time. I'll
demonstrate these two mechanisms using simple classes for stacks and polygons.


Parameterized Classes


Listing One shows a class which implements a stack abstraction. We want stacks
of characters, strings, polygons, and so on, but we don't want to write new
versions for each type of element. STACK{T} is a parameterized class in which
the parameter T specifies the stack-element type. When the class is used, the
type parameter is specified.
For example, the class FOO in Listing One defines a routine which uses both a
stack of characters and a stack of strings. The type specifier STACK{CHAR}
causes the compiler to generate code with the type parameter T replaced by
CHAR. The specifier STACK{STR} similarly causes code to be generated based on
STR. Since character objects are usually eight bits and strings are
represented by pointers, the two kinds of stack will have different layouts in
memory. The same Sather source code is reused to generate different object
code for the two types. We may define a new type (such as triple-length
integers) and immediately use stacks of elements of that type without
modifying the STACK class. Using parameterized classes adds no extra run-time
cost, but the choice of type parameter values must be made at compile time.
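The same abstraction carries over to other languages with type parameters. Here's a rough Python analogue of STACK{T} (a sketch, not Sather: Python's Generic is checked by a static type checker rather than generating per-type object code, so it lacks Sather's zero-cost specialization):

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")

class Stack(Generic[T]):
    """Analogue of STACK{T}: a stack whose element type is a parameter."""
    def __init__(self) -> None:
        self._items: List[T] = []
    def push(self, item: T) -> None:
        self._items.append(item)
    def pop(self) -> Optional[T]:
        # Like Sather's pop, return a "void" value (None) when empty.
        return self._items.pop() if self._items else None

# As in class FOO, the same source serves stacks of different types.
chars: Stack[str] = Stack()
chars.push("a")
```

A `Stack[int]` and a `Stack[str]` share one implementation here, whereas Sather emits separately laid-out object code for STACK{CHAR} and STACK{STR}.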


Object-Oriented Dispatch


Listing Two shows an example of object-oriented dispatch. The class $POLYGON
is an "abstract" class, which means it represents a set of possible object
types called its "descendants" (in this case, TRIANGLE and SQUARE). Abstract
classes define abstract interfaces which must be implemented by all their
descendants. Listing Two only shows the single routine number_of_vertices:INT,
which returns the number of vertices of a polygon. TRIANGLE's implementation
returns the value 3, and SQUARE's returns 4. 
Routines in the interface of an abstract type may be called on variables
declared by that type. The actual code that's called, however, is determined
at run time by the type of the object which is held by the variable. The class
FOO2 defines a routine with a local variable of type STACK{$POLYGON}. Both
TRIANGLE and SQUARE objects can be pushed onto stacks of this type. The call
s.pop might return either a triangle or a square. The call
s.pop.number_of_vertices calls either the number_of_vertices routine defined
by TRIANGLE and returns 3, or the number_of_vertices routine defined by SQUARE
and returns 4. The choice is made according to the run-time type of the popped
object. The names of abstract types begin with a $ (dollar sign) to help
distinguish them (calls on abstract types are slightly more expensive than
nondispatched calls).
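Dispatch on the run-time type works the same way in any object-oriented language. As a rough Python sketch of Listing Two (an analogue, not Sather; abstract classes here come from the abc module rather than a $ naming convention):

```python
from abc import ABC, abstractmethod

class Polygon(ABC):
    # Plays the role of the abstract class $POLYGON: an interface
    # every descendant must implement.
    @abstractmethod
    def number_of_vertices(self) -> int: ...

class Triangle(Polygon):
    def number_of_vertices(self) -> int:
        return 3

class Square(Polygon):
    def number_of_vertices(self) -> int:
        return 4

# Which implementation runs depends on the run-time type of each
# element, just as with s.pop.number_of_vertices in class FOO2.
shapes = [Triangle(), Square()]
counts = [p.number_of_vertices() for p in shapes]
```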


Strong Typing


The Sather type system is a major factor in the computational efficiency,
clarity, and safety of Sather programs. It also has a big effect on the "feel"
of Sather programming. Many object-oriented languages have either weak typing
or none at all. Sather, however, is "strongly typed," meaning that every
Sather object and variable has a specified type and that there are precise
rules defining the types of object that each variable can hold. Sather is able
to statically check programs for type correctness--if a piece of Sather code
is accepted by the interpreter or compiler, it's impossible for it to assign
an object of an incorrect type to a variable.
Statically checked, strong typing helps the Sather compiler generate efficient
code because it has more information. Sather avoids many of the run-time
tag-checking operations done by less strongly typed languages.
Statically checked, strongly typed languages help programmers to produce
programs that are more likely to be correct. For example, a common mistake in
C is to confuse the C assignment operator = with the C equality test ==.
Because the C conditional statement if(...) doesn't distinguish between Boolean
and other types, a C compiler is just as happy to accept if(a=b) as if(a==b).
In Sather, the conditional statement will only accept Boolean values,
making this kind of mistake impossible.
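Sather isn't alone in this design choice. Python, for example, also refuses to treat an assignment as a condition, so the analogous slip is caught before the program ever runs (a small illustrative check of mine, not from the article):

```python
# In C, if (a = b) compiles silently and tests the assigned value.
# The analogous Python text is rejected at compile time, much as
# Sather's Boolean-only conditional rejects it.
try:
    compile("if a = b:\n    pass", "<example>", "exec")
    accepted = True
except SyntaxError:
    accepted = False
```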
Languages like Beta are also strongly typed, but not statically checkable.
Consequently, some type checking must be done at run time. While this is
preferable to no type checking at all, it reduces the safety of programs. For
instance, there may be a typing problem in obscure code that isn't exercised
by test routines. Errors not caught by the compiler can make it into final
releases.
Sather distinguishes "abstract types," which represent more than one type of
object, from other types, which do not. This has consequences for both the
conceptual structure and the efficiency of programs. An example which has been
widely discussed is the problem of the add_vertex routine for polygons. This
is a routine which makes sense for generic polygons but does not make sense
for triangles, squares, and so on. In languages which do not separate abstract
types from particular implementations, you must either make all descendants
implement routines that don't make sense for them, or leave out functionality
in parent classes.
The Sather solution to this is based on abstract types. The Sather libraries
include the abstract class $POLYGON, which defines the abstract interface that
all polygons must provide. It also includes the descendant class POLYGON,
which implements generic polygons. The add_vertex routine is defined in
POLYGON but is not defined in $POLYGON. TRIANGLE and SQUARE, therefore, do not
need to define it.
Run-time dispatching is only done for calls on variables declared by abstract
types. The 
Sather compiler is, itself, a large program written in Sather which uses a lot
of dispatching. The performance consequences of abstract types were studied by
comparing a version of the compiler, in which all calls were dispatched, to
the standard version (Lim and Stolcke, 1991). The use of explicit typing
causes one-tenth the number of dispatches and an 11.3 percent reduction in
execution time.


Separate Implementation and Type Inheritance



In most object-oriented languages, inheritance defines the subtype relation
and causes the descendant to use an implementation provided by the ancestor.
These are quite different notions; confusing them often causes semantic
problems. For example, one reason why Eiffel's type system is difficult to
check is that it mandates "covariant" conformance for routine argument types
(Meyer, 1992). This means a routine in a descendant must have argument types
which are subtypes of the corresponding argument types in the ancestor.
Because of this choice, the compiler can't ensure argument expressions conform
to the argument type of the called routine at compile time. In Sather,
inheritance from abstract classes defines subtyping while inheritance from
other classes is used solely for implementation inheritance. This allows
Sather to use the statically type-safe contravariant rule for routine argument
conformance.


Multiple Inheritance


In Smalltalk and Objective-C, each class only inherits from a single class. In
Sather, classes can inherit from an arbitrary number of classes, a property
called "multiple inheritance." This is important because it commonly occurs in
modeling physical types. For example, there might be types representing "means
of transportation" and "major expenditures." The type representing
"automobiles" should be a descendant of both of these. In Smalltalk or
Objective-C, which only support single inheritance, you'd be forced to make
all "means of transportation" be "major expenditures" or vice versa. 


Garbage Collection


Languages derived from C are usually not "garbage collected," making you
responsible for explicitly creating and destroying objects. Unfortunately,
these memory-management issues often cut across natural abstraction
boundaries. The objects in a class usually don't know when they are no longer
referenced, and the classes which use those objects shouldn't have to deal
with low-level memory-allocation issues. Memory management done by the
programmer is the source of two common bugs. If an object is freed while still
being referenced, a later access may find the memory in an inconsistent state.
These so-called "dangling pointers" are difficult to track down because they
often cause code errors far removed from the offending statement. Memory leaks
caused when an object is not freed even though there are no references to it,
are also hard to find. Programs with this bug use more and more memory until
they crash. Sather uses a garbage collector which tracks down unused objects
and reclaims the space automatically. To further enhance performance, the
Sather libraries generate far less garbage than is typical in languages such
as Smalltalk or Lisp.


Interactive, Interpreted Programming


Sather combines the flexibility of an interactive, interpreted environment
with very high-efficiency compiled code. During development, the well-tested
library classes are typically run compiled, while the new experimental code is
run interpreted. The interpreter also allows immediate access to all the
built-in algorithms and data structures for experimentation. Listing Three is
an example of an interactive Sather session.


Iteration Abstraction


Most code is involved with some form of iteration. In loop constructs of
traditional languages, iteration variables must be explicitly initialized,
incremented, and tested. This code is notoriously tricky and is subject to
"fencepost errors." Traditional iteration constructs require the internal
implementation details of data structures like hash tables to be exposed when
iterating over their elements.
Sather allows you to cleanly encapsulate iteration using constructs called
"iters" (Murer, Omohundro, and Szy-perski, 1993) that are like routines,
except their names end in an exclamation point (!), their bodies may contain
yield and quit statements, and they may only be called within loops. The
Sather loop construct is simply loop ... end. When an iter yields, it returns
control to the loop. When it is called in the next iteration of the loop,
execution begins at the statement following the yield. When an iter quits, it
terminates the loop in which it appears. All classes define the iters
until!(BOOL), while!(BOOL), and break! to implement more traditional looping
constructs. The integer class defines a variety of useful iters, including
upto!(INT):INT, downto!(INT):INT, and step!(num,step:INT):INT. Listing Four
shows how upto! is used to output digits from 1 to 9.
Container classes, such as arrays or hash tables, define an iter elts!:T to
yield the contained elements and an iter called set_elts!(T) to insert new
elements. Listing Four shows how to set the elements of an array to successive
integers and then how to double them. Notice that this loop doesn't have to
explicitly test indexes against the size of the array.
The tree classes have iters to yield their elements according to the "pre,"
"post," and "in" orderings. The graph classes have iters to yield the vertices
according to depth-first and breadth-first search orderings.
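Iters map naturally onto generator functions in other languages. Here's a rough Python sketch of the iters used above (an analogue of the mechanism, not Sather itself; the names upto and elts echo upto! and elts!):

```python
from typing import Iterator, List

def upto(first: int, last: int) -> Iterator[int]:
    """Analogue of INT's upto! iter: each yield hands control back to
    the loop, and execution resumes after the yield next time around.
    Exhaustion plays the role of quit, terminating the loop."""
    i = first
    while i <= last:
        yield i
        i += 1

digits = "".join(str(d) for d in upto(1, 9))

def elts(a: List[int]) -> Iterator[int]:
    # Analogue of a container's elts! iter: the consuming loop never
    # sees the container's internal indexing.
    for x in a:
        yield x

total = sum(elts([2, 4, 6]))
```

As with elts!, the loop consuming elts never tests an index against the array's size; the iter encapsulates that bookkeeping.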


The Implementation


The first version of the Sather compiler was written in Sather by Chu-Cheow
Lim and has been operational for several years. It compiles into C code and
has been ported to a wide variety of machines. It is a fairly large program
with about 30,000 lines of code in 183 classes (this compiles into about
70,000 lines of C code).
Lim and Stolcke extensively studied the performance of the compiler on both
MIPS and SPARC architectures. Because the compiler uses C as an intermediate
language, the quality of the executable code depends on the match of the
C-code templates used by the Sather compiler to the optimizations employed by
the C compiler. Compiled Sather code runs within 10 percent of the performance
of handwritten C code on the MIPS machine and is essentially as fast as
handwritten C code on the SPARC architectures. On a series of benchmark tests
(towers of Hanoi, 8 queens, and the like) Sather performed slightly better
than C++ and several times better than Eiffel. The new compiler performs
extensive automatic inlining and so provides more opportunities for
optimization than typical handwritten C code.


The Libraries


The Sather libraries currently contain several hundred classes, and new ones
are continually being written. Eventually, we hope to have efficient,
well-written classes in every area of computer science. The libraries are
covered by an unrestrictive license which encourages the sharing of software
and crediting of authors, without prohibiting use in proprietary and
commercial projects. Currently there are classes for basic data structures,
numerical algorithms, geometric algorithms, graphics, grammar manipulation,
image processing, statistics, user interfaces, and connectionist simulations.


pSather 


Sather is also being extended to support parallel programming. An initial
version of the language, "pSather" (Murer, Feldman, and Lim, 1993), runs on
the Sequent Symmetry and the Thinking Machines CM-5. pSather adds constructs
for programming on a distributed-memory, shared-address machine model. It
includes support for control parallelism (thread creation, synchronization),
an SPMD form of data parallelism, and mechanisms to manipulate execution
control and data in a nonuniform access machine. The issues which make
object-oriented programming important in a serial setting are even more
important in parallel programming. Efficient parallel algorithms are often
quite complex and should be encapsulated in well-written library classes.
Different parallel architectures often require the use of different algorithms
for optimal efficiency. The object-oriented approach allows the optimal
version of an algorithm to be selected according to the machine it is actually
running on. It is often the case that parallel code development is done on
simulators running on serial machines. A powerful object-oriented approach is
to write both simulator and machine versions of the fundamental classes in
such a way that a user's code remains unchanged when moving between them.


Conclusion


I've described some of the fundamental design issues underlying Sather 1.0.
The language is quite young, but we are excited by its prospects. The user
community is growing, and new class development has become an international,
cooperative effort. We invite you to join in its development!


Acknowledgments


Sather has adopted ideas from a number of other languages. Its primary debt is
to Eiffel, designed by Bertrand Meyer, but it has also been influenced by C,
C++, CLOS, CLU, Common Lisp, Dylan, ML, Modula-3, Oberon, Objective C, Pascal,
SAIL, Self, and Smalltalk. Many people have contributed to the development and
design of Sather. The contributions of Jeff Bilmes, Ari Huttunen, Jerry
Feldman, Chu-Cheow Lim, Stephan Murer, Heinz Schmidt, David Stoutamire, and
Clemens Szyperski were particularly relevant to the issues discussed in this
article.



References


ICSI Technical reports are available via anonymous ftp from
ftp.icsi.berkeley.edu.
Lim, Chu-Cheow and Andreas Stolcke. "Sather Language Design and Performance
Evaluation." Technical Report TR-91-034. International Computer Science
Institute, Berkeley, CA, May 1991.
Meyer, Bertrand. Eiffel: The Language. New York, NY: Prentice Hall, 1992. 
Murer, Stephan, Stephen Omohundro, and Clemens Szyperski. "Sather Iters:
Object-oriented Iteration Abstraction." ACM Letters on Programming Languages
and Systems (submitted), 1993.
Murer, Stephan, Jerome Feldman, and Chu-Cheow Lim. "pSather: Layered
Extensions to an Object-Oriented Language for Efficient Parallel Computation."
Technical Report TR-93-028. International Computer Science Institute,
Berkeley, CA, June 1993.
Omohundro, Stephen. "Sather Provides Non-proprietary Access to Object-oriented
Programming." Computers in Physics. 6(5):444-449, 1992.
Omohundro, Stephen and Chu-Cheow Lim. "The Sather Language and Libraries."
Technical Report TR-92-017. International Computer Science Institute,
Berkeley, CA, 1991. 
Schmidt, Heinz and Stephen Omohundro. "CLOS, Eiffel, and Sather: A
Comparison," in Object Oriented Programming: The CLOS Perspective, edited by
Andreas Paepcke. Boston, MA: MIT Press, 1993.

Listing One 

class STACK{T} is
 -- Stacks of elements of type T.
 attr s:ARR{T}; -- An array containing the elements.
 attr size:INT; -- The current insertion location.

 is_empty:BOOL is
 -- True if the stack is empty.
 res := (s=void or size=0) end;

 pop:T is
 -- Return the top element and remove it. Void if empty.
 if is_empty then res:=void
 else size:=size-1; res:=s[size]; s[size]:=void end end;
 
 push(T) is
 -- Push arg onto the stack.
 if s=void then s:=#ARR{T}(asize:=5)
 elsif size=s.asize then double_size end;
 s[size]:=arg; size:=size+1 end;

 private double_size is
 -- Double the size of `s'.
 ns::=#ARR{T}(asize:=2*s.asize); ns.copy_from(s); s:=ns end;

 clear is
 -- Empty the stack.
 size:=0; s.clear end
 
end; -- class STACK{T}

class FOO is
 bar is 
 s1:STACK{CHAR}; s1.push('a');
 s2:STACK{STR}; s2.push("This is a string.") end;
end;



Listing Two

abstract class $POLYGON is
 ...
 number_of_vertices:INT;

end;

class TRIANGLE is
 inherit $POLYGON;
 ...
 number_of_vertices:INT is res:=3 end;
end;

class SQUARE is
 inherit $POLYGON;
 ...
 number_of_vertices:INT is res:=4 end;
end;

class FOO2 is
 bar2 is
 s:STACK{$POLYGON};
 ...
 n:=s.pop.number_of_vertices;
 ... 
 end;
end;



Listing Three

>5+7
12

>40.intinf.factorial
815915283247897734345611269596115894272000000000

>#OUT + "Hello world!"
Hello world!

>v::=#VEC(1.0,2.0,3.0); w::=#VEC(1.0,2.0,3.0); 
>v+w
#VEC(2.0, 4.0, 6.0)

>v.dot(w)
14.0

>#ARRAY{STR}("grape", "cherry", "apple", "plum", "orange").sort
#ARRAY{STR}("apple","cherry","grape","orange","plum")



Listing Four

>loop #OUT+1.upto!(9) end
123456789 

>a::=#ARRAY{INT}(asize:=10)
>loop a.set_elts!(1.upto!(10)) end
>a
#ARRAY{INT}(1,2,3,4,5,6,7,8,9,10)

>loop a.set_elts!(2*a.elts!) end

>a
#ARRAY{INT}(2,4,6,8,10,12,14,16,18,20)




























































Special Issue, 1994
The Modula-3 Programming Language


A full-featured language for software engineering and object-oriented
programming




Sam Harbison


Sam is the author of C: A Reference Manual.


Programmers who prefer strongly typed, structured programming are frustrated
by languages that are either too simplistic, such as Pascal, or too costly and
complex, like Ada. They are looking for a language that is "just right," to
quote Goldilocks--a language that supports long-term reliability and
maintainability, but also has enough modern, practical features to handle
large problems efficiently. A language, in fact, like Modula-3.
Modula-3 was developed by researchers at DEC's Systems Research Center (SRC)
and the Olivetti Research Center in 1989. It borrows from two evolutionary
lines of programming languages: an academic line, represented by Niklaus
Wirth's Pascal, Modula-2, and Oberon languages; and an industrial research
line, represented by the Mesa, Cedar, and Euclid languages from the Xerox PARC
(Palo Alto Research Center). Its immediate parent is Modula-2+, an extension
of Modula-2 developed at SRC in the early 1980s and used in its research
systems. In 1986, Maurice Wilkes, who had developed the first practical
electronic stored-program computer at Cambridge 37 years earlier, sparked an
effort to "clean up" Modula-2+. With Niklaus Wirth's blessing, this became a
design for a new language, Modula-3. The original language report was issued
in 1988, with minor revisions in 1989 and 1990.
Modula-3 emphasizes safety and maintainability and is gaining important
converts outside DEC. The computer-science laboratory at Xerox PARC has
adopted the language for its research software, and the University of
Cambridge in England is now teaching Modula-3 to its computer-science
students.
The designers had several goals: 
To provide the abstractions necessary to structure large systems programs:
modules, objects, threads, and generics. 
To provide the mechanisms for making programs safe and robust: strong type
checking, exceptions, isolation of unsafe code, and automatic garbage
collection. 
To keep the language simple. Features were chosen that had been proven in
other languages, but compatibility with older languages wasn't important. 
The result was a full-featured language for software engineering and
object-oriented programming. A feature-by-feature comparison puts Modula-3
roughly on par with Ada and C++; see Table 1. However, Modula-3 avoids the
complexity of those larger languages by simplifying individual features. For
example, Modula-3 supports object-oriented programming but implements single
rather than multiple inheritance. It supports generics, but the mechanism is
considerably simpler than that of Ada or C++. In practice, these
simplifications do not affect day-to-day programming. Paradoxically, Modula-3
is also the most stable language: C++, Ada, and Modula-2 are all being
"enhanced" in standards committees, in many cases to add features already
found in Modula-3. 


SRC Modula-3 


Before discussing language features, I should note that DEC provides
free-of-charge a high-quality Modula-3 compiler, called SRC Modula-3, which is
available in source form on the Internet (gatekeeper.dec.com in directory
/pub/DEC/Modula-3). SRC Modula-3 runs on most UNIX workstations and is in use
at many universities, companies, and research laboratories. SRC Modula-3 also
includes a rich run-time library, including UNIX and X Window interfaces, and
an object-oriented X Window programming system called "Trestle." 


Touring the Language 


Modula-3's syntax holds no big surprises; it is based on Modula-2 and,
therefore, Pascal. Statements, expressions, and declarations are similar to
those found in other Pascal-family languages. Modula-3, however, deviates from
Modula-2 when necessary. For example, the precedence of arithmetic and logical
operators follows the more natural convention found in C, Ada, and Fortran
rather than the one used in Pascal and Modula-2; see Table 2. 
In large programs, it is important to place some structure on collections of
procedures and variables, restricting the proliferation of names. Modula-3
programs are structured as collections of modules and interfaces. An interface
specifies a set of public facilities: types, variables, constants, and
procedures. The interface is a contract between the facilities' developers and
their clients. A module implements an interface by supplying private data and
bodies for interface procedures. To use an interface, a client must import the
interface. Interfaces and modules are stored in different files and compiled
separately. 
You can change a module without recompiling clients of the interface. Consider
the Modula-3 version of "Hello, World" in Figure 1. The Hello module exports
(implements) Main, a built-in interface that identifies the starting point of
a program. Hello also imports two interfaces, Wr and Stdio, that provide basic
stream-oriented I/O facilities. (These interfaces are part of the standard
libraries supplied with SRC Modula-3.) Wr.PutText is an example of how names
of imported procedures and variables are qualified by the interface name. This
makes it easy to keep track of name sources in large programs. 


Field Lists 


In the rest of this article I'll illustrate the features of Modula-3 by
looking at a realistic example--an input line parser modeled on the input
facilities of the Awk language. In most programming languages, dealing with
free-form numeric and text input is a hassle. Even C, which has a pretty good
I/O library, forces you to descend into the mysteries of scanf to read
numbers. In Awk, input is effortless: Input lines are automatically broken
into whitespace-delimited fields that are referred to by number and can be
used as text or in numeric expressions. The goal is to write a module to
provide Awk-like input for Modula-3. 
Modula-3 supports both object-oriented and traditional programming models. In
this case, our input-parser interface uses an object model in which a client
creates an object called a "field list" and then uses it to read fields from
input lines. The steps to follow are: 
1. Create a field-list object, fl. 
2. Read a line into fl by calling fl.getLine(). 
3. Get the value of the nth field as a string with fl.text(n). Get the value
of the nth field as a number with fl.number(n). 
4. When finished with the current line, go back to step 2. 
The design for field lists gives us the opportunity to discuss two
particularly interesting features of Modula-3: opaque types and threads.
Opaque object types allow you to reveal only the user-visible part of an
object definition in an interface, hiding its implementation in a module.
Modula-3's support for threads lets you take advantage of preemptive
multitasking on any computer, and real multiprocessing on computers that
support it. 


The Interface 


The FieldList interface (see Listing One) declares several types, two
constants, and an exception. The principal type (the field list) is named "T"
by convention; clients will refer to it as FieldList.T. The relevant
declarations in Listing One begin with the statement T<: Public; and include
the METHODS..END block. The <: operator signals that this is an opaque-type
declaration. Translated into English, the declarations say, "Type T is an
object type descended from type Public, which in turn is descended from type
MUTEX. Type Public (and hence T) has these methods." Thus, we have the method
specifications for T, but have yet to explain the private data and methods.
(We'll see how type T's declaration is completed later.) 
The FieldList interface also includes declarations to support error handling
(the EXCEPTION declaration and the RAISES clauses in the method declarations)
and multitasking (the type MUTEX in the declaration of Public and the
Thread.Alerted exception in the declaration of getLine). 
There are a few other interesting things to note in this interface. Some names
are used before they are declared. This is because in Modula-3, the visibility
of a name extends both before and after the name in the current scope. Type
Rd.T is a "reader," a general input stream that acts like the input side of a
C FILE * stream. Type NumberType is the particular floating-point type used to
represent the numeric values of fields. Modula-3 has three floating-point
types: REAL, LONGREAL, and EXTENDED, corresponding to the IEEE floating-point
standard types. Several convenient shortcuts are also demonstrated. The
declaration getLine(rd: Rd.T := NIL) RAISES {...} indicates that method
getLine takes a single parameter of type Rd.T, and that the parameter has a
default value of NIL. You can omit the argument when calling the method. Even
more concisely, in the declaration init(ws := DefaultWS): T, the parameter of
the init method has a default value and its type is omitted. Modula-3
determines the type from the default value, so this declaration is the same
as init(ws: SET OF CHAR := DefaultWS): T. You can also omit the type in
variable declarations if you supply an initial value of the appropriate type. 
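These shorthands combine naturally. As a sketch (the interface and names here are illustrative, not part of FieldList):

(* Hypothetical interface, for illustration only. *)
INTERFACE Shortcuts;

CONST DefaultWidth = 80;

TYPE
  Formatter = OBJECT
  METHODS
    setWidth(width := DefaultWidth);
    (* Same as setWidth(width: INTEGER := DefaultWidth); the
       parameter type INTEGER is inferred from DefaultWidth. *)
  END;

VAR lineCount := 0; (* variable type INTEGER inferred from initializer *)

END Shortcuts.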



The Client 


Listing Two is Sum, a small program that reads lines, sums all the numbers on
each line, and prints the result. The basic idea is simple, but it
demonstrates several features of Modula-3. The field list is created in the
top-level variable declaration VAR fl := NEW(FieldList.T).init(ws := WhiteSpace). The
NEW function creates a new field-list object, to which the init method is
immediately applied. The init method returns the initialized object. Modula-3
does not have automatic constructors (like C++), but use of the init method is
a convention. The program calls fl.getLine() to read a line from the standard
input and loops over the input fields, summing the numbers. 
The loop body also shows an example of a WITH statement. Modula-3's WITH
statement differs from those of Pascal and Modula-2, which are used to make
record field names visible. In Modula-3, WITH is used to introduce a new
identifier and bind it to an arbitrary variable or value for the duration of
the enclosed statements. If the value is a variable designator (such as an
array element or record field), the new name is aliased to the variable, and
it can be read or written. Otherwise, as in this case, the new identifier is a
read-only value. WITH is surprisingly convenient, and you will see many
examples of it in the implementation of FieldList. 
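The two flavors of WITH can be sketched as follows (a code fragment; the names are illustrative):

TYPE IntArray = ARRAY [0..9] OF INTEGER;
VAR a := IntArray{0, ..}; (* ".." repeats the last value to fill the array *)

(* Bound to a variable designator: elem is an alias for a[3], read/write. *)
WITH elem = a[3] DO
  elem := 42; (* updates a[3] *)
END;

(* Bound to a computed value: sum is read-only inside the block. *)
WITH sum = a[0] + a[1] DO
  a[2] := sum; (* sum may be read, but assigning to it is an error *)
END;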


Exceptions 


Error handling in Modula-3 programs is accomplished with exceptions. By using
exceptions, you don't have to check return values on every procedure
call--something so tedious that most C programmers don't bother. The FieldList
interface exports the exception Error, which is raised by certain methods in
response to an error. A typical client error would be trying to read the 12th
field in a line with only 11 fields. 
When an exception is raised, it propagates out of the current procedure into
the caller. If it is not handled there, it continues to propagate outward
until a handler is found. If no procedure handles the exception, the Modula-3
runtime system terminates the program with a suitable message. Exceptions are
part of the specification of procedures in Modula-3; every procedure or method
that can raise an exception must list that exception in a RAISES clause. Note
the RAISES {Error} clauses in the method declarations in FieldList. 
The Sum program deals with exceptions in two ways: by handling some exceptions
and ignoring others. The outer loop in the main program is surrounded by a
TRY..EXCEPT..END block, which handles any exceptions raised by procedures
inside the loop. This particular exception handler simply prints a message and
allows the program to finish normally. One message is provided for the
end-of-file exception, and another message for all other exceptions. 
An alternative to providing an exception handler can be seen in the Put
procedure, which includes the pragma <*FATAL Wr.Failure, Thread.Alerted*>.
Because there is no exception handler or RAISES clause in the Put procedure,
it is a checked run-time error to raise any exception within that procedure.
But the compiler knows from the Wr interface that Wr.PutText can raise the
Wr.Failure and Thread.Alerted exceptions, so it will warn the programmer of
the potential run-time error. The FATAL pragma says, in effect, that it's OK
to halt the program if these exceptions are raised within Put, and so the
warning message should be suppressed. 


The Module 


It's time to turn to the implementation of FieldList, shown in Listing Three,
page 30. Given the interface shown in Listing One, this module must do at
least two things: complete the opaque declaration of FieldList.T, and supply
procedures to implement the methods for that type. 
Note the REVEAL declaration in Listing Three. This "revelation" adds to the
previous declaration of T in the interface a set of private fields that
clients cannot see. The keyword BRANDED is required for reasons we won't go
into here. Notice also that the object fields have initializers. These
initializations are performed whenever the object is created, and take the
place of C++ constructors. For more complex initialization, a separate method
must be used. 
The revelation also associates procedures with methods by a series of lines of
the form methodname := procedurename. In the FieldList example we've given the
procedures the same names as the methods. The method and its procedure must
have compatible signatures. The signatures for isANumber are shown in Figure
2. The procedure includes an extra argument representing the object on which
the method is operating. Some programmers name the extra parameter self by
convention. The method call fl.isANumber(n) is equivalent to isANumber(fl,
n), assuming that this procedure is currently bound to the method. 
In addition to providing better abstraction and information hiding, keeping
the type revelation in the module means that changing the hidden fields or
procedures of the object type does not force clients to be recompiled. 


Strings and Arrays 


You'll notice that the FieldList interface uses the built-in string type TEXT,
but the module also uses arrays of characters. Strings (type TEXT) are
extremely convenient in Modula-3. They are dynamically allocated and can be of
any length. Once created, a TEXT value cannot be changed, so strings have
value semantics--no one can change the value of a string you are holding. The
built-in interface Text provides basic construction and testing operations on
strings, but no searching functions. 
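A few of the Text operations look like this in use (a fragment; I'm assuming the operation names provided by the SRC Modula-3 Text interface):

IMPORT Text;

VAR
  greeting := "Hello" & ", " & "World"; (* & concatenates TEXT values *)
  n        := Text.Length(greeting);    (* number of characters *)
  same     := Text.Equal(greeting, "Hello, World"); (* value comparison *)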
For intense character manipulation, you can also convert a TEXT value to an
array of characters, as in the getLine procedure. In most languages, arrays of
characters are difficult to deal with because they must have a fixed
compile-time size. Modula-3 provides a compromise: Although stack-based arrays
must have a fixed size, dynamic arrays can be allocated with a run-time size.
For example, the FieldList.T object contains a field declared as chars: REF
ARRAY OF CHAR. This is a reference (pointer) to an open (unbounded) array of
characters. In getLine, the self.chars and Text.SetChars statements store in
self.chars a reference to an array of lineLength characters and then fill that
array with the characters of the string text. Unlike C and C++, dynamic arrays
in Modula-3 have subscript bounds checking built in. The AddDescriptor
procedure is a good example of how to use dynamic arrays, including how to
grow them when necessary. All open arrays are 0 based, and the function
NUMBER(a) can be used to determine the number of elements in any array a.
SUBARRAY can be used to designate a contiguous set of elements in an array. 


Threads 


Multitasking with threads is a useful structuring tool in many applications.
Threads are independent control points, or miniprocesses, that execute within
your program's address space. Each thread has its own stack but shares access
to global variables and the heap. It is not unreasonable for a large program
to have dozens of threads performing various activities. Most new operating
systems, including Windows NT, OS/2, and Mach (OSF/1) provide built-in support
for preemptive multitasking with threads. So does Modula-3, even on operating
systems that don't provide thread support directly. 
Coupled with the benefits of threads is the danger of race conditions, which
occur when two threads attempt to modify a shared data structure at the same
time. One thread may be suspended after partially modifying the data
structure, leaving the data structure in an inconsistent state. A second
thread might then trip over the inconsistency. The solution is to synchronize
access by "locking" the data structure while it is being used, using a
mutual-exclusion semaphore ("mutex" for short). Each thread locks the mutex
while using the data structure; if a thread already has the mutex locked, the
second thread will be forced to wait. 
The implementation of FieldList does not use threads, but it is "thread
friendly." That means FieldList provides the necessary synchronization so that
clients can be multithreaded without worrying about race conditions. In
Modula-3, mutexes are provided by the built-in object type MUTEX. You can
store MUTEX objects in your data structures or, as in FieldList, you can
simply make your object type a descendant of MUTEX, effectively turning your
object into a mutex itself. In using mutexes, there is a special block
statement, LOCK mu DO..END, so that you can't forget to unlock a locked mutex.
The LOCK statement locks the mutex mu while the enclosed statements are
executed, and ensures that the mutex is properly unlocked when the statements
terminate, even if they are terminated by an exception or RETURN within the
LOCK statement. Throughout the FieldList module, you will see LOCK statements
surrounding access to field lists. 
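Although FieldList never forks a thread itself, a client might. A minimal sketch of creating a thread with the SRC Modula-3 Thread interface (assuming its Closure/Fork/Join design; the Worker type here is hypothetical):

IMPORT Thread;

TYPE
  Worker = Thread.Closure OBJECT
  OVERRIDES
    apply := Apply;
  END;

PROCEDURE Apply(self: Worker): REFANY =
  BEGIN
    (* ... do some work; guard any shared data with LOCK ... *)
    RETURN NIL;
  END Apply;

PROCEDURE Demo() =
  VAR t := Thread.Fork(NEW(Worker)); (* start the thread *)
  BEGIN
    EVAL Thread.Join(t); (* wait for it to finish; discard its result *)
  END Demo;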


Safety 


What is Modula-3's most important advantage? Without a doubt, it is safety.
While the exception mechanism encourages the creation of robust programs, the
Modula-3 language is inherently safe and does not require special attention by
the programmer. 
Some of the hardest bugs to find are those that cause a valid source-code
statement to execute incorrectly at run time. For example, the C statement
s->a[i]=v could fail for many reasons: The pointer s might be null or might
point to unallocated storage; the value of i might be too large to use as an
index into the array; or v might contain an out-of-bounds value because it was
never initialized. ANSI C lists 97 ways in which a C program's behavior might
be unpredictable at compile time or run time. Even Ada, usually considered a
"safe" language, does not protect you against dangling pointers or
uninitialized variables. 
Modula-3 guarantees that all run-time assumptions remain valid. It will
initialize your variables (if you don't) to ensure that they always contain
values of their declared types. It checks pointer conversions at run time for
type safety, and does not permit you to deallocate storage directly. Some
features found in other languages, such as taking the address of a local
variable, are prohibited because they are unsafe and because detecting their
misuse at run time would be too costly. These checks and rules avoid many bugs
and catch the remaining ones quickly, before their effects can spread. In my
experience, the error messages provided by the Modula-3 run-time system are so
good that I rarely have to use the debugger to locate the cause of an error. 
A good example of this safety and the new programming features it makes
possible is run-time type testing. Consider the GetReal procedure in Figure 3,
which accepts a pointer of any type (type REFANY) and returns as a
floating-point number the value pointed to. 
The built-in NARROW function is used here to convert a "pointer to anything"
to a "pointer to REAL" (REF REAL). In most languages, this code would be
unsafe: The argument ptr might point to some other type of value whose bits
don't constitute a valid floating-point number. Modula-3, however, checks at
run time that ptr is a value of type REF REAL. If it is not, the call to
NARROW will fail. This run-time type checking is possible because all dynamic
storage contains type information primarily intended for the garbage
collector. You can use this type information yourself. For example, Figure 4
is another version of GetReal that shows how run-time type testing can be made
explicit. 
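Modula-3 also provides a TYPECASE statement that dispatches on a reference's dynamic type; a sketch in the spirit of GetReal:

PROCEDURE GetReal3(ptr: REFANY): REAL =
  (* Return ptr^ as a REAL, or 0.0 if ptr is NIL or some other type. *)
  BEGIN
    TYPECASE ptr OF
      NULL         => RETURN 0.0;
    | REF REAL (r) => RETURN r^; (* r is ptr, known to be REF REAL *)
    ELSE
      RETURN 0.0;
    END;
  END GetReal3;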
Some programmers worry about garbage-collection overhead, but what are the
real costs and benefits? Good garbage-collection algorithms seem to impose no
more than a 10 percent overhead on run-time performance, and can actually save
time on smaller programs or programs that use inefficient heap managers. This
is a reasonable investment for reliability. Garbage collection also shortens
development and makes programs smaller by eliminating the need to write
storage-management code. Many OOP languages, such as Smalltalk, Eiffel, and
Trellis, include garbage collection. 


Loopholes 


Of course, systems programmers cannot restrict themselves to "safe"
programming at all times. Languages that are too strict about safety may not
be usable for writing low-level code, like storage allocators. Therefore, if
you declare a module UNSAFE, Modula-3 gives you access to a variety of unsafe
but practical features, such as unrestricted type conversions (via function
LOOPHOLE), address arithmetic, and the DISPOSE procedure to deallocate
storage. The compiler cannot ensure the safety of UNSAFE modules, but at least
it makes you isolate and identify unsafe code. 
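A hedged sketch of an UNSAFE module using LOOPHOLE (the matching interface is omitted, and this assumes REAL and INTEGER occupy the same number of bits, as on typical 32-bit systems; LOOPHOLE requires equal sizes):

UNSAFE MODULE RealBits;

PROCEDURE Bits(r: REAL): INTEGER =
  (* Reinterpret the bits of r as an INTEGER, with no conversion. *)
  BEGIN
    RETURN LOOPHOLE(r, INTEGER);
  END Bits;

BEGIN
END RealBits.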


The Tools 



A programming language without good tools is almost useless. As previously
mentioned, SRC Modula-3 includes a rich run-time library, including UNIX and X
Windows interfaces and an object-oriented X Windows programming system.
Program rebuilding is easy in SRC Modula-3 using the supplied m3 driver
program. The command line m3 -o Prog *.i3 *.m3 will cause the driver to
inspect all the interfaces (*.i3) and modules (*.m3) in the current directory;
compute dependencies based on IMPORT and EXPORT declarations; determine which
source files have changed and which interfaces are out of date; recompile
anything that needs to be recompiled; and link program Prog. For more complex
programs, the m3make utility allows you to describe your program abstractly
without computing file dependencies by hand. You can also add arbitrary
make-like dependencies to your m3makefile. 
Modula-3 brings together the long-term maintainability of Ada, the simplicity
of Pascal, and the modern object-oriented programming facilities of C++. The
result is a clean language that makes it easy to write robust and maintainable
programs. 


References 


Harbison, Samuel P. Modula-3. Englewood Cliffs, NJ: Prentice Hall, 1992. 
Modula-3 News. Pittsburgh, PA: Pine Creek Software. 
Nelson, Greg. Systems Programming with Modula-3. Englewood Cliffs, NJ:
Prentice Hall, 1991. 
Usenet News Group: comp.lang.modula3. 
Table 1: Feature comparison of some popular programming languages.
                          Modula-3  C++   Ada   Modula-2  Turbo       C
                                                          Pascal 5.5
  Generics                yes       no*   yes   no        no          no
  Exceptions              yes       no*   yes   no*       no          no
  Threads                 yes       no    yes   no        no          no
  OOP                     yes       yes   no*   no        yes         no
  User-defined operators  no        yes   yes   no        no          no
  Interfaces              yes       no    yes   yes       yes         no
  Strong typing           yes       some  yes   yes       yes         no
  Run-time safety checks  yes       no    yes   yes       yes         no
  Isolate unsafe features yes       no    yes   yes       no          no
  Procedure types         yes       yes   no    yes       yes         yes
  Case-sensitive names    yes       yes   no    yes       no          yes
  Garbage collection      yes       no    no    no        no          no
  *These features are coming in new versions of the language.
Table 2: Some differences between Modula-2 and Modula-3.
 Modula-3 Modula-2 
 Declarations Names visible throughout Must declare names before use;
 scope; can initialize variables. cannot initialize variables.
 Types Structural equivalence. Name equivalence.
 Expressions C-like precedence; A.B Pascal-like precedence; no
 shorthand for A^.B, and so on. shorthand for A^.B.
 Statements FOR loop declares its own FOR-loop variable must be
 variable. declared by programmer.
 Pointers Syntax: REF T or REFANY; Syntax: POINTER TO T; no
 run-time type testing and run-time type testing; manual
 garbage collection. deallocation of storage.
 Built-in MIN, MAX apply to value pairs, MIN, MAX apply to types, yield
 functions yield smaller and larger values. smallest and largest elements.
 Strings Variable length, read-only; Fixed maximum length, read/
 different from ARRAY OF CHAR. write, same as ARRAY OF CHAR.
 Isolation of UNSAFE keyword reveals Most unsafe features are
 unsafe features unsafe features of language. provided by SYSTEM interface.
 OOP, generics, Supported. Not currently available.
 exceptions
Figure 1: Modula-3 version of the classic "Hello, World!" program.
MODULE Hello EXPORTS Main; IMPORT Wr, Stdio;
BEGIN
 Wr.PutText(Stdio.stdout, "Hello, World!\n");
 Wr.Close(Stdio.stdout);
END Hello.
Figure 2: Signatures for isANumber.
Method:
 isANumber (n: FieldNumber) : BOOLEAN RAISES {Error}
Procedure:

 isANumber (self: T; n: FieldNumber) : BOOLEAN RAISES {Error}
Figure 3: Procedure to accept a pointer of any type and return as a
floating-point number the value pointed to.
PROCEDURE GetReal(ptr: REFANY) : REAL = (* Return ptr^ as a REAL *)
 VAR realPtr:= NARROW(ptr, REF REAL);
 BEGIN
 RETURN realPtr^;
 END GetReal;
Figure 4: Making explicit run-time type testing in the GetReal procedure.
PROCEDURE GetReal2(ptr: REFANY) : REAL = (* Return ptr^, or 0.0 *)
 BEGIN
 IF ptr # NIL AND ISTYPE(ptr, REF REAL) THEN
 RETURN NARROW(ptr, REF REAL)^;
 ELSE
 RETURN 0.0; (* ptr is not what we expected *)
 END;
 END GetReal2;

Listing One 

INTERFACE FieldList;
(* Breaks text lines into a list of fields which can be treated
 as text or numbers. This interface is thread-safe. *)
IMPORT Rd, Wr, Thread;
EXCEPTION Error;
CONST 
 DefaultWS = SET OF CHAR{' ', '\t', '\n', '\f', ','};
 Zero: NumberType = 0.0D0;
TYPE 
 FieldNumber = [0..LAST(INTEGER)]; (* Fields are numbered 0, 1, ... *)
 NumberType = LONGREAL; (* Type of field as floating-point number *)
 T <: Public; (* A field list *)
 Public = MUTEX OBJECT (* The visible part of a field list *)
 METHODS
 init(ws := DefaultWS): T; (* Define whitespace characters. *)
 getLine(rd: Rd.T := NIL) 
 RAISES {Rd.EndOfFile, Rd.Failure, Thread.Alerted};
 (* Reads a line and breaks it into fields that can be 
 examined by other methods. Default reader is Stdio.stdin. *)
 numberOfFields(): CARDINAL; 
 (* The number of fields in the last-read line. *)
 line(): TEXT; (* The entire line. *)
 isANumber(n: FieldNumber): BOOLEAN RAISES {Error};
 (* Is the field some number (either integer or real)? *)
 number(n: FieldNumber): NumberType RAISES {Error};
 (* The field's floating-point value *)
 text(n: FieldNumber): TEXT RAISES {Error}; (* The field's text value *)
 END;
END FieldList.



Listing Two

MODULE Sum EXPORTS Main; (* Reads lines of numbers and prints their sums. *)
IMPORT FieldList, Wr, Stdio, Fmt, Rd, Thread;
CONST WhiteSpace = FieldList.DefaultWS + SET OF CHAR{','};
VAR 
 sum: FieldList.NumberType;
 fl := NEW(FieldList.T).init(ws := WhiteSpace);

PROCEDURE Put(t: TEXT) =
 <*FATAL Wr.Failure, Thread.Alerted*>
 BEGIN
 Wr.PutText(Stdio.stdout, t);
 Wr.Flush (Stdio.stdout);
 END Put;
BEGIN
 TRY
 LOOP Put("Type some numbers: ");
 fl.getLine();
 sum := FieldList.Zero;
 WITH nFields = fl.numberOfFields() DO
 FOR f := 0 TO nFields - 1 DO
 IF fl.isANumber(f) THEN
 sum := sum + fl.number(f);
 END;
 END;
 WITH sumText = Fmt.LongReal(FLOAT(sum, LONGREAL)) DO
 Put("The sum is " & sumText & ".\n");
 END(*WITH*);
 END(*WITH*);
 END(*LOOP*)
 EXCEPT
 Rd.EndOfFile => 
 Put("Done.\n");
 ELSE 
 Put("Unknown exception; quit.\n");
 END(*TRY*);
END Sum.



Listing Three

MODULE FieldList;
(* Designed for ease of programming, not efficiency. We don't bother to reuse
 data structures; we allocate new ones each time a line is read. *)
IMPORT Rd, Wr, Text, Stdio, Fmt, Thread, Scan;
CONST DefaultFields = 20; (* How many fields we expect at first. *)
TYPE
 DescriptorArray = REF ARRAY OF FieldDescriptor;
 FieldDescriptor = RECORD 
 (* Description of a single field. The 'text' and 'number' fields
 are invalid until the field's value is first requested.
 (Invalid is signaled by 'text' being NIL.) *)
 start : CARDINAL := 0; (* start of field in line *)
 len : CARDINAL := 0; (* length of field *)
 numeric: BOOLEAN := FALSE; (* Does field contain number? *)
 text : TEXT := NIL; (* The field text *)
 number : NumberType := 0.0D0; (* The field as a real. *)
 END;
REVEAL
 T = Public BRANDED OBJECT
 originalLine: TEXT; (* the original input line *)
 chars : REF ARRAY OF CHAR := NIL; (* copy of input line *)
 nFields : CARDINAL := 0; (* number of fields found *)
 fds : DescriptorArray := NIL; (* descriptor for each field *)
 ws : SET OF CHAR := DefaultWS; (* our whitespace *)
 OVERRIDES (* supply real procedures for the methods *)

 init := init; getLine := getLine;
 numberOfFields := numberOfFields;
 line := line;
 isANumber := isANumber;
 number := number;
 text := text;
 END;
PROCEDURE AddDescriptor(t: T; READONLY fd: FieldDescriptor) =
 BEGIN
 IF t.nFields >= NUMBER(t.fds^) THEN
 WITH 
 n = NUMBER(t.fds^), (* current length; will double it *)
 new = NEW(DescriptorArray, 2 * n)
 DO
 SUBARRAY(new^, 0, n) := t.fds^; (* copy in old data *)
 t.fds := new;
 END;
 END;
 t.fds[t.nFields] := fd; 
 INC(t.nFields);
 END AddDescriptor;
PROCEDURE getLine(self: T; rd: Rd.T := NIL)
 RAISES {Rd.EndOfFile, Rd.Failure, Thread.Alerted} =
 VAR
 next : CARDINAL; (* index of next char in line *)
 len : CARDINAL; (* # of characters in current field *)
 lineLength: CARDINAL; (* length of input line *)
 BEGIN
 IF rd = NIL THEN rd := Stdio.stdin; END; (* default reader *)
 LOCK self DO
 WITH text = Rd.GetLine(rd) DO
 lineLength := Text.Length(text);
 self.originalLine := text;
 self.fds := NEW(DescriptorArray, DefaultFields);
 self.nFields := 0;
 self.chars := NEW(REF ARRAY OF CHAR, lineLength);
 Text.SetChars(self.chars^, text);
 END;
 next := 0;
 WHILE next < lineLength DO (* for each field *)
 (* Skip whitespace characters *)
 WHILE next < lineLength AND (self.chars[next] IN 
 self.ws) DO INC(next);
 END;
 (* Collect next field *)
 len := 0;
 WHILE next < lineLength 
 AND NOT (self.chars[next] IN self.ws) DO
 INC(len); INC(next);
 END;
 (* Save information about the field *)
 IF len > 0 THEN
 AddDescriptor(self, FieldDescriptor{start:=
 next - len, len := len});
 END;
 END(*WHILE*);
 END(*LOCK*);
 END getLine;
PROCEDURE GetDescriptor(t: T; n: FieldNumber): FieldDescriptor RAISES {Error} =
 BEGIN

 (* Handle bad field number first. *)
 IF n >= t.nFields THEN
 RAISE Error;
 END;
 (* Be sure text and numeric values are set. *)
 WITH fd = t.fds[n] DO
 IF fd.text # NIL THEN RETURN fd; END; (* Already done this *)
 fd.text := Text.FromChars(SUBARRAY(t.chars^, fd.start, fd.len));
 TRY (* to interpret field as floating-point number *)
 fd.number := FLOAT(Scan.LongReal(fd.text), NumberType); 
 fd.numeric := TRUE;
 EXCEPT
 Scan.BadFormat => 
 TRY (* to interpret field as integer *)
 fd.number := FLOAT(Scan.Int(fd.text), NumberType);
 fd.numeric := TRUE;
 EXCEPT
 Scan.BadFormat => (* not a number *)
 fd.number := Zero;
 fd.numeric := FALSE;
 END;
 END;
 RETURN fd;
 END(*WITH*);
 END GetDescriptor;
PROCEDURE numberOfFields(self: T): CARDINAL =
 BEGIN
 LOCK self DO RETURN self.nFields; END;
 END numberOfFields;
PROCEDURE isANumber(self: T; n: FieldNumber): BOOLEAN RAISES {Error} =
 BEGIN
 LOCK self DO
 WITH fd = GetDescriptor(self, n) DO RETURN fd.numeric; END;
 END;
 END isANumber;
PROCEDURE number(self: T; n: FieldNumber): NumberType RAISES {Error} =
 BEGIN
 LOCK self DO
 WITH fd = GetDescriptor(self, n) DO RETURN fd.number; END;
 END;
 END number;
PROCEDURE line(self: T): TEXT = BEGIN
 LOCK self DO RETURN self.originalLine; END;
 END line;
PROCEDURE text(self: T; n: FieldNumber): TEXT RAISES {Error} =
 BEGIN
 LOCK self DO
 WITH fd = GetDescriptor(self, n) DO
 RETURN self.fds[n].text;
 END;
 END(*LOCK*);
 END text;
PROCEDURE init(self: T; ws := DefaultWS): T =
 BEGIN
 LOCK self DO
 self.ws := ws;
 END;
 RETURN self;
 END init; 

BEGIN
 (* No module initialization code needed *)
END FieldList.



























































Special Issue, 1994
Bob: A Tiny Object-Oriented Language


C++? Smalltalk? What About Bob?




David Betz


David is a contributing editor for DDJ, and the author of XLisp, XScheme, and
other languages. He can be reached through the DDJ offices.


When I first started reading Dr. Dobb's back in the '70s, the articles I
looked forward to the most were those describing tiny programming languages.
First there were tiny implementations of Basic for various microprocessors,
then small implementations of C and Forth, and even a tiny language for the
control of robots. These articles intrigued me because they not only described
a language, but also included complete source code for its implementation.
I've always been interested in how programming languages are constructed, and
this gave me an opportunity to look inside and see how things worked.
Eventually, I decided to try my own hand at building languages.
Since then, I've built many different types of languages, ranging from simple
assemblers to complete Lisp systems. This article describes my latest
creation, a C-like, object-oriented language I call "Bob." Unlike the popular
Small C compiler by Ron Cain, it isn't a strict subset of C or C++; hence it
isn't possible to compile Bob programs with a standard C or C++ compiler.
Instead, Bob is an interpreter for a language with C-like syntax and a class
system similar to C++, but without variable typing and mostly without
declarations. In a sense, Bob is a combination of C++ and Lisp.


Writing a Bob Program


Before I begin describing Bob in detail, let's discuss how you go about
writing Bob programs. Example 1(a) presents a simple example program--a
function for computing factorials--written in Bob.
This function definition looks a lot like its C counterpart. The only
noticeable difference is the lack of a declaration for the type of the
parameter n and for the return type of the function. Variable types do not
need to be declared in Bob; any variable can take on a value of any type.
To take this further, Example 1(b) shows a program that uses the factorial
function in Example 1(a) to display the factorials of the numbers from 1 to
10. Again, this program looks a lot like a similar program written in C. The
main difference is in the first line. In a function definition's formal
parameter list, the semicolon character introduces a list of variables local
to the function. In this case, the variable i is local to the function main.
Also, notice that I've used the print function to display the results instead
of the C printf function. The print function in Bob prints each of its
arguments in succession. It is capable of printing arguments of any type and
automatically formats them appropriately. In addition to supporting C-like
expressions and control constructs, Bob also supports C++-like classes. Again,
Bob is a typeless language, so the syntax for class definitions is somewhat
different from C++, but it is similar enough that it should be easy to move
from one to the other. 
Example 2(a) shows a simple class definition that defines a class called foo
with members a and b, a static member last, and a static member function
get_last. Unlike in C++, it is not necessary to declare all member functions
within the class definition; only the static member functions need be
declared. It is necessary, however, to declare all data members in the class
definition.
As in C++, new objects of a class are initialized using a constructor
function, which has the same name as the class itself. Example 2(b) is the
constructor function for the foo class. This constructor takes two arguments,
which are the initial values for the member variables a and b. It also
remembers the last object created in the static member variable last. Lastly,
it returns the new object. For those of you not familiar with C++, the
variable this refers to the object for which the member function is being
called. It is an implicit parameter passed to every nonstatic member function.
In this case, it is the new object just created.
In Bob, all data members are implicitly protected: The only way to access or
modify the value of a member variable is through a member function. If you
need to access a member variable outside a member function, you must
provide member functions for that purpose; see Example 3(a). Example 3(b)
shows how to set the value of a member variable. Finally, Example 3(c) shows a
member function that displays the numbers between a and b for any object of
the foo class, and a main function that creates some objects and manipulates
them. The new operator creates a new object of the class whose name follows
it. The expressions in parentheses after the class name are the arguments to
be passed to the constructor function.
Bob also allows one class to be derived from another. The derived class will
inherit the behavior of the base class and possibly add some behavior of its
own. Bob only supports single inheritance; therefore, each class can have at
most one base class. The code in Example 4(a) defines a class bar derived from
the base class foo, defined earlier.
The class bar will have member variables a and b inherited from foo as well as
the additional member variable c. The constructor for bar needs to initialize
this new member variable and do the initialization normally done for objects
of class foo; see Example 4(b).
This definition illustrates another difference between Bob and C++. In C++,
constructor functions cannot be called to initialize already existing objects.
This is allowed in Bob, so the foo constructor can be used to do the common
initialization of the foo and bar classes. In C++, it would be necessary to
define an init function for foo and call it from both constructors.
That's a brief walk through the features of Bob. Table 1 details Bob's
complete syntax. 


How Does It All Work?


I've implemented Bob as a hybrid of a compiler and an interpreter. When a
function is defined, it is compiled into instructions for a stack-oriented,
bytecode machine. When the function is invoked, those bytecode instructions
are interpreted. The advantage of this approach over a straight interpreter is
that syntax analysis is done only once, at compile time. This speeds up
function execution considerably and opens up the possibility of building a
run-time-only system that doesn't include the compiler at all.


Run-Time Organization


First, I'll describe the run-time environment of Bob programs. The virtual
machine that executes the bytecodes generated by the Bob compiler has a set of
registers, a stack, and a heap. The register set is shown in Table 2. All
instructions get their arguments from and return their results to the stack.
Literals are stored in the code object itself and are referred to by offset.
Branch instructions test the value on the top of the stack (without popping
the stack) and branch accordingly. Function arguments are passed on the stack,
and function values are returned on top of the stack.
In Bob, all member functions are virtual. This means that when a member
function is invoked, the interpreter must determine which implementation of
the member function to invoke. This is done by the SEND opcode, which uses a
selector from the stack (actually, just a string containing the name of the
member function) with the method dictionary associated with the object's class
to determine which member function to use. If the lookup fails, the dictionary
from the base class is examined. This continues, following the base-class
chain until either a member function is found or there is no base class. If a
member function is found to correspond to the selector, it replaces the
selector on the stack and control is transferred to the member function, just
as it would have been for a regular function. If no member function is found,
an error is reported and the interpreter aborts.
Bob supports five basic data types: integers, strings, vectors, objects, and
nil. Internally, the interpreter uses four more types: classes, compiled
bytecode functions, built-in function headers, and variables. Wherever a value
can be stored, a tag indicates the type of value presently stored there. The
structure for Bob values is shown in Example 5.
Objects, vectors, and bytecode objects are all represented by an array of
value structures. In the case of bytecode objects, the first element in the
vector is a pointer to the string of bytecodes for the function, and the rest
are the literals referred to by the bytecode instructions. Objects are
vectors in which the first element is a pointer to the object's class and the
remaining elements are the values of the nonstatic member variables for the
object. Built-in functions are just pointers to the C functions that implement
the built-in function. Variables are pointers to dictionary entries for the
variable. There is a dictionary for global symbols and one for classes. Each
class also has a dictionary for data members and member functions.
In addition to the stack, Bob uses a heap to store objects, vectors, and
strings. The current implementation of Bob uses the C heap and the C functions
malloc and free to manage heap space and uses a compacting memory manager.


The Source Code


The Bob bytecode compiler is a fairly straightforward recursive-descent
compiler. At the moment, it uses a set of heavily recursive functions to parse
expressions. I intend to replace that with a table-driven expression parser.
The bytecode interpreter (Listing One) is really just a giant switch statement
with one case for each bytecode.
The source code for Bob is too large (more than 3000 lines) to be included in
this issue. Consequently, it's available electronically; see "Availability" on
page 3. I am including significant portions of the code here, though, and I
hope this will give you a taste of the implementation of Bob.


Conclusions



Well, there it is--a complete, if simple, object-oriented language. I don't
think I'd want to throw away my C or C++ compiler in favor of programming in
Bob, but it could serve as a good basis for building a macro language for an
application program or just as a tool for experimenting with language design
and implementation. It should be fairly easy to extend Bob with more built-in
functions and classes or to build application-specific versions with
functions tailored to your own application. I'm designing a computerized
system for controlling theater lighting and will probably use Bob as a macro
facility in that system. Anyway, have fun playing with Bob and please let me
know if you come up with an interesting application for it.
Example 1: (a) A Bob program for computing factorials; (b) a program that uses
the factorial function to display the factorials of the numbers from 1 to 10.
(a)factorial(n)
{
 return n == 1 ? 1 : n * factorial(n-1) ;
}

(b)main(; i)
{
 for (i = 1; i <= 10; ++i)
 print(i," factorial is ",factorial(i)
,"\n");
}
Example 2: (a) A simple class definition; (b) a constructor function for the
class foo.
(a)
class foo
{
 a,b;
 static last;
 static get_last() ;
}

(b)
foo::foo(aa,bb)
{
 a = aa; b = bb;
 last = this;
 return this;
}
Example 3: (a) Providing access to a member variable outside a member
function; (b) setting the value of a member variable; (c) a member function
that displays the numbers between a and b for any object of the foo class, and
a main function that creates some objects and manipulates them.
(a)
foo::get_a()
 {
 return a;
}

(b)
foo::set_a(aa)
 {
 a = aa;
}

(c)
foo::count (; i)
{
 for (i = a; i <= b; ++i)
 print (i, "\n");
}
main(; foo1, foo2)
{
 foo1 = new foo (1, 2); // create an object of class foo
 foo2 = new foo (11, 22); // and another
 print ("foo1 counting\n"); // ask the first to count
 foo1->count ();
 print ("foo2 counting\n"); // ask the second to count
 foo2->count ();
 }

Example 4: (a) Defining a class bar derived from the base class foo; (b) the
constructor for bar needs to initialize this new member variable as well as
doing the initialization normally done for objects of class foo.
(a)
class bar : foo
// a class derived from foo
{
 c;
}

(b)
bar::bar (aa,bb,cc)
{
 this->foo (aa,bb);
 c = cc;
 return this;
}
Table 1: Bob syntax.
 Class Definition
 class <class-name> [ : <base-class-name> ]
 { <member-definition>... }

 Member Definition
 <variable-name>... ;
 static <variable-name>... ;
 <function-name> ( [ <formal-argument-list> ] ) ;
 static <function-name> ( [ <formal-argument-list> ] ) ;

 Function Definition
 [ <class-name> :: ] <function-name>
 ( [ <formal-argument-list> ] [ ; <temporary-list> ] )
 { <statement>... }

 Statement
 if ( <test-expression> ) <then-statement> [ else <else-statement> ]
 while ( <test-expression> ) <body-statement>
 do <body-statement> while ( <test-expression> ) ;
 for ( <init-expression> ; <test-expression> ; <increment-expression> )
 <body-statement>
 break ;
 continue ;
 return [ <result-expression> ] ;
 [ <expression> ] ;
 { <statement>... }

 Expression
 <expression> , <expression>
 <lvalue> = <expression>
 <lvalue> += <expression>
 <lvalue> -= <expression>
 <lvalue> *= <expression>
 <lvalue> /= <expression>
 <test-expression> ? <true-expression> : <false-expression>
 <expression> || <expression>
 <expression> && <expression>
 <expression> | <expression>
 <expression> ^ <expression>
 <expression> & <expression>
 <expression> == <expression>
 <expression> != <expression>
 <expression> < <expression>
 <expression> <= <expression>
 <expression> >= <expression>
 <expression> > <expression>
 <expression> << <expression>
 <expression> >> <expression>
 <expression> + <expression>
 <expression> - <expression>
 <expression> * <expression>
 <expression> / <expression>
 <expression> % <expression>
 - <expression>
 ! <expression>
 ~ <expression>
 ++ <lvalue>
 -- <lvalue>
 <lvalue> ++
 <lvalue> --
 new <class-name> ( [ <constructor-arguments> ] )
 <expression> ( [ <arguments> ] )
 <expression> -> <function-name> ( [ <arguments> ] )
 ( <expression> )
 <variable-name>
 <number>
 <string>
 nil
Table 2: Registers used by the virtual machine.
 Register Description 
 code Currently executing code object.
 cbase Base of the bytecode array for the current code object.
 pc Address of the next bytecode to fetch.
 sp Top of the stack.
 fp Stack frame for the current call.
 stkbase Bottom stack limit.
 stktop Top stack limit.
Example 5: The structure for Bob values.
typedef struct value {
 int v_type; /* data type */
 union { /* value */
 struct class *v_class;
 struct value *v_object;
 struct value *v_vector;
 struct string *v_string;
 struct value *v_bytecode;
 struct dict_entry *v_var;
 int (*v_code) ();
 long v_integer;
 } v;
} VALUE;

Listing One 

/* bobint.c - bytecode interpreter */
/*
 Copyright (c) 1991, by David Michael Betz
 All rights reserved
*/

#include <setjmp.h>
#include "bob.h"


#define iszero(x) ((x)->v_type == DT_INTEGER && (x)->v.v_integer == 0)
#define istrue(x) ((x)->v_type != DT_NIL && !iszero(x))

/* global variables */
VALUE *stkbase; /* the runtime stack */
VALUE *stktop; /* the top of the stack */
VALUE *sp; /* the stack pointer */
VALUE *fp; /* the frame pointer */
int trace=0; /* variable to control tracing */

/* external variables */
extern DICTIONARY *symbols;
extern jmp_buf error_trap;

/* local variables */
static unsigned char *cbase; /* the base code address */
static unsigned char *pc; /* the program counter */
static VALUE *code; /* the current code vector */

/* forward declarations */
char *typename();

/* execute - execute a bytecode function */
int execute(name)
 char *name;
{
 DICT_ENTRY *sym;
 
 /* setup an error trap handler */
 if (setjmp(error_trap) != 0)
 return (FALSE);

 /* lookup the symbol */
 if ((sym = findentry(symbols,name)) == NULL)
 return (FALSE);

 /* dispatch on its data type */
 switch (sym->de_value.v_type) {
 case DT_CODE:
 (*sym->de_value.v.v_code)(0);
 break;
 case DT_BYTECODE:
 interpret(sym->de_value.v.v_bytecode);
 break;
 }
 return (TRUE);
}

/* interpret - interpret bytecode instructions */
int interpret(fcn)
 VALUE *fcn;
{
 register int pcoff,n;
 register VALUE *obj;
 VALUE *topframe,val;
 STRING *s1,*s2,*sn;
 
 /* initialize */
 sp = fp = stktop;

 cbase = pc = fcn[1].v.v_string->s_data;
 code = fcn;

 /* make a dummy call frame */
 check(4);
 push_bytecode(code);
 push_integer(0);
 push_integer(0);
 push_integer(0);
 fp = topframe = sp;
 
 /* execute each instruction */
 for (;;) {
 if (trace)
 decode_instruction(code,pc-code[1].v.v_string->s_data);
 switch (*pc++) {
 case OP_CALL:
 n = *pc++;
 switch (sp[n].v_type) {
 case DT_CODE:
 (*sp[n].v.v_code)(n);
 break;
 case DT_BYTECODE:
 check(3);
 code = sp[n].v.v_bytecode;
 push_integer(n);
 push_integer(stktop - fp);
 push_integer(pc - cbase);
 cbase = pc = code[1].v.v_string->s_data;
 fp = sp;
 break;
 default:
 error("Call to non-procedure, Type %s",
 typename(sp[n].v_type));
 return;
 }
 break;
 case OP_RETURN:
 if (fp == topframe) return;
 val = *sp;
 sp = fp;
 pcoff = fp[0].v.v_integer;
 n = fp[2].v.v_integer;
 fp = stktop - fp[1].v.v_integer;
 code = fp[fp[2].v.v_integer+3].v.v_bytecode;
 cbase = code[1].v.v_string->s_data;
 pc = cbase + pcoff;
 sp += n + 3;
 *sp = val;
 break;
 case OP_REF:
 *sp = code[*pc++].v.v_var->de_value;
 break;
 case OP_SET:
 code[*pc++].v.v_var->de_value = *sp;
 break;
 case OP_VREF:
 chktype(0,DT_INTEGER);
 switch (sp[1].v_type) {
 case DT_VECTOR: vectorref(); break;
 case DT_STRING: stringref(); break;
 default: badtype(1,DT_VECTOR); break;
 }
 break;
 case OP_VSET:
 chktype(1,DT_INTEGER);
 switch (sp[2].v_type) {
 case DT_VECTOR: vectorset(); break;
 case DT_STRING: stringset(); break;
 default: badtype(1,DT_VECTOR); break;
 }
 break;
 case OP_MREF:
 obj = fp[fp[2].v.v_integer+2].v.v_object;
 *sp = obj[*pc++];
 break;
 case OP_MSET:
 obj = fp[fp[2].v.v_integer+2].v.v_object;
 obj[*pc++] = *sp;
 break;
 case OP_AREF:
 n = *pc++;
 if (n >= fp[2].v.v_integer)
 error("Too few arguments");
 *sp = fp[n+3];
 break;
 case OP_ASET:
 n = *pc++;
 if (n >= fp[2].v.v_integer)
 error("Too few arguments");
 fp[n+3] = *sp;
 break;
 case OP_TREF:
 n = *pc++;
 *sp = fp[-n-1];
 break;
 case OP_TSET:
 n = *pc++;
 fp[-n-1] = *sp;
 break;
 case OP_TSPACE:
 n = *pc++;
 check(n);
 while (--n >= 0) {
 --sp;
 set_nil(sp);
 }
 break;
 case OP_BRT:
 if (istrue(sp))
 pc = cbase + getwoperand();
 else
 pc += 2;
 break;
 case OP_BRF:
 if (istrue(sp))
 pc += 2;
 else
 pc = cbase + getwoperand();
 break;
 case OP_BR:

 pc = cbase + getwoperand();
 break;
 case OP_NIL:
 set_nil(sp);
 break;
 case OP_PUSH:
 check(1);
 push_integer(FALSE);
 break;
 case OP_NOT:
 if (istrue(sp))
 set_integer(sp,FALSE);
 else
 set_integer(sp,TRUE);
 break;
 case OP_NEG:
 chktype(0,DT_INTEGER);
 sp->v.v_integer = -sp->v.v_integer;
 break;
 case OP_ADD:
 switch (sp[1].v_type) {
 case DT_INTEGER:
 switch (sp[0].v_type) {
 case DT_INTEGER:
 sp[1].v.v_integer += sp->v.v_integer;
 break;
 case DT_STRING:
 s2 = sp[0].v.v_string;
 sn = newstring(1 + s2->s_length);
 sn->s_data[0] = sp[1].v.v_integer;
 memcpy(&sn->s_data[1],
 s2->s_data,
 s2->s_length);
 set_string(&sp[1],sn);
 break;
 default:
 break;
 }
 break;
 case DT_STRING:
 s1 = sp[1].v.v_string;
 switch (sp[0].v_type) {
 case DT_INTEGER:
 sn = newstring(s1->s_length + 1);
 memcpy(sn->s_data,
 s1->s_data,
 s1->s_length);
 sn->s_data[s1->s_length] = sp[0].v.v_integer;
 set_string(&sp[1],sn);
 break;
 case DT_STRING:
 s2 = sp[0].v.v_string;
 sn = newstring(s1->s_length + s2->s_length);
 memcpy(sn->s_data,
 s1->s_data,s1->s_length);
 memcpy(&sn->s_data[s1->s_length],
 s2->s_data,s2->s_length);
 set_string(&sp[1],sn);
 break;
 default:

 break;
 }
 break;
 default:
 badtype(1,DT_VECTOR);
 break;
 }
 ++sp;
 break;
 case OP_SUB:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 sp[1].v.v_integer -= sp->v.v_integer;
 ++sp;
 break;
 case OP_MUL:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 sp[1].v.v_integer *= sp->v.v_integer;
 ++sp;
 break;
 case OP_DIV:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 if (sp->v.v_integer != 0) {
 int x=sp->v.v_integer;
 sp[1].v.v_integer /= x;
 }
 else
 sp[1].v.v_integer = 0;
 ++sp;
 break;
 case OP_REM:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 if (sp->v.v_integer != 0) {
 int x=sp->v.v_integer;
 sp[1].v.v_integer %= x;
 }
 else
 sp[1].v.v_integer = 0;
 ++sp;
 break;
 case OP_INC:
 chktype(0,DT_INTEGER);
 ++sp->v.v_integer;
 break;
 case OP_DEC:
 chktype(0,DT_INTEGER);
 --sp->v.v_integer;
 break;
 case OP_BAND:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 sp[1].v.v_integer &= sp->v.v_integer;
 ++sp;
 break;
 case OP_BOR:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);

 sp[1].v.v_integer |= sp->v.v_integer;
 ++sp;
 break;
 case OP_XOR:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 sp[1].v.v_integer ^= sp->v.v_integer;
 ++sp;
 break;
 case OP_BNOT:
 chktype(0,DT_INTEGER);
 sp->v.v_integer = ~sp->v.v_integer;
 break;
 case OP_SHL:
 switch (sp[1].v_type) {
 case DT_INTEGER:
 chktype(0,DT_INTEGER);
 sp[1].v.v_integer <<= sp->v.v_integer;
 break;
 case DT_FILE:
 print1(sp[1].v.v_fp,FALSE,&sp[0]);
 break;
 default:
 break;
 }
 ++sp;
 break;
 case OP_SHR:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 sp[1].v.v_integer >>= sp->v.v_integer;
 ++sp;
 break;
 case OP_LT:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 n = sp[1].v.v_integer < sp->v.v_integer;
 ++sp;
 set_integer(sp,n ? TRUE : FALSE);
 break;
 case OP_LE:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 n = sp[1].v.v_integer <= sp->v.v_integer;
 ++sp;
 set_integer(sp,n ? TRUE : FALSE);
 break;
 case OP_EQ:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 n = sp[1].v.v_integer == sp->v.v_integer;
 ++sp;
 set_integer(sp,n ? TRUE : FALSE);
 break;
 case OP_NE:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 n = sp[1].v.v_integer != sp->v.v_integer;
 ++sp;
 set_integer(sp,n ? TRUE : FALSE);

 break;
 case OP_GE:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 n = sp[1].v.v_integer >= sp->v.v_integer;
 ++sp;
 set_integer(sp,n ? TRUE : FALSE);
 break;
 case OP_GT:
 chktype(0,DT_INTEGER);
 chktype(1,DT_INTEGER);
 n = sp[1].v.v_integer > sp->v.v_integer;
 ++sp;
 set_integer(sp,n ? TRUE : FALSE);
 break;
 case OP_LIT:
 *sp = code[*pc++];
 break;
 case OP_SEND:
 n = *pc++;
 chktype(n,DT_OBJECT);
 send(n);
 break;
 case OP_DUP2:
 check(2);
 sp -= 2;
 *sp = sp[2];
 sp[1] = sp[3];
 break;
 case OP_NEW:
 chktype(0,DT_CLASS);
 set_object(sp,newobject(sp->v.v_class));
 break;
 default:
 info("Bad opcode %02x",pc[-1]);
 break;
 }
 }
}

/* send - send a message to an object */
static send(n)
 int n;
{
 char selector[TKNSIZE+1];
 DICT_ENTRY *de;
 CLASS *class;
 class = sp[n].v.v_object[OB_CLASS].v.v_class;
 getcstring(selector,sizeof(selector),sp[n-1].v.v_string);
 sp[n-1] = sp[n];
 do {
 if ((de = findentry(class->cl_functions,selector)) != NULL) {
 switch (de->de_value.v_type) {
 case DT_CODE:
 (*de->de_value.v.v_code)(n);
 return;
 case DT_BYTECODE:
 check(3);
 code = de->de_value.v.v_bytecode;
 set_bytecode(&sp[n],code);

 push_integer(n);
 push_integer(stktop - fp);
 push_integer(pc - cbase);
 cbase = pc = code[1].v.v_string->s_data;
 fp = sp;
 return;
 default:
 error("Bad method, Selector '%s', Type %d",
 selector,
 de->de_value.v_type);
 }
 }
 } while ((class = class->cl_base) != NULL);
 nomethod(selector);
}

/* vectorref - load a vector element */
static vectorref()
{
 VALUE *vect;
 int i;
 vect = sp[1].v.v_vector;
 i = sp[0].v.v_integer;
 if (i < 0 || i >= vect[0].v.v_integer)
 error("subscript out of bounds");
 sp[1] = vect[i+1];
 ++sp;
} 
/* vectorset - set a vector element */
static vectorset()
{
 VALUE *vect;
 int i;
 vect = sp[2].v.v_vector;
 i = sp[1].v.v_integer;
 if (i < 0 || i >= vect[0].v.v_integer)
 error("subscript out of bounds");
 vect[i+1] = sp[2] = *sp;
 sp += 2;
}

/* stringref - load a string element */
static stringref()
{
 STRING *str;
 int i;
 str = sp[1].v.v_string;
 i = sp[0].v.v_integer;
 if (i < 0 || i >= str->s_length)
 error("subscript out of bounds");
 set_integer(&sp[1],str->s_data[i]);
 ++sp;
}

/* stringset - set a string element */
static stringset()
{
 STRING *str;
 int i;

 chktype(0,DT_INTEGER);
 str = sp[2].v.v_string;
 i = sp[1].v.v_integer;
 if (i < 0 || i >= str->s_length)
 error("subscript out of bounds");
 str->s_data[i] = sp[0].v.v_integer;
 set_integer(&sp[2],str->s_data[i]);
 sp += 2;
}

/* getwoperand - get data word */
static int getwoperand()
{
 int b;
 b = *pc++;
 return ((*pc++ << 8) | b);
}

/* type names */
static char *tnames[] = {
"NIL","CLASS","OBJECT","VECTOR","INTEGER","STRING","BYTECODE",
"CODE","VAR","FILE"
}; 
/* typename - get the name of a type */
static char *typename(type)
 int type;
{
 static char buf[20];
 if (type >= _DTMIN && type <= _DTMAX)
 return (tnames[type]);
 sprintf(buf,"(%d)",type);
 return (buf);
}

/* badtype - report a bad operand type */
badtype(off,type)
 int off,type;
{
 char tn1[20];
 strcpy(tn1,typename(sp[off].v_type));
 info("PC: %04x, Offset %d, Type %s, Expected %s",
 pc-cbase,off,tn1,typename(type));
 error("Bad argument type");
}

/* nomethod - report a failure to find a method for a selector */
static nomethod(selector)
 char *selector;
{
 error("No method for selector '%s'",selector);
}

/* stackover - report a stack overflow error */
stackover()
{
 error("Stack overflow");
}




Special Issue, 1994
The Tcl Programming Language


A powerful scripting language designed for ease of use




John K. Ousterhout


John is creator of the Tcl language, author of Tcl and the Tk Toolkit, and a
professor at the University of California at Berkeley. He can be contacted at
ouster@allspice.cs.berkeley.edu. This article is excerpted from Tcl and the Tk
Toolkit, by John K. Ousterhout. Copyright (C) 1994 by Addison-Wesley
Publishing Company. Reprinted by permission of Addison-Wesley.


Tcl, short for "tool command language" (and pronounced "tickle"), is a simple
scripting language for controlling and extending applications. Tcl provides
generic programming facilities, such as variables, loops, and procedures,
useful for a variety of applications. Furthermore, Tcl is an embeddable
command language. Its interpreter is a library of C procedures that can easily
be incorporated into applications, and each application can extend the core
Tcl features with additional commands for that application.
The core Tcl facilities can be extended with Tk, a toolkit for the X Window
System. Tk includes commands for building user interfaces so that you can
construct Motif-like UIs by writing Tcl scripts instead of C code. Like Tcl,
Tk is implemented as a library of C procedures, so it can be used in many
different applications. Individual applications can also extend the base Tk
features with new UI widgets and geometry managers written in C.
One benefit of Tcl and Tk is that they lend themselves to rapid development.
Many GUI applications can be written entirely as Tcl scripts, using a
windowing shell called "wish" that allows you to program at a higher level
than you would in C/C++. Compared to toolkits in which you program in C (such
as Motif), there is much less code to write and certainly much less to learn.
Another reason Tcl and Tk development is rapid is that Tcl is an interpreted
language. When you use a Tcl application, you can generate and execute new
scripts on the fly without recompiling or restarting the application. This
allows you to test out new ideas and fix bugs rapidly. Interpreted code
executes more slowly than compiled C code, but today's workstations are
surprisingly fast. For example, you can execute scripts with hundreds of Tcl
commands on each movement of the mouse with no perceptible delay. In rare
cases where performance becomes an issue, you can reimplement the
performance-critical parts of your Tcl scripts in C.
Tcl also makes it easy for applications to have their own powerful scripting
languages. To create a new application, you simply implement a few new Tcl
commands that provide the basic features of your application. Then you can
link your new commands with the Tcl library to produce a full-function
scripting language that includes both the commands provided by Tcl (the "Tcl
core") and those that you wrote, as in Figure 1(a). For example, an
application for reading electronic bulletin boards (BBSs) might contain C code
that implements one Tcl command to query a BBS for new messages and another to
retrieve a given message. Once these commands exist, Tcl scripts can be
written to cycle through the new messages from all the BBSs and display them
one at a time, keep a record in disk files of which messages have been read
and which haven't, or search one or more BBSs for messages on a particular
topic. 
Furthermore, Tcl is an excellent glue language. A Tcl application can include
many different library packages, each of which provides an interesting set of
Tcl commands, as in Figure 1(b). Tk is just one example library package; many
others exist. Tcl scripts for such applications can include commands from any
of the packages.
Tcl scripts can also be used as a communication mechanism to allow different
applications to work together. For example, any windowing application based on
Tk can send a Tcl script to any other Tk application to be executed there.
Among other things, this makes multimedia effects much more accessible: Once
audio and video applications have been built with Tk, any Tk application can
issue record and play commands to them. Indeed, an audio extension for Tcl
called "Ak" provides Tcl commands for file playback, recording, telephone
control, and synchronization. In addition, spreadsheets can update themselves
from database applications, UI editors can modify the appearance and behavior
of live applications as they run, and so on. In fact, Tcl's use of a common,
interpreted language for communication between applications is more powerful
and flexible than static approaches such as Microsoft's OLE and Sun
Microsystems' ToolTalk.


Getting Started


To invoke Tcl scripts, you must run a Tcl application. The Tcl system includes
a simple Tcl shell application called "tclsh," on which the examples presented
here are based. To run the application, simply type tclsh to your shell and
the program will start up in interactive mode, reading Tcl commands from the
keyboard and passing them to the Tcl interpreter for evaluation. Entering
expr 2 + 2, for example, prints the result (4) and prompts you for
another command. 
This simple example illustrates several Tcl features. First, Tcl commands are
similar in form to shell commands, each consisting of one or more command
words separated by spaces or tabs. This example has four words: expr, 2, +,
and 2. The first word of each command, or "name," selects a C procedure in the
application to carry out the command's function. The other words are
"arguments" passed to the C procedure. expr, a core command provided by the
Tcl library that exists in every Tcl application, concatenates its arguments
into a single string and evaluates the string as an arithmetic expression.
Each Tcl command returns a result string. For the expr command, the result is
the value of the expression. Results are always returned as strings, so expr
converts its numerical result back to a string in order to return it. If a
command has no meaningful result, it returns an empty string.
Commands are normally terminated by newlines, so each line that you type to
tclsh becomes a separate command. Semicolons also act as command separators,
just in case you want to enter multiple commands on a single line. Single
commands can also span multiple lines.
The expr command supports an expression syntax similar to that in ANSI C,
including the same precedence rules and most of the C operators. Example 1
provides a few examples that you could enter into tclsh. Invoking the exit
command terminates the application and returns you to your shell.


Hello World with Tk


Although Tcl provides a full set of programming features such as variables,
loops, and procedures, it is not typically used by itself. It is intended to
be used as part of applications that contain their own Tcl commands, in
addition to those in the Tcl core. The application-specific extensions provide
interesting primitives, and Tcl is used to assemble the primitives into useful
functions. It is easier to understand Tcl's facilities if you have seen some
application-specific commands to use with Tcl.
One of the most interesting extensions to Tcl is the set of windowing commands
provided by the Tk toolkit. Most of the examples in this article use an
application called "wish" (short for "windowing shell") which is similar to
tclsh except that it includes the commands defined by Tk that allow you to
create GUIs. If Tcl and Tk have been installed on your system, you can invoke
wish from your shell; it will display a small empty window on your screen and
then read commands from standard input. Example 2 is a simple wish script. If
you type these two Tcl commands to wish, the window's appearance will change
to that shown in Figure 2. If you move the pointer over the "Hello, world!"
text and click mouse button 1 (usually the left-most button), the window will
disappear and wish will exit.
Several things about this example are worth noting. First, the example
contains two commands, button and pack, both of which are implemented by Tk.
Although these commands do not look like the expr command, they have the same
basic structure as all Tcl commands: one or more words, separated by white
space. The button command contains six words, and pack two.
The fourth word of the button command is enclosed in double quotes. This
allows the word to include white-space characters; without the quotes,
"Hello," and "world!" would be separate words. The double quotes are not part
of the word itself; they are removed by the Tcl interpreter before the command
is executed. 
The word structure doesn't matter for the expr command since expr concatenates
all its arguments. For button and pack (and most other Tcl commands), however,
the word structure is important. The button command expects its first argument
to be the name of a new window to create. Subsequent arguments must come in
pairs, where the first argument of each pair is the name of a configuration
option and the second is a value for that option. Thus if the double quotes
were omitted, the value of the --text option would be "Hello," and "world!"
would be treated as the name of a separate configuration option. Since no
option is defined with the name "world!", the command would return an error.
The basic building block for a GUI in Tk is a "widget"--a window with a
particular appearance and behavior (the terms "widget" and "window" are used
synonymously in Tk). Widgets are divided into classes such as buttons, menus,
and scroll bars. All the widgets in the same class have the same general
appearance and behavior. For example, all button widgets display a text string
or bitmap and execute a Tcl command when invoked with the mouse.
Widgets are organized hierarchically in Tk, with names that reflect their
positions in the hierarchy. The "main widget," which appears on the screen
when you start wish, has the name ".". The name .b, for instance, refers to a
child b of the main widget. Widget names in Tk are like filenames in UNIX
except that they use a period as a separator character instead of a slash.
Thus, .a.b.c refers to a widget that is a child of widget .a.b, which is a
child of .a, which is a child of the main widget.
Tk provides one "class command" for each class of widgets, which you invoke to
create widgets of that class. For example, the button command creates button
widgets. All of the class commands have the same form: The first argument is
the name of a new widget to create, and additional arguments specify
configuration options. Different widget classes support different sets of
options. Widgets typically have many options (about 20 different options are
defined for buttons, for example), and default values are provided for the
options that you don't specify. When a class command like button is invoked,
it creates a new widget with the given name and configures it as specified by
the options.
The command in Example 2 specifies two options: --text, a string to display in
the button, and --command, a Tcl script to execute when the user invokes the
button. In Example 2, the --command option is exit. Other button options are
listed in Table 1.
The pack command causes the button widget to appear on the screen. Creating a
widget does not automatically cause it to be displayed. Independent entities
called "geometry managers" are responsible for computing the size and location
of widgets and making them appear on the screen. The pack command in Example 2
asks a geometry manager called the "packer" to manage .b. The command asks
that .b fill the entire area of its parent window; if the parent has more
space than needed by its child, the parent is shrunk so that it is just large
enough to hold the child. Thus when you type the pack command, the main window
shrinks from its original size to the size in which it appears in Figure 2.


Script Files


You can also place commands into script files and invoke the script files just
like shell scripts. To do this for Example 2, place the text in Example 3 in a
file named "hello." This script is the same as Example 2 except for the first
line. As far as wish is concerned, this line is a comment, but if you make the
file executable (type chmod +x hello to your shell, for example) you can
invoke the file directly by typing hello to your shell. The system will then
invoke wish, passing it the file as a script to interpret; wish will display
the same window and wait for you to interact with it. In this case, you will
not be able to type commands interactively to wish; all you can do is click on
the button. (Note: This script will work only if wish is installed in
/usr/local/bin. If wish has been installed elsewhere, you will need to change
the first line to reflect its location on your system. Some systems will
misbehave in confusing ways if the first line of the script file is longer
than 32 characters, so beware if the full pathname of the wish binary is
longer than 27 characters.)
In practice, Tk application users rarely type Tcl commands; they interact with
the applications using the mouse and keyboard in ways normal for graphical
applications. Tcl works behind the scenes. The hello script behaves just the
same as an application coded in C with a toolkit such as Motif and compiled
into a binary executable file.
During debugging, however, it is common for application developers to type Tcl
commands interactively. For example, you could test the hello script by
starting wish interactively. Type wish to your shell instead of hello. Then
enter the Tcl command source hello where source is a Tcl command that takes a
filename as an argument. It reads the file and evaluates it as a Tcl script.
This generates the same UI as if you had invoked hello directly from your
shell, but you can now type Tcl commands interactively. For example, you could
edit the script file to change the -command option to Example 4(a), then
interactively type the commands in Example 4(b) to wish without restarting the
program. The first command in Example 4(b) deletes the existing button, and
the second recreates the button with the new --command option. When you click
on the button, the puts command prints a message on standard output before
wish exits.


Variables and Substitutions



Tcl allows you to store values in variables and use those values in commands.
For instance, you could enter the script in Example 5 into either tclsh or
wish. Example 5(a) assigns the value 44 to variable a and returns the
variable's value. In Example 5(b), the $ causes Tcl to perform "variable
substitution," replacing the dollar sign and the variable name following it
with the value of the variable, so that the actual argument received by expr
is 44*4. Variables need not be declared in Tcl; they are created automatically
when set. Variable values are stored as strings, and arbitrary string values
of any length are allowed. Of course, in this example an error will occur in
expr if the value of a doesn't make sense as an integer or real number (try
other values and see what happens).
Tcl also provides command substitution, which allows you to use the result of
one command in an argument to another. Square brackets invoke command
substitution: Everything inside the brackets is evaluated as a separate Tcl
script, the result of which is substituted into the word in place of the
bracketed command. In Example 5(c), the second argument of the second set
command will be 176.
The final form of substitution in Tcl is backslash substitution, which allows
you to use various special characters in a command, as in Example 5(d). The
first command sets variable a to the string $a (the characters \$ are replaced
with a dollar sign and no variable substitution occurs). The second command
sets variable newline to hold a string consisting of the newline character
(the characters \n are replaced with a newline character).
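The substitution rules above can be modeled in a few lines. The sketch below is my own illustration in Python (the function name substitute is invented, and it handles only variable and backslash substitution; real Tcl also performs [command] substitution and supports many more backslash sequences):

```python
import re

def substitute(word, variables):
    """Apply simplified Tcl-style variable and backslash
    substitution to one word."""
    out, i = [], 0
    while i < len(word):
        c = word[i]
        if c == '\\' and i + 1 < len(word):
            nxt = word[i + 1]
            # \n becomes a newline; any other escaped character
            # (e.g. \$) passes through literally, which is what
            # suppresses variable substitution
            out.append('\n' if nxt == 'n' else nxt)
            i += 2
        elif c == '$':
            m = re.match(r'[A-Za-z0-9_]+', word[i + 1:])
            if m:
                out.append(variables[m.group(0)])  # $name -> its value
                i += 1 + m.end()
            else:
                out.append(c)
                i += 1
        else:
            out.append(c)
            i += 1
    return ''.join(out)

print(substitute('$a*4', {'a': '44'}))   # 44*4, as in Example 5(b)
print(substitute(r'\$a', {'a': '44'}))   # $a, as in Example 5(d)
```

The key point the model captures is that substitution is purely textual: the command being invoked never sees the dollar signs, only the resulting characters.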


Control Structures


Example 6 uses variables and substitutions along with some simple control
structures to create a Tcl procedure called "power," which raises a base to an
integer power. If you enter Example 6(a) into wish or tclsh, or if you enter
them into a file and then source the file, a new command power will become
available. The command takes two arguments, a number and an integer power, and
its result is the number raised to the power; see Example 6(b). This example
uses Tcl braces, which are like double quotes in that they can be placed around
a word that contains embedded spaces. However, braces are different from
double quotes in two respects. First, braces nest. The last word of the proc
command starts after the open brace on the first line and contains everything
up to the close brace on the last line. The Tcl interpreter removes the outer
braces and passes everything between them, including several nested pairs of
braces, to proc as an argument. The second difference between braces and
double quotes is that substitutions cannot occur inside braces. All of the
characters between the braces are passed verbatim to proc without any special
processing.
The proc command takes three arguments: the name of a procedure, a list of
argument names separated by white space, and the body of the procedure, which
is a Tcl script. proc enters the procedure name into the Tcl interpreter as a
new command. Whenever the command is invoked, the body of the procedure is
evaluated. While the procedure body is executing, it can access its arguments
as variables: base will hold the first argument to power and p will hold the
second.
The body of the power procedure contains the Tcl commands set, while, and
return. The while command does most of the procedure's work. It takes two
arguments: an expression $p>0 and a body, which is another Tcl script. The
while command evaluates its expression argument; if the result is nonzero, it
evaluates the body as a Tcl script. It repeats this process until eventually
the expression evaluates to zero. In Example 6, the body of the while command
multiplies the result value by base and then decrements p. When p reaches
zero, the result contains the desired power of base.
The return command causes the procedure to exit with the value of variable
result as the procedure's result. If it is omitted, the return value of the
procedure will be the result of the last command in the procedure's body. In
the case of power, this would be the result of while, which is always an empty
string.
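For readers following along outside Tcl, the algorithm transliterates directly into Python:

```python
def power(base, p):
    # Direct transliteration of the Tcl 'power' procedure in
    # Example 6: multiply an accumulator by base until the
    # integer exponent is exhausted.
    result = 1
    while p > 0:
        result = result * base
        p = p - 1
    return result

print(power(2, 6))                  # 64
print(round(power(1.15, 5), 5))     # 2.01136
```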
The use of braces in this example is crucial. The single most difficult issue
in writing Tcl scripts is managing substitutions: making them happen when you
want them and preventing them when you don't. The body of the procedure must
be enclosed in braces because you don't want variable and command
substitutions to occur at the time the body is passed to proc as an argument;
you want the substitutions to occur later, when the body is evaluated as a Tcl
script. The body of the while command is enclosed in braces for the same
reason: You want the substitutions to be performed each time the body is
evaluated, rather than once, while parsing the while command. Braces are also
needed in the {$p>0} argument to while. Without them, the value of variable p
would be substituted when parsing the while command; the expression would have
a constant value and while would loop forever. 
Although Tcl doesn't require it, for readability I'm using a syntax where the
open brace for an argument that is a Tcl script appears at the end of one
line, the script follows on successive lines indented, and the close brace is
on a line by itself after the script. Arguments that are scripts are subject
to the same syntax rules as any other arguments; in fact, the Tcl interpreter
doesn't even know that an argument is a script at the time it parses it. One
consequence is that the open brace must be on the same line as the preceding
portion of the command. If the open brace is moved to a line by itself, the
newline before the open brace will terminate the command.
The variables in a procedure are normally local to that procedure and will not
be visible outside it. In Example 6, the local variables include the arguments
base and p as well as the variable result. A fresh set of local variables is
created for each call to a procedure (arguments are passed by copying their
values), and when a procedure returns, its local variables are deleted.
Variables named outside any procedure are "global"--they last forever unless
explicitly deleted. 


The Tcl Language


As a programming language, Tcl is defined quite differently than most other
languages. In most languages, a grammar defines the entire language. In
Example 7(a), for instance, the grammar for C defines the structure of this C
statement in terms of the reserved word while, an expression, and a
substatement to execute repeatedly until the expression evaluates to 0. The C
grammar defines both the overall structure of the while statement and the
internal structure of its expression and substatement.
In Tcl, no fixed grammar explains the entire language. Instead, it is defined
by an interpreter that parses single Tcl commands, plus a collection of
procedures that execute them. The interpreter and its substitution rules are
fixed, but new commands can be defined at any time and existing commands can
be replaced. Features such as control flow, procedures, and expressions are
implemented as commands; they are not understood directly by the Tcl
interpreter. For example, the Tcl command in Example 7(b) is equivalent to the
C while loop in Example 7(a). 
When the command in Example 7(b) is evaluated, the Tcl interpreter knows only
that it has three words, the first of which is a command name. The Tcl
interpreter has no idea that the first argument to while is an expression and
the second is a Tcl script. Once the command has been parsed, the Tcl
interpreter passes the words of the command to while, which treats its first
argument as an expression and the second as a Tcl script. If the expression
evaluates to nonzero, then while passes its second argument back to the Tcl
interpreter for evaluation. At this point, the interpreter treats the contents
of the argument as a script (that is, it performs command and variable
substitutions and invokes the expr and set commands).
As far as the Tcl interpreter is concerned, the set command in Example 7(c) is
identical to while except that it has a different command name. Therefore, the
interpreter handles the two commands identically, except that it invokes a
different procedure to execute set. The set command treats its first argument
as a variable name and its second as a new value for that variable, so it will
set a variable with the name $p>0.
The most common mistake made by new Tcl users is trying to understand Tcl
scripts in terms of a grammar; this leads them to expect much more
sophisticated behavior from the interpreter than actually exists. For example,
a C programmer using Tcl for the first time might think that the first pair of
braces in the while command serves a different purpose than the second pair.
In reality, there is no difference. In each case, the Tcl interpreter passes
the characters between the braces to the command without performing any
substitutions.
Thus the entire Tcl "language" consists of about a half-dozen simple rules for
parsing arguments and performing substitutions. At the same time, Tcl is
powerful enough to allow a rich set of structures such as loops and procedures
to be built as ordinary commands. Applications can extend Tcl not just with
new commands but also with new control structures.
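To make the point concrete, here is a toy model of such an interpreter, written in Python. It is my own sketch and far simpler than the real Tcl parser: the evaluator only splits commands into words (a braced group is one verbatim word, a bracketed group is saved for later substitution), and while is just an ordinary entry in the command table that hands its body string back to the evaluator. Python's eval stands in for Tcl's expr.

```python
import re

variables = {}

def split_commands(script):
    # Commands are separated by newlines that are not nested
    # inside braces or brackets.
    cmds, depth, cur = [], 0, ''
    for ch in script:
        if ch == '\n' and depth == 0:
            if cur.strip():
                cmds.append(cur.strip())
            cur = ''
        else:
            depth += {'{': 1, '}': -1, '[': 1, ']': -1}.get(ch, 0)
            cur += ch
    if cur.strip():
        cmds.append(cur.strip())
    return cmds

def split_words(cmd):
    # {...} is one word with braces stripped and no substitutions;
    # [...] is kept intact so subst() can evaluate it later.
    words, i = [], 0
    while i < len(cmd):
        if cmd[i] == ' ':
            i += 1
        elif cmd[i] in '{[':
            open_, close = cmd[i], {'{': '}', '[': ']'}[cmd[i]]
            depth, j = 1, i + 1
            while depth:
                depth += {open_: 1, close: -1}.get(cmd[j], 0)
                j += 1
            words.append(cmd[i + 1:j - 1] if open_ == '{' else cmd[i:j])
            i = j
        else:
            j = i
            while j < len(cmd) and cmd[j] != ' ':
                j += 1
            words.append(cmd[i:j])
            i = j
    return words

def subst(word):
    # [script] -> its result, then $name -> the variable's value
    word = re.sub(r'\[([^][]*)\]', lambda m: evaluate(m.group(1)), word)
    return re.sub(r'\$(\w+)', lambda m: variables[m.group(1)], word)

def evaluate(script):
    result = ''
    for cmd in split_commands(script):
        words = split_words(cmd)
        result = COMMANDS[words[0]](*words[1:])
    return result

def cmd_set(name, value):
    variables[name] = subst(value)
    return variables[name]

def cmd_expr(*parts):
    return str(eval(subst(' '.join(parts))))  # eval stands in for expr

def cmd_while(cond, body):
    # An ordinary command, not special syntax: on each iteration it
    # re-substitutes its condition and re-submits its (still
    # unevaluated) body string to the evaluator.
    while eval(subst(cond)):
        evaluate(body)
    return ''

COMMANDS = {'set': cmd_set, 'expr': cmd_expr, 'while': cmd_while}

evaluate("""
set p 3
set result 1
while {$p > 0} {
set result [expr $result * 2]
set p [expr $p - 1]
}
""")
print(variables['result'])  # 8
```

Because while receives its two arguments as plain strings, defining a new looping construct is no different from defining any other command, which is exactly why applications can add control structures of their own.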


Event Bindings


A binding causes a certain Tcl script to be evaluated whenever a particular
event occurs in a particular window. The -command option for buttons is an
example of a simple binding implemented by a particular widget class. Tk also
includes a more general mechanism that can be used to extend the behavior of
widgets in nearly arbitrary ways.
When copied into a file and invoked from your shell, the script in Example 8
will produce a screen display like that in Figure 3. The display has two entry
widgets in which you can click with the mouse and type numbers. If you press
the Return key in either of the entries, the result will appear on the right
side of the window. You can compute different results by modifying either the
base or the power and then pressing Return again.
This application consists of five widgets: two entries and three labels.
Entries are widgets that display one-line text strings that you can edit
interactively. The two entries, .base and .power, are used for entering the
numbers. Each entry is configured with a -width of 6 (large enough to display
about six digits) and a -relief of "sunken" (which gives the entry a
depressed appearance). The -textvariable option for each entry specifies the
name of a global variable to hold the entry's text; any changes you make in
the entry will be reflected in the variable and vice versa.
Two of the labels, .label1 and .label2, hold decorative text; the third,
.result, holds the result of the power computation. The -textvariable option
for .result causes it to display whatever string is in the global variable
result and to update itself whenever the variable changes. In contrast,
.label1 and .label2 display constant strings.
The pack command arranges the five widgets in a row from left to right. The
command occupies two lines in the script; the backslash at the end of the
first line is a line-continuation character, which causes the newline to be
treated as a space. The -side option means that each widget is placed at the
left side of the remaining space in the main widget: first .base is placed at
the left edge of the main widget, then .label1 is placed at the left side of
the space not occupied by .base, and so on. The -padx and -pady options make
the display more attractive by arranging for one millimeter of extra space on
the left and right sides of each widget, plus two millimeters of extra space
above and below it. The m suffix specifies millimeters; you could also use c
for centimeters, i for inches, p for points, or no suffix for pixels.
The bind commands connect the UI to the power procedure. Each bind command has
three arguments: the name of a widget, an event specification, and a Tcl
script to invoke when the given event occurs in the given widget. <Return>
specifies an event consisting of the user pressing the Return key on the
keyboard. Other useful event specifiers are listed in Table 2.
The scripts for the bindings invoke power, passing it the values in the two
entries and storing the result in the result variable so that it will be
displayed in the .result widget. These bindings extend the generic, built-in
behavior of the entries (editing text strings) with application-specific
behavior (computing a value based on two entries and displaying that value in
a third widget).
The script for a binding has access to several pieces of information about the
event, such as the location of the pointer when the event occurred. For
example, if you start up wish interactively, type the command
bind . <Any-Motion> {puts "pointer at %x,%y"}, and move the pointer over the
window, each time the pointer moves, a message will be printed on standard
output giving its new location. When the pointer motion event occurs, Tk scans
the script for % sequences and replaces them with information about the event
before passing the script to Tcl for evaluation. %x is replaced with the
pointer's x-coordinate, and %y is replaced with the pointer's y-coordinate.
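The % expansion can be pictured as a simple textual rewrite performed on the binding script before it is evaluated. The sketch below is illustrative only (it handles just %x and %y, not Tk's full set of % sequences):

```python
def expand_percents(script, event):
    # Replace Tk-style % sequences with fields of the event before
    # the script is handed to the Tcl interpreter.
    return (script.replace('%x', str(event['x']))
                  .replace('%y', str(event['y'])))

print(expand_percents('puts "pointer at %x,%y"', {'x': 120, 'y': 45}))
# puts "pointer at 120,45"
```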


Subprocesses


Normally, Tcl executes each command by invoking a C procedure in the
application to carry out its function. This is different from a shell program
such as sh, where each command is executed in a separate subprocess. However,
Tcl also allows you to create subprocesses using the exec command; see Example
9.
The exec command treats its arguments much like the words of a shell command
line. In Example 9(a), exec creates a new process to run the grep program and
passes it #include and tk.h as arguments, just as if you had typed grep
#include tk.h to your shell. The grep program searches file tk.h for lines
that contain the string #include and prints those lines on its standard
output. However, exec arranges for standard output from the subprocess to be
piped back to Tcl. exec waits for the process to exit, then returns all of the
standard output as its result. With this mechanism you can execute
subprocesses and use their output in Tcl scripts. exec also supports input and
output redirection using standard shell notation such as <, <<, and >,
pipelines with |, and background processes with &.
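This capture-the-output contract maps directly onto subprocess handling in other languages. For instance, this Python fragment (illustrative only, using echo rather than the grep example) does what exec does with a command's standard output:

```python
import subprocess

# Run a subprocess, wait for it to exit, and collect its standard
# output as a string -- the same contract as Tcl's exec command.
out = subprocess.run(['echo', 'hello'],
                     capture_output=True, text=True).stdout
print(out.strip())  # hello
```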
Example 9(b) creates a simple UI for saving and reinvoking commonly used shell
commands. If you type the script into a file named "redo" and invoke it, the
script initially creates an interface with a single entry widget. You can type
a shell command such as ls into the entry, as shown in Figure 4(a). If you
press Return, the command gets executed as if you had typed it to the shell
from which you invoked redo; the <@ and >@ arguments to exec cause the
standard input and output files for the command to be the same as those for
wish. Furthermore, the script creates a new button widget that displays the
command, and you can reinvoke the command later by clicking on the button; see
Figure 4(b). As you type more commands, more buttons appear, up to a limit of
five remembered commands, as in Figure 4(c). 
The most interesting part of the redo script is in the bind command. The
binding for <Return> must execute the command, which is stored in the cmd
variable, and create a new button widget. First it creates the widget. The
button widgets have names like .b1, .b2, and so on, where the number comes
from the variable id which starts at 0 and increments before each new button
is created. The notation .b$id generates a widget name by concatenating .b
with the value of id. Before creating a new widget, the script checks to see
if there are already five saved commands; if so, the oldest existing button is
deleted. The notation .b[expr $id - 5] produces the name of the oldest button
by subtracting five from the number of the new button and concatenating it
with .b. The -command option for the new button invokes exec and redirects
standard input and output for the subprocess(es) to wish's standard input and
standard output, which are the same as those of the shell from which wish was
invoked. This causes output from the subprocesses to appear in the shell's
window instead of being returned to wish.
The command pack .b$id -fill x makes the new button appear at the bottom of
the window. The option -fill x improves the appearance by stretching the
button horizontally so that it fills the width of the window even if the text
doesn't really need that much space. 
The last two commands of the binding script are called "widget commands."
Whenever a new widget is created, a new Tcl command with the same name is
created, and you can invoke this command to communicate with the widget. The
first argument to a widget command selects one of several operations, and
additional arguments are used as parameters for that operation. In the redo
script, the first widget command causes the button widget to invoke its
-command option, just as if you had clicked the mouse button on it. The
second widget command clears the entry widget in preparation for a new command
to be typed.
Each class of widget supports a different set of operations in its widget
commands, but many of the operations are similar from widget to widget. For
example, every widget class supports a configure widget command that can be
used to modify any of the configuration options for the widget. If you run the
redo script interactively, you can type in the command in Example 10(a) to
change the background of the entry widget to yellow or the command in Example
10(b) to change the color of the text in button .b1 to brown and then cause
the button to flash.
One of the most important things about Tcl and Tk is that they make every
aspect of an application accessible and modifiable at run time. For example,
the redo script modifies its own interface on the fly. In addition, Tk
provides commands that you can use to query the structure of the widget
hierarchy, and you can use configure widget commands to query and modify the
configuration options of individual widgets.


Conclusion



There's much more to Tcl and Tk than I've presented here. Also available, for
instance, is Extended Tcl (TclX), a library package that augments the built-in
Tcl commands by providing access to POSIX functions and system calls, file
scanning similar to awk, keyed lists, online help, and so on. Tcl Distributed
Programming (Tcl-DP) is a collection of Tcl commands that simplify the
development of distributed programs, while XF is an interactive interface
builder that was actually written in Tcl. For additional information on these
and other Tcl extensions, refer to my book, Tcl and the Tk Toolkit.
Additional Tcl and Tk Features
Tcl and Tk contain many other facilities not discussed in this article,
including:
Arrays and lists: arrays, for efficiently storing key value pairs; and lists,
for managing aggregates of data.
Control structures for controlling the flow of execution, such as eval, for,
foreach, and switch.
String manipulation: measuring string length, regular-expression pattern
matching and substitution, and format conversion.
File access: You can read and write files from Tcl scripts and retrieve
directory information and file attributes, such as length and creation time.
Widgets: menus, scroll bars, a drawing widget called a "canvas," and a text
widget for achieving hypertext effects.
Access to X facilities: access to all major facilities in the X Window System,
such as commands for communicating with the window manager, retrieving the
selection, and managing the input focus.
Interapplication communication: a send command can be used to issue arbitrary
Tcl/Tk scripts to other Tk-based applications.
C interfaces: C-library procedures to define new Tcl commands in C and a
library for creating new widget classes and geometry managers in C.
--J.K.O.
Example 1: (a) Using the bitwise left-shift operator <<; (b) expressions that
contain real values as well as integer values; (c) use of relational operators
 > and <= and the logical OR operator ||. As in C, Boolean results are
represented numerically with 1 for true and 0 for false.
(a)
 Enter: expr 3 << 2
 Returned value: 12
(b)
 Enter: expr 14.1*6
 Returned value: 84.6
(c)
 Enter: expr (3 > 4) || (6 <= 7)
 Returned value: 1
Example 2: Simple wish script.
button .b -text "Hello, world!" -command exit
pack .b
Figure 1 (a) Simple Tcl application consisting of the Tcl interpreter plus a
few application-specific commands; (b) complex application which includes the
commands defined by Tk plus additional commands defined by other packages.
Figure 2 Tcl "hello world" program.
Table 1: Typical button options.
 Option Description 
 -background Background color for the button, such as blue.
 -foreground Color of the text in the button, such as black.
 -font Name of the font to use for the button, such as
 *-times-medium-r-normal--*-120-* for a 12-point Times Roman font.
Example 3: Typical script file.
#!/usr/local/bin/wish -f
button .b -text "Hello, world!" -command exit
pack .b
Example 4: Interactively entering data into a Tcl script file.
(a)
 -command "puts Good-bye!; exit"
(b)
 destroy .b
source hello
Example 5: Tcl variables and substitution.
(a)
 Enter: set a 44
 Returned value: 44
(b)
 Enter: expr $a*4
 Returned value: 176
(c)
 Enter: set a 44
 set b [expr $a*4]
 Returned value: 176
(d)
 set x \$a

 set newline \n
Example 6: Tcl control structures.
(a)
 proc power {base p} {
 set result 1
 while {$p>0} {
 set result [expr $result*$base]
 set p [expr $p-1]
 }
 return $result
}
(b)
 Enter: power 2 6
 Returned value: 64
 Enter: power 1.15 5
 Returned value: 2.01136
Example 7: (a) Typical C statement; (b) Tcl equivalent; (c) using the Tcl set
command.
(a)
 while (p>0) {
 result *= base;
 p -= 1;
}
(b)
 while {$p>0} {
 set result [expr $result*$base]
 set p [expr $p-1]
}
(c)
 set {$p>0} {
 set result [expr $result*$base]
 set p [expr $p-1]
}
Example 8: A Tcl script.
#!/usr/local/bin/wish -f
proc power {base p} {
 set result 1
 while {$p>0} {
 set result [expr $result*$base]
 set p [expr $p-1]
 }
 return $result
}
entry .base -width 6 -relief sunken -textvariable base
label .label1 -text "to the power"
entry .power -width 6 -relief sunken -textvariable power
label .label2 -text "is"
label .result -textvariable result
pack .base .label1 .power .label2 .result -side left \
 -padx 1m -pady 2m
bind .base <Return> {set result [power $base $power]}
bind .power <Return> {set result [power $base $power]}
Figure 3 A GUI that computes powers of a base.
Example 9: Tcl lets you create subprocesses using the exec command.
(a)
 Enter: exec grep #include tk.h
 Returned value: #include <tcl.h>
 #include <X11/Xlib.h>
 #include <stddef.h>
(b)
#!/usr/local/bin/wish -f

set id 0
entry .entry -width 30 -relief sunken -textvariable cmd
pack .entry -padx 1m -pady 1m
bind .entry <Return> {
 set id [expr $id + 1]
 if {$id > 5} {
 destroy .b[expr $id - 5]
 }
 button .b$id -command "exec <@stdin >@stdout $cmd" \
 -text $cmd
 pack .b$id -fill x
 .b$id invoke
 .entry delete 0 end
}
Table 2: Typical event specifiers.
 Event Specifier Description 
 <Button-1> Mouse button 1 is pressed.
 <1> Shorthand for <Button-1>.
 <ButtonRelease-1> Mouse button 1 is released.
 <Double-Button-1> Double-click on mouse button 1.
 <Key-a> The "a" key is pressed.
 <Motion> Pointer motion with no buttons or modifier keys pressed.
 <B1-Motion> Pointer motion with button 1 pressed.
 <Any-Motion> Pointer motion with any (or no) button or modifier keys pressed.
Figure 4 The redo application.
Example 10: (a) Changing the background of the entry widget to yellow; (b)
changing the color of the text in button to brown, then causing it to flash.
(a)
 .entry configure -background yellow
(b)
 .b1 configure -foreground brown
 .b1 flash




Special Issue, 1994
Quincy: The Architecture of a C Interpreter


A complete environment for rapid application development




Al Stevens


Al is a DDJ contributing editor and can be contacted on CompuServe at
71101,1262.


Quincy is an interactive C-language interpreter with a user interface similar
to that of QBasic. Quincy runs under MS-DOS in text mode. I originally
developed Quincy as a Standard C language teaching aid and included it with my
book, Al Stevens Teaches C (MIS:Press, 1994), a C tutorial for developers who
program in other languages. I subsequently presented the interpreter in a
series of my "C Programming" columns in Dr. Dobb's Journal, commencing with
the May, 1994 issue. This article provides an overview of Quincy's software
architecture.


Architectural Overview


Figure 1 shows Quincy's two primary subsystems: the integrated development
environment (IDE) and the translator. The IDE provides a user interface,
source-code editor, and source-level debugger. The translator contains the C
preprocessor, lexical scanner, linker, and interpreter.
The two subsystems are loosely coupled in that neither depends heavily on the
details of the other. My goals were to build an IDE that is independent of the
programming language it supports and a language translator that I could port
to other operating environments with a minimum of effort.


The C-Language Implementation


Quincy interprets a subset of Standard C, including most of the preprocessing
directives (#line and #pragma are not implemented), most of the standard
library functions, and most of the C language, with these exceptions:
Structure bit fields are not supported, and arrays are limited to four
dimensions.
A Quincy program consists of one translation unit, which means that the
program does not link object files and libraries built by a compiler. All the
code for the interpreted program is contained in one source-code file and the
header files that it includes. Programs developed with Quincy can be compiled
with ANSI Standard C compilers.
Programs written in Quincy use the standard input/output console device. Its
implementation uses Standard C library functions. There are also some
nonstandard conio.h functions to support direct console input/output that
bypasses DOS and goes directly to BIOS.
Error detection is shared between the processes that prepare the code for
execution and the run-time interpreter.
Performance is biased toward the development cycle. Compile-time efficiency is
emphasized, and execution time takes a backseat. This strategy produces an
interpreter that begins running the program almost immediately. By deferring
most of the translation to the run-time interpreter, the strategy penalizes
execution time in favor of fast turnaround in the interactive development
environment. This trade-off is necessary to support the tutorials for which I
designed Quincy.


The IDE


Quincy's IDE implements the user interface with menus and dialog boxes, a
source-code editor, and an interactive source-level debugger including
breakpoints, watch variables, and support for examining and modifying
variables while the interpreted program is running.


User Interface


To approximate the common user access interface shared by contemporary
applications, I used the D-Flat function library, which I developed as a Dr.
Dobb's Journal "C Programming" column project over the past several years.
D-Flat supports an application window, drop-down menus, and dialog boxes, with
user access through the mouse and keyboard. It offers a look-and-feel similar
to those of MS-DOS utility programs such as DOSSHELL, QBasic, and EDIT. Its
Windows-like Help subsystem was easily adapted to the tutorial sessions that I
built as exercises in chapters of the book.
Quincy's user interface consists of a source-code-editor application window
with menus and dialog boxes that support text editing and interactive
debugging of the program. The user can shell out to DOS, get online help, and
set several options, which are automatically saved for subsequent sessions.
The application window can display the source-code text and an optional watch
window that displays variables being watched during a debug session.


Source-Code Editor


D-Flat has an EDITBOX control oriented more toward simple word processing than
source-code editing. For Quincy I derived an EDITOR class from the EDITBOX
class to remove word wrapping and add support for tab characters embedded in
the text.
You can load an existing source-code file into the editor or begin writing a
new program. A clipboard includes cut, copy, and paste commands, and there are
text search and replace commands. You can print the current source-code file
and save it to disk, giving it any name you choose. The program recognizes
when you have changed and not saved a file and prompts you to save it if you
are exiting the program or replacing the file in memory.


Debugger



The debugger responds to commands from the user interface to run the program,
step through it, set breakpoints, set watch variables, and examine and modify
variable values. When the user runs or steps through the program, the debugger
calls the translator, which compiles and interprets the program.
After interpreting each statement, the interpreter calls back into the
debugger, which updates watch variables, tests for breakpoints, and gives
control back to the user interface if the user is single-stepping through the
program. The debugger updates the editor's source-code display to reflect the
current interpreted statement and to highlight any breakpoints.
The debugger's watch and examine processes call the translator to dereference
the variables being viewed.


The Translator


The translator compiles and runs the program. Compilation consists of running
the preprocessor, the lexical scanner, and the linker. If the compile was
successful, the program begins running. The translator builds and interprets a
one-line program with a call to the main function, which can be anywhere in
the source code. Command-line arguments are simulated by the debugger, which
passes them to the translator. The translator constructs argc and argv
arguments to the main function.


Preprocessor


The preprocessor translates a source-code file with preprocessing directives
into a source-code file ready to compile. The debugger has passed the address
of the editor's source-code buffer to the translator. The translator passes
the preprocessor the source-code buffer address and the address of a buffer to
receive the preprocessed source code.
The preprocessor strips all comments and unnecessary white space. It defines
and resolves macros and processes compile-time conditional directives. It
inserts a short C comment for each nonblank source-code line. This comment
identifies the file and line number of the source-code line. File numbers
represent the source-code file in the editor buffer and all source-code files
included by the #include preprocessing directive. These file/line number
comments provide line-number information for error reporting and debugger
actions.


Lexical Scanner


When the preprocessor returns with no errors, the translator calls the lexical
scanner to convert the preprocessed source code into language tokens. Tokens
are character values that represent discrete language elements. Each C keyword
and operator is a token and each constant is a token that identifies the
constant and the constant's value. Each unique identifier is a token that
identifies the identifier and an integer offset into a symbol table. The
scanner recognizes function declarations and puts them into a function table.
Subsequent uses of the same identifier are assigned a function-call token and
an integer offset into the function table. The scanner searches the
standard-library table for standard-library function names and resolves them
to their own token and an integer value that identifies the function. The
scanner goes further than traditional scanners in its treatment of function names.
It also recognizes statement labels, puts them into a table, and resolves goto
references to them.
When it is finished, the lexical scanner has produced a stream of tokens ready
to be linked. Identifiers in the token stream are resolved to point to their
respective symbol- or function-table entries.


Linker


The linker passes through the program's token stream and resolves global
declarations, building tables of global variables and function prototypes and
initializing the global variables. For each statement block in each function
definition, the linker builds tables of local variable declarations and
initializes the static local variables. Each function has a table of parameter
variables and local variables. These tables are used to build a run-time
declaration and initialization of the variables when the function is called. 


Interpreter


The interpreter executes the program by interpreting tokens in the token
streams of functions. The translator calls the interpreter and tells it to
begin executing the tokens in the main function. The interpreter interprets
the tokens in main and returns when main returns. The main function, of
course, may call other functions, and the interpreter processes these calls.


Statements


Quincy's interpreter executes code one statement or statement block at a time.
A statement block is initiated when Quincy sees the left brace in the token
stream. The statement process initializes the block's local variables and then
calls itself recursively until it sees a right brace.
Each individual statement is examined to see if it is a flow-control keyword
(goto, if, else, while, do, for, switch, case, default, return, break, or
continue). If so, the interpreter evaluates the controlling expressions and
executes the statements that should execute as a result of the flow control.
Statements that are not flow-control keywords are assumed to be expressions.


Expressions


Quincy evaluates expressions by interpreting the tokens and performing a
recursive-descent parse at run time. Quincy employs an expression evaluation
stack, which can contain entries of any data type that can be computed by
expression evaluation, passed to a function as an argument, or returned from a
function. The stack contains lvalue entries that consist of indirect pointers
to the values in variable memory and rvalue entries that consist of values
themselves. Constants and addresses are examples of rvalues. Variable
references are examples of lvalues. The expression evaluator evaluates each
element of the expression and pushes the result on the stack. Operators in the
expression act on values that are already on the stack. The binary addition
operator, for example, assumes that the stack has the left value already
evaluated and pushed. The operator's function calls the expression evaluator
recursively to evaluate the right value. Then it pops the two values, sums
them, and pushes the result. Other operators behave similarly. The position of
the operator in the recursive descent determines its precedence. Operators
with the same precedence are processed in the same position. Their
associativity is determined by the sequence in which they are processed.
A translator that strives for maximum run-time performance converts
prefix-notation expressions to postfix notation and evaluates them in postfix
order. The burden of the recursive descent is borne by the compiler rather
than the interpreter. As a result, it takes longer to prepare a program to run
than it does when the run-time interpreter processes the recursive descent.
Quincy intentionally produces programs that run slower to gain the advantage
of rapid turnaround in the tutorial environment.
Quincy similarly evaluates structure-member operations and subscript operators
at run time.


Function Calls


One element in an expression can be a function call. When the expression
evaluator sees one, it suspends interpreting the current function, saves its
context, and prepares to interpret the called function. First, the interpreter
checks the types of the arguments in the call against the function's
prototype. When that test has passed, the interpreter calls the expression
evaluator once for each argument in the function call's argument list. This
operation pushes the arguments onto the expression-evaluation stack.
The interpreter tests to see if the function being called is in the user's
program or taken from the standard library. If it is a user function, the
interpreter pops the arguments from the stack and initializes parameter
variables with the argument values. Then the interpreter begins executing the
function by executing its first, outer statement block. When that block has
finished executing, the interpreter restores the context of the function that
made the call and resumes executing it.



Library Functions


When a program calls a Standard-C library function, the interpreter passes
control to a process that executes the library function. Library functions are
not interpreted. Quincy uses the standard-library functions of the compiler
with which it is compiled to service standard-library function calls from the
interpreted program. Depending on which function is called, the interpreter
pops the arguments into local variables, calls the standard-library function,
and pushes the returned value, if any, onto the expression-evaluation stack.
If the library function uses the keyboard or screen, the interpreter notifies
the IDE to relinquish the screen to the run-time system. This is necessary
when you step through the code one line at a time. After the library function
executes, the interpreter notifies the IDE that it can take the screen back if
it needs to.
When the function is of printf or scanf form, the interpreter uses the host
compiler's vprintf and vscanf functions to build the variable argument list.
Quincy keeps a table of file handles that the interpreted program opens,
deleting table entries when the program closes the files. If the program
terminates without closing all the files it opened, the interpreter closes
them. This behavior emulates that of a program running under DOS and prevents
an errant interpreted program from gobbling up DOS's limited number of file
handles.
Quincy does not recognize standard-library function calls unless they have
been declared, so the interpreted program must include the standard header files
that declare library functions. The prototypes assure that the function calls
pass the correct types and number of arguments.


Memory Allocation


The interpreted program uses Quincy's heap for mallocs. Quincy uses the DOS
system heap. To prevent an interpreted program from allocating memory and not
freeing it, the interpreter maintains a table of allocated memory buffers and
frees any buffers left allocated after the interpreted program terminates.


Error Processing


Quincy can detect source-code errors at any time during translation. A common
error-processing function accepts an error number as a parameter and performs
a longjmp call to restore the interpreter to its condition just prior to
beginning the translation. 


Future Directions


Quincy evolved from a K&R interpreter developed nearly ten years ago. When I
separated the interpreter from the user interface and added support for the
ANSI language extensions, I had a long-term goal in mind other than the
immediate need for a tutorial tool. I wanted a C-interpreter engine I could
install wherever it made sense--as a scripting language, a macro language, or
a portable visual programming environment. Separating the interpreter from its
operating environment was the first step toward that goal. Other improvements
should rebalance the compile and run-time responsibilities and improve
performance. The translator should resolve the expression evaluation at
compile time. The interpreter should use tokens to drive a finite state
machine rather than the large switch statements it currently uses.
The compiler should support incremental compiles to rebuild only those
discrete program entities--functions--that you changed since the last compile.
Whether or not Quincy ever gets those improvements depends on time available
to do the work and how compelling a need I feel to get it done. 


How to Get Quincy 


Quincy is available to download from the DDJ Forum on CompuServe and on the
Internet by anonymous ftp at site ftp.mv.com. Alternatively, you can send a
diskette and a stamped, addressed mailer to me at Dr. Dobb's Journal, 411
Borel, San Mateo, CA 94402 and I'll send you a copy of the source code. Quincy
is free, but if you want to support my Careware charity, include a dollar for
the Brevard County Food Bank.
Figure 1 Quincy's two primary subsystems--the integrated development
environment (IDE) and the translator.




























Special Issue, 1994
The Dylan Programming Language


A small, efficient object-oriented language 




Tamme D. Bowen and Kelly M. Hall


The authors were formerly associated with the Laboratory for Applied Logic of
the University of Idaho. Copyright (C) 1993, Laboratory for Applied Logic,
University of Idaho. All rights reserved.


Dylan, an object-oriented dynamic language developed by Apple Computer, is
designed to replace existing static languages for the development of large
software systems, yet remains small and efficient enough for the next
generation of portable computers. Dylan was developed from the language
Scheme, augmented with the Common-Lisp Object System (CLOS). 
In this article, we will model Dylan's type system. In doing so, we will
formally define the terms class, method, generic function, and instance. We
will also discuss features Dylan provides for efficiency and security.
Function descriptions have been written to resemble that of the Haskell
programming language. For more information on Dylan, refer to Dylan: An
Object-Oriented Dynamic Language, by the Apple Computer Eastern Research and
Technology Group (1992) and "A Taste of Dylan," by David Betz (DDJ, October
1992). 


Major Concepts 


For the most part, Dylan can be characterized by two main concepts: objects
and functions. The core of Dylan implements objects and functions, but omits
many of the features you need to write useful programs. Dylan extends the core
with ten required libraries that provide control flow, numbers, and abstract
data types. 
All data values in Dylan are considered "objects," organized by groups of
"classes." All objects are "instances" of at least one class, where the
classes are organized into a heterarchy (directed acyclic graph) and inherit the
features of classes above themselves in the heterarchy. The top of the
heterarchy is the class <object>, the most general class in Dylan. 
Classes determine the structural characteristics of their instances by
specifying "slots," which hold the object's local state. Each slot has a name
which identifies it, and functions (called "getters" and "setters") are used
to read and write the values stored in the slots. 
Functions in Dylan are objects that perform actions corresponding to
functions, procedures, methods, and messages in other languages. Dylan has two
types of functions: methods and generic functions. 
Methods are functions that contain a typed argument list and a body of code.
Methods are defined with typed formal parameters and can be applied to
arguments with either the same class as the defined parameters or to
subclasses of the defined parameters. If functions are applied to incorrect
argument types, an error is signaled. Note that this type checking is
performed at run time even though static analysis is possible. 
Generic functions are ways to group together zero or more methods (each having
different types) under the same function name. The generic function examines
the types of the arguments and chooses the most appropriate method to invoke.
If no appropriate method can be found, an error is generated. Since generic
functions can be created and modified at run time, all type checking is
dynamic. Generic functions are by definition overloaded and give Dylan ad hoc
polymorphic capabilities. 
The Dylan reference manual is often vague in its description of how classes,
methods, and generic functions work together. To better understand Dylan, we
have formalized the relationship of these entities. This section contains an
abstract syntax and description of the basic operations on classes and
functions. No static validity functions are provided since Dylan performs no
static analysis. 
Table 1 provides a list of the abstract syntax of classes, methods, and
generic functions--note that Symbol and Expr are not defined. Symbol is
analogous to Char+, and Expr corresponds with function bodies in Scheme. 


Classes 


The acyclic directed graph of classes can be represented in many ways. Our
representation will be a list of Class, where classname is the name of a
class, sl is a list of that class's slots, and subclasses is a list of classes
that directly inherit properties from classname. The basic operations on
classes can be coded; see Example 1. 
New classes are created in Dylan with the function make-class, which takes
three parameters: the name of the new class, a list of the direct superclasses
of the new class, and a list of the new class's slots. For new classes, Dylan
defines functions to access and update the slots. These getter and setter
functions are added to the generic-function space later. Example 2(a) provides
code to define new classes. In Example 2(b), FixLinks and Update add the new
class to the heterarchy and make sure subclasses' lists get modified to
reflect the new class. NewClass adds new classes to the heterarchy by checking
that the class doesn't already exist. If the class does exist, it is removed
and then added to the heterarchy again. Otherwise, the new class is added to
the old class list and the subclass pointers are updated accordingly. Example
2(c) presents another operation on classes--making new instances of them. 


Generic Functions 


Generic functions in Dylan are overloaded functions that examine the types of
their parameters and invoke the most appropriate method based on those types.
We represented the space of all generic functions as a list, where each
element in this list is itself a list of parameter lists and a pointer to the
corresponding method that takes that parameter list. All top-level function
applications in Dylan take place through generic functions. 
The operations on generic functions include defining new generic functions,
removing generic functions, adding methods to existing generic functions,
removing methods from generic functions, and applying generic functions to
arguments; see Example 3. Note in Example 3(f) that SpecificMethod chooses the
most specific method to apply based on the types of the supplied parameters
and SchemeApply is the apply function in Scheme modified so that if a function
cannot be found in the Scheme top-level namespace, then the function name is
treated as a generic function. 


Methods 


Methods are Dylan's basic functional unit, taking a list of typed parameters
and returning a typed value. Methods can be defined but not applied by the
user. Defining a method automatically creates a new generic function with the
same name, and the new method is attached to the new generic function. A new
method is defined by creating a new, unique identifier, called a "key," for
this method, having Scheme bind the new key to the method's equivalent lambda
expression in the Scheme namespace, adding the parameter list and key to the
appropriate generic function, and creating a new generic function if required.
Defining a new method must then alter the generic function space and the
Scheme environment. 
In Example 4, NewMethod creates a new method by modifying the Scheme
environment and the generic function space. MkUniqueKey generates a new,
unique key in the generic function space, and bind binds the key and the
lambda expression in the Scheme environment. 


An Example 


To further illustrate how Dylan works, we'll use an example which calculates
the square root of a number using Newton's method (see Structure and
Interpretation of Computer Programs, by Harold Abelson and Gerald Sussman, MIT
Press, 1985). 
In Example 5, the Dylan implementation's call to define-method creates a
generic function called newtons-sqrt that has one method with a parameter of
type <object> (since x wasn't given a more specific class, it defaults to the
most general class <object>) and stores the lambda expression in Scheme's
top-level environment. The local functions created with bind-methods do not
generate functions themselves--these functions are stored, like any other
Scheme local functions, in nested environments. 

Tracing the call (newtons-sqrt 4) illustrates how Dylan and Scheme work
together: The generic function newtons-sqrt is found and the argument class
<number> matches the method's <object> parameter, so the Scheme function associated with
that method is invoked with parameter 4. In the Scheme environment, the call
(sqrt1 1) is evaluated, generating the call (close? 1) which, in turn,
generates the call (* 1 1). Since the asterisk (*) is not found in the Scheme
environment, it is reevaluated as a generic function in Dylan. Dylan matches
the types, dispatches the proper method to multiply guess, and continues. 


Conclusion 


In this article, we've focused on what sets Dylan apart from other functional
and object-oriented languages. In doing so, the features we haven't covered
include: 
 Slot options: Slots may have initial values, initializing functions, and
storage-allocation modifiers. 
 Additional method functions: Methods can defer execution to more general
methods. 
 Security options: Methods, generic functions, and classes may be "sealed,"
which prevents any modifications to these items. 
 Efficiency options: Classes may be declared "abstract" rather than
"instantiable," preventing instantiation of classes that are too general to be implemented
efficiently. Abstract classes could be used as parents to more concrete,
instantiable classes. Generic functions can still be defined to operate on the
abstract classes since they can be overloaded to deal with all of the
instantiable classes. 
In any event, the Dylan language is a small, efficient way to get the benefits
of object-oriented programming without writing a new language from scratch.
Using Scheme as a starting point, Dylan gives you the ability to define
new types (classes) and typed functions (methods). Polymorphism is fundamental
to Dylan, and is provided through inclusion (class inheritance) and ad hoc
polymorphic functions (generic functions). 
Dylan lacks type inference and static type checking. Type inference could help
verify correctness in Example 5, for instance, by inferring the type for the
parameter x as <number> instead of <object>. But inference is complicated by
Dylan's dynamic nature, which allows functions such as abs, +, *, and the like
to be overloaded at run time to operate on nonnumeric objects. This dynamic
nature also limits the use of any static type checking. 
Dylan can be readily built by adding two new namespaces on top of
Scheme--a graph representing the class heterarchy and a list of generic
functions, each with a list of specific methods. These new namespaces,
combined with new functions (to support the namespaces and replace the Scheme
functions), form the core of the Dylan language.
Example 1: Basic operations on classes. (a) IsClass returns True if cl is a
valid class; (b) GetSlots returns the slot list for class cl; (c) GetKids
returns a list of direct descendants of class cl; (d) GetSupers returns a list
of all of the superclasses for class cl; (e) GetSubs returns a list of all of the
subclasses for class cl, where unique removes duplicate items from a list and
element checks for membership in a list.
(a)
IsClass :: ClassName -> ClassList -> Boolean
IsClass (cl:ClassName) ([]:ClassList) = False
IsClass cl (c::cs) =
 if cl = c.name then True else IsClass cl cs
(b)
GetSlots :: ClassName -> ClassList -> SlotList
GetSlots (cl:ClassName) ([]:ClassList) = error "class not found"
GetSlots cl (c::cs) =
 if cl = c.name then c.sl else GetSlots cl cs
(c)
GetKids :: ClassName -> ClassList -> ClassList
GetKids (cl:ClassName) ([]:ClassList) = error "class not found"
GetKids cl (c::cs) =
 if cl = c.name then c.subclasses else GetKids cl cs
(d)
GetSupers :: ClassName -> ClassList -> ClassList -> ClassList
GetSupers (cl:ClassName) ([]:ClassList) (CC:ClassList) = []
GetSupers cl (c::cs) CC =
 if element cl c.subclasses
 then unique ((c.name::GetSupers cl cs CC)@(GetSupers c.name CC CC))
 else GetSupers cl cs CC
(e)
GetSubs :: ClassName -> ClassList -> ClassList -> ClassList
GetSubs (cl:ClassName) ([]:ClassList) (CC:ClassList) = []
GetSubs cl (c::cs) CC =
 if cl = c.name
 then unique (direct @ indirect)
 else GetSubs cl cs CC
where direct = c.subclasses
and indirect = fold '@' (map (\x. GetSubs x CC CC) c.subclasses)
Example 2: (a) Defining new classes; (b) adding the new class to the
heterarchy; (c) making new instances of classes where BuildRecord returns a
record with field names identical to the SlotNames it is passed.
(a)
NewClass :: ClassName -> ClassList -> SlotList -> ClassList -> ClassList
NewClass (n:ClassName) (pl:ClassList) (sl:SlotList) (C:ClassList) =
 if IsClass n C
 then NewClass n pl sl (remove n C)
 else if fold and (map (\x. IsClass x C) pl)
 then FixLinks n pl ((n,pl,sl)::C)
 else error "superclass does not exist"
(b)
FixLinks :: ClassName -> ClassList -> ClassList -> ClassList
FixLinks (n:ClassName) ([]:ClassList) (CC:ClassList) = CC
FixLinks n (p::ps) CC = FixLinks n ps (Update n p CC)

Update :: ClassName -> ClassName -> ClassList -> ClassList
Update (n:ClassName) (p:ClassName) ([]:ClassList) = error
Update n p (c::cs) =
 if p = c.name
 then (c.name, c.sl, n::c.subclasses)::cs
 else c::(Update n p cs)
(c)
Make :: ClassName -> ClassList -> Instance
Make (n:ClassName) (CL:ClassList) =
 if IsClass n CL
 then BuildRecord unique (localslots @ superslots)
 else error "class not found"
where localslots = GetSlots n CL
and superslots = fold '@' (map (\x. GetSlots x CL) (GetSupers n CL CL))
Table 1: Abstract syntax of Dylan classes, methods, and generic functions.
 Classlist = Class+
 Class = name:ClassName;sl:SlotList;subclasses:ClassList
 SlotList = EmptySlot*
 ClassList = ClassName+
 ClassName = Symbol
 EmptySlot = name:SlotName;pl:ParamList;body:Expr
 ParamList = Param*
 Param = name:Symbol;class:ClassName
 FunName = Symbol
 Generic = name:GenName;ml:MethodList
 MethodList = FunName*
 GFList = Generic*
 Instance = name:IName;slots:FullSlots;class:ClassName
 FullSlots = FullSlot*
 FullSlot = name:SlotName;class:ClassName
 IName = Symbol
 Key = Symbol
Example 3: (a) IsGF returns True if n is a generic function; (b) AddMethod
adds a new method to generic function n; (c) RemoveMethod and RMAux remove a
method with parameters pl from a generic function n; (d) NewGF creates a new
generic function, overriding any old methods that might be there; (e) RemoveGF
removes a generic function n; (f) ApplyGF applies a generic function to an
argument list.
(a)
IsGF :: FunName -> GFList -> Boolean
IsGF (n:FunName) ([]:GFList) = False
IsGF n (g::gs) =
 if n = g.name then True else IsGF n gs
(b)
AddMethod :: FunName -> ParamList -> Key -> GFList -> GFList
AddMethod (n:FunName) (pl:ParamList) (key:Key) ([]:GFList) = []
AddMethod n pl key (g::gs) =
 if n = g.name
 then (g.name,(pl,key)::(g.methods)) :: gs
 else g :: AddMethod n pl key gs
(c)
RemoveMethod :: FunName -> ParamList -> GFList -> GFList
RemoveMethod (n:FunName) (pl:ParamList) ([]:GFList) = error
RemoveMethod n pl (g::gs) =
 if n = g.name
 then (g.name,(RMAux pl g.methods)) :: gs
 else g :: RemoveMethod n pl gs
RMAux :: ParamList -> MethodList -> MethodList
RMAux (pl:ParamList) ([]:MethodList) = error
RMAux pl (m::ms) =
 if foreach i in pl (pl.i.type = m.pl.i.type)
 then ms
 else m :: RMAux pl ms
(d)

NewGF :: FunName -> GFList -> GFList
NewGF (n:FunName) (GF:GFList) =
 if IsGF n GF
 then NewGF n (RemoveGF n GF)
 else (n,[]) :: GF
(e)
RemoveGF :: FunName -> GFList -> GFList
RemoveGF (n:FunName) ([]:GFList) = error
RemoveGF n (g::gs) =
 if n = g.name then gs
 else g:: RemoveGF n gs
(f)
ApplyGF :: FunName -> ParamList -> GFList -> Object
ApplyGF (n:FunName) (pl:ParamList) ([]:GFList) = error
ApplyGF n pl (g::gs) =
 if n = g.name
 then SchemeApply (SpecificMethod pl g.methods) pl
 else ApplyGF n pl gs
Example 4: NewMethod creates a new method by modifying the Scheme environment
and the generic function space.
NewMethod (n:FunName) (pl:ParamList) (l:Expr) (GF:GFList) (SE:Env) =
let key = MkUniqueKey GF in
if IsGF n GF
 then (AddMethod n pl key GF) , (bind key l SE)
 else NewMethod n pl l (NewGF n GF) SE
Example 5: Calculating the square root of a number using Newton's method.
(define-method newtons-sqrt (x)
 (bind-methods ((sqrt1 (guess)
 (if (close? guess)
 guess
 (sqrt1 (improve guess))))
 (close? (guess)
 (< (abs (- (* guess guess) x)) 0.0001))
 (improve (guess)
 (/ (+ guess (/ x guess)) 2)))
 (sqrt1 1)))



























Special Issue, 1994
The Oberon Programming Language


The new Pascal




Josef Templ


Josef is associated with ETH Zurich and can be contacted at
jt@swe.uni-linz.ac.at.


Oberon is a general-purpose programming language that evolved from Pascal and
Modula-2. This evolution included improvements such as garbage collection, a
streamlined module concept, numeric type inclusion, and null-terminated
strings, but Oberon's principal unique feature is the concept of type
extension. Type extension makes Oberon an object-oriented language. However,
Oberon's approach to object-orientation differs considerably from that of
other extensions of Pascal or Modula-2. Oberon implementations for DOS,
Windows, Amiga, Mac, and UNIX are available via anonymous ftp from
ftp.inf.ethz.ch (129.132.101.33), subdirectory pub/Oberon.


Modules


The most important difference between Pascal and Modula-2 is the decomposition
of programs into modules. In Oberon, a program is an extensible set of
modules; in other words, there is no main module. The unit of program
execution in Oberon is the "command," an exported, parameterless procedure.
Thus, a module in Oberon is:
The construct for expressing data abstraction by means of an import/export
mechanism, like in Modula-2.
The unit of compilation, including type checking across module boundaries.
The unit of program extension. A module is supposed to be loaded on demand
whenever it is used first.
A container of commands which may be activated from the operating environment.
Example 1 illustrates a typical Oberon module. The import list of M lists all
modules which should be accessible inside M. Module M is said to be a "client"
of M1 and MyModule. A client can only access those objects of an imported
module which are exported. The asterisk (*) is used to signal export of an
object; for example, T* means that type T is to be exported. In contrast to
Modula-2, Oberon unifies the definition and implementation modules by means of
the export mark, which also allows you to selectively export record fields. In
addition, the import list lets you rename the imported module to enable simple
substitution of one module by another. In Example 1, MyModule is imported
under the alias M2. Imported objects such as M1.P are always qualified with
the exporting module in order to avoid name clashes and to increase program
readability and maintainability.


Infinite Heap


Inappropriate deallocation of dynamic storage is a well-known source of
catastrophic program failures. In the context of nonextensible applications,
such as statically linked programs, this problem can (in theory) be mastered
by a careful programmer; for extensible programs, it cannot! There is always
the possibility that a module may be added to a program later on; such an
extension can introduce additional references to objects of which the
implementor of the core modules is unaware. Therefore, Oberon neither requires
nor allows explicit deallocation of dynamic storage. Instead, it assumes a
conceptually "infinite heap." As in Lisp or Smalltalk, this concept can be
implemented on today's finite hardware by a garbage collector which knows
about the internal structure of objects and the roots where objects are
anchored. Unused objects can be identified and deallocated automatically and
safely. This facility makes Oberon programs very reliable, significantly
decreasing debugging time. Automatic garbage collection is possible in Oberon
because Oberon replaces typing loopholes such as records with variant parts
with the type-safe concept of "record extension." 


Basic Types


Oberon's basic types are fairly familiar: BOOLEAN, CHAR, SET, SHORTINT,
INTEGER, LONGINT, REAL, and LONGREAL. Between the numeric types, there is a
type-inclusion relation: The larger type includes the smaller ones.
With LONGREAL, REAL, LONGINT, INTEGER, and SHORTINT, assignment from a smaller
to a larger type is allowed. Operations between different numeric types yield
the larger type as result.
SET is a rather crude approximation of the mathematical set concept. It
denotes the power set of the integers between 0 and MAX(SET), which is an
implementation-dependent constant (typically, 31).


Type Constructors


In addition to the basic types, Oberon allows construction of user-defined
types by means of the type constructors ARRAY, RECORD, and POINTER. Unlike
standard Pascal, however, Oberon offers the type constructor PROCEDURE, which
defines procedure types. For example, TYPE Poly2 = PROCEDURE (x0, x1, x2:
REAL): REAL; introduces a procedure type denoting quadratic polynomials.
Variables of type Poly2 can have as values procedures with appropriate
parameters. Procedure variables introduce a level of indirection for procedure
calls and can be used to express dynamic binding--a prerequisite for
object-oriented programming.
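As a sketch, binding and calling a Poly2 variable might look like this (the procedure Unit is invented for illustration):

```oberon
PROCEDURE Unit (x0, x1, x2: REAL): REAL;
BEGIN RETURN x0 + x1 + x2   (* value of the polynomial at x = 1 *)
END Unit;

VAR p: Poly2; y: REAL;
BEGIN
  p := Unit;             (* bind a matching procedure to the variable *)
  y := p(1.0, 2.0, 3.0)  (* indirect call: the callee is chosen at run time *)
END
```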
On the other hand, TYPE String = ARRAY 32 OF CHAR; defines a new type String
as an array of 32 characters, indexed from 0 to 31. In contrast to Pascal, the
lower bound is always 0, for several reasons:
1. Open arrays already start at index 0 in Modula-2. 
2. The MOD-operator (positive remainder) yields results including 0. Lower
bounds at 0 fit perfectly, for example, for an implementation of a cyclic
buffer. 
3. There is a nice invariant when iterating over an array: The control
variable contains the index of the next element to visit as well as the number
of elements already visited.
Fixing the lower bound at 0 practically eliminates off-by-one errors.
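Points 2 and 3 both show up in a sketch of a cyclic buffer (N, Put, Visit, and VisitAll are illustrative names):

```oberon
CONST N = 32;
VAR buf: ARRAY N OF CHAR; in: INTEGER;

PROCEDURE Put (ch: CHAR);
BEGIN buf[in] := ch;
  in := (in + 1) MOD N   (* MOD wraps N-1 back to 0; no special case needed *)
END Put;

PROCEDURE VisitAll;
  VAR i: INTEGER;
BEGIN i := 0;
  WHILE i < N DO
    (* invariant: i elements visited; buf[i] is the next one to visit *)
    Visit(buf[i]); i := i + 1
  END
END VisitAll;
```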
In contrast to Pascal, Oberon lets you declare formal parameters as open
arrays without specifying the number of elements. Such procedures can be
called with arrays of arbitrary length as actual parameters. Unlike Modula-2,
Oberon allows you to specify multidimensional open arrays.
String literals are compatible with arrays of characters. String assignment
and comparison are defined within the language, based on the assumption that
strings are always null-terminated.
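An open-array parameter can be sketched as follows (Length is an invented procedure; 0X is the null character):

```oberon
PROCEDURE Length (VAR s: ARRAY OF CHAR): INTEGER;
  VAR i: INTEGER;
BEGIN i := 0;
  WHILE s[i] # 0X DO i := i + 1 END;  (* stop at the terminating null *)
  RETURN i
END Length;

VAR a: ARRAY 32 OF CHAR; b: ARRAY 100 OF CHAR; n: INTEGER;
BEGIN
  n := Length(a); n := Length(b)  (* actual parameters of different lengths *)
END
```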
Record-type constructors such as TYPE T = RECORD x: CHAR; y: INTEGER END, are
similar to those in Pascal except that variant records have been replaced by
record-type extension. This means that a new record type can be defined as an
extension of an existing one. For example, in TYPE T1 = RECORD (T) z: REAL
END;, type T1 is said to be a direct extension of type T, which is a direct
base type of T1. The extended type inherits all fields of the base
type--everything that can be done with the base type can also be done with the
extended type but not vice versa. Therefore, Oberon allows you to assign
variables of an extended type to variables of the base type.
In contrast to Pascal's variant records, record extension is an open-ended
construct; an extended record may well be defined in a module other than the
corresponding base type. Surprisingly, record-type extension suffices to make
Oberon an object-oriented language.
The code TYPE P = POINTER TO T; is similar to Pascal's ^T. POINTER TO T
constructs a new pointer type that denotes references to variables of type T.
Pointer types inherit the compatibility relations of their base types--a
pointer to an extended record is regarded as an extension of a pointer to the
base record. Assigning a pointer variable of an extended type to a pointer
variable of the base type introduces the notion of "dynamic type." 



Static and Dynamic Type


The static type of a variable in Oberon is the type specified together with
the declaration of the variable. The dynamic type of a variable is the type
the variable assumes at run time. The dynamic type can only be an extension of
the static type. 
Consider, for example, two pointer variables v1: T1 and v: T, where T1 is an
extension of T. If v1 holds a reference to a variable of type T1 (say, after
a preceding NEW(v1)), the assignment v := v1 assigns a reference to a variable
of an extended type to a variable of the base type. After the assignment, v's
dynamic type is T1, the static type remains T. This rule also applies to
passing extended records by reference (that is, as VAR-parameters). Only
base-type fields can be accessed via v, but the extended fields of T1 still
exist.
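As a sketch, reusing the names v and v1 from the text (with T and T1 understood here as pointer types, z the extended field):

```oberon
VAR v: T; v1: T1;  (* T1 is an extension of T *)
BEGIN
  NEW(v1);         (* v1 references a record of type T1 *)
  v := v1;         (* legal: assignment to the base type *)
  (* v's static type is still T, its dynamic type is now T1; *)
  (* accessing the extended field, v.z, is rejected at compile time *)
END
```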


Reverse Assignment


The reverse assignment v1 := v is only meaningful if the dynamic type of v is
at least T1, the static type of v1. In Oberon, this property can be tested by
the type-test operator "IS". The expression v IS T1 yields TRUE if, and only
if, the dynamic type of variable v is at least T1. The assertion that a
variable is of a given dynamic type can be expressed by the type guard,
written as v(T1). This asserts (at run time) that variable v is at least of
dynamic type T1 and allows access to the extended fields of T1. The reverse
assignment can be written as IF v IS T1 THEN v1 := v(T1) ELSE ... END. If the
same type guard has to be applied several times in a statement sequence, the
WITH statement can be used as a regional type guard; see Example 2.


Object-Oriented Programming in Oberon


The paradigm of object-oriented programming is based on the assumption that
objects communicate with each other by sending messages. In the real world,
messages are often represented as letters, notes, videos, e-mail, and the
like--that is, as objects themselves. A powerful OOP style, therefore, must
let you represent messages as explicit objects rather than as parameters of
procedure calls (SIMULA-style OOP). Oberon can use records to represent both
objects and messages and record extension to create hierarchies of both object
and message types. In Example 3, the types CopyMsg and ConsumeMsg are derived
from the base type ObjMsg. 
To respond to such messages, Oberon expresses the behavior of an object using
the "message handler" (or handler) procedure. Example 4 is an
example of a simple handler. There are two parameters, one describing the
receiver of the message and the other representing the message sent to this
object. The message is passed as a record VAR-parameter, allowing use of type
tests to identify messages. Messages of unknown type are usually ignored.
Error messages such as Smalltalk's message-not-understood are not appropriate
here since--in analogy with the real world--it is perfectly legal to ignore a
message without committing suicide.
The handler is usually bound to an object via a record field of procedure
type; see the simple object model in Example 5(a). New object types are
introduced as extensions of the base type, as in Example 5(b). To generate a
new object, you create a new variable and assign a handler to the variable's
handle field; see Example 5(c). To send a message, you set up a message record
and call the handler of the receiving object with the receiver and the message
as parameter; see Example 5(d). Looking at the possible variations of the
fundamental message-send pattern o.handle(o, m) reveals many important OOP
concepts.


Inheritance


An object can inherit behavior by simply activating another handler. In
Example 6, for instance, MyHandle, the handler of MyObject, inherits behavior
by calling Objects.Handle for all messages that are not specially handled.
Instead of an object, a module is used to select a handler, and static binding
takes place. The receiver and message remain unchanged. This technique of
expressing inheritance has several properties: 
Inheritance is programmed explicitly, rather than predefined by the language.
A wide variety of inheritance relations is possible.
No special syntax or semantics in the language are required.
Subtyping and subclassing are disentangled.


Delegation


In general, delegation means inheriting from an object. In the context of our
message-sending pattern, delegation means using different objects for
selecting a handler and acting as receiver. In Example 6, replacing the ELSE
branch with ELSE traits.handle(O, M) means that object O delegates all unknown
messages to object traits, which is supposed to provide suitable standard
behavior. The receiver and the message remain unchanged.


Forwarding


An object can also forward messages to other objects. The most prominent
examples are container objects, which handle several messages in a special way
and forward all others to their components. Again in Example 6, an ELSE branch
of the form ELSE x.handle(x, M) means that object O forwards all unknown
messages to object x. M remains unchanged; the receiver of the new message
becomes x. 


Broadcasting


An important application of forwarding is broadcasting a message to a group of
receivers, as in Example 7. Typical applications of broadcasting are container
objects that consist of more than one component. In the Oberon system,
broadcasting is also used heavily for sending messages to all visible objects
on the screen in order to provide consistency between the model and the view.
Note that the Broadcast procedure is a generic iteration construct that is
independent of a particular message type--that is, it works for all messages,
even those introduced years later in additional modules.
Through forwarding, messages can arrive at an object via different paths at
different times. To handle this, an object needs information about the context
of a received message. To this end, Oberon System 3 uses time stamping of
messages to detect multiple arrivals and a dynamic link that points back to
the sender of a message. 


Efficiency Considerations


OOP based on message records and handlers seems, at first glance, rather
inefficient and inconvenient. In practice, however, efficiency is not a big
problem since a type test can be carried out in a constant, very short time
(two loads plus one compare). It's also possible to speed up message
identification by grouping messages, either using type extension or by
introducing an explicit tag. Message forwarding is very efficient, and is
independent of message-record size. Furthermore, specialized procedure
variables can be introduced within an object to reduce the message-dispatching
effort to the call of a procedure variable.
Event records are a similar technique used in contemporary operating or
windowing systems. They are expressed by variant records (unions in C), which
are neither open-ended nor type safe. Inefficiencies in the Macintosh or
Windows operating systems, for example, do not originate from the use of event
records. Oberon's message records are "cultivated" event records.


Run-Time Environment



The Oberon programming language has been developed together with an operating
environment, the Oberon system. The language, however, makes very few
assumptions about the environment. If not available from a given operating
system, dynamic module loading, command invocation, and automatic garbage
collection can be introduced by a small run-time system. In principle, it is
possible to forgo dynamic loading and to create traditional, statically
linked applications, but this is anachronistic for the challenges of today's
software systems. Automatic garbage collection is indispensable for reliable,
extensible software systems and is probably the biggest culture clash between
Oberon and other compiled languages such as Pascal, C, or C++.


References


Reiser, M. and N. Wirth. Programming in Oberon: Steps Beyond Pascal and
Modula-2. Reading, MA: Addison-Wesley, 1992.
Example 1: Typical Oberon module.
MODULE M;
IMPORT M1, M2 := MyModule;
TYPE
 T* = RECORD
 f1*: INTEGER;
 f2: ARRAY 32 OF CHAR
 END ;
PROCEDURE P*(VAR p: T);
BEGIN
 M1.P(p.f1, p.f2)
END P;
END M.
Example 2: Reverse assignment.
WITH v: T1 DO
 (* v is treated as being declared with static type T1
    in this statement sequence *)
END
Example 3: The types CopyMsg and ConsumeMsg are derived from the base type
ObjMsg.
TYPE
 Object = POINTER TO ObjDesc;
 ObjDesc = RECORD ... END ;
 ObjMsg = RECORD END ;
 CopyMsg = RECORD(ObjMsg)
 deep: BOOLEAN; cpy: Object
 END ;
 ConsumeMsg = RECORD(ObjMsg)
 obj: Object; x, y: INTEGER
 END ;
Example 4: A simple Oberon handler.
PROCEDURE Handle (O: Object; VAR M: ObjMsg);
BEGIN
 IF M IS CopyMsg THEN (* handle copy message *)
 ELSIF M IS ... THEN (* handle further message types *)
 ELSE (* ignore *)
 END
END Handle;
Example 5: The handler is usually bound to an object via a record field of
procedure type.
(a)
TYPE Object = POINTER TO ObjDesc;
 ObjMsg = RECORD END ;
 Handler = PROCEDURE (O: Object; VAR M: ObjMsg);
 ObjDesc = RECORD
 handle: Handler
 END ;
(b)
TYPE
 MyObject = POINTER TO MyObjDesc;
 MyObjDesc = RECORD (Object)
 (* extended fields *)
 END ;
(c)
VAR o: MyObject;
NEW(o); o.handle := MyHandle;
(d)
VAR m: CopyMsg;
m.deep := TRUE; m.cpy := NIL;
o.handle(o, m);
Example 6: MyHandle, the handler of MyObject, inherits behavior by calling
Objects.Handle.
PROCEDURE MyHandle (O: Object; VAR M: ObjMsg);
BEGIN
 WITH O: MyObject DO
 IF M IS CopyMsg THEN (* handle copy message *)
 ELSIF M IS ... THEN (* handle further message types *)
 ELSE Objects.Handle(O, M)
 END
 END
END MyHandle;
Example 7: Broadcasting a message to a group of receivers.
PROCEDURE Broadcast(VAR M: ObjMsg);
 VAR o: Object;
BEGIN o := firstObj;
 WHILE o # NIL DO o.handle(o, M); o := nextObj END
END Broadcast;


Special Issue, 1994
Introducing Interoperable Objects


The computing industry's next big struggle




Ray Valdés


For those of you who follow events in the computer industry, I have some good
news: The operating-system wars are over, as are the language wars, as well as
the application-framework wars. The bad news is that nobody won. Instead, the
struggle has moved over to the arena of interoperable objects.
What are interoperable objects? These are objects that go beyond the usual
boundaries of traditional objects--the long-standing boundaries imposed by the
programming language, the process address space, and the network interface.
We've coined the term to refer to the convergence of certain trends in
software technology: the continued evolution of object-oriented programming
into the areas of language-independent objects, distributed computing, and
compound-document technologies. 
Interoperable objects are in many ways a goal, rather than a practical
reality. Often, what exists today is more of a spec or a white paper, instead
of a working program or software component. Even so, today there are
demonstrable technologies that fall (at least partially) under this rubric,
including Microsoft's Object Linking and Embedding (OLE), IBM's System Object
Model (SOM), OMG's Common Object Request Broker Architecture (CORBA), CI
Labs' OpenDoc, NeXT's Portable Distributed Objects (PDO), Novell's AppWare
Distributed Bus (ADB), and Taligent's Application Environment (TalAE). The
purpose of this Special Report is to provide you with an in-depth look at
these technologies, each of which is vying to be the dominant application
platform for the rest of this decade, sometimes in conjunction with one or two
of the others.
First, there's a small matter of semantics. Some observers refer to this
phenomenon with different terms: distributed objects, component objects,
compound documents, or middleware. However, as you'll see, the other terms are
not quite accurate. For example, "distributed objects" does not quite cover
some of the major contenders in this arena. Neither OLE nor OpenDoc, for
example, provides a shipping distributed version. Other terms lean toward
the proprietary domain. For example, it is conceivable that "component object"
will be a candidate for trademarking. Some terms highlight only one aspect of
the technology. For example, "compound documents" refers mostly to the higher,
application-oriented layers of the system. Likewise, "middleware" refers only
to the midlevel components of the system (the layer of abstraction above the
operating system and below the application), and is not necessarily object
oriented. Compared to the alternatives, interoperable objects is the best of
the lot, even though it may disguise the fact that we are sometimes comparing
apples, peaches, and pumpkin pie (that is, paper specifications versus
proprietary implementations versus partial solutions). Keep in mind that, at
present, there is no shipping system that does the entire job.
So why should you care about interoperable objects? Well, unlike some recent
technological trends such as multimedia and pen-based computing, interoperable
objects are likely to comprise the broad mainstream of computing over the next
few years--as mainframe-based servers continue to link up with departmental
computers and GUI-based desktop PCs, throughout the enterprise and across
networks. Interoperable objects represent a convergence of certain trends that
seems inevitable, although it is far from clear which particular technology
(or vendor) will prevail. 
The ongoing struggle between these emerging technologies has led to many press
reports mentioning acronyms, alliances, announcements, and abdications,
without explaining the underlying technologies. If you are a software
developer or information-technology professional, you will need to become
familiar with these concepts over the next few months or years. Many companies
are now evaluating use of one or more of these technologies on a strategic
basis. To assist in this process, we decided that a Dr. Dobb's Special Report
on Interoperable Objects would be particularly valuable. As much as possible,
we've tried to get the information for you directly from the source: the
architects, designers, and technical staff at the companies promulgating each
alternative. In some cases, the information here has not been released to the
public until now (the Taligent article, for example). In other cases, there is
a concise summary of a flood of material (on OLE, for instance). In the case
of technologies that have not yet been released, working code is not available
to the public. However, for the case of SOM and OLE, we present working
implementations of a comparable program so that you can see for yourself the
concrete details involved in making an object interoperable.
This Special Report is structured into two principal parts. This reflects the
division, on the one hand, between system-level object models such as CORBA,
SOM, and Microsoft's Component Object Model (COM), and, on the other,
application-level technologies such as OpenDoc, OLE, and TalAE. You may notice
that there are two articles on Microsoft OLE. This is because the facilities
in OLE span both the system-level domain (by way of COM), as well as the
application-level domain (by way of the compound document, and the
linking-and-embedding services in OLE). Other application-level systems such
as OpenDoc and Taligent rely upon IBM's SOM to provide the underlying object
model.
The rest of this article tries to set the context for the other contributions
to this Special Report, without duplicating the material. For additional
background on interoperable objects, see also "Interoperable Objects," by Mark
Betz (DDJ, October 1994).


Distributed Computing Before Objects


Although interoperable objects are new, distributed computing is not. The
technologies for distributed computing provide in some ways the roots of
interoperable objects. Distributed computing has existed, in one form or
another, since the 1960s. Possibly the largest single implementation is
American Airlines' SABRE system, considered by many to be that company's most
important asset. SABRE is an airline-reservation system that was developed
more than 20 years ago and is still being enhanced. Reportedly, it consists of
a single application program running on top of bare hardware--which, in this
case, consists of a fleet of mainframes, a Texas-sized disk farm, and over
20,000 user terminals. There is no operating system, no high-level language,
never mind any object orientation. It is a distributed system nonetheless, and
perhaps the ultimate example of a proprietary system. Its fully integrated
design trades off complexity and flexibility in favor of maximally efficient
use of hardware.
Moving closer to the present, client/server architectures are now the norm for
distributed computing systems. A client/server architecture distributes the
processing across a network consisting of multiple clients (say, desktop PCs)
connected to one or more servers. Today, many commercial packages and tools
support development of client/server applications--allowing for choice of
vendor, language, operating system, network, and data model. At present, most
of these technologies (such as PowerBuilder) are procedural instead of relying
on objects. You can also implement a client/server system by using C++ and
Windows on the front end, and talking to an SQL server on the back end.
However, the application objects stop at the boundary of the running
executable. Many of the popular client/server tools assume you are building
some kind of database application. If your client program needs to talk to
something other than an SQL server, you may be out of luck. As Ken Haas of
Intellicorp points out, client/server computing has evolved in three phases:
The first was distributing the data, then came distributing the user
interface, and still to come is full distribution of the complex business
logic of the enterprise. 
For those who need the flexibility that a mass-market shrink-wrapped tool
cannot provide, an emerging standard is Distributed Computing Environment
(DCE) from the Open Software Foundation (OSF). DCE implements an
industrial-strength distributed computing environment on top of existing
operating systems and platforms (usually UNIX workstations, but also
mainframes and PCs). Although DCE is procedural rather than object oriented,
many of the concepts in DCE serve as precursors for the interoperable object
technologies found in CORBA, SOM, and OLE.


The OSF DCE Environment


DCE consists of a number of tools, libraries, and components that together
allow great flexibility in building client/server applications. It is
worthwhile to spend a moment going over the facilities in DCE, because
analogous versions of these appear in the technologies discussed in this
Special Report, not always with acknowledgment.
DCE provides a remote procedure call (RPC) facility that is the basis for all
communication in DCE. When a client makes a request to a server, the client
process invokes an RPC on the server process. At the level of client code, it
looks like a call to a local function. The RPC mechanism encapsulates and
hides what may be an elaborate sequence of events. The RPC facility locates
the server using a naming service, binds to it, marshals parameters, and
transmits them across the network, performing any necessary fragmenting and
reassembly (in the case of large arrays, for example) as well as any data
conversion (because of differences in byte ordering).
The blueprint that drives the RPC process is the interface between client and
server. The formal specification of this interface is written in a language
called IDL (Interface Definition Language). IDL constructs resemble the
function prototypes in C, plus typedefs and defined constants. The IDL
specification for your application is run through an IDL compiler, which
produces "stubs" for the client and server portions of your application, as
well as header files. The client and server code uses the header files to
declare the local stub functions. Client and server modules are compiled and
linked in with their respective stub object files. Again, this process is
similar to that found in CORBA, SOM, and OLE.
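For concreteness, a minimal DCE IDL interface might look like the following sketch (the UUID and the operations are invented for illustration):

```
[uuid(004c4b40-e7c5-11cd-8fae-08002b2bd9ae), version(1.0)]
interface inventory
{
    /* C-like prototypes with directional attributes */
    long get_quantity([in] long part_no);
    void set_price([in] long part_no, [in] double price);
}
```

Running a file like this through the IDL compiler yields client and server stubs plus a header that declares the local stub functions for get_quantity and set_price.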
Although DCE programs can be written in any language supported by the tools
(including Fortran), the underlying concepts resemble, or are derived from, C.
Interoperable object technologies such as CORBA, SOM, and OLE extend this
approach by adding an "object model," a way to represent objects beyond the
bounds of a single application program or language. Although the technologies
are different, you'll find many of the concepts in DCE--such as interfaces,
stubs, proxies, IDL compilation, resource naming, unique identifiers--have
counterparts in CORBA, SOM, and OLE.


The Object Model


The object model is the system-level foundation that makes possible
application-level facilities such as linking-and-embedding, scripting, and
compound-document construction. As mentioned previously, you can have entirely
different application-level frameworks or systems--for example, TalAE and
OpenDoc--built on top of the same object model (in this case, SOM). Just as
DCE extends the C-function-call model across the boundaries of language,
address space, and the network, so do CORBA, SOM, and OLE extend a C++-style
object model across those same boundaries. So, instead of a procedural
interface between client and server, you can partition your distributed
application using an object-oriented factoring. 
Why would you want to do this? Well, this is not the place to convince you of
the benefits of inheritance, encapsulation, and polymorphism. Suffice to say
that many system designers have found that factoring a complex system into
objects greatly assists in managing complexity. On the other hand, if you are
an experienced object-oriented developer who thinks that "C++ is to C as lung
cancer is to lung," you can rest easy. Many of the obnoxious aspects of C++
have been left behind, because the only constructs needed are for specifying
interfaces. And remember that it is only the interface design that is object
oriented. You can program in any language for which a binding is available. At
the moment, the only official language binding is C, although a
to-be-finalized C++ binding has been available for some time.
The principal object models are SOM, COM, and CORBA. They're not quite
comparable, because CORBA is a specification rather than an implementation,
but everyone compares them anyway. The CORBA architecture has been implemented
by at least a half-dozen different vendors, including Digital,
Hewlett-Packard, Iona, Expersoft, and SunSoft. In some cases, a vendor's
design predates CORBA and has been made compliant by means of retrofitting.
Although IBM's SOM is CORBA-compliant, it is different enough in design goals,
scale, deployment, and market presence, that it should be considered on its
own. Microsoft's COM has some fundamental differences with CORBA, such as lack
of the usual notion of inheritance. Nevertheless, it, too, has been made
CORBA-compliant by way of Digital's Common Object Model (not to be confused
with the Component Object Model).
Compared to DCE, many of the object models lack maturity and scope. DCE
provides ancillary services--such as security, authentication, directory,
time, and threading--that are only in the planning stages in the case of some
object models. Because OSF sells only source code, most DCE implementations
are based on a common codebase, so you'll find much more interoperability
between vendors than in the CORBA world. 


Compound Documents


In the platform of the future, compound-document technologies--such as those
found in OpenDoc, TalAE, or OLE--will be layered on top of an underlying object
model. Although it is too early to tell which alternative will prevail, we do
know the general shape of the winner, because the existing alternatives share
some common design points. The technologies have all evolved down a similar
path.
The history of compound documents is much shorter than that of distributed
computing, because, until the Xerox PARC Smalltalk system of the late 1970s,
most mainstream computing was textual rather than graphical in nature. With
graphics came the ability to present a richer set of application datatypes to
the user: charts, page designs, images, and typographic-quality text. In the
mid- and late 1980s, applications such as Aldus PageMaker could integrate a
wide array of datatypes into a single document. Paul Brainerd of Aldus once
said that the "Place" command in PageMaker (which is used to import a variety
of datatypes) is "the single most important command in the entire program." 
Although powerful and featureful, these programs evolved into enormous
monoliths written without object-oriented tools, and they became increasingly
difficult to maintain and test. At that point many vendors started switching
to C++ for the implementation language. Second-generation page-makeup programs
such as QuarkXPress, along with the new generation of multimedia tools such as
Macromedia Director, provide even more facilities for integrating application
datatypes into a document. However, the focus is still on the application and
the vendor, instead of the document and the user.
This situation led to an emerging paradigm of compound-document computing.
From the user's point of view, compound documents provide a "document-centric"
rather than an "application-centric" model of use. The basic idea is that
there is a generic shell or empty container provided by the platform vendors
that can be filled with any number of different datatypes (and associated
functionality) created by any number of third parties. This generic shell, in
conjunction with the appropriate components, can function just as easily for
doing spreadsheets as for wordprocessing or multimedia.
The germ of this idea probably came from application frameworks, which evolved
in the mid-1980s out of C++ class libraries such as that from NIH, as well as
from the original Smalltalk class library. The first commercial app framework
was Apple's MacApp, in 1988. MacApp provided a generic implementation of the
common functionality needed by most applications (such as file open/save,
print, command undo, help, and so on). MacApp's design was influenced by the
Model/View/Controller paradigm used in Smalltalk. This paradigm has been
adopted by most frameworks, including MFC, OWL, and TalAE. MacApp was
initially written in Apple's object-oriented version of Pascal, then later
rewritten in C++. An object-oriented language is the ideal implementation
vehicle for this, because developers can subclass and modify the generic
behavior to fit their specific needs. An object-oriented language is also
ideal for implementing compound document components, because an object is able
to encapsulate both code and data.
More-recent mainstream frameworks include Borland OWL, Microsoft MFC, the
THINK Class Library (TCL), and Symantec Bedrock (now defunct). One of the most
sophisticated app
frameworks predates these four--GO's PenPoint environment, which in 1990
included a compound-document facility known as Embedded Document Architecture
(EDA).
Using a complex application framework effectively often requires the source
code--at least to look at the code and understand what the documentation has
left out. So, most frameworks today come with the entire source. In a true
compound-document architecture (such as PenPoint EDA), it is not necessary to
provide the source, because the interfaces between components and containers
are more precisely defined.

As app frameworks matured, they increased in functionality and size. For
example, MFC grew from about 25,000 lines of C++ in Version 1, to 55,000 lines
in the next release, to almost 80,000 in Version 2.5, with more to come. This
dramatic increase in complexity has led to a more careful reevaluation of the
interface between the generic container and the application-specific
component. The idea is to insulate the developer from having to understand a
large body of code, and focus more on the protocols that govern the
interaction between component and generic container. The result has been a
redesign of the app framework as we know it. This is why Taligent's efforts,
which were initiated at Apple under the codename "Pink," did not start with
the MacApp or Bedrock source code, but instead began with a clean slate.
Likewise, Microsoft's OLE will eventually subsume the entire system, from the
application level down to the operating-system services (in the form of the
Cairo OS, due in 1996). For the moment, OLE services have been grafted on to
Microsoft's existing app framework, MFC 2.5.
Compound-document technologies seem to start from the top and percolate
downward to the level of the operating system. Taligent plans to first release
its TalAE application environment, then follow it with TalOS, which represents
functions more akin to that of an operating system. Researchers at Xerox PARC
used to remark that "an operating system is a collection of things that a
language does not provide." Soon one might say that an operating system is one
set of services provided by a new-generation application framework. At that
point, the transition to the platform of interoperable objects will be
complete.




Special Issue, 1994
OMG's CORBA


An emerging standard for real-world implementations




Mark Betz


Mark is a senior consultant with Semaphore (Andover, MA), specializing in
client/server development, object-oriented design, and distributed-object
computing. He can be reached on CompuServe at 76605,2346. 


As enterprise information systems move down from the big iron onto the
desktop, much of the activity in object technology now revolves around its
role as a foundation for client/server applications. The model for
client/server is maturing into one based on peer-to-peer distributed
processing. Consequently, a considerable amount of attention is being focused
on technologies for linking applications and objects across machine boundaries
in a heterogeneous, networked environment. These technologies fall under the
rubric "distributed object computing." 
Although the technologies vying for market dominance are quite varied, a large
number rely on an emerging standard called the "Common Object Request Broker
Architecture" (CORBA) specification. This article provides an introduction to
CORBA and its related technologies.
Current systems based on (or compliant with) CORBA include the Distributed
System Object Model (DSOM) from IBM, Digital's Object Request Broker (ORB),
Portable Distributed Objects (PDO) from NeXT, and SunSoft's Distributed
Objects Environment (DOE). In addition, many CORBA implementations are being
offered from smaller vendors, such as Iona's Orbix.
The CORBA spec is being promulgated by the Object Management Group (OMG),
which is a consortium of more than 300 hardware, software, and end-user
companies, including every heavyweight in the business from IBM to Microsoft.
The group was founded in 1989 by a group of 11 companies (original members
included Digital, Hewlett-Packard, Hyperdesk, NCR, and SunSoft). Those
companies, along with Object Design, were authors of the CORBA 1.0 spec,
released in October 1991. It was followed in March of 1992 by revision 1.1,
and the group is currently working on revision 2.0, due before the end of
1994. The CORBA spec defines the architecture of an ORB, whose job is to
enable and regulate interoperability between objects and applications. This
facility is part of a larger vision called the "Object Management
Architecture" (OMA), which defines the OMG object model.


The Object Management Architecture


The OMA sets forth the OMG's vision of the complete distributed environment.
Where the concern of the CORBA spec is solely the interaction of apps and
objects, and the mechanisms that enable it, OMA defines a broad architecture
of services and relationships within an environment, as well as the object and
reference models (both of which are discussed later). As shown in Figure 1,
the OMA is built upon the ORB services defined by CORBA, which provide the
interaction model for the architecture. The environment is made richer with
the addition of Object Services and Common Facilities, both of which are
intended to serve as building blocks for assembling the frameworks within
which distributed solutions are built.
The Object Services facility consists of a set of objects that perform
fundamental operations. The OMG Object Services Task Force (OSTF) accepted the
first stage of the Common Object Services Specification (COSS) volume 1, in
November of 1993. This part of the specification covers life-cycle, naming,
event, and persistence services. A request for proposal was issued during the
summer of 1993 for the second stage of the specification (expected during the
latter half of 1994), which defines relationships, externalization,
transactions, and concurrency control. The OSTF envisions two additional
stages beyond the second, which will address issues such as security,
licensing, queries, and versioning. Stages 3 and 4 are called for in 1995 and
1996, respectively. As with all of the OMG specifications, Object Services are
defined as "interfaces" expressed in the OMG's Interface Definition Language
(IDL), which is discussed later. The specifications do not address
implementation details. In this sense, a spec defines an abstraction that may
have any number of unique implementations--a fact that is both philosophically
satisfying and practically worrisome, and something OMG competitors have not
overlooked.
Common Facilities (CF) are the newest area of effort by the OMG. The focus is
on application-level functions, unlike CORBA and Object Services, which are
low-level, fundamental capabilities. CF defines objects that provide key
workgroup-support functions (such as printing, mail, database queries,
bulletin boards and newsgroups, and compound documents). The OMG accords CF
high priority, because it envisions these services as comprising the layer
most often utilized by developers who work in a distributed environment. In
December 1993, the Common Facilities Task Force (CFTF) was formed to create
three documents: the CF Architecture, the CF Roadmap, and the CF Request for
Proposals (RFP). The architecture spec identifies and describes the primary
groups of facilities required. The roadmap groups these categories according
to importance, and schedules them for work. The OMG has charged the CFTF with
producing an architecture that provides key services required by most
applications, while leaving room for specialized vertical solutions. The CFTF
anticipates releasing the first CF RFP sometime in 1994.


The Object Model


The OMG object model underlies the OMA and is described in the CORBA
specification. The object model is a classical model in which clients send
messages to servers, and in which a request identifies a target object and an
operation, along with zero or more parameters. The OMG model strictly
separates interface
from implementation. The model itself is concerned only with interfaces, to
the extent that "interface" and "object type" are synonymous. This is largely
a result of the need to define the interface between components independent of
their implementation languages. In this case, "interface" means the methods
that can be called on an object, together with the object's accessible
attributes (those attributes intended to be retrievable by way of get/set
methods). By defining these things, the developer is describing how the object
appears to the ORB and to clients. In addition to a set of behaviors and
attributes, an object must have an identity, in order that it can be
referenced by an application.
In C++ programs, an object is identified by the unique memory address at which
it resides. By contrast, in the OMG model, objects are identified by
references. References are guaranteed to identify the same object each time
the reference is used in a request. The specification is mostly silent on how
references are implemented. All current ORB vendors implement them as objects
that carry enough descriptive information that they are effectively unique.
However, the OMG specifically states that references are not guaranteed to be
unique. They chose not to define a Universal Unique Identifier (UUID) scheme
in Version 1.1 of the specification because of concerns about management and
interaction with legacy applications that have a different idea of an object
ID. The lack of a universal means of "federating" (or making globally
compatible) the names used to reference objects is a failing that the OMG
intends to address in Version 2.0 of the specification. 
Objects in the OMG model are created and destroyed dynamically in response to
the issuance of requests. The specification does not define a method for the
application to create and destroy objects. However, vendors such as IBM have
augmented their implementations with this capability. Objects can also
participate in any of the normal types of relationships, with perhaps the most
important being subtype/supertype relationships or inheritance. Multiple
inheritance is also permitted. "Inheritance," in this sense, is inheritance of
interface only. There is no provision in the specification for implementation
inheritance. Inheritance between object interfaces is specified syntactically
by using the OMG's IDL. There is nothing to prevent the developer of a set of
server objects from using implementation inheritance in the design of the
servers, but the dependency is not made explicit in the Interface Definition
syntax. The fact that a set of servers accessed through an interface hierarchy
are also related by implementation inheritance is unknown to the ORB.
The OMG object-model specification makes no accommodation for polymorphism. In
object-oriented systems, polymorphism (whose Greek roots mean "many forms")
refers to being able to invoke the same method on a number of different
objects (for example, invoking the "draw" method on both a pushbutton object
and menu object). Implementing polymorphic behavior usually requires dynamic
binding. Polymorphism is a key benefit of object technology. It allows the
knowledge of specific implementations of an operation to reside with the
server, rather than the client. The alternative (and the usual situation prior
to OO) is to make the client aware of all specific implementations. Though the
OMG model lacks polymorphism, implementors of CORBA-compliant products are
free to provide it as part of a superset of capabilities.
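The pushbutton/menu example above can be sketched in C++, where polymorphism is achieved through virtual functions and dynamic binding (the class and function names here are illustrative, not part of any CORBA product):

```cpp
#include <string>

// Illustrative widget hierarchy: the client invokes draw() through the
// base interface, and dynamic binding selects the implementation.
class Widget {
public:
    virtual std::string draw() const = 0;
    virtual ~Widget() {}
};

class PushButton : public Widget {
public:
    std::string draw() const { return "pushbutton"; }
};

class Menu : public Widget {
public:
    std::string draw() const { return "menu"; }
};

// The client knows only the Widget interface; knowledge of each
// specific implementation resides with the derived classes.
std::string render(const Widget &w) { return w.draw(); }
```

The client code in render() never changes when new widget types are added, which is precisely the benefit the OMG model forgoes by leaving polymorphism to implementors.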


Types in the OMG Model


The OMG model is strongly typed. As in C++, types are used to restrict and
characterize operations. Unlike languages such as Smalltalk, types in the OMG
model are not first-order objects, and cannot be manipulated as objects. The
two primary categories of types in the object model are Basic types and
Constructed types. Both of these types are used in declaring interface methods
and accessible attributes. Basic types represent fundamental data types. These
include integers (signed and unsigned, both short and long), floating-point
numbers (in both 32-bit and 64-bit IEEE formats), ISO Latin-1 characters,
Booleans, enums, strings, and a nonspecific type called "any." In addition, a
special 8-bit datatype is defined that is guaranteed not to undergo conversion
when transferred from one system to another. This type is sometimes called an
"octet." 
Constructed types are more-complex, higher-level entities. The most important
of the constructed types is the Interface type, which specifies the set of
operations that an instance of that type must support. An object is an
instance of an interface type if it satisfies the set of operations defined on
that type. An interface type, in turn, is satisfied by any reference to an
object that satisfies the interface. Other types include Structs, Unions,
Sequences, and Arrays. Structs are pure data structures which operate much
like C++ structs. Likewise, Unions operate like C++ unions. Sequences are a
variable-length array type that may contain any single type of object
(including other sequences). Arrays are fixed-length arrays of a single type.
Figure 2 shows the OMG type hierarchy. 


Object Request Broker Architecture


The job of an ORB is to manage the interaction between clients and server
objects. This includes all the responsibilities of a distributed computing
system, from location and referencing of objects to the "marshaling" of
request parameters and results. Marshaling refers to the process of
translating and transferring parameters and results between machines,
processes, and address spaces. 
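A toy marshaler in C++ can illustrate the idea; a real ORB must also cope with byte order, alignment, and constructed types, so this is only a sketch with invented names:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Toy marshaler: parameters are flattened into a byte buffer on the
// sending side and reconstructed on the receiving side. (Sketch only;
// no byte-order or alignment handling.)
class Buffer {
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_;
public:
    Buffer() : pos_(0) {}
    void put_long(std::int32_t v) {       // marshal a 32-bit integer
        const std::uint8_t *p = reinterpret_cast<const std::uint8_t *>(&v);
        bytes_.insert(bytes_.end(), p, p + sizeof v);
    }
    std::int32_t get_long() {             // unmarshal it on the far side
        std::int32_t v;
        std::memcpy(&v, &bytes_[pos_], sizeof v);
        pos_ += sizeof v;
        return v;
    }
};
```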
To provide all these capabilities the CORBA specification defines an
architecture of interfaces that may be implemented in different ways by
different vendors. The architecture was specifically designed to separate the
concerns of interface and implementation. Figure 3 shows the architecture of
an OMG ORB. The main components of the architecture may be divided into three
specific groups: client side, implementation side, and ORB core. The client
and implementation sides represent interfaces to the ORB. 


Client-Side Architecture


The client-side architecture consists of three components: the Dynamic
Invocation interface, the IDL stub interface, and the ORB services interface.
In general, the stub interface consists of method "thunks" (small pieces of
machine-language interface code), which are generated according to IDL
interface definitions. These method thunks are linked into the client program.
The thunks call out to the ORB, which passes the method call onto the server,
as described later. The stubs represent a language mapping between the client
language and the ORB implementation, allowing use by clients written in any
language for which a mapping exists. There is currently an accepted mapping
for C, and mappings for C++ and Smalltalk are planned. Most current vendors
have provided a C++ mapping based on current proposals to the OMG. The use of
the stub interface brings the ORB right into the application programmer's
domain. The client interacts with server objects by invoking methods just as
it would on local objects.
The Dynamic Invocation interface is a mechanism for specifying requests at run
time. The dynamic interface is accessed by using a call to the ORB in which
the interface typename, request, and parameters are specified. The client code
is responsible for specifying the argument and return-value types. This
information may come from an Interface Repository, which is discussed later.
The information may also come from some other source. The details of which
interface is used to invoke a method are not relevant to servers. The dynamic
interface is necessary when the interface type cannot be known at compile
time. Otherwise, it is better to use the stub interface, which is more
efficient and typesafe. As with much of CORBA, the specification document says
little about the specifics of the dynamic interface, and, in fact, is explicit
in warning that the interface may vary widely across language mappings. 
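The contrast with the stub interface can be sketched in C++ (with hypothetical names): instead of a compiled-in method call, the client names the operation at run time and the object dispatches on that name:

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical dynamic-invocation sketch: the request name and argument
// are supplied at run time, with no compiled-in stub for the operation.
class DynamicObject {
    std::map<std::string, std::function<int(int)> > ops_;
public:
    void define(const std::string &name, std::function<int(int)> f) {
        ops_[name] = f;                   // register an operation by name
    }
    int invoke(const std::string &name, int arg) {
        return ops_.at(name)(arg);        // look the operation up at run time
    }
};
```

The run-time lookup and generic argument passing are what make the dynamic interface slower and less type-safe than a linked-in stub.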
The last of the client-side interfaces are the ORB services. These are
functions of the ORB that may be accessed directly by the client code. An
example might be retrieving a reference to an object. The specific nature of
these services is largely undefined by the specification.



Implementation-Side Architecture


One aspect of the client-side interface is shared by object implementations:
the ORB services. The other two components on the implementation side are the
IDL skeleton interface and the Object Adapter. The skeleton interface and the
stub interface are examined in the discussion about IDL. In general, the
skeleton interface is an up-call interface through which the ORB calls the
method skeletons of the implementation, on a request by a client. Most of the
functionality provided by the ORB to object implementations is provided
through the IDL skeletons and the Object Adapter. The OMG expects only a few
services to be common across all objects and accessed by way of the ORB core.
Since the skeleton interface is, in fact, implemented on top of the Object
Adapter, the focus here is on the Object Adapter.
The Object Adapter is the means by which server implementations access most of
the services provided by the ORB. These services include generation and
interpretation of object references, method invocation, security, activation
(the process of locating an object's implementation and starting it running),
mapping references to implementations, and object registration. The adapter
actually exports three separate interfaces: a private interface to the
skeletons, a private interface to the ORB core, and a public interface for use
by implementations. 
The CORBA specification is not explicit about what services an adapter must
support, but it is clear that the adapter is intended to isolate object
implementations from the ORB core as much as possible. The spec envisions a
variety of adapters that provide services needed by specific kinds of objects.
The most generic adapter described is the Basic Object Adapter (BOA). The BOA
allows a variety of object-implementation schemes to be accommodated--from
separate programs for each method, to separate programs for each object, to a
shared implementation for all objects of a given type (the C++ model). The
specification also describes adapters suited to objects stored in libraries
and object-oriented databases. 


Interface Definition Language


As mentioned previously, interfaces to servers can be specified in an
abstract, symbolic language. This interface representation predates CORBA and
object-oriented systems. Since the early days of remote procedure call (RPC)
systems, these languages have been known as "Interface Definition Languages"
(IDLs). The purpose of an IDL is to allow the language-independent expression
of interfaces, including the complete signatures (name, parameters, parameter
and result types) of methods or functions, and the names and types of
accessible attributes. This goal is achieved by way of a mapping between the
IDL syntax and whatever language is used to implement client and server
objects. Clients and servers need not be implemented using the same language,
and, in fact, it is anticipated that they will not be. All that's needed is a
mapping for both the client and server implementation languages. 
CORBA IDL is a language with many constructs that resemble those in C++. In
fact, the specification credits the Annotated C++ Reference Manual as the
source for what eventually became the CORBA IDL specification. IDL obeys the
same lexical rules as C++, while introducing a number of new keywords specific
to the needs of a distributed system. Anyone familiar with C++ should have no
trouble adapting to IDL. Writing interface definitions in IDL is quite a bit
like writing class declarations in C++. Because IDL is designed purely for
interface specification, it lacks the constructs of an implementation
language, such as flow control, operators, and object definitions (used here
in the C/C++ sense of allocating storage for a variable or object, as opposed
to declaring its type). There is no concept of public and private parts of the
interface declaration, since the notion of encapsulation is implicit in the
separation of the IDL interface from the implementation.


Exceptions and Modules


Two interesting aspects of IDL are exceptions and modules. Exception
declarations define a struct-like data structure with attributes that can be
used to pass information about an exception condition to a service requestor.
An exception is declared with an identifier (an exception name), which is
accessible as a value when the exception is raised, allowing the client to
determine which exception has been received. Members of the exception, if
declared, are accessible to the client. An operation declaration uses the
raises keyword to list the exceptions it may raise. The name of each
exception must be in scope at the point where the raises expression is
encountered. In addition to user-defined
exceptions, the CORBA specification defines a number of standard exceptions
for conditions such as a bad parameter, or a memory-allocation failure.
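The closest C++ analogue is throwing a struct-like exception object whose members carry the failure details. The sketch below mirrors the spirit of an IDL exception such as not_enough_cash; the names are hypothetical:

```cpp
// Mirrors an IDL exception declaration: a struct-like object whose
// member reports how far short the buyer fell.
struct NotEnoughCash {
    float amount_short;
};

// Hypothetical server-side operation: raises the exception when the
// request cannot be satisfied, otherwise returns the remaining cash.
float buy_with_cash(float cash, float price) {
    if (cash < price)
        throw NotEnoughCash{ cash - price };
    return cash - price;
}
```

The client catches on the exception type, which plays the role of the exception identifier, and then reads the members for detail.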
Modules extend the IDL scoping rules in a way similar to that of C++
namespaces, which are a recent addition to the C++ language. The module
keyword defines a nested scope within a file or within another module. By
prepending an identifier with the module name and the :: operator, a program
can specify that the identifier be searched for within the scope defined by
the module. The goal of the module construct is to allow the encapsulation of
namespaces so as to prevent name collisions with third-party libraries. In all
other circumstances, IDL scope rules are very similar to those for C and C++.
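The analogous C++ construct can be sketched directly; the :: qualification works the same way in both languages (the namespace and function names below are invented for illustration):

```cpp
// IDL modules map naturally onto C++ namespaces: each module introduces
// a scope, and the :: operator qualifies a name into that scope.
namespace Finance {
    int shares_outstanding() { return 1000; }
}
namespace Retail {
    int shares_outstanding() { return 50; }   // same name, no collision
}
```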


An IDL Example


Although this article cannot cover IDL in great detail, a simple interface
definition shows how an interface relates to the rest of the ORB
architecture. Example 1 presents the IDL
description of an interface to an object that represents a stock. Note that
the declaration is roughly similar to a C structure declaration, with the
struct keyword replaced by interface. At the top of the declaration are the
exceptions raised by the implementation of this interface. The next three
lines declare read-only attributes. Declaring an attribute in IDL is
equivalent to declaring get/set methods for that attribute. Read-only
attributes are effectively constant, and so only the get method skeletons will
be generated for them by the IDL compiler. Finally, the example shows the
buy() and sell() methods for the stock object. The buy() method takes two
input parameters (identified by the in keyword), and raises the
not_enough_cash exception. The sell() method takes a single input parameter
and raises not_enough_shares. The syntax for declaring exceptions in IDL is
similar to that in C++, except that the keyword is raises rather than throw.
Also shown in Example 1 is an interface derived from the Stock interface. Note
that the syntax for specifying inheritance is similar to C++. One difference
is that there is no "private" inheritance. Interface inheritance in CORBA IDL
is public. All methods and attributes of the parent interface are accessible
in the derived interface. In addition, inherited methods can be overridden in
order to specialize the behavior for the derived interface. If you need
explicit access to base interface methods or attributes, you can specify this
by way of qualification, as in Stock::buy(), again emulating the C++ usage.
Multiple inheritance is legal in IDL. As in C++, there are rules for resolving
ambiguities in cases where a base class is inherited more than once or a name
is introduced by multiple base classes.
Once the interface to an object has been defined, it is mapped into the client
and server languages by using an IDL compiler. The compiler produces stubs for
the methods on the client side, and skeletons for the method implementations
on the server side. If the implementation provides a C++ binding, the
generated stubs are member functions of a class, which may inherit from a
system-supplied base class that provides a private interface to the ORB. Keep
in mind that any such binding--Iona's Orbix binding is an example--makes
assumptions about what the standard C++ mapping will look like. A C++ mapping
is expected to be finalized as part of Version 2.0 of the CORBA specification.
An earlier mapping devised by Hyperdesk was accepted, but later withdrawn.
When the accepted C language mapping is used, the method definitions produce
external function declarations. The client simply invokes the linked-in stubs
when a service is required. The request is forwarded to the ORB through the
stub interface. If the object is activated, the ORB makes an up-call to the
implementation through the generated method skeleton. If the object is not
activated, the ORB first locates and activates it, then performs the up-call.
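The round trip from stub to skeleton can be sketched in C++ (all names here are hypothetical): the client calls a linked-in stub, the stub forwards the request to the broker, and the broker makes an up-call to the activated implementation:

```cpp
#include <map>
#include <string>

struct Servant {                       // the object implementation side
    virtual int invoke(int arg) = 0;
    virtual ~Servant() {}
};

class Broker {                         // stands in for the ORB core
    std::map<std::string, Servant*> active_;
public:
    void activate(const std::string &ref, Servant *s) { active_[ref] = s; }
    int dispatch(const std::string &ref, int arg) {
        return active_[ref]->invoke(arg);   // up-call through the skeleton
    }
};

class StockStub {                      // generated client-side stub
    Broker &orb_;
    std::string ref_;
public:
    StockStub(Broker &orb, const std::string &ref) : orb_(orb), ref_(ref) {}
    int buy(int shares) { return orb_.dispatch(ref_, shares); }
};

struct StockImpl : Servant {           // server-side implementation
    int held;
    StockImpl() : held(0) {}
    int invoke(int arg) { held += arg; return held; }
};
```

From the client's point of view, stub.buy() looks exactly like a local method call, which is the point of the stub interface.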


Interface and Implementation Repositories


As an alternative to IDL, the CORBA spec devotes a couple of paragraphs to the
idea of repositories for both interface and implementation definitions. On the
interface side, the repository is intended to augment the dynamic invocation
interface by providing persistent objects that represent information about a
server's interface. By using an interface repository, it should be possible
for a client to locate an object which was not known at compile time, query
for the specifics of its interface, and then build a request to be forwarded
through the ORB. 
The implementation repository contains information that allows the ORB to
locate and activate objects to fulfill dynamic requests. In addition to
implementation information, the spec envisions this repository being used to
contain other incidental information about an object (such as debugging info,
versioning data, administrative data, and so on). The specification does not
define the implementation of either repository, and so different vendors have
gone their separate ways, as they have with CORBA Version 1.1. 


Conclusion


The OMG has taken some flak for resembling other industry consortia that
ultimately produced nothing substantive beyond many statements about open
architectures and cooperation. In the case of the OMG, the comparison is
unfair because the spec has garnered broad support. Many implementations are
now available, and serious work is being undertaken to move the specification
forward and address the shortcomings of Version 1.1. CORBA is now being taken
very seriously by a number of large organizations that see it as the only
viable technology truly headed in a cross-platform, nonproprietary direction.
As a specification for an architecture, the model is rather versatile. Because
CORBA is intended to be a distributed technology, it is not necessarily
efficient at utilizing objects located on a single processor (though it does
not forestall such use). 
The vision of the OMG is clearly cross-platform and cross-operating system. Is
it a standard? Yes and no. Within the consortium, it is a standard description
of an architecture. It is not a standard for implementation of that
architecture, and it is not as well defined as it needs to be. The result is
that each one of the many implementations of CORBA is effectively a
proprietary product. There is currently no interoperability between ORBs,
although various partnerships have been announced (one example being the
SunSoft/Iona effort to make at least two of the implementations interoperate).
As a technology, CORBA is maturing rapidly. Various companies have produced
tools that ease the building of distributed applications using CORBA. A number
of training companies now offer hands-on courses in building such applications
using one or more of the available implementations. Version 2.0 of the
specification promises to address a number of critical issues necessary for
interoperability. Any success at doing so will go a long way toward nudging
CORBA a rung or two higher on the ladder of consideration. CORBA
implementations are currently available for nearly all the major operating
systems. If an organization is willing to bank on one implementation,
real-world solutions can be--and are being--built now.
Figure 1 The object management architecture (OMA). 
Figure 2 The OMG type tree.
Figure 3 The architecture of an object request broker.
Example 1: Interface to a stock price object, as expressed in IDL.
interface Stock
{
 exception not_enough_cash{ float amount_short; };
 exception not_enough_shares{ short shares_short; };
 readonly attribute string ticker;
 readonly attribute string companyName;
 readonly attribute float currentPrice;
 void buy( in short numberOfShares, in float cash)
 raises ( not_enough_cash );
 void sell( in short numberOfShares )
 raises ( not_enough_shares );
};
interface BlueChipStock : Stock
{
 // ...
};



Special Issue, 1994
The Component Object Model


The foundation for OLE services




Sara Williams and Charlie Kindel


Sara is a technical evangelist in the developer relations group at Microsoft.
She can be reached at saraw@microsoft.com. Charlie is a program manager,
software-design engineer, and technical evangelist in the developer relations
group at Microsoft. He can be reached at ckindel@microsoft.com.


The Component Object Model (COM) is a component-software architecture designed
by Microsoft that allows applications and systems to be built from components
supplied by different software vendors. COM is the underlying architecture
that forms the foundation for higher-level software services, like those
provided by OLE; see Figure 1. OLE services span various aspects of component
software, including compound documents, controls, interapplication
programmability, data transfer, storage, naming, and other software
interactions. 
These services provide distinctly different functionality to the user.
However, all OLE services share a fundamental requirement for a mechanism that
allows binary software components (supplied by different software vendors) to
connect to, and communicate with, each other in a well-defined manner. This
mechanism is supplied by COM, a component-software architecture that:
Defines a binary standard for component interoperability.
Is programming-language independent.
Is provided on multiple platforms (Windows, Windows NT, Macintosh, UNIX).
Provides for robust evolution of component-based applications and systems.
Is extensible.
In addition, COM provides mechanisms for:
Communications between components, even across process and network boundaries.
Error and status reporting. 
Dynamic loading of components. 
It is important to note that COM is a general architecture for component
software. While Microsoft is applying COM to address specific areas like those
shown in Figure 1, any developer can take advantage of the structure and
foundation that COM provides.
How does COM enable interoperability? What makes it such a useful and unifying
model? To address these questions, it is helpful to examine the basic COM
design principles and architectural concepts. In doing so, you will see the
specific problems that COM was designed to solve, and how COM provides
solutions for these problems. After this, turn to the article, "Application
Integration with OLE," by Kraig Brockschmidt in this issue, to see how OLE
provides higher-level services on top of the COM foundation. For an example
implementation using COM and OLE, see the article, "Implementing Interoperable
Objects," by Ray Valdés.


The Component-Software Problem


The fundamental question COM addresses is: How can a system be designed such
that binary software components from different vendors, written in different
parts of the world, at different times, are guaranteed to interoperate? To
design such a system, four specific problems must be solved:
Basic interoperability. How can developers create their own unique components,
yet be assured that these components will interoperate with other components
built by different developers?
Versioning. How can one system component be upgraded without upgrading all the
others?
Language independence. How can components written in different languages
interoperate?
Transparent cross-process interoperability. How can developers write
components to run in-process or cross-process (and eventually cross-network)
using a single programming model?
These problems need to be solved without sacrificing performance. Achieving
cross-process and cross-network transparency must be accomplished without
adding undue system overhead to components interacting within the same address
space. In-process components must be scalable down to small, lightweight
pieces of software, equivalent in scope to C++ classes or GUI controls.


COM Fundamentals


The design of COM rests on fundamental concepts that:
Provide a binary standard for function calling between components.
Define groups of related functions into strongly typed interfaces and allow
developers to define new interfaces.
Define a base interface that provides a way for components to dynamically
discover the interfaces implemented by other components and tracks component
instantiation by way of reference counting.
Define a mechanism to uniquely identify components and interfaces. 
Provide a run-time library to establish and coordinate component interactions.


Binary Standard


To implement a binary standard for component invocations, COM defines a
standard way to lay out (for each of several platforms) virtual function
tables (known as "vtables") in memory, and a standard way to call a function
in a vtable. Thus, any language that can call functions through double-pointer
indirection (C, C++, Smalltalk, Ada, Basic, and many others) can be used to
write components that can interoperate with other components written in any
language that conforms to COM's binary standard. 
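The double-pointer indirection can be shown concretely. The sketch below lays out a vtable by hand in C++ using C-style structs; the names are invented for illustration and are not the real OLE headers:

```cpp
struct ILookupVtbl;                    // forward declaration

struct ILookup {                       // the interface pointer points here...
    const ILookupVtbl *lpVtbl;         // ...which points at the vtable
};

struct ILookupVtbl {                   // table of function pointers
    int (*LookupByName)(ILookup *self, const char *name);
};

// A component implements the interface by supplying the function and
// publishing a filled-in vtable.
static int lookup_by_name(ILookup *self, const char *name) {
    (void)self;
    return name[0];                    // trivial stand-in behavior
}
static const ILookupVtbl vtbl = { lookup_by_name };

// The client reaches the implementation through two levels of
// indirection: interface pointer -> vtable -> function pointer.
int call_lookup(ILookup *p, const char *name) {
    return p->lpVtbl->LookupByName(p, name);
}
```

Any compiler or interpreter that can produce this call sequence, whatever the source language, can interoperate at the binary level.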
An important distinction is made between objects and components. The word
"object" means something different to everyone. In COM, an object is some
piece of compiled code that provides some service to the rest of the system.
To avoid confusion, a COM object here is referred to as a "Component Object,"
or simply a "component." This avoids confusing COM objects with source-code
OOP objects, such as those used in C++ programs.



Interfaces


In COM, applications interact with each other and with the system through
collections of functions (or methods) called "interfaces." Note that all OLE
services are simply COM interfaces. A COM interface is a strongly typed
contract between software components to provide a small, but useful, set of
semantically related operations. An interface is the definition of an expected
behavior and expected responsibilities. OLE's drag-and-drop support is a good
example of COM interface usage. All the functionality that a component must
implement to be a drop target is collected into the IDropTarget interface. All
the drag-source functionality is in the IDropSource interface. Interface names
begin with "I." OLE defines a number of interfaces for compound document
interactions--these usually start with "IOle." Any developer can design custom
interfaces to take advantage of COM to implement specific types of component
integration and communication. Incidentally, a pointer to a Component Object
is really a pointer to one of the interfaces that the Component Object
implements. This means that you can only use a Component-Object pointer to
call a method and not to modify data. Example 1 shows an interface definition
for a simple phone-directory service, ILookup, which has two methods,
LookupByName and LookupByNumber.
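In C++ terms, an interface like ILookup might be declared as a pure abstract class along the following lines (a sketch, not the article's Example 1: the real declaration would derive from IUnknown and return HRESULTs, and the parameter types here are assumptions):

```cpp
#include <cassert>
#include <cstring>

// Sketch of a phone-directory interface; signatures are simplified guesses.
class ILookup {
public:
    // Given a name, fill in the matching phone number; true on success.
    virtual bool LookupByName(const char *name, char *number, int cchNumber) = 0;
    // Given a number, fill in the matching name; true on success.
    virtual bool LookupByNumber(const char *number, char *name, int cchName) = 0;
};

// A trivial implementation holding one hard-coded entry, just to show that
// a client programs purely against the interface, never against the class.
class OneEntryBook : public ILookup {
public:
    bool LookupByName(const char *name, char *number, int cch) override {
        if (std::strcmp(name, "Alice") != 0) return false;
        std::strncpy(number, "555-0100", cch);
        return true;
    }
    bool LookupByNumber(const char *number, char *name, int cch) override {
        if (std::strcmp(number, "555-0100") != 0) return false;
        std::strncpy(name, "Alice", cch);
        return true;
    }
};
```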
All Component Objects support a base interface called "IUnknown," along with
any combination of other interfaces, depending on what functionality a
Component Object chooses to expose. Unlike C++ objects, Component Objects
always access other component objects through interface pointers. A Component
Object can never access another component object's data. Only an object's
interfaces are exposed to other objects; see Figure 2. This is a primary
architectural feature of the Component Object Model. It allows COM to
completely preserve encapsulation of data and processing, a fundamental
requirement of a true component software standard. It also allows for
transparent remoting (cross-process or cross-network calling), since all
component access is through well-defined interface methods that can exist in a
proxy object that forwards the request and vectors back the response.


Interface Attributes


An interface is a contractual way for a Component Object to expose its
services. The key aspects of this design are:
An interface is a type, not a class. While a class can be instantiated to form
a Component Object, an interface cannot be instantiated by itself because it
carries no implementation. A Component Object must implement that interface
and that Component Object must be instantiated for there to be an interface.
Furthermore, different Component Object classes may implement an interface
differently, so long as the behavior conforms to the interface definition
(such as two objects that implement a hypothetical IStack where one uses an
array and the other a linked list). Thus, the basic OO principle of
polymorphism fully applies to Component Objects.
An interface is not a Component Object. An interface is just a related group
of functions and is the mechanism through which clients and Component Objects
communicate. The Component Object can be implemented in any language with any
internal state representation, so long as it can provide pointers to the
interfaces it implements.
Clients only interact with pointers to interfaces, not with pointers to
objects. When a client has access to a Component Object, it actually has
nothing more than a pointer through which it can access the functions in the
interface--an interface pointer. This pointer is opaque. It hides all aspects
of internal implementation. Your code cannot "see" the Component Object's
data--unlike in C++ programs, where a client can directly access an object's
data by way of an object pointer. In COM, the client can only call methods of
the interface to which it has a pointer. This encapsulation allows COM to
provide the efficient binary standard that enables local/remote transparency.
Component Objects can implement multiple interfaces. A Component Object
can--and typically does--implement more than one interface. That is, the class
has more than one set of services to provide. For example, a class might
support the ability to exchange data with clients, as well as the ability to
save its persistent state information (the data it would need to reload to
return to its current state) into a file at the client's request. Each of
these abilities is expressed through a different interface (IDataObject and
IPersistFile), so the Component Object implements two interfaces.
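As a sketch of that idea, the following fragment shows one C++ class exposing two tiny interfaces at once; the interface names and methods are simplified stand-ins for the real IDataObject and IPersistFile, which are far larger and derive from IUnknown:

```cpp
#include <cassert>

// Stand-in interfaces (invented for illustration).
struct IDataObjectish  { virtual int  GetData() = 0; };
struct IPersistFileish { virtual bool Save(const char *path) = 0; };

// One Component Object class implementing both sets of services.
class Doc : public IDataObjectish, public IPersistFileish {
    int data = 7;
public:
    int  GetData() override { return data; }
    bool Save(const char *) override { return true; }  // pretend to write a file
};
```

A client holding an IDataObjectish pointer and a client holding an IPersistFileish pointer are both talking to the same object, each through its own interface.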
Interfaces are strongly typed. Every interface has its own interface
identifier (known as a GUID), which eliminates any chance of collision that
might occur with human-readable names. To create a new interface, the
developer also must create an identifier for that interface. In using an
interface, the developer must use the interface identifier to request a
pointer to the interface. This explicit identification improves robustness by
eliminating naming conflicts that would otherwise result in run-time failure. 
Interfaces are immutable. They are never versioned, which means that version
conflicts between new and old components are avoided. A new version of an
interface (created by adding more functions or changing semantics) is an
entirely new interface and is assigned a new, unique identifier. Therefore, a
new interface does not conflict with an old interface, even if only the name
has changed.
Figure 3(a) shows a diagram of a Component Object that supports three
interfaces--A, B, and C. By convention, a standard pictorial representation is
used for objects and their interfaces in which an interface is represented as
a "plug-in jack." Figures 3(b) and 3(c) show how interfaces allow for both
client/server and peer-to-peer relationships between components.


Interface Benefits 


The unique use of interfaces in COM provides a number of benefits:
Application functionality can evolve over time. As you will see, IUnknown's
QueryInterface method is used both to determine (at run time) which interfaces
an object supports, and to request a pointer to a supported interface. When a
component is upgraded to support a new interface, it will return a pointer to
that interface (instead of NULL, as it did before it supported the interface)
the next time its QueryInterface is called. Because this negotiation is done
at run time, other system components do not have to be altered to be able to
take advantage of the upgraded component's newly supported interface. Revising
an object by adding new functionality will, therefore, not require any
recompilation on the part of existing clients. By definition, COM interfaces
are immutable, which solves the versioning problem and guarantees backward
compatibility across upgrades. This guarantee is a fundamental requirement for
fostering a commercial component-software market. By comparison, other
proposed system object models generally allow developers to change existing
interfaces, which ultimately leads to versioning problems as components are
upgraded. Although other approaches seem to handle versioning, they don't
really work. If version checking is done only at object-creation time, for
example, subsequent uses of an instantiated object can fail because the object
is of the right type, but the wrong version (and per-call version checking is
impractical because of high overhead).
Object interaction is fast and simple. Once a client establishes a connection
to an in-process object, calls to that object's services (interface methods)
are simply indirect function calls through two memory pointers. As a result,
the performance overhead of interacting with an in-process COM object (an
object that is in the same address space as the calling code) is negligible.
Calls between COM components in the same process are only a handful of
processor instructions slower than a standard direct function call, and no
slower than a compile-time-bound C++ object invocation. Interfaces are
efficient even for cross-process objects, because the cost of negotiating
capabilities at run time is minimized by negotiating whole interfaces (by
using QueryInterface) rather than individual functions.
Interfaces can be reused. Design experience suggests that many sets of
operations are useful across a broad range of components (for example, many
components require a set of functions to read and write byte streams). This
facilitates reuse of both code and design. A programmer must learn an
interface only once, and can apply that interface to many different
components. For example, IDataObject is the sole interface used to move data
between objects. Regardless of how the user requests that data be moved
(cut/copy/paste, drag-and-drop), IDataObject is always used for the data
transfer.
Local and remote calls are indistinguishable to the client. The binary
standard allows COM to intercept an interface call to an object and to make a
remote procedure call instead, to an object in another process or on another
machine. From the caller's point of view, these calls are the same. Of course,
a remote procedure call has more overhead, but no special code is necessary in
the client to differentiate an in-process object from out-of-process objects.
All objects are available to clients in a uniform, transparent fashion.
Microsoft will later provide a distributed version of COM that requires no
modification to existing components in order to gain distributed capabilities.
Programmers can be isolated from networking issues, and components
shipped today will operate in a distributed fashion when this future version
of COM is released.
Component Objects are programming-language independent. Any programming
language that can create structures of pointers and explicitly or implicitly
call functions through pointers can create and use Component Objects.
Component Objects can be implemented in a number of different programming
languages and used from clients that are written using completely different
programming languages. Again, this is because COM (unlike an object-oriented
programming language) represents a binary-object standard, not a source-code
standard. 


The IUnknown Interface


COM defines one special interface, IUnknown, to implement some essential
functionality. All Component Objects are required to implement the IUnknown
interface, and conveniently, all other COM and OLE interfaces derive from
IUnknown. IUnknown has three methods: QueryInterface, AddRef, and Release; see
Example 2. Since all interfaces derive from IUnknown, QueryInterface, AddRef,
and Release can be called using any interface pointer.
AddRef and Release are simple reference-counting methods. An interface's
AddRef is called when another Component Object makes a copy of a pointer to
that interface. An interface's Release method is called when the other
component no longer requires use of that interface. While the Component
Object's reference count is nonzero, it must remain in memory. When the
reference count becomes zero, the Component Object can safely unload itself,
because no other components hold references to it. 
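The reference-counting discipline can be sketched as follows (with simplifying assumptions: an integer IID instead of a GUID, and plain void*/int signatures instead of the real HRESULT/ULONG ones):

```cpp
#include <cassert>

// Simplified stand-in for IUnknown (real signatures differ).
struct IUnknownish {
    virtual void *QueryInterface(int iid) = 0;  // nullptr if unsupported
    virtual int   AddRef() = 0;
    virtual int   Release() = 0;
};

enum { IID_IUnknownish = 0 };  // stand-in for a 128-bit IID

class Component : public IUnknownish {
    int refs = 1;  // the creator holds the first reference
public:
    void *QueryInterface(int iid) override {
        if (iid == IID_IUnknownish) { AddRef(); return this; }
        return nullptr;  // requested interface not supported
    }
    int AddRef() override { return ++refs; }
    int Release() override {
        int r = --refs;
        if (r == 0) delete this;  // no outstanding references: safe to unload
        return r;
    }
};
```

Note that QueryInterface calls AddRef before handing out the pointer, which is why every successful QueryInterface must eventually be balanced by a Release.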
QueryInterface is the mechanism that allows clients to dynamically discover
(at run time) whether an interface is supported by a Component Object. At the
same time, it is the mechanism that a client uses to get an interface pointer
from a Component Object. When an application wants to use some function of a
Component Object, it calls that object's QueryInterface, requesting a pointer
to the interface that implements the desired function. If the Component Object
supports that interface, it will return the appropriate interface pointer and
a success code. If the Component Object doesn't support the requested
interface, then it will return an error value. The application will then
examine the return code. If successful, it will use the interface pointer to
access the desired method. If the QueryInterface fails, the application will
take some other action, letting the user know that the desired functionality
is not available.
Example 3 shows a call to QueryInterface on the component Phonebook. The code
is asking this component, "Do you support the ILookup interface?" If the call
returns successfully, then the component supports the desired interface and a
pointer can be used to call methods contained in that interface (in this case,
either LookupByName or LookupByNumber). Note that AddRef() is not explicitly
called in this case because QueryInterface() increments the reference count
before returning the interface pointer.
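The client-side pattern that paragraph describes might look like this sketch (the Phonebook class, ILookup methods, and integer IIDs are illustrative stand-ins, not the article's Example 3):

```cpp
#include <cassert>
#include <cstring>

enum { IID_ILookup = 1 };  // stand-in for the real 128-bit IID

struct ILookup {
    virtual const char *LookupByName(const char *name) = 0;  // nullptr if absent
    virtual int AddRef() = 0;
    virtual int Release() = 0;
};

class Phonebook : public ILookup {
    int refs = 1;
public:
    // QueryInterface AddRefs before handing out the pointer, which is why
    // the caller does not call AddRef again.
    void *QueryInterface(int iid) {
        if (iid == IID_ILookup) { AddRef(); return static_cast<ILookup *>(this); }
        return nullptr;
    }
    const char *LookupByName(const char *name) override {
        return std::strcmp(name, "Alice") == 0 ? "555-0100" : nullptr;
    }
    int AddRef() override { return ++refs; }
    int Release() override { int r = --refs; if (r == 0) delete this; return r; }
};

// The client's side of the conversation: ask, check, use, release.
const char *client_lookup(Phonebook *book, const char *name) {
    ILookup *lookup = static_cast<ILookup *>(book->QueryInterface(IID_ILookup));
    if (!lookup) return nullptr;       // interface not supported
    const char *number = lookup->LookupByName(name);
    lookup->Release();                 // balance QueryInterface's AddRef
    return number;
}
```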


Identifying Interfaces


COM uses Globally Unique Identifiers (GUIDs) to identify every interface and
every Component Object class. GUIDs are functionally equivalent to Universally
Unique Identifiers (UUIDs), as defined in the Open Software Foundation's
Distributed Computing Environment (OSF DCE). GUIDs are 128-bit integers that
are guaranteed to be unique in the world across space and time. Human-readable
names are assigned only for convenience and are locally scoped. This helps
ensure that COM components do not accidentally connect to the "wrong"
component or server, or try to use the "wrong" interface, even in networks with
millions of Component Objects. GUIDs are embedded in the component binary
itself, and are used by COM dynamically at bind time to ensure no false
connections are made between components.
CLSIDs are GUIDs that refer to Component Object classes, and IIDs are GUIDs
that refer to interfaces. Microsoft supplies a tool (uuidgen) that
automatically generates GUIDs. Additionally, the CoCreateGuid function is part
of the COM API. Thus, you can create your own GUIDs when you develop Component
Object classes and custom interfaces. COM header files provide macros that
allow you to assign a more readable name to your GUIDs. Example 4 shows two
GUIDs. CLSID_PHONEBOOK is a Component Object class that gives users lookup
access to a phone book. IID_ILOOKUP is a custom interface implemented by the
PhoneBook class that accesses the phone book's database.
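At the bit level, a GUID is just a 128-bit value; the sketch below shows the conventional DCE-style field layout and readable-name definitions (both GUID values are made-up placeholders, not the real CLSID_PHONEBOOK or IID_ILOOKUP):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// 128 bits, split into the fields of the DCE UUID layout.
struct GUID {
    uint32_t Data1;
    uint16_t Data2, Data3;
    uint8_t  Data4[8];
};

// Two GUIDs are "the same identifier" only if all 128 bits match.
static bool IsEqualGUID_(const GUID &a, const GUID &b) {
    return std::memcmp(&a, &b, sizeof(GUID)) == 0;
}

// Header macros ultimately just produce named 128-bit constants like these
// (placeholder values, labeled hypothetical):
static const GUID CLSID_PHONEBOOK =
    { 0x11223344, 0x5566, 0x7788, { 0, 1, 2, 3, 4, 5, 6, 7 } };
static const GUID IID_ILOOKUP =
    { 0x99aabbcc, 0xddee, 0xff00, { 7, 6, 5, 4, 3, 2, 1, 0 } };
```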


Component Object Library


The Component Object Library is a system component that provides the mechanics
of COM. This library provides the ability to make IUnknown calls across
processes. It also encapsulates all the "legwork" associated with launching
components and establishing connections between components, so that both
clients and servers are insulated from location differences.
When an application wants to instantiate a Component Object, it passes the
CLSID of that Component Object class to the Component Object Library. The
library uses that CLSID to look up the associated server code in the
registration database. If the server is an executable, COM launches the EXE
and waits for it to register its class factory through a call to
CoRegisterClassObject (a class factory is the mechanism in COM used to
instantiate new Component Objects). If the associated server code happens to
be a DLL, COM loads the DLL and calls the DLL's exported function
DllGetClassObject. COM uses the object's IClassFactory interface to ask the
class factory to create an instance of the Component Object, and returns a
pointer to the requested interface back to the calling application. The
calling application neither knows nor cares where the server application is
run. It just uses the returned interface pointer to communicate with the newly
created Component Object. The Component Object Library is implemented in
COMPOBJ.DLL on Windows and OLE32.DLL on Windows NT and Windows 95. 
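The activation path described above can be sketched with an in-process stand-in for the registration database and class factory (the map, the integer CLSID, and the ICreatable interface are all assumptions for illustration; a real out-of-process server registers its class factory with COM rather than calling into a map directly):

```cpp
#include <cassert>
#include <map>

// Stand-in for an interface on the created object.
struct ICreatable { virtual int Ping() = 0; virtual ~ICreatable() {} };

using FactoryFn = ICreatable *(*)();         // the "class factory"

static std::map<int, FactoryFn> g_registry;  // the "registration database"

enum { CLSID_Phonebook = 100 };              // placeholder CLSID

struct PhonebookObj : ICreatable { int Ping() override { return 1; } };
static ICreatable *CreatePhonebook() { return new PhonebookObj; }

// Server side: associate the CLSID with the factory.
void register_server() { g_registry[CLSID_Phonebook] = CreatePhonebook; }

// Client side: the caller never sees where the server code lives; it just
// gets back an interface pointer (or nullptr if the class is unknown).
ICreatable *create_instance(int clsid) {
    auto it = g_registry.find(clsid);
    return it == g_registry.end() ? nullptr : it->second();
}
```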
COM is designed to allow clients to transparently communicate with components,
regardless of where those components are running. There is a single
programming model for all types of Component Objects--not only for the clients
of those Component Objects, but also for their servers.
From a client's point of view, all Component Objects are accessed through
interface pointers. A pointer has meaning only within a single process, so any
call to an interface function always reaches some piece of in-process code
first. If the
Component Object is in-process, the call reaches it directly. If the Component
Object is out-of-process, then the call first reaches a "proxy" object
provided by COM. This proxy generates the appropriate remote procedure call to
the other process or the other machine. The client thus connects transparently
to objects that are in-process, cross-process, or remote.
From a server's point of view, all calls to a Component Object's interface
functions are made through a pointer to that interface. Again, a pointer only
has context in a single process, and so the caller must always be some piece
of in-process code. If the Component Object is in-process, the caller is the
client itself. Otherwise, the caller is a "stub" object provided by COM that
picks up the remote procedure call from the proxy in the client process and
turns it into an interface call to the server Component Object. As far as both
clients and servers know, they always communicate directly with some other
in-process code; see Figure 4. 
The benefits of this local/remote transparency are:
The transparency provides a common solution to problems that are independent
of the distance between client and server. For example, connection, function
invocation, interface negotiation, feature evolution, and so forth occur in the
same way for components interoperating in the same process and for components
interoperating across global networks.
Programmers leverage their learning. New services are simply exposed through
new interfaces, and once programmers learn how to deal with interfaces, they
already know how to deal with new services that will be created in the future.
This is a great improvement over environments where each service is exposed in
a completely different fashion. For example, Microsoft is working with other
ISVs to extend OLE services. These new services, which will be quite diverse
in function, will all be very similar in their implementations because they
will simply be sets of COM interfaces.
Systems implementation is centralized. The implementors of COM can focus on
making the central process of providing this transparency as efficient and
powerful as possible, thus benefiting every piece of code that uses COM.

Interface designers concentrate on design. In designing a suite of interfaces,
the designers can spend their time in the essence of the design--the contracts
between the parties--without having to think about the underlying
communication mechanisms for any interoperability scenario. COM provides those
mechanisms for free, including network transparency.


Solving the Component-Software Problem


Let's see how the fundamental pieces of COM fit together to enable component
software. COM addresses the four basic problems associated with component
software: basic component interoperability, versioning, language independence,
and transparent cross-process interoperability. COM solves these problems
while satisfying the requirements of high performance and efficiency mandated
by the commercial component marketplace.
COM provides basic component interoperability by defining a binary standard
for vtable construction and method calling between components. Calls between
COM components in the same process are only a handful of processor
instructions slower than a standard direct function call, and no slower than a
compile-time bound C++ object invocation.
A good versioning mechanism allows one system component to be updated without
requiring updates to other components in the system. COM defines a system in
which components continue to support the existing interfaces (used to provide
services to older clients) as well as support new and better interfaces (used
to provide services to newer clients).
Versioning in COM is implemented by using interfaces and
IUnknown::QueryInterface. This mechanism allows a single system component to be
updated independently of the others. This approach completely eliminates the
need for things
such as version repositories or central management of component versions. A
software module is generally updated to add new functionality, or to improve
existing functionality. In COM, you add new functionality to your Component
Object by adding support for new interfaces. Since the existing interfaces
don't change, other components that rely on those interfaces continue to work.
Newer components that know about the new interfaces can use them. Because
QueryInterface calls are made at run time (without expensive calls to some
"capabilities database," as is done in some other system-object models), the
capabilities of a Component Object are evaluated each time the component is
used. When new features become available, applications that know how to use
them will begin to do so immediately. 
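Sketching that versioning story: the second-generation component below adds a hypothetical ILookup2 while continuing to answer QueryInterface for the original ILookup, so old clients keep working unchanged (integer IIDs stand in for GUIDs, and reference counting is omitted for brevity):

```cpp
#include <cassert>

enum { IID_ILookup = 1, IID_ILookup2 = 2 };  // placeholder IIDs

struct ILookup  { virtual int  Count() = 0; };              // the old contract
struct ILookup2 { virtual bool HasName(const char *) = 0; }; // the new contract

class PhonebookV2 : public ILookup, public ILookup2 {
public:
    void *QueryInterface(int iid) {
        if (iid == IID_ILookup)  return static_cast<ILookup *>(this);
        if (iid == IID_ILookup2) return static_cast<ILookup2 *>(this);
        return nullptr;  // anything else: unsupported, as before
    }
    int  Count() override { return 1; }
    bool HasName(const char *) override { return true; }
};
```

An old client still asks for IID_ILookup and gets exactly the behavior it always had; a new client asks for IID_ILookup2 and finds the added capability at run time.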
Improving existing functionality is even easier. Because the syntax and
semantics of an interface remain constant, you are free to change the
implementation of an interface, without breaking other developers' components
that rely on the interface. Windows and OLE use this technique to provide
improved system support. For example, in OLE today, the Structured Storage
service is implemented as a set of interfaces that currently use the C
run-time file I/O functions internally. In Cairo (the next version of Windows
NT), those same interfaces will write directly to the file system. The syntax
and semantics of the interfaces remain constant. Only the implementation
changes.
Existing applications will be able to use the new implementation without any
changes. The combination of the use of interfaces (immutable, well-defined,
"functionality sets" for components) and QueryInterface (the ability to
determine at run time the capabilities of a specific Component Object) enable
COM to provide an architecture in which components can be dynamically updated,
without requiring updates to other reliant components. This is a fundamental
strength of COM over other proposed object models. At run time, old and new
clients can safely coexist with a given Component Object. Errors can only
occur at easily handled times--at bind time or during a QueryInterface call.
Regarding language independence, COM allows you to implement components in a
number of different programming languages and use these components from
clients that are written using completely different programming languages.
Again, this is because COM, unlike object-oriented programming languages,
represents a binary-object standard, not a source-code standard. This is a
fundamental benefit of a component software architecture over object-oriented
programming languages. Objects defined in an OOP language typically interact
only with other objects defined in the same language. This necessarily limits
their reuse. At the same time, an OOP language can be used in building COM
components, so the two technologies are actually quite complementary. COM can
be used to "package" and further encapsulate OOP objects into components for
widespread reuse, even within very different programming languages.
Achieving cross-process interoperability is, in many respects, the key to
solving the component software problem. It would be relatively easy to design
a component-software architecture if you assumed all component interactions
occurred within the same process space. In fact, other proposed system-object
models do make this basic assumption. Most of the work in defining a true
component-software model involves the transparent bridging of process and
network barriers. The design of COM began with the assumption that
interoperability had to occur across process spaces, since most applications
could not be expected to be rewritten as DLLs loaded into shared memory. By
solving the problem of cross-process interoperability, COM also creates an
architecture under which components can communicate across a network.
The Component Object Library is the key to providing transparent cross-process
interoperability. This library encapsulates all the "legwork" associated with
finding and launching components and with managing the communication between
components. It insulates components from location differences, which means
that Component Objects can interoperate freely with other Component Objects
running in the same process, in a different process, or across the network
without having separate code to handle each case. Because components are
insulated from location differences, when a new Component Object Library is
released with support for cross-network interaction, existing Component
Objects will be able to work in a distributed fashion without requiring any
source-code changes, recompilation, or redistribution to customers.


COM and the Client/Server Model


The interaction between Component Objects and the users of those COM objects
is based on a client/server model. The term "client" already has been used to
refer to some piece of code using the services of a Component Object. Because
a Component Object supplies services, the implementor of that component is
usually called the "server." The client/server architecture enhances system
robustness: If a server process crashes or is otherwise disconnected from a
client, the client can handle that problem gracefully and even restart the
server, if necessary. 
Because COM allows clients and servers to exist in different process spaces
(as desired by component providers), crash protection can be provided between
the different components making up an application. For example, if one
component in a compound document fails, the entire document will not crash. In
contrast, object models that are only in-process cannot provide this same
fault tolerance. The ability to cleanly separate object clients and object
servers in different process spaces is very important for a component-software
standard that promises to support sophisticated applications. Unlike
competing object models, COM allows clients to also represent
themselves as servers. Many interesting designs have two or more components
using interface pointers on each other, thus becoming clients and servers
simultaneously. In this sense, COM also supports the notion of peer-to-peer
computing. This is more flexible and useful than other proposed object models,
where clients never represent themselves as objects.
Servers can come in two flavors: in-process and out-of-process. "In-process"
means the server's code executes in the same process space as the client (as a
DLL). "Out-of-process" means the code runs in another process on the same
machine (as an EXE), or in another process on a remote machine. These three
types of servers are also called "in-process," "local," and "remote."
Implementors of components choose the type of server based on the requirements
of implementation and deployment. COM is designed to handle all situations,
from those that require the deployment of many small, lightweight in-process
components (like OLE Controls, but conceivably even smaller) up to those that
require deployment of a huge component, such as a central corporate database
server. To client applications, the basic mechanisms remain the same.


Creating Custom Interfaces


To create an interface, the developer uses the Interface Definition Language
(IDL) to create a description of the interface's methods. From this
description, the Microsoft IDL compiler generates program header files so that
application code can use that interface. It also creates code to compile into
proxy and stub objects that enable an interface to be used cross-process. You
could write this code by hand. However, allowing the MIDL compiler to do it
for you is far less tedious. The Component Object Library contains proxy and
stub objects for all of the standard predefined COM and OLE interfaces, so you
will only use the IDL if you want to create a custom interface. Example 5
shows the IDL file used to define the custom interface, ILookup, which is
implemented by the PhoneBook object. The IDL used and supplied by Microsoft is
based on simple extensions to the IDL used in OSF DCE, a growing industry
standard for RPC-based distributed computing.
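An IDL description along the lines of Example 5 might look roughly like this (a sketch only: the uuid is a placeholder, the parameter shapes are guesses, and real MIDL would require additional directional and sizing attributes):

```
[
    object,
    uuid(00000000-0000-0000-0000-000000000001)   // placeholder, not the real IID_ILOOKUP
]
interface ILookup : IUnknown
{
    HRESULT LookupByName([in, string] char *pName,
                         [out, string] char **ppNumber);
    HRESULT LookupByNumber([in, string] char *pNumber,
                           [out, string] char **ppName);
}
```

From such a description, the MIDL compiler would emit the C/C++ headers plus the proxy and stub code that let ILookup be called across process boundaries.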


Conclusion


COM is not a specification for how applications are structured, but rather a
specification for how applications interoperate. For this reason, COM is not
concerned with the internal structure of an application--that is your job, and
it depends on the programming languages and development environments you use.
Conversely, programming environments have no set standards for working with
objects outside of the immediate application. For example, C++, which works
extremely well with objects inside an application, has no support for working
with outside objects. COM, through language-independent interfaces, picks up
where programming languages leave off, providing network-wide interoperability
of components.
In general, only one vendor needs to (or should) implement a COM Library for
any particular operating system. For example, Microsoft is implementing COM on
Windows, Windows NT, and the Apple Macintosh. Other vendors are implementing
COM on other operating systems, including specific versions of UNIX. 
It is important to note that COM draws a very clean distinction between the
object model and wire-level protocols for distributed services (which are
the same on all platforms) and platform-specific operating-system services
(local security or network transports, for example). Developers are therefore
not constrained to new and specific models for the services of different
operating systems, yet they can develop components that interoperate with
components on other platforms.
Only with a binary standard on a given platform and a single, wire-level
protocol for cross-machine component interaction can an object model provide
the type of structure necessary for full interoperability between all
applications and between all different machines in a network. With a binary
and network standard, COM opens the doors for a revolution in innovation
without a revolution in programming or programming tools.
The Problem with Implementation Inheritance
Implementation inheritance--the ability of one component to "subclass" or
inherit some of its functionality from another component--is a very useful
technology for building applications. Implementation inheritance, however, can
create many problems in a distributed, evolving object system.
The problem is that the "contract," or relationship, between components in an
implementation hierarchy is not clearly defined; it is implicit and ambiguous.
When the parent or child component changes its behavior unexpectedly, the
behavior of related components may become undefined. This is not a problem
when the implementation hierarchy is under the control of a defined group of
programmers who can update the components simultaneously. But it is precisely
this ability to control and change a set of related components simultaneously
that differentiates an application, even a complex application, from a true
distributed-object system. So while implementation inheritance can be a very
good thing for building applications, it is not appropriate for a system
object model that defines an architecture for component software.
In a system built of components provided by a variety of vendors, it is
critical that a given component provider be able to revise, update, and
distribute (or redistribute) his or her product without breaking existing code
in the field that uses previous revisions of that
component. In order to achieve this, it is necessary that the actual interface
on the component (including both the actual semantic interface and the
expected behavior) used by such clients be crystal clear to both parties.
Otherwise, how can the component provider be sure to maintain that interface
and thus not break existing clients? From observation, the problem with
implementation inheritance is that it is significantly easier for programmers
to be unclear about the actual interface between a base and derived class than
it is to be clear. This usually leads implementors of derived classes to
require source code to the base classes; in fact, most application-framework
development environments that are based on inheritance provide full source
code for this exact reason.
The bottom line is that inheritance, while very powerful for managing source
code in a project, is not suitable for creating a component-based system where
the goal is for components to reuse each other's implementations without
knowing any internal structures of the other objects. Inheritance violates the
principle of encapsulation, the most important aspect of an object-oriented
system.
--S.W. & C.K.
COM Reusability Mechanisms
The key to building reusable components is black-box reuse, which means that
the piece of code attempting to reuse another component knows nothing, and
does not need to know anything, about the internal structure or implementation
of the component being used. In other words, the code attempting to reuse a
component depends upon the behavior of the component and not the exact
implementation--implementation inheritance does not achieve black-box reuse.
To achieve black-box reusability, COM supports two mechanisms through which
one Component Object may reuse another: containment/delegation and
aggregation. For convenience, the object being reused is called the "inner
object" and the object making use of that inner object is the "outer object."
Containment/delegation. The outer object behaves like an object client to the
inner object. The outer object "contains" the inner object and when the outer
object wishes to use the services of the inner object the outer object simply
delegates implementation to the inner object's interfaces. In other words, the
outer object uses the inner object's services to implement some (or possibly
all) of its own functionality.
Aggregation. The outer object wishes to expose interfaces from the inner
object as if they were implemented on the outer object itself. This is useful
when the outer object would always delegate every call to one of its
interfaces to the same interface of the inner object. Aggregation is a
convenience to allow the outer object to avoid extra implementation overhead
in such cases.
These two mechanisms are illustrated in Figure 5. The important aspect of
both mechanisms is how the outer object appears to its clients. As far as the
clients are concerned, both objects implement interfaces A, B, and C.
Furthermore, the client treats the outer object as a black box and thus does
not care, nor does it need to care, about the internal structure of the outer
object--the client only cares about behavior.
Containment is simple to implement for an outer object. The process is like a
C++ object that itself contains a C++ string object. The C++ object would use
the contained string object to perform certain string functions, even if the
outer object is not considered a "string" object in its own right.
Aggregation is almost as simple to implement. The trick here is for COM to
preserve the function of QueryInterface for Component-Object clients even as
an object exposes another Component-Object's interfaces as its own. The
solution is for the inner object to delegate IUnknown calls in its own
interfaces, but also allow the outer object to access the inner object's
IUnknown functions directly. COM provides specific support for this solution.
Both Containment/Delegation and Aggregation provide for reuse of components
without violating the OO principle of encapsulation.
--S.W. & C.K.
Figure 1 Component Object Model serves as the foundation for
component-software services.
Figure 2 Virtual function tables (vtables) are a binary standard for accessing
component services.
Figure 3 (a) A typical component object that supports three interfaces A, B,
and C; (b) interfaces extend toward the clients connected to them; (c) two
applications may connect to each other's objects, in which case they extend
their interfaces toward each other.
Figure 4 Clients always call in-process code; Component Objects are always
called by in-process code. COM provides the underlying transparent RPC.
Figure 5 (a) Containment of an inner object and delegation to its interfaces;
(b) aggregation of an inner object, where the outer object exposes one or more
of the inner object's interfaces as its own.
Example 1: C++-style interface definition generated by the MIDL compiler for
ILookup, a simple custom interface.

interface ILookup : public IUnknown
{
public:
    virtual HRESULT __stdcall LookupByName(LPTSTR lpName,
                                           WCHAR **lplpNumber) = 0;
    virtual HRESULT __stdcall LookupByNumber(LPTSTR lpNumber,
                                             WCHAR **lplpName) = 0;
};
Example 2: The IUnknown interface is supported by all Component Objects.
interface IUnknown
{
    virtual HRESULT QueryInterface(IID& iid, void** ppvObj) = 0;
    virtual ULONG AddRef() = 0;
    virtual ULONG Release() = 0;
};
Example 3: Calling QueryInterface() on the component PhoneBook.
LPLOOKUP pLookup;
WCHAR *lpNumber;
HRESULT hRes;
// call QueryInterface on the Component Object PhoneBook, asking for
// a pointer to the ILookup interface identified by a unique interface ID.
hRes = pPhoneBook->QueryInterface( IID_ILOOKUP, (void**)&pLookup );
if( SUCCEEDED( hRes ) )
{
    // use ILookup interface pointer
    pLookup->LookupByName("Daffy Duck", &lpNumber);
    // finished using the ILookup interface pointer
    pLookup->Release();
}
else
{
    // failed to acquire ILookup interface pointer
}
Example 4: Two GUIDs, one CLSID for a phone-directory class, and an IID for a
custom interface that retrieves phone-directory information.
DEFINE_GUID(CLSID_PHONEBOOK, 0xc4910d70, 0xba7d, 0x11cd, 0x94, 0xe8,
0x08, 0x00, 0x17, 0x01, 0xa8, 0xa3);
DEFINE_GUID(IID_ILOOKUP, 0xc4910d71, 0xba7d, 0x11cd, 0x94, 0xe8,
0x08, 0x00, 0x17, 0x01, 0xa8, 0xa3);
Example 5: IDL file for a custom interface, ILookup, used by the PhoneBook
project.
[
    object,
    uuid(c4910d71-ba7d-11cd-94e8-08001701a8a3),  // IID for the ILookup interface
    pointer_default(unique)
]
interface ILookup : IUnknown   // ILookup interface derives from IUnknown
{
    import "unknwn.idl";       // Bring in the supplied IUnknown IDL
    HRESULT LookupByName(      // Define member function LookupByName
        [in] LPSTR lpName,
        [out, string] WCHAR **lplpNumber);
    HRESULT LookupByNumber(    // Define member function LookupByNumber
        [in] LPSTR lpNumber,
        [out, string] WCHAR **lplpName);
}








Special Issue, 1994
IBM's System Object Model 


The linchpin of IBM's object-enabling infrastructure




F.R. Campagnoni


Frank, a senior software engineer for IBM's object technology group, can be
contacted at frc@austin.ibm.com.


The object industry today is a patchwork of islands of information residing
within the confines of a myriad of incompatible object systems. For example,
consider binary C++ class libraries that cannot be shared among developers
using different C++ compilers--let alone by Smalltalk or Cobol programmers. 
To address some of the key inhibitors to the widespread acceptance of object
technology, and to overcome some key impediments to object interoperability,
IBM created the System Object Model (SOM). SOM is the linchpin of IBM's
object-enabling infrastructure. Eventually, SOM will underlie all of IBM's
object technology product offerings (including OpenDoc, the Taligent
frameworks, and the Workplace family of operating systems).


What is SOM?


The SOM technology was designed specifically to overcome several major
obstacles to the pervasive use of object-class libraries. The goal is to
enable the development of "system objects"--which can be supplied as part of
an operating system, a vendor tool, or an application--with the following
attributes:
The objects can be distributed and subclassed in binary form. Developers of
class libraries do not need to supply source code to allow users to subclass
their objects.
The objects can be used, with full subclassing, across languages. It is
possible to implement an object using one language, subclass the object using
another language, and use that class to build an application in yet a third
language. Developers want to modify and build applications from class
libraries in their preferred language, which is not necessarily the one in
which the classes were originally written.
The enabling technology allows for the subsequent modification (bug fixes or
enhancements) of these components without having to recompile preexisting
clients that use them (upward binary compatibility). This is a key requirement
because applications that depend upon system libraries cannot be rebuilt each
time a change is made to a component in the library. 
To achieve this goal, the developers of SOM designed an advanced object model
and implemented the object-oriented run-time engine necessary to support this
model. SOM supports all the concepts and mechanisms normally associated with
object-oriented systems, including inheritance, encapsulation, and
polymorphism. In addition, SOM possesses a number of advanced object
mechanisms, including support for metaclasses, three types of method dispatch
(with both static- and dynamic-method resolution), dynamic class creation, and
user intercept of method dispatch.
SOM has been commercially available since 1991, when it first appeared in OS/2
2.0. In addition to OS/2, it is now available for AIX, Windows, and Mac System
7. Over the next two years, SOM is likely to appear on other UNIX platforms
and Novell's NetWare, as well as IBM's Workplace, MVS, and OS/400 operating
systems. SOM has been selected by the Component Integration Laboratories (CIL)
as the underlying object model and run-time engine for the OpenDoc
compound-application technology. SOM is also used by Taligent (a development
company that is a joint venture of Apple, IBM, and HP) in the Taligent
Application Frameworks (TalAE).
One source of confusion when comparing compound document technologies has been
the relationship of the OpenDoc technology to SOM. SOM is object-enabling
technology. It was never intended to provide compound-document functionality.
OpenDoc, developed and distributed by CIL, is built upon the SOM object model
and run time, as well as SOM's distribution framework, and provides a
framework specifically designed for building components, or "parts" that can
be integrated into compound documents.


Backplanes and Frameworks


SOM can be thought of as analogous to the hardware backplane in a personal
computer; see Figure 1. Like the hardware backplane, SOM has slots for objects
or frameworks (defined later) to be inserted, analogous to the boards that
plug into the hardware backplane. The major difference between a traditional
PC backplane and SOM is that the PC backplane is used primarily as a
communications vehicle between the computer's CPU and peripheral devices. SOM
is a peer-to-peer communications vehicle interconnecting objects and
frameworks with each other, rather than to a central "master" controller.
SOM comes packaged with a number of frameworks. Frameworks are interrelated
sets of SOM objects designed to solve a particular problem. They are analogous
to the hardware boards that plug into a PC backplane, as shown in Figure 2.
Like many of the boards that populate a PC backplane, the SOM frameworks are
built to be extended, modified, or completely replaced. Three of the
frameworks packaged with SOM are object persistence, object replication, and
object distribution. 
The purpose of the distribution framework (sometimes called "distributed SOM")
is to seamlessly extend SOM's internal method-dispatch mechanism (the piece of
SOM responsible for invoking operations on objects) so that methods can be
invoked in a programmer-transparent way on objects in a different address
space or in a different machine from the caller.
The object-distribution framework can be viewed as a board fitting into the
SOM backplane that has some components preinstalled, as well as empty sockets
for additional components; see Figure 3. Components that come preinstalled in
SOM's distribution framework allow messaging between objects in different
address spaces on the same machine.
Additional components can be added (marshaling, transport, and so on) to
support messaging between objects on different machines. Components also can
be replaced, depending upon the particular distributed-computing environment
that needs to be supported. For example, different marshaling engines,
transports, or location services can be substituted for the ones supplied by
IBM. Of course, if desired, the entire distribution framework could be
replaced with another CORBA-compliant, distributed-object framework.
Particular installations of SOM may differ according to what frameworks and
components are installed and how they are configured. It is important to note,
however, that these installations do not constitute different varieties of
SOM, but rather different configurations--the underlying SOM infrastructure
is exactly the same in every case.
This is analogous to the almost unlimited variety of configurations available
for different PC architectures (Mac, Intel, and so on). In general, different
configurations run the same binary software, the differences being related to
the resolution of the display or access to different peripherals. Just as you
can increase the power of a PC by adding a floating-point processor or
upgrading the display hardware, you can enhance SOM frameworks by adding or
replacing components with ones capable of meeting the requirements--say, for
example, of enterprise-wide, intergalactic, distributed-object environments.


SOM, Distributed SOM, and CORBA 


SOM is a packaging technology and run-time support for the building of
language-independent class libraries. SOM's distribution framework, a set of
SOM classes (shipped with the SOMobjects Toolkit), allows methods to be
invoked (in a programmer-transparent way) on SOM objects that exist in a
different address space from the caller.
With SOM, IBM is striving to achieve many of the same objectives that the
Object Management Group (OMG) aspires to with its Common Object Request Broker
Architecture (CORBA) specification. Their common goal is to facilitate the
interoperation of objects independent of where they are located, the
programming language in which they are implemented, or the operating system or
hardware architecture on which they are running. 
The distributed SOM class library is fully compliant with the CORBA spec,
supporting all CORBA data types, functionality, and programming interfaces.
Distributed SOM is a framework built using the SOM technology that allows
developers to construct distributed-object applications. Distributed SOM does
not implement a separate object model or run time, since it is built with the
SOM object model and run time. Currently, fully interoperable versions of the
distribution framework are available for SOM on AIX, OS/2, and Windows.
Visualize SOM as a highly optimized, single-address-space, object-request
broker (ORB) that provides interlanguage interoperability and supports binary
subclassing and upward binary compatibility. Using SOM, objects implemented in
different languages can be combined in the same address space. SOM is fully
compliant with the OMG's CORBA specification. For example, SOM classes are
defined by using the CORBA Interface Definition Language (IDL), and support is
available for all the CORBA data types. C-language bindings for SOM classes
are compliant with CORBA (CORBA does not yet have standard bindings for C++,
Smalltalk, and Cobol). SOM provides an interface repository supporting the
CORBA functionality and programming interfaces.
The fact that SOM deals with object implementations distinguishes it from the
narrower focus of the CORBA spec (which defines object interfaces without
regard to implementation). As do other CORBA-compliant implementations, SOM
extends the spec's capabilities. SOM goes beyond CORBA by supporting
implementation inheritance and polymorphism, providing metaclasses that are
manipulated as first-order objects, and allowing dynamic addition of methods
to a class interface at run time.


How Does SOM Work?


SOM achieves cross-language interoperability by building its method-dispatch
mechanism based on system-defined procedure-linkage conventions. This means
that SOM follows the register- and stack- utilization conventions defined by
an operating system for all programs, regardless of their implementation
languages. System-linkage conventions also dictate how return values are
passed from the callee back to the caller. By using the system-linkage
protocol, SOM can dispatch methods independent of the language in which the
executable code was written. As a result, virtually any language that supports
the system procedure-call linkage conventions can use a SOM class, or can be
used to implement a SOM class.
SOM accomplishes upward binary compatibility by completely encapsulating the
implementation details of a class. A client calling a SOM class has no
information about the size and entry points to that class compiled into its
executable. Method dispatch and access to instance data are effected through a
set of data structures that are computed during the construction and
initialization of a class.

Two of the most important SOM data structures are the ClassData structure and
the SOM method table. Because these structures are completely computed at run
time, a SOM class can be modified (by refactoring the class hierarchy, moving
methods up the hierarchy, or adding methods or instance data, for example)
without requiring recompilation of the client code. In addition, the SOM data
structures can be manipulated by the programmer at run time, giving the class
implementor enormous flexibility in enhancing or controlling method dispatch. 
By completely encapsulating the implementation of an object, SOM overcomes
what Microsoft refers to as the "fragile base class problem"--the inability to
modify a class without recompiling clients and derived classes dependent upon
that class.


Method-Resolution Mechanisms


SOM supports three different method-resolution mechanisms: offset, name
lookup, and dispatch. These mechanisms are distinguished by the amount of
information required about the object, and by the method and method signature
known at the time the client application is compiled. The more information
that is known at compile time, the more efficiently the method-resolution
mechanism can be used. However, having all information hardwired into the
client application at compile time reduces the flexibility of the client, as
well as its ability to dynamically determine the object, method, or parameters
used in a method invocation. As a result, it is useful to have alternative
method-dispatch mechanisms available, which may be less efficient, but more
flexible.
Offset method resolution is the default mechanism for method invocation on a
SOM object. It is appropriate whenever the following are all known at compile
time: the method name, the method signature, and the class that introduced the
method. Offset method resolution offers a highly optimized path for invoking a
method. In SOM, the method tokens in the ClassData structure are actually
method-resolution thunks. Using offset resolution, SOM invokes a method by
simply calling the appropriate thunk with the arguments needed for the method.
Name-lookup method resolution is appropriate whenever the method signature is
known at compile time, but the name of the introducing class or the method
name itself is not. Name-lookup resolution is less efficient than offset
resolution. However, this resolution mechanism is more flexible because the
particular method and object on which that method is invoked can be determined
at run time based on heuristic code in the client application.
Dispatch resolution is the least efficient (but the most flexible) of the SOM
method-resolution mechanisms. Using dispatch resolution, a programmer can
dynamically construct the method call on an object. Dispatch resolution can be
described as a dynamic-invocation interface, whereby a request on an object
can be constructed and invoked at run time. The dispatch method-resolution
mechanism is appropriate if it is desired to have neither the object class,
method name, nor method signature compiled into the client application code.
The method name, a memory area to hold any result value produced by the
method, and a data structure that contains all the arguments needed for the
method, are all supplied as arguments to the dispatch mechanism. The object id
is the first member of the argument list.


Comparison to COM 


Comparing SOM and COM is a bit like comparing the engine of an automobile with
its specifications. A car's engine is not a specification. It is the essential
piece of the car that generates the vehicle's motion. Similarly, the engine's
specification will not impart motion to the car, but rather must first be used
to build an engine before a car can be expected to actually move. In this
crude analogy, the engine corresponds to SOM and the specification corresponds
to COM.
An immediately observable difference between SOM and COM, arising from these
disparate uses of the term "object model," is the kind of code that a
developer must write in every application. With SOM, a programmer
writes code that uses the object infrastructure SOM provides. With COM, the
programmer must also write the code that implements many of the rules and
guidelines that comprise the COM infrastructure. COM requires this code to
appear in every program, regardless of whether it is written manually or
partially automated with a development tool. Essentially, COM is a set
of rules programmers must interpret and follow in order to build these
components.
Although frequently discussed, COM currently lacks the cross-machine,
distributed-object capabilities of SOM. At the moment, there is no
"distributed COM." Microsoft plans to introduce distributed COM in their Cairo
operating system slated for beta testing sometime in late 1995.
OLE 2.0, Microsoft's linking and embedding technology, is built using COM.
However, Microsoft does not go to very much effort to distinguish COM from OLE
2.0 (the two are bundled together). The major reason for this is that OLE
currently represents the only concrete instantiation of COM.
SOM is a complete implementation of a syntax-free, object-oriented, run-time
engine--one that has been carefully engineered to have a robust binary
interface completely encapsulating implementation detail. This is underscored
by the fact that several C++-compiler vendors currently are using SOM for
their run-time library. Object-oriented language compilers that utilize SOM as
their run time are referred to as "Direct-to-SOM" compilers. Because SOM has
been designed to support a broad set of OO semantics, other languages (both
object oriented and procedural) can utilize the SOM run time through
intermediary mapping layers referred to as SOM "bindings."
Regardless of which approach is utilized (direct-to-SOM or language bindings),
the advantage to a developer is that class libraries can be built that sport
robust binary interfaces. Client programs may be constructed that are derived
from the classes in the library using normal object-oriented inheritance
techniques without compromising the ability of the class-library implementor
to make evolutionary changes in the library's internals, and without requiring
all client programmers to use the same development language. In short, SOM
objects are similar to the normal objects in an object-oriented programming
language (OOPL), except that their binary interfaces have been made more
robust and replaced with language-neutral mechanisms.
Microsoft's COM, on the other hand, while equally effective at hiding an
object's implementation details, does not attempt to be a run-time engine for
object-oriented programming. In fact, Microsoft questions the appropriateness
of today's object-oriented programming languages for exposing the interfaces
of an interoperable software component. The COM specification is a way of
hiding the OOPL notion of an object and exposing instead the different
abstraction called a "Windows Object."
One area in which Windows Objects differ from typical OOPL objects is that of
object identity. Windows Objects are not accessed in the same way as OOPL
objects. Whereas an OOPL would allow you to designate a particular object with
an "object reference" (or a pointer), a programmer never actually obtains a
reference to a Windows Object. Instead, Windows Objects are accessed
exclusively through their interface reference (pointers), and one must obtain
a separate interface pointer for each of the object's interfaces that a
programmer needs to use.
For example, if an object O supports three different interfaces (I1, I2, and
I3), you could obtain and use references for O's I1, I2, or I3, but never
obtain a reference for O itself. If you had a reference for O's I1 and I2, the
only way you could even be sure that both of these referred to the same
underlying object would be to query O's I1 for a reference to I2, and then see
if it was the same I2 reference you already had. In general, this is always
possible because COM requires every interface to support the IUnknown
protocol. The IUnknown protocol specifies three functions that should appear
in every COM interface, and the first of these functions should permit you to
obtain a pointer to any of an object's other interfaces.
Notice that the descriptions of COM frequently use words like "should." This
is because COM is largely a set of rules that are not actually enforced
anywhere. When creating a Windows Object, it is the programmer's
responsibility to implement all these rules and to get them right. This is yet
another difference between SOM and COM. SOM's semantics are implemented within
the SOM run time, while almost all of COM's semantics must be implemented by
the developer in each COM object. 
In 1987, Peter Wegner of Brown University introduced some order to the OO
community by suggesting a subsequently well-accepted taxonomy for classifying
object systems and programming languages based on the features and programming
paradigms they support. In Wegner's terminology, systems that have classes and
support implementation inheritance can be properly called "object oriented,"
while those without implementation inheritance can be characterized as "object
based." This is the significant difference between SOM and COM: SOM is "object
oriented" while COM is merely "object based."
The difference in approaches amounts to the fact that SOM permits class
libraries to be developed by using conventional object-oriented programming
paradigms, and to offer these same paradigms to their clients. COM rejects the
object-oriented notion of implementation inheritance in favor of totally
different paradigms for client programming. Microsoft calls these new
paradigms "containment" and "aggregation," and offers them as an alternative
approach for object reuse.
Aggregation and containment are essentially manual techniques for code reuse
entirely implemented by developer-supplied code. COM is not involved in
aggregation or containment. It simply provides rules about what users must
write.
Containment is used if a programmer wants to change some aspect of the
implementation of an object. It requires the programmer to encapsulate the
object to be modified with another object whose interface includes the same
functions as the encapsulated object. The programmer then supplies the new
behavior for functions that are to be changed and provides redispatch stubs to
call the corresponding function in the encapsulated object for functions that
are not changed.
Aggregation is used when a programmer wishes to add functionality to an
object, but does not need to change any of its preexisting behavior.
Aggregation is really nothing more than an optimized form of containment in
which the programmer is not required to write redispatch stubs for each
function in the object's interface. However, the ability to support
aggregation must be explicitly built into a COM object by the developer, so
not all Windows Objects can be used in this fashion.
With SOM, IBM has chosen to solve the fragile base class problem as opposed to
constraining a developer's ability to apply widely accepted object-oriented
techniques. Developers of SOM objects can employ familiar paradigms such as
single inheritance, multiple inheritance, or abstract (interface-only)
inheritance.
Figure 1 SOM has slots for objects or frameworks to be inserted that are
analogous to the boards that plug into the hardware backplane.
Figure 2 SOM frameworks, analogous to PC hardware boards, are interrelated
sets of SOM objects designed to solve a particular problem.
Figure 3 The object-distribution framework is akin to a board that fits into
the SOM backplane that has some components preinstalled.





Special Issue, 1994
OpenDoc


An open technology for smart documents 




Jeff Rush


Jeff is a software consultant working with AmigaDOS, OS/2, PC-DOS, and
embedded systems. He is also the creator of the Echomail conferencing facility
used in Fidonet. Jeff can be contacted at jrush@onramp.net.


As tomorrow's applications grow more complex by adding feature upon feature, a
new approach is needed that structures applications for simplicity. Today's
users are increasingly overwhelmed with bells and whistles. Each added wrinkle
satisfies the interests of one small market segment, but the entire set is
considered necessary for the application to have broad appeal. The result is
"Swiss-army-knife" applications with thick user manuals, long training cycles,
and plenty of support calls. Developers pay a price for this approach as
well--long development schedules, arduous test schedules, an increased
time-to-market, and lost opportunities. This makes it almost impossible for
small developers to participate in the major application categories.
OpenDoc is an enabling technology designed to address these problems. OpenDoc
restructures application development in a way that fosters small, reusable,
interoperable application components that users can mix and match according to
their precise needs. This article provides an overview of this technology,
which was designed at Apple, but is now being deployed by a number of vendors
(such as IBM, WordPerfect, and Apple). For a brief history, see the
accompanying text box entitled, "Origins of OpenDoc." 
OpenDoc simplifies applications by letting the end user select and install the
features needed, omit the ones not needed, and, at the same time, level the
playing field so that big and small companies alike can contribute to a larger
market of available feature sets or components. OpenDoc is an open
architecture for the creation of compound documents--which can contain many
different types of data (such as text, graphics, tables, video, sound, and
animation). The documents can be edited, printed, circulated in read-only
form, presented as a slide show, and circulated for mark-up and review. It
pursues the goal of reducing software complexity by using replaceable binary
objects. Application modules interoperate in a seamless fashion by using a
single, shared user interface (UI). It frees users to view their work in a
document-centric fashion, instead of as a set of applications between which
they must move. It allows users to build documents that can be used on
different platforms, without the hassle of conversion (which is especially
useful on heterogeneous networks). Users can customize their work environments
by discarding, adding, or replacing objects with those from other vendors.
Through the use of distributed objects, such documents can reach across
networks and transparently participate in client/server arrangements, pulling
data from several sources into a document.


Uses of OpenDoc Documents


OpenDoc is more than a super-extensible word processor. An OpenDoc document
can be used as an audio- and animation-enabled slide show, or as a shared
electronic blackboard with replicas of a document coordinated by distributed
objects, or even as an electronic voting ballot that tallies opinions as the
document circulates. In the latter case, the ballot document could either
store the responses internally for later retrieval or transmit them over a
network by using remote objects. 
Because of OpenDoc's support for connectivity and scripting, documents can be
created that track each time they are read or written. Such documents could be
used, for example, by a publications department to monitor its corporate
manuals and determine their popularity and audience. Authors could be notified
by e-mail each time their work is read. 
The content in an OpenDoc document can be dynamic and automatically generated
by a query to a remote database each time the document is opened (as in the
case of a bill of materials that shows the current cost of goods by way of
queries to a parts database). OpenDoc supports multiple representations of
information within a single document, allowing documents to be opened in, say,
French or German, according to the reader. Scripts can be embedded in
documents and activated upon viewing--for example, to first present only an
outline and then more detail as desired. A smart table of contents can take
the reader directly to a selected section, and can vary its behavior
depending on who opens the document.
Since OpenDoc supports the compression and encryption of parts of documents, a
script could request a password, consult with a remote security key database,
and unlock certain portions of the document or verify a digital signature on a
document. 


Benefits for Users


OpenDoc provides a unified, but customizable, system for creating complex
documents, with a consistent UI. Users no longer must run different
applications to create the various parts of a document and then import these
pieces into one master document. Users no longer must remember which
particular application they are in. With some of the early
linking-and-embedding technology, it was necessary to be acutely aware of
switching between applications. By contrast, OpenDoc presents a single work
area (or "shell document") on which users can paste various content types. A
single, high-level UI is used for manipulating the environment. Users can
customize this by adding new content types, tools such as spelling checkers
and thesauri, menu items, and buttons. OpenDoc acts as a high-level toolkit
that allows users to tailor the environment for their particular needs. Shell
documents can be read-only (for the viewing and distribution of information)
or instead be fully functional editors. The user interface for OpenDoc is
based upon extensive research originally conducted by Apple. Each platform
vendor can modify the UI to conform with platform-specific conventions.
OpenDoc relies on a multimedia-capable storage format called "Bento" to hold
documents. This is the same format used by the ScriptX multimedia authoring
environment designed at Kaleida Labs (another Apple spin-off). The name
"Bento" comes from the Japanese-style lunch box, which has multiple
compartments, each containing disparate elements arranged in an aesthetically
pleasing manner. The name represents the fact that the format is very
flexible, so that new types of information can be defined without disrupting
any existing content types. The format is designed with the real-time
requirements of multimedia in mind, so that sound and animation content types
can be played back reliably and without interruption. The format also permits
alternative versions of a document, whether to support different authors
working on document drafts or to provide alternative content types for using
a document on different platforms. For example, a user can make a picture
available in
one form for Windows and another for the Mac, or create a block of text in
both Italian and English, and each user would see the appropriate one.
OpenDoc has provisions for a scripting mechanism that allows users to
collaborate across space and time by explicitly writing scripts or by
recording their actions and converting them into scripts. These scripts do not
have to be low-level macros that only contain raw information such as mouse
clicks and keystrokes. Instead, the scripts can encode operations that
reference the parts of a document semantically--by section, paragraph, word,
and so on--independent of which application object is responsible for which
content type. For example, a script could scan the figure captions in a
document and index them, or change their fonts to a particular style, then
create a summary page and transfer the caption information to that page.
OpenDoc scripts use an English-like syntax similar to that of AppleScript and
HyperTalk. They can be mechanically translated to other languages (say, a
Kanji version of the scripting language) without losing their meaning.
Also, since the parts of a document are actual objects and are "smart," the
script verbs are handled automatically in a manner appropriate to the content
type.
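To make this idea concrete, here is a minimal sketch of how a semantic reference such as "the fourth word of the second paragraph" might be resolved and acted upon. The function names and the list-of-paragraphs content model are hypothetical, for illustration only; OpenDoc exposes these operations through its scripting architecture, not through this API.

```python
# Hypothetical sketch: a script addresses content by role ("paragraph 2,
# word 4") rather than by screen coordinates, so the same script works
# regardless of layout, font, or window size.

def resolve(document, paragraph_index, word_index):
    """Resolve a 1-based (paragraph, word) reference to the actual text."""
    paragraph = document[paragraph_index - 1]
    return paragraph.split()[word_index - 1]

def delete_word(document, paragraph_index, word_index):
    """Semantic operation: delete the referenced word, returning a new doc."""
    words = document[paragraph_index - 1].split()
    del words[word_index - 1]
    updated = list(document)
    updated[paragraph_index - 1] = " ".join(words)
    return updated

doc = ["OpenDoc restructures application development.",
       "Parts are small reusable interoperable components."]
assert resolve(doc, 2, 4) == "reusable"
doc = delete_word(doc, 2, 4)
assert doc[1] == "Parts are small interoperable components."
```

Because the reference names content semantically rather than by pixel position, the recorded operation survives changes to fonts, window sizes, and screen resolution.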
OpenDoc permits users to add new capabilities and content types by drawing
upon a third-party market of objects, known as "parts" in OpenDoc parlance.
Users can select content types from a palette of available "stationery," which
then get dropped onto documents. New parts can be purchased and added to the
palette of stationery. New tools to manipulate existing types can be added as
well, appearing as menu items or buttons. For example, a user can select a
spelling-checker object based on the size of its dictionary or
suffix-recognition ability, and it will operate across the appropriate content
types of text and tables, even reaching into nested types to spell check
embedded text (such as bar-chart labels).
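A sketch of how such a tool might reach into nested content follows. The recursive traversal and the dictionary-based part structure are illustrative assumptions, not OpenDoc APIs; the point is that one spelling checker can serve every content type that exposes its text.

```python
# Sketch: a spelling tool that works across content types by recursively
# visiting embedded parts (e.g. the text labels inside a bar chart) and
# asking each for its checkable text. Names are illustrative only.

def collect_text(part):
    """Recursively gather all checkable words from a part and its children."""
    words = part.get("text", "").split()
    for child in part.get("children", []):
        words.extend(collect_text(child))
    return words

def misspellings(part, dictionary):
    return [w for w in collect_text(part) if w.lower() not in dictionary]

document = {
    "text": "Quarterly revenu summary",
    "children": [
        {"text": "", "children": [          # an embedded bar chart...
            {"text": "East regon"},         # ...with text labels inside it
            {"text": "West region"},
        ]},
    ],
}
dictionary = {"quarterly", "revenue", "summary", "east", "west", "region"}
assert misspellings(document, dictionary) == ["revenu", "regon"]
```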
Through its reliance on standards, OpenDoc is able to interact with other
technologies and architectures. OpenDoc uses IBM's System Object Model (SOM)
for its object-messaging facility. SOM conforms with the CORBA object
technology fostered by the Object Management Group (OMG); see the article,
"IBM's System Object Model," by F.R. Campagnoni, on page 24, in this issue.
This means that OpenDoc objects will be able to invoke objects distributed
across any of the wide range of platforms that support CORBA now (or will in
the future), such as Macintosh, OS/2, DOS, the various incarnations of
Windows, NextStep, and even the worlds of UNIX boxes and mainframes. When
encountering platforms that use proprietary standards (such as Microsoft's OLE
2.0), OpenDoc documents can work with those objects by using bridging
technology such as Open Linking and Embedding of Objects (OLEO). OLEO enables
bidirectional interoperability between OpenDoc and Microsoft OLE 2.0, and is
being implemented by WordPerfect for Windows. Because of compliance with the
CORBA standard or by using bridging components, OpenDoc will be able to
interact with NextStep's Portable Distributed Objects (PDO), Sun's Distributed
Objects Everywhere (DOE), Novell's AppWare Bus, and Hewlett-Packard's
Distributed Object Management Facility (DOMF), as well as a few others. 


Benefits for Developers


OpenDoc's object-oriented architecture (see Figure 1) speeds up development of
new parts by letting the developer build upon existing objects using multiple
inheritance and polymorphism. Developers must define only those
characteristics in their object that differ from those of the parent class
or classes. For details on the process, see the accompanying text box, "Steps
in Creating an OpenDoc Part." 
Parts can be designed in a general-purpose manner, and then reused on other
product lines or sold on the open market to other developers. Some user
documentation can be reused as well, since the fundamental way of interacting
with a part is not likely to be different between projects. A spell checker is
still a spell checker.
OpenDoc lowers development risk by reducing the complexity of applications.
Bugs and schedule slippage can be better controlled by encouraging the
creation of small, single-purpose, part-handler objects, with well-defined
APIs that can be more easily developed and tested. Such parts can be tested in
isolation from other parts and (because of fewer combinatorial inputs) be
tested more fully. Once a library of such parts exists, creating a new
application becomes primarily a matter of selecting the base objects and
writing the portions unique to the particular application.
To ensure that a large pool of objects can be made available in the diverse
marketplace, OpenDoc uses the packaging mechanism of SOM to create binary
executable parts that can be replaced or upgraded with just a file copy. This
allows the creation of objects written in any language that conforms to the
SOM standard and gives an object the ability to be called from any other such
language without the usual incompatibilities (such as differing calling
conventions, register usage, and name management/mangling). This technology
will allow objects to communicate between process spaces on a single machine
or across a network. An object can then call upon the services of other
objects located, say, out on the Internet or the corporate LAN. By packaging
objects in such an interchangeable manner, it becomes possible for vendors to
offer them for sale to other developers for integration into their
applications.
As mentioned previously, WordPerfect is developing OLEO, the bridging
technology that permits OpenDoc objects to interact with OLE objects. Other
such bridges are in the works as well. This creates a larger market of reuse
opportunity by letting OpenDoc developers invoke other types of objects and
lets them sell SOM objects into the OLE 2.0 market for those customers with
OLEO bridges.


The OpenDoc Architecture 


OpenDoc comprises several major subsystems (see Figure 2), each rather
independent in its own right, and some usable without the others. Included
are: the shell document along with a pool of part handlers, a
geometry-negotiation protocol, an object-storage mechanism, an exposed event
flow, and an object-packaging technology. To maximize cross-platform
portability, OpenDoc does not specify drawing systems, coordinate systems,
window systems, or human interface guidelines. To enable the parts to work
together, OpenDoc does specify protocols covering storage management, event
distribution, the run-time model, and the management of the human interface.
The basic visual framework within which OpenDoc operates is called a "shell
document." It resembles the UI of a word processor, but without presuming the
type of data that is to be manipulated. The shell document provides an address
space, distributes events, and provides basic UI resources such as windows and
menus. It relies upon an open-ended pool of part handlers to provide the
functionality to handle specialized forms and presentations of data such as
text, spreadsheets, graphs, and so on--OpenDoc "parts." The particular style
of the framework is platform dependent. It is designed to minimize conflicts
with known platform conventions of the Macintosh, Microsoft Windows, Motif,
OpenLook, and OS/2.
A part handler is responsible for displaying the part (both on screen and on
paper), editing the part, and managing the storage for the part (both in
memory and on disk). The part handler and the part itself together form
a high-level object, with the part providing the state information and the
part handler providing the behavior. Part handlers come in two flavors:
editors and viewers. The viewer is simply a subset of the editor. It lacks the
ability to alter the part. The view-only OpenDoc shell takes up less disk
space than the editor and can be freely distributed.
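The editor/viewer relationship can be sketched as a simple subset of interfaces. The class names below are illustrative, not OpenDoc's actual types; the point is that a viewer exposes only the display side of a part handler, while an editor adds the mutating operations.

```python
# Sketch of the two flavors of part handler: a viewer can display a part
# but has no mutating operations; an editor extends the viewer with them.

class PartViewer:
    def __init__(self, content):
        self.content = content

    def render(self):
        return "[%s]" % self.content   # stand-in for real drawing code

class PartEditor(PartViewer):
    def set_content(self, content):    # only the editor can alter the part
        self.content = content

viewer = PartViewer("chart")
editor = PartEditor("chart")
editor.set_content("updated chart")
assert viewer.render() == "[chart]"
assert editor.render() == "[updated chart]"
assert not hasattr(PartViewer, "set_content")
```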
Each part handler operates within an arena of space and relies upon a protocol
to negotiate the use of the geometric space, which may represent the display
screen or the printed page. This protocol is quite dynamic, allowing objects
to move and adjust on-the-fly as other objects are added or changed. OpenDoc
supports nonrectangular regions, as well as the usual rectangular ones. This
is one advantage over the current version of OLE. (Microsoft has said that
nonrectangular regions will be supported in future versions of OLE.)
Note that printing is not within the domain of OpenDoc proper; it is up to
each part to decide how and what to print. The highest part in the
hierarchy, the root part, drives the printing process.



Bento Object Storage


Every document-manipulation system must have a way to store a particular
complex arrangement of data to disk. OpenDoc relies on the Bento storage
system to hold a document's contents. Bento has the ability to subdivide a
storage container to hold wildly differing forms of data, in a structure that
lends itself to portability between platforms and that addresses the real-time
playback requirements of multimedia. 
Bento can coordinate multiple data streams within a single file. It also
provides a robust method of annotation for links between objects, whether
located in the same file or in another. It has elements that support the
tracking of draft revisions and that arbitrate such revisions when several
authors are collaborating on the same document. Bento is not, however, a
full-blown, object-oriented database of the corporate server variety. That
would make Bento too complex. Instead, it focuses on the management of the
content of a system of structured files, along with references to external
data items. There is nothing, however, in the Bento architecture that would
preclude it from being hosted on top of an existing object-oriented database.
Bento treats a file as a highly structured container that can hold multiple
objects nested within each other. Each object can have multiple properties,
each of which can have multiple values. In Bento, a "value" is a byte stream
with an associated type that defines how the byte stream is interpreted. A
property serves to indicate the role of the value, but not the type itself.
For example, a property may specify that a byte stream has a role of document
title. However, it is the byte stream's type that specifies which character
set is used to interpret the byte-stream value. A different document may have
a title property as well, but with a value typed as a graphic image in order
to represent a stylized corporate logo.
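The container/object/property/value hierarchy can be sketched as follows. This is an illustration of the data model just described, not the real Bento API; the class and method names are assumptions.

```python
# Illustrative sketch of the Bento data model: a container holds objects,
# each object holds properties, and each property holds one or more typed
# values. The property names the role ("title"); the value's type says how
# to interpret the bytes.

class BentoValue:
    def __init__(self, type_name, data):
        self.type_name = type_name   # e.g. "text/ascii" or "image/pict"
        self.data = data             # the raw byte stream

class BentoObject:
    def __init__(self):
        self.properties = {}         # property name -> list of values

    def add_value(self, prop, type_name, data):
        self.properties.setdefault(prop, []).append(BentoValue(type_name, data))

    def values(self, prop):
        return self.properties.get(prop, [])

# Two documents can use the same "title" property with different value types.
report = BentoObject()
report.add_value("title", "text/ascii", b"Quarterly Report")

brochure = BentoObject()
brochure.add_value("title", "image/pict", b"<stylized logo bytes>")

assert report.values("title")[0].type_name == "text/ascii"
assert brochure.values("title")[0].type_name == "image/pict"
```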
The byte streams that comprise values may be of any length, with random access
possible to any point within. In addition, it is possible to store a byte
stream in noncontiguous pieces on a particular storage medium (interleaved
with other data), to facilitate the real-time playback of multimedia data.
For example, a value containing a sequence of compressed images that make up
an animation may be interspersed with another value containing the audio
soundtrack. Then, during playback, the storage device does not have to seek
between wildly distant regions of the disk. The disjoint nature of these
values is hidden from those objects that do not need to know. The ability to
have disjoint values also means that values can be edited without rewriting
the entire value, which might be quite slow for multimegabyte values that
represent animations or sound tracks. Unlike your normal file system, however,
values support the ability to snip out or splice in new byte sequences in
order to adjust the length of a region of a value. By recording the set of
pieces that make up a value at any particular time, it is possible to track
revisions made to an object. This is how OpenDoc keeps track of revision
drafts of documents.
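The piece mechanism can be sketched with a small model: a value is a list of (offset, length) pieces into backing storage, new bytes are spliced in by appending to the store and splitting the affected piece, and a snapshot of the piece list records a draft. This is a sketch of the idea, not Bento's on-disk format.

```python
# Sketch of disjoint values: splicing bytes into a value never rewrites
# existing bytes, and recording the piece list at any moment captures a
# revision that remains reconstructible later.

class PieceValue:
    def __init__(self, store, data=b""):
        self.store = store                      # shared backing storage
        start = len(store)
        store.extend(data)
        self.pieces = [(start, len(data))] if data else []

    def read(self):
        return b"".join(bytes(self.store[o:o + n]) for o, n in self.pieces)

    def splice_in(self, at, data):
        """Insert bytes at logical position `at` by appending to the store
        and splitting the affected piece -- no old bytes are rewritten."""
        start = len(self.store)
        self.store.extend(data)
        new_piece = (start, len(data))
        out, pos = [], 0
        for o, n in self.pieces:
            if new_piece and pos <= at <= pos + n:
                cut = at - pos
                if cut:
                    out.append((o, cut))
                out.append(new_piece)
                new_piece = None
                if n - cut:
                    out.append((o + cut, n - cut))
            else:
                out.append((o, n))
            pos += n
        if new_piece:
            out.append(new_piece)
        self.pieces = out

store = bytearray()
v = PieceValue(store, b"Hello world")
snapshot = list(v.pieces)                 # one draft's piece list
v.splice_in(5, b", brave new")
assert v.read() == b"Hello, brave new world"
# The earlier draft is still reconstructible from its snapshot:
assert b"".join(bytes(store[o:o + n]) for o, n in snapshot) == b"Hello world"
```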
Bento supports other interesting features such as layered data transforms (for
example, compressing a value, and then encrypting it). Also, users can have
out-of-line data references that can refer to external files or to values
elsewhere in the same container. This makes possible the scenario in which
several multimegabyte 24-bit images stored on a CD-ROM are used in a document.
This document can contain references to the files on the CD-ROM, as well as
alternative representations in low-resolution, 8-bit images stored within the
document itself. If the CD-ROM drive is not available (say, when a user is
traveling), the low-resolution images are substituted automatically so that
the user can continue to edit while on the road.
The physical layout of the Bento storage format is published information, so
that even non-OpenDoc environments can retrieve information from within a
Bento container.


Exposed Event Flow


The flow of events within OpenDoc is made visible to the parts in a document,
so that the actions of a user can be recorded and played back. OpenDoc events
are not just keystrokes and mouse clicks, but rather semantic actions at a
higher level of abstraction. Parts may emit events while they operate. They
can also inject events into the flow in order to alter the state or behavior
of other objects. OpenDoc provides a powerful referencing facility so that
events are meaningful in the context of application usage. For example, an
event can indicate that the user deleted the fourth word in the third
paragraph in chapter four. This is in contrast to the simple macro-like
recording of low-level events used by other systems--in which the mouse
pointer moves to a paragraph, the mouse button is clicked, and the Delete key
pressed. The effect of this sequence is heavily dependent upon the resolution
of the screen and other similarly incidental constraints. In addition, such a
low-level event stream is not very meaningful to users examining the resulting
stored script.
By recording the sequence of changes made to documents at a high level of
abstraction, OpenDoc can maintain a meaningful revision history and can
associate changes made by several authors with each person responsible, for
coordinated integration later. The high-level event stream is also more
concise than macro-level recording, and can be transmitted over a slow-speed
communications link to create an efficient electronic blackboard--in which
geographically dispersed authors can simultaneously work on a document and
have their actions reflected in the remote copies in close to real time. Also,
you can imagine an author receiving revised copies of a document and then
invoking a script that walks through the changes made by others, highlighting
each change on the screen, and audibly explaining its rationale. The script
could prompt the author to approve each change before it is applied to the
final draft. As you can see, scripting allows the construction of complex
client applications disguised as compound documents.
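The electronic-blackboard scenario boils down to replaying the same compact event stream against every replica. The event encoding below is a hypothetical sketch of the idea, not OpenDoc's actual event format.

```python
# Sketch: a compact stream of semantic events ("insert paragraph 1 ...")
# keeps a remote replica of a document in step with the original -- far
# less data over a slow link than shipping the document itself.

def apply_event(doc, event):
    """Apply one semantic event to a document (a list of paragraphs)."""
    kind = event[0]
    if kind == "insert":
        _, index, text = event
        doc.insert(index, text)
    elif kind == "delete":
        _, index = event
        del doc[index]
    elif kind == "replace":
        _, index, text = event
        doc[index] = text
    return doc

local = ["Intro", "Body"]
replica = ["Intro", "Body"]
events = [("insert", 1, "New section"), ("replace", 2, "Revised body")]

for e in events:                 # record the changes locally...
    apply_event(local, e)
for e in events:                 # ...and replay them on the remote copy
    apply_event(replica, e)

assert local == replica == ["Intro", "New section", "Revised body"]
```

The same replay loop, driven by a script that pauses on each event, gives the change-review scenario described above.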
It is important to note that OpenDoc has no specific scripting language per
se, but rather the provision for multiple scripting languages using the Open
Scripting Architecture (OSA). The OSA design provides for the interception and
injection of UI and semantic events, organized into standardized suites of
common events for word processing, graphics, and other application-usage
scenarios. However, each platform vendor (or a third party) must provide its
own scripting language and interface it to these events. On the Mac, the three
OSA-compliant scripting languages are Apple Computer's AppleScript, UserLand
Software's Frontier, and CE Software's QuicKeys.


Object-Calling Convention


For OpenDoc to succeed, it must be easy to create the objects that make it up.
This means that objects created using one set of tools (for example, Borland
C++) must be able to interoperate with objects created by another (say, the
Watcom compiler). Because of the differing name-mangling conventions, register
conventions, and in-memory object representations, this is often not possible
with today's objects. Many developers also prefer their C straight, or use
Smalltalk, REXX, Pascal, or some other language. To address this and other
issues, IBM created SOM as an interoperable way of packaging objects.
SOM provides a language-neutral, load-on-demand, object-calling convention
that supports distributed services and field-replaceable components. SOM 1.0
ships with every OS/2 2.x package and is the basis of the Workplace Shell user
interface. SOM 2.0 was revised to comply with the CORBA object standard and
enhanced to support access to objects distributed across a network. This
version of SOM, along with the developer's kit, is included in the Warp II
Beta of OS/2 now in circulation. SOM for Windows is also now available for
purchase. This
is IBM's contribution to the OpenDoc effort. By using SOM to construct the
objects that make up OpenDoc, these objects can be written in any language, by
any compiler, and still interact. SOM objects on OS/2 and Windows are
represented as DLLs, each of which may contain one or more object classes. The
interfaces to these libraries are designed such that new methods can be added
without breaking existing callers through shifted entry points. Classes under
SOM are also objects themselves. So, unlike classes in C++, SOM classes can
dynamically change behavior at run time.
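The combination of name-based dispatch, appendable method tables, and classes-as-objects can be sketched as follows. This illustrates the release-to-release compatibility idea in the abstract; it is not SOM's actual binary layout or API.

```python
# Sketch: callers bind to methods by name through a class's dispatch table,
# so a new release can append methods -- or even swap an implementation at
# run time -- without breaking existing callers.

class DispatchClass:
    def __init__(self, name):
        self.name = name
        self.methods = {}            # method name -> implementation

    def add_method(self, name, impl):
        self.methods[name] = impl    # adding never shifts existing entries

    def call(self, method, *args):
        return self.methods[method](*args)

# Version 1 of a class, with one method.
part = DispatchClass("TextPart")
part.add_method("word_count", lambda text: len(text.split()))
assert part.call("word_count", "one two three") == 3

# Version 2 appends a method; callers of the old method are untouched.
part.add_method("char_count", lambda text: len(text))
assert part.call("word_count", "one two three") == 3
assert part.call("char_count", "abc") == 3

# Because the class is itself an object, behavior can change at run time.
part.add_method("word_count", lambda text: len(text.split()) * 1)
assert part.call("word_count", "a b") == 2
```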
Although the alpha seed of OpenDoc did not include SOM, the beta releases for
all platforms are expected to. SOM should be fully integrated into the OS/2,
Windows, and Macintosh platforms by year's end.


Conclusion


OpenDoc is a future-oriented technology that, if it prevails, will restructure
how applications are written and marketed. Even if it does not succeed, it is
likely that many of the design ideas in OpenDoc will be imitated and
incorporated into the technological mainstream of the mid-1990s.
Origins of OpenDoc
The original ideas behind OpenDoc, including its storage format, known as
"Bento," and the Open Scripting Architecture (OSA), all arose at Apple and
were intended for the Macintosh platform. Apple approached IBM and other
companies for support in forming a nonprofit industry association, Component
Integration Laboratories (CI Labs). The charter of CI Labs is to promote
OpenDoc. The cost of participating in the organizational stage was low and the
following companies formed the founding group: Apple, IBM, Novell/WordPerfect,
SunSoft, Taligent, and XSoft (a division of Xerox). Once CI Labs was underway,
several of the founding companies made business decisions not to join, or are
still evaluating that choice. Among these are SunSoft, Taligent, and XSoft.
Lotus has recently signed up.
The technology pieces of OpenDoc are being contributed by various members.
Apple is supplying the object protocols, the Open Scripting Architecture
(OSA), and the Bento storage subsystem. IBM is supplying the SOM
object-messaging facility and the OS/2 port of OpenDoc. WordPerfect, in
conjunction with Novell, is providing the Windows port of OpenDoc and the
piece that lets it interoperate with Microsoft's OLE 2.0. Novell is giving
network support for distributed access to objects. Taligent is participating
to ensure that OpenDoc fits in with its application frameworks. (At this
writing, there are rumors that WordPerfect may halt its development efforts
because of recent "peace accords" between its parent company, Novell, and
Microsoft, whose OLE technology is the chief competitor of OpenDoc.)
The stated goal of OpenDoc is to produce a level playing field for application
development, so that small companies can participate in major application
markets. Ironically, the current members of CI Labs are a few large firms.
While they may have little direct financial interest in helping the smaller
firms, they have their own reasons for supporting OpenDoc. We can only
speculate about some of their motivations. Apple must attract to the Macintosh
some of the market momentum that OLE 2.0 has given Windows. IBM must extend
its enterprise-wide systems across larger markets. WordPerfect seeks to retain
its application market by broadening its feature base and moving into
distributed document management.
By making reference source for the various OpenDoc technology pieces freely
available, CI Labs is hoping for rapid deployment across diverse
platforms--Mac, OS/2, Windows, various flavors of UNIX, PowerPC, and so on.
Some of the pieces (such as SOM for OS/2 and Windows and Bento for the Mac)
are available now as separate items. OpenDoc is designed for incremental
adoption by application vendors, and each subsystem is fully replaceable by a
platform vendor. The alpha version was available in the first half of 1994,
with the alpha OS/2 version distributed on IBM's Developers Connection CD-ROM
#4. Betas are expected any time, with product to ship in late 1994. The alpha
version of OpenDoc is still based on Apple's original C++ version with a big
push currently on to port it to IBM's System Object Model (SOM). The betas
will reflect this port.
CI Labs is not a standards organization, but rather a support organization
chartered to provide reference source, technical documents, examples, and
validation suites in an open environment that does not require nondisclosure
agreements. It derives its funding from membership fees, not royalties.
Membership is open to all. Nonmembers may freely use OpenDoc technology
(including source) but only members may vote and hold office. To join CI Labs,
the annual membership fee ranges up to $110,000, computed at $10,000 plus 1
percent of a company's gross revenue. To join as a sponsor, a one-time fee of
$500,000 is added to the annual fee. Current sponsors include Apple, IBM, and
WordPerfect. 
CI Labs is not in itself performing any of the ports, but serves as a
clearinghouse for the various members. Each member company is conducting a
port. Apple is bringing SOM to the Macintosh, IBM is porting OpenDoc to OS/2,
and WordPerfect is responsible for the Windows port. Further technical
information is available on the Internet via FTP at ftp.cilabs.org or via the
World Wide Web at the URL ftp://ftp.cilabs.org/pub/. For e-mail, use
cilabs@cil.org or call 415-750-8352. WordPerfect can be contacted at
opendoc@wordperfect.com. Apple maintains a conference on AppleLink and CI Labs
runs several Internet "interest lists" to which you can subscribe by e-mailing
Majordomo@cilabs.org. IBM's Development Connection organization can be reached
at 800-633-8266 or via e-mail at devcon@vnet.ibm.com.
-- J.R.
Steps in Creating an OpenDoc Part
Define the content model and semantic events of your part. For example, the
content model for a simple text editor would consist of lines of text--with
semantic events for inserting lines, deleting lines, replacing text, and so
forth. For a painting part, the content model may be a rectangular region of
pixels, with semantic events to create points, lines, circles, and so forth.
Implement your core data engine. This is where the custom portion of your part
is developed. It is the basic set of algorithms and data structures specific
to the type of data you are manipulating, independent of any human interface.
A key element of this piece is making sure that the human-interface component
interacts with the core engine through a well-defined set of calls matching
the user model of the core engine. These calls are the semantic events on
which scripting is based.
Implement your part's storage-manipulation code. Here you develop the body of
code that uses the Storage API to load your part into memory and store it back
as needed. It does not mean that your part must be loaded into memory in its
entirety (which may be difficult for certain multimedia types), but rather
that you create the structures necessary for your part to begin accepting
events and rendering requests. Usually, you will not have to worry about the
drafting facility of OpenDoc, as the document shell will handle most cases for
you. If your part is a container, it must ask the embedded parts to load or
store themselves as well.
Implement your part-rendering code. This code examines the frame within which
the part resides, determining whether it is on a screen or paper, and performs
its geometry negotiation appropriately. It then issues platform-specific calls
to draw the content of the part. If your part is some type of container, your
code must include support for layout negotiation and update the
transformations of each frame (embedded in your container) that is visible.
Implement your user-interface event-handling code. This code supports direct
manipulation of your part by handling user-interface events such as mouse
clicks and keystrokes. You may need to deal with drag-and-drop and, if your
part must display elements outside of your frame (like a ruler), you must get
involved in layout negotiation. This portion of your code may use
platform-specific OS calls, or you may rely upon an OpenDoc User-Interface
parts facility (which is more portable and may be extended by developers). If
your new part is some type of container, you must include code to notify parts
embedded within yours of changes to your frame, and maintain information about
the shape and transformations of your frame yourself.
Implement your scripting code. The scripting code provides accessory functions
that resolve external references to a part's content objects (for example,
"Line 6" into the actual reference to the sixth line of your Core Data
structure). This is also where you provide functions to take semantic events
such as "Delete Line," and actually perform the deletion by calling upon the
Core Data Engine. You must handle the notification of dependent parts when the
content of objects linked and exported changes.
Implement the desired extension interfaces. This is an area that goes beyond
the basic OpenDoc architecture, and in which various extensions are added. It
could include full text search, spell checking, or many other interactions.
These APIs are reserved for those functional areas where bandwidth or
integration requirements preclude the use of scripting. CI
Labs plans to be active in proposing and publishing standard interfaces
between parts.
Package your part handler. Now your part is finished, and you can prepare
documentation that specifies which part types, semantic events, and extension
APIs it handles. Users of your part need some information about how to use it
and what to expect. For a complex
part such as a spreadsheet, this may be a small manual.
Create stationery to bootstrap your new part. For users to insert parts of
the new type, you must create some stationery that holds an empty copy of
your part type. The user can then drag a copy of this empty type from the
stationery palette into a document.
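The core-data-engine step above, for the simple line-based text part used as an example, can be sketched as a narrow set of calls that double as the part's semantic events. The class and method names are illustrative, not OpenDoc APIs.

```python
# Sketch of a core data engine for a line-based text part: the human
# interface (and the scripting system) reach the data only through this
# well-defined set of calls, which are also the part's semantic events.

class TextEngine:
    def __init__(self):
        self.lines = []

    def insert_line(self, index, text):
        self.lines.insert(index, text)

    def delete_line(self, index):
        del self.lines[index]

    def replace_text(self, index, old, new):
        self.lines[index] = self.lines[index].replace(old, new)

engine = TextEngine()
engine.insert_line(0, "OpenDoc parts are small.")
engine.insert_line(1, "They interoperate.")
engine.replace_text(0, "small", "reusable")
engine.delete_line(1)
assert engine.lines == ["OpenDoc parts are reusable."]
```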
-- J.R.
Figure 1 OpenDoc architecture.
Figure 2 Major blocks of OpenDoc.









Special Issue, 1994
The Architecture of the Taligent System


A web of frameworks leads to People, Places, and Things




Mike Potel and Jack Grimes


The authors are software engineers at Taligent and can be contacted at
potel@taligent.com and jgrimes@taligent.com. 


Developers today spend more time maintaining or rewriting software that
already exists than innovating. By providing a fully object-oriented
programming environment and powerful set of software building blocks, Taligent
hopes to reverse this trend. Taligent's objective is to change how software is
created, maintained, and used.
To achieve this objective, Taligent will deliver an application programming
environment, a development environment, a set of operating-system services, and
a new human interface. Built with object-oriented technology, the Taligent
system is designed to improve developer productivity, support
business-critical computing, and better align desktop computing with the way
users work and think. Taligent believes there is an increasing demand to
address these issues in order to revitalize growth in the computer industry,
attract new computer users, and enhance the value of computing through the
1990s and beyond.
Taligent, an independent system-software company founded in March 1992, is
owned in part by Apple Computer, Hewlett-Packard, and IBM. All four companies
will license, market, and support Taligent's software products, due to be
released in a series, beginning in the first half of 1995.
This article presents an overview of the software technologies designed by
Taligent.


Introducing Frameworks


To remove the barriers that exist between today's software and operating
systems, Taligent has factored the major aspects of system
software--development, operation, and applications--into reusable sets of
software called "frameworks." As Figure 1 illustrates, Taligent's architecture
is based on an integrated web of these frameworks.
Frameworks are collections of objects that provide an integrated service
intended for customization by the developer. This usage varies slightly from
that of others in the industry, who use the term "application framework" to
refer to a single entity--a large, monolithic class library such as
Microsoft's MFC, Borland's OWL, or Apple's MacApp. Frameworks, in the Taligent
sense, are smaller, more manageable sets of functionality at a level of
abstraction between a large class library and a single class.
The central problem that frameworks solve is complexity. Everyone knows that
highly functional systems are complex, making them difficult to use. Object
technology--in particular, class libraries--provides part of the answer
through abstractions. Well-designed classes abstract behavior and present it
through an interface that hides (encapsulates) the associated data and other
helper functions. This approach scales well to systems of hundreds of classes.
For systems containing thousands of classes, an intermediate level of
abstraction is needed. Taligent frameworks deliver services that are higher
level than classes. Examples are frameworks for compound documents,
user-interface construction, file systems, I/O devices, database access,
microkernel services, network protocols, many types of media, and so on. 
Because frameworks provide an infrastructure and flexible interfaces, they
avoid many of the problems that traditional procedural programming models
impose upon developers. They make it easier to add extensions, factor out
common functionality, facilitate integration, and improve upon software
maintenance and reliability. Frameworks foster a higher level of code and
design reuse than do other software methodologies.
As yet, frameworks are not supported explicitly by object-oriented programming
languages. In Taligent's C++-based system, for example, frameworks are a
programming convention and a programming-model concept; this is useful, but
abstract. Programming conventions that have language support are generally
more useful, so perhaps future languages will provide explicit support.
Taligent frameworks provide two object-programming interfaces: the Client API
(which provides functionality similar to an object library) and the Framework
API (which provides a separate, richer interface for customization, extension,
and code reuse). 
Taligent's frameworks are designed with the assumption that they will be
customized to create solutions for a family of related problems. Developers
will, therefore, find code that not only solves a set of related problems, but
also can be reused. 


Taligent System Architecture


Taligent's web of frameworks (see Figure 2) is divided into three main areas:
TalAE (which includes most of the object-programming frameworks), TalDE (which
is an extensible set of tools frameworks designed expressly for object
programming), and TalOS (which provides extensible frameworks for
more-traditional operating-system functions). 
The Taligent Application Environment (TalAE) is a complete implementation of
the Taligent object-oriented application-programming model. TalAE contains
more than 100 frameworks, covering areas such as graphics, database access,
multimedia, user interface, internationalization, networking, and distributed
computing. TalAE is both distributed and portable, and will run on existing
32-bit operating systems (such as OS/2, AIX, HP-UX, and PowerOpen), as well as
future versions of Apple's System and Taligent's own Object Services. Taligent
has released TalAE to its three investor companies, which will begin the
deployment of TalAE on their respective platforms starting in the first half
of 1995.
The Taligent Development Environment (TalDE) is a suite of framework-based
developer tools that complement the TalAE. TalDE supports programming in C and
C++, and also provides testing and debugging tools for class and framework
developers. Over time, a range of development language and tool products for
the TalDE will be available from both Taligent and third parties. The TalDE is
an integrated development environment that will include dynamic browsers,
incremental development, an automated build facility, online documentation,
multiuser source-code control, and a GUI constructor. TalDE is currently
scheduled to be deployed in mid-1995.
Taligent Object Services (TalOS), a "native," fully object-oriented operating
environment, is designed to be an ideal platform for hosting the TalAE. The TalOS
uses a Mach Version 3.0-based microkernel enhanced for objects. The TalOS also
provides system-level frameworks that facilitate customization of all
low-level parts of the system (including I/O drivers, file systems,
networking, and communications). TalOS is currently scheduled to be deployed
in late 1995 or early 1996.
The open structure of Taligent's frameworks enables third parties to extend or
customize the system at all levels. The goal is a high degree of
interoperability with existing standards (such as network protocols, file
systems, data formats, and remote services). In addition, Taligent will
support the Object Management Group's (OMG) CORBA specification for
distributed-object access in open, multisystem environments. 
To promote industry acceptance as an open-system standard, Taligent intends to
submit interface specifications for TalAE to X/Open for adoption. Taligent
plans to work with X/Open to establish a testing-based certification program
to make these object APIs available to third parties.
When TalAE is hosted on existing operating systems, current software (legacy
applications) will continue to run as before on the host operating system,
thus facilitating migration to the new programming model. As a first step in
deploying the Taligent technology, Taligent's partners--Apple, IBM, and
Hewlett-Packard--will port a reference release of the TalAE to their own
preemptive, multitasking-operating systems (such as AIX, OS/2, HP-UX,
PowerOpen, and a future version of Macintosh System 7). Subsequently, the
TalAE will be hosted by Taligent on its own TalOS, which will be ported to
leading hardware architectures (including Intel, PowerPC, and PA-RISC). Given
this strategy of deploying the TalAE on multiple operating systems and
multiple hardware platforms, developers will write to one programming
environment and gain volume deployment. 
When the TalAE is hosted on the TalOS, legacy software will still be supported
using adapters written by Taligent's investors and third parties, allowing
users to access and retain value in software designed for earlier systems.


Clients and Frameworks


In the Taligent system, each framework is built expressly to allow developers
to customize or change it. Like any good library, a framework has a Client API
that represents a calling interface for that set of services. Frameworks can
be thought of as servers in a client/server sense. For example, in a
file-system framework such as the one in Taligent's Object Services, you have
a client API that looks like the file-system interfaces you see today--open,
close, read/write, rename, delete. In today's systems libraries, you get these
in a black-box implementation. If you like this implementation, you're in good
shape. If you don't like it, your options are limited. You must either replace
it or learn to live with it. You don't have the choice of changing it as you
do in the Taligent system. 
Taligent enables this customization with a second API, the Framework API. The
Framework API opens up the inside of the framework for customization. In
effect, it describes how a generic implementation of the framework is actually
organized, allowing the developer to inherit not just code, but design. For
example, Taligent's graphics frameworks are useful "as is" via their client
API for doing precision two-dimensional graphics and photorealistic, shaded,
three-dimensional graphics. However, because it has a framework API, you can
customize your own drawing algorithms or 3-D shaders and make many other
unique changes to suit your purposes.
You create customized versions of frameworks by subclassing and overriding the
framework API of a particular framework, adding your own code or replacing
Taligent's. (An example of this is presented later.) You might also extend
what's there in new directions. By either customizing or extending, you create
a new software library that implements a variant of the original delivered
with the Taligent system. 
With Taligent, the modified version works in the system just as the original
did, with the same client API. That's the basis of polymorphism--the idea that
you can create your own customized version, plug it back into the system, and
have it work in a way compatible with the original.
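The two-API idea can be illustrated with a short C++ sketch. All class and method names here are invented for illustration, not actual Taligent interfaces: the client API is a set of ordinary public methods, while the framework API is a set of protected virtual hooks that a subclass overrides.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical framework; names are illustrative, not real Taligent classes.
class TFileSystem {
public:
    // Client API: the calling interface every client sees.
    void Write(const std::string& name, const std::string& data) {
        store[name] = Encode(data);          // delegates to a framework hook
    }
    std::string Read(const std::string& name) { return Decode(store[name]); }
    virtual ~TFileSystem() = default;
protected:
    // Framework API: hooks a subclass may override to customize behavior.
    virtual std::string Encode(const std::string& data) { return data; }
    virtual std::string Decode(const std::string& data) { return data; }
private:
    std::map<std::string, std::string> store;
};

// A customized variant: same client API, different internals.
class TReversingFileSystem : public TFileSystem {
protected:
    std::string Encode(const std::string& d) override {
        return std::string(d.rbegin(), d.rend());   // toy "encoding"
    }
    std::string Decode(const std::string& d) override {
        return std::string(d.rbegin(), d.rend());
    }
};

// Clients work polymorphically through the unchanged client API.
std::string RoundTrip(TFileSystem& fs, const std::string& text) {
    fs.Write("doc", text);
    return fs.Read("doc");
}
```

Both the original framework and the customized subclass can be handed to RoundTrip; the substitution is invisible to the client, which is exactly the polymorphic compatibility described above.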



Frameworks as Integration Mechanisms



Another important aspect of Taligent's frameworks is integration. The Taligent
frameworks contain a considerable amount of code designed to talk to the other
frameworks in the system. The Taligent graphics system isn't isolated. It
assumes communication with the printer frameworks, font framework, color
framework, and graphics drivers. It is designed as a coherent architecture, so
all that functionality works together in a reasonable way. Most importantly,
you inherit that integration when you subclass from these frameworks.
TalAE uses frameworks at all levels. It is more than just having an
application framework or a set of tools for object programming. In other
systems, if you want to go beyond the functions in the application framework,
you fall back on the regular procedural APIs of the underlying operating
system. In the TalAE case, the entire programming model is object
oriented--the same approach that some people use today in application
frameworks is used pervasively throughout the Taligent system itself. This
means that when you add a framework to this system, your framework is on the
same technical footing as any framework already there. This blurs the
distinction between system and application. 
Taligent isn't planning on doing this all alone. Taligent will provide the
initial set of implementations, populating each of the frameworks with basic
functionality. Taligent engineers are working closely with those at Apple, HP,
and IBM, who have many ideas about what they want to add to these frameworks.
You can expect implementations from Taligent's partners to support the same
basic APIs, plus added functionality and support for their unique hardware.
Over time, this will lead to more diverse systems. Third-party developers can
also extend and customize this system.


The TalAE Architecture


The frameworks in TalAE are organized into three main groups: the application
frameworks, the domain frameworks, and the support frameworks. The application
frameworks are comparable in functionality to MacApp or, in NextStep, the
AppKit, together with some of the User Interface frameworks. TalAE also has a
Text Editor framework that is a built-in, word-processing building block, and
a Graphics Editing Framework that is a built-in, interactive-graphics building
block.
The domain frameworks in TalAE are for graphics (both 2-D and 3-D),
multimedia, and basic text, plus international support.
The support frameworks provide basic application services such as fonts,
color, and testing. In addition, there are support frameworks for
distributed-computing services, and software-portability frameworks that allow
targeting multiple-host operating systems and hardware platforms.
To see the Taligent system architecture in more detail, consider the
frameworks in four key areas: compound documents, graphics,
internationalization, and data access.
Figure 3 shows the compound-document framework. To use it, you must answer
four questions:
What is your data model? 
How is the data drawn on the screen? 
How does the user make selections within the data?
What commands (operations) can be run on selected data? 
If you subclass and provide specific code that answers these questions, you
generate code that the compound-document framework can call to provide many
capabilities for your data type, such as embedding, linking, drag-and-drop,
on-the-fly updates, in-place editing, highlighting, and multilevel undo. All
this functionality is built into the system design. The system supports a
"saveless" document model by journaling all commands--so that if your machine
crashes, the document can be recovered.
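A minimal sketch of this pattern, with invented names (only the data-model and drawing questions are shown; selection is omitted, and command handling is reduced to a toy journal): the framework supplies Do, Undo, and Replay, while the developer's subclass answers the questions by overriding hooks.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch only; these are not actual Taligent classes.
struct TCommand {                 // one journaled operation
    std::string name;
    int oldValue, newValue;
};

class TDocumentPart {
public:
    // The framework runs commands and journals them for undo and recovery.
    void Do(const TCommand& cmd) { Apply(cmd.newValue); journal.push_back(cmd); }
    void Undo() {
        if (journal.empty()) return;
        Apply(journal.back().oldValue);
        journal.pop_back();
    }
    void Replay(const std::vector<TCommand>& log) {   // crash recovery
        for (const auto& c : log) Do(c);
    }
    virtual ~TDocumentPart() = default;

    // Two of the four questions, answered by the developer's subclass:
    virtual int DataModel() const = 0;        // what is your data model?
    virtual std::string Draw() const = 0;     // how is it drawn?
protected:
    virtual void Apply(int value) = 0;        // how commands change the data
private:
    std::vector<TCommand> journal;
};

class TCounterPart : public TDocumentPart {   // a trivial data type
public:
    int DataModel() const override { return value; }
    std::string Draw() const override { return "count=" + std::to_string(value); }
protected:
    void Apply(int v) override { value = v; }
private:
    int value = 0;
};
```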
The compound-document framework also supports collaboration, allowing any
document to be shared and worked on simultaneously by multiple users. The
document framework also supports scriptability, allowing users to store
commands in a file and play them back later. The system will interoperate with
emerging compound-document standards such as OpenDoc and OLE--as well as with
existing popular file types.
The Taligent graphics system has extensive 2-D and 3-D
functionality--high-precision 2-D as well as full 3-D rendering capabilities.
The graphics framework uses a world model based on 64-bit floating-point
coordinates. Other features in the framework include: extensive 2-D and 3-D
geometry (including NURBS curves and surfaces); fully extensible attributes
(color and line style for 2-D, and numerous shaders for 3-D); full
3x3 transformations for 2-D modeling and viewing; and full 4x4 transformations
for 3-D modeling. The framework also includes outline fonts, a font-server
framework, color matching, multiple color spaces in 2-D, 3-D geometry
constructors (sweeps, extrusions), antialiasing, camera, and lighting models.
The international system is 16-bit Unicode throughout, not just for a few
special utilities. All strings in the system use 16-bit characters--even
filenames and network names. The international framework also includes style
sheets, ideographic scripts/languages, ligatures, kerning, and tracking. Also
supported are multilingual typing, collation, search, and replace. There will
also be inline Asian input methods.
The Data Access framework is designed to simplify forming SQL queries. It
includes subclasses supporting different flavors of databases (for example,
SQL*Net for Oracle databases, OpenClient for Sybase, and DRDA for DB2). It
also supports ODBC for other kinds of databases--as well as supporting
flat-file data. Since it is a framework, you can write your own
vendor-specific subclasses, where you need a unique kind of data-access
capability. The data abstractions in this framework allow access by rows,
columns, or cells. There is also the capability to upload or download data in
bulk, and to filter by using table/column filters.
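The vendor-subclass pattern might look like the following sketch (invented names; no real database protocol is involved): the base class forms the SQL, and each subclass supplies the vendor-specific execution.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative only: a data-access base class with vendor-specific subclasses.
class TDataAccess {
public:
    // Client API shared by all vendors: form and run a query.
    std::vector<std::string> Query(const std::string& table) {
        return Execute("SELECT * FROM " + table);   // simplified SQL forming
    }
    virtual ~TDataAccess() = default;
protected:
    // Framework API: each vendor subclass supplies the wire protocol.
    virtual std::vector<std::string> Execute(const std::string& sql) = 0;
};

// A developer-written, vendor-specific subclass (here a stand-in that
// just records the SQL instead of talking to a real server).
class TFlatFileAccess : public TDataAccess {
public:
    std::vector<std::string> sent;                  // for inspection
protected:
    std::vector<std::string> Execute(const std::string& sql) override {
        sent.push_back(sql);
        return {"row1", "row2"};                    // canned result rows
    }
};
```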


Developer-Created Frameworks


Developer-provided frameworks are similar in structure to Taligent frameworks.
Suppose a company wants an online catalog of its products, yet doesn't want to
create the software from scratch. In such a case, the product is, in effect, a
catalog-forms package. It could be constructed as shown in
Figure 4.
Say that the catalog ISV has developed thousands of lines of code (LOC) that
implement a framework specifically for building online catalogs. This
framework could be designed to be modified in two ways by the customers.
First, the pages must be populated with data (the company's products, for
example). This is accomplished by providing the data for the pictures and
accompanying text. The framework handles pagination, indexing,
cross-referencing, and so on. This is the normal use of the framework, as done
by most of the catalog ISV customers.
Second, as shown in Figure 4, this company's developers have further modified
the framework to provide capability not explicitly included by the ISV--to
provide online inventory information for the catalog items (myDataLookup), to
display it in the catalog (myDrawView), and to update that inventory data when
an order is placed (myDataNotify). This can be accomplished by overriding the
catalog framework's DrawView method with the myDrawView method, which in turn
calls the other two new methods. The methods can be overridden because the ISV
has defined them as virtual functions and provided the interface information.
It is typical in framework designs to make many of the functions virtual so
that their behavior can be modified in the future. The cost is low, is paid
once, and enables future modification. In this simple example, the
modifications to the behavior require less than a hundred lines of code,
compared to thousands of lines of code for the base framework. This example
gives an idea of the expected order-of-magnitude productivity benefit from
reuse of design and code, as compared to "modification-by-replacement"
strategies required by other approaches to component software.
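A compilable sketch of this customization (DrawView, myDrawView, myDataLookup, and myDataNotify are the names used above; everything else, including the stand-in method bodies, is invented):

```cpp
#include <cassert>
#include <string>

// The ISV's catalog framework (thousands of LOC in practice; a stub here).
class TCatalogPage {
public:
    std::string Render() { return DrawView(); }   // framework calls the hook
    virtual ~TCatalogPage() = default;
protected:
    virtual std::string DrawView() {              // virtual, so customers can extend it
        return "[catalog page]";
    }
};

// The in-house developer's customization: adds inventory display.
class TInventoryPage : public TCatalogPage {
protected:
    std::string DrawView() override {             // "myDrawView" in the article
        int onHand = myDataLookup();
        myDataNotify();                           // e.g., record that stock was shown
        return TCatalogPage::DrawView() + " in stock: " + std::to_string(onHand);
    }
private:
    int  myDataLookup() { return 42; }            // stand-in for an inventory query
    void myDataNotify() { ++lookups; }            // stand-in for an update hook
    int lookups = 0;
};
```

The framework never knows the subclass exists; it simply calls DrawView through the virtual function and gets the extended behavior.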


A New User-Interface Model


Another unique aspect to the Taligent system is the Taligent desktop, called
the "Workspace." In today's personal computer systems, you essentially have
one "place"--your home place, your office desktop. In the Taligent system,
other kinds of desktops may be optimized for the different places in which
customers want to work, for different functions the customer works on at a
given time, for different departments within the company, and so on. 
Taligent's "People, Places, and Things" metaphor can be used to model an
enterprise in this way. In this model, you might use a graphical business card
to represent an individual person. To represent a group of people, you might
use a picture of an address book. There are other things you can represent
with the People metaphor--roles, for example, so you can delegate your
signature authority to someone else while you're away.
To accomplish these goals, change is needed in the human interface of existing
computer systems, but the change is different from what you might think. More
(or different) icons or 3-D glasses are not what is required. What is needed
is an interface paradigm that works the way humans did, before and without
computers. The acronym HCI is often used to refer to the field of
computers. The acronym HCI is often used to refer to the field of
Human-Computer Interaction, also known as user-interface design and
human-factors research. However, what each of us should be working on instead
is HCHI--Human-Computer-Human Interaction. In other words, computers should
facilitate interactions among people.
The move from an application-centered view of computer functionality to a
document-centered view is a good step. "Applications" and files were invented
by the computer industry and had to be learned by the user and developer
communities in order to use computers effectively. The emerging
document-centered view is better because we already know about documents--they
exist outside the digital world.
The next point in this evolution is task-centered computing. Tasks make up
important processes in the home and in organizations. They are tasks we did
yesterday and will need to do tomorrow--tasks that may be partially manual and
partially on the computer, partially on our own desks and partially with
others. 
Applications today don't represent people--only some of their attributes (such
as name and address). The Taligent system uses objects called "Business Cards"
to serve as references to people. This representation makes it convenient to
mail someone, say, a document object. Also, People don't exist in the ether;
they exist in Places (such as offices, homes, libraries, cars, hallways,
conference rooms, and so on). Some systems today provide primitive places. For
example, the familiar desktop can be considered a place--it is a container for
documents, trash cans, disks, and the like. This idea can be generalized and
used as a richer metaphor for organizing the user's world, and most
importantly, as a way to visualize Human-Computer-Human Interaction.


References


Andert, Glenn. "Object Frameworks in the Taligent OS." Compcon 94 Digest of
Papers. IEEE Catalog No. 94CH3414-0.
Fuller, Rodney. "Differences in Human-Computer and Human-Computer-Human
Interactions." Human Factors and Ergonomics Society, CSTG Bulletin,
April/August 1993.
Grimes, Jack. "Objects 101: An Implementation View." Compcon 94 Digest of
Papers. IEEE Catalog No. 94CH3414-0.
"Lessons Learned from Early Adopters of Object Technology," Taligent White
Paper, 1993. Available from the authors.
"Leveraging Object-Oriented Frameworks," Taligent White Paper, 1993. Available
from the authors.
Object Management Group. Common Object Request Broker Architecture (CORBA)
Specification. Version 1.1.
The Taligent Guide to Designing Programs (Reading, MA: Addison-Wesley, 1994).
Taligent World-Wide Web home page: http://www.taligent.com/
Figure 1 Taligent's web of frameworks.
Figure 2 The Taligent system overview.
Figure 3 The compound-document framework.
Figure 4 The architecture of an ISV-supplied catalog framework product as
modified by an in-house developer.
















Special Issue, 1994
OLE Integration Technologies


Building on the Component Object Model




Kraig Brockschmidt


Kraig is the author of Inside OLE 2, published by Microsoft Press. Kraig can
be contacted at Microsoft Corp., One Microsoft Way, Redmond, WA 98052-6399.


OLE 2, along with the technologies that fall under the OLE umbrella, is all
about integration--integration between functional components of all sorts,
wherever they may be. These components can be located in the system, inside
applications, inside in-process DLLs or out-of-process EXEs, and, in the
future, even in modules distributed across a network.
The basis for this integration is the Component Object Model (COM), described
in the article, "The Component Object Model," by Sara Williams and Charlie
Kindel on page 14 of this issue. OLE uses COM as the low-level plumbing that
provides transparent communication between components through a binary
standard (COM interfaces and structures). As COM evolves, OLE will
automatically benefit and gain support for diverse system services, from
database access, to messaging services, to system management, and more.
Historically, OLE was concerned primarily with the creation and management of
compound documents, but now OLE is much more than that. OLE integrates
components that can come in many shapes and sizes. The interfaces provided on
those components can also vary widely. In some cases, a particular component
is implemented by OLE itself (to form a standard on which applications can
depend). Other components are implemented by various types of applications.
Components that are primarily users of interfaces implemented on other
components are called "clients" or "containers," depending on what they do
with those interfaces. Modules that implement components with interfaces are
called "servers." All components based on COM are called "Component Objects."
For convenience, the use of the word "object" by itself in this article means
"component object." 
OLE is not an all-or-nothing technology. When using or implementing a
component, you can use as little or as much of OLE as you find appropriate.
This article examines the OLE technologies that build upon COM, as shown in
Figure 1. You'll see that OLE is a very rich collection of features to
integrate data storage and exchange, programmability, compound documents, and
controls.


Structured Storage


Today's world of component integration requires that many components be able
to share a common, byte-oriented storage medium (whether a disk file or a
record in a database). Each component needs storage in which to save
its persistent state. OLE's Structured Storage is an abstraction layer for
accomplishing this level of integration. Structured Storage can be built on
top of any file or other storage system, as shown in Figure 2 for a hard-disk
system.
In Structured Storage, any byte array can be structured into two types of
elements: storages and streams. Each element has a name, which can be up to 31
Unicode characters in length. A storage implements the IStorage interface,
which has directory-type functions for creating, destroying, moving, renaming,
or copying other elements. A stream implements the IStream interface, with
functions analogous to traditional file I/O (such as read, write, seek, and so
on). In fact, IStream members have a direct one-to-one correspondence with the
file I/O functions in the Win32 API and in the ANSI C run-time library.
Example 1 illustrates Win32 and IStream similarities.
A storage can contain any number of other storages and streams within it, just
as a directory can contain files and subdirectories. However, a storage
isn't restricted only to being disk-based. Structured storage can be
implemented on top of any byte-oriented storage system (or byte array), such
as a disk file, a block of memory, a database field, and the like. Regardless
of the medium, however, structured storage provides uniform access through the
standard IStorage and IStream interfaces. The storage model also defines
transactioning for these elements, where you can create or open an element in
"transacted" mode (in which changes are not permanent until committed) or
"direct" mode (in which changes are immediately permanent).
In effect, this "file system within a file" leaves the exact layout of the
file to the system, but makes incremental access to elements the default mode
of operation. With such a system, an application can effortlessly give
individual streams (or even entire storages) to other components, which can
then save in them whatever persistent information they desire.
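The "file system within a file" idea can be sketched with plain in-memory C++ types (illustrative only; OLE's real IStorage and IStream are COM interfaces whose methods return HRESULTs):

```cpp
#include <cassert>
#include <map>
#include <string>

// A stream is a named byte array with file-like read/write.
struct Stream {
    std::string bytes;
    void Write(const std::string& data) { bytes += data; }
    std::string Read() const            { return bytes; }
};

// A storage is a directory-like node: it holds streams and other storages.
struct Storage {
    std::map<std::string, Stream>  streams;
    std::map<std::string, Storage> storages;

    Stream&  CreateStream(const std::string& name)  { return streams[name]; }
    Storage& CreateStorage(const std::string& name) { return storages[name]; }
};

// A container can hand a sub-storage to an embedded component, which
// saves whatever it likes there without touching the rest of the "file".
void SaveComponent(Storage& given) {
    given.CreateStream("state").Write("component data");
}
```

The container owns the root; each component sees only the storage it was given, which is the isolation that makes sharing one file safe.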
This system is perfect for creating compound documents, and the technology
known as "OLE Documents" relies on this structured storage facility. To
further facilitate the creation and sharing of files (even across platforms),
OLE provides a standard implementation of disk-based storages and streams in
what are called "Compound Files." This facility is compatible with future
versions of Windows (the code is being written by the same developers at
Microsoft). Furthermore, Microsoft licenses the Compound File source code to
other vendors for use on other operating systems.
A major benefit of having a single, standard implementation of structured
storage on a given platform is that any application (including the system
shell) can open and navigate through anyone's compound file. Elements of data
are no longer hidden inside a proprietary file format. You can freely browse
the hierarchy of storage and stream elements. With additional naming standards
and standardized stream formats for specific information, every application
can retrieve significant information from any given file without having to
load the application that created that file.
Microsoft Windows 95 will exploit this browsing ability by offering shell-level
document searching. Windows 95 looks in compound files for a stream called
"Summary Information," in which applications store data such as creation and
modification times, author, title, subject, revision number, keywords, and so
on. Windows 95 can match this information against a user query. What was once
a feature buried inside applications for one document type is now a standard
part of the system itself for all documents.


Object Persistence


Structured storage is necessary to allow multiple components to share the same
disk file or other storage. A component indicates its ability to save its
persistent state to a storage or stream by implementing the interface
IPersistStorage or IPersistStream, respectively. (There is also an
IPersistFile interface for components that save to separate files.)
The container application that manages such persistent objects creates the
instances of IStorage or IStream to give to components that implement
IPersistStorage and IPersistStream. The container tells components to save or
load their persistent states from the storages or streams. Thus, the container
remains in control of the overall document or file, but gives each component
individual control over a storage or stream within that file. This tends to
make the structures within a file more intelligent--that is, more of the code
that knows how to handle those structures lives in the components rather than
in the container.
Example 2 shows how a container would open an IStorage and have a component
save into it through IPersistStorage. If the component doesn't support
IPersistStorage, then the container cannot possibly try to save the component
that way. This shows the power of the QueryInterface function and the
interface-oriented architecture of OLE. You can't ask a component to do an
operation it doesn't support.


Persistent, Intelligent Names (Monikers)


Think for a moment about a conventional filename that refers to data stored
somewhere on disk. The filename essentially describes the "somewhere," and so
the name identifies a file that could be called an "object" (in a primeval
sort of way). However, this is limited, because filenames have no
intelligence. All knowledge about how the name is used exists elsewhere, in
whatever application uses that filename.
Now think about a name that describes the result of a database query, or a
range of spreadsheet cells, or a paragraph in a document. Then think about a
name to identify a piece of code that executes some operation on a network
server. Each different name, if unintelligent, would require every application
to redundantly understand the use of that name. In a component-integration
system, this is far too expensive. To solve the problem, OLE has "persistent,
intelligent names," otherwise known as "monikers."
A moniker is a component that encapsulates a type of name and the intelligence
to work with that name behind an interface called IMoniker. Thus, users of the
moniker pass control to the moniker whenever they want to work with the name.
While IMoniker defines the standard operations you can perform with a moniker,
each different moniker class defines what data makes up the name and how that
name is used in binding. A moniker also knows how to serialize itself to a
stream, because IMoniker is derived from IPersistStream.
The most basic operation in the IMoniker interface is that of binding to the
object. IMoniker::BindToObject runs whatever algorithm is necessary in order
to locate the object of reference and returns an interface pointer to the
component that works with that information (this pointer is unrelated to the
moniker itself). Once a client has bound to the referenced object, the moniker
falls out of the picture entirely. 
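Reduced to a sketch (plain C++ stand-ins, not the real COM IMoniker interface, and the "whatever algorithm is necessary" is just a lookup table):

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a bound object.
struct Object { std::string contents; };

// A registry standing in for the binding algorithm that locates
// the object a name refers to.
std::map<std::string, Object>& Registry() {
    static std::map<std::string, Object> r;
    return r;
}

// A moniker encapsulates a name plus the intelligence to resolve it.
class Moniker {
public:
    explicit Moniker(std::string n) : name(std::move(n)) {}
    virtual ~Moniker() = default;

    // BindToObject: run the lookup and return the referenced object.
    // After this returns, the client talks to the object, not the moniker.
    virtual Object* BindToObject() {
        auto it = Registry().find(name);
        return it == Registry().end() ? nullptr : &it->second;
    }
protected:
    std::string name;
};
```

A different moniker class would carry different name data and a different BindToObject algorithm, but clients use them all identically.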


Types of Monikers


OLE defines and implements five basic types of monikers: file, item, generic
composite, anti, and pointer. A file moniker persistently maintains a text
filename; binding it means locating a suitable application and having it load
the file (returning an interface pointer to the "file" object). Item
monikers are used in conjunction with file monikers to describe a specific
part of a file that can be treated as a separate "item" object. To put a file
and item moniker together requires the generic composite moniker. This type
exists only to contain other monikers (including other composites), and its
persistent data is just the persistent data of all the contained monikers in
series (separated by a delimiter). Binding a generic composite means binding
those it contains in turn.
A composite moniker is used whenever you cannot create a reference that is
described by a single, simple moniker. A range of cells in a sheet of a
Microsoft Excel workbook requires a file moniker to identify the workbook, an
item to identify the sheet, and an item to identify the range in the sheet.
Such a composite moniker is shown in Figure 3. Code that would create this
moniker is shown in Example 3.
The antimoniker and pointer moniker are special types. An antimoniker
annihilates the last moniker in the series in a composite. A pointer moniker
wraps an interface pointer into a moniker where binding is nothing more than a
QueryInterface call. These are provided for uniformity, and neither supports
persistence.
Of course, if OLE's standard monikers are not suitable for your naming
purposes, you can always implement your own component with IMoniker. Since you
encapsulate your functionality behind the interface, your moniker is
immediately usable in any other application that knows how to work with
IMoniker.
Working with monikers is generally called "linking," the moniker's information
being the "link" to some other data. OLE uses monikers to implement linked
compound-document objects. This involves other user-interface standards for
managing links. OLE also implements a central "running object table" in which
monikers for already-running objects are stored. This prevents excess
processing when a file is already loaded or when other data is already
available in some other application. 



Uniform Data Transfer and Drag-and-Drop


Structured storage and monikers integrate storage and naming functions. Once
you have found a component that can read from storage, you'd normally like to
have it render data for you. OLE's Uniform Data Transfer mechanism is the
technology for data transfers and notifications of data changes between some
source (called the "data object") and something that uses the data (called the
"consumer"). All of this happens through the IDataObject interface implemented
by the data object. IDataObject includes functions to get and set data, query
and enumerate available formats, and establish a notification loop with the
data source.
The "uniform" aspect arises from the fact that IDataObject separates exchange
operations (get, set, and so on) from specific transfer protocols like that of
the clipboard. Thus, a data source implements one data object and uses it in
any OLE transfer protocol: clipboard, drag-and-drop, or compound documents.
The OLE protocols (unlike the existing Windows protocols) are only concerned
with getting an IDataObject pointer from the source to the consumer. Once
transferred, the protocol disappears and the consumer just has to deal with a
uniform IDataObject. Source and consumers can thus implement a core set of
functions based on IDataObject and build little protocol handlers on top of
that code.
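A toy model of this separation might look as follows. DataObject, SetClipboard, and PasteText are invented names for this sketch; the real IDataObject works with FORMATETC structures and HRESULTs rather than strings, but the shape of the idea is the same:

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for a data object: get data by format, query formats.
struct DataObject {
    std::map<std::string, std::string> renderings;  // format -> data
    bool QueryFormat(const std::string& fmt) const {
        return renderings.count(fmt) != 0;
    }
    bool GetData(const std::string& fmt, std::string* out) const {
        auto it = renderings.find(fmt);
        if (it == renderings.end()) return false;
        *out = it->second;
        return true;
    }
};

// A "protocol" that does nothing but move the pointer to the consumer.
const DataObject* g_clipboard = 0;
void SetClipboard(const DataObject* d) { g_clipboard = d; }
const DataObject* GetClipboard() { return g_clipboard; }

// Consumer code is protocol-independent: it sees only a DataObject,
// whether the pointer arrived via clipboard, drag-and-drop, or otherwise.
std::string PasteText(const DataObject* d) {
    std::string text;
    return d->GetData("text", &text) ? text : std::string();
}
```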


Data Formats and Transfer Mediums


Besides the separation of transfer from protocol, OLE also makes data transfer
much more powerful and flexible with two data structures: FORMATETC and
STGMEDIUM. FORMATETC improves on the clipboard format of Windows--hence its
name ("format, et cetera"). The Windows clipboard format only describes the
layout of a
data structure (for example, CF_TEXT describes a null-terminated ANSI
character string). FORMATETC adds a detail field (full content, thumbnail
sketch, and so on), a device description (the device for which the data is
rendered), and a transfer-medium identifier.
This last field brings us to STGMEDIUM, an improvement over global-memory
handles. Existing Windows protocols only allow data exchange via global
memory, which can be inefficient for large data. STGMEDIUM allows you to
reference data stored in either global memory or in another medium--which
could be a disk file, an IStorage, or an IStream.
Together, FORMATETC and STGMEDIUM allow you to keep data stored in the most
appropriate (and efficient) medium and still ship it off to other
applications. This can result in significant performance gains for
applications that would otherwise load large data sets into global memory,
only to have these swapped out to disk again by the virtual-memory system.
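The pairing can be caricatured like this. Format and Medium are simplified stand-ins for FORMATETC and STGMEDIUM, the field names are invented, and the file path is a made-up placeholder:

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Aspect { Content, Thumbnail };
enum class MediumKind { GlobalMemory, File };

struct Format {          // rough analogue of FORMATETC
    std::string layout;  // e.g. "text", playing the role of CF_TEXT
    Aspect aspect;       // full content vs. thumbnail sketch
    MediumKind medium;   // where the rendering should live
};

struct Medium {                // rough analogue of STGMEDIUM
    MediumKind kind;
    std::vector<char> bytes;   // used when kind == GlobalMemory
    std::string path;          // used when kind == File
};

// A source can keep large data on disk and "transfer" a reference to it
// instead of copying everything into memory.
Medium Render(const Format& f, const std::string& data) {
    Medium m;
    m.kind = f.medium;
    if (f.medium == MediumKind::File)
        m.path = "rendering.bin";      // hypothetical on-disk location
    else
        m.bytes.assign(data.begin(), data.end());
    return m;
}
```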


Clipboard and Drag-and-Drop


Other OLE technologies build upon the uniform data-transfer concept so you can
take advantage of the improvements, regardless of how you transfer data.
First, the OLE DLLs provide functions to work with the system clipboard
through IDataObject. A source cuts or copies data by packaging it into a data
object and handing an IDataObject pointer to OLE's OleSetClipboard function.
OLE, in turn, makes the formats therein available to all other applications
(non-OLE applications can only see global-memory-based formats). When a
consumer of data wants to paste from the clipboard, it calls OleGetClipboard
to obtain an IDataObject representing the clipboard contents. Through
IDataObject, the consumer checks formats or requests renderings. Data placed
on the clipboard by non-OLE applications are completely available through this
interface. So, you can toss out your old clipboard code and switch easily to
the more powerful OLE mechanism.
Another technology that builds on data transfer is OLE's Drag-and-Drop
feature, really nothing more than a slick way to get an IDataObject pointer
from a source to a consumer, or "target." The source decides what starts a
drag-and-drop operation (usually a mouse click-plus-move in a specific place).
It then packages up its data into a data object--exactly as it does for the
clipboard--and calls OLE's DoDragDrop, passing a pointer to its implementation
of the IDropSource interface. Through this interface, the source controls the
mouse cursor and drop or cancellation times.
The target, on the other hand, implements the interface IDropTarget and
registers it with OLE for a specific window. When the mouse moves over that
window, OLE calls functions in that IDropTarget according to what is happening
with the mouse: enter window, move in window, leave window, or drop in window.
In these functions, the target indicates the effect of a drop at the mouse
location point, modified by the Ctrl and Shift keys. Valid effects are a move
(no keys), copy (Ctrl), link (Shift+Ctrl), or "no-drop." These are specified
using DROPEFFECT_* flags. The effect is handed back to the source to indicate
which cursor to show; see Figure 4. These default cursors are handled by OLE
itself, leaving little for the source to do, as shown in the typical
implementation of IDropSource (excluding IUnknown functions) in Example 4.
Sources do have the ultimate say as to which cursor is shown, of course.
Note that DoDragDrop, in addition to watching mouse motion and the Ctrl/Shift
keys, also watches the Esc key (used to cancel the operation) and the mouse
button for an "up" message (used to cause a drop), as illustrated in Figure 5.
When a drop occurs on a target, that target just ends up with the source's
IDataObject pointer--exactly as it would after a call to OleGetClipboard. At
this point, the transfer protocol again disappears, and the consumer deals
only with IDataObject. The same is true for the source, which packages data
into a data object for clipboard or drag-and-drop. Because drag-and-drop works
equally well within and between applications, you get considerable mileage
from one piece of code. The icing on the cake is that by adding a few formats
for compound-document objects, you can suddenly start exchanging
compound-document objects and controls by using the same protocols and the
same code!


Notification


Consumers of data from an external source might want to know (asynchronously)
when data in that source changes. OLE handles notifications of this kind
through a component called an "advise sink" that implements an interface
called IAdviseSink. This "sink" absorbs asynchronous notifications from a data
source and can receive a new copy of the data if desired. The consumer that
implements the advise sink connects it to the source's IDataObject through a
member function called DAdvise. Disconnection happens through DUnadvise. In
making the connection, the consumer indicates whether it would like a fresh
data rendering. When the data object detects a change, it then calls
IAdviseSink::OnDataChange to notify the consumer, as shown in Figure 6.
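A miniature of this pattern, with invented names (the real IAdviseSink and IDataObject connection goes through DAdvise, and notification is asynchronous, which is not modeled here):

```cpp
#include <cassert>
#include <string>
#include <vector>

// The consumer implements the sink; the source calls OnDataChange.
struct AdviseSink {
    virtual ~AdviseSink() {}
    virtual void OnDataChange(const std::string& newData) = 0;
};

struct DataSource {
    std::vector<AdviseSink*> sinks;
    std::string data;
    // Stand-in for DAdvise: connect a sink to this source.
    void Advise(AdviseSink* s) { sinks.push_back(s); }
    void SetData(const std::string& d) {
        data = d;
        for (size_t i = 0; i < sinks.size(); ++i)
            sinks[i]->OnDataChange(data);  // notify every consumer
    }
};
```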
The IAdviseSink interface contains additional member functions that are used
with other interfaces (such as IViewObject for notifications when a
component's display image changes and IOleObject for state changes in
compound-document objects). However, it's not designed to handle arbitrary
notifications from arbitrary components. Such a task requires "events," which
are introduced with OLE Controls (but which are more fundamental than
controls, of course). OLE Controls are discussed a bit later.


OLE Automation


Another key aspect of integrating components is the ability to drive them
programmatically--that is, to control them without requiring an end user's
presence. This means having components expose their end-user functionality
(for example, menu commands and dialog-box interaction), as well as properties
by way of interfaces so that a scripting tool can be used to invoke that
functionality.
There are two sides to this picture. On the one hand are components that are
programmable by way of interfaces, or "automation objects." On the other hand
is an application that provides a programming environment in which a developer
or advanced user can write scripts or create applications that use other
automation objects. These are called "automation controllers." To make all
this happen, objects need a way to programmatically publish their interfaces
(the method names and parameter types, as well as object properties) at run
time such that the controller can perform type-checking and present lists of
callable functions to the programmer.
The technology that supports this is OLE Automation, primarily through an
interface called IDispatch. Applications that expose functions and properties
for various application objects (the window frame, document windows, parts of
the document, and so on) do so by implementing IDispatch on each of those
components. However, IDispatch has only a fixed set of member functions. How,
then, does each component supply its unique features?
The answer is an OLE Automation entity called the "dispinterface"--an
implementation of IDispatch that responds to a specific set of custom methods
and properties. An application frame and a document would both implement
IDispatch, but would have different dispinterfaces.
Dispinterfaces work through the function IDispatch::Invoke, the prototype for
which is shown in Example 5. The dispID ("dispatch identifier") parameter
tells Invoke which method is being called, or which property of the object is
being retrieved or set. The wFlags parameter indicates whether a given call
to Invoke is a method call or a property get or set operation. An object's
dispinterface, then, is
primarily the set of dispIDs to which the object will respond through Invoke,
and this varies from object to object, of course. Since some methods take
parameters, and properties have types associated with them, the dispinterface
also includes all of this "type information." Other functions in IDispatch
make the type information available to automation controllers, so those
controllers can use the types to enhance their programming environments.
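A miniature model shows the essential mechanism: one fixed Invoke entry point routes every call by dispID and flags. All names and types below are simplified stand-ins; the real Invoke traffics in VARIANTs, DISPPARAMS, and HRESULTs:

```cpp
#include <cassert>
#include <functional>
#include <map>

enum InvokeFlags { MethodCall, PropertyGet, PropertyPut };
typedef double Variant;  // stand-in for OLE's VARIANT

struct Dispatchish {
    std::map<int, Variant> properties;                 // dispID -> value
    std::map<int, std::function<Variant()> > methods;  // dispID -> code
    // One function serves every method and property the object exposes;
    // the dispinterface is the set of dispIDs this object understands.
    Variant Invoke(int dispID, InvokeFlags flags, Variant arg = 0) {
        switch (flags) {
            case PropertyGet: return properties[dispID];
            case PropertyPut: properties[dispID] = arg; return arg;
            case MethodCall:  return methods[dispID]();
        }
        return 0;
    }
};
```

Two objects with the same Invoke entry point can thus expose entirely different dispinterfaces simply by responding to different dispID sets.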
When creating an automation object, you create a file using the Object
Definition Language (ODL) to define a dispinterface. This file is then run
through a special compiler that generates a "type library" containing all the
type information for any number of automation objects and dispinterfaces. This
type library, which can be kept in a separate file or attached to a server
module (DLL or EXE) as a resource, provides a way for automation controllers
to discover what automation objects and dispinterfaces are available, without
actually having to instantiate components just to ask for the information
through IDispatch.
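For the CubeDraw object used later in Example 6, such an ODL description might look roughly like the following sketch. The uuids and library name are placeholders, and the attribute syntax here is illustrative rather than authoritative:

```
[uuid(00000000-0000-0000-C000-000000000001)]
library CubeDrawLib
{
    [uuid(00000000-0000-0000-C000-000000000002)]
    dispinterface DCube
    {
        properties:
            [id(1)] double Theta;
            [id(2)] double Declination;
        methods:
            [id(3)] void Draw();
    };
};
```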
The type library itself is a component that implements the interfaces
ITypeLib, ITypeInfo, and ITypeComp. You generally don't have to implement
these interfaces. OLE provides the implementations that work on any underlying
type library. Automation controllers use these interfaces to navigate through
all the information in the library so as to present the programmer with lists
of callable functions on an object, to extract parameter types to perform
checking, and so forth.
Visual Basic (VB) and the related dialect Visual Basic for Applications are
both automation controllers. When you run a piece of VB code such as that
shown in Example 6, VB will translate the method calls and property
manipulations in the VB code that uses the dot operator into IDispatch::Invoke
calls to the component in question. Ultimately, all the calls are being made
through the binary standard of COM interfaces, so VB doesn't care what
language was used to implement the automation object.
This illustrates the integration power of automation. VB can create and manage
many automation objects from many different applications at once, and use them
to programmatically combine information from a variety of sources. Automation
is exceptionally powerful for developers who are using off-the-shelf
applications such as Microsoft Word or Shapeware Visio to create custom
solutions. Adding automation support to an application opens up that
application to a tremendous number of new uses. In addition, developers can
encapsulate business logic into components, and make this functionality
accessible through high-level, third-party tools, including fourth-generation
programming languages and even productivity-application macro languages.


OLE Documents


Built on top of Structured Storage, Uniform Data Transfer, and Monikers is
the technology known as "OLE Documents." This technology supports the creation
and management of compound documents. Two types of components are at work
here. The container is the component that controls the document and manages
relationships between the pieces of information in that document (such as
layout). Compound-document objects are pieces that make up the content put
into that document, and these pieces are supplied by servers (DLLs or EXEs).
OLE Documents is thus a way to integrate containers and servers through
compound-document objects. The objects themselves can be shared in two ways.
The first is embedding, where the entire object is "embedded" within the
container--that is, the object's persistent state is kept in the document
itself. Embedded objects always implement the IPersistStorage interface for
this purpose, and containers that support embedding typically use a compound
file to provide IStorage instances to embedded objects (but they don't have
to).
The other way to share an object is linking, in which a graphic image of the
object is cached in the container document along with a moniker that refers to
the location of the object's actual data. The object's persistent state exists
elsewhere, and the moniker provides the link to that data. Since a moniker can
be as complex as desired, the path from the compound document to the source of
the link can be very complex. Therefore, a document can contain linked objects
to things as simple as a file, or as complex as a cell in a table in a
document that is embedded within an e-mail message that exists in a particular
field of a database on a particular network server. Monikers impose no limits.
Embedding is normally optimal for objects with small data sets, while linking
is more efficient for large data sets (especially ones that are shared between
multiple users on a network). Each link is a reference to a single source. By
contrast, embedding the data means making a copy.



Compound-Document Objects


Compound-document objects are nothing more than regular OLE objects that have
a particular combination of interfaces. This is shown in Figure 7, along with
the interfaces a container exposes to its objects. Note that the object
interfaces shown in the figure are those seen by the container. Those in
parentheses are implemented only by objects in DLLs. Those in EXEs implement
only the unmarked interfaces and rely on DLL "object handlers" for the others.
The most important interfaces are IPersistStorage, IDataObject, IViewObject2,
and IOleObject. The first two interfaces mean that compound-document objects
support persistence to IStorage elements and that they support exchange of
their data--primarily bitmap and metafile renderings of their display images
that can be cached and displayed in the document. Caching allows the container
to open a document for viewing or printing even when the code to handle the
object is unavailable--the cached images are suitable for these purposes.
If an object implements the IViewObject2 interface, it has the ability to
render itself directly to an hDC, usually the screen DC of the container's
display or a printer DC on which the document is being printed. This gives
control over rendering quality to the object itself. This interface is not
limited to compound documents. Any object can implement it and possess this
ability. 
IOleObject is the primary (and rather sizable) interface that says, "This
object supports the OLE Documents standard for compound documents." A
container uses this interface for many purposes, the most important of which
is activation. Activation means instructing the object to perform some action,
called a "verb." The container will, as part of its user interface, show these
verbs to the end user and forward them to the object when the user selects
them. The object has full control over what verbs it wants to expose. Many
objects have an "Edit" verb, which means "display a window in which this
object's data can be modified." Others, like a sound or a video clip, have a
"Play" verb, which means "play the sound" or "run the video." While the object
defines what verbs it supports and what those verbs mean, the container is
responsible for making the commands available to the end user and invoking
them when necessary.
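The verb relationship amounts to a small name-to-action table on the object side. A sketch, with invented names (a real container invokes verbs through IOleObject::DoVerb, using integer verb indices rather than strings):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// The object defines what its verbs are and what they do.
struct Verbish {
    std::map<std::string, std::function<std::string()> > verbs;
};

// Container-side helper: run a verb the object exposed, if it exists.
// The container need not know what any verb actually means.
std::string Activate(Verbish& obj, const std::string& verb) {
    auto it = obj.verbs.find(verb);
    return it == obj.verbs.end() ? std::string("unknown verb")
                                 : it->second();
}
```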
On the other side of the fence, the container must provide a "site" object for
each embedded or linked object in the container. The site implements the
interfaces IOleClientSite (which provides container information to the object)
and IAdviseSink (which notifies the container when changes occur).


In-Place Activation


In cases other than playing a sound or a video clip, activation of an object
generally requires that the object display another window in which the
operation takes place (such as editing). For example, if you have a table from
a spreadsheet embedded within a document, and you would like to edit that
table, you would need to get the table back into the spreadsheet application
to make changes. Right?
Not necessarily. OLE Documents includes In-Place Activation, also known as
Visual Editing. This is a set of interfaces and negotiation protocols through
which the container and the object merge their user-interface elements into
the container's window space. In-place activation allows the object to bring
its editing tools to the container, instead of taking the object to the
editing tools. This includes menus, toolbars, and small child windows that are
all placed within the container.
A number of interfaces are required to make all this work, on both the
container and compound-document object. The interface names all start with the
prefix IOleInPlace. By way of these interfaces, the two sides can create a
mixed menu (composed of pop-up menus from both container and object), share
keyboard accelerators, and negotiate the space around the container's frame
and document windows in which the object would like to display toolbars and
the like.
Because in-place activation is handled solely through additional interfaces
for both container and object, support for it is entirely optional (but
encouraged, of course). If a fully in-place-capable container meets an
in-place-capable embedded object, then they achieve a high level of
integration. If either side doesn't support the technology, however, then they
can still work together by using the lower-level activation model that
requires a separate window. Even when in-place is supported all around, the
user can still decide to work in a separate window if desired.
In-place activation is not limited to just activating one object at a time, or
activating objects only on user command. Objects can mark themselves to be
in-place activated, without the mixed menu or toolbar negotiation, whenever
visible. This means that each object can have an editing window in its space
in the container. These objects respond immediately to mouse clicks and the
like because their windows are in the container, and those windows receive the
mouse messages. Only one object, however, can be "UI Active," which means that
its menus and toolbars are also available. Of course, the UI Active object
switches (and the user interface changes correspondingly) as the user moves
between objects.
With multiple objects active within a document, you can imagine how useful it
would be if some of those objects were buttons or list boxes. You could create
forms with such objects, and create an arbitrary container that could hold
objects from any source and benefit from all the other integration features of
OLE. This is the reason for OLE Controls.


OLE Controls


An OLE Control is a compound-document object extended with Automation to
support properties and methods through IDispatch. It relies on a mechanism
called an "event," a notification that is fired whenever something happens to
the control (such as a state change or user input). A control transforms
different types of external events such as mouse clicks and keystrokes (or
application-specific events like the pickle vat on the factory floor springing
a leak) into meaningful programmatic events. When these programmatic events
occur, an event handler can execute code--such as showing a button pressdown,
transmitting a character over a modem, or calling the pickle-vat repair
company.
For the most part, OLE Controls is a set of extensions to the other OLE
technologies such as Structured Storage (by way of adding an
IPersistStreamInit interface) and automation (adding new ODL attributes for
dispinterfaces, methods, and properties). OLE Controls defines a generic
notification mechanism called "Connectable Objects" that is used to connect
some sink object to a source, where the source wishes to call the functions of
a certain interface implemented on the sink. This is like the IAdviseSink
interface working with IDataObject, but more generic. This mechanism is used
to implement events, which are actually meaningful and useful outside of
controls. An
object expresses the events it can fire as a dispinterface, which the event
handler (such as a container application) implements with an IDispatch and
connects to the object using the connectable objects technology. A similar
extension involves property-change notification, which applies very well to
controls, but is useful for any object that has properties of any kind to
notify a sink when those properties change.
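A miniature of the Connectable Objects idea, with invented names. The sink is reduced here to a bare callback, where a real control container would implement the event dispinterface and connect it through connection-point interfaces:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A source that fires named events into every connected sink.
struct EventSource {
    typedef std::function<void(const std::string&)> Sink;
    std::vector<Sink> connections;
    // Stand-in for establishing a connection point to a sink.
    void Connect(const Sink& s) { connections.push_back(s); }
    // Fire an event into every connected sink.
    void Fire(const std::string& event) {
        for (size_t i = 0; i < connections.size(); ++i)
            connections[i](event);
    }
};
```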
OLE Controls also introduces a technology called "Property Pages." This is a
flexible user-interface model that any object can use to allow an end user to
directly modify its properties. A property page is easily integrated into a
tabbed dialog box, along with property pages from other objects as well, to
create a consistent, easy-to-use environment for manipulating such data.
The new interfaces involved for connectable objects, property pages,
property-change notification, and events make up the bulk of the additions to
an OLE control over a regular in-place compound-document object. So, what
actually is specific to controls? Not a lot, but a few key enhancements to the
OLE Documents technology (through the interfaces IOleControl and
IOleControlSite) comprise the final difference between a compound-document
object and a control. For example, in compound documents, only the UI Active
object can trap keyboard messages. By contrast, any control in a form or
document should be able to respond to keystrokes at any time, so OLE Controls
provides the mechanism to make it work. OLE Controls also defines mechanisms
for handling special controls like labels, pushbuttons (where one can be the
"default"), and exclusive button sets. In addition, the container application
that manages the controls exposes a set of "ambient properties" (through a
dispinterface) to all the controls to provide general defaults (such as colors
and fonts). These additions, combined with property pages, property-change
notification, events, and automation enhancements, make up "OLE Controls."
Since being a control or a container for controls is primarily being a
compound-document object or container, OLE Controls leverages any work you do
to support OLE Documents. Furthermore, applications such as Microsoft Access
and Visual Basic support OLE Controls in form creation. With a few good
controls, you can quickly create powerful front ends or custom business
solutions with a minimal amount of code--all you have to do is add some VB
code to the event handlers that these applications supply. As happened with
VBX controls, you can expect the market to provide numerous, useful
third-party OLE controls.


Conclusion


OLE is about integration on many levels. Components can come in various forms,
be they simple functional objects with a straightforward interface (such as a
string object), automation objects, data sources, compound-document objects,
or controls.
Implementing a simple OLE object is very easy (as it should be). Implementing
support for more complex technologies like compound documents and controls is
a little more complex, but more help (in the form of books, articles, and
tools) is available to developers as each month passes.
For example, Visual C++ 1.5 supports OLE by way of the Microsoft Foundation
Classes (MFC), which greatly simplifies adding OLE automation, drag-and-drop,
and compound documents to your app. Visual C++ 2.0 adds support for OLE
Controls.
Regardless of your tools, if integration is what you seek, OLE is an answer.
OLE helps you integrate components with many features and capabilities,
allowing the features of those components to evolve over time. OLE is a
complete solution to integration that will grow over time, to support
distributed objects, for example, without requiring changes to existing code.
Together, the integration technologies in OLE may make the dream of true
component software become a reality.
Figure 1 All OLE technologies build upon COM and one another.
Figure 2 Structured Storage sits above a file as a file system sits above a
disk volume.
Example 1: Comparison of code to read a structure from: (a) a Win32 file; and
(b) a stream.
(a)
BOOL MyObject::ReadFromFile(LPSTR pszFile)
{
    OFSTRUCT of;
    HFILE    hFile;
    UINT     cb;

    if (NULL==pszFile)
        return FALSE;

    hFile=OpenFile(pszFile, &of, OF_READ);
    if (HFILE_ERROR==hFile)
        return FALSE;

    cb=_lread(hFile, (LPSTR)&m_data, sizeof(MYDATA));
    _lclose(hFile);
    return (sizeof(MYDATA)==cb);
}
(b)
BOOL MyObject::ReadFromStorage(LPSTORAGE pIStorage)
{
    HRESULT  hr;
    IStream *pIStream;

    if (NULL==pIStorage)
        return FALSE;

    hr=pIStorage->OpenStream("MyStruct", 0,
        STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE, 0, &pIStream);
    if (FAILED(hr))
        return FALSE;

    hr=pIStream->Read((LPVOID)&m_data, sizeof(MYDATA), NULL);
    pIStream->Release();
    return SUCCEEDED(hr);
}
Example 2: Saving persistent data through IPersistStorage.
BOOL SaveObject(IStorage *pIStorage, IUnknown *pObject)
{
    IPersistStorage *pIPS;
    HRESULT hr;

    hr=pObject->QueryInterface(IID_IPersistStorage, (void **)&pIPS);
    if (SUCCEEDED(hr))
    {
        hr=pIPS->Save(pIStorage, TRUE);
        pIPS->SaveCompleted(NULL);
        pIPS->Release();
    }
    return SUCCEEDED(hr);
}
Figure 3 A sample composite moniker with a file and two item monikers to
identify a range of cells in a particular sheet of a spreadsheet file.
Example 3: Code that creates a composite moniker with a file and two items.
IMoniker * MakeMonikerToRange(char *pszFile, char *pszSheet, char *pszRange)
{
    IMoniker *pmkFile, *pmkSheet, *pmkRange;
    IMoniker *pmkComp=NULL, *pmkFull=NULL;

    //"!" is the delimiter between monikers
    if (SUCCEEDED(CreateItemMoniker("!", pszRange, &pmkRange)))
    {
        if (SUCCEEDED(CreateItemMoniker("!", pszSheet, &pmkSheet)))
        {
            if (SUCCEEDED(CreateFileMoniker(pszFile, &pmkFile)))
            {
                //This creates a File!Item(Sheet) composite
                if (SUCCEEDED(CreateGenericComposite(pmkFile, pmkSheet,
                    &pmkComp)))
                {
                    //Tack the range onto the File!Item(Sheet) composite
                    pmkComp->ComposeWith(pmkRange, FALSE, &pmkFull);
                    pmkComp->Release();
                }
                pmkFile->Release();
            }
            pmkSheet->Release();
        }
        pmkRange->Release();
    }
    return pmkFull;
}
Figure 4 Cursors used in Drag-and-Drop.
Example 4: The usual implementation of an IDropSource interface.
STDMETHODIMP CDropSource::QueryContinueDrag(BOOL fEsc, DWORD grfKeyState)
{
    if (fEsc)
        return ResultFromScode(DRAGDROP_S_CANCEL);

    if (!(grfKeyState & MK_LBUTTON))
        return ResultFromScode(DRAGDROP_S_DROP);

    return NOERROR;
}

STDMETHODIMP CDropSource::GiveFeedback(DWORD dwEffect)
{
    return ResultFromScode(DRAGDROP_S_USEDEFAULTCURSORS);
}
Figure 5 The DoDragDrop function enters a message loop that watches the mouse
and keyboard and calls the IDropSource and IDropTarget functions.
Figure 6 A source notifies a consumer of data changes through IAdviseSink.
Example 5: Signature of the IDispatch::Invoke function.
interface IDispatch : public IUnknown
{
    ...
    virtual HRESULT Invoke(DISPID dispID, REFIID riid, LCID lcid,
        WORD wFlags, DISPPARAMS *pdispparams, VARIANT *pvarResult,
        EXCEPINFO *pexcepinfo, UINT *puArgErr)=0;
};
Example 6: Visual Basic code that translates to IDispatch calls.
Sub Form_Load ()
    Dim Cube As Object
    Set Cube = CreateObject("CubeDraw.Object")  'Creates the object
    'Each line of code here calls IDispatch::Invoke with different flags
    x = Cube.Theta              'Property Get on "Theta"
    Cube.Declination = .0522    'Property Set on "Declination"
    Cube.Draw                   'Method call
End Sub
Figure 7 The interfaces of a compound-document object and container.

























Special Issue, 1994
Novell's AppWare Distributed Bus


Extending a powerful event engine across the network




Joseph Firmage


Joseph is vice president of Novell's NetWare Development Tools Division,
including Novell's visual development tools: AppWare Bus, Visual AppBuilder,
and AppWare Loadable Modules. He can be contacted at Novell Inc., 4001 South
700 E., Salt Lake City, UT 84107.


The software industry has been flooded in recent years with announcements of
technology initiatives relating to distributed computing. Novell often refers
to distributed applications as "network applications," because they leverage
the power of the network in ways that stand-alone desktop applications can't.
Examples of network applications include client-server databases, online
services, e-mail applications, workflow software, transaction-processing
applications, and remote-access applications.
These types of applications increase the productivity of workgroups of people,
and often facilitate the replacement of aging terminals and mainframes with
more-intelligent desktops and servers. Network applications now play a
critical role in corporate information systems. These applications allow
corporations to centrally locate shared services (as opposed to just shared
data), while maintaining the advantages of intelligent, client-side
applications running on today's modern distributed microcomputers.
Network applications are notoriously difficult to create with the tools
commonly used for developing stand-alone desktop applications. A network
application can consist of several pieces that operate on multiple computers
as one overall software process. The developer of a network application faces
an enormous integration task because different tools are used to implement
different pieces of the application on different platforms with different
operating systems. In addition to integration issues, the connectivity
supporting the network application must support a broad range of
network-transport technologies, address security and versioning requirements,
and be maintainable (and quickly changeable) by user organizations. Perhaps as
important as any of these considerations, such applications must be rapidly
constructible.
Novell's AppWare is designed to resolve many of these difficulties. AppWare
provides the tools and technologies to rapidly develop client applications
that leverage existing network services, on multiple computing platforms.
However, AppWare, as originally announced, did not offer a solution to those
developers who required the ability to create all the parts of a network
application--both the clients and the servers. Novell's AppWare Distributed
Bus (ADB) provides this facility.
This article briefly describes the AppWare Bus and its distributed version,
ADB. It then discusses such aspects of ADB as scalability, interoperability,
service replication, Replaceable Transport Modules, administration, and
security. Finally, the ADB approach is compared with perceived alternatives,
such as Microsoft's distributed OLE and OMG CORBA.


The AppWare Bus


The AppWare Bus is an event- and protocol-based communications and control
engine for software components conforming to its straightforward API. It
orchestrates the behavior of, and interaction between, its native components,
called AppWare Loadable Modules (ALMs), and nonnative components, such as OLE
and OpenDoc, by coordinating notification-based execution (true,
queued-message-based invocation) within custom applications, which is
particularly appropriate for distributed processing. Tools that contain the
AppWare Bus are compatible with all ALMs and can offer to the programmer (or
power user) the ability to link ALMs together to build custom applications.
Novell's Visual AppBuilder is the first such tool. Other 5GL visual tools and
4GL scripting tools will be provided soon, working with the very same
components and AppWare Bus.
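The notification-based execution model described above can be sketched as a
central event queue through which all component invocation flows. This is only
a conceptual illustration (the class and method names are hypothetical, not
Novell's actual API):

```python
from collections import deque

class AppBus:
    """Toy sketch of a queued, notification-based component bus."""
    def __init__(self):
        self.queue = deque()
        self.handlers = {}          # event name -> list of callables

    def subscribe(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def post(self, event, payload=None):
        # Invocation is queued, never a direct call: the bus stays in
        # control of all component conversations.
        self.queue.append((event, payload))

    def run(self):
        while self.queue:
            event, payload = self.queue.popleft()
            for handler in self.handlers.get(event, []):
                handler(payload)

bus = AppBus()
log = []
bus.subscribe("Pressed", lambda p: log.append(f"chain ran with {p}"))
bus.post("Pressed", "Do Query")
bus.run()
print(log)
```

Because every invocation passes through the queue, redirecting the queue
across a network (as ADB does) requires no change to the components.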
Novell's ADB is a network-aware version of the AppWare Bus. ADB allows you to
take any ALM-based application and partition part of the application on many
clients, and the other part on one or more servers; see Figure 1.
The rationale for having a software bus is that all component conversations
and behavior can be centrally managed, bringing all throughput under system
control. ADB defines a model where the conversations between client and server
logic are "atomically standard." That is, every conversation between the
client and server is composed of objects whose data and control messages are
the same, regardless of whether the software is running on the client or the
server. Communicating between the two sides using these objects is, therefore,
automatic and transparent to the application designer.
Today, this intercommunication is managed locally. To make the AppWare Bus
distributed, it is necessary to reroute or redirect the event transmissions
across a network. In so doing, the component parts of the overall application
can work together automatically, transparent to the component creator.
Component developers don't have to code to any distributed architecture or
API. All existing ALMs support this distributed architecture because they use
the AppWare Bus event engine. Application designers who use Visual AppBuilder
(or any tool that incorporates the AppWare Bus) don't have to design their
applications to be explicit clients or servers. They simply partition their
projects, the distribution of which can be determined and changed at run
time--application partitioning at its best.


An Example Application


As an example, say that your company is creating software for bank ATMs and
for back-end servers that process financial transactions from
bankcard holders. This is a case where it is unreasonable to place the custom
business logic that manages transactions in the client teller machine. Only
the user-interface code should reside in the teller machine (along with the
card reader and receipt printer), while the transactions are safely processed
on a remote, secure, ultra-high-capacity server. When built using ADB, custom
business logic can be rapidly developed to operate on servers. Desktop
applications can share centralized functions (just as today they share files
and databases) without giving up their own intelligent logic.
Users of Novell's AppBuilder may recall that an application project is
organized into units called "subjects." To organize a large project, you break
it into many subjects. To connect subjects together, you alias objects from
one subject into another. The AppBuilder compiler (actually a part of the
AppWare Bus) resolves the object aliases and produces a single application.
With ADB, the user may alias objects between the projects themselves; when
compiled into applications, these projects may be arbitrarily distributed on a
network. The
object aliases establish an interface that completely specifies the
interaction between the application parts.
In this example scenario, two projects called "DB Client Project" and "DB
Server Project" together comprise a client-server database application. The Do
Query Button object is in the server project, with a function chain connected
to it that will be executed when the button is "Pressed." An alias of the Do
Query button is placed in the client project, along with a Window object to
display it. This button alias implicitly establishes an interface between
these two projects. After compilation, the resulting application partitions
can be placed together or apart. When the client user presses Do Query, the
function will execute in the server, wherever the server is. Similarly, simple
text fields, pictures, not-so-simple tables, or any other ALM object instance
that contains data (or controls something) can be aliased. When such instances
experience a change in their data, this change is reflected in their
counterparts, wherever they may be on the network. If an instance issues a
control event, this will trigger any responses connected to counterparts on
the network.
The client or server partitions can be changed or replaced at any time, simply
by maintaining conformance to the object alias interface. This allows clients
or services to grow and to offer new features without breaking existing
clients or services. It is a simple and effective concept, though impossible
without the notification-based execution model provided by the AppWare Bus.
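The original/alias relationship can be illustrated with a minimal sketch: a
change to one object's data is reflected in every counterpart, wherever it
lives. All names here are hypothetical; the real mechanism runs through the
AppWare Bus event queue rather than direct assignment:

```python
class ALMObject:
    """Sketch: an object and its aliases keep their data in sync."""
    def __init__(self, name):
        self.name = name
        self.value = None
        self.counterparts = []   # aliases or originals in other partitions

    def link(self, other):
        self.counterparts.append(other)
        other.counterparts.append(self)

    def set(self, value):
        self.value = value
        # An ObjectChanged event would carry the data to each counterpart;
        # here we assign directly (the value check prevents re-propagation).
        for obj in self.counterparts:
            if obj.value != value:
                obj.value = value

original = ALMObject("Do Query")      # lives in the server partition
alias = ALMObject("Do Query alias")   # lives in the client partition
original.link(alias)
alias.set("Pressed")
print(original.value)                 # the server sees the client's press
```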


How ADB Works


As discussed previously, the AppWare Bus employs a notification-based
processing model for executing applications. ADB extends this model by placing
a network connection in the central processing queue. The network connection
itself is DLL-replaceable by Novell or third parties. It can be a simple RPC,
ORB, pipe, or any other network namespace or transport engine. Within an
application partition, ADB intercepts data and control events for object-alias
instances that reference original objects of another partition. ADB reroutes
and replicates these events through the network connection as needed for
distributed data and control.
More specifically, ADB monitors the event queue for the following events:
ObjectChanged data events, carrying affected object data along.
Object signals, triggering custom logic in any partition where function chains
exist for them.
Object event protocols, for inter-ALM conversations underneath higher
application logic.
Duplicates of these events, along with affected object data, are passed to
object aliases, or to originals in other partitions of the overall
client-server network application.
Once an event is rerouted, the same ALM code operates on the receiving end to
process the event, so a transmitted message cannot fail to be understood. No
agreement by committee (or otherwise) must be reached as to the format of the
information crossing the wire, since the sender and recipient are, in fact,
the same type of ALM.
Note that ADB can create a distributed application around any component, not
just those defined by Novell. As long as the component employs the AppWare Bus
mechanisms for signals and data management, ADB can distribute the processing
transparently. Novell will provide the wrappers for OLE and OpenDoc
components, so that the AppWare Bus can control and leverage them as well as
ALMs. Thus, when you create an ALM, you are, in effect, defining a new object
that becomes a de facto network atom whose data and control are understood by
any other foreign application client or service created with the same type of
component. By such event replication and routing, along with service
replication (described in a later section), the process of building a
distributed application becomes simple, but remains powerful.
If so designated by the application designer, a client or service partition
may be queried at run time (by way of an administrative ALM described later)
for its object-alias-interface specification. The response to such a query is
in the form of a table containing the portion of the interface designated as
public by the application designer. 
Examples 1 and 2 present the object-alias interfaces for the client and server
partitions of the example banking application. (Although, in such a situation,
you probably wouldn't make the application interfaces public!) In this
application, the server provides original objects to transfer information (and
requests) to/from the teller machine. The teller machine provides alias
objects to transfer information (and requests) to/from the server. Notice that
the server's interface title, "Bank of Novell," is called the service type. It
is this service-type identifier (not the individual objects) that is
registered in the namespace on the network. Each instance of a service of a
given service type has a service name (which can be any string) giving a
proper name to that service instance for network registration. Consider the
following analogy. A workgroup might have two laser printers, one named
"Engineering" and the other named "Documentation." Both network entities are
of the type "LaserWriter." It is the same with service names and service
types.
At design time, the partitions can be assigned version numbers and
compatibility numbers, so that clients and servers can both be improved and
changed over time. Every client and server indicates its version numbers, and
the server indicates its minimum compatibility version for clients. ADB will
not allow a connection to be made between a client whose Required Server
Version number is lower than the server's Server Compatibility Threshold, or
higher than the server's Server Interface Version. In this example, note that
the Bank of Novell server offers a picture object ("Advrtsmt") to display as
an advertisement while the cardholder is waiting for the transaction, but the
Teller Screen client doesn't support that facility (perhaps because that
particular teller has just a text-only display ability). The server is version
2, but remains compatible with clients expecting version 1 or 2.
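The version gate described here reduces to a simple range check, sketched
below with the article's Bank of Novell numbers (the function name is ours,
not Novell's):

```python
def connection_allowed(client_required_version, server_interface_version,
                       server_compatibility_threshold):
    """A client may connect only if its Required Server Version falls
    between the server's Compatibility Threshold and Interface Version."""
    return (server_compatibility_threshold
            <= client_required_version
            <= server_interface_version)

# Bank of Novell: Server Interface Version 2, Compatibility Threshold 1.
assert connection_allowed(1, 2, 1)        # the version-1 teller still works
assert connection_allowed(2, 2, 1)        # so does a current client
assert not connection_allowed(3, 2, 1)    # a client expecting version 3 is refused
```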
The "Object" type column contains the 32-bit unique ID registered with Novell
for your ALM object type when you created it. ADB does not understand the type
reference. It's there to match types with counterpart objects in counterpart
interfaces. The type "Text" doesn't mean anything to ADB, but ADB can match
that ID with IDs in counterpart interfaces. Registering type IDs with Novell
assures their uniqueness.
Each object in the interface may designate whether it can receive and/or send
information. ADB thus gives the application designer the ability to restrict
the flow of data and control to given directions. When an object can both send
and receive information, ADB enables special logic to ensure that data and
control "echoes" do not occur.
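One plausible way to picture the echo suppression (the article does not
specify the mechanism, so this sketch is our own): a received update is
applied locally but never re-sent to its origin.

```python
class Endpoint:
    """Sketch: a bidirectional object pair where a received update is
    applied but not echoed back to the sender."""
    def __init__(self):
        self.value = None
        self.peer = None
        self.sends = 0

    def set(self, value, _from_peer=False):
        self.value = value
        # Only locally originated changes propagate; this flag is the echo guard.
        if not _from_peer and self.peer is not None:
            self.sends += 1
            self.peer.set(value, _from_peer=True)

a, b = Endpoint(), Endpoint()
a.peer, b.peer = b, a
a.set("hello")
print(b.value, a.sends, b.sends)   # one hop, no echo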
Note that the client interface is headed by the words "Interface of 'B of N
Teller Screen' for 'Bank of Novell.'" A given client partition may
simultaneously be a client to more than one server, and may have other
object-alias interfaces that refer to other server interfaces. Thus, a request
to return a client's object-alias interface must specify the server to which
the interface applies.



Service Replication


One design challenge common to most client-server system architectures is
enabling many client applications to simultaneously access a common service.
Transparently separating one large application into two pieces (one of which
can run on a desktop and the other on the server) is practically useless.
Transparently separating one large application so that one of its pieces can
be used on many desktops while the other piece is deployed on one server is
very useful. In short, one-to-one distributed processing is not client-server
in nature. Many-to-one distributed processing is enormously valuable and
directly fulfills the goals of the client-server paradigm for all application
processes, not just SQL databases.
Some people claim that it is as simple as taking a typical application-level
C++ class and using its interface declaration on the client to call the
implementation on a server. In practice, however, this often delivers a system
that is sound only when there is one client. In order for many clients to
connect to a single server, the implementation on the server must anticipate
the fact that many clients may use its services simultaneously. In fact, it is
very difficult to create code that handles many client requests
simultaneously.
While this issue can be addressed by properly designing and developing the
service, doing so has usually required the talents of highly trained
programmers. Since the fundamental value of AppWare is that it enables
business programmers and knowledge workers to create network applications
(and, with ADB, create the custom-network services, as well), it is necessary
to transparently incorporate this sophisticated, many-to-one service logic
into the AppWare Distributed Bus. ADB has a facility called "service spawning"
that addresses this problem. 
Service spawning allows you to designate a server application partition that
will replicate a portion of its data space and optionally create a new task
for each client that connects to it. Thus, when a client initiates a
connection to a server, the server may automatically extend itself, so that
the server partition can maintain state information for the client. Given this
facility, server logic can be designed as if only one client were connected to
the server. 
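The spawning idea can be sketched as the server replicating its data space
per connection, so server logic really can be written as if only one client
existed (all names and structure here are illustrative, not Novell's API):

```python
import copy

class ServerPartition:
    """Sketch of service spawning: each connecting client gets its own
    replica of the server partition's data space."""
    def __init__(self, spawn=True):
        self.spawn = spawn
        self.template_state = {"balance_cache": {}, "session": None}
        self.instances = {}

    def connect(self, client_id):
        if self.spawn:
            # Replicate the data space so per-client state never collides.
            self.instances[client_id] = copy.deepcopy(self.template_state)
        else:
            # Non-spawning: every client shares one state, and you become
            # responsible for keeping the conversations from interfering.
            self.instances[client_id] = self.template_state
        return self.instances[client_id]

server = ServerPartition(spawn=True)
a = server.connect("teller-1")
b = server.connect("teller-2")
a["session"] = "card 1234"
print(b["session"])   # teller-2's state is untouched
```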
Other partitions of the same overall application may not spawn, perhaps
serving the spawning partitions once the information from their clients has
been reduced to a stateless form (such as one-way transactions to/from a
database). In fact, if you wish, servers need not spawn at all. You are then
responsible for assuring the integrity of multiple client-server
conversations. Generally, however, spawning server partitions simply
centralize portions of multiple client applications that would otherwise
consume the same computing resources on lesser-powered, distributed desktops.
By centralizing what would otherwise run on clients, you can improve
performance as a shared function is brought into close proximity with shared
data. In addition, the clients are insulated from implementation and any
changes to it. Further, the client applications themselves can grow
independently from the server applications, retaining whatever logic and
intelligence with which you may empower them.


Replaceable Transport Module


Neither ADB nor ALMs depend directly on any one network transport facility to
accomplish distributed processing. ADB specifies an API that it invokes to
fulfill the namespace and transport requirements underlying distributed
applications. Thus, Novell and third parties are free to replace the
Novell-supplied namespace and transport simply by replacing a particular DLL
with one containing an alternative implementation. Such alternatives might
include RPCs, ORBs, or even simple, direct serial communication links. The
scalability, flexibility, mobility, communications performance, and
network-protocol support for ADB distributed applications are all determined
by the Replaceable Transport Module (RTM).
An RTM must provide the following essential abilities:
Register network-visible entities in a namespace under a certain type and name
string.
Respond to name queries for all network-visible entities of a certain type
and/or name.
Open extended conversations (that is, not single-transaction exchanges)
between two named entities.
Open reasonable numbers of simultaneous conversations between two such
entities.
Make intranode connections transparent if local deployment is to be supported.
Deliver and receive arbitrarily large data streams, reliably or with robust
error detection.
Provide asynchronous callback hooks for send-complete/receive-started
occurrences.
Abstract the transport protocol, if necessary.
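The list above amounts to an interface contract. A rough sketch of that
contract, with a trivial intranode ("loopback") implementation of the kind
local deployment requires, follows; every method name is our guess at the
shape of the API, not Novell's actual RTM specification:

```python
from abc import ABC, abstractmethod

class ReplaceableTransportModule(ABC):
    """Hypothetical sketch of the RTM contract implied by the article."""

    @abstractmethod
    def register(self, service_type, service_name): ...
    @abstractmethod
    def query(self, service_type): ...
    @abstractmethod
    def open_conversation(self, from_entity, to_entity): ...
    @abstractmethod
    def send(self, conversation, data): ...   # arbitrarily large, reliable
    @abstractmethod
    def on_send_complete(self, conversation, callback): ...

class LoopbackRTM(ReplaceableTransportModule):
    """Trivial intranode implementation: the namespace is a dictionary
    and delivery is a local handoff."""
    def __init__(self):
        self.names = {}
    def register(self, service_type, service_name):
        self.names.setdefault(service_type, set()).add(service_name)
    def query(self, service_type):
        return sorted(self.names.get(service_type, set()))
    def open_conversation(self, from_entity, to_entity):
        return (from_entity, to_entity)
    def send(self, conversation, data):
        return data
    def on_send_complete(self, conversation, callback):
        callback(conversation)

rtm = LoopbackRTM()
rtm.register("LaserWriter", "Engineering")   # the article's printer analogy:
rtm.register("LaserWriter", "Documentation") # two names, one service type
print(rtm.query("LaserWriter"))
```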
The RTM can provide user connection, authentication, security, and encryption
facilities itself, but this is not required. It is generally presumed that
some basic level of such services will be supplied by the underlying network
operating system (NOS) in an enterprise environment. Tight security at the
level of application logic may be employed by using the administrative ALMs.
Novell will supply an RTM for ADB, bundled with AppBuilder Version 2.0. This
RTM is scalable for use in large enterprises by way of a dynamic namespace
called NetWare Directory Services with multilevel domains and dynamic
replication avoiding single-point-of-failure. The RTM can transport over
IPX/SPX, TCP/IP, or AppleTalk, one at a time (interprotocol transparency is
optionally available), and uses the underlying NOS
for connection, authentication, and other security provisions.
Novell will also provide an RTM based upon the well-known Tuxedo transaction
processing system. This RTM uses a transaction-based communication model that
provides dynamic load balancing for use in large-scale deployments.
Novell also will supply an RTM for a simple serial connection, for use in
dial-up modem access to non-LAN-based online services built upon ADB. This RTM
provides a serial transport facility with essentially no namespace--the
namespace is limited to the caller and the callee. This RTM is important
because the performance and reliability benefits of ADB are most clearly seen
in the context of a slow, unreliable connection. ADB can thus facilitate the
building of online services.
An RTM can be replaced at any time, without even requiring recompilation of
ADB-based applications. However, note that distributed applications running on
different RTMs are not automatically interoperable; to attain
interoperability, gateways must connect namespaces and transport formats.


Heterogeneous Interoperability


Novell has defined a specification for the way ALM objects import and export
their attributes to/from a universal, atomic data expression. This expression
consists of data types such as strings, numeric values, images, sounds,
arrays, and so on. The import/export facility processes data in Universal
Program Structure (UPS) format. Historically, UPS has been used to port
ALM-based application projects between different desktop platforms (such as
Microsoft Windows and Macintosh).
ADB uses UPS to provide rapid run-time data conversion for object values
moving between different native OS platforms. This conversion may involve more
than just byte-swapping. For example, the Picture ALM on Macintosh computers
uses the PICT standard to display images, while on Microsoft Windows the
Picture ALM uses BMPs and metafiles. So, the implementation of UPS on
Macintosh can convert between a Mac PICT and a UPS image, and the Microsoft
Windows implementation of UPS can convert between a BMP or metafile and a UPS
image. The Picture ALM supports the import/export entry points necessary to
interact with UPS. A Picture object can thus be moved anywhere that UPS is
implemented. The implementation of UPS on each given platform supports
conversion between that platform's common OS data types (as well as simple
data types) to UPS universal equivalents, so that the ALMs themselves need
only have the ability to port native object data to UPS types, rather than
every analogous type supported by any target platform.
When ADB connects two partitions, each running on the same native operating
system, no UPS conversion takes place, and, thus, native information is
transmitted.
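The economics of UPS can be sketched in a few lines: each platform converts
only between its own native types and the universal form, and same-platform
transfers skip conversion entirely. (PICT and BMP are the real format names
from the article; everything else here is illustrative.)

```python
def mac_export(pict_bytes):
    """Mac-side UPS implementation: native PICT -> universal image."""
    return ("UPS-image", pict_bytes)

def win_import(ups_value):
    """Windows-side UPS implementation: universal image -> native BMP."""
    kind, payload = ups_value
    assert kind == "UPS-image"
    return ("BMP", payload)

def transfer(data, export, import_, same_platform):
    # When both partitions run on the same native OS, no UPS conversion
    # takes place and native data goes over the wire unchanged.
    if same_platform:
        return data
    return import_(export(data))

moved = transfer(b"\x11pict", mac_export, win_import, same_platform=False)
print(moved)
```

Each platform needs one converter pair per native type, rather than one per
analogous type on every possible target platform.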


Scalability


The scalability of ADB-based distributed applications depends on the ability
to construct multilayered, client-server network applications. ADB allows any
number of hierarchical layers of clients and servers. This means that a client
partition can itself be a server to a higher-level client.
Scalability also depends on the namespace employed in the RTM. Consider a
client-server application with 10,000 users and 200 servers in a large
corporation. If, as each user launches the client application, all 200 servers
were visible and accessible, it would be very difficult to navigate and manage
resources, connections, and network activity. The network traffic alone
required to dynamically display the names of all 200 servers to each of the
10,000 users would overload any common NOS. The namespace is responsible for
more intelligently setting hierarchical contexts that restrain the view and
access of users to other users and servers; see Figure 2.
The RTM included with ADB includes a comprehensive namespace technology that
can automatically (and under administrative control) establish multilevel
domains imposing the necessary scopes to prevent unmanageable situations from
arising. This RTM can accommodate unlimited numbers of concurrent users by
using these namespace domains. 
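The domain idea can be sketched as a tree of contexts: a user launching a
client sees only the servers registered within their own domain and its
ancestors, never all 200 at once. The structure below is purely illustrative
of the scoping principle, not of NetWare Directory Services itself:

```python
class NamespaceDomain:
    """Sketch of hierarchical namespace domains restraining visibility."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.servers = []
        self.children = {}

    def child(self, name):
        d = NamespaceDomain(name, self)
        self.children[name] = d
        return d

    def visible_servers(self):
        # A user's view is scoped to their own domain plus its ancestors.
        scope, d = [], self
        while d is not None:
            scope.extend(d.servers)
            d = d.parent
        return scope

corp = NamespaceDomain("corp")
corp.servers.append("HQ-Directory")
eng = corp.child("engineering")
eng.servers.append("BuildServer")
sales = corp.child("sales")
sales.servers.append("QuoteServer")
print(eng.visible_servers())   # engineering never sees the sales server
```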


Reliability and Performance 


One key attribute of ADB unmatched in most other distributed object models is
that there is no redirection of OS or component-code routines. "Redirection"
is the process whereby a routine is fooled into thinking that it is running
locally when it is actually running on a remote machine. File and printing
services in all common network operating systems are implemented in this
fashion (the file and print APIs of the local operating system are redirected
toward remote devices). Microsoft's stated plans for distributed OLE also
follow this general model (OLE server interfaces are redirected to remote
implementations). In theory, all applications that use those local interfaces
transparently will get the benefit of the shared resource.
The upside of redirection is that it automatically works with existing
applications. One of the most serious downsides is the difficulty of handling
error conditions that occur when network connections are broken or otherwise
fail. For example, when a Read operation fails on a server disk drive as a
result of a communication problem, the number of levels requiring perfect
error trapping in the user's OS, objects running on the OS, and in the custom
code of the application, is extremely large.
In applications based on ADB, the connection is not established by the
redirection of code routines executing in the OS or in the ALM components.
Rather, the connections between different segments of the application logic
made by the application designer are redirected across the wire by the AppWare
Bus. So, if a failure in the network connection occurs, the worst that can
happen is that your particular application partition triggers an error
condition requiring some handling in your client and/or service-application
logic. You must incorporate reasonable error trapping in your logic to handle
this type of occurrence (which you'd have to do in any distributed model).
ALMs are provided to assist in detection. In the ADB model, it is impossible
for a network-connection failure to result in a crash or system fault in the
OS or ALMs, regardless of whether ALM developers implement perfect error
checking--there is no explicit or implicit networking in ALMs around which to
error-check. This key reliability benefit is not well recognized, even in the
evolving era of mobile computing in which stable network connections will be
rare.
The same architectural issue drives performance. In general, as the network
conversation becomes less frequent and more coarse-grained, the overall
application performance dramatically rises, and the ratio of overhead to data
becomes insignificant. ADB's architecture will provide one of the
highest-performing models for distributed processing, approaching the
architectural performance of hard-coded transaction-processing systems.


Administration and Security



Along with ADB itself, Novell will supply three ALMs that allow you to include
administration and security functionality in server and client projects. These
ALMs also can be used to create separate management applications. The three
ALMs are Client, Server, and Monitor. Table 1 summarizes the functions and
signals in these three ALMs.
As you can see in Table 1(a), the Client ALM provides functions to manually
connect and disconnect to and from a specified server partition. The server's
identity is either known in advance or discovered by way of functions that
list services available in the client's network context. Client partitions can
be designated at compile time to connect automatically--a setting that can be
overridden by the server partition for authentication purposes. The Client ALM
also provides functions to list the types of servers available to connect to,
and individual servers of each type.
The Server ALM shown in Table 1(b) allows the server's logic to authenticate
client login requests using whatever application logic it wants (including
calling the NDS ALM to check in with the global NetWare directory) and accept
or refuse such connection requests. The ALM also allows the server to establish an
exclusive conversation with a particular client, temporarily locking out other
clients. Finally, the Server ALM provides functions to disconnect clients and
list all connected clients.
The Monitor ALM, summarized in Tables 1(c) and 1(d), allows both the client
and the server logic to describe the object alias interface of the
counterpart's partition, temporarily suspend the transmission of information
for an object or the whole interface, and manually refresh object values in
counterpart partitions. The Monitor ALM also offers an object that provides
several signals, triggering client or server functionality when important
connection-related events occur.
Remember that the implementation of many of these functions is supplied by the
RTM. The AppWare Bus abstracts away differences between RTM implementations,
so that applications don't need to be recompiled to operate on a new
distributed namespace and transport. Note also that all these functions
inherit the administration, authentication, and security provisions of both
the RTM and the underlying NOS network connection, if there is one.


Other Distributed Object Models: OLE and CORBA


Figure 3 illustrates the fundamental contrast between distributed OLE and ADB.
At this writing, OLE 2.0 is not fully enabled for distributed applications.
Microsoft plans to make OLE 2.0 distributed by inserting a general RPC in its
Component Object Model (COM), the underlying object library engine. This is
planned to be released in conjunction with Cairo in 1996. These relationships
are shown in Figure 4. The key question is where the transparent distribution
occurs. As we understand Microsoft's plans, OLE will distribute processing at
the seam between the custom application logic and OLE server components. Thus,
when an OLE 2.0 container application invokes an OLE 2.0 server through COM,
the server itself could be executing remotely on Cairo, connected via RPC.
Transparent distribution can't really occur within custom application logic
unless there is some system--like the AppWare Bus--orchestrating the execution
of the application logic. AppWare Distributed Bus thus can distribute at one
or more arbitrary seams within the custom application logic. ADB places
control over the distribution seam(s) in the hands of the application builder,
who does not need to know the complexities of crafting a service in the form
of an object. "Partitioning" is targeted at the easy separation and
distribution of portions of application logic, right within a single overall
application logic, rather than the creation of a separate service encapsulated
as an object. For those who wish to take the time to create services as
objects, AppWare's support of OLE and the other object models will facilitate
the use and eventual creation of these components as well, so users get the
best of both worlds.
CORBA is a technology initiative led by the Object Management Group (OMG).
OMG's goal is to establish agreement among major system and application
software vendors on a universal object-oriented model for expressing
interfaces for distributed services (instructions on how to access network
services), and for accessing and using such interfaces to execute distributed
services. Such a system is called an "Object Request Broker" (ORB). By
achieving agreement among the various vendors, network services become
interoperable and interchangeable.
An ORB is a general abstraction for sophisticated developers to use in writing
client/server software systems, where services are encapsulated as universally
usable objects. ORBs will end up working underneath several higher-level
network middleware and application software systems, including partitioning
systems. An ORB can therefore be used to create the Replaceable Transport
Module used by ADB to fulfill distributed processing. Thus, given an RTM based
on a CORBA-compliant ORB, ADB-based applications are CORBA compliant. Novell
expects to provide an RTM based on a CORBA-compliant ORB.
Of course, AppBuilder's forthcoming support of a number of key object models
will guarantee that services exposed as CORBA-compliant objects will be fully
usable, and eventually can be created, within AppBuilder.


Conclusion


The ADB provides a fundamentally different approach to distributed
applications. It approaches the problem by setting three fundamental
objectives. Custom application logic must be able to be placed on the server.
A "periodic table of software elements" comprising a universal expression of
atomic data and control between local and remote applications must be
supported. The system must not demand synchronous execution.
In Novell's vision of the future of network computing, AppWare will play an
important role, and ADB will provide a remarkably simple and powerful way for
AppWare users to leverage the power of distributed processing. It has
immediate relevance to any business that wants its user applications to share
functions the way it shares data today, within the context of
distributed networks of microcomputers.
Figure 1 AppWare distributed bus (ADB) allows partitioning an application
between client and server.
Figure 2 Namespaces provide scalability by limiting visibility.
Figure 3 Comparing distributed OLE with ADB.
Figure 4 The structure of distributed OLE.
Example 1: Server interface for bank ATM application.
Interface of "Bank of Novell"
Server Interface Version: 2
Server Compatibility Threshold: 1
Object TypeID Kind Direction Comment
Bank Key Text Original Receive Bank's key to get into system
Card ID Text Original Receive Card holder's card #
PIN Text Original Receive Card holder's PIN #
OpCode Nmbr Original Receive Transaction selection
Amount Nmbr Original Receive Dollar value of transaction
Confirm Subr Original Receive Confirm action to Message
Cancel Subr Original Receive Cancel action to Message
Dispense Subr Original Send Dispense cash
Message Text Original Send Content sent to teller screen
Receipt Text Original Send Content sent to receipt printer
Advrtsmt Pctr Original Send Ad to display while waiting
Example 2: Client interface for bank ATM application.
Interface of "B of N Teller Screen" for "Bank of Novell"
Client Interface Version: 1
Required Server Version: 1
Object TypeID Kind Direction Comment
Bank Key Text Alias Send Bank's key to get into system
Card ID Text Alias Send Card holder's card #
PIN Text Alias Send Card holder's PIN #
OpCode Nmbr Alias Send Transaction selection
Amount Nmbr Alias Send Dollar value of transaction
Confirm Subr Alias Send Confirm action to Message
Cancel Subr Alias Send Cancel action to Message
Dispense Subr Alias Receive Dispense cash
Message Text Alias Receive Content sent to teller screen
Receipt Text Alias Receive Content sent to receipt printer
Table 1: The C/S column denotes whether the function or signal applies to a
client (C) or server (S) partition. The I/O column denotes whether the
function or signal applies to the set of objects comprising the interface (I)
or just an individual object (O) in the interface. (a) Client ALM functions;
(b) Server ALM functions; (c) Monitor ALM functions; (d) Monitor ALM signals.
 (a) C/S I/O Functions 

 C I Connect
 C I Disconnect
 C I Is connection
 C I List service types
 C I List services
 C I Get client ID
 (b) C/S I/O Functions 
 S I List clients
 S I Start exclusive
 S I Stop exclusive
 S I Disconnect client
 S I Authenticate client
 S I Accept client
 S I Refuse client
 (c) C/S I/O Functions 
 C/S I Describe interface
 C/S I/O Suspend
 C/S I/O Resume
 C/S I/O Refresh
 (d) C/S Signals 
 C/S Connection requested
 C/S Connection accepted
 C/S Connection refused
 C/S Connection closed
 C/S Connection broken
 C Auto-connect failed




Special Issue, 1994
Distributed Applications and NeXT's PDO


PDO brings distributed objects to non-NextStep environments




Dennis Gentry


Dennis is a member of the developer-support team at NeXT. You can reach him by
e-mail at dennis_gentry@next.com.


The Portable Distributed Objects system (PDO) is a powerful subset of NextStep
technology. It's an extension of Distributed Objects (DO) and is part of the
NextStep development environment. Distributed Objects and Portable Distributed
Objects enable developers to efficiently construct, operate, and maintain
complex client/server applications in a heterogeneous computing environment. 
What happens when more people need to use your application than you had
initially planned, so that you need to split the processing load across
several computers? Or, what happens when you want to use NextStep to build the
user interface to a database application, but the database server runs on an
HP server? Perhaps your company needs to build a groupware application that
lets people work together interactively. DO and PDO were designed for these
situations.


Share and Share Alike


The DO system provides a way to share objects among multiple client/server
applications running on separate computers on a network--distributed over both
client and server machines. In this model, the server application consists of
a collection of objects that are intended for use by cooperating client
applications. The server publishes some of its objects to make them available
to client applications on the same computer, and on other computers on the
network. To the clients, the published objects are messaged as if they were in
the same process as the rest of the client. This transparent messaging is much
cleaner than older-generation remote procedure call (RPC) mechanisms. DO
preserves the power and benefits of object-oriented programming, even in a
distributed application environment.
The PDO system extends the power of Distributed Objects to non-NextStep
computers. It allows a core section of the NextStep environment to run on
other systems. Objects in the PDO environment can communicate over networks
with other PDO and DO objects. The PDO system includes all the parts of
NextStep necessary to run distributed-object servers--plus some additional
common functionality (such as NextStep's file-stream functions and portable
BuildServer). PDO facilitates reusability of objects developed under NextStep
and doesn't require additional software on NextStep clients or servers. It
lets you use objects remotely as either clients or servers, even on machines
that aren't running NextStep.
The components that make up PDO are:
Objective C, which is the language used to create objects and interact with
other objects.
The Objective C run time, which is a set of extensions to the Objective C
language environment that provide for dynamic binding at run time.
The NextStep Core Classes, which is a library that provides classes such as
Object, List, and HashTable.
The Distributed Object system, which is a set of classes that manages the
brokering of server objects and provides transparent connections to remote
objects in the network.
The Transport Mechanism, which is built into the DO classes and provides
transport-independent communication between objects over a network.
The Portable nmserver, which is a portable version of the Mach Network Name
Server used in DO to provide object naming over the network.
The first targeted platforms for PDO are mainstream UNIX-server environments
such as HP-UX and Solaris. PDO's design is not restricted to UNIX platforms,
however, so future support of environments such as NetWare and Windows NT is
possible. With regard to the emerging CORBA standard for object
interoperability, NeXT has submitted an Objective C binding for IDL, the
standard's interface-definition language, so that CORBA clients and servers
can be written in Objective C.


PDO/DO Advantages


Compared to other popular RPCs (such as Sun RPC and Mach RPC), DO and PDO have
a number of advantages that make developing with them nearly transparent. They
allow you to cleanly design client/server application architectures without
the usual hassles that come with other RPC mechanisms.
One important aspect of DO is that it is dynamic. Other RPC systems require
you to specify the exact procedures that you'll call remotely. Likewise, they
require you to indicate the exact types and sizes of the arguments and return
values. When you add a procedure to your RPC project's list of remotely
callable procedures, you must recompile all affected code on the server and
the client. In contrast, DO allows you to send messages to objects that don't
exist, or haven't even been defined. If a new DO server implements and exports
an object that conforms to some message protocol, previously running clients
that use that protocol can begin using the new object immediately.
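DO's dynamic messaging is a property of Objective C, but the contrast with static RPC stubs can be sketched even in plain C: instead of compile-time stubs, the server consults a name-to-function table at run time, so a method registered later is immediately callable by any client that knows its name. The names and functions below are hypothetical illustrations, not NeXT's machinery.

```c
#include <string.h>

/* A remotely callable method: takes an int argument, returns an int.
   (Hypothetical stand-ins; DO itself dispatches full Objective C messages.) */
typedef int (*method_t)(int);

int price_for(int sym)   { return 68 + sym; }
int dividend_of(int sym) { return sym / 4; }

/* Name-to-function table consulted at run time, like a method lookup. */
struct method_entry { const char *name; method_t fn; };
struct method_entry methods[] = {
    { "priceFor", price_for },
    { "dividend", dividend_of },
};

/* Dispatch by string name; returns 0 on success, -1 if the "object"
   does not respond to the requested selector. */
int send_message(const char *name, int arg, int *result)
{
    size_t i;
    for (i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (strcmp(methods[i].name, name) == 0) {
            *result = methods[i].fn(arg);
            return 0;
        }
    return -1;
}
```

Because lookup happens per call, adding an entry to the table makes it callable without recompiling any caller--the essence of the dynamic binding described above.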
Another advantage is that DO frees you from many memory-management concerns.
You can't completely ignore memory management, because there's no automatic
garbage collection in NextStep. However, if you're just sending and receiving
parameters and return values, you generally don't need to explicitly deal with
memory as you would with other RPC systems.


Divide and Conquer


With some RPC systems, you must decide that you're writing an RPC program
before you start. If your existing single-machine code was not written with
RPC in mind and you later need to scale up your application as your business
grows, you'll have to rewrite and extend your program to distribute it across
multiple machines. If you're concerned about decent performance with your RPC
application, you have even more work to do. 
In contrast, you can often take a nondistributed NextStep application and make
it distributed with little trouble. Your NextStep application already should
be composed of objects. In such a case, distributing your application might
involve merely identifying the relevant objects and moving them to a server
program.
DO and PDO also benefit from the advantages of object-oriented programming
over procedural programming. Because your application is made up of objects,
and because of the encapsulation properties of objects, your application will
likely be composed of well-contained computational units from the start. These
units often can be relatively easily distributed across multiple machines
because of their clean interfaces to other objects, and their locality of
reference will foster better performance in a distributed environment.


Choosing Between DO and PDO


Most programmers would ordinarily choose to use DO instead of PDO
because DO runs under the full NextStep environment and is, therefore, more
powerful (and simpler to use). For example, the full Application Kit (a
general-purpose application framework) is available under NextStep, but not
under PDO. Also, some PDO operating systems lack the preemptive-thread
support you may need to build your server. (PDO comes with the Distributed
Object event loop to work around this limitation.)
However, you might consider using PDO rather than DO to build a server for
your application in situations such as these:
A central machine must serve many requests.
Your applications have occasional compute- or memory-intensive requests, or
need a fail-safe or easily recoverable server.
A non-NextStep machine is already set up to parcel out a centralized data
feed.
You'd like to take advantage of your heterogeneous network to perform tasks in
parallel.
If you don't have one or more of these requirements, you might find a
NextStep-based DO server more convenient than a PDO server. If your site
outgrows your NextStep server, it's relatively easy to move your server to a
PDO platform.


How to Distribute Objects


Applications take advantage of Distributed Objects by sending ordinary
Objective C messages to objects in remote applications. The program that
implements and makes an object available for remote use is called the
"server," and a program that takes advantage of that object by sending it
messages is a "client." A single application can easily play both the client
and server roles.
To set up servers and clients, you must add a few additional lines of code to
each cooperating application to specify which applications and objects are
involved. Both DO and PDO can handle most data types as arguments or return
values--including structures, pointers, strings, and, most importantly,
objects (ids). 
To make an object distributed (and, therefore, available to other
applications), a server program must first "vend" (make publicly available)
the object. Example 1(a) shows a simple application that shares a central
stock-price data feed. The NXConnection class provides other, more commonly
used methods than run that allow the waiting to take place asynchronously.
Example 1(a) instantiates a price server, then registers that server with the
network name server stockPriceServer. The last line loops to wait for remote
messages. You must include this code in each application that will participate
in Distributed Objects.
To use an object that has been vended, a client looks up the desired server
object and stores a handle to it in a local NXProxy object. Example 1(b) gets
a handle and stores it in the variable theServer. If this call returns a
non-nil value, the client may then refer to the stock-price
server on the server machine as if it were implemented in the client, with
only a few exceptions; see Example 1(c). This transparency is at the heart of
Distributed Objects. 


Passing Objects


Perhaps the most important data type that clients and servers can pass to each
other is the id. In the example application, the server explicitly vends and
the client looks up only one serving object. After that, either the client or
the server may pass ids of objects that each wishes to implicitly vend as
arguments or return values. 
As long as the client is prepared to handle remote messages by way of some
form of NXConnection-run message, and as long as the client vends an object to
the server in this way, the server may then use objects in the client. Thus,
the client and server switch roles. More commonly, the original server can
make additional objects available that are useful to the client, without
additional setup code overhead.
For example, suppose the stock price server needs to return more attributes
than just the price of the stock. A good way to do this is to have the server
return a Stock object that the client can then query for the stock attributes.
The client code might look like Example 2. Executing the first line implicitly
vends a Stock object from the server, which is then accessible by way of the
id myStock. Each printf statement invokes the remote stock object in the
server, even though the client refers to myStock just like any local object.


Multithreaded Servers 


In Example 1(a), the last line of the program, [myConnection run], never
returns--it just loops while waiting for incoming remote messages. However, in
most applications, a server must do more than simply service remote messages.
For example, a real stock-price server also might update a database from a
real-time data feed. Multiple threading is the mechanism that allows a server
to continue with other tasks while it also waits for messages to objects it
has vended. The DO system makes multithreading very easy for programs built
with the Application Kit, using the NXConnection method runFromAppkit.
Although in most applications, you can use PDO objects in exactly the same way
as DO objects, you can't currently write multithreaded PDO servers, because
the HP operating system provides no thread support. For
example, the DO code from the server shown previously might be enhanced to
look like that in Example 3. The runFromAppkit method creates a new thread
whose sole purpose is to loop, waiting for remote method invocations. The
runFromAppkit method also is aware that the Application Kit isn't thread-safe,
so it waits to dispatch remote methods until your application is between
Application Kit events. If your server doesn't use the Application Kit and
requires finer-grained parallelism, other methods let you create threads that
dispatch remote methods without waiting for the Application Kit. These methods
are documented in the NextStep General Reference book.


Avoiding Pitfalls


If your application is simple (like the example shown here), you'll find that
making your application distributed is fairly transparent. However, if you're
building a more complex, robust application, you must be aware of a few more
issues.
One is that the semantics of returning self are different. In Objective C,
it's common to return the id self to indicate success of a method. While this
has reasonable performance for local objects, returning self to a remote
caller actually vends the object to which self refers, with all the overhead
involved. Needlessly vending an object is not excessively expensive, but for
maximum efficiency, methods should return a more appropriate type than self.
For example, to indicate success or failure, an object should return a scalar
value such as YES or NO, instead of self or nil. If the server doesn't need to
return a status at all, it can return void, and the method call can use the
oneway keyword. This results in a very fast one-way asynchronous call--meaning
that the caller doesn't even have to wait for the remote method to finish.
Another issue in distributed applications is how to handle a failure in the
network or in remote machines. You must make sure that cooperating programs
deal gracefully with the failure of their clients or servers. The exact action
an application should take depends on the nature of the cooperating programs,
but DO provides a reasonably simple mechanism that allows programs to notice
the loss of a cooperating program. To be notified of the loss of a cooperating
program, an object must request notification and implement a senderIsInvalid:
method. When the object is asynchronously notified by way of this method, it
must determine which remote objects have become inaccessible, and decide what
to do about it.
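The notification pattern can be sketched in C; the names and callback shape below are hypothetical stand-ins for DO's actual mechanism.

```c
#include <stdio.h>

/* Hypothetical stand-ins for a DO connection and its invalidation hook:
   when the transport detects a broken peer, it notifies the registered
   observer, which plays the role of a senderIsInvalid: method. */
typedef struct connection {
    const char *peer;
    int         valid;
    void      (*on_invalid)(struct connection *);
} connection;

int proxies_dropped = 0;

/* Our "senderIsInvalid:": decide which remote objects are now
   unreachable and clean up after them. */
void sender_is_invalid(connection *c)
{
    printf("lost peer %s; dropping its proxies\n", c->peer);
    proxies_dropped++;
}

/* Called by the transport layer when it detects a broken connection;
   in real DO this notification arrives asynchronously. */
void connection_broke(connection *c)
{
    c->valid = 0;
    if (c->on_invalid)
        c->on_invalid(c);
}
```

The key design point is that the observer, not the transport, owns the recovery policy: the transport only reports the loss.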
Another pitfall in making an application distributed is that a few data types
can't be passed and returned transparently. These types are unions, void
pointers, and pointers inside structures other than ids and char *s. The basic
problem with these types is that, in general, the compiler can't know the size
of the data being referenced, so it can't pass the data to a remote program in
a meaningful way. Another problem is that the computer on which the remote
object is running might deal with the data differently (for example, it might
use different byte ordering). The result is that it's not possible to pass
data types whose layout can't be known. 
Presently, there are at least two ways to deal with this limitation. One is to
type-cast pointers, which works around the void pointer problem, as long as
the pointer is to a nonrecursive structure. Another approach is to enclose
complex structures into an object and transmit that object. However, if you
find yourself often transmitting objects around, you might consider
redesigning or repartitioning your application to lessen network traffic.
To transmit object copies instead of vending them, use the new bycopy
Objective C keyword in the parameter list. Be sure to conform to the
NXTransport protocol, which requires that you write three simple methods:
encodeRemotelyFor:freeAfterEncoding:isBycopy:, encodeUsing:, and decodeUsing:.
The first of these is actually implemented in the Object class. You typically
override it with a simple two-line method that uses the isByCopy parameter to
decide whether to send a copy of the object. If a copy is to be sent, the
other two methods cooperate to send the data necessary to create a copy of the
object at its new location. The encodeUsing: method runs locally and packs up
the unique data of the object. On the remote computer, the decodeUsing: method
unpacks the object to instantiate a copy.
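The division of labor between the encode and decode steps can be sketched in C, assuming a hypothetical pointer-free struct in place of a real Objective C object.

```c
#include <string.h>

/* Hypothetical sketch of the bycopy idea: a pointer-free struct is
   flattened into a buffer on one machine (the encodeUsing: role) and a
   copy is instantiated from the buffer on the other (the decodeUsing:
   role). This is not NeXT's wire format; a real implementation must
   also normalize byte order between machines. */
typedef struct {
    char symbol[8];
    int  price_cents;
} stock;

/* Serialize the object's unique data; returns the number of bytes used. */
size_t stock_encode(const stock *s, unsigned char *buf)
{
    memcpy(buf, s->symbol, sizeof s->symbol);
    memcpy(buf + sizeof s->symbol, &s->price_cents, sizeof s->price_cents);
    return sizeof s->symbol + sizeof s->price_cents;
}

/* Rebuild a copy of the object at its new location. */
void stock_decode(stock *s, const unsigned char *buf)
{
    memcpy(s->symbol, buf, sizeof s->symbol);
    memcpy(&s->price_cents, buf + sizeof s->symbol, sizeof s->price_cents);
}
```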
With regard to strings, a nontransparency arises in the current version of DO,
which manages the memory for storing pointers to chars (strings) differently
than it does for pointers to other data types. Normally, pointer data is
automatically freed when the server returns. However, in the current DO, the
server must explicitly free strings when it has finished with them. If your
server code doesn't free strings, the memory for those strings is lost. In a
future version, DO will manage the memory for strings just as it currently
does for other data types. 


Pitfalls in Scalability 


Performance, deadlock avoidance, and transaction management are issues that
you don't need to worry about in smaller or simpler DO applications. However,
in large distributed applications, these issues can become very important.
With a larger network that has complex needs, latent problems in your existing
DO applications may then become apparent. This isn't completely bad news,
because once problems are apparent, you have a better chance of fixing them. 
Dealing with these advanced issues is beyond the scope of this article.
However, it behooves you to consider the inherent complexity involved in
writing distributed applications before you begin work on a large-scale
distributed application. To deal with deadlock, you must carefully think
through the behavior of cooperating and competing servers to make sure they
can never mutually rely on the same resources at the same time in order to
make progress. Likewise, to deal with managing atomic transactions, you should
consider using a two-phase commit protocol. 
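As a minimal sketch of the two-phase idea (illustrative only, not a production protocol):

```c
/* Toy two-phase commit: in phase one the coordinator collects a
   prepare vote from every participant; only if all vote yes does
   phase two commit them, otherwise everyone aborts. */
typedef struct {
    const char *name;
    int can_commit;   /* participant's vote in the prepare phase */
    int committed;
} participant;

/* Returns 1 if the transaction committed, 0 if it aborted. */
int two_phase_commit(participant *p, int n)
{
    int i;
    for (i = 0; i < n; i++)        /* phase one: prepare */
        if (!p[i].can_commit)
            return 0;              /* a single "no" aborts everyone */
    for (i = 0; i < n; i++)        /* phase two: commit */
        p[i].committed = 1;
    return 1;
}
```

The point of the prepare phase is that no participant changes state until every participant has promised it can; that promise is what makes the transaction atomic across servers.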
If you plan to put compute-intensive tasks on a server, keep an eye on scaling
issues. For example, no current PDO server has the aggregate computing power
of 500, or even ten, Pentium-based NextStep machines. Therefore, if you might
eventually decentralize your application, you shouldn't plan to saturate a
single central server. Rather, consider distributing compute-intensive tasks
across multiple server machines. The trade-off, of course, is that it can be
more difficult and time consuming to correctly implement your computation for
parallel processing.
If you decide to distribute a task across several computers, keep in mind that
the network has a finite bandwidth that can be saturated by a few
high-performance machines sending remote messages extensively. Design your
application to take advantage of the facility in DO for moving objects from
one machine to another. This can reduce the amount of remote messaging that
might otherwise occur.


Conclusion


DO and PDO offer excellent tools for developing client/server applications.
Their design also provides the flexibility to expand applications as NextStep
and PDO become available on more platforms. 


References



Andleigh, Prabhat and Michael Gretzinger. Distributed Object-Oriented
Data-Systems Design. Englewood Cliffs, NJ: Prentice Hall, 1992. ISBN
0-13-174913-7.
Elmasri, Ramez and Shamkant B. Navathe. Fundamentals of Database Systems.
Menlo Park, CA: Benjamin/Cummings, 1994. ISBN 0-8053-1748-1.
NeXT Computer Inc. NEXTSTEP 3.2 General Reference, Volume II. Reading, MA:
Addison-Wesley, 1992. ISBN 0-201-62221-1.
NeXT Computer Inc. NEXTSTEP 3.2 Release Notes. NeXT Computer, 1993.
NeXT Computer Inc. Object-Oriented Programming and the Objective C Language.
Reading, MA: Addison Wesley, 1993.
NeXT Computer Inc. Portable Distributed Objects 1.0 Release Notes. NeXT
Computer, 1993.
Example 1: (a) Server code for an application that provides a stock-price data
feed; (b) client code that connects to the stock-price server; (c) client code
that accesses data from the server.
(a)
 id myServer = [[PriceServer alloc] init];
 id myConnection = [NXConnection
 registerRoot: myServer
 withName: "stockPriceServer"];
 // the following statement does not return
 [myConnection run];
(b)
 id theServer = [NXConnection connectToName:"stockPriceServer"];
(c)
 printf("IBM is currently at %d\n", [theServer priceFor:"IBM"]);
Example 2: Client code that gets an object from the server and sends messages
to that object.
id myStock = [theServer stockFor:"IBM"];
time_t now = time(NULL);
struct tm today = *gmtime(&now);
printf("IBM is currently at %d\n", [myStock priceAtTime:today]);
printf("IBM's last dividend was %d\n", [myStock dividend]);
Example 3: Server code that uses multiple threads.
id myServer = [[PriceServer alloc] init];
id myConnection = [NXConnection
 registerRoot: myServer
 withName: "stockPriceServer"];
// the following line creates a new thread that waits
[myConnection runFromAppkit];
// Code to receive the data feed goes here
// and is executed in the original thread...





Special Issue, 1994
Implementing Interoperable Objects


And in this corner...




Ray Valdés


Ray is senior technical editor at DDJ and can be reached at
rayval@well.sf.ca.us.


If you have carefully followed the previous articles on future-oriented
technologies such as CORBA, SOM, COM, OLE, OpenDoc, PDO, ADB, and TalAE, you
may find that your eyes have glazed over a wee bit. Like many DDJ readers, you
perhaps realize that even the cleanest design can exhibit a surprising number
of blemishes, or even flaws, when cast into the form of a working program. So
you may be wondering at this point, "Where's the code?"
In this article, I'll try to provide a concrete basis for evaluation of some
of the technologies discussed previously by presenting side-by-side
implementations of similar functionality. This article is not a "shootout" in
the sense of having a panel of judges evaluate each competing entry and arrive
at a winner. As much as possible, you are the judge, and you can weigh the
merits of each competitor according to your needs. A more accurate word for
this is "bakeoff"--the same word used by the old Internet tradition of
providing competing implementations at interoperability conferences. However,
in a year when the game DOOM II shipped 500,000 copies during its first week,
culinary metaphors lose out.


Setting the Time and Place


Earlier this year, DDJ made a request to each of the vendors whose technology
offerings are described in this Special Report. The request was to implement a
small program specification that could highlight some aspects of the
technology that might not be covered by a narrative description. 
For a variety of reasons, not all vendors could participate. Some packages are
still in the confidential stages of development, scheduled for release next
year. For other vendors, prior commitments prevented allocating resources for
this project. And CORBA, from OMG, is a specification rather than an
implementation. Although the CORBA spec is supported by more than a dozen
vendors, each implementation is perhaps different enough (variations that add
value and differentiation) that it would be a slight to the others to pick
only one candidate.
Nevertheless, we do present here the two object-model technologies considered
to be the leading contenders in the interoperable object wars: Microsoft's
Common Object Model (COM) and IBM's System Object Model (SOM). In addition,
there is a separate implementation from Microsoft that illustrates the
higher-level services of OLE, the reason for which will be explained shortly.
Note that these are not the only viable approaches to doing distributed
computing with objects today. As mentioned previously, a number of vendors are
offering solutions that solve enough aspects of the interoperable object
problem that you can build real-world systems today. These vendors include
companies such as IONA, Orbix, Forte, Visix, ILOG, Peerlogic, and Ochre, among
others. In general, these approaches do not attempt to address the entire
problem. For example, the package from ILOG allows you to build distributed
applications as long as you stick with C++ (that is, no attempt is made to be
language neutral). Likewise, the Galaxy framework from Visix offers a facility
known as "DAS" (short for "Distributed Application Services") that allows
distributed processing if both client and server are built with the framework.
Unfortunately, even if a proposed solution were to meet all requirements, the
reality is that sufficient market presence and company resources are needed in
order to be considered a viable contender in these platform wars. For this
reason, it's no surprise that the major players in the interoperable-object
wars are the large system vendors (IBM, DEC, Sun, HP, and Apple) and
cash-rich, market-dominating corporations (Microsoft), rather than small
software houses with limited resources.


Choosing the Weapons


The specification we provided to vendors was small and straightforward: a
simple phone-directory database that manages customer names and telephone
numbers. This database can be queried in two simple ways, by name and by
number, reflecting the two fields that constitute the database. The goal here
is not to show off prowess in constructing databases. Instead, the basic issue
is how to take an existing chunk of application functionality and package it
in the form of an interoperable object. For each vendor's technology, we want
to show the amount of "glue" (APIs, constructs, libraries, tools, mechanisms)
necessary to make a simple object interoperable.
To this end, DDJ provided the "legacy code" that implements the database. The
requirements are so small that it can be implemented in a half-page of C code in a
few moments, so we called it "the one-minute phone directory." Even so, the
basic concerns related to interoperable objects can scale up by several orders
of magnitude, in both code size and capacity.
The interface between the database and its clients consists of the four
services shown in Table 1: initialize, terminate, lookup-by-name, and
lookup-by-number. The
nonobject implementation of this interface that DDJ provided to vendors is in
Listings One, Two, and Three . These source files are all compiled and linked
into a single DOS program. The actual client/server partitioning is partly a
figment of our imagination. The "server" is all of 67 lines of C code, in
Listing Two. The "client" is in Listing Three, less than 25 lines. Example 1
shows the procedural declaration of the interface. In this implementation,
everything is static, hard-wired, and resident in memory, to minimize the guts
and maximize the visibility of the glue.
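The listings themselves are not reproduced in this section, but a sketch of what such a half-page implementation might look like follows; the names here are hypothetical, and the actual DDJ listings differ in detail.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical sketch of the four-service phone-directory interface
   (initialize, terminate, lookup-by-name, lookup-by-number): a static,
   memory-resident table queried by name or by number. */
typedef struct { const char *name; const char *number; } dir_entry;

static const dir_entry directory[] = {
    { "Ada",   "555-0001" },
    { "Boole", "555-0002" },
    { "Zuse",  "555-0003" },
};
static int dir_ready = 0;

int dir_initialize(void) { dir_ready = 1; return 0; }
int dir_terminate(void)  { dir_ready = 0; return 0; }

/* Both lookups return NULL when uninitialized or when nothing matches. */
const char *dir_lookup_by_name(const char *name)
{
    size_t i;
    if (!dir_ready) return NULL;
    for (i = 0; i < sizeof directory / sizeof directory[0]; i++)
        if (strcmp(directory[i].name, name) == 0)
            return directory[i].number;
    return NULL;
}

const char *dir_lookup_by_number(const char *number)
{
    size_t i;
    if (!dir_ready) return NULL;
    for (i = 0; i < sizeof directory / sizeof directory[0]; i++)
        if (strcmp(directory[i].number, number) == 0)
            return directory[i].name;
    return NULL;
}
```

Everything here is deliberately static and hard-wired, so that when the same four services are re-expressed as a COM or SOM object, whatever code appears beyond this is pure glue.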
There are several ways that a developer could package this implementation.
Currently, it is easy to package it as a procedural library (an OBJ module), a
dynamic library (DLL), or a procedural component such as a VBX. This subsystem
can also serve as a basis for a conventional object-oriented implementation,
by wrapping a class around it. The interface would remain basically the same,
except for adding constructor and destructor member functions (if you're using
C++) and turning the API entry points into member functions. 
To turn this code into an interoperable object, there are three possible
approaches:
A basic, nonvisual implementation that is close to the traditional
client/server model, in which the client application makes a request (say, for
the phone number) from the server component (which fills a buffer, or returns
a pointer to shared storage). The emphasis is on getting the client and server
to collaborate across various boundaries (such as address space,
process-lifetime boundaries, machine/network boundaries, and implementation
language boundaries).
A visual implementation that shows off the compound document capability of the
technology. This option, of course, is not applicable to technologies that are
oriented to infrastructure (such as SOM, COM, and CORBA). The application
component gets embedded in a container document. It should allow the user to
type in a query (such as a name) and then display the resulting phone number.
The idea is to illustrate the mechanics of the linking-and-embedding protocol.
An additional goal is to highlight the power of a development tool, class
library, or app framework that might ease the burden of implementing this
complex protocol between container and component.
Lastly, an optional implementation might show how an application can access
the component's services programmatically, by way of a scripting language or
automation interface. 
Given the time constraints, the third option was not practical for any of the
vendors. As stated previously, both IBM and Microsoft provided a basic,
nonvisual implementation illustrating their respective object models. In
addition, Microsoft implemented the second alternative listed here, a visual
implementation of an embeddable application component, as an OLE Custom
Control using the Microsoft Foundation Classes (MFC) library. The rest of this
article will discuss each of these implementations in turn. Because of space
requirements, not all code can be shown here, only the key sections. The
complete listings for each implementations are available in electronic form;
see "Availability," page 3.


Microsoft COM


Sara Williams of Microsoft implemented the phone-directory database as a pure
Component Object without a UI and without any OLE interfaces, in order to
emphasize the distinction between the underlying object model and the
higher-level OLE services. The non-UI object implements the desired
client/server interface on a cross-process Component Object. A cross-process
object, naturally enough, lives in a separate address space from its client
and is packaged in an EXE file. An object can also be implemented as an
in-process server (packaged in a DLL). 
To a client application, a COM Object is simply a COM Object--it is up to the
implementor to decide whether to make the Component Object an in-process or
cross-process object. In either case, the code for the client application
remains the same. At present, there is no available version of COM that
supports interaction between objects across a network. Microsoft has promised
availability of distributed COM/OLE in the not-too-distant future, and also
stated that no changes to client code will be necessary.
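The location transparency described here rests on a simple principle: the client is written against an abstract interface, never against a concrete server. The following is a minimal, hypothetical sketch in plain C++ (not Microsoft's actual API) of why swapping the packaging of the object does not disturb client code; the names `ILookup`, `LocalLookup`, and `ClientQuery` are illustrative only.

```cpp
#include <cassert>
#include <string>

// The client programs against this abstract interface. A cross-process
// proxy or (eventually) a network proxy would present the same interface,
// so client code compiles and runs unchanged.
struct ILookup {
    virtual ~ILookup() {}
    virtual std::string LookupByName(const std::string& name) = 0;
};

// One possible packaging: an in-process implementation.
class LocalLookup : public ILookup {
public:
    std::string LookupByName(const std::string& name) override {
        return name == "Daffy Duck" ? "310-555-1212" : "";
    }
};

// Client code: depends only on ILookup, never on LocalLookup.
std::string ClientQuery(ILookup& dir, const std::string& name) {
    return dir.LookupByName(name);
}
```

Replacing `LocalLookup` with a proxy that forwards the call across a process or machine boundary would leave `ClientQuery` untouched, which is the guarantee COM makes for its clients.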
There are three aspects to the COM implementation: interface, client code, and
server code. The interface is formally specified using IDL, as shown in
Example 2. This specification is fed to the MIDL compiler, which generates the
stubs and proxies used by client and server. The COM technology demands that
the developer create certain required interfaces and have certain run-time
behavior in order to implement an interoperable object. Here is an excerpted
narrative from Sara Williams that explains how she accomplished this:
First, I wrote a minimal COM object--one that just supports IUnknown and has a
ClassFactory. This isn't very exciting, but it was the first step. I made sure
that my object could be correctly instantiated, and that it would correctly
destroy itself at the correct time. I used OLE2VIEW as a client here, because
it will instantiate an object and then release it. 
Second, I wrote the IDL file that defines my custom interface, and used the
MIDL compiler to compile it into the proxy/stub DLL.
Third, I expanded my simple COM object to implement my custom interface. This
was pretty straightforward. In C++, I changed my class so that it now derives
from ILookup instead of IUnknown, and I added the two non-IUnknown methods
to my implementation.
Fourth, I wrote a client app that creates an instance of my object, and then
uses my custom interface to find information in the server's phone database. A
call to CoCreateInstance has COM instantiate a server object and return a
pointer to me to use. To call the custom interface methods (LookupByName and
LookupByNumber), I just hard-coded input values to make sure the call was
being made correctly. Once I got it working, I added a UI to get the value
from the user. When the user exits the application, I call Release on the
object so that it can be freed at the appropriate time.
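The reference-counting contract described in the first step of this narrative can be sketched in portable C++. This is a hedged illustration of the IUnknown discipline only, not COM itself: AddRef increments the count, Release decrements it, and the object destroys itself when the count reaches zero. The class name and the `s_nLive` instrumentation are inventions for this sketch.

```cpp
#include <cassert>

// Minimal sketch of IUnknown-style lifetime management: the object owns
// its own destruction, triggered by the last Release().
class CRefCounted {
public:
    CRefCounted() : m_nCount(0) { ++s_nLive; }
    unsigned long AddRef() { return ++m_nCount; }
    unsigned long Release() {
        unsigned long n = --m_nCount;
        if (n == 0)
            delete this;        // object frees itself at the correct time
        return n;
    }
    static int s_nLive;         // instrumentation for this sketch only
private:
    virtual ~CRefCounted() { --s_nLive; }  // private: clients must Release()
    unsigned long m_nCount;
};
int CRefCounted::s_nLive = 0;
```

The private destructor enforces the rule Williams describes: a client never deletes the object directly; it only balances each AddRef with a Release and lets the object decide when to go away.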
The client in this example is a small Windows app. The complete client code is
not shown here, but is available electronically. The most interesting parts of
the client code are in Listing Four. The class declaration for the server
object is shown in Example 3. In addition, there are three principal source
modules for the server: app.cpp (which has WinMain and initialization),
obj.cpp (a simple IUnknown-based object), and icf.cpp (class factory). These
are shown in Listings Five, Six, and Seven, respectively, in slightly
abridged form. Not shown is the file pdserver.reg, a small registration file
that gets merged into the Windows registry, so that COM knows what type of
server mine is, and where to find it. 


IBM SOM



The comparable SOM implementation of the phone-directory database is by
Charles Erickson, a developer in the SOMObjects product group at IBM Austin,
Texas. Erickson turned the DDJ phone-directory code into a SOM object, which
can be either local to the main process or remote. The location of the SOM
object is determined by its registration in the Implementation Repository. If
the PhoneDir class is not registered in the Implementation Repository, the SOM
object will exist local to the process. If the PhoneDir class is registered,
the object will exist in the server with which the PhoneDir class has been
registered. The server process may run on the same system as this client
program, or on a networked system.
Unlike COM, which does not yet exist in a distributed version, the code in
IBM's example works with both vanilla SOM and its distributed flavor, DSOM.
Charles Erickson's narrative states: 
This code was written in such a way as to allow the PhoneDir object to be
created either in the same process as the client program (main) or to be
distributed in a remote process without recompiling the client application...
As a result of providing this flexibility in the location of the PhoneDir
object, there is some scaffolding or glue code designed to hide some of the
current seams in SOM's local/remote transparency. These seams are related to
memory management and object life cycle. In subsequent releases of the
SOMobjects Toolkit, this scaffolding code will be absorbed behind the CORBA
life cycle and seamless memory management APIs. As a result, new applications
can take advantage of complete local/remote transparency, while applications
written to the old APIs will continue to work unchanged.
You can see these "seams" in Listing Eight, which is the SOM/DSOM client for
the phone-directory object. There is a static Boolean variable called isdsom,
which is set in the PhoneDirInitialize function, depending on the class of the
server. At program-termination time, there are some slight differences in the
code that frees memory and destroys objects. 
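One of the memory-management seams at issue is the convention noted in the SOM IDL in Example 4 (`memory_management = corba`): the server returns a freshly allocated copy, and the caller, not the server, owns and frees it. The following portable C++ fragment is an illustrative sketch of that convention under CORBA-style rules; the function name mirrors the article's example but this is not the SOMobjects API itself.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Sketch of "caller owns returned memory": the server allocates the result,
// and ownership transfers to the caller, who must free() it.
char* LookupByName(const char* name) {
    if (std::strcmp(name, "Scrooge McDuck") != 0)
        return NULL;                                   // not found
    const char* found = "206-555-1212";
    char* copy = (char*)std::malloc(std::strlen(found) + 1);  // server allocates
    std::strcpy(copy, found);
    return copy;                                       // ownership passes to caller
}
```

This is why the client in Listing Eight needs different cleanup code at termination time depending on whether the object is local or remote; once the CORBA memory-management APIs absorb the scaffolding, the caller-frees rule applies uniformly.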
The SOM implementation only requires three source modules: the IDL file shown
in Example 4, the client source in Listing Eight, and the PhoneDir server
source in Listing Nine. The appropriate header file (PhoneDir.pxh) is
generated automatically when the IDL file is run through the SOM compiler. The
SOM compiler also generates a template for the PhoneDir code.
You can judge the results for yourself from the listings and examples here,
but, in general, the code for the SOM/DSOM example is shorter and seems to
require less "glue" than the COM case. This is partly because SOM provides
more services at run-time and does not require you to implement things like
class factories. Even so, as you can see from the server code, a fair number
of lines of original code needed to be altered. And, as you can see from the
interface declaration in Example 4, IDL usage in SOM is a bit more convoluted
than the COM equivalent. Nevertheless, it seems fair to say that, if you are
working purely at the object model level, and you are not using an application
framework, and you are perhaps using a language other than C++, working with
SOM seems easier and more straightforward than COM. In addition, third-party
tools such as C++ compilers from MetaWare and Watcom provide direct-to-SOM
support, which further reduces the pain.


Microsoft OLE


In addition to the COM example shown previously, Microsoft also provided a
visual implementation of the phone-directory example in the form of an OLE
Custom Control. This implementation was done by Steve Ross of Microsoft, using
tools such as Visual C++ and the MFC framework to automate the process of
constructing a high-level application component. In a convincing demonstration
of the power of these tools, Ross was able to complete his implementation more
quickly and with fewer manually produced lines of code than the COM example
implemented by Sara Williams, which required more bare-API programming.
To implement this example, Ross created an OLE control that is a subclassed
Windows list box. This subclassed list box is used as a visual front end to
access the phone-directory database and display its data. In addition, the
example uses the notification machinery in OLE controls to fire off an event,
NameNumberChanged, when the user changes the selection. This is an important
difference between OLE controls and the standard list-box control. The
component also implements methods (GetNameFromNumber and GetNumberFromName)
for accessing the database routines (phonedir_LookupByNumber and
phonedir_LookupByName) provided by DDJ. In contrast to the earlier SOM and COM
examples, in which the original server code was basically rewritten, Ross's
version shows how OLE controls can encapsulate legacy code. Other aspects of
Ross's implementation include some read-only properties for inspecting the
state of the object: CurrentNumber and CurrentName. 
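The event-firing pattern Ross describes, in which the control pushes the new name and number to the container rather than waiting to be queried, can be sketched in ordinary C++. This is an illustrative analogue only, not the OLE Controls machinery; the callback wiring via `std::function` stands in for the control's event dispatch interface.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Sketch of an event-firing list-box component: when the selection changes,
// it fires NameNumberChanged, passing the new values as event parameters.
class PhoneListBox {
public:
    // The container hooks up this callback, playing the role of an
    // OLE control's event sink.
    std::function<void(const std::string&, const std::string&)>
        OnNameNumberChanged;

    void SelectEntry(const std::string& name, const std::string& number) {
        m_name = name;                       // update read-only "properties"
        m_number = number;
        if (OnNameNumberChanged)             // fire the event
            OnNameNumberChanged(name, number);
    }
    std::string CurrentName() const { return m_name; }
    std::string CurrentNumber() const { return m_number; }
private:
    std::string m_name, m_number;
};
```

Passing the values as event parameters is what "saves the container some time": the container gets the new selection in the notification itself, instead of turning around and reading the `CurrentName` and `CurrentNumber` properties after each event.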
Ross's example brings to bear a number of key tools and technologies from
Microsoft: Visual C++ Versions 1.5 and 2.0, the OLE Control Development Kit,
the Microsoft Foundation Classes Versions 2.5 and 3.0 (included in the Visual
C++ package), and the Control Wizard facility that is also part of the Visual
C++ package. In addition, Ross used Microsoft Access to quickly design a
forms-based user interface that shows off all aspects of the OLE control,
including its properties, methods, and events; see Figure 1.
Ross provided an extensive narrative describing the implementation process,
available with the electronic form of the listings. The following excerpt
describes his approach:
To start an OLE Custom Control, the first step is to describe it to the
Control Wizard [in Visual C++]. The Wizard then generates much of the code
necessary to implement the control. For the purposes of this document, Visual
C++ Version 2.0 will be discussed, although exactly the same steps work for
Visual C++ 1.5 (with the OLE Custom Control Development Kit installed).
 Before adding any implementation at all, you need to define the interface to
your OLE Custom Control. This is most easily done using the Class Wizard.
[After adding the methods GetNameFromNumber and GetNumberFromName], the
properties are added using ClassWizard as well... The last remaining part of
the interface to this control is to add an event that is triggered when the
user changes selection in the listbox. This [NameNumberChanged] event will not
only notify the container that the selection has changed, but will also
save the container some time by passing the new name and number as event
parameters.
This OLE control implementation consists of many files, totaling about 900
lines of code, a nine-fold increase over the original C-language source.
However, Ross emphasizes, "There are only 34 lines of user-supplied code to
provide the encapsulation of the phone database in an OLE Custom Control
object (35, including the declaration of OnDrawMetafile in DDJDECTL.H)." The
implementation for the methods, events, and properties in this control resides
entirely in the file DDJDECTL.CPP (see Listing Ten), which indicates the lines
that were manually added. Lines added without using the Wizard tools are
denoted using the ==> symbol at the left margin. Example 5 presents the ODL
file that is the source for the MKTYPLIB tool used to produce a TLB type
library file. This type library file then becomes a resource for the OLE
control.


Conclusion


The technologies covered in this article comprise the two principal contenders
in the interoperable-object wars at present. Although the base code that DDJ
provided to vendors was designed to be representative of much larger-scale
projects, drawing conclusions from such a small program is still a bit risky.
Evaluating a platform technology requires a certain amount of "living
together" over a period of time. Even so, it seems apparent that, working at
the object-model level, COM demands a bit more effort on the part of the
programmer than SOM. However, as you can see from the listings, both examples
did require a substantial rewrite of the "legacy code" provided to
participants.
It is not clear how much, if at all, a programmer will be working at the
object-model level. The ultimate goal of interoperable-object computing is
compound documents that provide rich functionality by way of embedded
components. From the articles in this Special Report, it appears that these
technologies are complex enough that you would not undertake an implementation
of compound documents without a tool or framework. 
This is where the view gets murkier, because many of the higher-level tools or
frameworks have not yet been released. And where some of these technologies
fit needs clarification. For example, both OpenDoc and TalAE are
compound-document technologies backed by IBM, and both use SOM as the
underlying object model. Both are scheduled for initial release in the same
time frame (early 1995). Yet, the connection between these two systems seems
tenuous at best. Both cannot prevail, so choosing one over the other means no
more than a 50 percent chance of picking the winner.
And then there's OLE. With its introduction of the OLE suite of technologies,
Microsoft has dramatically increased the burden that rests on the shoulders of
Windows programmers. Most observers agree that the size and complexity of the
OLE programming interface is at least equivalent to that of the Windows 3 API. Kraig
Brockschmidt's introductory book on OLE runs almost 1000 pages, and does not
even get into the subject of OLE Automation. However, it seems that Microsoft
has learned valuable lessons from the early days of Windows programming, when
developers had to chip away at a monolithic API with only the crudest of
implements. The visual implementation of the phone-directory database
presented in this article is a striking example of the power of Microsoft's
tool suite.
Nevertheless, there remains a lingering uneasy feeling that the 35 lines of
manually written code might be precariously perched on a pyramid that consists
of an estimated 25,000 lines of OLE-specific code in MFC, plus an estimated
55,000 lines of non-OLE-specific MFC code. Microsoft states that developers
should choose Visual C++ as an OLE implementation vehicle because MFC contains
tens of thousands of lines of C++ code "that you don't have to write."
However, if there's a bug, you might not escape having to trace through (and
thoroughly understand) this large body of code. 
In a recent presentation to the Software Entrepreneur's Forum (SEF) of Silicon
Valley, Mark Ryland of Microsoft said that Microsoft has "bet the company on
OLE." If you follow the computer industry, you can see that all sectors of the
company have been marshaled to work on, implement, apply, support, and
evangelize about OLE. In some cases, the pervasiveness of OLE has been
overstated by Microsoft evangelists (for example, the extent to which OLE
services are used in the Windows 95 environment is something of a myth, as
revealed in "A Milestone on the Road to Chicago," Dr. Dobb's Developer Update,
August 1994).
If sheer body mass were the sole indicator of which side will prevail in the
platform wars, Microsoft would be the uncontested winner. One small indicator
of Microsoft's commitment to OLE and COM is its level of participation in this
Special Report, which outweighed all rival efforts by a factor of four or
five. This fact has little to do with technology, but merits some
consideration if you are making a strategic platform decision. In his
presentation, Mark Ryland also said, "OLE will prevail because the history of
the computing industry shows that the first plausible solution wins."
Microsoft clearly believes it has such a solution. Whether you buy this
argument depends on how you define "plausible," and on how much that
definition has to do with business issues, as well as technology issues.
Figure 1 A front end to the phone-directory component built with OLE controls
and Microsoft Access.
Table 1: Interface spec for the one-minute phone-directory database.
 Function Purpose 
 Initialize() Called at program startup to initialize subsystem.
 LookupByName() Given a name, returns corresponding phone number.
 LookupByNumber() Given a number, returns corresponding customer name.
 Terminate() Called at program termination to do cleanup.
Example 1: The procedural interface to the phone directory, as declared in a C
header file.
public bool entrypoint phonedir_Initialize (void);
public lpstr entrypoint phonedir_LookupByName (lpstr name);
public lpstr entrypoint phonedir_LookupByNumber (lpstr number);
public void entrypoint phonedir_Terminate (void);
Example 2: Declaration of phone directory interface in Microsoft IDL.
import "unknwn.idl";
[ object,
 uuid(c4910d71-ba7d-11cd-94e8-08001701a8a3),
 pointer_default(unique)
]
interface ILookup : IUnknown
{
 HRESULT LookupByName( [in] LPTSTR lpName,
 [out, string] WCHAR **lplpNumber);
 HRESULT LookupByNumber( [in] LPTSTR lpNumber,
 [out, string] WCHAR ** lplpName);
}
Example 3: Declaration of server class in COM implementation. 
class CPDSvrObj : public ILookup
{

private:
 int m_nCount; // reference count
 CPDSvrApp FAR * m_lpApp ; // pointer to app object so we can
 // tell it when we've been destroyed.
 DWORD m_dwRegister; // Registered in ROT
 record theDatabase[MAX_RECORDS]; // phone book
public:
 // IUnknown methods
 STDMETHODIMP QueryInterface (REFIID riid, LPVOID FAR* ppvObj);
 STDMETHODIMP_(ULONG) AddRef ();
 STDMETHODIMP_(ULONG) Release ();
 // IPhoneDir methods
 STDMETHODIMP LookupByName (LPTSTR lpName, TCHAR **lplpNumber);
 STDMETHODIMP LookupByNumber (LPTSTR lpNumber, TCHAR **lplpName);
 // construction/destruction
 CPDSvrObj(CPDSvrApp FAR * lpApp);
 virtual ~CPDSvrObj();
 // utility functions
 BOOL Initialize (void);
 void CreateRecord (int i,LPTSTR lpName,LPTSTR lpNumber);
};
Example 4: Declaration of phone-directory interface in SOM IDL. 
interface PhoneDir : SOMObject
{
#ifdef __PRIVATE__ // Phone directory implementation details
 struct record {
 string name;
 string phone_number;
 };
 const long MAX_RECORDS = 5;
#endif
 // Operations on a PhoneDir
 string LookupByName( in string name ); // given a name, return number
 string LookupByNumber(in string number); // given number, return name
 void Initialize(inout somInitCtrl ctrl); // Object initializer.
#ifdef __PRIVATE__
 // Return a new phone directory record, given the name and number
 record CreateRecord(in string name, in string phone_number);
#endif
#ifdef __SOMIDL__
 implementation {
 // Class modifiers:
 // releaseorder is for upward compatible release management.
 releaseorder: LookupByName,
 LookupByNumber,
 Initialize,
#ifdef __PRIVATE__
 CreateRecord;
#else
 Internal1;
#endif
 memory_management = corba; // caller owns returned memory
 function_prefix = phonedir_; // language bindings directive
 dllname = "phonedir.dll"; // class library
#ifdef __PRIVATE__
 // Phone directory implementation details
 sequence<record, MAX_RECORDS> theDatabase;
#endif
 // Method modifiers:

 Initialize: init; // this method is an initializer
 somDefaultInit: override, init; // default initializer
 };
#endif /* __SOMIDL__ */
};
Example 5: Interface specification used by OLE Custom Control implementation.
[ uuid(AF3B752C-89D0-101B-A6E4-00DD0111A658), version(1.0),
 helpstring("Ddjdemo OLE Custom Control module")
]
library DdjdemoLib
{
 importlib(STDOLE_TLB);
 importlib(STDTYPE_TLB);
 // Primary dispatch interface for CDdjdemoCtrl
 [ uuid(AF3B752A-89D0-101B-A6E4-00DD0111A658),
 helpstring("Dispatch interface for Ddjdemo Control") ]
 dispinterface _DDdjdemo
 {
 properties:
 // NOTE - ClassWizard will maintain property information here.
 // Use extreme caution when editing this section.
 //{{AFX_ODL_PROP(CDdjdemoCtrl)
 [id(1)] BSTR CurrentName;
 [id(2)] BSTR CurrentNumber;
 //}}AFX_ODL_PROP
 methods:
 // NOTE - ClassWizard will maintain method information here.
 // Use extreme caution when editing this section.
 //{{AFX_ODL_METHOD(CDdjdemoCtrl)
 [id(3)] BSTR GetNameFromNumber(BSTR szNumber);
 [id(4)] BSTR GetNumberFromName(BSTR szName);
 //}}AFX_ODL_METHOD
 [id(DISPID_ABOUTBOX)] void AboutBox();
 };
 // Event dispatch interface for CDdjdemoCtrl
 [ uuid(AF3B752B-89D0-101B-A6E4-00DD0111A658),
 helpstring("Event interface for Ddjdemo Control") ]
 dispinterface _DDdjdemoEvents
 {
 properties:
 // Event interface has no properties
 methods:
 // NOTE - ClassWizard will maintain event information here.
 // Use extreme caution when editing this section.
 //{{AFX_ODL_EVENT(CDdjdemoCtrl)
 [id(1)] void NameNumberChanged(BSTR szName, BSTR szNumber);
 //}}AFX_ODL_EVENT
 };
 // Class information for CDdjdemoCtrl
 [ uuid(AF3B7529-89D0-101B-A6E4-00DD0111A658),
 helpstring("Ddjdemo Control") ]
 coclass Ddjdemo
 {
 [default] dispinterface _DDdjdemo;
 [default, source] dispinterface _DDdjdemoEvents;
 };
 //{{AFX_APPEND_ODL}}
};


Listing One 
/****************************************************************
 > PHONEDIR.H -- Header file for PHONEDIR, the one-minute phone 
 > directory database. by Ray Valdes.
 >***************************************************************/

/****************************************************************
 > The following are some generically useful #defines and typedefs,
 > set in all lowercase just to be different.
 >***************************************************************/
#ifndef entrypoint
#define entrypoint _far pascal
#define public
#define private static
#define nil 0
typedef char _far * lpstr;
typedef int bool;
#define true 1
#define false 0
#endif

/******************This is the PhoneDir API***********************/

public bool entrypoint phonedir_Initialize (void);
public lpstr entrypoint phonedir_LookupByName (lpstr name);
public lpstr entrypoint phonedir_LookupByNumber (lpstr number);
public void entrypoint phonedir_Terminate (void);

/*******************End of PHONEDIR.H****************************/


Listing Two
/**********************************************************************
 > PHONEDIR.C --the one-minute phone directory database. by Ray Valdes.
 >*********************************************************************/

#include <string.h> 
#include "phonedir.h"

/****************************************************************
 > This sets up the database structure, a fixed size array of
 > fixed size records in memory, initialized at startup-time
 > by hard-coded program statements (can this get any simpler?)
 >***************************************************************/
typedef struct
{ lpstr name;
 lpstr phone_number;
} record;

#define MAX_RECORDS 5
static record theDatabase[MAX_RECORDS];

/****************************************************************/
private void phonedir_CreateRecord(int arrayindex,lpstr name,lpstr phone);

/****************************************************************/
public bool entrypoint 
phonedir_Initialize(void)
{ phonedir_CreateRecord(0,"Daffy Duck", "310-555-1212");

 phonedir_CreateRecord(1,"Wile E. Coyote", "408-555-1212");
 phonedir_CreateRecord(2,"Scrooge McDuck", "206-555-1212");
 phonedir_CreateRecord(3,"Huey Lewis", "415-555-1212");
 phonedir_CreateRecord(4,"Thomas Dewey", "617-555-1212");
 return true; /* success */
}
/****************************************************************/
private void 
phonedir_CreateRecord(int i,lpstr name,lpstr phone_number)
{ theDatabase[i].name = name;
 theDatabase[i].phone_number = phone_number;
}
/****************************************************************/
public lpstr entrypoint 
phonedir_LookupByName(lpstr name)
{ int i;
 for(i=0; i < MAX_RECORDS; i++)
 { if(_fstrcmp(theDatabase[i].name,name)==0)
 return theDatabase[i].phone_number;
 }
 return nil;
}
/****************************************************************/
public lpstr entrypoint 
phonedir_LookupByNumber(lpstr number)
{
 int i;
 for(i=0; i < MAX_RECORDS; i++)
 { if(_fstrcmp(theDatabase[i].phone_number,number)==0)
 return theDatabase[i].name;
 }
 return nil;
}
/****************************************************************/
public void entrypoint 
phonedir_Terminate(void)
{ return;
}
/*********************End of PHONEDIR.C**************************/


Listing Three
/****************************************************************
 > MAIN.C -- Sample client for the one-minute phone directory. by Ray Valdes.
 >***************************************************************/


#include <stdio.h>
#include "phonedir.h"

int main(int argc,char*argv[])
{
 char *name,*number;
 (void) phonedir_Initialize();

 // do a lookup by name
 name = "John Doe"; number = phonedir_LookupByName(name);
 if(number) printf("%s's number is %s.\n",name,number);
 else printf("%s does not have a number listed.\n",name);

 // do a lookup by number
 number = "408-555-1212"; name = phonedir_LookupByNumber(number);

 if(name) printf("%s's number is %s.\n",name,number);
 else printf("The phone number %s has not been assigned.\n",number);

 phonedir_Terminate();
 return 0;
}
/********************End of MAIN.C*************************************/


Listing Four
//**********************************************************************
// CLIENT.C -- a Windows client for phone directory database (excerpted)
//**********************************************************************

// This is your basic WinMain routine, plus OLE lib initialization
int APIENTRY WinMain(HINSTANCE hInstance,
 HINSTANCE hPrevInstance,LPSTR lpCmdLine,int nCmdShow)
{
 MSG msg;
 if (!hPrevInstance && !InitApplication(hInstance))
 return FALSE; 
 // see if we are compatible with this version of the OLE libraries
 DWORD dwVer = OleBuildVersion();
 if (HIWORD(dwVer) != rmm || LOWORD(dwVer) < rup) 
 return FALSE;
 if (NOERROR == OleInitialize(NULL)) // initialize the OLE libraries
 fOleInitialized = TRUE;
 if (!InitInstance(hInstance, nCmdShow)) 
 return (FALSE);
 while (GetMessage(&msg, NULL, 0,0)) 
 {
 TranslateMessage(&msg);
 DispatchMessage(&msg); 
 }
 if (fOleInitialized) OleUninitialize();
 return (msg.wParam);
}
//********************************************************************
LRESULT CALLBACK WndProc(HWND hWnd,
UINT message,WPARAM uParam,LPARAM lParam) 
{
 switch (message) 
 {
 case WM_COMMAND: // message: command from application menu
 switch (LOWORD(uParam)) 
 {
 case ID_EXIT: DestroyWindow (hWnd);// exit application
 break;
 case ID_CONNECT: // Connect to the phone book server
 {
 HRESULT hRes;
 // Create an instance of the phone book app.
 // Normally, we would query the registration database
 // for the CLSID; however, for the sake of simplicity
 // for this sample, we've hard-coded it.
 hRes = CoCreateInstance(&CLSID_PHONEBOOK,
 NULL, CLSCTX_SERVER,&IID_ILookup,&pLookup);
 if (SUCCEEDED(hRes))
 {
 MessageBox(hWnd, TEXT("Connected"), 

 TEXT("CoCreateInstance"), MB_OK);
 fConnected = TRUE; // we've got a pLookup pointer to use
 }
 else 
 MessageBox(hWnd, TEXT("Failure"), 
 TEXT("CoCreateInstance"), MB_OK);
 }
 break;
 case ID_LOOKUPBYNAME: 
 case ID_LOOKUPBYNUM:
 {
 TCHAR *ptszFound; // returned string from method call
 TCHAR ptszInput[MAXBUFF]; // pass to ILookupByName/Number
 TCHAR ptszResBuff[MAXBUFF * 3]; // results string
 BOOL fByName = (ID_LOOKUPBYNAME == LOWORD(uParam));
 BOOL fOK; 
 HRESULT hRes;
 FARPROC lpProcFind;
 FINDDLGINFO fdInfo;
 // initialize structure to pass to DialogBoxParam
 fdInfo.ptszNameNum = ptszInput;
 fdInfo.uDlgType = LOWORD(uParam);
 // Get input from user
 lpProcFind = MakeProcInstance((FARPROC)Find, hInst);
 fOK = DialogBoxParam(hInst, TEXT("FindDialog"), 
 hWnd, (DLGPROC)lpProcFind, (LPARAM)&fdInfo); 
 FreeProcInstance(lpProcFind);
 if (!fOK) // user cancelled dialog
 break;
 // Call ILookupByName or ILookupByNumber
 // ILookup_<method> are macros generated by the MIDL compiler 
 // They are not necessary; they are just provided for convenience. 
 // They expand to pLookup->lpVtbl-><method>(pLookup, <args>)
 if (fByName)
 hRes=ILookup_LookupByName(pLookup,ptszInput,&ptszFound);
 else
 hRes=ILookup_LookupByNumber(pLookup,ptszInput,&ptszFound);
 if (FAILED(hRes)) // Call Failed
 {
 MessageBox(hWnd, TEXT("Failure"), fByName ? 
 TEXT("LookupByName"):TEXT("LookupByNumber"), MB_OK);
 break;
 }
 // Call succeeded, but string user entered wasn't in database
 if (S_FALSE == hRes) // entry not found in database
 ptszFound = ptszNotFound;
 wsprintf(ptszResBuff, // Format output
 TEXT("Name: %s\r\nPhone Number: %s"),
 (fByName ? ptszInput : ptszFound),
 (fByName ? ptszFound : ptszInput));
 // Display results to user
 MessageBox(hWnd, ptszResBuff, TEXT("Results"), MB_OK);
 if (ptszFound == ptszNotFound)
 { ptszFound = NULL;
 break;
 }
 if (NULL == pMalloc)// Free the memory passed to us.
 { hRes = CoGetMalloc(MEMCTX_TASK, &pMalloc);
 if (FAILED(hRes))

 break;
 }
 pMalloc->lpVtbl->Free(pMalloc, ptszFound);
 }
 break;
 default: return (DefWindowProc(hWnd, message, uParam, lParam));
 }
 break;
 case WM_DESTROY: 
 if (pMalloc)
 pMalloc->lpVtbl->Release(pMalloc); // release the IMalloc pointer
 if (pLookup)
 ILookup_Release(pLookup); // release ptr to the phonebook object
 PostQuitMessage(0);
 break;
 default: return DefWindowProc(hWnd, message, uParam, lParam);
 }
 return (0);
}


Listing Five
//**********************************************************************
// APP.CPP --- Implementation of the CPDSvrApp Class. by Sara Williams.
//**********************************************************************


//**********************************************************************
// CPDSvrApp::CPDSvrApp() --- Constructor for CPDSvrApp
//********************************************************************
CPDSvrApp :: CPDSvrApp()
{ m_nObjCount = 0; // Initialize member variables
 m_fInitialized = FALSE;
}

//**********************************************************************
// CPDSvrApp::~CPDSvrApp() --- Destructor for CPDSvrApp Class
//********************************************************************
CPDSvrApp :: ~CPDSvrApp()
{
 CoRevokeClassObject( m_dwRegisterClass ) ; // Unregister our class
 if (m_fInitialized) OleUninitialize(); // Uninitialize OLE libs
}

// ObjectCreated and ObjectDestroyed are useful for apps that support 
// multiple objects. Ours doesn't, so they are convenient but not necessary.
void CPDSvrApp :: ObjectCreated() 
{ m_nObjCount++ ; }

void CPDSvrApp :: ObjectDestroyed() 
{ m_nObjCount-- ;
 if (m_nObjCount == 0) PostQuitMessage(0) ;
}

//**********************************************************************
// CPDSvrApp::fInitInstance --- Instance initialization
//********************************************************************
BOOL CPDSvrApp :: fInitInstance (HANDLE hInstance, int nCmdShow, 
 CClassFactory FAR * lpClassFactory)

{
 DWORD dwVer = OleBuildVersion(); // Get current running OLE version

 // make sure app was built with compatible version
 if (HIWORD(dwVer) != rmm || LOWORD(dwVer) < rup)
 OutputDebugString(TEXT("Not compatible with current libs!\r\n"));
 if (OleInitialize(NULL) == NOERROR) // initialize the libraries
 m_fInitialized = TRUE;
 // Create an instance of our class factory object; we pass this
 // pointer to CoRegisterClassObject.
 lpClassFactory = new CClassFactory(this);

 // inc our ref count to hold the CF alive during CoRegisterClassObject
 lpClassFactory->AddRef();

 // Register our class factory with COM so that instances of our
 // class can be created.
 CoRegisterClassObject(CLSID_PHONEBOOK,
 (IUnknown FAR *)lpClassFactory, 
 CLSCTX_LOCAL_SERVER, 
 REGCLS_SINGLEUSE, 
 &m_dwRegisterClass);
 lpClassFactory->Release(); // match our AddRef
 return m_fInitialized;
}
//**********************************************************************
int PASCAL WinMain(HANDLE hInstance,HANDLE hPrevInstance,
 LPSTR lpCmdLine,int nCmdShow)

{ MSG msg;
 CPDSvrApp FAR * lpCPDSvrApp;
 CClassFactory FAR * lpClassFactory;
 BOOL fContinue = TRUE;

 lpCPDSvrApp = new CPDSvrApp; // Create new instance of application object

 // instance initialization
 if (!lpCPDSvrApp->fInitInstance(hInstance, nCmdShow, lpClassFactory))
 return (FALSE);
 while (fContinue) // message loop
 { while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
 { if (WM_QUIT == msg.message)
 { fContinue = FALSE;
 break;
 }
 TranslateMessage(&msg); /* Translates virtual key codes */
 DispatchMessage(&msg); /* Dispatches message to window */
 }
 }
 delete lpCPDSvrApp ; // Delete our app object
 return (msg.wParam); /* Returns the value from PostQuitMessage */
}


Listing Six
//**********************************************************************
// OBJ.CPP -- Implementation of the CPDSvrObj Class. by Sara Williams.
//**********************************************************************



typedef ILookup * LPLOOKUP;

//**********************************************************************
// CPDSvrObj::QueryInterface
// Purpose: Used for interface negotiation at the "Object" level.
// Params: REFIID riid -- A reference to the interface being queried.
// LPVOID FAR* ppvObj -- Out param returns a ptr to interface.
// Returns: S_OK (if interface is supported) or E_NOINTERFACE.
//********************************************************************
STDMETHODIMP CPDSvrObj::QueryInterface ( REFIID riid, LPVOID FAR* ppvObj)
{ SCODE sc = S_OK;
 if (IsEqualIID(riid, IID_IUnknown)) // asking for IUnknown
 *ppvObj = (LPUNKNOWN)this;
 else if (IsEqualIID(riid, IID_ILookup)) // asking for ILookup
 *ppvObj = (LPLOOKUP)this;
 else { // asking for something we don't implement
 *ppvObj = NULL;
 sc = E_NOINTERFACE;
 }
 if (*ppvObj) ((LPUNKNOWN)*ppvObj)->AddRef(); // increment ref count
 return ResultFromScode( sc );
};

//**********************************************************************
// CPDSvrObj::AddRef --- Increments the object's reference count.
//********************************************************************
STDMETHODIMP_(ULONG) CPDSvrObj::AddRef ()
{ return ++m_nCount;
};

//**********************************************************************
// CPDSvrObj::Release --- Decrements the object's reference count
//********************************************************************

STDMETHODIMP_(ULONG) CPDSvrObj::Release ()
{ // if ref count is zero, then we can safely unload 
 if (--m_nCount == 0)
 { m_lpApp->m_lpObj = NULL ;
 m_lpApp->ObjectDestroyed() ; 
 delete this;
 return 0;
 }
 return m_nCount;
}

//**********************************************************************
// LookupByName --- Given a name, return the corresponding phone number
//********************************************************************
STDMETHODIMP CPDSvrObj::LookupByName(LPTSTR lpName, TCHAR ** lplpNumber)
{ int i;
 LPMALLOC pMalloc;
 HRESULT hRes;

 *lplpNumber = NULL;

 for(i=0; i < MAX_RECORDS; i++)
 {
 if(_tcscmp(theDatabase[i].name,lpName)==0)

 {
 hRes = CoGetMalloc(MEMCTX_TASK, &pMalloc);
 if (FAILED(hRes))
 return ResultFromScode(E_FAIL);
 *lplpNumber = (LPTSTR)pMalloc->Alloc(25*sizeof(TCHAR));
 pMalloc->Release(); // done with the task allocator
 if (*lplpNumber == NULL) // allocation failed
 return ResultFromScode(E_OUTOFMEMORY);
 _tcscpy(*lplpNumber, theDatabase[i].phone_number);
 return ResultFromScode(S_OK);
 }
 }
 return ResultFromScode(S_FALSE);
}

//**********************************************************************
// LookupByNumber -- Given a phone number, return corresponding customer name
//********************************************************************
STDMETHODIMP CPDSvrObj::LookupByNumber(LPTSTR lpNumber, TCHAR ** lplpName)
{ int i;
 LPMALLOC pMalloc;
 HRESULT hRes;

 *lplpName = NULL;

 for(i=0; i < MAX_RECORDS; i++)
 { if(_tcscmp(theDatabase[i].phone_number,lpNumber)==0)
 { hRes = CoGetMalloc(MEMCTX_TASK, &pMalloc);
 if (FAILED(hRes))
 return ResultFromScode(E_FAIL);
 *lplpName = (LPTSTR)pMalloc->Alloc(25*sizeof(TCHAR));
 pMalloc->Release(); // done with the task allocator
 if (*lplpName == NULL) // allocation failed
 return ResultFromScode(E_OUTOFMEMORY);
 _tcscpy(*lplpName, theDatabase[i].name);
 return ResultFromScode(S_OK);
 }
 }
 return ResultFromScode(S_FALSE);
}
//**********************************************************************
// CPDSvrObj::CPDSvrObj --- Constructor for CPDSvrObj
//********************************************************************
CPDSvrObj::CPDSvrObj(CPDSvrApp FAR * lpApp) 
{ m_nCount = 0;
 m_lpApp = lpApp ;
 m_dwRegister = 0; 
 Initialize(); // initialize phone book database
}

//**********************************************************************
// CPDSvrObj::~CPDSvrObj --- Destructor for CPDSvrObj
//********************************************************************
CPDSvrObj::~CPDSvrObj()
{
 OutputDebugString(TEXT("In CPDSvrObj's Destructor \r\n"));
}
//**********************************************************************
// Initialize-- helper function to initialize phone directory database
//**********************************************************************
BOOL CPDSvrObj::Initialize(void)

{
 CreateRecord(0,TEXT("Daffy Duck"), TEXT("310-555-1212"));
 CreateRecord(1,TEXT("Wile E. Coyote"), TEXT("408-555-1212"));
 CreateRecord(2,TEXT("Scrooge McDuck"), TEXT("206-555-1212"));
 CreateRecord(3,TEXT("Huey Lewis"), TEXT("415-555-1212"));
 CreateRecord(4,TEXT("Thomas Dewey"), TEXT("617-555-1212"));
 return TRUE; /* success */
}
//**********************************************************************
// CreateRecord--- helper function to set up phone directory database
//**********************************************************************
void CPDSvrObj::CreateRecord(int i,LPTSTR lpName, LPTSTR lpNumber)
{ theDatabase[i].name = lpName;
 theDatabase[i].phone_number = lpNumber;
}


Listing Seven
//**********************************************************************
// ICF.CPP -- Implementation file for the CClassFactory Class
// by Sara Williams, Microsoft Corporation.
//**********************************************************************


//**********************************************************************
// CClassFactory::QueryInterface
// Params: REFIID riid - Interface being queried for.
// LPVOID FAR *ppvObj - Out pointer for the interface.
// Returns: S_OK if success, else E_NOINTERFACE
//********************************************************************
STDMETHODIMP CClassFactory::QueryInterface ( REFIID riid, LPVOID FAR* ppvObj)
{
 SCODE sc = S_OK;
 // return pointer to interfaces we support
 if ((riid == IID_IUnknown) || (riid == IID_IClassFactory))
 *ppvObj = this;
 else // request for interface we don't support
 { *ppvObj = NULL;
 sc = E_NOINTERFACE;
 }
 if (*ppvObj) ((LPUNKNOWN)*ppvObj)->AddRef();
 return ResultFromScode(sc); // pass it on to the Application object
};

//**********************************************************************
// CClassFactory::AddRef
// Purpose: Increments the ref count on CClassFactory object.
//********************************************************************
STDMETHODIMP_(ULONG) CClassFactory::AddRef ()
{ return ++m_nCount;
};

//**********************************************************************
// CClassFactory::Release
// Purpose: Decrements the ref count of CClassFactory object
//********************************************************************
STDMETHODIMP_(ULONG) CClassFactory::Release ()
{ if (--m_nCount == 0) // our ref count is 0; we can now free ourself
 {

 delete this;
 return 0;
 }
 return m_nCount;
};

//**********************************************************************
// CClassFactory::CreateInstance
// Purpose: Instantiates a new OLE object
// Parameters:
// LPUNKNOWN pUnkOuter - Pointer to the controlling unknown
// REFIID riid - The interface type to fill in ppvObject
// LPVOID FAR* ppvObject - Out pointer for the object
// Return Value:
// S_OK - Creation was successful
// CLASS_E_NOAGGREGATION - Tried to be created as part of an aggregate
// CLASS_E_CLASSNOTAVAILABLE - Tried to create a second object; 
// but we only support 1.
// E_FAIL - Creation failed
//********************************************************************
STDMETHODIMP CClassFactory::CreateInstance ( LPUNKNOWN pUnkOuter,
 REFIID riid,
 LPVOID FAR* ppvObject)
{
 HRESULT hErr = ResultFromScode(E_FAIL);
 *ppvObject = NULL; // need to NULL the out parameter

 // We can only have one instance. Thus we must fail this call to
 // CreateInstance
 if (m_lpApp->m_lpObj != NULL)
 return ResultFromScode(CLASS_E_CLASSNOTAVAILABLE);
 if (pUnkOuter) // we don't support aggregation...
 return ResultFromScode(CLASS_E_NOAGGREGATION);
 m_lpApp->m_lpObj = new CPDSvrObj( m_lpApp ); // create a new object
 m_lpApp->ObjectCreated();
 if (m_lpApp->m_lpObj) // get requested interface
 hErr = m_lpApp->m_lpObj->QueryInterface(riid, ppvObject);
 if (FAILED(hErr))
 { delete m_lpApp->m_lpObj ;
 m_lpApp->m_lpObj = NULL ;
 }
 return hErr;
};

//**********************************************************************
// CClassFactory::LockServer
// Params: BOOL fLock -- TRUE to lock the server, FALSE to unlock it
//********************************************************************
STDMETHODIMP CClassFactory::LockServer ( BOOL fLock)
{
 CoLockObjectExternal(this, fLock, TRUE);
 return ResultFromScode( S_OK);
};


Listing Eight
/*************************************************************************
 > MAIN.CPP -- a SOM/DSOM client program. by Charles Erickson, IBM Corp.
 >***********************************************************************/


#include <stdlib.h>
#include <stdio.h>
#include <somd.xh>
#include "PhoneDir.pxh" // Include the public definition of the class.

// Macros for checking for exceptions.
#define EV_OK(ev) ((ev)->_major == NO_EXCEPTION)
#define EV_NOT_OK(ev) ((ev)->_major != NO_EXCEPTION)

// Prototypes
static SOMObject * createObject(SOMClass *cls);
static void freeReturnedMem(void *mem);
static void freeObject(SOMObject *obj);
static SOMClass * PhoneDirInitialize();

// Static globals
static boolean isdsom = FALSE; // assume object local
/*****************************************************************/
int main (int argc, char *argv[])
{
 SOMClass *phoneDirClass;
 PhoneDir *phoneDir;
 Environment *ev;
 string name, number;

 ev = SOM_CreateLocalEnvironment();

 // Get the phone directory class object. The class object
 // is used to create instances of the PhoneDir class.
 phoneDirClass = PhoneDirInitialize();

 // Create an instance of the PhoneDir class.
 phoneDir = (PhoneDir*)createObject(phoneDirClass);

 // Initialize the object instance.
 phoneDir->Initialize(ev, NULL);

 // Search the phone directory....
 name = "John Doe";
 number = phoneDir->LookupByName(ev, name);
 if (number) {
 printf("%s's number is %s.\n", name, number);
 freeReturnedMem(number);
 } else
 printf("%s does not have a number listed.\n", name);

 number = "408-555-1212";
 name = phoneDir->LookupByNumber(ev, number);
 if (name) {
 printf("%s's number is %s.\n", name, number);
 freeReturnedMem(name);
 } else
 printf("The phone number %s has not been assigned.\n", number);

 // Destroy the phone directory object.
 freeObject(phoneDir);
 SOM_DestroyLocalEnvironment(ev);
 return (0);

}
/*****************************************************************/
static SOMObject * createObject(SOMClass *cls)
{ return(cls->somNewNoInit());
}
/*****************************************************************/
static SOMClass * PhoneDirInitialize()
{ Environment *ev;
 SOMDServer *svr;
 SOMClass *cls, *dcls;

 ev = SOM_CreateLocalEnvironment();
 cls = PhoneDirNewClass(0, 0);
 SOMD_Init(ev);
 if (EV_OK(ev)) {
 svr = SOMD_ObjectMgr->somdFindAnyServerByClass(ev, "PhoneDir");
 if (svr && EV_OK(ev)) {
 dcls = svr->somdGetClassObj(ev, "PhoneDir");
 if (EV_OK(ev)) {
 cls = dcls;
 isdsom = TRUE;
 }
 }
 }
 SOM_DestroyLocalEnvironment(ev);
 return (cls);
}
/*****************************************************************/
static void freeReturnedMem(void *mem)
{ if (mem) 
 { if (isdsom) ORBfree(mem);
 else SOMFree(mem);
 }
}
/*****************************************************************/
static void freeObject(SOMObject *obj)
{ Environment ev;
 SOM_InitEnvironment(&ev);
 if (obj) {
 if (isdsom) SOMD_ObjectMgr->somdDestroyObject(&ev, obj);
 else obj->somFree();
 }
}


Listing Nine
/************************************************************************
 > PHONEDIR.CPP -- SOM phone directory server. by Charles Erickson, IBM.
 >***********************************************************************/

#define PhoneDir_Class_Source
#define VARIABLE_MACROS // Access instance data via a macro

#include "phonedir.xih"
#include <string.h>

// The following function is used by the SOM runtime to initialize
// the PhoneDir class when dynamically loaded.
SOMEXTERN void SOMLINK SOMInitModule(integer4 majorVersion, 
 integer4 minorVersion, string ignore)
{
 PhoneDirNewClass(PhoneDir_MajorVersion, PhoneDir_MinorVersion);
}
/*****************************************************************/
SOM_Scope string SOMLINK LookupByName(PhoneDir *somSelf, 
 Environment *ev, string name)
{ int i;
 string rname, rnumber, number = (string)NULL;
 PhoneDirData *somThis = PhoneDirGetData(somSelf);
 PhoneDirMethodDebug("PhoneDir","LookupByName");

 for (i=0; i<MAX_RECORDS; i++) {
 rname = sequenceElement(_theDatabase,i).name;
 rnumber = sequenceElement(_theDatabase,i).phone_number;
 if (strcmp(rname, name)==0) {
 number = strcpy((string)SOMMalloc(strlen(rnumber)+1), rnumber);
 break;
 }
 }
 return (number);
}
/*****************************************************************/
SOM_Scope string SOMLINK LookupByNumber(PhoneDir *somSelf, 
 Environment *ev, string number)
{ int i;
 string rname, rnumber, name = (string)NULL;
 PhoneDirData *somThis = PhoneDirGetData(somSelf);
 PhoneDirMethodDebug("PhoneDir","LookupByNumber");

 for (i=0; i<MAX_RECORDS; i++) {
 rname = sequenceElement(_theDatabase,i).name;
 rnumber = sequenceElement(_theDatabase,i).phone_number;
 if (strcmp(rnumber, number)==0) {
 name = strcpy((string)SOMMalloc(strlen(rname)+1), rname);
 break;
 }
 }
 return (name);
}
/*****************************************************************/
SOM_Scope void SOMLINK Initialize(PhoneDir *somSelf, 
 Environment *ev, somInitCtrl* ctrl)
{ /* This function is the object initializer */
 PhoneDirData *somThis; /* set in BeginInitializer */
 somInitCtrl globalCtrl;
 somBooleanVector myMask;
 PhoneDirMethodDebug("PhoneDir","Initialize");
 PhoneDir_BeginInitializer_Initialize;

 PhoneDir_Init_SOMObject_somDefaultInit(somSelf, ctrl);
 //local PhoneDir initialization code added by programmer
 sequenceLength( _theDatabase) = MAX_RECORDS;
 sequenceMaximum(_theDatabase) = MAX_RECORDS;
 _theDatabase._buffer =
 (PhoneDir_record*)SOMMalloc(sizeof(PhoneDir_record)*MAX_RECORDS);

 sequenceElement(_theDatabase, 0) =
 somSelf->CreateRecord(ev, "Daffy Duck", "310-555-1212");

 sequenceElement(_theDatabase, 1) =
 somSelf->CreateRecord(ev, "Wile E. Coyote", "408-555-1212");
 sequenceElement(_theDatabase, 2) =
 somSelf->CreateRecord(ev, "Scrooge McDuck", "206-555-1212");
 sequenceElement(_theDatabase, 3) =
 somSelf->CreateRecord(ev, "David Byrne", "415-555-1212");
 sequenceElement(_theDatabase, 4) =
 somSelf->CreateRecord(ev, "Thomas Dewey", "617-555-1212");
 /* Note: the fact that no exception was returned in the <ev> argument
 * indicates successful initialization.*/
}
/*****************************************************************/
SOM_Scope PhoneDir_record SOMLINK CreateRecord(PhoneDir *somSelf,
 Environment *ev, string name, string phone_number)
{
 // Return a new phone directory record given the name and number.
 PhoneDir_record retVal;
 PhoneDirData *somThis = PhoneDirGetData(somSelf);
 PhoneDirMethodDebug("PhoneDir","CreateRecord");
 retVal.name = name;
 retVal.phone_number = phone_number;
 return (retVal);
}
/*****************************************************************
 * The default initializer, called by default if no other is specified 
 * when the object is created.
 */
SOM_Scope void SOMLINK somDefaultInit(PhoneDir *somSelf, somInitCtrl* ctrl)
{ Environment *gev;
 PhoneDirData *somThis; /* set in BeginInitializer */
 somInitCtrl globalCtrl;
 somBooleanVector myMask;
 PhoneDirMethodDebug("PhoneDir","somDefaultInit");
 PhoneDir_BeginInitializer_somDefaultInit;
 PhoneDir_Init_SOMObject_somDefaultInit(somSelf, ctrl);
 //local PhoneDir initialization code added by programmer
 gev = somGetGlobalEnvironment();
 Initialize(somSelf, gev, ctrl);
}


Listing Ten
//**********************************************************************
 // ddjdectl.cpp -- The CDdjdemoCtrl OLE control class. by Steven Ross.
 // NOTE: in this listing, the lines that were manually inserted
 // into this code are prefixed with a "==>" at the left margin. Also,
 // minor reformatting has been done during production to conserve space.
 //**********************************************************************
 #include "stdafx.h"
 #include "ddjdemo.h"
 #include "ddjdectl.h"
 #include "ddjdeppg.h"
 #include "phonedir.h" // API for "one-minute phone directory" by R.Valdes
 
 #ifdef _DEBUG
 #undef THIS_FILE
 static char BASED_CODE THIS_FILE[] = __FILE__;
 #endif
 

 IMPLEMENT_DYNCREATE(CDdjdemoCtrl, COleControl)
 
 ////// Message map ///////////////////////////////////////////
 BEGIN_MESSAGE_MAP(CDdjdemoCtrl, COleControl)
 //{{AFX_MSG_MAP(CDdjdemoCtrl)
 ON_OLEVERB(IDS_PROPERTIESVERB, OnProperties)
 ON_MESSAGE(OCM_COMMAND, OnOcmCommand)
 ON_WM_CREATE()
 //}}AFX_MSG_MAP
 END_MESSAGE_MAP()
 
 ///// Dispatch map ////////////////////////////////////////////
 BEGIN_DISPATCH_MAP(CDdjdemoCtrl, COleControl)
 //{{AFX_DISPATCH_MAP(CDdjdemoCtrl)
 DISP_PROPERTY_EX(CDdjdemoCtrl, "CurrentName", GetCurrentName, 
 SetNotSupported, VT_BSTR)
 DISP_PROPERTY_EX(CDdjdemoCtrl, "CurrentNumber", GetCurrentNumber, 
 SetNotSupported, VT_BSTR)
 DISP_FUNCTION(CDdjdemoCtrl, "GetNameFromNumber", GetNameFromNumber, 
 VT_BSTR, VTS_BSTR)
 DISP_FUNCTION(CDdjdemoCtrl, "GetNumberFromName", GetNumberFromName, 
 VT_BSTR, VTS_BSTR)
 //}}AFX_DISPATCH_MAP
 DISP_FUNCTION_ID(CDdjdemoCtrl, "AboutBox", DISPID_ABOUTBOX, AboutBox, 
 VT_EMPTY, VTS_NONE)
 END_DISPATCH_MAP()
 
 ////// Event map ///////////////////////////////////////////
 BEGIN_EVENT_MAP(CDdjdemoCtrl, COleControl)
 //{{AFX_EVENT_MAP(CDdjdemoCtrl)
 EVENT_CUSTOM("NameNumberChanged", FireNameNumberChanged, VTS_BSTR 
 VTS_BSTR)
 //}}AFX_EVENT_MAP
 END_EVENT_MAP()
 
 ////// Property pages ///////////////////////////////////////////
 // TODO: Add more property pages as needed. Remember to increase the count!
 BEGIN_PROPPAGEIDS(CDdjdemoCtrl, 1)
 PROPPAGEID(CDdjdemoPropPage::guid)
 END_PROPPAGEIDS(CDdjdemoCtrl)
 
 ////// Initialize class factory and guid /////////////////////////////////
 IMPLEMENT_OLECREATE_EX(CDdjdemoCtrl, "DDJDEMO.DdjdemoCtrl.1",
 0xaf3b7529, 0x89d0, 0x101b, 0xa6, 0xe4, 0x0, 0xdd, 0x1, 0x11, 0xa6, 0x58)
 
 ///// Type library ID and version ////////////////////////////////////////
 IMPLEMENT_OLETYPELIB(CDdjdemoCtrl, _tlid, _wVerMajor, _wVerMinor)
 
 ////// Interface IDs ///////////////////////////////////////////
 const IID BASED_CODE IID_DDdjdemo =
 { 
 0xaf3b752a, 0x89d0, 0x101b, { 
 0xa6, 0xe4, 0x0, 0xdd, 0x1, 0x11, 
 0xa6, 0x58 
 } 
 };
 const IID BASED_CODE IID_DDdjdemoEvents =
 { 
 0xaf3b752b, 0x89d0, 0x101b, { 
 0xa6, 0xe4, 0x0, 0xdd, 0x1, 0x11, 0xa6, 0x58 
 } 
 };
 
 /////////////////////////////////////////////////
 // CDdjdemoCtrl::CDdjdemoCtrlFactory::UpdateRegistry -
 // Adds or removes system registry entries for CDdjdemoCtrl
 BOOL CDdjdemoCtrl::CDdjdemoCtrlFactory::UpdateRegistry(BOOL bRegister)
 {
 if (bRegister)
 return AfxOleRegisterControlClass(
 AfxGetInstanceHandle(),
 m_clsid, m_lpszProgID, IDS_DDJDEMO, IDB_DDJDEMO,
 TRUE, // Insertable
 OLEMISC_ACTIVATEWHENVISIBLE |
 OLEMISC_SETCLIENTSITEFIRST |
 OLEMISC_INSIDEOUT |
 OLEMISC_CANTLINKINSIDE |
 OLEMISC_RECOMPOSEONRESIZE,
 _tlid, _wVerMajor, _wVerMinor);
 else
 return AfxOleUnregisterClass(m_clsid, m_lpszProgID);
 }
 
 ////// CDdjdemoCtrl::CDdjdemoCtrl - Constructor /////////////////////////
 CDdjdemoCtrl::CDdjdemoCtrl()
 {
 // Set sensible initial size for the control
==> SetInitialSize(250, 100);
 InitializeIIDs(&IID_DDdjdemo, &IID_DDdjdemoEvents);
 
 // Call Ray's Initialize function
==> phonedir_Initialize();
 }
 ///// CDdjdemoCtrl::~CDdjdemoCtrl - Destructor ////////////////////////
 CDdjdemoCtrl::~CDdjdemoCtrl()
 {
 // Call Ray's Terminate function
==> phonedir_Terminate();
 }
 ////// CDdjdemoCtrl::OnDraw - Drawing function ///////////////////////////
 void CDdjdemoCtrl::OnDraw(
 CDC* pdc, const CRect& rcBounds, const CRect& rcInvalid)
 {
 DoSuperclassPaint(pdc, rcBounds);
 }
 
==> #ifndef _WIN32
 // For Windows 3.1, some subclassed controls can't be safely drawn
 // to a metafile. As we don't draw to a metafile anyhow, supply an 
 // empty override for the function. If we had a drawing representation, 
 // we'd iterate the list box, calling DrawText/TextOut for each list item.
==> void CDdjdemoCtrl::OnDrawMetafile(CDC* pdc, const CRect& rcBounds)
==> {
==> }
==> #endif
 
 ///// CDdjdemoCtrl::DoPropExchange - Persistence support /////////////////
 void CDdjdemoCtrl::DoPropExchange(CPropExchange* pPX)

 {
 ExchangeVersion(pPX, MAKELONG(_wVerMinor, _wVerMajor));
 COleControl:: DoPropExchange(pPX);
 // TODO: Call PX_ functions for each persistent custom property.
 }
 
 ////// CDdjdemoCtrl::OnResetState - Reset control to default state ///////
 void CDdjdemoCtrl::OnResetState()
 {
 // Resets defaults found in DoPropExchange
 COleControl::OnResetState(); 
 // TODO: Reset any other control state here.
 }
 
 ////// CDdjdemoCtrl::AboutBox - Display an "About" box to the user ///////
 void CDdjdemoCtrl::AboutBox()
 {
 CDialog dlgAbout(IDD_ABOUTBOX_DDJDEMO);
 dlgAbout.DoModal();
 }
 
 ///// CDdjdemoCtrl::PreCreateWindow - Modify parameters for CreateWindowEx
 BOOL CDdjdemoCtrl::PreCreateWindow(CREATESTRUCT& cs)
 {
 // Modify the style bits for the listbox so we 
 // 1) can make a tab-separated 2-column list; 
 // 2) get notification of listbox events; and 
 // 3) can do vertical scrolling.
==> cs.style = LBS_USETABSTOPS | LBS_NOTIFY | WS_VSCROLL;
 cs.lpszClass = _T("LISTBOX");
 return COleControl::PreCreateWindow(cs);
 }
 
 // CDdjdemoCtrl::GetSuperWndProcAddr - Provide storage for window proc ////
 WNDPROC* CDdjdemoCtrl::GetSuperWndProcAddr(void)
 {
 static WNDPROC NEAR pfnSuper;
 return &pfnSuper;
 }
 
 ///// CDdjdemoCtrl::OnOcmCommand - Handle command messages //////////////
 LRESULT CDdjdemoCtrl::OnOcmCommand(WPARAM wParam, LPARAM lParam)
 {
 #ifdef _WIN32
 WORD wNotifyCode = HIWORD(wParam);
 #else
 WORD wNotifyCode = HIWORD(lParam);
 #endif
 
 // This is where the listbox notifications are received. The only
 // one we're interested in is LBN_SELCHANGE. When the selection is
 // changed, we reuse the code for GetCurrentName and GetCurrentNumber
 // to retrieve the correct strings for name and phone number, then
 // call the FireNameNumberChanged event that ClassWizard created.
==> switch(wNotifyCode)
==> {
==> case LBN_SELCHANGE:
==> FireNameNumberChanged(GetCurrentName(), 
==> GetCurrentNumber()); break;

==> }
 return 0;
 }
 
 ////// CDdjdemoCtrl message handlers /////////////////////////////////////
 BSTR CDdjdemoCtrl::GetNameFromNumber(LPCTSTR szNumber) 
 {
 // Use one-minute phone directory API to retrieve a name
 // given a number
==> CString s = phonedir_LookupByNumber((char *)szNumber);
 return s.AllocSysString();
 }
 
 BSTR CDdjdemoCtrl::GetNumberFromName(LPCTSTR szName) 
 {
 // Use one-minute phone directory API to retrieve a number
 // given a name
==> CString s = phonedir_LookupByName((char *)szName);
 return s.AllocSysString();
 }
 
 BSTR CDdjdemoCtrl::GetCurrentName() 
 {
==> UINT nIndex;
 CString s;
 
 // If there is a selection, then get the corresponding name,
 // otherwise return an empty string.
==> if((nIndex=(UINT)SendMessage(LB_GETCURSEL)) != LB_ERR)
==> s = phonedir_LookupByOrdinal(nIndex).SpanExcluding("\t");
==> else
==> s = "";
 
 return s.AllocSysString();
 }
 
 BSTR CDdjdemoCtrl::GetCurrentNumber() 
 {
==> UINT nIndex;
 CString s;
 
 // If there is a selection, then get the corresponding number,
 // otherwise return an empty string.
==> if((nIndex=(UINT)SendMessage(LB_GETCURSEL)) != LB_ERR)
==> {
==> s = phonedir_LookupByOrdinal(nIndex);
==> s = s.Mid(s.Find("\t") + 1);
==> }
==> else
==> s = "";
 return s.AllocSysString();
 }
 int CDdjdemoCtrl::OnCreate(LPCREATESTRUCT lpCreateStruct) 
 {
 if (COleControl::OnCreate(lpCreateStruct) == -1)
 return -1;
 // Access the database using an ordinal lookup to get
 // tab-separated strings. Add the strings to the listbox
 // for initial population of the list.

==> CString strTemp;
==> for(int i = 0; 
==> (strTemp = phonedir_LookupByOrdinal(i)).GetLength() != 0; 
==> i++)
==> SendMessage(LB_ADDSTRING, 0, (long)(LPCTSTR)strTemp); 
 return 0;
 }

Special Issue, 1994
EDITORIAL


Windows Peer-to-Peer Programming


The story goes that when Bill Gates brought Gordon Letwin aboard to write the
Microsoft Basic compiler, Letwin hung a sign on his office door that read "Do
not disturb, feed, poke, tease--the animal." Five months later, Letwin emerged
from his den with a compiler that was to become the basis for all future
versions of MS-Basic. From those days of MS- and PC-DOS, Letwin went on to
become chief architect of Microsoft OS/2 1.x. Windows 1 was the first
Intel-based GUI, and OS/2 was to be the logical migration path. When the
Microsoft/IBM arrangement abruptly ended, so did the era in which Microsoft
provided the operating system and IBM the hardware. Although they got the OS/2
code base as part of the deal, IBM reportedly rewrote OS/2 from scratch.
Nevertheless, the indelible fingerprints of Microsoft were left upon the
design of OS/2 while Microsoft would soon announce yet another new operating
system--Windows NT.
Developing an operating system from scratch is not a trivial task, however.
IBM, for example, took more than a decade to develop its operating system for
the IBM/360. As Microsoft discovered, it's much cheaper to build an
environment on top of an existing operating system. OSF designer and architect
Vania Joloboff came to this same conclusion in designing versions of Motif,
which is based on the X Consortium's Intrinsics Toolkit (Xt). In a talk at the
1991 TOOLS conference, Joloboff stated that a primary goal in designing early
versions of Motif was to provide a look-and-feel as close as possible to
Windows. The trade-off, of course, is performance. As NT becomes available on
non-Intel platforms, UNIX users will have to make a similar choice.
But developers like Letwin and Joloboff are becoming an endangered species, as
the mystique of the midnight programmer gives way to the group-grope of team
development. That's one reason why Windows programming has never been more
difficult. Microsoft's approach of throwing into the operating environment
everything but the heat sink already means developers must deal with system
services for graphics (GDI), video and sound (multimedia extensions), pen
support, online help, interapplication communication (DDE and OLE),
multitasking, and more. And on the horizon, there are even more features--it's
reported that the new Chicago, or what many regard as Windows 4, will include
network services and support IPX/SPX protocols. Team members must specialize
in particular domains of expertise such as the user interface, network
support, graphics, and the like.
This issue of Dr. Dobb's Sourcebook of Windows Programming brings you its own
team of domain experts who share their experiences with and insights into the
Windows and Windows NT environments. Many of the techniques, such as Craig
Lindley's dynalinking and Al Williams's method for simulating pipes under
Windows, are unique. Others, such as Joseph Newcomer's method for
avoiding the limitation on the DOS Path environment variable, will help you in
everyday programming. Dick Wilmot, Rick Knoblaugh, and Joe Hlavaty present
utilities that you can use to probe the inner workings of Windows. And to get
the most out of the development environments you use, Joachim Schürmann
describes how to squeeze more performance from Visual Basic programs. Finally,
Vinod Anantharaman returns to Dr. Dobb's to share his techniques for
subclassing with the Microsoft Foundation Class library.
Creating great Windows programs will never be easy, no matter what the spin
doctors tell you about visual programming, custom controls, class libraries,
or other ease-of-use tools. The best sources of tools and techniques for
building the best Windows applications are your fellow programmers who, like
the authors in this issue, graciously share their expertise with you.
Michael Floyd
executive editor

Special Issue, 1994
2PANE Illuminates Windows


Top-level windows and the message loop




Dick Wilmot


Dick is a longtime Windows programmer and editor of The Independent RAID
Report. He can be contacted at dwilmot@crl.com.


How many top-level windows does a Windows program have? As many as it wants.
This isn't a trick question. An initial study of Windows programming might
lead you to believe that each instance of a Windows application has exactly
one top-level window, but that is not at all true. I even wrote one useful
Windows application program that had no windows. 
One version of that little program didn't even have a message loop or a window
procedure; its only job was to minimize the active window and terminate.
Having no message loop is unusual, since Windows programs are message driven.
We all know that at the heart of nearly every Windows application is a message
loop, but we will see from the instrumentation in 2PANE, the program presented
with this article, that a huge number of messages bypass a program's message
loop.
2PANE's instrumentation lets you probe how Windows messaging and window
procedures work. Listing One is the include file for the Windows 3
implementation of 2PANE, and Listing Two , the C source file. Other necessary
files (RES, OBJ, RC, MAK, and so on) along with an NT version are available
electronically; see "Availability," page 3. 


Creating Two Windows


Creating two or more windows--like the 2PANE-generated ones in Figure 1--is
straightforward, but managing them is easier if they have separate captions.
This lets the program determine later which window each message was sent to.
2PANE registers its window class in a completely normal fashion and
creates all its windows with the same registered class name, but each window
gets its own unique caption from a table of captions. Although the program
creates only two windows, with minor changes it could be generalized to create
as many as our desktop and other resources have room to handle.
Why create two windows? In this case, to illustrate that it can be done, but
why might you do this in a real application? One reason is that it is
extremely easy to do. If your application outgrows the typical main window and
Windows dialogs don't allow enough independent viewing and interaction, then
multiple main windows might be what you need. Separate document windows can be
handled with the multiple-document interface (MDI), but MDI comes to feel
terribly confining at times--restricted to the boundaries of that single
client area. It is hard to get one window out of the way to see another, and
yet, that is often just what the user needs to do. Multiple windows, as in
2PANE, can handle multiple interactive windows that the user is free to
manipulate, move, and resize at will.
What does having multiple open windows cost you? Really nothing more than a
bit of bookkeeping to track which window is being acted upon. This does not
appear excessive and might be less so than using MDI. If you need multiple
windows with very different types of interaction, such as spreadsheets and
charts, then you might need to change the appearance of the menu bar(s) each
time the focus shifts from one window type to the other. Using multiple
independent windows allows each window to have its own menus tailored to its
purposes. 
2PANE creates all its windows as part of initialization, but in actual
applications you aren't constrained to creating all windows up front--you can
create them as interactions with the user and other programs progress. The
programming might be easier if window creation is isolated into a single
subroutine that can create windows and track their handles. This would make
for easier uniformity of common elements and keep the minutiae of window
creation away from the application-handling code.
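Such a creation subroutine can be sketched in portable C. The PaneTable structure and create_pane helper below are hypothetical names, not 2PANE's own; a real Win16 version would call CreateWindow and record the returned HWND instead of an opaque value.

```c
#include <string.h>

#define MAX_PANES 8

/* Hypothetical bookkeeping table: one slot per top-level window.
   In a real program the handle field would hold the HWND that
   CreateWindow returned; here it is just an opaque number. */
typedef struct {
    unsigned handle;            /* stand-in for an HWND      */
    char     caption[32];       /* unique caption per window */
} Pane;

typedef struct {
    Pane pane[MAX_PANES];
    int  count;
} PaneTable;

/* Record a newly created window; returns its index, or -1 if full. */
int create_pane(PaneTable *t, unsigned handle, const char *caption)
{
    if (t->count >= MAX_PANES)
        return -1;
    t->pane[t->count].handle = handle;
    strncpy(t->pane[t->count].caption, caption,
            sizeof t->pane[t->count].caption - 1);
    t->pane[t->count].caption[sizeof t->pane[t->count].caption - 1] = '\0';
    return t->count++;
}
```

Keeping the table behind one helper like this means the common creation details live in a single place, and the rest of the program deals only in small integer window numbers.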
Managing multiple windows from a single application instance is an
alternative way to communicate large quantities of data between windows
without using a DLL. For simple, multiwindow interaction to common data, a
multiwindow application--a fleshed-out 2PANE--involves less programming work
than implementing a separate DLL. Interwindow communication should also be as
fast as for a DLL, since all variables are in the same application instance.
There is a single heap and a single stack. I also found this form of program
easier to debug than a program with a separate library. All the code and
variables are readily accessible in a single debugging window.
Some simple programming at the end of 2PANE.C allows users to close windows at
will; see Example 1(a). The application code stays alive if there are any
windows still open. This is the same behavior as that of the usual DLL, which
remains active until its client closes.
At the beginning of its window procedure, 2PANE.C converts the message's
window handle (the window to which the message is addressed) into an integer window
number; see Example 1(b). The if statement following the window-handle lookup
handles the case where a message handle does not correspond to any of the
windows 2PANE created. This indeed happens, so you need to guard against it
and, in this case, just convert it to the first window. Conversion of
message-window handles to integer-window numbers is exceedingly handy for
instancing variables that pertain to different windows, such as user-selected
options and "within-window" positioning information. Robust handling of
out-of-range window handles was omitted here for brevity.


How Many Messages?


Like most Windows application programs, 2PANE has a message loop to retrieve
messages from its message queue. The only difference here is that 2PANE's
message loop has been instrumented with two counters; see Example 2. The
msg_ct variable counts all messages retrieved by the GetMessage loop within
the WinMain function. WM_PAINT messages are counted separately because they
offer an easy probe into the messaging system. Another set of counters in the
windows procedure, WinProc, tallies the numbers of messages and number of
WM_PAINT messages that pass through that function.
As Figure 2 illustrates, the four message tallies are displayed in the four
corners of each of 2PANE's initial windows. It is evident from the initial
2PANE window displays that a great many messages reach WinProc without having
gone through WinMain. This actually makes perfect sense since 2PANE has called
for the creation, showing, and updating of two windows before the code even
reaches the GetMessage loop. Windows could just place the messages related to
these construction activities in the application's message queue, but it
doesn't. If it did try to just queue up these messages for later action, then
it would overflow the default-message queue, which can only hold eight
messages. Immediate execution for these messages also makes for easier error
handling since deferred actions that went awry would not indicate which part
of a program to notify. So the first window is created using 19 messages, none
of which went through the GetMessage loop, because the program has yet to
reach that part of its code.
After creating each window, the program performs a ShowWindow, followed by
UpdateWindow. ShowWindow creates a WM_PAINT message, but WM_PAINT messages are
low priority and can stay toward the bottom of the queue. If the update region
for a window is not empty, then UpdateWindow forces the sending of a waiting
WM_PAINT. The program works fine with UpdateWindow omitted, but painting might
be unacceptably delayed in a high-traffic environment where the low-priority
WM_PAINT messages could keep getting pushed aside.
The tally information shown in 2PANE's windows is from the time each window
was painted, but the counts are for the whole application--tallies are not
separated by window, although they could be. 
WM_PAINT messages are often sent directly to the window's window procedure,
WinProc, bypassing the application message queue entirely. You can, by the
way, give the window procedure any name you like. You just need to export this
procedure so Windows can find it, and it must be named in registering the
window class. Windows will then find and directly invoke the window
procedure by name, as it does for the UpdateWindow call. 
ShowWindow is not the only way that WM_PAINT messages originate. Anything that
causes a window's content to be invalidated will prod Windows into generating
a WM_PAINT message, which is really just an object-oriented command to "go
paint yourself." 
If you activate a window that was partly obscured or drag a window away from
one that it was covering, Windows will treat the newly uncovered portions as
invalid and will send WM_PAINT messages to the affected windows. Spy, the
diagnostic utility that comes with the Microsoft Windows SDK, lets you see
what messages are being sent to a window. Covering and then uncovering one of
the 2PANE windows with Spy's window is a handy way to generate just the
message sequence from an uncovering event while also spying on that message
traffic.
As Figure 3 shows, Spy indicates that five messages were generated, one of
which was a WM_PAINT. It would make sense for 2PANE to get a repaint
(WM_PAINT) message when a part of its client area has been uncovered, and it
receives a paint message passed by DispatchMessage to the WM_PAINT case in its
window procedure. That code uses BeginPaint to obtain a device context for the
screen and begin its painting work. However, before returning to 2PANE's
WM_PAINT code, BeginPaint calls WinProc with a WM_ERASEBKGND message, as shown
by the Spy trace. So a WM_ERASEBKGND message is issued and entirely processed
(by the default window procedure, since 2PANE has no case for that message
type) before 2PANE has finished with its WM_PAINT case. Recursive calls are
issued to WinProc.
2PANE's WM_PAINT code invalidates the entire client area of its window with a
call to InvalidateRect. If it didn't do that, then only the portion of the
window that was newly uncovered would be painted, and you wouldn't get to see
the updated message counts. 


Someone Else's Paint Messages 


You'll probably want to know why the count of total paint messages (in the
upper-right corner) in Figure 3 has jumped from 0 to 2. One paint message
revealed from uncovering Pane 2 came through the GetMessage loop and was duly
counted, but another was there that didn't belong to 2PANE. It was for the
desktop behind Spy, which had to be repainted after the Spy window was moved
to cover Pane 2. Windows receives paint messages for the desktop. These
messages are, however, intercepted by DispatchMessage, which does not send
them to the application's window procedure. Spy does not show these messages.
They can only be seen with a debugger, but they are not completely ghost
messages: 2PANE's tallies confirm their passage through the message loop.


Character Messages


2PANE has also been instrumented to count the messages involved in keyboard
input. The counting process is the same as that for messages retrieved from
the application's queue and the number processed in WinProc. The counts are
displayed in a somewhat different fashion so as not to distort the counting.
The WM_PAINT case uses the DrawText Windows API function, but DrawText will
only draw its output where the screen is invalid. You could invalidate the
screen, but this would force a repaint, and that would disturb the painting
and general-message counts. So the code in the WM_CHAR case uses TextOut with
some positioning of the current output point.
Character input from the keyboard does not cause repainting but it does
engender quite a few messages (see Figure 4). The message count from the
GetMessage loop includes a WM_KEYDOWN, a WM_CHAR, and a WM_KEYUP message for
each character keystroke. These messages are then multiplied into even more
messages that are seen by WinProc. WM_KEYUP messages generate WM_GETTEXT
messages. If the user pauses in typing, then a stream of WM_KEYUP messages is
seen in the GetMessage loop, and WinProc sees a long stream of pairs of
WM_KEYUP, WM_GETTEXT messages. Why Windows needs to send repetitive
notifications that a key has moved to the up position is a mystery. Weirder
still, the wParam value for the WM_KEYUP messages is the "S" key even if that
key hasn't been pushed.



Sending vs. Posting


When a program wants to pass a message to a window, it can either "send" the
message or "post" it. Posting the message will deposit it in the receiving
application's queue, while sending it will immediately call the receiving
application's WinProc, which will process the message. 2PANE has been equipped
with both these forwarding methods for its character input. It does not
forward the WM_KEYDOWN and WM_KEYUP raw keystrokes but, rather, the distilled
character interpretations of them. An Echo menu allows the user (experimenter)
to select either no echoing, PostMessage echoing, or SendMessage forwarding of
characters between windows. Before forwarding a character, the code checks
that the window has the input focus--that is, that the character came from
the keyboard and not from the other window. Without this check, the character
would be echoed endlessly back and forth between windows, making for an
undesired type of autorepeat.
When the user selects send-mode echoing, the receiving window shows no
increase in GetMessage loop count, just as you would expect. Sending is indeed
an immediate call to the receiver's window procedure. Keying characters seems
to usually generate more than 20 messages per keystroke to the window
procedure, but when only distilled characters are forwarded to the other
window, as done in 2PANE, then the receiving window procedure is entered only
once per stroke (actually less, since Shift keys are not forwarded); see
Figure 5. 
Character forwarding can be used with many different kinds of windows. In
Windows 3.1, a program can send characters to nearly any window, whether or
not it owns or is otherwise associated with that window.
When programming messages to other windows, programmers should keep in mind
that SendMessage is a form of call and will process completely through the
receiving window's window procedure. This might not be good for overall
performance, since it gives no other windows a share of the computer.
PostMessage has several advantages: It places the message in the receiving
queue and returns to your program without processing the message. When the
receiving application invokes GetMessage, Windows can give other applications
a turn at the CPU. If the receiving program has a problem, then it will have
to field the problem or fail. The sending program will not fail from a crash
in the destination window's code.
In using the SendMessage function, it is important to be sure that the
receiving window's code will always be available and ready to run. It is
advisable to verify that the destination window has a window procedure. Also, a
deadlock situation can be created if the destination-window procedure yields
control (for example, with GetMessage, PeekMessage, Yield, DialogBox, and so
on) after being sent a message. The destination-window procedure should call
InSendMessage before using any Windows functions that might yield control and
then, for sent messages, use ReplyMessage to return control to the sending
application to avoid the deadlock condition. In the absence of source code, it
may be difficult to ascertain whether a destination-window procedure is
deadlock safe. If a target procedure yields control, then the sending
procedure is left waiting for a return from its SendMessage call and will be
unable to continue processing messages. 
PostMessage is safer, but SendMessage is the only way to get a return code
from the destination window. If you need to be sure of what happened to the
message, then you need SendMessage; otherwise, you are probably better off
using PostMessage.


Porting 2Pane to NT


Since I was in the process of converting to Windows NT, I ported the 2PANE
program to Visual C++ NT. The porting process was mostly straightforward, but
one very interesting obstacle caused an hour-long phone call to Microsoft. 
The Windows 3.1 version of the program was developed using Borland C++ 3.1 and
uses a menu that I, of course, named 2PANE. I called Microsoft and found out
that I couldn't name a menu 2PANE or any other name that begins with a
numeral. The compiler didn't complain, and the program executed fine. There
just wasn't any menu when the main title bar appeared. Taking the "2" out of
the menu name in the program as well as the resource file (2PANE.RC) did the
trick and generated a menu. I don't know whether this feature is peculiar to
the NT version of Visual C++ or might also be encountered in the 16-bit
version.
A general change was to remove the _export directive from function
definitions. There also appeared to be a slight difference in behavior of the
break statement in Visual C++, which caused me to use an intermediate variable
in the NT version.
I liked using the Visual C++ NT development system. So far, it has been handy
for debugging, as I can stay right in the full Windows NT environment while
doing so. I teach Windows programming at night school, and NT has been a good,
bulletproof base for grading the beginning students' programs.
The Visual C++ NT compiler also pointed up a problem in my programming
practice when it objected to using NULL as a parameter where a number was
expected. Consequently, I changed the last parameters in SendMessage and
PostMessage from NULL to 0L to reflect that these last parameters are long
integers. NULL should be used to indicate an unused pointer, not an
unspecified integer.
2PANE behaves in the same way under NT as with Windows 3.1, but the counts are
not identical. Of course, the underlying systems are not even similar.
Example 1: (a) This code at the end of 2PANE.C allows users to close windows
at will; (b) converting the message handle into an integer window number.
(a)
case WM_DESTROY:
 if (--iWin_ct < 1) PostQuitMessage (0) ; // only die after last window closes
 return 0 ;
(b)
for (iWinNbr = 0; iWinNbr <= MAX_NBR_WIN; iWinNbr++)
 if (hCurrWin == hWnd [iWinNbr])
 break ;
if (iWinNbr > MAX_NBR_WIN - 1) // not one of our handles.
 iWinNbr = 0 ; // pretend 1st window
Example 2: The 2PANE message loop has been instrumented with two counters.
while (GetMessage (&msg, NULL, 0, 0))
 {
 msg_ct++ ; // count messages
 if (msg.message == WM_PAINT)
 paintmsg_ct++ ; //count paint messages
 TranslateMessage (&msg) ;
 DispatchMessage (&msg) ;
 }
 return msg.wParam ;
Figure 1 Sample windows generated by 2PANE.
Figure 2 Four message tallies are displayed in the four corners of each of
2PANE's initial windows.
Figure 3 Using Spy to track messages.
Figure 4 2PANE's handling of character messages.
Figure 5 Character forwarding under 2PANE.

Listing One 

/* -----------------------
 2PANE.H header file
 ----------------------- */

#define IDM_NOECHO 0
#define IDM_POST 1
#define IDM_SEND 2
#define MAX_NBR_WIN 2




Listing Two

/*--------------------------------------------------------
 2PANE.C -- Displays "2 Panes" from 1 program & counts messages
 Copyright 1993 Dick Wilmot 
 --------------------------------------------------------*/
#define STRICT // get more warnings
#include <windows.h>
#include <string.h>
#include <stdlib.h>
#include "2pane.h"

static HWND hWnd [MAX_NBR_WIN + 1] ; // Cannot be local variables! Else not 
 // available in WinProc
static int iWin_ct = 0; // count of open windows
static int msg_ct, proc_ct, paintmsg_ct, paintproc_ct ;

long FAR PASCAL _export WinProc (HWND, UINT, UINT, LONG) ;

int PASCAL WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance,
 LPSTR lpszCmdParam, int nCmdShow)
 {
 static char szAppName[ ] = "twopane" ;
 LPCSTR lpszCaption [ ] = {"Pane 1", "Pane 2"} ;
 MSG msg ;
 WNDCLASS wndclass ;
 int i ;

 msg_ct = proc_ct = paintmsg_ct = paintproc_ct = 0 ; // zero all counters

 if (!hPrevInstance)
 {
 wndclass.style = CS_HREDRAW | CS_VREDRAW ;
 wndclass.lpfnWndProc = WinProc ;
 wndclass.cbClsExtra = 0 ;
 wndclass.cbWndExtra = 0 ;
 wndclass.hInstance = hInstance ;
 wndclass.hIcon = LoadIcon (NULL, IDI_APPLICATION) ;
 wndclass.hCursor = LoadCursor (NULL, IDC_ARROW) ;
 wndclass.hbrBackground = (HBRUSH) GetStockObject (WHITE_BRUSH) ;
 wndclass.lpszMenuName = "2pane" ;
 wndclass.lpszClassName = szAppName ;

 RegisterClass (&wndclass) ;
 }
 for (i = 0; i <= MAX_NBR_WIN - 1; i++)
 {
 hWnd [i] = CreateWindow (szAppName, // window class name
 lpszCaption [i], // window caption
 WS_OVERLAPPEDWINDOW, // window style
 (int) (100 + i * 200), // initial x position
 (int) (100 + i * 200), // initial y position
 200, // initial x size
 200, // initial y size
 NULL, // parent window handle
 NULL, // window menu handle
 hInstance, // program instance handle

 NULL) ; // creation parameters
 iWin_ct++ ;
 ShowWindow (hWnd [i], nCmdShow) ; // show the window just created
 UpdateWindow (hWnd [i]); // force painting of window
 }
 while (GetMessage (&msg, NULL, 0, 0))
 {
 msg_ct++ ; // count messages

 if (msg.message == WM_PAINT)
 paintmsg_ct++ ; // count paint messages
 TranslateMessage (&msg) ;
 DispatchMessage (&msg) ; // hand message to Windows' dispatcher
 }
 return msg.wParam ;
 }

long FAR PASCAL _export WinProc (HWND hCurrWin, UINT message,
 WPARAM wParam, LPARAM lParam)
 {
 HDC hdc ;
 PAINTSTRUCT ps ;
 RECT rect ;
 static char szBuffer[10] ;
 static int iEchoType[MAX_NBR_WIN] ; // flag for type of 
 // echoing from each window
 static int xCaret [MAX_NBR_WIN], cxCharSize, cyCharSize ;
 static LPCSTR lpszBuffer = szBuffer ; // pointer to text buffer
 TEXTMETRIC tm ;
 int iWinNbr ; // window number
 HWND hOtherWnd ;
 int i ;
 char ch ; 

 proc_ct++ ; // increment WinProc count

 for (iWinNbr = 0; iWinNbr <= MAX_NBR_WIN ;
 iWinNbr++) // translate handle to our window number
 if (hCurrWin == hWnd [iWinNbr])
 break ;
 if (iWinNbr > MAX_NBR_WIN - 1) // not one of our handles.
 iWinNbr = 0 ; // pretend 1st window
 hOtherWnd = hWnd [(iWinNbr + 1) % MAX_NBR_WIN] ; 
 // establish other window handle

 switch (message)
 {
 case WM_COMMAND:
 iEchoType [iWinNbr] = wParam ; 
 // remember what kind of echoing user wants
 return 0 ;

 case WM_CHAR :
 ch = (char) wParam ;
 szBuffer [0] = ch ; // put character in text buffer
 szBuffer [1] = '\0' ; // terminate with zero
 hdc = GetDC (hCurrWin) ; // get device context to draw on
 SelectObject (hdc, GetStockObject (SYSTEM_FIXED_FONT)) ;
 GetTextMetrics (hdc, &tm) ;

 cxCharSize = tm.tmAveCharWidth ; // How wide characters are.
 cyCharSize = tm.tmHeight+tm.tmExternalLeading ; 
 //How tall characters need be
 TextOut (hdc, xCaret [iWinNbr]++ * cxCharSize, 30, lpszBuffer, 1) ;
 SetTextAlign (hdc, TA_TOP) ;
 TextOut (hdc, 0, 0, strupr( itoa (msg_ct, szBuffer, 10 )), 
 strlen (szBuffer)) ;
 GetClientRect (hCurrWin, &rect) ;
 SetTextAlign (hdc, TA_UPDATECP) ; // position to bottom of window
 MoveTo (hdc, 0, rect.bottom - cyCharSize ) ; // move up 1 text line
 TextOut (hdc, 0, 0, strupr( itoa (proc_ct, szBuffer, 10 )), 
 strlen (szBuffer)) ;
 ReleaseDC (hCurrWin, hdc) ;
 if ((GetFocus ()) == hCurrWin) 
 // echo if requested & this window has focus
 {
 if (hCurrWin == hOtherWnd)
 MessageBox (hCurrWin, "Sending to Self",
 "Warning",
 MB_ICONSTOP) ;
 if (iEchoType [iWinNbr] == IDM_SEND) 
 // user wants send echoing
 SendMessage (hOtherWnd, WM_CHAR, wParam, 0L) ;
 if (iEchoType [iWinNbr] == IDM_POST) 
 // user wants post echoing
 PostMessage (hOtherWnd, WM_CHAR, wParam , 0L) ;
 }
 return 0 ;

 case WM_PAINT:
 paintproc_ct++ ;

 InvalidateRect (hCurrWin, NULL, TRUE) ; 
 // erase whole background for repaint
 hdc = BeginPaint (hCurrWin, &ps) ;
 GetClientRect (hCurrWin, &rect) ;
 DrawText (hdc, strupr( itoa (msg_ct, szBuffer, 10 )), -1, &rect,
 DT_SINGLELINE | DT_LEFT | DT_TOP);
 DrawText (hdc, strupr( itoa (proc_ct, szBuffer, 10 )), -1, &rect,
 DT_SINGLELINE | DT_LEFT | DT_BOTTOM);
 DrawText (hdc, strupr( itoa (paintmsg_ct, szBuffer, 10 )), 
 -1, &rect, DT_SINGLELINE | DT_RIGHT | DT_TOP);
 DrawText (hdc, strupr( itoa (paintproc_ct, szBuffer, 10 )), 
 -1, &rect, DT_SINGLELINE | DT_RIGHT | DT_BOTTOM);
 EndPaint (hCurrWin, &ps) ;
 return 0 ;

 case WM_DESTROY:
 if (--iWin_ct < 1) PostQuitMessage (0) ; 
 // only die after last
 return 0 ; // window closes
 }

 return DefWindowProc (hCurrWin, message, wParam, lParam) ;
 }







Special Issue, 1994
Very Dynamic Linking in Windows


Managing one-to-many relationships between application code and DLLs




Craig A. Lindley


Craig is a founder and an officer of Enhanced Data Technology (Colorado
Springs, CO), developers of the EnhancedView imaging-database tool. He is also
the author of Practical Image Processing in C and Practical Ray Tracing in C,
both published by John Wiley & Sons. Craig can be contacted at Enhanced Data
or on CompuServe at 73552,3375.


No Windows programmer questions the usefulness of dynamically linked libraries
(DLLs) in program development. The beauty of DLLs is that they promote
modularity of code by segmenting functionality into individual, easily
maintained modules. They promote code sharing because different applications
can make calls into a DLL without being directly linked to it. Further, DLLs
can be used to control run-time memory utilization in a Windows program by
loading functionality when it is needed and unloading it when it is not. In
fact, many modern application programs consist of a small executable program
bundled with many DLLs that provide most of the program's functionality. This
segmentation of functionality between EXEs and DLLs makes complex Windows
programs easier to develop and maintain.
In this article, I'll present a technique that manages the interface between
an application program and one or more DLLs, maximizing power and flexibility
in both. This technique, which I call "dyna-linking," is useful whenever there
is a one-to-many relationship between the application code and the DLLs
necessary to interface that code into its run-time environment. 


A DLL Backgrounder


Before Windows, if an application program needed to access functions contained
in a function library, the code for the library had to be linked directly into
the application program. The linker resolved all references made by the
application program into the library during the linking process. When the
linker's job was completed, the application-program code and the
function-library code were merged together into a single executable file,
the size of which was the combined total of the two bodies of code. In many
cases, only a few of the functions in a library might have been needed by the
application program, but all of the function-library code was linked with the
application code whether or not it was ever used. This was one reason why DOS
programs quickly consumed their allotted 640K of memory. 
Windows designers understood this problem and took steps to correct it. They
reasoned that not all functionality inherent in an application program had to
be in memory simultaneously as long as it could be loaded and executed quickly
when it was needed. Thus, DLLs were born. They are called "dynamically linked"
because the resolution of function and data addresses previously performed by
the linker at program-build time could now happen at run time. Therefore, the
application program could find out where in the DLL certain data items or
functions resided and call them as needed. The first reference to data or
functions in a DLL causes the DLL to be loaded into memory (if it isn't in
memory already). If references are never made, the DLL is never loaded. In
short, DLLs led to smaller application programs that load and execute faster,
increased modularity of code, and easier code maintainability.
As an aside, DLLs go by many names in the Windows environment. Usually they
have a DLL file extension, but they can have other extensions, as well,
including EXE, DRV, DS, and so on. They can contain executable code, ASCII
character strings, icons, fonts, and just about anything else you can think
of.


How Application Programs Use DLLs 


Static DLL linking is accomplished by running the import librarian program on
a DLL to produce a "LIB" file, which is then linked into the Windows
application using the linker in your development environment. This small file
gives the application program information about the entry points within the
DLL that can be used at run time to load and execute functions. From Windows'
point of view, programs linked in this manner are a single entity. If you
attempt to execute a statically linked program and Windows cannot find a DLL
the application is linked with, you will receive a message stating that some
component of the application program is missing. An application program can be
statically linked to as many DLLs as it needs. Each DLL requires a separate
LIB file to be linked into the application-program's executable.
An application program can directly manage the loading and unloading of
component DLLs at run time using LoadLibrary and FreeLibrary function calls.
Once a DLL is loaded successfully into memory, the application program uses
the GetProcAddress function to get the addresses of all functions it needs to
call within the DLL. The application program calls the functions within the
DLL by using far pointers to the functions. When it is finished with a
particular DLL, it calls FreeLibrary to unload it from memory. Note that the
DLL will only be unloaded from memory if all applications using the DLL are
finished with it. Windows keeps track of which DLLs are in use and will only
unload a DLL when it is no longer needed.


Dynalinking


Dynalinking is an extension of the DLL direct-management technique. While you
can think of dynalinking as a C++ class encapsulation of DLL direct
management, it is actually much more. Dynalinking can be used to manage the
interface between an application program and a single DLL, but it is much more
powerful when used to adapt an application program to its environment using
multiple DLLs. To illustrate this, consider an application program that:
Supports status/error messages in multiple languages. 
Supports spell checking in multiple languages. 
Interfaces to n specific hardware-interface boards (video digitizers,
network-interface cards, and so on). 
Is a word processor which needs to export its text into formats compatible
with n different word processors. 
Operates in 16-, 256-, or true-color graphic modes.
Each of these presents a scenario which could be satisfied using a single
application program calling upon multiple DLLs to perform specific tasks.
Dynalinking provides this service.
Good software design would place all common portions of the interface code in
the application program and segment the device-specific code into DLLs (the
"drivers"). The major advantages of using dynalinking are that:
A single, well-defined API exists between the application program code and all
dynalinked DLLs.
Only one version of the application program is needed for any configuration of
the application. This results in less time spent in program builds and reduced
maintenance costs.
Although the interface between the application program and all DLLs must be
exactly the same, the functionality provided by the DLLs can be as similar or
as different as required by the application. 
The application program can dynamically switch between DLLs at run time with
very little overhead. 


The Demonstration Program


To illustrate dynalinking, I've written a demo program that provides
error/status messages in three different languages--English, German, and
Japanese. Dynalinking eliminates the need for a different version of the
application program for each language. Instead, all messages for each language
are contained in separate DLLs. The DLL used for messages determines which
language the application program supports. Thus, only one version of the
application program is required to operate in three different languages. In
the demo program, the language DLL is selected via a menu. In an actual
application, the selection could be made by reading an entry in an INI file or
reading an option bit programmed into a software lock device.
The demo program is composed of one application program and three DLLs, one
each for English, German, and Japanese. When the demo program is executed, the
language to display a word in and the word to be displayed are selected by the
user. The selected word is then displayed in a window in the selected
language. Every time the language is changed, the old language DLL is unloaded
and the newly selected language DLL is loaded--all with two lines of
application code similar to Example 1. In this case, the current language DLL
is unloaded and the German DLL is loaded.



How Dynalinking Works


The best way to understand how dynalinking works is to examine the listings.
Listing One shows an important class-definition include file; Listing Two
gives the English version of the DLL code; Listing Three shows the dynalink
code; and the demo application program is in Listing Four. All of the files
required to produce the DLLs and the demo application are available
electronically; see "Availability," page 3.
The files message.hpp (Listing One) and engdll.cpp (Listing Two) define a very
basic Windows DLL with a minimum of functionality. Notice that they contain
the code for the MessageLookup C++ class in addition to the required LibMain
function and optional WEP procedure. This code is compiled as a DLL with
explicit functions exported in the large memory model using Borland C++
Version 3.1.
You'll notice that the constructor and destructor of the MessageLookup class
pop up a message box just to prove that they were executed. In a real
application, of course, these functions would probably do something useful.
LookupMessage provides the only real functionality by returning a long pointer
to a message, given a message ID. A message identifier of GOODMORNINGMSGID
returns a pointer to the string "Good Morning," for example. The German and
Japanese versions of this DLL are identical except for the strings they
return. In the Japanese DLL, the message identifier GOODMORNINGMSGID returns a
pointer to the string "Ohayo gozaimasu." The German version returns "Guten
Morgen."
The implementation of the dynalink technique itself takes place in
dynalink.cpp (Listing Three), which is always made a part of the application
program. In this code, a class identical to MessageLookup (the interface class
of the DLLs) is defined. This class, DLMessageLookup, has a one-for-one
function mapping for each function defined within the DLL. The constructor for
the DLMessageLookup class does most of the work. The parameter passed to the
constructor of the DLMessageLookup class is the identifier of the DLL to be
used. The identifier determines the filename of the DLL to be loaded into
memory. Prior to loading the DLL, a variable LibError is defined and set True,
indicating a loading error occurred. A problem loading the DLL is assumed
until proven otherwise. Next, an attempt is made to load the DLL into memory
using LoadLibrary. A return code of 33 or greater indicates a successful load.
A load error causes the constructor to pop up a message box and subsequently
terminate.
Next, some magic occurs. Most everyone who knows something about C++ knows
that all methods (functions) carry around a hidden this parameter that points
to the data for a specific class instance (or object). This is how objects
keep track of their personal data while executing shared code. The this
pointer is seldom explicitly referred to in software development, but it is
there nevertheless. Since the dynalink code is interfacing with C++ objects in
a DLL, the data for the DLL object must be allocated somewhere, and a pointer
to this data must be passed to the DLL C++ functions for them to work
correctly. In order to satisfy this requirement, I allocate a block of memory
the size of the object (in this case, the size of MessageLookup), which will
be used for the object's data. Locking this memory block produces a pointer
that will become the this pointer for each of the DLL functions.
Next, function pointers are retrieved for each function within the DLL. The
GetProcAddress function is repeatedly called with the entry-point number of a
specific function within the DLL. The address returned is that of the function
within the DLL. As shown in the listing, entry-point addresses are obtained for
the constructor, the destructor, and the LookupMessage functions. If all three
addresses are nonzero, indicating success, the constructor for the
MessageLookup class is called via its function pointer, passing the
synthesized this pointer. If all went well, the LibError flag is set False.
My LookupMessage has the same interface as the built-in LookupMessage function
within the DLL we are trying to call. It accepts a message ID number as a
parameter and returns a long pointer to the message string. When it is
executed, a check is first made to verify that LibError is False. A call into
a DLL that did not load correctly is destined to fail and will probably
cause your machine to crash and burn. If LibError is False, a call is made
via the function pointer, and the LookupMessage function within the DLL is
executed.
The destructor for the DLMessageLookup class gets executed when the
DLMessageLookup object is deleted or goes out of scope. Again LibError is
checked, and if False, the destructor within the DLL is executed to allow
cleanup within the DLL. After that, the memory allocated for the object's data
is unlocked and freed. Finally, the DLL is unloaded from memory without a
trace. 
All of the code surrounded by DEBUGDL statements in Listing Three is part of a
serial-port debugger built into the dynalinked code. A debugger of this sort
is probably not necessary for a DLL interface with so few functions, but it
becomes a godsend when 30 to 40 DLL entry points must be monitored. 


Extensions


The basic dynalink technique can incorporate a serial-port debugger to check
the parameters and return values for each dynalinked DLL function. This
provides a unique way to unobtrusively monitor the operation of the DLL
functions at run time. To utilize the serial-debugging feature, the
application program needs to be recompiled with DEBUGDL defined, which causes
the serial code to be included in the application. An RS-232 terminal
connected to the COM2 serial port will then display all calls made to the DLL.
The serial parameters for the remote terminal are 9600 baud, 8 bits, no
parity, and 1 stop bit. These parameters can, of course, be altered by
changing the values hardcoded in the source code. Remote serial debugging is a
very convenient way of debugging the application-program/DLL interface.
Remember, however, if your application program also uses the COM2 port, there
may be contention for this port, and neither your application nor the
serial-port debugger will work correctly.
Building a dynalink interface to C instead of C++ code in a DLL is even easier
than the method shown. This is because the calls to GetProcAddress can specify
the ASCII names of the C functions within the DLL instead of their entry-point
numbers, as is required for the C++ version. Entry-point numbers must be
specified in the C++ version because of C++ name mangling. Also, the memory
allocated for the MessageLookup object is not required for a C-code version
and should be eliminated. Finally, all references to the this pointer must be
eliminated.


Caveats


The dynalinking technique presented here has only been tested with Borland C++
3.1. It may or may not work with other compilers. 
Care must be taken when adding functions to dynalinked DLLs because it is
possible to change the order of entry-point numbers hardcoded into the
GetProcAddress calls. If you always add the code for new functions at the end
of the listings, you'll be okay. If you insert the new code in the middle of
the listings, the entry-point numbers will change. Use Borland's TDUMP, or a
similar program, to determine the entry-point numbers of your DLL functions.
Finally, for a dynalinked C++ class within a DLL, you must provide a
constructor without parameters.


Closing Thoughts


I've been using dynalinking for over a year in our EnhancedView image-database
software with no problems. In fact, the utilization of this technique has
helped us maintain our sanity because it allows one version of our application
program to interface to five different DLLs (one for each different
image-digitizer board) under control of a bit encoded in a software lock. If
this technique were not used, we would need five different versions of our
application and five different sets of installation disks. 
Example 1: Code that loads a newly selected DLL.
delete pMessageLookupClass;
pMessageLookupClass = new DLMessageLookup(GERMAN);

Listing One 

/*********************************************************/
/*** "message.hpp" ***/
/*** Message Class Interface File ***/
/*** DynaLink Demo ***/
/*** Craig A. Lindley ***/
/*** Revision: 1.0 Last Update: 02/20/94 ***/
/*********************************************************/
// Check to see if this file already included
#ifndef MESSAGE_HPP
#define MESSAGE_HPP
#include <windows.h>
// These define the various message IDs
#define GOODMORNINGMSGID 100
#define GOODBYEMSGID 101
#define PLEASEMSGID 102
#define THANKYOUMSGID 103
#define CHEERSMSGID 104
/* The following is required when using a C++ class as an interface for a DLL.
It is required because this file will be included in both the application 
compilation and DLL compilation. All members (functions and data) are far!*/

#ifdef __DLL__
# define EXPORT _export
#else
# define EXPORT huge
#endif
class EXPORT MessageLookup {
 private:
 public:
 FAR MessageLookup(void); // Class constructor/destructor
 FAR ~MessageLookup(void);
 LPSTR FAR LookupMessage(WORD MessageID);// Lookup the message
};
#endif



Listing Two

/*********************************************************/
/*** "engdll.cpp" ***/
/*** English Language Message DLL Code ***/
/*** DynaLink Demo ***/
/*** Craig A. Lindley ***/
/*** Revision: 1.0 Last Update: 02/20/94 ***/
/*********************************************************/
#include "message.hpp"
MessageLookup::MessageLookup(void) {
 MessageBox(NULL, "English DLL Constructor Executed", "Status Message", 
 MB_OK | MB_TASKMODAL);
}
MessageLookup::~MessageLookup(void) {
 MessageBox(NULL, "English DLL Destructor Executed", "Status Message", 
 MB_OK | MB_TASKMODAL);
}
LPSTR MessageLookup::LookupMessage(WORD MessageID) {
 // Use MessageID value to find appropriate message
 switch(MessageID) {
 case GOODMORNINGMSGID:
 return "Good Morning";
 case GOODBYEMSGID:
 return "Goodbye";
 case PLEASEMSGID:
 return "Please";
 case THANKYOUMSGID:
 return "Thank You";
 case CHEERSMSGID:
 return "Cheers";
 }
 return "Unknown English Message ID";
}
// Library entry point for initialization
int CALLBACK LibMain (HANDLE, WORD, WORD wHeapSize,
 LPSTR) {
 if (wHeapSize > 0)
 UnlockData (0);
 return 1;
}
// Window's exit procedure for this DLL
int CALLBACK WEP(int) {

 return(1);
}



Listing Three

/*********************************************************/
/*** "dynalink.cpp" ***/
/*** This interface class provides for the dynamic ***/
/*** linking of DLLs to application programs. ***/
/*** Craig A. Lindley ***/
/*** Revision: 1.0 Last Update: 02/20/94 ***/
/*********************************************************/
#include <windows.h>
#include "dynalink.hpp"
// The following code is used for debugging and is only compiled
// when DEBUGDL is defined.
#ifdef DEBUGDL
// Global variables for serial support
static int DevID = -1;
static BOOL DevError = FALSE;
// Initialize the COM2 port for serial output. Parameters are
// 9600 baud, 8 data bits, no parity and 1 stop bit.
BOOL InitCOM2(void) {
 int Error;
 DCB dcb;
 DevID = OpenComm("COM2", 256, 256);
 if (DevID < 0) {
 DevError = TRUE;
 return FALSE;
 }
 Error = BuildCommDCB("COM2:9600,n,8,1", &dcb);
 if (Error < 0) {
 DevError = TRUE;
 return FALSE;
 }
 // Force the use of XOFF/XON flow control
 dcb.fOutX = 1;
 dcb.fInX = 1;
 Error = SetCommState(&dcb);
 if (Error < 0) {
 DevError = TRUE;
 return FALSE;
 }
 return TRUE;
}
// Write a string to the current serial port device
BOOL WriteString(LPSTR TheString) {
 COMSTAT CommState;
 int Offset, NumWritten;
 if (DevError) return FALSE;
 FlushComm(DevID, 0); // Flush transmission queue
 Offset = NumWritten = 0;
 while (TRUE) { // Loop until complete
 NumWritten = WriteComm(DevID, (LPSTR) &TheString[Offset],
 lstrlen((LPSTR) &TheString[Offset]));
 if (NumWritten < 0) {
 // Error writing data

 GetCommEventMask(DevID, 0xFFFF); // Clear the event mask
 return FALSE; // return an error indication
 } else {
 Offset += NumWritten;
 // Wait until all characters are sent
 do {
 GetCommError(DevID, &CommState);
 } while(CommState.cbOutQue);
 if (Offset >= lstrlen(TheString)) {
 GetCommEventMask(DevID, 0xFFFF);
 return TRUE;
 }
 }
 }
}
// Write a formatted string to the serial port
void WriteFormattedString(char *szFormat, ...) {
 char szBuffer[256];
 char *pArguments;
 pArguments = (char *) &szFormat + sizeof(szFormat);
 wvsprintf(szBuffer, szFormat, pArguments);
 lstrcat(szBuffer,"\n\r");
 WriteString((LPSTR) szBuffer);
}
#endif

DLMessageLookup::DLMessageLookup(LanguageDLLType Language) {
 char Buffer[20];
 DWORD FirstEntry;
#ifdef DEBUGDL
 InitCOM2(); // Initialize the COM2 port
#endif
 LibError = TRUE; // Assume lib cannot be loaded
 FirstEntry = 2; // Normal first entry number in DLL
 // Lookup the name of the DLL from the identifier
 switch(Language) {
 case ENGLISH:
 lstrcpy(Buffer, "ENGDLL.DLL");
 break;
 
 case GERMAN:
 lstrcpy(Buffer, "GERDLL.DLL");
 break; 
 case JAPANESE:
 lstrcpy(Buffer, "JAPDLL.DLL");
 break;
 
 default:
 MessageBox(GetFocus(), "Unknown language library file, cannot continue!",
 NULL, MB_OK);
 return;
 }
#ifdef DEBUGDL
 WriteFormattedString("\n\r%s version of language DLL", Buffer);
#endif
 // Attempt to load the appropriate language DLL
 hLib = LoadLibrary((LPSTR) Buffer);
 if ((UINT) hLib <= 32) {
 MessageBox(GetFocus(), "Error loading library file!", NULL, MB_OK);

 return;
 }
 // When we get here, the appropriate library DLL has been loaded.
 // Alloc a chunk of memory for the MessageLookup class. Ptr to memory
 // will be passed as this pointer to all class functions.
 hMessageLookupClass = GlobalAlloc(GHND, sizeof(MessageLookup));
 if (!hMessageLookupClass) {
 MessageBox(GetFocus(), "Not enough memory for MessageLookup class!", 
 NULL, MB_OK);
 return;
 }
 // Get pointer for data to be used as this pointer.
 pMessageLookupClass = (MessageLookup *) GlobalLock(hMessageLookupClass);
 // Get all of the function pointers. NOTE: because these are fetched using
 // ordinal numbers, the numbers will change as new functions are added
 // to the class or rearranged.
 lpConstructorFn = (FARCPROC) GetProcAddress(hLib, 
 (LPSTR) (FirstEntry+2));
 lpDestructorFn = (FARCPROC) GetProcAddress(hLib, 
 (LPSTR) (FirstEntry+1));
 lpLookupMessageFn = (FARCPROC) GetProcAddress(hLib, 
 (LPSTR) FirstEntry);
 // Make sure all entry points to the DLL were resolved
 if (lpConstructorFn && lpDestructorFn && lpLookupMessageFn) {
 (*lpConstructorFn)(pMessageLookupClass);// Execute the constructor
 LibError = FALSE; // Indicate no error
 } else
 MessageBox(GetFocus(), "Error accessing DLL functions!", NULL, MB_OK);
}
// Destructor for the DLMessageLookup class
DLMessageLookup::~DLMessageLookup(void) {
#ifdef DEBUGDL
 WriteFormattedString("MessageLookup DLL unloading");
 CloseComm(DevID);
#endif
 if (!LibError) // Run destructor only if constructor
 (*lpDestructorFn)(pMessageLookupClass); // ran ok.
 if (hMessageLookupClass) { // Free class data memory
 GlobalUnlock(hMessageLookupClass);
 GlobalFree(hMessageLookupClass);
 }
 // Free the DLL library
 if ((UINT) hLib > 32)
 FreeLibrary(hLib);
}
LPSTR DLMessageLookup::LookupMessage(WORD MessageID) {
 if (LibError) // If error detected
 return 0;
#ifdef DEBUGDL
 WriteFormattedString("LookupMessage: message id %d", MessageID);
#endif
 return ((*lpLookupMessageFn)(pMessageLookupClass, MessageID));
}



Listing Four

/*********************************************************/

/*** "app.cpp" ***/
/*** A simple demo program to illustrate ***/
/*** the concept of dynalinking. ***/
/*** Craig A. Lindley ***/
/*** Revision: 1.0 Last Update: 02/20/94 ***/
/*********************************************************/
/* This demo program demonstrates the dynalink technique described in the
accompanying article. The demo application is written in Borland's OWL.
It allows the user to select the language to be used for program
messages and then a word to display in the selected language.*/
#include <owl.h>
#include "app.h"
#include "dynalink.hpp"
class TSampleDynalinkApp : public TApplication {
 public:
 TSampleDynalinkApp(LPSTR AName, HINSTANCE hInstance,
 HINSTANCE hPrevInstance,
 LPSTR lpCmdLine, int nCmdShow)
 : TApplication(AName, hInstance, hPrevInstance,
 lpCmdLine, nCmdShow) {};
 virtual void InitMainWindow();
};
_CLASSDEF(TAppWindow)
class TAppWindow : public TWindow {
 private:
 // Private functions
 // Language select menu item functions
 virtual void CMEnglish(RTMessage Msg) = [CM_FIRST + CM_ENGLISH];
 virtual void CMGerman(RTMessage Msg) = [CM_FIRST + CM_GERMAN];
 virtual void CMJapanese(RTMessage Msg) = [CM_FIRST + CM_JAPANESE];
 // Word select menu item functions
 virtual void CMGoodMorning(RTMessage Msg) = [CM_FIRST + CM_GOODMORNING];
 virtual void CMGoodbye(RTMessage Msg) = [CM_FIRST + CM_GOODBYE];
 virtual void CMPlease(RTMessage Msg) = [CM_FIRST + CM_PLEASE];
 virtual void CMThankyou(RTMessage Msg) = [CM_FIRST + CM_THANKYOU];
 virtual void CMCheers(RTMessage Msg) = [CM_FIRST + CM_CHEERS];
 // The about function
 virtual void CMAbout(RTMessage Msg) = [CM_FIRST + CM_ABOUT];
 // The paint function
 virtual void Paint(HDC PaintDC, PAINTSTRUCT _FAR & PaintInfo);
 // Private data
 DLMessageLookup *pMessageLookupClass; // Pointer to DL class
 int CurrentMessageID; // ID of selected message
 public:
 TAppWindow(PTWindowsObject AParent, LPSTR ATitle);
 ~TAppWindow();
};
// Window Class Constructor
TAppWindow::TAppWindow(PTWindowsObject AParent, LPSTR ATitle)
 : TWindow(AParent, ATitle) {
 
 // First assign the command menu to the window
 AssignMenu("CmdMenu"); 
 // Next, instantiate the English message DLL by default
 pMessageLookupClass = new DLMessageLookup(ENGLISH);
 CurrentMessageID = -1; // No message yet selected
}
// Window Class Destructor
TAppWindow::~TAppWindow() {

 delete pMessageLookupClass; // Delete the MessageLookup class object
}
// Window paint function. Displays the selected word in the center of
// the window, whenever window is repainted.
void TAppWindow::Paint(HDC PaintDC, PAINTSTRUCT _FAR & PaintInfo) {
 if (CurrentMessageID != -1) { // Display only if word has been selected
 DrawText(PaintDC, pMessageLookupClass->LookupMessage(CurrentMessageID),
 -1, &PaintInfo.rcPaint,
 DT_SINGLELINE | DT_CENTER | DT_VCENTER);
 }
}
// This function is called when English language is selected.
void TAppWindow::CMEnglish(RTMessage) {
 // First check this menu selection and uncheck others
 HMENU hMenu = GetMenu(HWindow);
 CheckMenuItem(hMenu, CM_ENGLISH, MF_CHECKED);
 CheckMenuItem(hMenu, CM_GERMAN, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_JAPANESE, MF_UNCHECKED);
 // Delete current message DLL and then instantiate the new one
 delete pMessageLookupClass;
 pMessageLookupClass = new DLMessageLookup(ENGLISH);
 // Invalidate window to cause a repaint
 InvalidateRect(HWindow, NULL, TRUE);
}
// This function is called when German language is selected.
void TAppWindow::CMGerman(RTMessage) {
 // First check this menu selection and uncheck others
 HMENU hMenu = GetMenu(HWindow);
 CheckMenuItem(hMenu, CM_GERMAN, MF_CHECKED);
 CheckMenuItem(hMenu, CM_ENGLISH, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_JAPANESE, MF_UNCHECKED);
 // Delete current message DLL and then instantiate the new one
 delete pMessageLookupClass;
 pMessageLookupClass = new DLMessageLookup(GERMAN);
 // Invalidate window to cause a repaint
 InvalidateRect(HWindow, NULL, TRUE);
}
// This function is called when Japanese language is selected.
void TAppWindow::CMJapanese(RTMessage) {
 // First check this menu selection and uncheck others
 HMENU hMenu = GetMenu(HWindow);
 CheckMenuItem(hMenu, CM_JAPANESE, MF_CHECKED);
 CheckMenuItem(hMenu, CM_ENGLISH, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_GERMAN, MF_UNCHECKED);
 // Delete current message DLL and then instantiate the new one
 delete pMessageLookupClass;
 pMessageLookupClass = new DLMessageLookup(JAPANESE);
 // Invalidate window to cause a repaint
 InvalidateRect(HWindow, NULL, TRUE);
}
// The following functions select the word to display
// This function is called when Good Morning is clicked.
void TAppWindow::CMGoodMorning(RTMessage) {
 // First check this menu selection and uncheck others
 HMENU hMenu = GetMenu(HWindow);
 CheckMenuItem(hMenu, CM_GOODMORNING, MF_CHECKED);
 CheckMenuItem(hMenu, CM_GOODBYE, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_PLEASE, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_THANKYOU, MF_UNCHECKED);

 CheckMenuItem(hMenu, CM_CHEERS, MF_UNCHECKED);
 // Set the new message ID
 CurrentMessageID = GOODMORNINGMSGID;
 // Invalidate window to cause a repaint
 InvalidateRect(HWindow, NULL, TRUE);
}
// This function is called when Goodbye is clicked.
void TAppWindow::CMGoodbye(RTMessage) {
 // First check this menu selection and uncheck others
 HMENU hMenu = GetMenu(HWindow);
 CheckMenuItem(hMenu, CM_GOODBYE, MF_CHECKED);
 CheckMenuItem(hMenu, CM_GOODMORNING, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_PLEASE, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_THANKYOU, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_CHEERS, MF_UNCHECKED);
 // Set the new message ID
 CurrentMessageID = GOODBYEMSGID;
 // Invalidate window to cause a repaint
 InvalidateRect(HWindow, NULL, TRUE);
}
// This function is called when Please is clicked.
void TAppWindow::CMPlease(RTMessage) {
 // First check this menu selection and uncheck others
 HMENU hMenu = GetMenu(HWindow);
 CheckMenuItem(hMenu, CM_PLEASE, MF_CHECKED);
 CheckMenuItem(hMenu, CM_GOODMORNING, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_GOODBYE, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_THANKYOU, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_CHEERS, MF_UNCHECKED);
 // Set the new message ID
 CurrentMessageID = PLEASEMSGID;
 // Invalidate window to cause a repaint
 InvalidateRect(HWindow, NULL, TRUE);
}
// This function is called when Thankyou is clicked.
void TAppWindow::CMThankyou(RTMessage) {
 // First check this menu selection and uncheck others
 HMENU hMenu = GetMenu(HWindow);
 CheckMenuItem(hMenu, CM_THANKYOU, MF_CHECKED);
 CheckMenuItem(hMenu, CM_GOODMORNING, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_GOODBYE, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_PLEASE, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_CHEERS, MF_UNCHECKED);
 // Set the new message ID
 CurrentMessageID = THANKYOUMSGID;
 // Invalidate window to cause a repaint
 InvalidateRect(HWindow, NULL, TRUE);
}
// This function is called when Cheers is clicked.
void TAppWindow::CMCheers(RTMessage) {
 // First check this menu selection and uncheck others
 HMENU hMenu = GetMenu(HWindow);
 CheckMenuItem(hMenu, CM_CHEERS, MF_CHECKED);
 CheckMenuItem(hMenu, CM_GOODMORNING, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_GOODBYE, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_PLEASE, MF_UNCHECKED);
 CheckMenuItem(hMenu, CM_THANKYOU, MF_UNCHECKED);
 // Set the new message ID
 CurrentMessageID = CHEERSMSGID;

 // Invalidate window to cause a repaint
 InvalidateRect(HWindow, NULL, TRUE);
}
// This function is called when the About menu item is clicked
void TAppWindow::CMAbout(RTMessage) {
 MessageBox(HWindow, "by Craig A. Lindley",
 "Demo Dynalink Application", MB_OK);
}
void TSampleDynalinkApp::InitMainWindow() {
 MainWindow = new TAppWindow(NULL, Name);
}
int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
 LPSTR lpCmdLine, int nCmdShow) {
 TSampleDynalinkApp MyApp("Demo Dynalink Application Program",
 hInstance, hPrevInstance,
 lpCmdLine, nCmdShow);
 MyApp.nCmdShow = SW_SHOWMAXIMIZED;
 MyApp.Run();
 return MyApp.Status;
}



Special Issue, 1994
A Generic About... Box Handler


Making use of code reuse




Joseph M. Newcomer


Dr. Newcomer received his PhD in 1975 in the area of compiler optimization. He
is a Windows consultant and applications developer based in Pittsburgh, PA.
His past experience has included computer graphics, document-processing
software, operating-systems development, compiler development, CASE tooling,
computer music, and real-time and embedded-systems development.


The generic "About..." box handler is usually written once and cloned many
times. Recently, I found myself generating an application that would spawn a
dozen or more "little servers," each of which looked like an ordinary
application (top-level menu system, and so on). Consequently, each one needed
an About... handler that gave its own name, potentially a separate copyright,
and its own icon. I also anticipated the need for internationalization, so I
didn't want to put any locale-specific literal text in the program itself. I
wanted to create a single binary that could be used by all the applications,
and as long as I was building a library, I decided to incorporate it into a
DLL that had some other support code needed by the applications. The ultimate
result was the code described here, which also supports a modeless About...
box. This latter requirement was imposed by the need to have the main window
continue to respond to messages, including user input, in a real-time system.
As you will see, the differences between this and the normal modal About...
box are minimal.
Figure 1 shows the modeless About... box. Its template is in Listing One and
its include file is in Listing Two. The AB_ identifiers are the static text
windows on the dialog and the icon. The MSG_ identifiers are for the
STRINGTABLE, and IDD_ABOUT is the identifier for the dialog resource.
The resource file (available electronically, see "Availability," page 3) for
the application contains all the application-specific descriptive text
required by the dialog. The application resource compilation is composed of
the following components: the application-specific resources, in however many
files you have chosen; the About... customization file, libver.rch, used to
define the parameters for the version resource; and the VERSIONINFO library
file, version.rch.
Figure 3 shows the relationship between the files. The objects on the right of
the vertical line are the shared files found in the library and include
directories; the objects to the left are the files private to the application.


The VERSIONINFO Structure


The VERSIONINFO structure is the key to recording version information for an
application. All of its pieces are documented in the SDK Programmer's
Reference, Volume 4: Resources (SDK4), pages 213-222, but it is difficult to
deduce exactly what is going on from the rather sketchy descriptions given in
this manual. It is ignored in all the Windows documentation I have found,
except for Jeffrey Richter's lucid description in Windows 3.1: A Developer's
Guide (M&T Books, 1993). The structure I use is based upon his example in
Chapter 10, and is shown as the file version.rch in Listing Three. This file
is placed in a shared directory and is never modified. (I put it in the same
include directory as the about.h file.) To compile resources successfully, you
must name this directory on the command line to rc using the -I switch. 
Without repeating everything Richter explains so well, I will summarize the
lines in version.rch; see Table 1. Note that each of the strings must have an
explicit NUL byte, '\0', which terminates it. The file ver.h is part of the
standard Windows SDK and defines the symbols required for the version
declaration. Symbols such as VOS_DOS_WINDOWS16, VFT_APP, and the like, are
defined in this file; see SDK4, pages 215--217. I've defined the symbols that
begin with VER_; see the application-specific configuration file, libver.rch
(Listing Four). The order and spaces of the initial definitions must be as
shown if you use the automated file version-incrementing technique discussed
later in this article. When this file is included in your application's
resource compilation, it includes the version.rch file, and thus the complete
version information, in your resource file. If you are using the Microsoft
Visual Workbench, you can cause this inclusion by selecting File|Set
Includes... and adding the file, as shown in Figure 2. 


Extracting the Version Information 


The version information is extracted by the procedures in abouter.c (Listing
Five) and verinf.c (Listing Six). When the About... handler starts up, its
WM_INITDIALOG message handler calls the procedure AboutVerInfo (Listing Five),
which extracts information from the application's VERSIONINFO resource and
places it in the dialog box. The OtherInfo procedure computes the standard,
useful environment information, such as the Windows version and the free
resources, and places that information in the dialog box. 
The version resource block is obtained by a call to my procedure,
GetInstanceVersion (Listing Seven). This takes an instance handle to the
module whose version information is to be obtained. The ver.dll operations do
not include an operation to obtain the version information of a running
program; however, the GetModuleFileName call can be used to get the filename
of the running program, and the GetFileVersionInfo call is used to obtain the
actual information block. The GetFileVersionInfoSize call is used to obtain
the size of the block allocated for GetFileVersionInfo to fill in.
GetVersionString is a thin layer on top of VerQueryValue so that I don't have
to deal with the NULL test each time. This bulletproofs the handler against
missing information or an entire missing VERSIONINFO structure.
In the procedure AboutVerInfo, I first extract information from the program
that called the library. Therefore, I need the instance handle of the program
itself; the hInst variable, which is declared external, is the instance handle
of the DLL, set by the LibMain procedure, as shown in Listing Eight. This
hInst value is used to retrieve the information local to the About... box
handler, which must be retrieved from the DLL's resources. To get the handle
to the running application, I first ask for the parent window of the dialog.
With this handle, I can call GetWindowInstance (defined in windowsx.h) to get
the instance handle that I pass to the procedure. This returns an LPVOID
value, which is used in later version information queries. 
Once I have obtained the version information block, I can extract the fields
from it. I call SetAboutTitle to set the heading of the window. This procedure
calls GetVersionString, which is part of ver.dll, and ideally obtains a
pointer to the string that is the value. I then load a string from the
STRINGTABLE, which is the phrase "About"; since this is in the resource file,
it can be readily internationalized. I force a space to follow the string if
one is not already there, then add the product name and do a SetWindowText.
The remaining fields in the dialog box are filled in by passing in the control
ID of the static text box in the dialog box and the query string for the
version string that is to be set in it. Finally the VerFree procedure is
called to release the space of the version resource.
The code in OtherInfo is straightforward. The only clever trick is to send a
WM_QUERYDRAGICON to get an HICON handle to the icon to display in the About...
box. This allows the generic dialog box to display the proper icon of its
parent application. 


Win32s/Win32c Considerations 


This code was originally developed to run under Win32s and Win16, with the
anticipation that it will be ported to the Chicago Win32c interface in the
future. Therefore, the file export.h is used to define the linkage types for
the exported procedures. The Win16 version is in Listing Nine . For the 32-bit
implementations the definitions of EXPORTPROC and DLLEXPORTPROC are empty.


Version Numbering


Automatic version numbering is quite convenient for maintaining sanity. I have
been using automatic version numbering in build procedures for about a decade,
and have used a variety of tools to automatically update the version number on
every build. The tool shown here is an awk script. The awk language is a
string-processing, pattern-matching language created by Alfred V. Aho, Brian
W. Kernighan, and Peter J. Weinberger at AT&T Bell Laboratories. It is popular
on UNIX systems, and a number of commercial, shareware, and freeware
implementations of varying quality and conformance to the "standard" UNIX
version now run on PCs. I have used two without problems: the MKS Toolkit
version and the Thompson Automation version. The GNU version is available from
the Austin Codeworks and elsewhere. An alternative to awk is PERL, also widely
available; however, I used awk for this project and will limit my description
to that language.
The program shown in Listing Nine is a relatively simple use of awk. For more
details on awk, refer to The AWK Programming Language, by Aho, Kernighan, and
Weinberger (Addison-Wesley, 1988, ISBN 0-201-07981-X). The awk language is one
of those truly marvelous tools that can make life a little less exasperating.
I have used it to validate databases, do statistical analysis of programs to
determine the cost of a language translation of 300K+ lines of source, analyze
the statistics of a large lexicon, and perform consistency checks in a
hypertext system. 
In this project I use awk to increment the version numbers. The number I need
to update is the FileVersion which appears both in a binary form (a 4-tuple)
and a text string. The ProductVersion essentially remains constant over a long
period of time, as it represents an interface specification and is not
expected to change often. 
The awk program is shown in Listing Ten. Before the program starts, I set the
field separator, FS, to a regular expression that will cause a split at
periods, commas, backslashes, spaces, and tabs. The patterns are matched
against the input lines, and any one that matches a nonempty pattern will
cause a new line to be written. It then executes the next operation, which
causes the next line to be read, at which point the matching restarts from the
top. (Think of this as equivalent to the continue statement applied to the
read-line/match loop). If no pattern matches, the last pattern-action
statement has no pattern, and will consequently be executed for every line
that gets that far. This causes the input line to be written out verbatim.
I have required that the VER_FILEVERSION string appear first; this is the
pattern match that triggers the incrementing of the version number. When the
VER_BINARY_FILEVERSION line is found, its version number is replaced by the
incremented version number that I have stored. In this way, the two version
numbers never get out of sync. 
In my normal build procedure, this awk program is run any time a successful
link has completed. One unfortunate feature of the Microsoft Visual Workbench
is that it does not provide for insertion of user-defined tools into the build
process unless an external, manually maintained makefile is used, which
obviates many of the advantages of the visual workbench, such as maintaining
correct dependency lists for the build. 
The default action of awk is to take its input from stdin and write its output
to stdout. I use the input and output redirection command-line options to
establish these connections. However, it is not possible to both read and
write the same file via redirection, so I output to a temporary file and then
copy that back to the input, as shown in the batch file in Listing Eleven.


Implementing a Modeless About... Box



The constraints of this project required that I not preempt the main
GetMessage loop of the application by having a modal box. I did not want the
application to have to deal with the details of this, so I defined a simple
protocol for invoking the About... box that uses a state descriptor to
determine if an About... box was currently active. This state descriptor is
the ABOUT structure defined in the include file abouter.h (Listing Twelve).
The protocol is that the About() procedure is called with a pointer to an
ABOUT data structure. The structure is initialized to {NULL, NULL} and will
not be freed up during program execution. (It is either statically or globally
allocated once at the start of the program.) This is shown in Listing
Thirteen, where the top-level command loop has been modified to handle the
IsDialogMessage call required by a modeless dialog box. If the About... box
has not been activated, the window component of the ABOUT data structure is
NULL and nothing will happen. The About... menu handler simply calls the
procedure About() whether the box is active or not.
In Listing Five, the About() procedure first tests that the HWND parameter is
non-NULL; without a valid window handle, I cannot locate the version
information, so I refuse to do anything if there is no parent window. If there
is no instance thunk, I create one. Next, I check to see if a current About...
dialog is being displayed. If so, I simply use SetWindowPos to move it to the
top of the Z-order (HWND_TOP) and make sure it is visible
(SWP_SHOWWINDOW). If there is no window instance, I create one with
CreateDialogParam, passing in the pointer to the ABOUT structure.
Consulting the handler procedure AboutDlgProc, also in Listing Five, I first
get the value of the DWL_USER word, which will initially be NULL. I then
handle the WM_INITDIALOG message by storing lParam, the pointer to the ABOUT
structure, in the DWL_USER word. On a WM_DESTROY message I see if the
AboutData pointer, which would have been initialized from the GetWindowLong
call on each entry, is non-NULL. If it is, I set the window handle to NULL so
the next call on About() will be forced to create a new window. It is
important that this information be kept with the window and not stored in a
static variable because variables declared in a DLL are shared by all callers
of the DLL. If I had stored the ABOUT structure in the DLL, the first modeless
box to close would have NULLed out the handle of the last dialog box opened;
any subsequent calls on the About() procedure from an application that already
had an open About... box would cause a new About... box to be created. In this
handler, the only static storage is the HINSTANCE handle to the DLL itself,
which is private to the DLL. All other storage is either stack storage or
passed in as pointers (such as the ABOUT structure pointer) to each caller's
private space. 
This illustrates a point about DLLs, particularly those with message loops
that can yield: A DLL's data segment is shared with all callers, in a
multithreaded fashion, and is therefore subject to being modified between any
two events. Suppose two of my miniservers had their About... boxes up. By
switching focus between the two About... boxes (and doing nothing else), the
handler code is executed in an alternating fashion between the two
applications. It must therefore be reentrant, and this code meets that
criterion. 
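The per-caller-versus-shared distinction can be shown in a few lines of
portable C. This is an illustrative sketch with hypothetical names, not the
article's code: the ABOUT-style block models the caller-owned state
descriptor, while shared_handle models what a static variable in a DLL's
shared data segment would do.

```c
#include <stddef.h>

/* Each caller owns its own state block (the correct approach). */
typedef struct {
    void *AboutProc;    /* stands in for the FARPROC instance thunk */
    void *hAbout;       /* stands in for the modeless dialog's HWND */
} ABOUT;

static void *shared_handle;  /* one slot for every caller: the bug  */

static void open_box(ABOUT *a, void *hwnd)
{
    a->hAbout = hwnd;        /* recorded in the caller's own block   */
    shared_handle = hwnd;    /* last opener overwrites everyone else */
}

static void close_box(ABOUT *a)
{
    a->hAbout = NULL;        /* clears only this caller's handle     */
    shared_handle = NULL;    /* clears the handle for ALL callers    */
}
```

With two "applications" each holding an ABOUT block, closing the first box
leaves the second caller's handle intact, while the shared static has already
lost it, which is exactly the failure described above.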
Figure 1 About... box.
Figure 2 Adding a custom resource include from Microsoft's AppStudio. 
Figure 3 Structure of files.
Table 1: Fields used in VERSIONINFO structure.
 Field Description 
 FILEVERSION Specifies file's version
 number; four 16-bit values. For example,
 "1,2,4,10" would specify
 version 1.2.4.10. In my application, I use
 only the first two values.
 PRODUCTVERSION Specifies product version for which this
 file is distributed. Four
 16-bit values, as in FILEVERSION.
 FILEFLAGMASK Should be set to VS_FFI_FILEFLAGMASK.
 FILEFLAGS A set of flags that describe the file
 type--debug, patched, private,
 prerelease, or special.
 FILEOS Specifies operating system for which file
 was designed. I use VOS_DOS_WINDOWS16.
 FILETYPE Type of file; for example, applications
 are VFT_APP, and DLLs are VFT_DLL.
 FILESUBTYPE Specifies details of what the file does;
 for example, designating
 what device a driver is intended for or
 font style. For applications,
 it is always VFT2_UNKNOWN.
 VarFileInfo\Translation Specifies language and character set.
 Currently hardwired as 0x0409 (U.S. English)
 and 1252 (Windows, Multilingual character set).
 Once internationalization begins, this
 implementation will change.
 BLOCK "040904E4" Designates the block of information that
 follows as belonging to U.S. English (0x0409)
 and Multinational character set (1252=0x04E4).
 StringFileInfo\Comments A placeholder for comments. My applications use
 this field. In these examples, the comments are
 not meaningful.
 StringFileInfo\CompanyName* The name of the company that produced the
 file. Particularly useful
 if you can have third-party add-on DLLs
 or executables.
 StringFileInfo\FileDescription* A file description that (in Richter's
 words) is "to be presented to the user."
 This is what I do with it in the About... box.
 StringFileInfo\FileVersion* The file's version number. This is actually
 redundant with the binary FILEVERSION information,
 but both are required.
 StringFileInfo\InternalName* The "internal name" for the file.
 StringFileInfo\LegalCopyright Copyright notice for the file. I extract it and
 put it in the About... box.
 StringFileInfo\OriginalFileName* The "original name" of the file in
 case the user renames it. Presumably allows
 installation procedures to locate older but
 differently named copies of a DLL.

 StringFileInfo\ProductName* Name of the product for which this is
 distributed. Would ideally allow
 an UnInstall mechanism to locate all
 DLLs and executables related to a single product.
 StringFileInfo\ProductVersion* Version number of the product. Like
 StringFileInfo\FileVersion, this is redundant
 with the FILEVERSION attribute.
 *Required fields

Listing One

#include "resource.h"
#include "windows.h"

IDD_ABOUT DIALOG DISCARDABLE 0, 0, 185, 161
STYLE DS_MODALFRAME | WS_POPUP | WS_VISIBLE | WS_CAPTION | WS_SYSMENU
CAPTION "About..."
FONT 8, "MS Sans Serif"
BEGIN
 ICON "",AB_ICON,159,7,18,20
 CTEXT "",AB_PROGTITLE,0,26,191,11
 CTEXT "",AB_COPYRIGHT,0,37,191,8
 LTEXT "Text",AB_PROGVERSION,103,57,46,8
 LTEXT "Program Version:",-1,41,57,59,8
 GROUPBOX "",IDD_ABOUT,30,49,122,31
 LTEXT "Text",AB_FILEVERSION,103,67,46,8
 LTEXT "File Version:",-1,41,67,59,8
 LTEXT "(mode)",AB_MODE,29,108,124,8
 LTEXT "00%",AB_RESOURCES,93,128,60,8
 LTEXT "3.xx",AB_WINVER,93,98,65,8
 LTEXT "00%",AB_MEMORY,93,118,67,8
 LTEXT "Windows version:",-1,30,98,62,8
 LTEXT "Free memory:",-1,30,118,52,8
 LTEXT "Free resources:",-1,30,128,52,8
 DEFPUSHBUTTON "OK",IDOK,67,143,50,14
END

STRINGTABLE DISCARDABLE 
BEGIN
 MSG_STANDARD "Standard Mode"
 MSG_ENHANCED "Enhanced mode"
 MSG_ABOUT "About "
END

#include "libver.rc"





Listing Two

#define MSG_ENHANCED 22
#define MSG_STANDARD 23
#define MSG_ABOUT 24
#define IDD_ABOUT 102
#define AB_PROGVERSION 107
#define AB_FILEVERSION 109
#define AB_ICON 110

#define AB_WINVER 118
#define AB_MEMORY 119
#define AB_RESOURCES 120
#define AB_MODE 121
#define AB_COPYRIGHT 122
#define AB_PROGTITLE 123




Listing Three

#include "ver.h"
VS_VERSION_INFO VERSIONINFO
 FILEVERSION VER_BINARY_FILEVERSION
 FILEOS VOS_DOS_WINDOWS16
 FILETYPE VER_FILETYPE
 BEGIN
 BLOCK "VarFileInfo"
 BEGIN
 VALUE "Translation", 0x0409, 1252
 END
 BLOCK "StringFileInfo"
 BEGIN
 VALUE "Comments", VER_COMMENTS
 VALUE "CompanyName", VER_COMPANYNAME
 VALUE "FileDescription", VER_FILEDESCRIPTION
 VALUE "FileVersion", VER_FILEVERSION
 VALUE "InternalName", VER_INTERNALNAME
 VALUE "LegalCopyright", VER_LEGALCOPYRIGHT
 VALUE "OriginalFileName", VER_ORIGINALFILENAME
 VALUE "ProductName", VER_PRODUCTNAME
 VALUE "ProductVersion", VER_PRODUCTVERSION
 END
 END




Listing Four

/* These lines must appear in this order */
#define VER_FILEVERSION "1.117\0"
#define VER_BINARY_FILEVERSION 1, 117, 0, 0
/* --- */
#define VER_PRODUCTVERSION "1.20\0"
#define VER_BINARY_PRODUCTVERSION 1, 20, 0, 0


#define VER_FILETYPE VFT_APP
#define VER_COMMENTS "------------------\0"
#define VER_COMPANYNAME "FooBar Associates, Ltd.\0"
#define VER_FILEDESCRIPTION "Sample About Handler\0"
#define VER_INTERNALNAME "Sample About Handler\0"
#define VER_LEGALCOPYRIGHT "Copyright \251 1993, Joseph M. Newcomer. All Rights Reserved\0"
#define VER_ORIGINALFILENAME "ABOUT.EXE\0"
#define VER_PRODUCTNAME "Sample About Handler\0"

#include "version.rch"





Listing Five

#define STRICT
#include <windows.h>
#include <windowsx.h>
#include <ver.h>
#include <stdlib.h>
#include <string.h>

#include "export.h"
#include "abouter.h"
#include "resource.h"
#include "verinf.h"

extern HINSTANCE hInst; // Library instance handle set by LibMain

/****************************************************************************
* SetAboutTitle
* Inputs:
* HWND hDlg: Dialog window reference
* LPVOID VerInfo: Version information packet
* LPSTR: Query for field which will hold title
* Result: void
* Effect: Changes the title of the dialog
****************************************************************************/

void SetAboutTitle(HWND hDlg, LPVOID VerInfo, LPSTR query)
 {
 char title[256];
 LPSTR data = GetVersionString(VerInfo, query);

 LoadString(GetWindowInstance(hDlg), MSG_ABOUT, (LPSTR)title, sizeof(title));
 if(title[lstrlen(title) - 1] != ' ')
 lstrcat(title, " ");

 if(data != NULL)
 lstrcat(title, data);

 SetWindowText(hDlg, title);
 }

/****************************************************************************
* SetDlgVersionString
* Inputs:
* HWND hDlg: Dialog whose control text is to be set

* UINT id: Control id in dialog whose text is to be set
* LPVOID VerInfo: Version information block
* LPSTR query: Query string
* Result: void
* Effect: Obtains the text VerInfo.query and sets it in hDlg.id
****************************************************************************/

static void SetDlgVersionString(HWND hDlg, UINT id, LPVOID VerInfo, LPSTR
query)
 {

 LPSTR data;

 data = GetVersionString(VerInfo, query);
 if(data != NULL && lstrlen(data) != 0)
 { /* has data */
 SetDlgItemText(hDlg, id, data);
 } /* has data */
 }

/****************************************************************************
* AboutVerInfo
* Inputs:
* HWND hDlg: Dialog window handle
* Result: void
* Effect: Fills in the fields of the dialog from the VERSIONINFO blocks for
* the program and data files
****************************************************************************/

static void AboutVerInfo(HWND hDlg)
 {
 HWND hWnd = GetParent(hDlg); /* get main window */
 LPVOID ProgVerInfo = NULL;

 /****************************************************************
 * In order to get the version info, we need the file name of the file
 * We use GetModuleFileName to get the name and from that derive
 * the version information
 ****************************************************************/

 ProgVerInfo = GetInstanceVersion(GetWindowInstance(hWnd));

 SetAboutTitle(hDlg, ProgVerInfo, "\\StringFileInfo\\ProductName");

 SetDlgVersionString(hDlg, AB_FILEVERSION, ProgVerInfo,
 "\\StringFileInfo\\FileVersion");

 SetDlgVersionString(hDlg, AB_PROGVERSION, ProgVerInfo,
 "\\StringFileInfo\\ProductVersion");

 SetDlgVersionString(hDlg, AB_PROGTITLE, ProgVerInfo,
 "\\StringFileInfo\\FileDescription");


 SetDlgVersionString(hDlg, AB_COPYRIGHT, ProgVerInfo,
 "\\StringFileInfo\\LegalCopyright");

 VerFree(ProgVerInfo);

 }

/****************************************************************************
* OtherInfo
* Inputs: HWND hDlg: Handle to dialog window
* Result: void
* Effect: Sets the other information such as CPU type in the dialog box
* Notes: Uses WM_QUERYDRAGICON to get the icon of the application
****************************************************************************/


void OtherInfo(HWND hDlg)
{
 char mode[256];
 int cpu;
 DWORD flags = GetWinFlags();
 UINT winver = (UINT)GetVersion();
 char * plus = "";
 char tmp[20];

 /* Figure out CPU type */
 if(flags & WF_CPU286)
 cpu = 286;
 else
 if(flags & WF_CPU386)
 cpu = 386;
 else
 if(flags & WF_CPU486)
 cpu = 486;
 else
 { /* bigger */
 cpu = 486;
 plus= ">";
 } /* bigger */
 wsprintf(mode, "%s%d ", (LPSTR)plus, cpu);

 /* Figure out mode */
 if(flags & WF_ENHANCED)
 LoadString(GetWindowInstance(hDlg), MSG_ENHANCED, 
 (LPSTR)&mode[lstrlen(mode)], sizeof(mode) - lstrlen(mode));
 else
 if(flags & WF_STANDARD)
 LoadString(GetWindowInstance(hDlg), MSG_STANDARD, &mode[lstrlen(mode)],
 sizeof(mode) - lstrlen(mode));
 
 SetDlgItemText(hDlg, AB_MODE, mode);


 wsprintf(tmp,"%d.%02d", LOBYTE(winver), HIBYTE(winver));
 SetDlgItemText(hDlg, AB_WINVER, tmp);

 { /* Free space */
 long freespace = GetFreeSpace(0);
 wsprintf(tmp,"%ldK", freespace/1024L);
 SetDlgItemText(hDlg, AB_MEMORY, tmp);
 } /* Free space */
 
 { /* Percentage free */
 UINT free = GetFreeSystemResources(GFSR_SYSTEMRESOURCES);
 wsprintf(tmp,"%d%%", free);
 SetDlgItemText(hDlg, AB_RESOURCES, tmp);
 } /* Percentage free */

 /* Get the icon, if there is one. To do this, we do a QUERYDRAGICON
 request on the parent window. It will either be handled by the parent
 and return a specific icon, or handled by DefWindowProc in the parent
 and return the class icon. If for some reason it returns NULL, we
 don't set the icon (we may want to put a default icon in here later).
 */
 {
 HICON icon = (HICON)(UINT)SendMessage(GetParent(hDlg), WM_QUERYDRAGICON, 0, 0L);

 if(icon != NULL)
 Static_SetIcon(GetDlgItem(hDlg, AB_ICON), icon);
 }
 
}

/****************************************************************************
* AboutDlgProc
* Effect: Processes messages for "About" dialog box
* Messages: WM_INITDIALOG - initialize dialog box
* WM_COMMAND - Input received
****************************************************************************/

BOOL DLLEXPORTPROC AboutDlgProc(HWND hDlg,unsigned message,
 WORD wParam,LONG lParam)
 {
 // Load the LPABOUT pointer (it may be NULL)
 LPABOUT AboutData = (LPABOUT) GetWindowLong(hDlg, DWL_USER);

 switch (message) 
 { /* message */
 case WM_DESTROY:
 {
 if(AboutData != NULL)
 AboutData->hAbout = NULL;
 }
 return TRUE;

 case WM_INITDIALOG:
 // Set up our pointer so this window has a reference
 // to the AboutData block

 AboutData = (LPABOUT)lParam;
 SetWindowLong(hDlg, DWL_USER, lParam);

 AboutVerInfo(hDlg);
 OtherInfo(hDlg);
 return TRUE;

 case WM_COMMAND:
 if (wParam == IDOK || wParam == IDCANCEL) 
 {
 DestroyWindow(hDlg);
 return TRUE;
 }
 break;
 default:
 break;
 } /* message */
 return FALSE;
 }

/****************************************************************************
* About
* Inputs: HWND hWnd: Parent window
* LPABOUT AboutData: A reference to a block of data (which must be
* initialized to zeros!) which holds transient status.
* Result: void

* Effect: Displays the 'about' dialog box
* Notes: Because of realtime considerations, we don't want to create a
* modal dialog box.
****************************************************************************/

void DLLEXPORTPROC About(HWND hWnd, LPABOUT AboutData)
 {
 if(hWnd == NULL)
 return; // requires non-NULL parent

 if(AboutData->AboutProc == NULL)
 { /* create procinstance */
 AboutData->AboutProc = MakeProcInstance(AboutDlgProc, hInst);
 } /* create procinstance */

 if(AboutData->hAbout != NULL)
 { /* show it */
 SetWindowPos(AboutData->hAbout, HWND_TOP, 0, 0, 0, 0, 
 SWP_NOMOVE | SWP_NOSIZE | SWP_SHOWWINDOW);

 return;
 } /* show it */

 AboutData->hAbout = CreateDialogParam(hInst, 
 MAKEINTRESOURCE(IDD_ABOUT),
 hWnd,
 AboutData->AboutProc,
 (LPARAM)AboutData);
 }




Listing Six

#define STRICT
#include <windows.h>
#include <windowsx.h>
#include <ver.h>
#include <stdlib.h>

#include "export.h"
#include "verinf.h"

/****************************************************************************
* GetInstanceVersion
* Inputs: HINSTANCE hInst: Handle to instance of object
* Result: void
* Effect: Obtains the VersionInfo data and puts it in the buffer
****************************************************************************/

LPVOID DLLEXPORTPROC GetInstanceVersion(HINSTANCE hInst)
 {
 char fname[_MAX_PATH];
 int result;
 DWORD size;
 DWORD handle;
 LPVOID buffer;


 result = GetModuleFileName(hInst, fname, sizeof(fname));
 if(result == 0)
 { /* no file */
 return NULL;
 } /* no file */

 size = GetFileVersionInfoSize(fname, &handle);

 if(size == 0)
 { /* no data */
 return NULL;
 } /* no data */

 /* We do not expect size to exceed 65K */

 buffer = (LPVOID)GlobalAllocPtr(GHND, (size_t)size);
 if(buffer == NULL)
 { /* no memory */

 return NULL;
 } /* no memory */
 result = GetFileVersionInfo(fname, handle, (DWORD)(size_t)size, buffer);
 if(!result)
 { /* load failed */
 GlobalFreePtr(buffer);
 return NULL;
 } /* load failed */
 return buffer;
 }

/****************************************************************************
* GetVersionString
* Inputs: void * VerInfo: Version information block
* LPCSTR query: Query string
* Result: char * Obtains the text VerInfo.query
* Effect: 
****************************************************************************/

LPSTR DLLEXPORTPROC GetVersionString(LPVOID VerInfo, LPCSTR query)
 {
 UINT length;
 void FAR * data;

 if(VerInfo == NULL)
 return "<<no version info found>>";

 VerQueryValue(VerInfo, query, &data, &length);
 return data;
 }

/****************************************************************************
* VerFree
* Inputs: LPVOID data: Data to free
* Result: void
* Effect: Frees the storage allocated for the data
****************************************************************************/

void DLLEXPORTPROC VerFree(LPVOID data)
 {

 if(data == NULL)
 return;

 GlobalFreePtr(data);
 }




Listing Seven

LPSTR DLLEXPORTPROC GetVersionString(LPVOID VerInfo, LPCSTR query);
LPVOID DLLEXPORTPROC GetInstanceVersion(HINSTANCE hInst);
void DLLEXPORTPROC VerFree(LPVOID VerInfo);




Listing Eight

#define STRICT
#include <windows.h>
#include "export.h"

HINSTANCE hInst;

BOOL WINAPI LibMain(HANDLE hInstance, WORD wDataSeg, WORD wHeapSize, LPSTR
lpszCmdLine)
 {
 hInst = hInstance; // Store this so members of library can find it
 if (wHeapSize > 0)
 UnlockData(0);
 return 1;
 }

int DLLEXPORTPROC WEP(int wParam)
 {
 return 1;
 }





Listing Nine

#ifndef EXPORTPROC
#define EXPORTPROC FAR PASCAL __export
#define DLLEXPORTPROC EXPORTPROC __loadds
#endif



Listing Ten

BEGIN { FS = "[\\.\\\\, \t]+" }

#############################################################################
# #define VER_FILEVERSION "1.20\0"
# 

# $1 $2 $3$4$5
#############################################################################

/VER_FILEVERSION/ {
 FileVersion = $4 + 1
 printf("%s %s %s.%s\\%s\n", $1, $2, $3, FileVersion, $5)
 next }


#############################################################################
# #define VER_BINARY_FILEVERSION 1, 20, 0, 0
# 
# $1 $2 $3 $4 $5 $6
#############################################################################

/VER_BINARY_FILEVERSION/ {
 printf("%s %s %s, %d, %s, %s\n", $1, $2, $3, FileVersion, $5, $6)
 next}


#############################################################################

 { printf("%s\n", $0) }




Listing Eleven

awk -f version.awk <libver.rc >$$$.tmp
copy $$$.tmp libver.rc
del $$$.tmp




Listing Twelve

typedef struct {
 FARPROC AboutProc;
 HWND hAbout;
 } ABOUT, FAR * LPABOUT;

void DLLEXPORTPROC About(HWND hWnd, LPABOUT AboutData);




Listing Thirteen

// At the head of the file
static ABOUT AboutData = {NULL, NULL};

//*****************************************************************************
// The main dispatch loop
 while(GetMessage(&msg, NULL, 0, 0))
 { /* dispatch */
 if(hAccel == NULL || !TranslateAccelerator(hWnd, hAccel, &msg))
 { /* not accelerator */

 // In case we have a modeless About... box we process it here
 // If we start doing more than that, we will use the general
 // registry mechanism (DDJ 18,5 (May 1993): "Modless Dialog Boxes
 // for Windows").

 if(AboutData.hAbout != NULL && 
 IsDialogMessage(AboutData.hAbout,&msg))
 continue;

 TranslateMessage(&msg);
 DispatchMessage(&msg);
 } /* not accelerator */
 } /* dispatch */

//*****************************************************************************
// In the handler:

 case ID_ABOUT:
 About(hWnd, &AboutData);
 return 1;










































Special Issue, 1994
DOS Pipes for Windows


Making program output available before the program ends




Al Williams


Al is the author of DOS and Windows Protected Mode and Commando Windows
Programming, both published by Addison-Wesley. Al can be contacted at 310 Ivy
Glen Court, League City, TX 77573 or on CompuServe at 72010,3574.


Traditional multitasking operating systems (such as UNIX) let you chain
programs together using pipes. Even DOS provides a crude form of pipes. Often,
you use pipes from within a program to collect data from another program. For
example, instead of writing your own search routines, you could call grep from
your program and read its output.
Unfortunately, Windows doesn't support pipes--not surprising since Windows
doesn't support standard I/O streams. However, you can run DOS programs from a
Windows application. In this article, I'll show you a way to run a DOS program
(PKUNZIP, in this case), collect output as PKUNZIP creates it, and route it to
a cooperating Windows program. This is different from collecting the data in a
file and then reading the file. With the latter method, you must wait until
the program finishes to examine its output. With a true pipe, the program's
output is available even before the program ends.


Secret APIs


One of the things that makes programming for Windows unusual is the variety of
APIs it presents to the programmer. Sure, the Programmer's Reference covers
the basic API, but there are many extension APIs: DPMI and VDS, to name two.
While all of these interfaces are documented, many are not well documented or
the documentation is difficult to find.
One of these semisecret APIs is the interrupt 0x2F interface. Interrupt 0x2F
is the DOS multiplex interrupt used by a variety of TSR and similar programs.
Windows supports an unusual set of functions via interrupt 0x2F with AH=0x17.
The most interesting function this API provides allows a DOS program to
manipulate the Windows Clipboard.


Using the Clipboard from DOS


In 386 Enhanced Mode, DOS programs (running in a DOS box) can cut and paste
items to the Clipboard using interrupt 0x2F. Table 1 shows the API. Note that
DOS programs can only use certain Clipboard formats--in practice, anything
other than text is too much trouble anyway.
Putting data on the Clipboard from DOS is simple. Just follow these steps:
1. Make sure Windows is running in 386 Enhanced mode.
2. Make sure Clipboard access is supported (function 0x1700).
3. Open the Clipboard (function 0x1701).
4. Compact the Clipboard (function 0x1709); make sure there is enough space
for your data.
5. Copy your data (function 0x1703).
6. Close the Clipboard (function 0x1708).
Reading the Clipboard is nearly the reverse of the above:
1. Make sure Windows is running in 386 Enhanced Mode.
2. Make sure Clipboard access is supported.
3. Open the Clipboard.
4. Make sure the Clipboard has the data type you need (function 0x1704).
5. Paste the data (function 0x1705).
6. Close the Clipboard.
If you need to pass data between a DOS program and a Windows application, the
Clipboard is one way to do it. Andrew Schulman's series of articles in PC
Magazine (see "References") discusses a system that allows DOS programs to
call nearly any Windows API function using this technique. You can also use
shared-memory buffers allocated before Windows starts (see DOS and Windows
Protected Mode) inside a virtual device driver (VxD). 


An Example


Using the Clipboard from DOS is similar to using it from Windows--only the
method of calling the API differs. Since the Clipboard is one of the few
Windows services you can access directly from a DOS program, you may need to
use it to provide "live" communications between the DOS and Windows worlds. 
To illustrate this, suppose you want to run a DOS program from inside Windows,
collect the output, and process it in a Windows application. You could use
files, but then you would need a way to synchronize the two programs. (You
can't read the file in the Windows program until you finish writing to it from
the DOS program.) In addition, you can't easily communicate in both directions
at once. 
A better solution is to collect the output in real time and pass it using the
Windows Clipboard. That's exactly what CLIPSH does (see Listing One). CLIPSH
hooks interrupt 0x29 (the DOS CON driver interrupt) with its int29() function.
When this routine collects an entire line of output in the line array,
flush_clip() passes it to the Clipboard.
Before CLIPSH sends an output line, it waits until the Clipboard contains the
string CLIPSH:RDY. Then it places the line on the Clipboard preceded by a hex
FF byte. This helps distinguish CLIPSH text from ordinary text. (Remember, the
DOS clipboard interface doesn't support custom formats.) If CLIPSH finds
CLIPSH:ABT instead of CLIPSH:RDY, it terminates the executing program.
When CLIPSH is finished, it places CLIPSH:END on the Clipboard. This is the
signal for any cooperating program that the output is complete.
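The handshake just described can be modeled in a few lines of portable C.
This is a toy sketch, not CLIPSH's code: a plain character buffer stands in
for the Windows Clipboard that the real program reaches through interrupt
0x2F, and all function names are illustrative.

```c
#include <string.h>

/* The buffer standing in for the Clipboard. */
static char clipboard[256];

static void consumer_ready(void)  { strcpy(clipboard, "CLIPSH:RDY"); }
static void producer_done(void)   { strcpy(clipboard, "CLIPSH:END"); }

/* Send one line of output: 0 = sent, 1 = consumer not ready yet (the
   real code yields its time slice and retries), -1 = abort requested. */
static int send_line(const char *text)
{
    if (strcmp(clipboard, "CLIPSH:ABT") == 0)
        return -1;                    /* consumer wants us to quit   */
    if (strcmp(clipboard, "CLIPSH:RDY") != 0)
        return 1;                     /* wait for the RDY marker     */
    clipboard[0] = '\xFF';            /* flags this as CLIPSH text   */
    strcpy(clipboard + 1, text);
    return 0;
}
```

Note that after a line is sent, nothing more can go out until the consumer
posts CLIPSH:RDY again; that re-posting is what paces the producer.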
After CLIPSH installs int29(), it collects its command-line arguments and
passes them to the standard system() call. The int29() function then collects
output from the child process--not CLIPSH itself.
If you can run a program and collect its output using the DOS redirection
operator, it will work with CLIPSH. Of course, CLIPSH is most useful for
running programs that you ordinarily can't modify--DIR, CHKDSK, or PKUNZIP,
for example. If you are using your own programs, you could just output to the
Clipboard directly and bypass the interrupt 0x29 handling. 



The Details


The heart of CLIPSH is flush_clip() (see Listing One). Its first duty is to
switch to a private stack (the int_stack array). This requires some simple
assembly code and prevents CLIPSH from overrunning the interrupted program's
stack (which may be quite small).
Next, CLIPSH enables interrupts. When an interrupt occurs, such as the
interrupt 0x29 that triggered flush_clip(), the CPU automatically disables
further interrupts until the service routine completes. However, flush_clip()
may have to wait for a cooperating Windows program to place the CLIPSH:RDY
string on the Clipboard. While interrupts are disabled, no other programs can
run. This causes a deadlock where CLIPSH is waiting for output from a program
that can't execute. With interrupts enabled, other programs are free to run
while CLIPSH is waiting.
When CLIPSH is waiting, it uses interrupt 0x2F, function 0x1680 to release its
time slice to Windows. This allows other programs to execute without waiting
for CLIPSH's time slice to expire.
You may need a PIF file for CLIPSH, unless your _DEFAULT.PIF file happens to
contain the proper settings. You'll need the background-processing box
checked, and you may want to give CLIPSH a high background priority.


Using CLIPSH


The WZIP program uses CLIPSH to execute PKUNZIP. By using file-open dialogs,
WZIP provides a simple way to expand ZIP files and view their contents. Of
course, PKUNZIP and CLIPSH must be in the current directory or on your path. 
WZIP consists of several source files, all of which are available
electronically; see "Availability," page 3. WZIP.C contains the program's main
logic. CLIP.C holds some simple Clipboard functions. WINCD.C provides a dialog
box that selects a directory. Since WZIP is a simple, text-based program, it
uses an edit control to display text. The EWINDOW.C file makes this possible. 
As you run WZIP, you'll find that if the CLIPSH window is in the foreground,
everything proceeds smoothly. However, if the CLIPSH window is in the
background (or hidden), WZIP runs more slowly. Although hiding the CLIPSH
window is more aesthetically pleasing, the result is often too slow to be
useful. Microsoft Visual C/C++ appears to have the same problem.
The ClipExec() function (see WZIP.C) does most of WZIP's work by placing the
CLIPSH:RDY string on the Clipboard and executing CLIPSH via WinExec(). As
lines appear on the Clipboard, WZIP uses wputs() to add lines to the edit
control that serves as the main window (see EWINDOW.C).


The Bad News


Of course, there's no free lunch. CLIPSH has some drawbacks. First, it
destroys whatever is on the Clipboard, which could annoy an unsuspecting user.
(See the "References" for information on saving some Clipboard formats.)
Additionally, programs that output via the BIOS or require user interaction
won't work properly.
Still, many useful DOS programs will work with CLIPSH. By joining a Windows
front end to a DOS workhorse, it's easy to write some surprisingly powerful
programs. 


Conclusion


Although it isn't perfect, the Clipboard does allow you to create a
DOS/Windows pipe. You could use a variety of programs with CLIPSH to reduce
the burden on your own programs: grep, PKUNZIP, DIR, or even COMMAND.COM. Of
course, you can also use the DOS clipboard techniques to incorporate clipboard
support in your own DOS programs.


References


Brown, Ralf, et al. PC Interrupts. Reading, MA: Addison-Wesley, 1991.
Schulman, Andrew. "Accessing the Windows API from the DOS Box." PC Magazine
(August/September/October 1992).
Shaw, Richard Hale. "Save Multiple Items to the Clipboard with CLIPSTAC." PC
Magazine (August 1992).
Williams, Al. DOS and Windows Protected Mode. Reading, MA: Addison-Wesley,
1993.
Williams, Al. Commando OLE 2.0 and DDE Programming. Reading, MA:
Addison-Wesley, 1993. 
Speeding Up Visual C++
Visual C++'s Visual Workbench apparently uses a method similar to CLIPSH to
run the Microsoft compiler and tools. A cursory examination shows that a DOS
program, WINTEE, runs the compiler and sends its output to Visual Workbench
using some mechanism.
Like CLIPSH, when the WINTEE window is hidden (which it is by default),
performance slows to a crawl. Luckily there's an undocumented way to make the
WINTEE window visible and improve your compilation speed.
First use the PIF editor to modify WINTEE.PIF. You should give it an extremely
high background priority. Next, start Visual Workbench with the undocumented
/v switch. Now when you compile, you'll notice a WINTEE window will appear. If
you want to get maximum performance, move the WINTEE window to the foreground.
I don't know if WINTEE uses the Clipboard to pass data to Visual Workbench.
However, both WINTEE and CLIPSH reveal some interesting behavior about the
Windows scheduler and its relationship to DOS programs.
--A.W.
Table 1: (a) The DOS-box clipboard API; (b) format codes.
(a)
Get WINOLDAP version Input: AX=0x1700
 Output: if AX=0x1700 then WINOLDAP does not support clipboard
 access, otherwise:
 AL=major version
 AH=minor version
Open Clipboard Input: AX=0x1701
 Output: AX=0 on failure
Clear Clipboard Input: AX=0x1702
 Output: AX=0 on failure
Copy to Clipboard Input: AX=0x1703
 DX=format code

 SI:CX=data length
 ES:BX=pointer to data
 Output: AX=0 on failure
Query Clipboard Input: AX=0x1704
 DX=format code
 Output: DX:AX=size of data (or zero if unavailable)
Paste from Clipboard Input: AX=0x1705
 DX=format code
 ES:BX=pointer to buffer
 Output: AX=0 on failure
Close Clipboard Input: AX=0x1708
 Output: AX=0 on failure
Compact Clipboard Input: AX=0x1709
 SI:CX=required size
 Output: DX:AX=Largest block available (or zero on error)
(b)
Code Description
CF_TEXT Ordinary text
CF_OEMTEXT Text in machine character set
CF_BITMAP Bitmap
CF_METAFILEPICT Windows metafile
CF_SYLK Excel format
CF_DIF Data-interchange format
CF_TIFF TIFF graphics
CF_DIB Device-independent bitmap
CF_PALETTE Palette for bitmap

Listing One

/* This program runs a DOS program and pipes its output
to the Windows clipboard -- Williams */
#include <stdio.h>
#include <dos.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h> /* for isspace */
#include <errno.h> /* for errno */

#define CLIP_TEXT 1

/* Basic clipboard operations */
int clip_open(void);
int clip_close(void);
int clip_copy(int fmt,unsigned long size,void *buffer);
void *clip_paste(int fmt);
int clip_wait(void);
void flush_clip(void);

void err(int no);
void win_yield(void);

/* Track open status */
static int clip_is_open=0;
/* Storage for old interrupt vector */
void (_interrupt _far *oldint29)();

/* Interrupt function registers */
#ifdef __BORLANDC__
#define INTREGS unsigned Rbp, unsigned Rdi, unsigned Rsi, \
 unsigned Rds, unsigned Res, unsigned Rdx, \
 unsigned Rcx, unsigned Rbx, unsigned Rax
#else
#define INTREGS unsigned Res, unsigned Rds, unsigned Rdi,\
 unsigned Rsi, unsigned Rbp, unsigned Rsp,\
 unsigned Rbx, unsigned Rdx, unsigned Rcx,\
 unsigned Rax
#endif
/* Interrupt stack */
char int_stack[8192];

/* Line buffer and pointer into line */
char line[513];
int lp=1;

/* Wait for CLIPSH:RDY */
int clip_wait()
 {
 char *hp=NULL;
 do
 {
 int i;
 win_yield();
 while (!clip_open())
 win_yield();
 hp=clip_paste(CLIP_TEXT);
 clip_close();
/* check for abort command */
 if (hp&&!strcmp(hp,"CLIPSH:ABT"))
 {
 union REGS r;
 hp=NULL;
/* Induce DOS exit call in target program */
 r.x.ax=0x4c00;
 int86(0x21,&r,&r);
 return 1;
 }
/* Wait for ready string */
 } while (!hp||(hp&&strcmp(hp,"CLIPSH:RDY")));
 return 0;
 }
/* Flush line from buffer to clipboard */
void flush_clip()
 {
 static int oldss,oldsp;
 char *stacktop=int_stack+sizeof(int_stack);
/* Switch stacks */
 _asm {
 mov oldss,ss
 mov oldsp,sp
 mov dx,stacktop
 mov ax,ds
 mov ss,ax
 mov sp,dx
 }
/* Allow future interrupts */
 _enable();
 if (!clip_wait())
 {
/* Prefix string with hex FF */

 line[0]='\xFF';
/* Terminate string with NULL byte */
 line[lp]='\0';
/* Put line on clipboard */
 while (!clip_open()) win_yield();
 clip_copy(CLIP_TEXT,strlen(line)-2,line);
 clip_close();
 }
/* reset line pointer */
 lp=1;
/* switch stack back */
 _asm {
 mov ax,oldss
 mov dx,oldsp
 mov ss,ax
 mov sp,dx
 }
 }
/* Add output character to buffer, if newline or full
 buffer, call flush_clip() */
void _far _interrupt int29(INTREGS)
 {
 if ((line[lp++]=(Rax&0xFF))=='\n'||lp>=sizeof(line))
 flush_clip();
 }
void main()
 {
 char cmdline[129],*cl=cmdline;
 char _far *pspcmd;
 int cmdlen;
/* Point to command line in PSP */
 FP_SEG(pspcmd)=_psp;
 FP_OFF(pspcmd)=0x80;
 cmdlen=*pspcmd++;
/* skip leading spaces */
 while (cmdlen&&isspace(*pspcmd))
 {
 pspcmd++;
 cmdlen--;
 }
 while (cmdlen--) *cl++=*pspcmd++;
 *cl='\0';
 printf("Processing...\n");
/* Get old interrupt 0x29 vector */
 oldint29=_dos_getvect(0x29);
/* Capture interrupt 0x29 */
 _dos_setvect(0x29,int29);
/* run program via system */
 if (system(cmdline)) err(errno);
/* Reset vector */
 _dos_setvect(0x29,oldint29);
/* Flush any remaining characters */
 printf("Done.\n");
 if (lp!=1) flush_clip();
 clip_wait();
/* Mark end of data */
 clip_open();
 clip_copy(CLIP_TEXT,10,"CLIPSH:END");
 clip_close();

 exit(0);
 }
/* common error routine */
void err(int no)
 {
 char errmsg[66];
 sprintf(errmsg,"CLIPSH:ERR=%d",no);
 clip_open();
 clip_copy(CLIP_TEXT,strlen(errmsg),errmsg);
 clip_close();
 exit(1);
 }
/* This function gets called when the program exits.
 This way, if you forget to close the clipboard,
 we will do it for you */
static void clip_exit(void)
 {
 if (clip_is_open) clip_close();
 }
/* Open clipboard */
int clip_open(void)
 {
 union REGS r;
 static int first=1;
 if (first)
 {
/* Register atexit() handler (see clip_exit() above) */
 first=0;
 atexit(clip_exit);
 }
/* Don't open it twice */
 if (clip_is_open) return 1;
 r.x.ax=0x1701;
 int86(0x2f,&r,&r);
/* Record status */
 clip_is_open=(r.x.ax!=0);
 return r.x.ax;
 }
/* Close clipboard */
int clip_close(void)
 {
 union REGS r;
/* Don't close if it isn't open */
 if (!clip_is_open) return 0;
 r.x.ax=0x1708;
 int86(0x2f,&r,&r);
/* Record status */
 clip_is_open=0;
 return r.x.ax;
 }
/* Delete clipboard contents */
int clip_clear(void)
 {
 union REGS r;
 r.x.ax=0x1702;
 int86(0x2f,&r,&r);
 return r.x.ax;
 }
/* Move data to clipboard */

int clip_copy(int fmt,unsigned long size,void *buffer)
 {
 union REGS r;
 struct SREGS s;
 unsigned long avail;
 char _far *buffp=(char _far *)buffer;
/* compact clipboard */
 r.x.ax=0x1709;
 r.x.si=size>>16;
 r.x.cx=size&0xFFFF;
 int86(0x2f,&r,&r);
 avail=(unsigned long)r.x.dx<<16;
 avail|=r.x.ax;
/* make sure there is enough room */
 if (avail<size) return 0;
 r.x.ax=0x1703;
 r.x.dx=fmt;
 segread(&s);
/* Move data to clipboard */
 s.es=FP_SEG(buffp);
 r.x.bx=FP_OFF(buffp);
 r.x.si=size>>16;
 r.x.cx=size&0xFFFF;
 int86x(0x2f,&r,&r,&s);
 return r.x.ax;
 }
/* Read data from clipboard */
void *clip_paste(int fmt)
 {
 union REGS r;
 struct SREGS s;
 unsigned long siz;
 void *rv, _far *rvp;
 static char sbuf[256];
 r.x.ax=0x1704;
 r.x.dx=fmt;
 int86(0x2f,&r,&r);
/* If no data available, return NULL */
 if (r.x.dx==0&&r.x.ax==0) return NULL;
 siz=(unsigned long)r.x.dx<<16;
 siz|=r.x.ax;
 if (siz>sizeof(sbuf)) return NULL;
 rv=sbuf;
 rvp=(void _far *)rv;
 r.x.ax=0x1705;
 r.x.dx=fmt;
 segread(&s);
 s.es=FP_SEG(rvp);
 r.x.bx=FP_OFF(rvp);
 int86x(0x2f,&r,&r,&s);
 if (r.x.ax==0)
 {
/* Error -- return NULL */
 return NULL;
 }
 return rv;
 }
/* Give time back to Windows */
void win_yield()

 {
 union REGS r;
 r.x.ax=0x1680;
 int86(0x2f,&r,&r);
 }

























































Special Issue, 1994
A Program Architecture for Visual Basic Development


Planning ahead for speed, usability, and maintainability




 Joachim Schürmann


Joachim is director of application development at Academic Advantages and a
consultant with Crossbridge Connections. He can be contacted on CompuServe at
72122,144 or on the Internet as joachim@advantages.mystery.com.


Program architecture lies on the fault line between the analysis of functional
logic and its actual implementation as a program. While the design of a
program is driven by a functional requirement, the architecture is shaped by
constraints of technology. It bridges the gap between pure logic and its
concrete manifestation as machine code. This makes it a fascinating topic for
examination.


Program Architecture


A program architecture provides constraints and services that appear desirable
for a particular project but are neither practical to enforce through standards
nor supported by the implementation technology.
It is interesting to note that there seems to be a correlation between the
versatility of an implementation technology and the need to limit its
flexibility by rules imposed on its use. If you build a house from scratch,
the range of possible shapes is limited only by the building material and the
intended use of the structure. If the building material is very versatile,
like brick and mortar, there are many different shapes and layouts that can
satisfy a desired functionality. It is therefore a good idea to agree on a
specific architecture even before drawing up the blueprints. On the other
hand, if you use prefabricated elements, the range of choices is limited: A
Quonset hut is a Quonset hut is a Quonset hut.
From assembler to OOP, the enforcement of constraints to limit complexity
progressively changes from being a set of programming rules (standards), to
favoring the architectural approach, to ultimately becoming part of the
implementation technology itself; see Figure 1. Anywhere between assembler and
OOP the architecture plays an important role in making the technology an
effective and usable implementation tool.


Scope and Constraints


The factors that determine the scope of a program architecture include the
program usage that the architecture supports, the implementation technology,
and the processing mode.
If a program is a one-shot pop that is unlikely to ever get changed, spending
time to develop a specific architecture is hard to justify. Often, however,
seemingly insignificant programs grow into crucial components of major
systems. In such cases, it may become necessary to rewrite the program. A more
cautious approach is to use a proven architecture. This will require more
overhead than just coding a program without any architectural constraints, but
it proves useful if the program turns out to be more important than
anticipated.
The other extreme of program usage is full integration into a complex
data-processing system. No matter what the implementation technology, specific
constraints and services are probably necessary to tie program elements
together into a single system. These common constraints and services are at
the heart of the system and may justify spending as much as 50 percent of the
total system-implementation time on the architecture.
The architecture must provide services and constraints not supported by the
implementation technology. For example, if a system provides unsatisfactory
communication between calling modules (such as CICS), insufficient error
protection (MS-DOS), or unsatisfactory printing routines (Cobol), chances are
that an architecture will fill the gap. Of course, if a technology lacks
certain fundamental capabilities--such as the ability to process sound files,
or to distribute transactions across a network--it may be impractical to rely
on an architecture for corrective actions.
Depending on the processing mode, an architecture may have to perform
different duties. A batch architecture, even for a complex system, will never
need to provide more than just batch services. On the other hand, an
architecture for an online, cooperative system must furnish real-time services
and possibly extend across different system platforms.
The scope of an architecture depends mostly on functional and systems-related
criteria. Its constraints, however, are defined predominantly by the human
factor. 
First, the architecture has the precise function of enforcing simplicity and
thus achieving the maintainability of even complex programs. It must absorb as
much complexity from the application logic as possible and impose a
documentable, recognizable structure upon the logic that it serves. An
architecture must
also support program usability by limiting inconsistencies of the user
interface within a system and across programs. Users like program access to
follow a predictable and reliable pattern.
Finally, the architecture must monitor the use of resources in order to
preserve the efficiency of execution. This is particularly important for
online programs, as insufficient response time is a major obstacle to the
system's acceptance.
In short, an architecture is the result of conflicting requirements (see
Figure 2). It takes experience and much attention to detail to achieve a
workable compromise. Because of the importance of an architecture to the
success of a system, you should spare no effort to achieve the best solution
possible for any given set of circumstances. 


Architecture Components


An architecture handles data used across several programs or modules. The
definition of this data and its structure is the first important component of
an architecture. It is equally important to identify the program logic that
should be provided by the architecture. This logic will be available from
anywhere within the system and is likely to define how all system components
interact with each other. Hence, program logic and data definitions are the
two dimensions you have to manage to build an effective program architecture.
This is tantamount to requiring that, in order to build a good program
architecture, you must know exactly what the complete system will look like
and how it will be put together. While system designers generally are
perceptive, they are rarely omniscient. It's a good idea then to look for a
more practical approach to the design of an architecture.


Architecture Design


Bottom up or top down is the question. Do you first design the system, then
devise the architecture, or do you establish the architecture first, then
assemble the system to fit the architecture?
The answer isn't simple. We're likely to do neither, or rather, a little bit
of both. The process is iterative, with continuous corrective measures. We
establish an architecture that provides most of the services and support that
we think is needed, and then we proceed with the detail. While designing and
implementing the program detail, you may notice additional requirements and go
back to modify the architecture. This may affect the programs already designed
and written and cause additional modifications. The further along you get, the
more far-reaching consequences any change to the program architecture will
have. This is a strong incentive to invest a lot of time up front, especially
when large programming teams are involved. 
There are two notable exceptions to this approach. The first is the commercial
availability of established architectures. These include the Hogan umbrella,
Foundation, and others. These architectures are likely to be well proven,
providing all data handling and all elements of common logic that could
possibly be required. Typically, they address complex-systems situations for
which adequate implementation technologies are not available. In Figure 1,
this situation would be placed somewhere left of the center on the time line.
Moving to the right on that axis, the importance of the architecture itself
decreases and the influence of the implementation technology becomes more
relevant. In the second exception, a sufficiently strong implementation
technology is used, making the existence of an architecture incidental in the
overall scheme of things. Since the architecture has ceased to be a strong
component of the application, you are unlikely to invest a lot of effort into
it. 
Advanced programming languages provide a strong incentive for prototyping and
experimentation. If any architectures are needed, they may be developed along
the way. Common requirements will come into focus when we experiment with the
functionality and implementation techniques.


Visual Basic



Visual Basic (VB) is a fairly advanced programming language. Like Basic, it is
an interpreted language that can also be compiled. Unlike Basic, however, VB
is event driven and does not adhere to the traditional procedural model. 
The procedural model assumes that a central piece of logic has the ultimate
control over the program. The concepts of "program start," "program end," and
"mainline" are strongly associated with this model. There may be questions on
how to break down the logic and manage the data, but control definitely
resides within the program logic and all actions and activities are initiated
from within that logic.
An event-driven structure, however, relies on distributed logic. Actions and
activities are associated with objects that generate events. A screen is an
object, a field on the screen is an object, a button in a field on a screen is
an object, and so on. Each object generates events through some sort of
interaction with the world (a mouse click, for example). This starts the
execution of any logic previously associated with that event. The important
difference from the procedural model is that the logic associated with this
event is self-contained; no centralized piece of logic is aware of all events
being triggered. The event-driven program operates like an ecology: All
elements of the whole act separately, but work together to achieve a common
goal. The fact that events can be started independently leads to their
potential concurrence. Hence, program execution is never quite deterministic,
as separate events may start at different times and interact with each other
in different ways.
All these interactions can grow very complex. For this reason, an event-driven
programming language has a more-solid structure and clearer constraints than
any procedural language. Much of the logic and structure that traditional
systems enforce via an architecture are intrinsic to the event-driven approach
and thus to the programming language itself. A good example is the way data is
handled. Some data is temporary within the event procedure, other data is
shared across procedures or even within the whole system and remains available
throughout the life of a program. The definition and management of the various
types of data is inherent to Visual Basic and does not have to be provided by
an architecture.
You might rightfully ask, if the structure is so strong, does VB really need a
program architecture? The answer depends on what you want to do. I found VB to
be fairly good at handling most structural requirements, even of a complex
system with multiple screens. Where an architecture came in handy, though, was
in ensuring consistency across screens.
Visual Basic places no limitations on what goes where on the screen, and no
areas are reserved for any special purpose. Nevertheless, in one recent
project, we wanted to provide a navigation aid to users. We decided to reserve
the top area of all major screens for information about the data being
processed. To look stable and reassuring, that area had to remain consistent
and perfectly aligned across multiple screens. This required a special
mechanism not available through the programming language.
An architecture might also be necessary because VB forms (which define a unit
of logic consisting of a window, its controls, and associated logic and
variables) do not have an intrinsic awareness of where they are called from.
They are loaded and activated, then perform whatever logic is necessary, and
relinquish control to whatever code started them. If the execution logic is
supposed to vary depending on where the form was called from, some customized
code needs to be devised.
Another example is the management of asynchronous processing. Simply put,
VB asynchronous processing lacks finesse. If you plan to rely on asynchronous
processing in any way at all, allow yourself extra time to develop a managing
mechanism that handles conflicts arising from concurrent and recursive
execution and enforces dependencies.


An Example


I was recently part of a team that designed a reasonably complex, commercial
information-management program encompassing about 35,000 lines of code written
in Visual Basic.
When we started out, we assumed that VB would be able to support all our
structural requirements. It was only during the prototyping of the screens that
we realized the usefulness of a mechanism that would ensure a consistent look
and feel across several of our screens.
In particular, we decided that at any given time, any of the three major
screens were to provide information about the data being processed. To make
things easy and natural, this information was to remain displayed in the same
location--the upper portion of the screen--independently from the data window
opened at any given time. User interaction was most successful when the area
containing that information looked as if it were hovering above, almost as if
it were no part of the screen proper. A change of data windows had to leave
the information area unaffected, unless, of course, the data being displayed
changed.
Figure 3 shows one of the three screens that displays this information area,
located immediately below the menu bar. There is one panel containing two
fields. The right field provides information about the file that is currently
open. The left field provides details about the item being accessed. Since the
entire application relies heavily on graphical symbols, these fields must be
capable of displaying icons, in addition to character information.
The status of both fields must be clear to the user. Figure 4 shows a
progression of information areas captured in various situations. When no file
is open, the Item field is invisible, since no item can be selected. If a file
is open, the Item field either shows the name of a selected item or conveys
the request that an item is to be selected. Whenever valid information is
associated with the field, the background color is yellow. If no information
is selected or the information is incomplete, the background of the field is
white. When an action request is stated, like <click> to select a DATABASE,
this request is displayed in red on white background, and the field must be
clickable. In addition, for user convenience, either field is clickable when
associated with a screen that allows the selection of a database or of an
item. The field always contains an image that varies depending on the
situation being addressed. 
All this sounds complicated. However, as a user interface it comes across
quite naturally, communicating through colors, images, and characters, making
the structure of the message very intuitive.
Once we defined our intentions, we had to seek a way to implement them. We
explored several alternatives and found that only a few of them were suitable
for implementation. I'll briefly discuss the options that we discarded.
The first (and possibly most obvious) way to manage the screens would be to
display the information area in a window of its own. This window would be
visible together with any of the data windows. This would require that for
every screen change, not just one but two windows would need to be managed and
synchronized. While this adds complexity, it can be implemented, but there are
other difficulties to be reckoned with.
The menu bar and title are always displayed along the top of a window. While
we could choose not to use a menu bar, or to suppress the title, we didn't have
the option of displaying them anywhere other than the standard system location.
If we had chosen to display the information fields in a window of their own
and programmatically placed this window above the data window, the title and
menu bar of the data window would be sandwiched between the two windows. This
would be an entirely wrong location. To remain consistent with the general
windows interface, a menu must be above the area that a user recognizes as a
work area and not in the middle of it.
An alternative was to place the menu and its management within the confines of
the information window. In this case, the logic of the information window
would need to associate the appropriate menu choices with the active data
windows and convey menu selections to the program logic of that window. This
was feasible but too complex for the simple goal that we had in mind. 
We were then tempted to define a window so that it seems to float on top of
all other windows displayed on the screen. To do that, you use the TOPMOST
option of the SetWindowPos API, which works like the Always on Top option of
Windows help. We considered defining a separate information window in this
manner. This solution was aesthetically pleasing, leaving the user in full
control of where to drag the information window. 
The flexibility of that approach, however, also carried a price: The user
would have to worry about placing the information window on the screen and
moving it to where data would not be covered, thus making the interface more
complex rather than less.
Additionally, there was an architectural problem with this solution. The
topmost window will remain topmost even if the associated application ceases
to have the focus. In other words, if we Alt+Tab from our
information-management application with an open topmost window into an
unrelated application, the deactivated topmost window would remain visible on
the screen, on top of all other windows. This can be addressed
programmatically but requires extra logic for proper management. More
importantly, we found that in the case of system messages or the display of
any other small modal windows, deadlock situations could arise. A topmost
window can hide a modal window, preventing the latter from being accessed.
Yet, the modal window requires service before the topmost window can be moved.
At this point, the application is in deadlock and needs to be restarted. Any
preventive mechanism would be quite delicate, so we preferred to avoid the
risk of locking up the system entirely. We also did not like that an API call
would be needed, hence bypassing VB. This may have affected future portability
of the code, so we decided to abandon this approach, as well.
As we considered the problem further, a less-sophisticated solution ultimately
prevailed. Technically straightforward, it has the advantage of being very
efficient and easy to use. In essence, it requires that the major screens be
equipped with a predefined information area and that some common method be
found to align and display data in those areas.
Contrary to the other alternatives, the information area is now part of the
data window; however, the managing mechanism is centralized and interacts with
each form in identical fashion. Besides managing the data displayed and the
characteristics of the display fields, this mechanism also manages the
relative location of the fields within the open window. By ensuring that the
relative position is the same across different windows, we achieve the
impression that the window changes, but the information field does not. 


An Overview of Our Approach


It would have been nice if VB had stronger object orientation, because, if an
object on a form were being reused elsewhere, we would only need to refer to
that object at run time and use its data and logic as required. Unfortunately,
this isn't possible with VB because the only objects for which new instances
can be created are forms. Fortunately, a little more coding yields an equally
satisfactory result.
Any of the windows that are to contain an information area require the
component objects to be explicitly defined. Their location, size, content, and
the like, do not have to be preset because properties and data can be
manipulated at run time. 
Referring again to Figure 4, you have to decide what type of objects are
required. The information area consists of one panel with a three-dimensional
look and two fields that contain both an image and some characters.
Functionally, the panel is optional. However, we decided that the improvement
to the look and feel would be worth the added overhead. A solid gray 3-D Panel
Control suited our needs.
The two fields embedded within the panel are Grid Controls displaying a single
cell. The Grid is the only basic control (an object associated with a Visual
Basic form) that displays character information and images with equal ease. At
the same time, it intercepts Click and MouseMove events and supports the
BackColor and ForeColor properties, which we can use to conveniently change
the color of the fields. 


Definitions


At this point, we determined that each information area would consist of one
3-D Panel Control and two Grid Controls within the panels. Listing One shows
the code that defines these controls. It identifies the information area with
the name panTopPanel and specifies the controls grdItemName and grdFileName as
contained within that panel. Since the properties of these tools will be set
at run time, their initialization is not strictly necessary. It is just a
sensible precaution against possible oversights.
In addition to the mere definition of the tools, code must be provided to
manage the information area. Prior to window activation, the information area
must be initialized; after the deactivation of the screen, the then-current
information area must be preserved and passed on to the next screen. Contrary
to the display controls themselves, the code for this logic does not have to
be included in every form but can be shared from a central library. Figure 5
shows the modules of this architecture and their relationship with the rest of
the program.
Remember that the data from an information area needs to be passed from form
to form. There are various ways to accomplish this. Intuitively, you might
think of copying the information at every screen change directly from one
window to the next window. This avoids the use of internal storage, but it
requires the copying logic to have an awareness of both the source and the
target screen. In addition, both screens must be present in memory at the same
time. We found that updating fields upon activation and saving fields upon
deactivation would provide a more flexible and consistent approach. Therefore,
we allocated a storage area for the information data. 
Listing Two is the definition of the variables in which the properties of
panTopPanel and its two controls are being stored. There are two interesting
anomalies you may notice when looking at these definitions. The first regards
the variables for the column width and row height of the two grid controls.
While we know from Figure 4 that only one cell of the grid control is being
displayed, the variable for the column width and height is an array of two,
allowing you to store values for two separate rows and columns. There is a
good reason for this: The grid is not defined as a single cell but rather as a
two-by-two. Row 0 and column 0 are defined with a nominal width and height
that renders them invisible. Only the cell at the intersection of row 1 with
column 1 is visible. This is because, in a grid, the active cell is always
highlighted, meaning that the bitmap of the content is inverted (white to
black, red to green, and so on). If there is only one cell in the grid, this
is the only cell that can be active, and an occasional inversion of the bitmap
does not look good. Therefore, we defined the additional row and column of
negligible height and width and made sure that only the invisible cells are
ever activated.
Note that there's no field in which to store the images from grdItemName and
grdFileName. Figure 4 shows that there are always images in the information
field. They must be saved somewhere, but rather than saving them in a
variable, we found it easier to write them to a temporary file. The VB
language has the commands to do so. Performance is essentially unaffected by
this choice, since the bit string containing the images remains in the data
buffer most of the time, remaining available for immediate access. Writing the
file seems to be asynchronous so it's not on the critical path to efficient
performance. 
Figure 5 shows that the variables that store the architecture's properties
must be initialized before any display logic is executed. This is implemented
in Listing Three. These settings define the defaults. Note that there are no
defaults for the text and the image displayed in the information field. The
grid content is initialized by the program logic of the first form executed.
All properties set through this procedure can be modified by the program code.
The architecture will accept these modifications and propagate them to other
windows. In reality, however, only a very few properties of the controls
defined by these variables will be changed programmatically. The defaults set
here define the look and feel of the information area across the program and,
therefore, have to be chosen with care.


The Code


The architecture consists of three program modules: gpTopPanelSave writes the
data from the information area to memory, gpTopPanelInit uses the data stored
in memory to initialize the information area of a screen, and gpTopPanelItem
allows controlled changes to the Item field of the information area.

Listing Four is the gpTopPanelInit code, which consists of three parts that
initialize the panel and two grids, respectively. Each part is preceded by a
statement that determines how much of the panel and the grids needs to be
initialized. If the format is already established, only the data is restored.
This eliminates unnecessary overhead without reducing functionality. 
This procedure requires the form of the calling program to be passed as a
parameter. It is invoked as gpTopPanelInit Me. The Me keyword identifies the
form with which the calling logic is associated. The invoking statement should be
triggered by the activation of every form that contains an information area.
This ensures that the information area is refreshed whenever any window with
an information area receives user focus.
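As a minimal sketch (the event code itself is not part of the listings, so the
exact wording here is illustrative), the call belongs in each form's Activate
event:

```
Sub Form_Activate ()
 'Refresh the shared information area whenever this
 'window receives the user focus; Me is this form.
 gpTopPanelInit Me
End Sub
```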
The grids are defined with four cells, of which only the bottom-right cell, at
coordinates 1,1, is visible. You must be careful to use the Row and Col
properties to position the logic on the valid cell before moving text
into the grid. Also interesting, and very important, is the way we deal
with the images. Remember that we decided not to store the images in memory
variables, but to place them in temporary files. This isn't a problem with
LoadPicture. However, the present logic might be engaged to initially load the
information area, before any image is stored on file. You must therefore allow
for enough resiliency to correctly deal with a situation where no file can be
accessed. LoadPicture generates an error if the specified file is not found.
Therefore, we use the Dir$ function to determine whether the temporary file
exists. If it does not, an empty string is passed to LoadPicture
rather than the filename itself. An empty string does not trigger an error
condition; it just causes an empty image to be generated and stored in the
grid. After startup, the logic of the first form will load images into both
grids.
gpTopPanelSave (see Listing Five) is the counterpart to gpTopPanelInit.
Instead of restoring the properties of the information area, it saves them. We
don't have to worry about nonexistent files, nor do we allow saving partial
data. This simplifies the code somewhat. This procedure is invoked as
gpTopPanelSave Me. In keeping with the symmetrical nature of this approach,
the statement is triggered by the deactivation of a form that contains an
information area.
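Symmetrically, a sketch of the save call in a form's Deactivate event (again,
this event code is an illustration, not part of the listings):

```
Sub Form_Deactivate ()
 'Capture the state of the information area before
 'another window takes over the user focus.
 gpTopPanelSave Me
End Sub
```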
In addition to these two procedures, we found it useful to have a third
procedure to change the content of the Item field. Functionally, this isn't
necessary, because the content and the definition of the information area can
be changed directly from the program code. All such changes would be captured
and saved by the gpTopPanelSave procedure. However, the information-area
fields should be uniform no matter when, or from where, they are updated. The
Item field is updated from several locations. Thus, we chose to build a
common update procedure that changes or resets the Item-field text or replaces
the image. It is called gpTopPanelItem (see Listing Six). 
This procedure requires three parameters: the form from which it is called, an
optional text, and the image to be displayed. The text parameter can be null,
empty, or a valid string. When it is null, a red default string
with the invitation to click on the field is displayed on a white background.
When the text field is empty, but not null, no text is displayed, and the
background is white. When it contains a valid string, the Item area is
initialized to a yellow background, and the characters of text are displayed
in black, using the default-character options selected by the user. There must
always be a valid image, and that image is always displayed.
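The three cases can be sketched as calls like these (imgItem and the sample
string are illustrative; any picture control holding the image to display
would do):

```
gpTopPanelItem Me, Null, imgItem           'red "<click>" prompt, white background
gpTopPanelItem Me, "", imgItem             'image only, no text, white background
gpTopPanelItem Me, "Chapter 4", imgItem    'black text on yellow background
```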


Conclusion


The approach to program architecture presented here is simple and
straightforward. It allows for speedy processing: Properties and data are
stored in main memory, and access is very fast. Pictures are stored in and
retrieved from files, but buffer handling makes file access transparent.
All calls are very simple. Instead of a long series of parameters, we pass the
whole form and let the called procedure sort out which parts to access. This
makes the procedures easy to use and cuts down on programming time.

Repetitive code is hidden by the calls to those procedures. The code is reused
many times but exists only once and is stored in centralized procedures. This
reduces redundancy, thus improving development time and simplifying
maintenance. 
Overall, we found that the architecture described in this article complemented
the native Visual Basic services very well and provided just the enhancement
needed. 
Figure 1 Enforcement of constraints to limit complexity changes from being
provided by standards to being embedded in the programming language.
Figure 2 Program analysis and implementation: An architecture is the result of
conflicting requirements.
Figure 3 One of the three screens displaying the information area immediately
below the menu bar.
Figure 4 A progression of information areas captured in various situations.
Figure 5 Modules of this architecture and their relationship with the rest of
the program. Elements of the architecture are shown in blue.

Listing One 

Begin SSPanel panToppanel 
 Alignment = 0 'Left Justify - TOP
 BackColor = &H00C0C0C0&
 BevelWidth = 2
 BorderWidth = 4
 FloodShowPct = 0 'False
 Font3D = 3 'Inset w/light shading
 ForeColor = &H00000000&
 Height = 612
 HelpContextID = 2
 Left = 60
 TabIndex = 0
 Top = 60
 Width = 9612
 Begin Grid grdItemName 
 BackColor = &H00FFFFFF&
 BorderStyle = 0 'None
 FixedCols = 0
 FixedRows = 0
 FontBold = 0 'False
 FontItalic = 0 'False
 FontName = "Arial"
 FontSize = 9
 FontStrikethru = 0 'False
 FontUnderline = 0 'False
 ForeColor = &H000000FF&
 GridLines = 0 'False
 Height = 435
 HelpContextID = 3
 HighLight = 0 'False
 Left = 120
 ScrollBars = 0 'None
 TabIndex = 5
 Top = 120
 Width = 3300
 End

 Begin Grid grdFileName 
 BackColor = &H00FFFFFF&
 BorderStyle = 0 'None
 FixedCols = 0
 FixedRows = 0
 FontBold = -1 'True
 FontItalic = 0 'False
 FontName = "Arial"
 FontSize = 9
 FontStrikethru = 0 'False
 FontUnderline = 0 'False
 ForeColor = &H00000000&
 GridLines = 0 'False
 Height = 435
 HelpContextID = 4
 HighLight = 0 'False
 Left = 4680
 ScrollBars = 0 'None
 TabIndex = 4
 Top = 120
 Width = 3375
 End
End



Listing Two

'Global variables defining the TopPanel of the data gathering and 
' management forms. TopPanel provides navigational aid to the user. 
' Upon activation or deactivation of a form the TopPanel information is 
' kept current by the architecture. If the processing status changes, 
' program logic will update the fields directly.

' variables to store properties of TopPanel
Global gvTopPanelAlign As Integer
Global gvTopPanelAlignment As Integer
Global gvTopPanelAutosize As Integer
Global gvTopPanelBackColor As Long
Global gvTopPanelBevelInner As Integer
Global gvTopPanelBevelOuter As Integer
Global gvTopPanelBevelWidth As Integer
Global gvTopPanelBorderWidth As Integer
Global gvTopPanelCaption As String
Global gvTopPanelDragIcon As Integer
Global gvTopPanelDragMode As Integer
Global gvTopPanelEnabled As Integer
Global gvTopPanelFloodColor As Long
Global gvTopPanelFloodPercent As Integer
Global gvTopPanelFloodShowPct As Integer
Global gvTopPanelFloodType As Integer
Global gvTopPanelFont3d As Integer
Global gvTopPanelFontBold As Integer
Global gvTopPanelFontItalic As Integer
Global gvTopPanelFontName As String
Global gvTopPanelFontSize As Single
Global gvTopPanelFontStrikeThru As Integer
Global gvTopPanelFontUnderline As Integer
Global gvTopPanelForeColor As Long

Global gvTopPanelHeight As Single
Global gvTopPanelHelpContextID As Long
'note "hWnd" is read only at exec time; we shall not store it
'note "Index" is read only at exec time; we shall not store it
Global gvTopPanelIndex As Integer
Global gvTopPanelLeft As Single
Global gvTopPanelMousePointer As Integer
Global gvTopPanelName As String
Global gvTopPanelOutline As Integer
'note "Parent" is read only at exec time; we shall not store it
Global gvTopPanelRoundedCorners As Integer
Global gvTopPanelShadowColor As Integer
Global gvTopPanelTabindex As Integer
Global gvTopPanelTag As String
Global gvTopPanelTop As Single
Global gvTopPanelVisible As Integer
Global gvTopPanelWidth As Single

'Variables to store properties of grdFileName which is part of TopPanel
'grdFileName is a Grid and has the corresponding Variables
Global gvFilenameBackColor As Long
Global gvFilenameBorderStyle As Integer
Global gvFileNameCellSelected As Integer
Global gvFileNameClip As String
Global gvFileNameCol As Integer
Global gvFileNameColAlignment As Integer
Global gvFilenameCols As Integer
Global gvFilenameColWidth(0 To 1) As Long
Global gvFileNameDragIcon As Integer
Global gvFilenameDragmode As Integer
Global gvFilenameEnabled As Integer
Global gvFilenameFillstyle As Integer
Global gvFileNameFixedAlignment As Integer
Global gvFilenameFixedCols As Integer
Global gvFilenameFixedRows As Integer
Global gvFilenameFontBold As Integer
Global gvFilenameFontItalic As Integer
Global gvFilenameFontName As String
Global gvFilenameFontSize As Single
Global gvFilenameFontstrikethru As Integer
Global gvFilenameFontUnderline As Integer
Global gvFilenameForecolor As Long
Global gvFilenameGridLines As Integer
Global gvFilenameHeight As Single
Global gvFilenameHelpContextID As Long
Global gvFilenameHighlight As Integer
Global gvFilenameLeft As Single
Global gvFileNameLeftCol As Integer
Global gvFileNameName As String
'note "Parent" is read only at exec time; we shall not store it
Global gvFilenamePicture As Variant
Global gvFileNameRow As Integer
Global gvFilenameRowHeight(0 To 1) As Long
Global gvFilenameRows As Integer
Global gvFilenameScrollbars As Integer
Global gvFileNameSelEndCol As Integer
Global gvFileNameSelEndRow As Integer
Global gvFileNameTag As String
Global gvFileNameText As String

'Global goDataBase As String 'defined as global option
Global gvFilenameTop As Single
Global gvFileNameTopRow As Integer
Global gvFilenameVisible As Integer
Global gvFilenameWidth As Single

'Variables to store properties of grdItemName which is part of TopPanel
'grdItemName is a Grid and has the corresponding Variables
Global gvItemNameBackColor As Long
Global gvItemNameBorderStyle As Integer
Global gvItemNameCellSelected As Integer
Global gvItemNameClip As String
Global gvItemNameCol As Integer
Global gvItemNameColAlignment As Integer
Global gvItemNameCols As Integer
Global gvItemNameColWidth(0 To 1) As Long
Global gvItemNameDragIcon As Integer
Global gvItemNameDragmode As Integer
Global gvItemNameEnabled As Integer
Global gvItemNameFillstyle As Integer
Global gvItemNameFixedAlignment As Integer
Global gvItemNameFixedCols As Integer
Global gvItemNameFixedRows As Integer
Global gvItemNameFontBold As Integer
Global gvItemNameFontItalic As Integer
Global gvItemNameFontName As String
Global gvItemNameFontSize As Single
Global gvItemNameFontstrikethru As Integer
Global gvItemNameFontUnderline As Integer
Global gvItemNameForecolor As Long
Global gvItemNameGridLines As Integer
Global gvItemNameHeight As Single
Global gvItemNameHelpContextID As Long
Global gvItemNameHighlight As Integer
Global gvItemNameLeft As Single
Global gvItemNameLeftCol As Integer
Global gvItemNameName As String
'note "Parent" is read only at exec time; we shall not store it
Global gvItemNamePicture
Global gvItemNameRow As Integer
Global gvItemNameRowHeight(0 To 1) As Long
Global gvItemNameRows As Integer
Global gvItemNameScrollbars As Integer
Global gvItemNameSelEndCol As Integer
Global gvItemNameSelEndRow As Integer
Global gvItemNameTag As String
Global gvItemNameText As String
Global gvItemNameTop As Single
Global gvItemNameTopRow As Integer
Global gvItemNameVisible As Integer
Global gvItemNameWidth As Single



Listing Three

'Initialize variables that store properties of TopPanel
 gvTopPanelAlign = 0
 gvTopPanelAlignment = 0

 gvTopPanelAutosize = 0
 gvTopPanelBackColor = &HC0C0C0
 gvTopPanelBevelInner = 0
 gvTopPanelBevelOuter = 2
 gvTopPanelBevelWidth = 2
 gvTopPanelBorderWidth = 4
 gvTopPanelCaption = ""
 gvTopPanelDragMode = 0
 gvTopPanelEnabled = True
 gvTopPanelForeColor = &H0&
 gvTopPanelFontName = "Arial"
 gvTopPanelFontSize = 9.6
 gvTopPanelHeight = 615
 gvTopPanelHelpContextID = 0
 gvTopPanelLeft = 105
 gvTopPanelOutline = False
 gvTopPanelRoundedCorners = True
 gvTopPanelShadowColor = 0
 gvTopPanelTop = 75
 gvTopPanelVisible = True
 gvTopPanelWidth = 9345

'Initialize variables that store properties of grdFileName 
' which is part of TopPanel
 gvFilenameBackColor = &HFFFFFF
 gvFilenameBorderStyle = 0
 gvFilenameCols = 2
 gvFilenameColWidth(0) = 1
 gvFilenameColWidth(1) = 3500
 gvFilenameDragmode = 0
 gvFilenameEnabled = True
 gvFilenameFillstyle = 0
 gvFilenameFixedCols = 0
 gvFilenameFixedRows = 0
 gvFilenameFontBold = False
 gvFilenameFontItalic = False
 gvFilenameFontName = "Arial"
 gvFilenameFontSize = 9.75
 gvFilenameFontstrikethru = False
 gvFilenameFontUnderline = False
 gvFilenameForecolor = &H0&
 gvFilenameGridLines = False
 gvFilenameHeight = 500
 gvFilenameHelpContextID = 0
 gvFilenameHighlight = False
 gvFilenameLeft = 3900
 gvFilenameRows = 2
 gvFilenameRowHeight(1) = 425
 gvFilenameRowHeight(0) = 1
 gvFilenameScrollbars = 0
 gvFilenameTop = 75
 gvFilenameVisible = True
 gvFilenameWidth = 3501

'Initialize variables that store properties of grdItemName 
' which is part of TopPanel
 gvItemNameBackColor = &HFFFFFF
 gvItemNameBorderStyle = 0
 gvItemNameCols = 2

 gvItemNameColWidth(0) = 1
 gvItemNameColWidth(1) = 3500
 gvItemNameDragmode = 0
 gvItemNameEnabled = True
 gvItemNameFillstyle = 0
 gvItemNameFixedCols = 0
 gvItemNameFixedRows = 0
 gvItemNameFontBold = goListFontBold
 gvItemNameFontItalic = goListFontItalic
 gvItemNameFontName = goListFontName
 gvItemNameFontSize = goListFontSize
 gvItemNameFontstrikethru = False
 gvItemNameFontUnderline = False
 gvItemNameForecolor = &H0&
 gvItemNameGridLines = False
 gvItemNameHeight = 500
 gvItemNameHelpContextID = 0
 gvItemNameHighlight = False
 gvItemNameLeft = 120
 gvItemNameRows = 2
 gvItemNameRowHeight(1) = 425
 gvItemNameRowHeight(0) = 1
 gvItemNameScrollbars = 0
 gvItemNameTop = 75
 gvItemNameVisible = True
 gvItemNameWidth = 3501



Listing Four

Sub gpTopPanelInit (pForm As Form)

 'define the path for temp files
 tPath$ = App.Path 'put temp files into application directory
 If Right$(tPath$, 1) <> "\" Then tPath$ = tPath$ & "\" 
 'terminate path with backslash
 
 'panel proper
 If pForm.panToppanel.Width = gvTopPanelWidth And pForm.panToppanel.Height = gvTopPanelHeight And pForm.panToppanel.Top = gvTopPanelTop Then
 'do nothing - panel is already established
 Else
 pForm.panToppanel.Align = gvTopPanelAlign
 pForm.panToppanel.Alignment = gvTopPanelAlignment
 pForm.panToppanel.AutoSize = gvTopPanelAutosize
 pForm.panToppanel.BackColor = gvTopPanelBackColor
 pForm.panToppanel.BevelInner = gvTopPanelBevelInner
 pForm.panToppanel.BevelOuter = gvTopPanelBevelOuter
 pForm.panToppanel.BevelWidth = gvTopPanelBevelWidth
 pForm.panToppanel.BorderWidth = gvTopPanelBorderWidth
 pForm.panToppanel.Caption = gvTopPanelCaption
 pForm.panToppanel.DragMode = gvTopPanelDragMode
 pForm.panToppanel.Enabled = gvTopPanelEnabled
 pForm.panToppanel.FloodColor = gvTopPanelFloodColor
 pForm.panToppanel.FloodPercent = gvTopPanelFloodPercent
 pForm.panToppanel.FloodShowPct = gvTopPanelFloodShowPct
 pForm.panToppanel.FloodType = gvTopPanelFloodType

 pForm.panToppanel.Font3D = gvTopPanelFont3d
 pForm.panToppanel.FontBold = gvTopPanelFontBold
 pForm.panToppanel.FontItalic = gvTopPanelFontItalic
 pForm.panToppanel.FontName = gvTopPanelFontName
 pForm.panToppanel.FontSize = gvTopPanelFontSize
 pForm.panToppanel.FontStrikethru = gvTopPanelFontStrikeThru
 pForm.panToppanel.FontUnderline = gvTopPanelFontUnderline
 pForm.panToppanel.ForeColor = gvTopPanelForeColor
 pForm.panToppanel.Height = gvTopPanelHeight
 pForm.panToppanel.HelpContextID = gvTopPanelHelpContextID
 pForm.panToppanel.Left = gvTopPanelLeft
 pForm.panToppanel.MousePointer = gvTopPanelMousePointer
 pForm.panToppanel.Outline = gvTopPanelOutline
 pForm.panToppanel.RoundedCorners = gvTopPanelRoundedCorners
 pForm.panToppanel.ShadowColor = gvTopPanelShadowColor
 pForm.panToppanel.TabIndex = gvTopPanelTabindex
 pForm.panToppanel.Tag = gvTopPanelTag
 pForm.panToppanel.Top = gvTopPanelTop
 pForm.panToppanel.Visible = gvTopPanelVisible
 pForm.panToppanel.Width = gvTopPanelWidth
 End If

 'itemname: Name grid on Panel
 If pForm.grdItemname.Top = gvItemNameTop And pForm.grdItemname.Width = gvItemNameWidth And pForm.grdItemname.Height = gvItemNameHeight Then
 'grid is already established - rewrite only content
 pForm.grdItemname.FontBold = gvItemNameFontBold
 pForm.grdItemname.FontItalic = gvItemNameFontItalic
 pForm.grdItemname.FontName = gvItemNameFontName
 pForm.grdItemname.FontSize = gvItemNameFontSize
 pForm.grdItemname.FontStrikethru = gvItemNameFontstrikethru
 pForm.grdItemname.FontUnderline = gvItemNameFontUnderline
 pForm.grdItemname.ForeColor = gvItemNameForecolor
 pForm.grdItemname.BackColor = gvItemNameBackColor
 
 pForm.grdItemname.Row = 1: pForm.grdItemname.Col = 1: pForm.grdItemname.Text = gvItemNameText

 temp$ = Dir$(tPath$ & "smarta01.tmp") 
 ' empty if File does not exist
 If temp$ <> "" Then temp$ = tPath$ & temp$ 
 ' Dir$ does not list subdirectory. 
 ' Add it to name if found!
 pForm.grdItemname.Picture = LoadPicture(temp$) 
 ' set picture to retrieved info. Empty if none

 Else 'grid not established - set grid and rewrite content
 pForm.grdItemname.BackColor = gvItemNameBackColor
 pForm.grdItemname.BorderStyle = gvItemNameBorderStyle
 pForm.grdItemname.Cols = gvItemNameCols
 pForm.grdItemname.ColWidth(0) = gvItemNameColWidth(0)
 pForm.grdItemname.ColWidth(1) = gvItemNameColWidth(1)
 pForm.grdItemname.DragMode = gvItemNameDragmode
 pForm.grdItemname.Enabled = gvItemNameEnabled
 pForm.grdItemname.FillStyle = gvItemNameFillstyle
 pForm.grdItemname.FixedCols = gvItemNameFixedCols
 pForm.grdItemname.FixedRows = gvItemNameFixedRows
 pForm.grdItemname.FontBold = gvItemNameFontBold

 pForm.grdItemname.FontItalic = gvItemNameFontItalic
 pForm.grdItemname.FontName = gvItemNameFontName
 pForm.grdItemname.FontSize = gvItemNameFontSize
 pForm.grdItemname.FontStrikethru = gvItemNameFontstrikethru
 pForm.grdItemname.FontUnderline = gvItemNameFontUnderline
 pForm.grdItemname.ForeColor = gvItemNameForecolor
 pForm.grdItemname.GridLines = gvItemNameGridLines
 pForm.grdItemname.Height = gvItemNameHeight
 pForm.grdItemname.HelpContextID = gvItemNameHelpContextID
 pForm.grdItemname.HighLight = gvItemNameHighlight
 pForm.grdItemname.Left = gvItemNameLeft
 pForm.grdItemname.RowHeight(0) = gvItemNameRowHeight(0)
 pForm.grdItemname.RowHeight(1) = gvItemNameRowHeight(1)
 pForm.grdItemname.Rows = gvItemNameRows
 pForm.grdItemname.ScrollBars = gvItemNameScrollbars
 pForm.grdItemname.Row = 1: pForm.grdItemname.Col = 1: pForm.grdItemname.Text = gvItemNameText
 pForm.grdItemname.Top = gvItemNameTop
 pForm.grdItemname.Visible = gvItemNameVisible
 pForm.grdItemname.Width = gvItemNameWidth

 temp$ = Dir$(tPath$ & "smarta01.tmp") 
 ' empty if File does not exist
 If temp$ <> "" Then temp$ = tPath$ & temp$ 
 ' Dir$ does not list subdir. Add it if found!
 pForm.grdItemname.Picture = LoadPicture(temp$) 
 ' set picture to retrieved info. Empty if none
 End If

 'Filename: Name grid on Panel
 If pForm.grdFilename.Width = gvFilenameWidth And pForm.grdFilename.Top = gvFilenameTop And pForm.grdFilename.Height = gvFilenameHeight Then
 'grid is already established - rewrite only content
 pForm.grdFilename.FontBold = gvFilenameFontBold
 pForm.grdFilename.BackColor = gvFilenameBackColor
 pForm.grdFilename.FontItalic = gvFilenameFontItalic
 pForm.grdFilename.FontName = gvFilenameFontName
 pForm.grdFilename.FontSize = gvFilenameFontSize
 pForm.grdFilename.FontStrikethru = gvFilenameFontstrikethru
 pForm.grdFilename.FontUnderline = gvFilenameFontUnderline
 pForm.grdFilename.ForeColor = gvFilenameForecolor
 pForm.grdFilename.Row = 1: pForm.grdFilename.Col = 1: pForm.grdFilename.Text = gvFileNameText
 
 temp$ = Dir$(tPath$ & "smarta02.tmp") 
 ' empty if File does not exist
 If temp$ <> "" Then temp$ = tPath$ & temp$ 
 ' Dir$ does not list subdir. Add if found!
 pForm.grdFilename.Picture = LoadPicture(temp$) 
 ' set picture to retrieved info. Empty if none

 Else 'grid not established - set grid and rewrite content
 pForm.grdFilename.BackColor = gvFilenameBackColor
 pForm.grdFilename.BorderStyle = gvFilenameBorderStyle
 pForm.grdFilename.Cols = gvFilenameCols
 pForm.grdFilename.ColWidth(0) = gvFilenameColWidth(0)
 pForm.grdFilename.ColWidth(1) = gvFilenameColWidth(1)
 pForm.grdFilename.DragMode = gvFilenameDragmode

 pForm.grdFilename.Enabled = gvFilenameEnabled
 pForm.grdFilename.FillStyle = gvFilenameFillstyle
 pForm.grdFilename.FixedCols = gvFilenameFixedCols
 pForm.grdFilename.FixedRows = gvFilenameFixedRows
 pForm.grdFilename.FontBold = gvFilenameFontBold
 pForm.grdFilename.FontItalic = gvFilenameFontItalic
 pForm.grdFilename.FontName = gvFilenameFontName
 pForm.grdFilename.FontSize = gvFilenameFontSize
 pForm.grdFilename.FontStrikethru = gvFilenameFontstrikethru
 pForm.grdFilename.FontUnderline = gvFilenameFontUnderline
 pForm.grdFilename.ForeColor = gvFilenameForecolor
 pForm.grdFilename.GridLines = gvFilenameGridLines
 pForm.grdFilename.Height = gvFilenameHeight
 pForm.grdFilename.HelpContextID = gvFilenameHelpContextID
 pForm.grdFilename.HighLight = gvFilenameHighlight
 pForm.grdFilename.Left = gvFilenameLeft
 pForm.grdFilename.RowHeight(0) = gvFilenameRowHeight(0)
 pForm.grdFilename.RowHeight(1) = gvFilenameRowHeight(1)
 pForm.grdFilename.Rows = gvFilenameRows
 pForm.grdFilename.ScrollBars = gvFilenameScrollbars
 pForm.grdFilename.Row = 1: pForm.grdFilename.Col = 1: pForm.grdFilename.Text = gvFileNameText
 pForm.grdFilename.Top = gvFilenameTop
 pForm.grdFilename.Visible = gvFilenameVisible
 pForm.grdFilename.Width = gvFilenameWidth
 
 temp$ = Dir$(tPath$ & "smarta02.tmp") 
 ' empty if File does not exist
 If temp$ <> "" Then temp$ = tPath$ & temp$ 
 ' Dir$ does not list subdir. Add it if found!
 pForm.grdFilename.Picture = LoadPicture(temp$) 
 ' set picture to retrieved info. Empty if none
 
 End If

End Sub



Listing Five

Sub gpTopPanelSave (pForm As Form)
 
 'define the path for temp files
 tPath$ = App.Path 'put temp files into application directory
 If Right$(tPath$, 1) <> "\" Then tPath$ = tPath$ & "\" 
 'terminate path with backslash

 'panel proper
 gvTopPanelAlign = pForm.panToppanel.Align
 gvTopPanelAlignment = pForm.panToppanel.Alignment
 gvTopPanelAutosize = pForm.panToppanel.AutoSize
 gvTopPanelBackColor = pForm.panToppanel.BackColor
 gvTopPanelBevelInner = pForm.panToppanel.BevelInner
 gvTopPanelBevelOuter = pForm.panToppanel.BevelOuter
 gvTopPanelBevelWidth = pForm.panToppanel.BevelWidth
 gvTopPanelBorderWidth = pForm.panToppanel.BorderWidth
 gvTopPanelCaption = pForm.panToppanel.Caption
 gvTopPanelDragMode = pForm.panToppanel.DragMode

 gvTopPanelEnabled = pForm.panToppanel.Enabled
 gvTopPanelFloodColor = pForm.panToppanel.FloodColor
 gvTopPanelFloodPercent = pForm.panToppanel.FloodPercent
 gvTopPanelFloodShowPct = pForm.panToppanel.FloodShowPct
 gvTopPanelFloodType = pForm.panToppanel.FloodType
 gvTopPanelFont3d = pForm.panToppanel.Font3D
 gvTopPanelFontBold = pForm.panToppanel.FontBold
 gvTopPanelFontItalic = pForm.panToppanel.FontItalic
 gvTopPanelFontName = pForm.panToppanel.FontName
 gvTopPanelFontSize = pForm.panToppanel.FontSize
 gvTopPanelFontStrikeThru = pForm.panToppanel.FontStrikethru
 gvTopPanelFontUnderline = pForm.panToppanel.FontUnderline
 gvTopPanelForeColor = pForm.panToppanel.ForeColor
 gvTopPanelHeight = pForm.panToppanel.Height
 gvTopPanelHelpContextID = pForm.panToppanel.HelpContextID
 gvTopPanelLeft = pForm.panToppanel.Left
 gvTopPanelMousePointer = pForm.panToppanel.MousePointer
 gvTopPanelOutline = pForm.panToppanel.Outline
 gvTopPanelRoundedCorners = pForm.panToppanel.RoundedCorners
 gvTopPanelShadowColor = pForm.panToppanel.ShadowColor
 gvTopPanelTabindex = pForm.panToppanel.TabIndex
 gvTopPanelTag = pForm.panToppanel.Tag
 gvTopPanelTop = pForm.panToppanel.Top
 gvTopPanelVisible = pForm.panToppanel.Visible
 gvTopPanelWidth = pForm.panToppanel.Width
 
 'itemname: Name grid on Panel
 gvItemNameBackColor = pForm.grdItemname.BackColor
 gvItemNameBorderStyle = pForm.grdItemname.BorderStyle
 gvItemNameCols = pForm.grdItemname.Cols
 gvItemNameDragmode = pForm.grdItemname.DragMode
 gvItemNameEnabled = pForm.grdItemname.Enabled
 gvItemNameFillstyle = pForm.grdItemname.FillStyle
 gvItemNameFixedCols = pForm.grdItemname.FixedCols
 gvItemNameFixedRows = pForm.grdItemname.FixedRows
 gvItemNameFontBold = pForm.grdItemname.FontBold
 gvItemNameFontItalic = pForm.grdItemname.FontItalic
 gvItemNameFontName = pForm.grdItemname.FontName
 gvItemNameFontSize = pForm.grdItemname.FontSize
 gvItemNameFontstrikethru = pForm.grdItemname.FontStrikethru
 gvItemNameFontUnderline = pForm.grdItemname.FontUnderline
 gvItemNameForecolor = pForm.grdItemname.ForeColor
 gvItemNameGridLines = pForm.grdItemname.GridLines
 gvItemNameHeight = pForm.grdItemname.Height
 gvItemNameHelpContextID = pForm.grdItemname.HelpContextID
 gvItemNameHighlight = pForm.grdItemname.HighLight
 gvItemNameLeft = pForm.grdItemname.Left
 gvItemNameRows = pForm.grdItemname.Rows
 gvItemNameScrollbars = pForm.grdItemname.ScrollBars
 gvItemNameTop = pForm.grdItemname.Top
 gvItemNameVisible = pForm.grdItemname.Visible
 gvItemNameWidth = pForm.grdItemname.Width
 gvItemNameColWidth(0) = pForm.grdItemname.ColWidth(0)
 gvItemNameColWidth(1) = pForm.grdItemname.ColWidth(1)
 gvItemNameRowHeight(0) = pForm.grdItemname.RowHeight(0)
 gvItemNameRowHeight(1) = pForm.grdItemname.RowHeight(1)
 pForm.grdItemname.Row = 1: pForm.grdItemname.Col = 1
 SavePicture pForm.grdItemname.Picture, tPath$ & "smarta01.tmp"
 gvItemNameText = pForm.grdItemname.Text

 
 'Filename: File grid on Panel
 gvFilenameBackColor = pForm.grdFilename.BackColor
 gvFilenameBorderStyle = pForm.grdFilename.BorderStyle
 gvFilenameCols = pForm.grdFilename.Cols
 gvFilenameDragmode = pForm.grdFilename.DragMode
 gvFilenameEnabled = pForm.grdFilename.Enabled
 gvFilenameFillstyle = pForm.grdFilename.FillStyle
 gvFilenameFixedCols = pForm.grdFilename.FixedCols
 gvFilenameFixedRows = pForm.grdFilename.FixedRows
 gvFilenameFontBold = pForm.grdFilename.FontBold
 gvFilenameFontItalic = pForm.grdFilename.FontItalic
 gvFilenameFontName = pForm.grdFilename.FontName
 gvFilenameFontSize = pForm.grdFilename.FontSize
 gvFilenameFontstrikethru = pForm.grdFilename.FontStrikethru
 gvFilenameFontUnderline = pForm.grdFilename.FontUnderline
 gvFilenameForecolor = pForm.grdFilename.ForeColor
 gvFilenameGridLines = pForm.grdFilename.GridLines
 gvFilenameHeight = pForm.grdFilename.Height
 gvFilenameHelpContextID = pForm.grdFilename.HelpContextID
 gvFilenameHighlight = pForm.grdFilename.HighLight
 gvFilenameLeft = pForm.grdFilename.Left
 gvFilenameRows = pForm.grdFilename.Rows
 gvFilenameScrollbars = pForm.grdFilename.ScrollBars
 gvFilenameTop = pForm.grdFilename.Top
 gvFilenameVisible = pForm.grdFilename.Visible
 gvFilenameWidth = pForm.grdFilename.Width
 gvFilenameColWidth(0) = pForm.grdFilename.ColWidth(0)
 gvFilenameColWidth(1) = pForm.grdFilename.ColWidth(1)
 gvFilenameRowHeight(0) = pForm.grdFilename.RowHeight(0)
 gvFilenameRowHeight(1) = pForm.grdFilename.RowHeight(1)
 pForm.grdFilename.Row = 1: pForm.grdFilename.Col = 1
 SavePicture pForm.grdFilename.Picture, tPath$ & "smarta02.tmp"
 gvFileNameText = pForm.grdFilename.Text
 
End Sub



Listing Six

Sub gpTopPanelItem (pForm As Form, pText As Variant, pImage As Control)


 pForm.grdItemname.Col = 1: pForm.grdItemname.Row = 1 
 'choose cell for image AND text
 
 'update image
 pForm.grdItemname.Picture = pImage.Picture

 If IsNull(pText) Then 'display default text
 pForm.grdItemname.BackColor = &HFFFFFF 'white
 pForm.grdItemname.ForeColor = &HFF& 'red
 pForm.grdItemname.FontName = "Arial"
 pForm.grdItemname.FontBold = False
 pForm.grdItemname.FontSize = 9.75
 pForm.grdItemname.Text = "<click> to select an ITEM"
 
 ElseIf pText = "" Then 'no text, image only

 pForm.grdItemname.BackColor = &HFFFFFF 'white
 pForm.grdItemname.ForeColor = &HFF& 'red
 pForm.grdItemname.FontName = goListFontName
 pForm.grdItemname.FontBold = goListFontBold
 pForm.grdItemname.FontSize = goListFontSize
 pForm.grdItemname.FontItalic = goListFontItalic
 pForm.grdItemname.Text = ""
 
 Else 'text and image, using character settings from user options 
 ' (go.. = General Options)
 pForm.grdItemname.Col = 1: pForm.grdItemname.Row = 1
 pForm.grdItemname.Text = pText
 pForm.grdItemname.BackColor = &H80FFFF
 pForm.grdItemname.ForeColor = 0 'black
 pForm.grdItemname.FontName = goListFontName
 pForm.grdItemname.FontBold = goListFontBold
 pForm.grdItemname.FontSize = goListFontSize
 pForm.grdItemname.FontItalic = goListFontItalic
 
 End If

End Sub





Special Issue, 1994
A Windows I/O Monitor


Monitoring I/O with a Windows VxD




Rick Knoblaugh


Rick is a software engineer specializing in systems programming. He is a
frequent contributor to various computer publications and can be reached on
CompuServe at 71020,2034.


One of the useful protection mechanisms built into the 80386 processor is the
ability to restrict I/O accesses. Operating systems can use this capability to
limit access to critical system resources, simulate hardware devices, and
handle device contention between processes. 
The Windows Virtual Machine Manager (VMM) supports this by making services
available to virtual device drivers (VxDs), which facilitate the trapping and
processing of I/O-port accesses. These services also make it possible to write
a VxD that simply "listens in" on I/O activity initiated by less-privileged
code. Such a monitor can help track down problems with hardware, firmware, and
drivers, as well as offer insight into just what is taking place in your
system.
Only one VxD can trap a given I/O port, and that trapping can be enabled and
disabled on the fly. Thus, it would normally not be possible to eavesdrop on
some of the more-interesting I/O ranges, such as COM ports, because these are
trapped by existing Windows drivers. However, the driver I'll present here,
VRKIOMON.386, gets around this by hooking the VMM services for installing I/O
handlers and enabling and disabling I/O trapping. This also lets you use
VRKIOMON to verify that given VxDs are correctly virtualizing I/O ports. I'll
briefly review how I/O trapping works, then delve into the services the VMM
provides to support this. Finally, I'll highlight the inner workings of
VRKIOMON that make use of these services.


The Protection Mechanism


The key to restricting I/O accesses is the I/O-permission bitmap located at
the end of the task-state segment (TSS). This bitmap specifies which I/O
addresses a task may access. If a task's current privilege level (CPL) is less
privileged than its I/O privilege level (IOPL), the bitmap is consulted before
allowing access to a given port. In V86 mode, the bitmap is always checked.
If the bit corresponding to the given port is set, a general-protection
exception is generated. The Ring 0 virtual-machine monitor code must then
inspect the bytes of code that attempted the instruction to ascertain the
particular port and the type of I/O instruction. This involves determining if
the I/O port is an immediate value within the instruction, or if the
instruction is a string instruction (in which case the direction flag must be
checked, and so on). Fortunately, under Windows, the VMM handles this and
more.


VMM Services


To receive control when a given port is accessed, VxDs can call the
Install_IO_Handler VMM service, specifying the port and the address of a
callback routine. If multiple ports are being trapped, you can use
Install_Mult_IO_Handlers (which will, in turn, call Install_IO_Handler for
each port specified). When subsequent I/O accesses occur within Ring 3 code, a
general-protection exception is generated, and the particular I/O instruction
is decoded. The VxD's callback routine is called with register values
indicating the type of I/O, the port, the VM handle, any output data, and a
pointer to the client-register structure. The iomon_trap procedure in Listing
One is an example of such a callback routine.
As if providing this useful information to your callback routine weren't
enough, the VMM can go to even greater lengths to serve you. If, for example,
the I/O access is via a string instruction, and getting every ounce of
performance isn't an issue, you can avoid emulating it by using the
Simulate_IO service. Jumping to Simulate_IO causes the VMM to enter your
callback multiple times, breaking up the complex string instruction into
individual in or out instructions. It will adjust all client registers
appropriately for you.
Once I/O ports are trapped, VxDs can use VMM services to enable and disable
the trapping on a global basis or per virtual machine. These services are
useful in managing device contention. For example, by allowing only one VM to
access a given hardware device at a time, a VxD can disable trapping for the
VM that currently owns the device and reenable the trapping when the device is
released.


Inside VRKIOMON


Typically, I/O handlers are installed during the Device_Init phase of VxD
initialization, but a few may do so during Sys_Critical_Init. To ensure that
its I/O handlers get installed first, VRKIOMON uses an initialization order of
VMM_INIT_ORDER+1 and performs I/O-handler installation during
Sys_Critical_Init. During this initialization phase, VRKIOMON also hooks the
Install_IO_Handler service and the services for enabling and disabling
trapping; see hook_enab_dis in Listing One.
When subsequent calls are made to Install_IO_Handler by other VxDs, VRKIOMON's
version of the service, iomon_iohand, passes through requests that don't
involve monitored ports. For those that are monitored, it stores the requested
callback address and returns a status indicating success (provided that it
hasn't already seen a request for the particular port). Thus, VRKIOMON itself
is called back whenever any monitored I/O port traps. If another VxD has not
requested a callback for the port, VRKIOMON performs the I/O and stores the
data. If ports are trapped by other VxDs, VRKIOMON calls their callback
routines and stores data that is read or written by those routines. The code
that handles this (iomon_trap) has to deal with the possibility that the other
callback may jump to the Simulate_IO service, or that the other VxD has
disabled I/O trapping.
VRKIOMON provides an API for DOS or Windows applications to allow the logged
data to be retrieved and displayed. (See the API_call table in Listing One for
a list of API functions.) Since one of my goals was to make the logged data
available to a DOS utility, the logging buffer is allocated in V86 address
space using the _Allocate_Global_V86_Data_Area service.


Using VRKIOMON


Since VRKIOMON is not a full-blown commercial product (with all the extensive
QA that goes with it), I strongly recommend you only run it on a test system.
Also, if you use VRKIOMON to monitor the hard-disk I/O (1f0 through 1f7), be
sure to use a test platform! In this case, set 32BitDiskAccess=off in your
SYSTEM.INI file. If you monitor 1f1, you must also monitor 1f0 (1f1 by itself
will still cause a trap when word accesses are performed at 1f0).
To install the I/O monitor, copy VRKIOMON.386 to your \windows\system
directory and make the additions shown in Figure 1 to your SYSTEM.INI file. In
addition to adding device=vrkiomon.386 to SYSTEM.INI, you can specify several
parameters (see Table 1) and up to ten ranges (VIOBEG0/VIOEND0 through
VIOBEG9/VIOEND9) for a total of 64 ports (default value of MAX_PORTS equates).
To display the results, go into a DOS box and run IODISP (Listing Two), which
reports I/O activity information to the screen. Output can also be redirected
to a file (for example, IODISP > myfile). The same API used by IODISP is also
available to Windows applications. Finally, if you need to isolate a
particular activity (which could be lost when the logging buffer wraps), use
the API for initializing the buffer and controlling wrapping.


Conclusion


The ability to monitor I/O ports can offer insights when debugging or studying
system activity. The API provided by this VxD can add this aspect to other
analysis tools you may be using. For example, my DOS device-driver monitor
(see "Device Driver Monitoring," DDJ, March 1992) could use this to report the
commands output to a controller as the result of a driver receiving a
particular request. In the future, I may also provide an object-oriented
I/O-logging buffer-analysis tool that will report activity in higher-level
terms.


References



80386 Programmer's Reference Manual. Santa Clara, CA: Intel Corp., 1986.
Thielen, David and Bryan Woodruff. Writing Windows Virtual Device Drivers.
Reading, MA: Addison-Wesley, 1993.
Figure 1: Required additions to SYSTEM.INI to install VRKIOMON.
[386Enh]
device=vrkiomon.386     Create entry for I/O Monitor driver.
VIOBUF=6                Number of 4K pages for buffer.
VIOBEG0=378             Specify ranges you want to listen to.
VIOEND0=37f             You can have up to 10 ranges
                        (VIOBEG0--VIOEND0 through VIOBEG9--VIOEND9);
                        a maximum of 64 ports may be trapped. The example
                        shown here specifies the ports for LPT1.
Table 1: Optional parameters that can be specified when installing VRKIOMON.
 Parameter Description 
 VIOBUF=nnn Where nnn is the number of 4K pages to be allocated for the
 logging buffer.
 VIOBEG0=nnnn Beginning I/O address to be monitored (first range).
 VIOEND0=nnnn Ending I/O address (first range).
 .
 .
 .
 VIOBEG9=nnnn Beginning I/O address to be monitored (last range).
 VIOEND9=nnnn Ending I/O address (last range).

Listing One 

;--------------------------------------------------------------- 
;vriomon.asm - I/O Monitoring VxD 
;Copyright 1994 ASMicro Co. 
;01/09/94 Rick Knoblaugh 
;

.386p

;----- include files 
 include vmm.inc
 include debug.inc
;------ equates -----------------------------------------------
VRIOMON_VER_HI EQU 1
VRIOMON_VER_LO EQU 0
MAX_PORTS EQU 64 ;increase to log more
MAX_RANGES EQU 10 ;VIOBEGn VIOENDn (n < MAX_RANGES)
MAX_VM_TRACKED EQU 1fh ;track I/O for this many VMs

; VxD ID assigned by vxdid@microsoft.com
VRIOMON_Dev_ID EQU 317eh 

;----- structures ---------------------------------------------
buf_record struc ;format of logged data
buf_info db ?
buf_port dw ?
buf_data db ?
buf_record ends

doub_word struc 
d_offset dw ?
d_segment dw ?
doub_word ends 


port_info struc
vrio_port dw ? ;port number
vrio_callb dd 0 ;callback of other trapper
 ;(if any, zero if none)
vio_enab_flags dd 0 ;bitmap enable/disable status
port_info ends

get_buf_info struc
buf_beg_ptr dd ?
buf_data_end dd ?
buf_size dd ?
buf_flags db ?
get_buf_info ends

enab_disab_flag record glob_io_bit:1, local_ios:31
wrap_flag_bits record yo_unused:6, dont_wrap:1, it_wrapped:1

flag_bits record fill0:14, vmbit:1, resumef:1, fill1:1, nest_taskf:1,\
 iopl:2, overf:1, direcf:1, inter:1, trapf:1, sign:1, \
 zero:1, fill3:1, auxcarry:1, fill4:1, parity:1, \
 fill5:1, carry:1
;---------------------------------------------------------------
; Virtual Device Declaration 
;--------------------------------------------------------------- 

Declare_Virtual_Device VRKIOMON, VRIOMON_VER_HI, VRIOMON_VER_LO, \
 VRIOMON_Control, VRIOMON_Dev_ID, \
 VMM_Init_Order, API_handler, API_handler
;---------------------------------------------------------------
; Local Data 
;--------------------------------------------------------------- 

VxD_LOCKED_DATA_SEG

buffer_beg_ptr dd 0
buffer_end_ptr dd 0
buffer_cnt_ptr dd 0
buffer_wrk_ptr dd 0
the_vmm_iohand dd 0
old_glob_disab dd 0
old_loc_disab dd 0
old_glob_enab dd 0
old_loc_enab dd 0
in_process_cnt dw 0
buf_capacity dd 0
buf_wrap_flag db 0
number_ports dw 0
hold_string_info dw 0
hold_string_ptr dd 0
hold_string_cnt dw 0
port_data port_info MAX_PORTS dup(<>)

API_call label dword
 dd offset32 VxDversion
 dd offset32 VxDget_bufptr
 dd offset32 VxDinit_buf
MAX_API_CALLS EQU ($ - API_call)/ size doub_word


VxD_LOCKED_DATA_ENDS
;---------------------------------------------------------------
; Initialization Data 
;--------------------------------------------------------------- 

VxD_IDATA_SEG

Viomon_Buf_String db 'VIOBUF',0
Viomon_Beg_String db 'VIOBEG0',0
Viomon_End_String db 'VIOEND0',0
CNT_POSITION EQU ($ - Viomon_End_String) - 2 
VxD_IDATA_ENDS 

;---------------------------------------------------------------
; Initialization Code 
;--------------------------------------------------------------- 
VxD_ICODE_SEG

;--------------------------------------------------------------
;VRIOMON_Crit_Init - Trap all the ports specified by the 
; parms in SYSTEM.INI. Look for values 
; specifying up to MAX_RANGES ranges 
; (e.g. 
; VIOBEG0=xxxx 
; VIOEND0=xxxx 
; VIOBEG1=xxxx 
; VIOEND1=xxxx 
; ... ). 
; Also, we need to hook VMM services for 
; Install_IO_Handler, Disable_Local_Trapping 
; Enable_Local_Trapping, Disable_Global_Trapping,
; and Enable_Global_Trapping. This allows us 
; to continue to listen in on port activity 
; even when another VxD has trapped the same 
; port or disabled the trapping. 
; Enter: 
; Exit: 
; port_data = filled with ports we're trapping
; if unable to trap ports or hook services 
; carry is set. 
;--------------------------------------------------------------
BeginProc VRIOMON_Crit_Init
 xor ecx, ecx ;range counter
 xor eax, eax 
 mov ebx, OFFSET32 port_data ;area to store values

VRIOMON_Crit_I025:
 mov edi, OFFSET32 Viomon_Beg_String 
 xor esi, esi ;[386enh] section
 VMMCall Get_Profile_Hex_Int ;get begin range 
 jz short VRIOMON_Crit_I050 ;if no value
 jnc short VRIOMON_Crit_I100 ;if found
VRIOMON_Crit_I050:
 cmp cx, (MAX_RANGES - 1) ;end of ranges?
 je short VRIOMON_Crit_I400
 jmp short VRIOMON_Crit_I300 ;try another range

VRIOMON_Crit_I100:
 mov dx, ax ;save range start


 mov edi, OFFSET32 Viomon_End_String 
 xor esi, esi ;[386enh] section
 VMMCall Get_Profile_Hex_Int ;get end range 
 jc short VRIOMON_Crit_I150 ;if not found
 jz short VRIOMON_Crit_I150 ;or no value 

 cmp ax, dx ;cmp with begin
 jae short VRIOMON_Crit_I200 ;if valid range

VRIOMON_Crit_I150:
IFDEF DEBUG
 Trace_Out "VRKIOMON: Invalid range specified in SYSTEM.INI"
ENDIF

VRIOMON_Crit_I180:
 jcxz VRIOMON_Crit_I800 ;exit if none at all
 jmp short VRIOMON_Crit_I400 ;go trap valid ports

VRIOMON_Crit_I200:
 push ecx ;save range count
 call store_ports 
 pop ecx ;restore range count
 jc short VRIOMON_Crit_I180

;Change "VIOBEG1" to "VIOBEG2", etc.
VRIOMON_Crit_I300:
 inc [Viomon_Beg_String + CNT_POSITION]
 inc [Viomon_End_String + CNT_POSITION]

 inc cx ;next range
 cmp cx, MAX_RANGES ;done all?
 jne VRIOMON_Crit_I025 ;if not, get more

VRIOMON_Crit_I400:

; hook the ports to watch
 movzx ecx, number_ports 
 jcxz VRIOMON_Crit_I800 ;if no ports
 mov ebx, OFFSET32 port_data ;area to store values
 mov esi, OFFSET32 iomon_trap ;address of our handler
VRIOMON_Crit_I450:
 movzx edx, [ebx].vrio_port ;port number
 VMMCall Install_IO_Handler
 jnc short VRIOMON_Crit_I500 ;if trapped ok

IFDEF DEBUG
 Trace_Out "VRKIOMON: Unable to trap port #EDX"
ENDIF

VRIOMON_Crit_I500:
 add ebx, size port_info ;get port entry
 loop VRIOMON_Crit_I450 ;go do next port

 mov eax, Install_IO_Handler ;hook I/O
 mov esi, OFFSET32 iomon_iohand ;trap install
 VMMCall Hook_Device_Service

 jnc short VRIOMON_Crit_I550 ;if trapped ok

IFDEF DEBUG
 Trace_Out "VRKIOMON: Unable to hook Install_IO_Handler"
ENDIF
;
VRIOMON_Crit_I550:
 mov the_vmm_iohand, esi
 call hook_enab_dis
 ret
VRIOMON_Crit_I800:
;No valid ports specified
IFDEF DEBUG
 Trace_Out "VRKIOMON: No valid ports specified in SYSTEM.INI"
ENDIF
 stc
 ret
EndProc VRIOMON_Crit_Init

BeginProc hook_enab_dis
 mov eax, Disable_Global_Trapping
 mov esi, OFFSET32 iomon_glob_dis
 VMMCall Hook_Device_Service
 jc short hook_enab_dis_575
 mov old_glob_disab, esi
hook_enab_dis_575:
 mov eax, Disable_Local_Trapping
 mov esi, OFFSET32 iomon_loc_dis
 VMMCall Hook_Device_Service
 jc short hook_enab_dis_600
 mov old_loc_disab, esi
hook_enab_dis_600:
 mov eax, Enable_Local_Trapping
 mov esi, OFFSET32 iomon_loc_enab
 VMMCall Hook_Device_Service
 jc short hook_enab_dis_625
 mov old_loc_enab, esi
hook_enab_dis_625:
 mov eax, Enable_Global_Trapping
 mov esi, OFFSET32 iomon_glob_enab
 VMMCall Hook_Device_Service
 jc short hook_enab_dis_650
 mov old_glob_enab, esi
hook_enab_dis_650:
 ret
EndProc hook_enab_dis
;--------------------------------------------------------------
;store_ports - Store ports for I/O trapping. If ports 
; have not previously been specified, store 
; the values. 
; Enter: 
; ebx = ptr to next record for storing 
; port numbers 
; ecx = range number being processed 
; dx = start of range of I/O ports 
; (range has been validated) 
; ax = end of range 
; number_ports = count of ports trapped so far 
; Exit: 
; ebx = advanced to next record 
; number_ports = updated count of ports 

; If error, return with carry set 
; esi, ax, cx, dx trashed 
;--------------------------------------------------------------
BeginProc store_ports
 mov esi, OFFSET32 port_data ;area to store values
 movzx ecx, number_ports
 jcxz store_p250 ;if none stored
store_p100:
 cmp [esi].vrio_port, ax ;below end of range?
 ja short store_p200 ;if not, not a dup
 cmp [esi].vrio_port, dx ;below start range?
 jb short store_p200 ;if so, not a duplicate
IFDEF DEBUG
 Trace_Out "VRKIOMON: Overlapping ranges specified in SYSTEM.INI"
ENDIF
 stc
 ret
store_p200:
 add esi, size port_info 
 loop store_p100
store_p250:
 movzx ecx, ax ;end of range
 inc cx
 sub cx, dx ;number in range

 mov ax, number_ports
 add ax, cx ;get number of ports
 cmp ax, MAX_PORTS ;cmp with max supported
 jna short store_p275

 mov ax, MAX_PORTS
IFDEF DEBUG
 Trace_Out "VRKIOMON: Too many ports specified in SYSTEM.INI (max is #ax)"
ENDIF
 stc
 ret
store_p275:
 mov number_ports, ax ;add in to total
store_p300: 
 mov [ebx].vrio_port, dx ;store port
 inc dx ;next port in range
 add ebx, size port_info 
 loop store_p300
store_p500:
 clc
 ret
EndProc store_ports
;--------------------------------------------------------------
;VRIOMON_Device_Init - Retrieve buffer size parm from 
; SYSTEM.INI and allocate the buffer 
; within the global v86 data area. 
; Enter: 
; Exit: 
; buffer_beg_ptr = start of buffer 
; buffer_wrk_ptr = start of buffer 
; buffer_end_ptr = end of buffer 
; If error, return with carry set 
;--------------------------------------------------------------
BeginProc VRIOMON_Device_Init

 mov edi, OFFSET32 Viomon_Buf_String 
 xor esi, esi ;[386enh] section
 VMMCall Get_Profile_Hex_Int ;get buffer size 
 
 
 and eax, 0ffh ;Get # of 4k 
 jnz short VRIOMON_D100 ;if legal value
 mov al, 2 ;else default to 2
VRIOMON_D100:
 mov cl, 12 ; * 4K
 shl eax, cl
 mov ecx, eax ;save size in bytes
 mov buf_capacity, eax

 push ecx ;save size
 VMMcall _Allocate_Global_V86_Data_Area, <eax, GVDAZeroInit>
 pop ecx 
 or eax, eax ;got the memory?
 jnz short VRIOMON_D200 ;if so, continue
IFDEF DEBUG
 Trace_Out "VRKIOMON: Unable to allocate #CX bytes"
ENDIF
 stc
 ret
VRIOMON_D200:
 mov buffer_beg_ptr, eax 
 mov buffer_wrk_ptr, eax

 add eax, ecx
 sub eax, ((size buf_record) + (size doub_word) )
 mov buffer_end_ptr, eax

 clc
 ret
EndProc VRIOMON_Device_Init

VxD_ICODE_ENDS
;---------------------------------------------------------------
; Locked Code 
;--------------------------------------------------------------- 
VxD_LOCKED_CODE_SEG

BeginProc VRIOMON_Control
 Control_Dispatch Sys_Critical_Init, VRIOMON_Crit_Init
 Control_Dispatch Device_Init, VRIOMON_Device_Init

 clc
 ret
EndProc VRIOMON_Control

VxD_LOCKED_CODE_ENDS
;---------------------------------------------------------------
; Code Segment 
;--------------------------------------------------------------- 
VxD_CODE_SEG
;--------------------------------------------------------------------
;API_handler - API handler for both V86 and protected mode callers 
; Enter: 
; caller's ax = API function 

; Exit: 
; caller's CY set if invalid function 
;--------------------------------------------------------------------
BeginProc API_handler
 movzx eax, [ebp].Client_AX
 cmp ax, MAX_API_CALLS ;valid function (0 to MAX_API_CALLS-1)?
 jae short API_hand_900 
 and [ebp.Client_EFlags], not (mask carry) ;success
 call API_call[eax * 4]
 ret
API_hand_900:
 or [ebp.Client_EFlags], (mask carry) ;error
 ret
EndProc API_handler

BeginProc VxDversion
 mov [ebp.Client_AX], ((VRIOMON_VER_HI shl 8) or VRIOMON_VER_LO)
 ret
EndProc VxDversion

BeginProc VxDget_bufptr
;First, get ptr to caller's structure to be filled with ptrs to
;the buffer, size of buffer, and indication of whether it has wrapped.
 mov ax, (Client_BX shl 8) + Client_DX
 VMMcall Map_Flat
 cmp eax, -1 ;error?
 je short VxDget_b400
 mov esi, eax ;32 bit ptr to caller data

 mov edx, buffer_wrk_ptr 
 mov eax, buffer_beg_ptr 
 sub edx, eax ;get count of bytes used

 mov [esi].buf_data_end, edx ;give it to caller 

 mov ecx, buf_capacity 
 dec ecx ;seg limit
 add eax, [ebx.CB_High_Linear]
 VMMcall Map_Lin_To_VM_Addr
 jnc short VxDget_b500 ;if no error
VxDget_b400:
 bts [ebp].Client_EFlags, carry ;error
 ret
VxDget_b500:
 mov [esi].buf_beg_ptr.d_segment, cx
 mov [esi].buf_beg_ptr.d_offset, dx

 mov eax, buf_capacity
 mov [esi].buf_size, eax
 mov al, buf_wrap_flag
 mov [esi].buf_flags, al 

 ret
EndProc VxDget_bufptr


;--------------------------------------------------------------
;VxDinit_buf - Handle API function for initializing logging 
; buffer. Reset buffer ptr to beginning. Also, 

; set flag per user option to wrap or not wrap 
; if end of buffer is reached. 
; Enter: 
; client dx = 1 if buffer should not wrap 
; Exit: 
; buffer_wrk_ptr = buffer_beg_ptr 
; buf_wrap_flag is updated. 
;--------------------------------------------------------------
BeginProc VxDinit_buf
 cli
 mov eax, buffer_beg_ptr ;reset to start
 mov buffer_wrk_ptr, eax 
 mov buf_wrap_flag, 0 ;just started, no wrap
 test [ebp.Client_DX], 1 ;want buf not to wrap?
 jz short VxDinit_b999 ;if want wrap, exit
 or buf_wrap_flag, (mask dont_wrap) ;set not to wrap
VxDinit_b999:
 sti
 ret
EndProc VxDinit_buf
;--------------------------------------------------------------
;iomon_trap - callback procedure for I/O port trapping. 
; Enter: 
; ebx = current VM handle 
; ecx = type of I/O 
; edx = port number 
; ebp = ptr to client reg struc 
; eax = output data 
; Exit: 
; eax = input data (if it's a read) 
;--------------------------------------------------------------
BeginProc iomon_trap, HIGH_FREQ
 call ck_handler ;other trappers?
 jz short iomon_trap040 ;if not, go do I/O

;If the other trapper calls simulate I/O (to break up string I/O), it will
;in turn repeatedly call this handler (for each byte, word, or dword).
;Thus, check to see that it isn't simulate I/O calling. If it is, simply
;jump to other trapper's callback routine. When all data has been
;transferred, it will finally return from the call to the callback.
 cmp in_process_cnt, 0 ;already processing?
 ja short iomon_trap030
 push ebx ;save VM handle
 push ecx
 push edx

 inc in_process_cnt ;processing

 test cl, String_IO ;string I/O ?
 jz short iomon_trap010 ;if not, go do call

;If the other VxD truly processes string I/O, it will adjust client index
;registers, and rep count. Thus, use call below to save this information.
 push eax
 push ecx
 call get_string_info 
 pop ecx
 pop eax
 jmp short iomon_trap012 

iomon_trap010:
 test cl, Output ;is this output?
 jz short iomon_trap012 ;if not, go read
;Store the value to be written now --just in case it doesn't stay in ax.
 call storedat 
iomon_trap012:
 call dword ptr [esi].vrio_callb 
 pop edx
 pop ecx
 pop ebx

 test cl, String_IO ;string I/O ?
 jz short iomon_trap025 ;if not, go do other

 cld
 test ecx, Reverse_IO ;insure direction flag
 jz short iomon_trap015
 std
iomon_trap015:
 push edi
 push ecx

 mov di, cx ;info about string I/O
 movzx ecx, hold_string_cnt ;rep count 
 mov esi, hold_string_ptr

 cmp esi, -1 ;if for any reason address
 je short iomon_trap020 ;was bad, can't store

 call store_string_io ;store string data
iomon_trap020:
 pop ecx
 pop edi
 jmp short iomon_trap999
iomon_trap025:
 test cl, Output ;is this output?
 jnz short iomon_trap999 ;if so, got it all
 jmp short iomon_trap900 ;if read, go store
iomon_trap030:
 inc in_process_cnt ;processing
 jmp dword ptr [esi].vrio_callb 
iomon_trap040:
 test cl, String_IO ;string I/O ?
 jz short iomon_trap050
 call process_string ;go perform string I/O
 jmp short iomon_trap999
iomon_trap050:
 test cl, Output ;is this output?
 jnz short iomon_trap500

 test cl, Dword_IO ;size is dword?
 jz short iomon_trap080 
 in eax, dx ;input a dword 
 jmp short iomon_trap900
iomon_trap080:
 test cl, Word_IO ;size is word?
 jnz short iomon_trap100
 in al, dx ;input a byte


 jmp short iomon_trap900
iomon_trap100:
 in ax, dx ;input a word
iomon_trap200:
 jmp short iomon_trap900
iomon_trap500:

 test cl, Dword_IO ;size is dword?
 jz short iomon_trap550 
 out dx, eax ;output a dword 
 jmp short iomon_trap900
iomon_trap550:
 test cl, Word_IO ;size is word?
 jnz short iomon_trap600
 out dx, al
 jmp short iomon_trap900
iomon_trap600:
 out dx, ax
iomon_trap900:
 cmp in_process_cnt, 1 ;simulate_IO ?
 jbe short iomon_trap910 ;if not, go store 
 ret 
iomon_trap910:
 call storedat ;store the data
iomon_trap999:
 mov in_process_cnt, 0 ;no longer processing
 ret

EndProc iomon_trap
;--------------------------------------------------------------
;process_string - Perform the string I/O operation and update 
; client registers appropriately. 
; Enter: 
; ebx = current VM handle 
; ecx = type of I/O 
; bit 6 indicates if repeat 
; prefix is present 
; bit 8 indicates if the 
; direction flag is set 
; edx = port number 
; ebp = ptr to client reg struc 
; eax = output data 
; Exit: 
; Client registers updated. 
;--------------------------------------------------------------
BeginProc process_string, HIGH_FREQ
 push eax
 push ecx
 push edi
 push ebp

 call get_string_info
 cmp eax, -1 ;error getting address?
 je process_s999

 test hold_string_info, Rep_IO ;rep prefix?
 jz short process_s100

 mov [ebp.Client_CX], 0 ;if so, zero client's cx

process_s100:
 mov edi, eax ;address for string data
 mov esi, eax
 mov ax, hold_string_info
 push ecx ;save count
 test al, Output ;outs?
 jnz short process_s500
process_s150:
 add ebp, Client_DI ;point to client di on stack
 test al, Dword_IO ;dword I/O ?
 jnz short process_s300
 test al, Word_IO ;word I/O ?
 jnz short process_s200 
 rep insb
 jmp short process_s400
process_s200:
 rep insw
 jmp short process_s400
process_s300:
 rep insd
process_s400:
 jmp short process_s800 ;go store the data
process_s500:
 add ebp, Client_SI ;point to client si on stack
 test al, Dword_IO ;dword I/O ?
 jnz short process_s700
 test al, Word_IO ;word I/O ?
 jnz short process_s600 
 rep outsb
 jmp short process_s800
process_s600:
 rep outsw
 jmp short process_s800
process_s700:
 rep outsd
process_s800:
 pop eax ;get count

 push eax
 
 mov cx, hold_string_info 
 and cl, (Word_IO or Dword_IO)
 shr cl, 3 ;convert to shift count
 ;(i.e. dword=2, word=1)

 shl eax, cl ;adjust index by this amt

 test hold_string_info, Reverse_IO ;direction flag set?
 jz short process_s850 
 neg ax ;if so, subtract
process_s850:
 add [ebp], ax ;adjust user index reg
process_s900:
 pop ecx ;restore count
 mov esi, hold_string_ptr
 mov di, hold_string_info

 call store_string_io
process_s999:

 pop ebp
 pop edi
 pop ecx
 pop eax
 ret
EndProc process_string

;--------------------------------------------------------------
;store_string_io - Called by process_string and iomon_trap to 
; store string I/O data. 
; Enter: 
; cx = rep count 
; edx = port number 
; di = I/O type 
; esi = ptr to start of data that was 
; input or output via string I/O 
; Direction flag appropriately set or clear 
; Exit: 
;--------------------------------------------------------------
BeginProc store_string_io, HIGH_FREQ
store_s100:
 push ecx ;save count
 test di, Dword_IO ;dword I/O ?
 jnz short store_s300
 test di, Word_IO ;word I/O ?
 jnz short store_s200 
 lodsb 
 jmp short store_s400
store_s200:
 lodsw 
 jmp short store_s400
store_s300:
 lodsd
store_s400:
 mov cx, di
 call storedat
 pop ecx
 loop store_s100

 ret
EndProc store_string_io

;--------------------------------------------------------------
;get_string_info - Called by process_string and iomon_trap to 
; determine address for string I/O operation 
; and save away information. 
; Enter: 
; ecx = type of I/O 
; edx = port number 
; ebp = ptr to client reg struc 
; Exit: 
; eax = 32 bit address for I/O data 
; ecx = rep count (1 if no rep) 
; values also saved away: 
; hold_string_ptr = 32 bit address 
; hold_string_info = I/O type 
; hold_string_cnt = rep count 
;--------------------------------------------------------------
BeginProc get_string_info

 mov ax, cx ;get I/O type
 mov hold_string_info, ax ;save it also
 mov ecx, 1 ;default to no rep count
 test al, Rep_IO ;repeats?
 jz short get_s100 

 movzx ecx, [ebp.Client_CX] ;if so, get the count
get_s100:
 mov hold_string_cnt, cx
 
 test al, Output ;outs?
 jnz short get_s500 ;if so, go get ds:si ptr

 mov ax, (Client_ES shl 8) + Client_DI 
 VMMcall Map_Flat
 jmp short get_s999
get_s500:
 mov ax, (Client_DS shl 8) + Client_SI
 VMMcall Map_Flat
get_s999:
 mov hold_string_ptr, eax ;save ptr to data

 ret
EndProc get_string_info

;--------------------------------------------------------------
;storedat - Store the data into the buffer. Buffer records 
; have the following format: 
; buf_info db <- Tells size and direction of 
; of port access. 
; See "type of I/O" in VMM.INC 
; for definitions. 
; buf_port dw <- Port number 
; buf_data db <- Contains the data (this value 
; can also be a word or dword) 
; Enter: 
; cl = indicator of direction of access 
; and size of data transfer 
; eax = data value 
; edx = port 
; buffer_wrk_ptr = next buffer position 
; buffer_beg_ptr = start of buffer 
; buffer_end_ptr = end of buffer 
; Exit: 
; buffer_wrk_ptr = next buffer position 
; buf_wrap_flag = Bit 0 set if buffer wrapped 
; All registers saved. 
;--------------------------------------------------------------
BeginProc storedat, HIGH_FREQ
 push ecx
 push edi
 cmp buffer_beg_ptr, 0 ;have buffer?
 je short stored999 ;if not, can't store
 mov edi, buffer_wrk_ptr 
 cmp edi, buffer_end_ptr ;at end?
 jb short stored100

 test buf_wrap_flag, (mask dont_wrap) ;if set up not to wrap
 jnz short stored999 ;then exit


 mov edi, buffer_beg_ptr ;if so, wrap
 mov buffer_wrk_ptr, edi
 or buf_wrap_flag, (mask it_wrapped) ;note that it wrapped
stored100:
 add edi, [ebx.CB_High_Linear] ;address in VM

 sub ch, ch
 movzx ecx, cx
 mov [edi].buf_info, cl
 mov [edi].buf_port, dx

 test cl, Dword_IO ;size is dword?
 jz short stored130
 mov dword ptr [edi].buf_data, eax
 and cl, Dword_IO 
 jmp short stored140
stored130:
 and cl, Word_IO
 jcxz stored150 ;jump if byte 
 mov word ptr [edi].buf_data, ax ;store if size is word
stored140: ;convert size to
 shr cx, 2 ;number of bytes
 dec cx ;minus 1
 jmp short stored160 ;go add in to total 
stored150:
 mov [edi].buf_data, al
stored160:
 add ecx, size buf_record ;size of info
 add buffer_wrk_ptr, ecx ;add size data
stored999:
 pop edi
 pop ecx
 ret
EndProc storedat
;--------------------------------------------------------------
;ck_handler - Determine if another VxD has requested a call- 
; back for this port. Also, if another VxD has 
; trapped the port, see if it has disabled 
; the trapping via global or local disable. If 
; so, treat it as if it's not trapped. 
; Enter: 
; edx = port number 
; ebx = vm handle 
; Exit: 
; esi = ptr to port info structure 
; ZR if no other trappers or the other 
; trapper has disabled trapping 
;--------------------------------------------------------------
BeginProc ck_handler, HIGH_FREQ
;
;We already know this is one of the ports we're trapping; use the
;determin_r_port routine to get a pointer to the port's info entry
;
 call determin_r_port 
 cmp [esi].vrio_callb, 0 ;callback for this port?
 jz short ck_hand900



 push ebx
 bt [esi].vio_enab_flags, glob_io_bit ;global disable?
 jc short ck_hand800

 mov ebx, [ebx].CB_VMID ;get vm id
 dec ebx ;zero relative 
 and ebx, MAX_VM_TRACKED
 bt [esi].vio_enab_flags, ebx 
;carry set if local or global disable
ck_hand800:
 mov ebx, 2
 sbb ebx, 1 
;return zero if local disable
 pop ebx
ck_hand900:

 ret
EndProc ck_handler 
;--------------------------------------------------------------
;iomon_iohand - This routine replaces the VMM's 
; install_io_handler. If requested port is one 
; that is being monitored, save callback 
; address and return success. 
; Enter: 
; esi = callback address 
; edx = port 
; Exit: 
;--------------------------------------------------------------
BeginProc iomon_iohand
 push eax
 push esi

 mov eax, esi ;save callback address

 call determin_r_port
 jnz short iomon_io900 ;if not one of ours 
 
 cmp [esi].vrio_callb, 0 ;is there already a client?
 jne short iomon_io900 ;if so, let vmm reject it

 mov [esi].vrio_callb, eax ;store callback address
 pop esi
 pop eax
 ret ;return carry clear
iomon_io900:
 pop esi
 pop eax
 jmp dword ptr the_vmm_iohand
 ret
EndProc iomon_iohand
;--------------------------------------------------------------
; iomon_glob_dis - Gets control when VMM service for 
; globally disabling I/O trapping is called. 
; If this is for a port being logged, 
; make a note of it, but don't really do it. 
; If not a port being logged, simply 
; transfer control to the original handler. 
; Enter: 
; edx = port 

; Exit: 
;--------------------------------------------------------------
BeginProc iomon_glob_dis
 push esi 
 call determin_r_port
 jnz short iomon_gd900
 bts [esi].vio_enab_flags, glob_io_bit
 pop esi
 ret
iomon_gd900:
 pop esi 
 jmp dword ptr old_glob_disab
EndProc iomon_glob_dis
;--------------------------------------------------------------
; iomon_loc_dis - Gets control when VMM service for 
; local disabling of I/O trapping is called. 
; If this is for a port being logged, 
; make a note of it, but don't really do it. 
; Bit map in vio_enab_flags is used for 
; flagging per vm id number. 
; If not a port being logged, simply 
; transfer control to the original handler. 
; Enter: 
; edx = port 
; ebx = vm handle 
; Exit: 
;--------------------------------------------------------------
BeginProc iomon_loc_dis
 push esi 
 call determin_r_port
 jnz short iomon_ld900
 push ebx ;save vm handle 
 mov ebx, [ebx].CB_VMID ;get vm id
 dec ebx ;zero relative 
 and ebx, MAX_VM_TRACKED
 bts [esi].vio_enab_flags, ebx 
 pop ebx
 pop esi
 clc
 ret
iomon_ld900:
 pop esi 
 jmp dword ptr old_loc_disab
EndProc iomon_loc_dis

BeginProc iomon_glob_enab
 push esi 
 call determin_r_port
 jnz short iomon_ge900
 btr [esi].vio_enab_flags, glob_io_bit
 pop esi
 ret
iomon_ge900:
 pop esi 
 jmp dword ptr old_glob_enab
EndProc iomon_glob_enab

BeginProc iomon_loc_enab
 push esi 

 call determin_r_port
 jnz short iomon_le900
 push ebx ;save vm handle 
 mov ebx, [ebx].CB_VMID ;get vm id
 dec ebx ;zero relative 
 and ebx, MAX_VM_TRACKED
 btr [esi].vio_enab_flags, ebx 
 pop ebx
 pop esi
 clc
 ret
iomon_le900:
 pop esi 
 jmp dword ptr old_loc_enab 
EndProc iomon_loc_enab
;--------------------------------------------------------------
;determin_r_port - Check for port in list of ports we're 
; trapping. 
; Enter: 
; edx = port to check 
; number_ports = number of ports trapped 
; port_data = array of port info 
; Exit: 
; NZ if not one of our ports 
; ESI = offset of entry if found 
;--------------------------------------------------------------
BeginProc determin_r_port
 push ecx
 movzx ecx, number_ports ;number we trapped
 mov esi, OFFSET32 port_data
determin_r_p100: 
 cmp [esi].vrio_port, dx
 je short determin_r_p999
 add esi, size port_info 
 loop determin_r_p100 

determin_r_p999:
 pop ecx
 ret
EndProc determin_r_port

VxD_CODE_ENDS

 END



Listing Two

/*-----------------------------------------------------------------------
-----------------------------------------------------------------------*/
#include <stdio.h>
#include <bios.h>
#include <dos.h>

#pragma pack(1)


#define FALSE 0

#define TRUE 1
#define GET_BUFFER_PTR 1
#define INPUT 0
#define OUTPUT 4
#define ABYTE 0
#define AWORD 8
#define ADWORD 0x10
#define VMIOD_DEV_ID 0x317e
#define GET_VXD_API 0x1684

#define BUF_REC_SIZE 4 //actually, it's variable length (see below)

struct buf_record {
 unsigned char io_attrib;
 unsigned short int io_port;
 union {
 unsigned char bio_data;
 unsigned short wio_data;
 unsigned long dio_data;
 } io_data;
 };


struct buf_info {
 struct buf_record far * buf_beg_ptr;
 unsigned long int buf_bytes_used; 
 unsigned long int buf_size;
 unsigned char buf_flags;
 };
 

struct buf_info buf_ptrs;

char * text_direc[]= {"Read",
 "Write"};
char * text_size[]= {"Byte",
 "Word", 
 "Dword"
 };
main()
{

void (far * vxd_code_ptr) ();

union REGS inregs, outregs;
struct SREGS segs;
register unsigned char attrib;
struct buf_record far * buf_wrk_ptr;
unsigned incr;
unsigned long int wrk_count = 0;

 inregs.x.bx=VMIOD_DEV_ID;
 inregs.x.ax=GET_VXD_API; 

 segs.es=0;
 outregs.x.di=0;

 int86x ( 0x2f, &inregs, &outregs, &segs ); //Get VxD API 



 if ( segs.es == 0) //If VxD not installed
 {
 printf ("%s", "VRKIOMON.386 not installed.");
 exit(1);
 }
 
 FP_SEG(vxd_code_ptr)=segs.es;
 FP_OFF(vxd_code_ptr)=outregs.x.di;

 __asm
 {
 mov ax, GET_BUFFER_PTR ;VxD API function for get ptr
 lea dx, buf_ptrs
 mov bx, ds ;bx:dx ptr to buf info data

 }
 (*vxd_code_ptr)(); //Go get the buffer ptr


 if (!(buf_wrk_ptr=buf_ptrs.buf_beg_ptr)) exit(1);

 while(wrk_count < buf_ptrs.buf_bytes_used)
 {
 attrib=buf_wrk_ptr->io_attrib;
 printf ("%s \t", text_direc[ (attrib & OUTPUT) >> 2 ]);
 printf ("%s \t", text_size[ ((attrib & (ADWORD | AWORD)) >> 3) ]);
 attrib &= (ADWORD | AWORD);

 switch ( (char) attrib)
 {
 case ABYTE:
 printf ("%4x \t %4x \n", buf_wrk_ptr->io_port, 
 buf_wrk_ptr->io_data.bio_data);
 break;
 
 case AWORD:
 printf ("%4x \t %4x \n", buf_wrk_ptr->io_port, 
 buf_wrk_ptr->io_data.wio_data );
 break;

 case ADWORD:
 printf ("%4x \t %8lx \n", buf_wrk_ptr->io_port, 
 buf_wrk_ptr->io_data.dio_data);
 break;
 }

 if (attrib)
 {
 attrib = attrib >> 2;
 --attrib;
 }
 incr= BUF_REC_SIZE + attrib;

 buf_wrk_ptr=(struct buf_record far *) ( 
 (unsigned char far *) buf_wrk_ptr + incr);
 wrk_count += incr;
 }
}































































Special Issue, 1994
Customizing Window Behavior


Subclassing windows with MFC




Vinod Anantharaman


Vinod is a software-design engineer at Microsoft working with desktop
applications and graphical user interfaces. He can be reached at One Microsoft
Way, Redmond, WA 98052.


Subclassing is a well-known method of enhancing and modifying default window
behavior under Microsoft Windows. The way subclassing works is by letting an
application intercept any Windows message before it reaches its target window,
thereby giving the application the first shot at performing some action in
response to the message. Subclassing is widely used for developing specialized
types of window controls from existing ones. For instance, masked edit
controls and multiple-selection list boxes can be developed from the edit and
list-box controls that come with Windows. Potentially expensive
reimplementation of existing control functionality can be avoided by using
this technique. 
When discussing ways to avoid reimplementation, the term "object oriented" can
never be far behind. Although objects have been touted as the panacea for all
programming ills for several years now, only in recent times is this paradigm
finding widespread acceptance in the world of GUI-based Windows programming.
This has a lot to do with the growing acceptance, by developers, of
object-oriented frameworks for Windows, such as the Microsoft Foundation Class
(MFC) library and Borland's Object Windows Library (OWL). By combining
powerful visual tools with well-designed Windows wrapper classes, class
libraries bring method to the madness of API calls. Programmers leverage
the classes provided in such libraries by inheriting their functionality in
their own derived classes and then extending and modifying the inherited
behavior as desired in an object-oriented fashion. Class libraries support and
simplify code reuse, substantially reducing the time and effort needed to
design and implement Windows applications. In this article, I'll discuss how
to use subclassing in MFC and illustrate this with an MFC DLL that lets you
change the default look of windows running on your system to whatever you
choose.


Automatic Subclassed Windows in MFC


With MFC, custom controls and standard windows inheriting from MFC's CWnd
class get subclassed automatically. You just need to handle messages
appropriately with "message maps"--tables connecting messages to the member
functions that handle them. For example, the CEdit and CListBox classes both
inherit from CWnd. To subclass one of these classes and create your own custom
control, you simply: 
Make your new, derived class inherit from the base class you want; for
example, CMaskedEdit from CEdit.
Write the message map for your derived class. The messages to be included in
the map are usually evident from the functionality of the control you want to
build. For instance, masked edit controls restrict the data input to specific
formats, preventing the entry of invalid characters. Evidently, WM_CHAR needs
to be on CMaskedEdit's message map.
Write "message-handler" functions for your class. These are the member
functions that handle the commands mapped to them in the message map.
Construct an instance of your new class and call its creation function. Note
that the call to your class's creation function should eventually lead to one
of CWnd's Create or CreateEx member functions (possibly through an
intermediate call to your immediate base class's Create function) for
subclassing to be automatic.
How does the class library implement subclassing? This isn't much of a
mystery, really. CWnd's Create and CreateEx member functions call the Windows
function SetWindowsHook to set up a window-procedure filtering hook (of the
type WH_CALLWNDPROC) with a corresponding callback function. Windows ensures
that this library callback function will get called by the system whenever the
SendMessage function is invoked. This enables MFC to intercept, among others,
all the WM_CREATE messages sent by the system, as in Figure 1(a). When MFC's
callback function gets a creation message, it subclasses the window by calling
SetWindowLong to reset the window procedure to AfxWndProc, the window
procedure for all windows in MFC. Figure 1(b) shows how AfxWndProc routes all
Windows messages through the attached message map once the window is
subclassed.


Windows MFC Does Not Subclass


So much for windows created by the MFC library. The interesting problem is
that of subclassing windows created with CreateWindow or CreateWindowEx, the
usual Windows API functions.
Assume you want to subclass the window myHwnd. One simple solution would be to
declare a CWnd object and then use CWnd's Attach member function, as in
Example 1(a). But this does not work because Attach does not subclass myHwnd,
and myHwnd must be subclassed prior to the call to Attach. Fortunately, CWnd
has a member function, SubclassWindow, that enables you to do exactly what you
want. So you might be tempted to use SubclassWindow along the lines of Example
1(b).
But this will only get you into more trouble. Why? Remember that MFC itself
subclasses all windows that it creates. On subclassing, MFC saves the window's
original window procedure, its Super WNDPROC, so that this procedure can
continue to process messages not handled by AfxWndProc's message map. Now, a
CWnd class can be subclassed with SubclassWindow exactly once so that all
objects of that class have exactly the same behavior. It is likely that the
CWnd class has already been subclassed by MFC--typically, this would have
happened when a CWnd object was created in your application, either explicitly
or implicitly, by the MFC framework. If so, the
myCWndObject.SubclassWindow(myHwnd) call in Example 1(b) fails to subclass
myHwnd. 
To get over this hurdle, derive a class from CWnd and make this class store
its Super WNDPROC in a location different from that of CWnd. For this, the
derived class has to override CWnd's GetSuperWndProcAddr function. This
virtual function returns a pointer to a window procedure. For subclassed
windows, this return value is a pointer to the original window procedure. Your
derived class should override this function by declaring a static WNDPROC
member variable, initialized to NULL, and returning a pointer to this static
in GetSuperWndProcAddr, as in Example 1(c). Because the WNDPROC member
variable is declared static, all subclassed windows belonging to the class
CMyWnd share a single Super WNDPROC. 
It always helps to see how things fit into a real program, so I'll step
through a subclassing DLL that I developed with MFC. The DLL subclasses all
windows, system-wide, that are not already attached to a CWnd object and that
have a caption and a system-menu box (this includes both preexisting and newly
created windows). To visually demonstrate the subclassing, the DLL replaces
their default Windows-provided looks with a nonclient look that I want to
experiment with. 
In a nutshell, my DLL will: 
Set the subclass style to be windows that have a caption bar and a system
menu.
Dynamically subclass every visible subclass-style window running under Windows
that is not already attached to a CWnd object, attaching each such window to a
C++ object of a class CMyWnd (which inherits from MFC's CWnd class).
Install a window-procedure filtering hook (a Windows hook of the type
WH_CALLWNDPROC) to catch every new window-creation message meant for a
subclass-style window. This way, we can subclass, on the fly, any newly
created subclass-style windows.
Intercept nonclient paint (WM_NCPAINT) messages before they reach their target
subclass-style windows and call my own paint function, which gives subclassed
windows the nonclient look that I fancy.


The Pieces of the Subclassing DLL


For easy readability, I've organized the source code for the DLL in three
separate listings. Listing One contains some typedefs and globals for the DLL.
The code for CWinApp and the Windows hook is in Listing Two . In MFC, CWinApp
is the base class from which you derive a Windows application object; this
class provides member functions for initializing your application and for
cleanup before termination. My DLL derives the class CSubclassDLL from CWinApp
and overrides its InitInstance and ExitInstance member functions.
The InitInstance function, called during DLL initialization, sets up a
window-procedure filtering hook with the exported callback function
CallWndProc, which is called by the system whenever SendMessage is called.
CallWndProc monitors all WM_NCCREATE messages--if one is meant for a
subclass-style window (that is, my function FSubclassStyle returns TRUE
for this window), it calls CMyWnd's MySubclassWindow member function to
promptly subclass this newly created window on the fly. The ExitInstance
function removes this hook before termination. Note that since there can be
only one instance of a DLL, the InitInstance code is called just once and
could equivalently be in an overridden InitApplication member function.
An MFC application needs to declare its derived CWinApp object at the global
level; my DLL's CSubclassDLL object is simply called subclassDLL. 
The CMyWnd and subclassing-related functions are in Listing Three . Each
window that gets dynamically subclassed is attached to an object of my class
CMyWnd, which inherits from MFC's CWnd. CMyWnd overrides CWnd's virtual
GetSuperWndProcAddr function, as discussed previously. CMyWnd adds two new
functions, MySubclassWindow and UnsubclassWindow. MySubclassWindow takes a
window handle, subclasses the window if it does not have a CWnd object already
attached to it, and if successful, returns a pointer to the CMyWnd object that
it attached to the window. The UnsubclassWindow member function retrieves the
Super WNDPROC pointer from GetSuperWndProcAddr, does a SetWindowLong to reset
the Super WNDPROC as the window's window procedure, and calls Detach to detach
the window handle from the CMyWnd object.
The DLL's exported function SetSubclassingState is used to start or terminate
subclassing for this demo. This needs to be called from your driver program.
To start subclassing, call SetSubclassingState with the fStart parameter set
to TRUE. This results in a call to InitSubclassing, which, together with the
recursive InitSubclassingHwnd function, does a depth-first scan on the
system's window tree (starting from the desktop window), subclasses all the
windows that belong to the subclass style, and calls the redraw function for
those windows. When terminating subclassing, call SetSubclassingState with fStart
set to FALSE--this leads to a call to TerminateSubclassing, which provides the
means to restore the system back to its pre-subclassed state (a good feature
to have, too!). TerminateSubclassing, together with the recursive
TerminateSubclassingHwnd function, scans depth-first for subclassed windows in
the system's window tree, un-subclasses these by calling UnsubclassWindow, and
calls their redraw function.
Finally, the message map of CMyWnd is set to include, among other messages,
the WM_NCPAINT message. CMyWnd's OnNcPaint handler function calls MyPaint,
which does my nonclient painting. Figure 2 shows what windows look like under
the version of MyPaint that I use. Notice that the frame has a 3-D look, the
title-bar font is different, and the nonclient area buttons use new visuals
and are no longer in their default positions. Because the title bar, borders,
caption-bar buttons, and so on, no longer conform to the Windows defaults, the
message map needs to add the WM_NCHITTEST message, and the corresponding
handler, OnNcHitTest, must return the correct value (HTCAPTION, HTMINBUTTON,
and so on) to indicate the position of the cursor on our window's nonclient
area. We also need to detect mouse clicks over our repositioned, nonclient
area buttons ourselves, so the message map includes the WM_NCLBUTTONDOWN and
WM_LBUTTONUP messages. To do this, I check in CMyWnd's OnNcLButtonDown handler
to see if the mouse-down was on one of the repositioned buttons; if so, I set
a flag and capture the mouse. In the OnLButtonUp handler function, I check if
the mouse-up occurred on a button, and from the flags value, I determine if
the mouse-up occurred on the same button that we recorded the mouse-down on.
If it did, a mouse click occurred, so I do whatever is appropriate for that
button. The code for MyPaint and the OnNcHitTest, OnNcLButtonDown, and
OnLButtonUp handler functions is not shown in the listings, but you can
customize these functions to create any window look or nonclient button
positions (and functionality) of your own choice.
Figure 1 (a) Message path after MFC's hook has been set up, before the window
is subclassed; (b) message path after MFC subclasses the window.
Example 1: (a) Declaring a CWnd object and using CWnd's Attach member function
to subclass the window myHwnd; (b) using SubclassWindow to subclass myHwnd;
(c) declaring a static WNDPROC member variable to override a function.
(a)

CWnd myCwndObject;
myCWndObject.Attach (myHwnd);
(b)
CWnd myCwndObject;
myCWndObject.SubclassWindow (myHwnd);
(c)
class CMyWnd : public CWnd
{
protected:
 static WNDPROC lpfnSuperWndProc;
 virtual WNDPROC* GetSuperWndProcAddr()
 { return &lpfnSuperWndProc; }
};
WNDPROC CMyWnd::lpfnSuperWndProc = NULL;
CMyWnd myCwndObject;
myCWndObject.SubclassWindow (myHwnd);
Figure 2 New-look subclassed windows.

Listing One 

// A couple of typedefs
typedef struct tagCWPSTRUCT
 {
 LPARAM lParam;
 WPARAM wParam;
 UINT msg;
 HWND hWnd;
 }
CWPSTRUCT;

typedef CWPSTRUCT FAR* LPCWPSTRUCT;

// Global handle of the installed Windows hook
HHOOK vhHookCallWnd = NULL;

// The subclass-style
const long lSubclassStyle = (WS_CAPTION | WS_SYSMENU);



Listing Two

// Our CWinApp class, and the Windows hook related stuff

// CSubclassDLL is our CWinApp class
class CSubclassDLL : public CWinApp
{
public:
 virtual BOOL InitInstance();
 virtual int ExitInstance();
 
 // No special code for the constructor
 CSubclassDLL(const char* pszAppName) : CWinApp (pszAppName) { }
};

// Our global CWinApp object
CSubclassDLL NEAR subclassDLL("subclass.dll");

// InitInstance is called on DLL initialization; it sets up the Windows hook.
BOOL CSubclassDLL::InitInstance() {

 HMODULE hmodule = ::GetModuleHandle((LPCSTR)MAKELONG(AfxGetInstanceHandle(), 0));
 vhHookCallWnd = ::SetWindowsHookEx(WH_CALLWNDPROC,
 (HOOKPROC)CallWndProc, hmodule, NULL);
 return (vhHookCallWnd != NULL);
} 
// ExitInstance removes the hook if there was one.
int CSubclassDLL::ExitInstance() 
{
 if (vhHookCallWnd != NULL)
 ::UnhookWindowsHookEx(vhHookCallWnd);
 return CWinApp::ExitInstance(); 
}
// The hook callback function
LRESULT CALLBACK AFX_EXPORT CallWndProc(int code, WPARAM wParam, LPARAM
lParam)
{
 LPCWPSTRUCT lpCall;
 LPCREATESTRUCT lpcs;
 CMyWnd *pMyWnd;
 if (code < 0)
 {
 CallNextHookEx (vhHookCallWnd, code, wParam, lParam);
 }
 else
 {
 lpCall = (LPCWPSTRUCT) lParam;
 switch (lpCall->msg)
 {
 case WM_CREATE:
 lpcs = (LPCREATESTRUCT) lpCall->lParam;
 if (FSubclassStyle(lpcs->style)) 
 { 
 HWND hwnd;
 hwnd = lpCall->hWnd;
 pMyWnd = CMyWnd::MySubclassWindow(hwnd);
 } 
 break;
 }
 }
 return 0L;
}



Listing Three

// Subclassing functions and the class CMyWnd

// FSubclassStyle determines if lStyle conforms to our subclass-style
BOOL FSubclassStyle (LONG lStyle)
{
 return ((lStyle & lSubclassStyle) == lSubclassStyle);
}
// SetSubclassingState is the exported function that needs to 
// be called from your driver program to begin or terminate subclassing
void FAR PASCAL __export SetSubclassingState(BOOL fStart)
{ 
 if (fStart) 
 InitSubclassing();
 else

 TerminateSubclassing();
}

// InitSubclassing, along with InitSubclassingHwnd, enumerates all 
// the windows and subclasses relevant ones.
void InitSubclassing()
{
 HCURSOR hcurSave;
 HWND hwnd;
 hcurSave = SetCursor(LoadCursor(NULL, IDC_WAIT));
 hwnd = GetDesktopWindow();
 InitSubclassingHwnd(hwnd);
 SetCursor(hcurSave);
}
void InitSubclassingHwnd(HWND hwndParent)
{
 HWND hwndChild = GetWindow(hwndParent, GW_CHILD);
 static CMyWnd * pMyWnd;
 while (hwndChild)
 {
 long lStyle = GetWindowLong(hwndChild, GWL_STYLE);
 BOOL fActive;
 
 if (FSubclassStyle(lStyle) && IsWindowVisible (hwndChild))
 {
 pMyWnd = CMyWnd::MySubclassWindow(hwndChild);
 if (pMyWnd != NULL)
 pMyWnd->RedrawWindow(NULL, NULL, 
 RDW_INVALIDATE | RDW_FRAME);
 }
 InitSubclassingHwnd(hwndChild);
 hwndChild = GetWindow (hwndChild, GW_HWNDNEXT);
 }
}
// TerminateSubclassing, along with TerminateSubclassingHwnd, 
// enumerates all the windows and un-subclasses relevant ones.
void TerminateSubclassing()
{
 HCURSOR hcurSave;
 HWND hwnd;
 
 hcurSave = SetCursor(LoadCursor(NULL, IDC_WAIT));
 
 hwnd = GetDesktopWindow();
 TerminateSubclassingHwnd(hwnd);
 
 SetCursor(hcurSave);
}
void TerminateSubclassingHwnd(HWND hwndParent)
{
 HWND hwndChild = GetWindow(hwndParent, GW_CHILD);
 static CMyWnd * pMyWnd;
 
 while (hwndChild)
 { 
 if ((pMyWnd = (CMyWnd *) 
 CWnd::FromHandlePermanent(hwndChild)) != NULL)
 {
 pMyWnd->UnsubclassWindow();
 if (IsWindowVisible (hwndChild))
 RedrawWindow(hwndChild, NULL, NULL,
 RDW_INVALIDATE | RDW_FRAME);
 }
 TerminateSubclassingHwnd(hwndChild);
 hwndChild = GetWindow (hwndChild, GW_HWNDNEXT);
 }
}
// The class CMyWnd and its member functions
class CMyWnd : public CWnd
{
protected:
 static WNDPROC lpfnSuperWndProc;
 virtual WNDPROC* GetSuperWndProcAddr(); 
 void OnNcPaint();
public:
 CMyWnd* MySubclassWindow(HWND hwnd);
 BOOL UnsubclassWindow();
};
WNDPROC CMyWnd::lpfnSuperWndProc = NULL;
WNDPROC* CMyWnd::GetSuperWndProcAddr() 
{ 
 return &lpfnSuperWndProc; 
}
// CMyWnd::MySubclassWindow subclasses hwnd if it doesn't already have a
// CWnd attached to it. If successful, it returns a pointer to the CMyWnd 
// object that it attached to the window.
CMyWnd* CMyWnd::MySubclassWindow(HWND hwnd)
{ 
 CMyWnd * pMyWnd = NULL; 
 if (CWnd::FromHandlePermanent(hwnd) == NULL)
 { 
 pMyWnd = new CMyWnd;
 pMyWnd->SubclassWindow(hwnd);
 } 
 return pMyWnd;
} 
// CMyWnd::UnsubclassWindow un-subclasses the window
BOOL CMyWnd::UnsubclassWindow()
{
 WNDPROC* lplpfn;
 WNDPROC oldWndProc;
 RedrawWindow(NULL, NULL, RDW_FRAME | RDW_INVALIDATE |
 RDW_UPDATENOW | RDW_NOCHILDREN);
 lplpfn = GetSuperWndProcAddr();
 oldWndProc = (WNDPROC) ::SetWindowLong (m_hWnd, GWL_WNDPROC, 
 (DWORD) (*lplpfn));
 Detach();
 
 return TRUE;
}
// CMyWnd's message map
BEGIN_MESSAGE_MAP(CMyWnd, CWnd)
 ON_WM_NCPAINT()
 ON_WM_NCHITTEST() 
 ON_WM_NCLBUTTONDOWN()
 ON_WM_LBUTTONUP()
END_MESSAGE_MAP()
// CMyWnd's WM_NCPAINT message handler
afx_msg void CMyWnd::OnNcPaint ()
{ 

 MyPaint();
}




























































Special Issue, 1994
Avoiding Windows PATH Cram 


FreePath solves the problem by not adding new directories 




Joseph M. Newcomer


Joe received his PhD in 1975 in the area of compiler optimization. He is a
Windows consultant and applications developer based in Pittsburgh, PA. His
past experience has included computer graphics, document-processing software,
operating-systems development, compiler development, CASE tooling, computer
music, and real-time and embedded-systems development.


Back in the old days of DOS, we suffered with "RAM cram." We had large TSRs
and device drivers that consumed massive amounts of lower 640K memory, leaving
insufficient memory for running applications. Windows solved this by using
VxDs to provide some of these capabilities and multitasking Windows apps to
provide most of the others.
However, Windows has subjected us to a far worse problem: "PATH cram." How
many times have you installed a new application, only to find out that it has
added itself at the front of your PATH? If you remove it, the application
doesn't run. And what happens when that minuscule 127-byte PATH limit is
reached? You lose even more than with TSRs because the limit is so much lower.

A massive PATH means that the cost of loading a DLL or executable becomes
incredibly high because each directory must be searched. If you spawn a DOS
shell and type some nonexistent command, plenty of time can pass before the
error message appears. With DOS 6.0 and higher (as well as other
MS-DOS-compatible operating systems) this can be solved using separate, often
empty configuration sections in the CONFIG.SYS. These set the %CONFIG%
variable that can be tested in AUTOEXEC.BAT to select among several PATH
statements. However, having to reboot between applications is not exactly user
friendly. Alternately, all the necessary DLLs can be dumped into a single
directory in the PATH, but this creates a directory full of incomprehensible
files. Removing or updating a program then becomes a nightmare, particularly
if DLLs are shared by several applications. This also results in "disk cram,"
in which each install dumps five or ten megabytes of DLLs into your Windows
directory. The install procedures are often rather crude; if there is not
enough free space on the drive containing Windows, they usually refuse to
proceed. 
Another disadvantage of PATH is that it allows one program to mask another;
for example, if an application is delivered with a DLL, then exactly which DLL
is invoked may depend upon whether you have added the new directory at the
front or back of your PATH. If you have put it at the end of the PATH, then a
DLL of the same name (but possibly with different interfaces!) found earlier
in the PATH will be used, with potentially disastrous consequences. If you put
the new directory at the front of your PATH, the program that previously used
the DLL of the same name will now see the new DLL instead.
Typical DLLs first try the current directory, then the Windows and
Windows\SYSTEM directories, then the directory containing the executable file,
then the directories listed in the PATH, and finally the list of directories
mapped in a network. Normally the DLL will be stored with the executable file
and found before the PATH is searched; it is becoming more common, however, to
have shared DLLs placed in a separate directory, forcing LoadLibrary to use
the PATH. Thus, two identical systems loaded with the same executables might
exhibit completely different behavior based solely upon the differences in
their PATH variables. Fundamentally, the PATH mechanism is a poorly designed,
inefficient abbreviation mechanism for translating from an unqualified program
name to a particular instance of executable code. Its great charm is that it
is easy to implement, which explains its survival.
My normal DOS approach was to have a minimal PATH and execute programs with
.BAT files that either gave an explicit path to the executable or temporarily
set the PATH to a new value. But the PATH must be set before you start
Windows, so you cannot change it dynamically. DOS-based software should
function correctly if invoked by an explicit directory path on the DOS command
line, even if the user's PATH is empty. Trivial as this is to accomplish (use
argv[0] to derive the home directory of the program), many commercial DOS
applications fail if the program's directory is not in the PATH. For Windows,
I use GetModuleFileName to obtain the program directory, and consequently
never have to depend upon the PATH to find those DLLs, executables, or data
files that would reside there. Finally, you can use an application-specific
.INI file initialized during the setup to hold a section that locates other
executables or DLLs: [programs] mumble.exe=d:\mumble\bin\mumble.exe. 
Using LoadLibrary with an explicit path, followed by a series of
GetProcAddress calls to initialize a series of pointers, makes it relatively
easy to avoid requiring an implicitly loaded DLL. This eliminates the need for
PATH; a simple macro makes the code look as if an implicitly loaded library
were used.
All of these techniques can bulletproof an application from the vagaries of
PATH; unfortunately, most commercial applications do not practice PATH-safe
computing. For me, the breaking point came when I recently installed the OLE
2.0 (April 1993) SDK from the MSDN CD-ROM. It wanted not just one, but three
new directories with long names in my PATH! This was simply impossible; the
PATH is already too long to hold what is needed. I needed a way to let
Windows find these files without putting anything else in my PATH.
My solution to PATH cram is a program called "FreePath." It is designed to be
loaded from the LOAD= line in your WIN.INI file, and it handles the PATH
problem by simulating the effect of PATH without actually requiring new
directories to be added to the PATH. 


ProcHook to the Rescue 


Key to making FreePath work is the ProcHook DLL presented in the article "Hook
and Monitor Any 16-bit Windows Function with our ProcHook DLL," by James
Finnegan in Microsoft Systems Journal (January 1994). Finnegan's DLL allows an
application to provide a callback function which will intercept any selected
Windows API call. A "hook" to this function is then set in the selected API
call, allowing you to do anything with this API call, including calling the
underlying API that had been hooked. Hooks are set by the SetProcAddress call,
temporarily removed and restored by the ProcHook call, and permanently removed
from the hook database by the SetProcRelease call. Table 1 provides relevant
information about the ProcHook DLL. Hooks are implemented by actually
modifying the code of the procedure to contain a JMP to either the hook
handler or its instance thunk. To call the actual API call, you must replace
the 5-byte JMP instruction with the original code sequence using ProcUnhook,
then perform the call again; this time it will not be intercepted by the hook
procedure. I hooked the LoadLibrary, LoadModule, and WinExec calls. The real
work, as you'll see, is done in LoadModule. 
The initial design was simple: My hook procedure would first try to load the
module using the base LoadModule call. If LoadModule succeeded, it would
rehook the callback and return the HINSTANCE value to the caller. If the call
failed, I would then look at the filename that was passed in. If it were an
absolute pathname, I would simply return the error code. However, if it were a
simple name--such as FOOBAR.DLL, FOOBAR.EXE, or the like--I would find a
corresponding complete path and try again. If this second attempt failed, I
would return the error code of the first call; otherwise, I would return the
newly obtained instance handle. This would successfully simulate the PATH
without doing a search! As it turned out, the final implementation was much
more complicated.


Maintaining the Mappings 


I had to decide where to store the path-mapping information--in the overused
and much-abused WIN.INI file, in an application-specific .INI file, or in the
"more-modern" registration database. The registration database has several
potential advantages: 
It is considered to be the successor of the .INI file.
There is a tool to manipulate it (regedit), and its interface is cleaner than
that of .INI files.
It allows for hierarchical key/value pairs.
It does not require textual processing (parsing the entries).
A simple add-and-delete capability could be incorporated into my hook
processor without much programming.
The registration database also seemed to have significant performance
improvement over the alternative methods because it keeps a cached copy of
some of its information in memory. Although less efficient than writing my own
binary database system, it involved considerably less effort.
I could not find any good documentation on the "proper" use of the
registration database (all existing documentation concentrates on its use for
OLE servers), so I adopted some conventions. The first-level key is the name
of my application, FreePath. Below this are names of some FreePath-related
options, and under each option is a list of program name/pathname pairs. 
An entry in the registration database is obtained by passing in a pointer to a
string that looks like a directory string, for example,
FreePath\Active\foobar.dll. The text string, which in my case is the full
pathname, can be obtained by using the RegQueryValue API call. If I cannot
find a full pathname to substitute in the registration database, I just return
the error code of the LoadModule that failed.
I required that the mapping from a program name to a pathname be complete,
rather than just a path to be prefixed; not only was this a bit simpler, but
it also meant that the user could redirect to another DLL that had the same
interface! 
The code that locates a mapping is shown in Listing One . It uses a static
variable for its scratch area so that it does not consume any more stack space
than necessary for calls (see "Conserving Resources: Stack Space"). There is
no way to tell how much stack is available when LoadModule is ultimately
called, so the callback code should not be profligate of stack space (see
Finnegan's article). The code simply forms the key "FreePath\Active" and opens
the registration database for that key. If it finds the key, the code queries
the subkey value, which is the filename of the module to be loaded--that is,
it locates the value associated with FreePath\Active\filename. If the value is
found, it is returned via the parameter pointer newfile. 
Figure 1 is a sample of the registration database, as shown by regedit. Note
that several other keys appear here. The distributed version lets you disable
a definition (for example, for testing); disabled mappings are under the
disabled key. You can also instruct FreePath to log any requests that generate
errors or any requests for names which are not found; these are kept under the
keys BadPath and NoPath, respectively. This can help determine why a load
failed; for example, you may not realize that an executable needs a certain
DLL; just turn on the error logging and the failing name will appear in the
database! A simple pushbutton will clear these failure entries from the
database, so you don't have to search through hundreds of two-week-old failure
requests, or delete them one at a time. 


The User Interface


The FreePath control panel is shown in Figure 2. This provides an
application-specific editor for the registration database, and makes it easy
to, for example, move a mapping from the Active to Disabled section, and back.
The Display check boxes allow for selective display of information from the
registration database, and the API check boxes allow for the API calls to be
selectively enabled or disabled. A global Disable check box not only disables
the actions, but completely removes the hooks set by ProcHook, leaving your
system in its pre-FreePath condition. 
I added the Performance section to determine FreePath's effectiveness. To
improve its own performance, FreePath does not attempt to update this display
when it is minimized; if it is visible, you see the numbers update in real
time as modules are loaded. The counters are maintained internally even when
the display is minimized (the overhead of a simple "++" operation is very
small). When the icon is opened, the WM_SIZE handler posts a message to update
the Performance display.
The Browse button brings up a file dialog and lets you locate a file to enter.
It automatically sets the name to correspond to the filename part of the full
pathname. You may also type a full pathname directly and FreePath will
automatically fill in the filename.
The status of all the check boxes is stored in the FREEPATH.INI file in the
Windows directory. When FreePath is started, its initial settings are taken
from this initialization file. If you change any of the settings, the Save Now
button will become active; clicking it will save the profile settings.
Normally the profile is saved upon program exit, but since this program
normally doesn't exit unless Windows shuts down successfully, I wanted to
provide an option to guarantee that a particular configuration could be saved.
The Enable (alternatively Disable) button becomes active if a Disabled or
Active entry is selected from the list box, allowing the entry to be
transferred between the two categories; Delete will delete a selection, and
Add will add (or replace) an Active entry from the contents of the Name and
Path input boxes. 



The Callback Table


Certain actions had to be performed for all callbacks, while others had to be
performed for specific callbacks only. Rather than hardwire all the values for
generic actions into the program, I simply constructed a table which pointed
to individual entries; see Listing Two. The table contains a printable name
(primarily for the debugging output), the address of the "real" procedure, a
pointer to the handler procedure MakeProcInstance thunk, an entry for the hook
argument to the ProcHook and ProcUnhook calls, the control id that enables the
check box, and a set of Boolean flag values that indicate flag status. 
The table is initialized using the code shown in Listing Three. I immediately
ProcUnhook the hook set by SetProcAddress; if a hook fails to take, I make the
check box that selects it invisible. (Initially, I had simply disabled the
check box, but I found the distinction too subtle to detect, so I changed it
to completely hide the box in question.) The use of MakeProcInstance, now
largely obsolete because of smart callbacks, is absolutely mandatory for
ProcHook (see Finnegan's article).
The callback table is initialized via a PostMessage call set up during the
OnInitDialog handler. This is because WinExec does not return control to the
calling application until the first Yield, typically implied by the GetMessage
of the top-level message loop. Hooking the WinExec function before WinExec
completes (in particular, the WinExec that launched FreePath) can cause a
catastrophe. Finnegan recommends performing the hook initialization via a
handler invoked from the top-level message loop via a PostMessage; see Listing
Three.
Once the table is initialized, SetHooks establishes the settability of its
hooks, after which the EnableHooks call actually places them; see Listing
Four. Note that the m_API variables (m_LoadLibrary, for example) are variables
maintained by the Microsoft Foundation Class (MFC) library to reflect the
status of the check boxes; m_Disabled represents the state of the "disabled"
check box that renders the program totally inactive, even to the point of
removing its physical hooks. 


The Hook Handlers 


The hooking is handled by callback procedures. Each API procedure to be
intercepted has its own callback procedure as a hook handler. The handler's
signature is the same as that of the API procedure. When a hook is set, any
call to the API procedure will transfer control to its associated handler. The
callback can do anything it wants, including calling the hooked API procedure.
To prevent infinite recursion, the hook handler must first unhook the API
procedure that calls it, then rehook it before returning. 
The simplest callback is the WinExec callback, Free_WinExec, in Listing Five.
Following the algorithm described, I unhook the procedure, perform certain
actions, call the underlying "real" procedure, perform more actions, and
return. In this case, I wanted to "activate" the LoadModule handler, whether
the user had explicitly checked LoadModule as a trappable API call or not. If
called from WinExec, its performance would be dictated by the WinExec check
box, hence the assignment p_LoadModule.active = p_WinExec.active. 
The LoadLibrary handler, Free_LoadLibrary, is a bit more complex because after
reading Matt Pietrek's Windows Internals (Addison-Wesley, 1993), I was under
the impression that LoadLibrary would be implicitly called to load related
DLLs.
Pietrek correctly pointed out that LoadLibrary is just a shell around
LoadModule and that WinExec is a wrapper around LoadModule. Consequently, I
thought I could do everything just by hooking LoadModule. However, I realized
that some users might not want everything redirected, so I gave the option of
specifying which API calls were to be redirected. I mistakenly assumed,
however, that LoadModule could call LoadLibrary to load any implicit
libraries, which, of course, would call LoadModule. (The alleged relationships
between the API calls are shown in Figure 3, and this seems to be supported by
the pseudocode on page 259 of Pietrek's book.) This led me into a recursive
situation--once I unhooked LoadLibrary to call the "real" LoadLibrary, I ran
the risk of getting another call to LoadLibrary; because I had removed the
hook, however, my callback would not see the call, and the real LoadLibrary
would end up calling the real LoadModule. This complicated the implementation
somewhat. 
According to Pietrek, the "helper function" LMImports (called by LoadModule)
calls LoadLibrary to load any related implicitly linked libraries. In fact, I
discovered that it does not, and the true structure was inferred by setting a
breakpoint at LoadModule and examining the call stack in CodeView; see
Figure 4. 
The effects of this difference were nearly enough to kill the whole project.
Fortunately, some of the code was salvageable because it handles a related
problem, where the LibMain of a DLL explicitly calls LoadLibrary; see Listing
Six.
The LoadLibrary handler works much like the WinExec handler: It is unhooked,
and ultimately I call the underlying "real" LoadLibrary call. If the
LoadModule check box wasn't selected on the user interface, it will not be
active. But since I have to intercept LoadModule to complete LoadLibrary, I
activate it by setting the active field TRUE. In addition, LoadModule might
not actually have a hook set because Free_LoadModule unhooked itself to call
LoadModule; if necessary, ProcHook is called to reset the LoadModule hook. If
the hook was set active and the user doesn't want LoadLibrary mapping, we
deactivate the LoadModule hook. After I call the "real" LoadLibrary routine, I
reset the LoadModule.active state. If LoadModule has been hooked from within
LoadLibrary, I unhook it and restore its active flag.


LoadModule


First, I'll illustrate Free_LoadModule's basic operation. Then I'll detail the
consequences of the differences between the alleged implementation (Figure 3)
and the real implementation (Figure 4), and aspects of subtler interactions
such as SetErrorMode. The Free_LoadModule code and its associated helper
function Call_LoadModule are in Listing Seven.
One little glitch was the error notification built into Windows. To handle
this correctly, I had to add the SetErrorMode processing. The error-reporting
dialog box that normally pops up when a load fails should be suppressed, but I
must pop it up if I cannot remap the load request and if the prevailing error
mode would have popped it up had FreePath not been handling the load
operation. I must properly simulate the Windows behavior that would have
occurred without FreePath. The handler and helper functions appear in Listing
Twelve. For now, assume that Call_SetErrorMode is just SetErrorMode.
After setting the error mode, I unhook the LoadModule procedure, as expected.
For now, ignore the pong-related code and the strange handling of the
filename_stack and filename_ptr; these will be discussed later. 
To avoid potential mutual recursion with LoadLibrary, if the LoadLibrary
handler is unhooked I rehook it at this point (note that I will have to unhook
it upon exit). Next, I perform the actual call on the base LoadModule function
via Call_LoadModule. This is just a convenient way to package the pong-related
code; substituting LoadModule for Call_LoadModule gives a good approximation
of program execution. If the call to the base LoadModule succeeds, I increment
a counter (for the Performance display on the FreePath control panel), notify
the panel that an update is requested, clean up all the state I have modified,
and return the instance handle to the caller. If the LoadModule call does not
succeed, and LoadModule is "active," life becomes far more interesting.
At this point, I am dealing with the "primary load" failure case. It is not
clear exactly why the base LoadModule call failed. Consider the case of
loading a program, TOP.EXE, which requires the implicit library DLL1.DLL. The
DLL1.DLL library requires the loading of the implicit library DLL2.DLL.
Therefore, when I get an error return, I don't know exactly why TOP.EXE failed
to load; either TOP.EXE, DLL1.DLL, or DLL2.DLL was not found. This is another
use for the LoadModule_Depth counter: If the failure occurred at the immediate
call to LoadModule, then the Failure_Depth will be zero because it is set to
zero before the call to the base LoadModule and no recursive calls to
LoadModule failed. In this case, I can attempt a retry. If, however, the
Failure_Depth is nonzero, the failure must have occurred at a much lower
level, because the LoadModule_Depth was stored at the time the failure
occurred. I have already issued an error message for it, and retrying the
operation with a mapped name will not help, so I just go directly to the
failure exit.
Next, I check whether the path given was absolute or just an unqualified name.
The code for RelativePath is in Listing Eight. I did not wish to implement a
parser for pathnames, especially when the C library already had one;
unfortunately, it was model specific. This meant that it wanted a char *
pointer, which in medium model is 16 bits of DS-relative offset, whereas I had
passed in an LPCSTR, a 32-bit FAR pointer. The solution was to copy the string
to a local (near) variable and apply _splitpath. I also test whether the
string length is acceptable and return FALSE if it is too long. If there is a
drive or directory in the path, I assume that it is not mappable and return
FALSE; otherwise it is mappable. While it is certainly possible to create
mappings for fully qualified names, there are some problems with the backslash
characters, and in any case, the point is to simulate the PATH environment,
which only deals with unqualified names. (Note the use of static variables to
avoid stack consumption.)
If the path is unqualified, I call GetMappedFile to map the file to a new
name. If this fails, I take the failure exit. The code for GetMappedFile is
shown in Listing One. If I find a mapping, I attempt to issue the base
LoadModule call using the new filename. If this call fails, I increment a
counter for the Performance display and take the error exit. If it succeeds, I
increment a success counter for the Performance display.
In the exit code, I deal with more pong-related processing, unhook the
LoadLibrary hook if it was set, rehook the LoadModule intercept if necessary,
restore the error mode, and return. The desire to put the pong-related code in
one place resulted in the helper function Call_LoadModule. 
In the failure exit code, I examine the prevailing SetErrorMode state. If it
is not SEM_NOOPENFILEERRORBOX, I reset the error mode to the prevailing error
mode and issue a LoadModule call. This will force the standard (and expected)
Windows error box to appear. 


Playing "Pong" with Hooks 


After the entire user interface was developed and the first-level functions
were operational, I then tackled the LoadModule/LoadLibrary recursion problem.
I came up with a brute-force solution, as follows: 
1. A hook is set at LoadModule that transfers control to the handler
Free_LoadModule. 
2. Free_LoadModule unhooks the hook but sets a new one at a later point in
LoadModule. 
3. Free_LoadModule calls the base LoadModule. 
4. The secondary hook transfers control to the secondary-hook handler. 
5. The secondary-hook handler rehooks the LoadModule hook. 
6. The secondary-hook handler removes the secondary hook and resumes
execution of LoadModule.
Because these hooks alternate back and forth, with control bouncing from one
to the other, I named the secondary hook the "pong hook," in honor of that
first video game. Now I had to come up with a feasible and reasonable
implementation. I rejected several alternatives for the same reason that
Finnegan did; for example, using an INT3 interrupt would cause problems if you
were running a debugger. Clearly there was a nice hook-setting mechanism in
PROCHOOK.DLL. But the hook mechanism of ProcHook inserted direct JMP
instructions to the handler thunk so that the stack frame would be correct in
the handler. If I did a JMP from the middle of the code, I would get control
with the stack frame in some undefined state that certainly would not cause
the handler procedure to return to the desired location. To resume, the
ordinary hook handler depends upon the RETF that it executes to return to the
caller of the hooked function because that is how the stack frame is set up.
In my case, I had to insert a hook that would allow the handler to return to
the point that I had intercepted. In debugging at the code level, I discovered
that the calls beyond the API entry point for LoadModule passed parameters via
the registers, for example, CX:AX. If I placed my intercept code after this
point, I would have to preserve CX:AX, which is not an assumed condition for
the C compiler; in particular, the thunk code modifies AX. The internal calls
were also NEAR rather than FAR calls, which would make setting intercepts at
the called procedure difficult and potentially hazardous. The actual code for
the first few instructions of the Windows 3.1 LoadModule is shown in Listing
Nine. Not knowing what that first Call instruction called, I wanted to do my
intercept before it was reached. But I had to set the new hook in such a way
that when it rehooked LoadModule, the placement of the hook would not
interfere with the execution of the instructions. Therefore, the secondary
hook had to be at least five bytes into the LoadModule code so that the 5-byte
far JMP that ProcHook installed could be safely placed.
My solution was to plant a ProcHook within the LoadModule code. Careful
inspection of the code determined that the earliest feasible place was also at
least five bytes into the code (and, of course, on an instruction boundary). I
placed it at LoadModule+5. However, a JMP instruction would not transfer
control properly to the handler. I therefore modified the hook instruction
after the hook was set to be a CALL instruction. In order that ProcHook
continue to work correctly, I restored the operation to a JMP instruction
before unhooking it. In the handler, I modify the return address to point to
the location where the hook was set. This implementation is void where
prohibited by law.
Creating the pong handler required careful reading of the generated entry and
exit code of the Free_Pong routine in Listing Ten. As I expected, the
instruction-pointer portion of the far-return address was two bytes above
SS:BP. I reset this value to be the pointer to the pong hook location, using
the __asm insertion shown. The complete pong code outside Free_LoadModule is
in Listing Eleven. Note the use of the undocumented AllocCStoDSAlias call,
the same one Finnegan uses to map the selectors. AllocCStoDSAlias must be
declared as extern "C" UINT WINAPI AllocCStoDSAlias(UINT); and the .DEF file
must contain the declaration: 
IMPORTS 
AllocCStoDSAlias = KERNEL.170 
The pong code is quite fragile and could, with a minor change in the code for
LoadLibrary, seriously corrupt the system. Before I set the hook, I verify
that the bytes found at the location where I place this internal hook are
those that I expected. Not shown in this code: I allow the offset and
signature to be specified in the .INI file, to accommodate unforeseen
circumstances. Such capabilities should not be used casually!


Handling SetErrorMode 


The first time I tried to load a complex system (Quattro Pro for Windows),
Windows complained that it could not find one of its DLLs by popping up a
MessageBox. This suggested that my code was suddenly failing for some unknown
reason. Yet after I clicked on OK, Quattro Pro came right up, which it could
not have done if the DLL had actually failed to load. In examining the log
data I wrote with OutputDebugString, I saw that the first attempt to load it
had indeed failed, and the second (mapped) attempt had succeeded. I added the
SetErrorMode call so that (expected) failing load attempts would not notify
the user. Of course, this behavior is also unacceptable if the FreePath
remapping attempt fails or cannot be attempted because no remapping was
found. Therefore, in the failure exit, unless the user has externally set the
error mode to SEM_NOOPENFILEERRORBOX, I simply re-issue the base LoadModule
call. I could have done my own MessageBox, but that would lead to problems in
internationalization; it would be confusing if some messages came up in the
user's native language, and one that looked remarkably like a Windows message
popped up in English. To avoid this, I force the underlying LoadModule code to
issue the error. 

This suppression of the error message introduces yet another problem: Suppose
that TOP.EXE requires DLL1.DLL, and DLL1.DLL requires DLL2.DLL. I come in, set
the error mode off, and load TOP.EXE successfully. TOP.EXE calls LoadModule to
load DLL1.DLL, which also is successful; LoadModule is called again to load
DLL2.DLL, which fails. By the time it gets to DLL2.DLL, I have already turned
the error reporting off (at the highest level, when attempting to load
TOP.EXE). LoadModule would typically have no history as to whether it was
called from a user application or from an internal recursive call. Thus, I
might fail to issue the error message because the error mode is incorrect. So
I added two variables: LoadModule_Depth and the Prevailing_ErrorMode array.
Note that I use the Prevailing_ErrorMode for the current LoadModule_Depth as
the context in which I issue the LoadModule attempt that should generate the
error message. Of course, now LoadModule must properly maintain the
Prevailing_ErrorMode, so I set a ProcHook in SetErrorMode. This allows my own
SetErrorMode handler, Free_SetErrorMode, to intercept the SetErrorMode calls
and maintain the Prevailing_ErrorMode. This handler is in Listing Twelve.
The hooking of SetErrorMode rendered my code undebuggable, due to the
interaction of CodeView with the SetErrorMode handler. I had made a minor
error in the code which I didn't see immediately, so I brought up CodeView. I
discovered that attempting to debug the code resulted in an infinite recursion
entering Free_SetErrorMode. This seemed even less explicable than the bug I
was looking for, which was that the SetErrorMode was not yet correctly
maintained. In single-stepping, I discovered that I went into the infinite
loop when I called ProcHook, which was even more confusing! I started
single-stepping into ProcHook and got a fault in KRNL386.EXE in an _fmemcpy
call. Suddenly, I remembered that one of the options to SetErrorMode is
SEM_NOGPFAULTERRORBOX, which has the annotation "This flag should be set only
by debugging applications that handle GP faults themselves." The code in
ProcHook that sets the hook consists of two _fmemcpy calls; see Example 1.
While single-stepping, CodeView apparently calls SetErrorMode just before
control returns to the user prompt. The internal failure occurred as I
single-stepped across the first _fmemcpy. At that point the first byte of
SetErrorMode had been changed to a JMP opcode, but the jump address, which the
second _fmemcpy supplies, was not yet in place; this explains the failure. I
wanted to rewrite ProcHook so that a single _fmemcpy would set the hook in one
5-byte transfer--that way, when CodeView called SetErrorMode before returning
control to the user, a valid hook would have been set. But take a look at
Listing Twelve. In Free_SetErrorMode, I call Call_SetErrorMode, which
determines if the .set flag is TRUE and the hook pointer is not NULL. But
since control has not yet returned to my code, I have not stored the result of
ProcHook, so the value of the hook is NULL; I certainly would not have
executed the next statement, so the value of the .set field would be FALSE.
Thus, I would not call ProcUnhook, and the infinite recursion would happen.
Now, all this would happen if I were single-stepping inside ProcHook itself,
but it happens even when I am single-stepping my own code. Unfortunately,
because I would have stored the .hook field, but not the .set flag, my test
would again fail. Of course, I could set the .set flag first, but this would
only be a partial solution. To fix this, I would have to actually test the
first instruction of SetErrorMode to see if it was a JMP, and if so, unhook
the handler before calling SetErrorMode. But this would work as long as I did
not single-step into ProcHook and try to trace it; for complete success, I
would have to modify ProcHook to set the hook in a single 5-byte transfer. I
would also have to add an IsHooked call to tell me if the API is already
hooked to me. These would be changes to the ProcHook library, and I did not
wish to have a private, customized version of it. For now, I just avoid
single-stepping into a ProcHook call on SetErrorMode, and set my breakpoint
after I have successfully stored p_SetErrorMode.set. 


Conserving Resources: Message Queue 


A program that attaches itself symbiotically to a system must not have a
significant impact on the system's overall resource requirements or
performance. I did not want to slow the system down by
updating the Performance display if it was not necessary, or impose the
additional stack space requirements for calling the update routine, as would
be required for a SendMessage. Therefore, I used a PostMessage call to post an
update request. The PostMessage queue size is usually very small and could
overflow if a complicated module suite were loaded. Therefore, I set a flag
after issuing the first PostMessage, and I issue no subsequent PostMessage
calls until the message was handled and the flag was cleared. For performance
reasons, the display need not be updated if the program is iconized, so I
don't even bother with PostMessage when the window is iconized. 
The interface here spans both C++ (in which the user interface is written) and
pure C (in which the callbacks must be written), so I reflect some of the C++
state (from members of the class) into static C variables. One such variable
is the window handle to which to post messages, C_var_PostWnd. If the window
is iconized, I set it to NULL; if the window is restored, I set it to the
window handle. The full posting code is shown in Listing Thirteen. The
NotifyUpdate procedure is called from the callback for LoadModule; it does not
attempt PostMessage if the C_var_PostWnd is NULL. Note that an update request
is posted in the restoration of the window. 


Conserving Resources: Stack Space


Generally, I have conserved stack space by making local variables static so
that they are allocated in the program's DGROUP. However, in one case this was
not possible: Keeping the local name of the mapped file required keeping a
local copy in each recursive incarnation. I had initially done this in the
obvious way, by allocating a local char[_MAX_PATH] array. Unfortunately, I
would occasionally get a fatal stack overflow (which would crash all of
Windows) when loading a complex program. Some careful investigation suggested
that this might be caused by overconsumption of stack space. The only culprit
I could find was the large char array that held the filename. 
I therefore chose the implementation shown in Listing Seven. On each recursive
entry, I store the current value of the filename_ptr in new_filename. After I
fill in the new filename string in the location pointed to by new_filename, I
increase filename_ptr by the length of the string plus one byte for the
terminating NUL byte. When I exit the procedure, I reset filename_ptr to the
value I stored in new_filename. This gives me a very compact filename stack.
From a viewpoint of language purity this is outrageous; our languages are
supposed to handle this for us. However, it reduces consumption of the
application stack (SS:SP), a scarce resource over which I have no control, and
increases consumption of local DGROUP space, which is comparatively plentiful,
essentially unrestricted, and over which I have complete control. I chose a
large limit, MAX_PATH_DEPTH, on the length of the mapped strings, which in
general should translate to several times that capability for realistic path
lengths. 
Note that I only use static temporaries in those procedures that can be called
via the callbacks, where SS!=DS. For other uses, such as working with the user
interface, the stack is our own, and SS==DS; the stack for the program is
large enough to handle these cases, and there is no potential recursion, so
large limits (such as 256 bytes) are acceptable.


Conclusions


Finnegan's ProcHook DLL is critical to this operation. It allows (nominally)
any API call to be intercepted and routed to a handler. The binary code for
PROCHOOK.DLL is provided electronically; see "Availability," page 3. The
source code is available on CompuServe, Microsoft Download Service
(206-936-6735), and several other services cited in MSJ. 
A complete running version of FreePath is also available electronically in
binary, along with the source for the critical subroutines shown in these
listings. The package is being distributed as shareware, and registered users
will get the complete source to FreePath and an online Help file. 
Finnegan's ProcHook article discusses a very important limitation of ProcHook
when it interacts with WinExec: Because WinExec does not get control back
until after the WM_CREATE message, it is possible to get into unrecoverable
situations if you attempt to ProcHook while in WinExec. This can be fixed by
using PostMessage to trigger the hooking request, as my application does. If
your handlers are in your main executable, it is critical that the code be
compiled with the /Gw switch so that the call via the instance thunk will set
DS properly. 


Acknowledgments 


I particularly want to thank James Finnegan, the author of ProcHook, without
whose work this would not have been possible, and both the author and
Microsoft Systems Journal for permission to distribute the ProcHook DLL with
FreePath. I would also like to thank Matt Pietrek for answers and observations
that helped me get this up quickly, and the many correspondents on
CompuServe's MSLANG and MSMFC forums, especially the Microsoft engineers who
have been helping me learn the MFC library.
Table 1: The PROCHOOK.DLL interface.
NPHOOKCHILD SetProcAddress( FARPROC OriginalFunc, FARPROC NewFunc, BOOL
exclusive)
Sets up a hook to the function specified in OriginalFunc by redirecting the
entry point to the function specified by NewFunc. The pointer specified by
NewFunc must either be in a DLL, or it must be an instance thunk returned by
MakeProcInstance. Smart callbacks will not work! The exclusive flag specifies
whether this hook will be exclusive to the function. It returns a near pointer
to an NPHOOKCHILD if successful, or NULL if not.
BOOL SetProcRelease(NPHOOKCHILD hookptr)
Permanently removes the hook specified by hookptr. Returns FALSE if
successful, TRUE if not.
BOOL ProcHook(NPHOOKCHILD hookptr)
Rehooks the function specified by hookptr. The function should have been
previously unhooked by ProcUnhook. It returns FALSE if successful, TRUE if
not.
BOOL ProcUnhook(NPHOOKCHILD hookptr)
Temporarily unhooks the function specified by hookptr. Should be matched by a
subsequent call to ProcHook. It returns FALSE if successful, TRUE if not.
Figure 1 Registration database editor.
Figure 2 FreePath control panel.
Figure 3 Alleged component relationships.
Figure 4 Actual component relationships.
Example 1: Two _fmemcpy calls. 
// Change the first 5 bytes to JMP 1234:5678 (EA 78 56 34 12)
_fmemcpy(lpJmpPtr++,&wJmp,1);
_fmemcpy(lpJmpPtr, &npHookChild->lpfnNewFunc,4);

Listing One

#define KEY_APPNAME "FreePath"
#define KEY_ACTIVE "Active"

BOOL CPromptDlg::GetMappedFile(LPCSTR filename, LPSTR newfile, int 
newfile_len)
 {
 static char key[256]; // Don't eat stack; make this static

 HKEY subkey;
 
 lstrcpy(key, KEY_APPNAME);
 lstrcat(key, "\\");
 lstrcat(key, KEY_ACTIVE);

 // At this point we have formed
 // FreePath\KEY_ACTIVE

 LONG retval = RegOpenKey(HKEY_CLASSES_ROOT, key, &subkey);

 if (retval != ERROR_SUCCESS)
 { /* missing key KEY_ACTIVE */
 // Key was never opened, so there is nothing to close
 return FALSE;
 } /* missing key KEY_ACTIVE */
 else
 { /* has key KEY_ACTIVE */
 LONG len = newfile_len;
 // Ask for the value of
 // FreePath\Active\foo.dll
 retval = RegQueryValue(subkey, filename, (LPSTR)newfile, &len);
 if(retval != ERROR_SUCCESS)
 { /* failed */
 RegCloseKey(subkey);
 return FALSE;
 } /* failed */

 // We have found the mapped path
 RegCloseKey(subkey);
 return TRUE;
 } /* has key KEY_ACTIVE */
 
 }





Listing Two

typedef struct {
 char * name; // Printable name
 FARPROC proc; // "Real" proc address
 BOOL enable; // do we want this hook set?
 FARPROC handler; // Free_ handler for this proc
 int id; // control ID that is associated with this entry
 FARPROC callback; // MakeProcInstance(handler)
 BOOL set; // Hook has been set
 NPHOOKCHILD hook; // ProcHook magic cookie
 BOOL active; // We are active
 BOOL settable; // we can set this hook
 } hook_entry;
//--------------------------------
// p_LoadLibrary
//--------------------------------
hook_entry p_LoadLibrary =
{"LoadLibrary", // Name
 (FARPROC) LoadLibrary, // Address

 FALSE, // enable
 (FARPROC)Free_LoadLibrary, // local callback
 IDC_LOADLIBRARY, // checkbox 
 NULL, // MakeProcInstance pointer
 FALSE, // set
 NULL, // hook
 FALSE, // active
 TRUE // settable
}; 
//--------------------------------
// p_LoadModule
//--------------------------------
hook_entry p_LoadModule = 
{"LoadModule", // Name
 (FARPROC) LoadModule, // address
 FALSE, // enable
 (FARPROC)Free_LoadModule, // local callback
 IDC_LOADMODULE, // checkbox
 NULL, // MakeProcInstance pointer
 FALSE, // set
 NULL, // hook
 FALSE, // active
 TRUE // settable
};
//--------------------------------
// p_WinExec
//--------------------------------
hook_entry p_WinExec =
{"WinExec", // Name
 (FARPROC) WinExec, // address
 FALSE, // enable
 (FARPROC)Free_WinExec, // local callback
 IDC_WINEXEC, // checkbox
 NULL, // MakeProcInstance pointer
 FALSE, // set
 NULL, // hook
 FALSE, // active
 TRUE // settable
};
//--------------------------------
// p_Pong
//--------------------------------
hook_entry p_Pong =
{"Pong", // Name
 (FARPROC) 0, // address
 FALSE, // enable
 (FARPROC)Free_Pong, // local callback
 0, // checkbox (none corresponds to this)
 NULL, // MakeProcInstance pointer
 FALSE, // set
 NULL, // hook
 FALSE, // active
 FALSE // settable
};
//--------------------------------
// p_SetErrorMode
//--------------------------------
hook_entry p_SetErrorMode =
{"SetErrorMode", // Name
 (FARPROC) SetErrorMode, // address
 FALSE, // enable
 (FARPROC)Free_SetErrorMode, // local callback
 0, // checkbox (none corresponds to this)
 NULL, // MakeProcInstance pointer
 FALSE, // set
 NULL, // hook
 FALSE, // active
 FALSE // settable
};




Listing Three

hook_entry * hook_table[] = {
 &p_LoadLibrary,
 &p_LoadModule,
 &p_WinExec,
 NULL };

void CPromptDlg::InitProcTable()
 {
 int i;

 for(i=0; hook_table[i] != NULL; i++)
 { /* initialize it */
 hook_table[i]->callback = MakeProcInstance(hook_table[i]->handler,
 AfxGetInstanceHandle());
 hook_table[i]->hook = SetProcAddress((FARPROC)hook_table[i]->proc,
 hook_table[i]->callback,
 FALSE);
 if(hook_table[i]->hook != NULL)
 ProcUnhook(hook_table[i]->hook); // unhook immediately

 hook_table[i]->set = FALSE;

 // If the hook could not be set, do not show it as a possibility
 // for being hooked (This used to be EnableWindow but the 
 // difference is too subtle to notice, so I just make it go away)

 if(hook_table[i]->id != 0)
 GetDlgItem(hook_table[i]->id)->ShowWindow(
 (hook_table[i]->hook != NULL 
 ? SW_SHOW 
 : SW_HIDE));
 } /* initialize it */
 }

#define UWM_INITIALIZE (WM_USER+1)

BOOL CPromptDlg::OnInitDialog()
{
 // ...
 PostMessage(UWM_INITIALIZE, 0, 0L);
 // ...
}


BEGIN_MESSAGE_MAP(CPromptDlg, CDialog)
 ...
 ON_MESSAGE(UWM_INITIALIZE, OnUserInitialize)
 ON_MESSAGE(UWM_UPDATE, OnUserUpdate)
 ...
END_MESSAGE_MAP()

LONG CPromptDlg::OnUserInitialize(WPARAM wParam, LPARAM lParam)
 {
 InitProcTable();
 SetHooks();
 EnableHooks();
 return 0;
 }
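The hook_entry table above drives all hook setup and teardown. A minimal portable sketch of the same table-driven idea, substituting plain function-pointer dispatch for ProcHook/ProcUnhook (real_add, hook_add, and the field names here are hypothetical):

```c
#include <stddef.h>

typedef int (*binop)(int, int);

static int real_add(int a, int b) { return a + b; }

/* The "hook" wraps the real function, just as Free_LoadLibrary wraps
   LoadLibrary: it can observe or adjust the call, then pass it on. */
static int hook_add(int a, int b) { return real_add(a, b) + 100; }

typedef struct {
    const char *name;  /* printable name */
    binop proc;        /* "real" proc address */
    binop handler;     /* hooked replacement */
    int set;           /* hook has been set */
    binop current;     /* what callers dispatch through */
} hook_entry;

static hook_entry h = { "add", real_add, hook_add, 0, real_add };

static void proc_hook(hook_entry *e)   { e->current = e->handler; e->set = 1; }
static void proc_unhook(hook_entry *e) { e->current = e->proc;    e->set = 0; }
```

The real code patches the target's entry bytes rather than a dispatch pointer, but the bookkeeping (proc, handler, set) is the same shape as the hook_entry table.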



Listing Four

void CPromptDlg::SetHooks()
 {
 p_LoadLibrary.active = p_LoadLibrary.enable = m_LoadLibrary;
 p_LoadLibrary.settable = TRUE;

 p_LoadModule.active = p_LoadModule.enable = m_LoadModule;
 p_LoadModule.settable = TRUE;

 p_WinExec.active = p_WinExec.enable = m_WinExec;
 p_WinExec.settable = TRUE;
 }


void CPromptDlg::EnableHooks()
 {
 int i;
 for(i=0; hook_table[i] != NULL; i++)
 { /* check it */
 if(!m_Disabled && !hook_table[i]->set)
 { /* set it */
 if(hook_table[i]->hook != NULL && hook_table[i]->settable)
 { /* settable */
 ProcHook(hook_table[i]->hook);
 hook_table[i]->set = TRUE;
 } /* settable */
 } /* set it */
 else
 if(m_Disabled && hook_table[i]->set)
 { /* release it */
 if(hook_table[i]->hook != NULL)
 { /* unsettable */
 ProcUnhook(hook_table[i]->hook);
 hook_table[i]->set = FALSE;
 } /* unsettable */
 } /* release it */
 } /* check it */
 }




Listing Five

HINSTANCE __export WINAPI Free_WinExec(LPCSTR filename, UINT cmdshow)
 {
 HINSTANCE inst;
 BOOL LoadModule_active = p_LoadModule.active;

 // Unhook the procedure

 ProcUnhook(p_WinExec.hook);
 p_WinExec.set = FALSE;

 /*
 Since WinExec eventually calls LoadModule, we want LoadModule to
 do the mapping. If LoadModule is not active, we activate it so it
 will do the remapping. Note that this is independent of the
 LoadModule check box, which says that *all* LoadModule calls
 will be mapped.
 */

 p_LoadModule.active = p_WinExec.active;

 // Now call the real, underlying version

 inst = (HINSTANCE)WinExec(filename, cmdshow);

 p_LoadModule.active = LoadModule_active;

 // Rehook WinExec so we get called again

 ProcHook(p_WinExec.hook);
 p_WinExec.set = TRUE;

 return inst; // return handle or failure code
 }



Listing Six

HINSTANCE __export WINAPI Free_LoadLibrary(LPCSTR filename)
 {
 HINSTANCE inst; // return instance

 // Unhook the procedure and mark it as unhooked

 ProcUnhook(p_LoadLibrary.hook);
 p_LoadLibrary.set = FALSE;

 /*
 Note that LoadLibrary calls LoadModule, which may in turn call
 LoadLibrary. If LoadLibrary is active we want name mapping, so
 we have to make sure LoadModule is hooked. It might be unhooked
 because it unhooked itself to call the *real* LoadModule which has
 now called LoadLibrary.
 */

 BOOL need_unhook_LoadModule = FALSE;
 BOOL prev_active = p_LoadModule.active;


 if(p_LoadLibrary.active)
 { /* we want LoadLibrary calls */
 if(!p_LoadModule.set)
 { /* enable LoadModule hook */
 ProcHook(p_LoadModule.hook);
 need_unhook_LoadModule = TRUE;
 // mark it as both hooked and active
 p_LoadModule.set = TRUE;
 p_LoadModule.active = TRUE;
 } /* enable LoadModule hook */
 } /* we want LoadLibrary calls */
 else
 { /* we don't want LoadLibrary calls */
 if(p_LoadModule.set)
 { /* disable LoadModule hook */
 p_LoadModule.active = FALSE;
 } /* disable LoadModule hook */
 } /* we don't want LoadLibrary calls */

 // Now call the real, underlying version

 inst = LoadLibrary(filename);

 // If we had hooked LoadModule to do the mapping, unhook it now
 if(need_unhook_LoadModule)
 { /* was set */
 ProcUnhook(p_LoadModule.hook);
 // mark it as unhooked and restore its active flag
 p_LoadModule.set = FALSE;
 } /* was set */

 // Restore the LoadModule.active flag to its incoming setting
 p_LoadModule.active = prev_active;

 // Reset the LoadLibrary hook 
 ProcHook(p_LoadLibrary.hook);
 p_LoadLibrary.set = TRUE;

 return inst; // return handle or failure code
 }



Listing Seven

HINSTANCE Call_LoadModule(LPCSTR filename, LPVOID parms)
 {
 HINSTANCE inst;

 // Unhook the procedure

 if(p_LoadModule.set)
 { /* unset it */
 ProcUnhook(p_LoadModule.hook);
 p_LoadModule.set = FALSE;
 } /* unset it */

 // Now hook in the Pong hook (we "pong" the hooks between the
 // Pong hook and the LoadModule hook)

 if(!p_Pong.set && p_Pong.hook != NULL)
 { /* set it */
 PongHook();
 p_Pong.set = TRUE;
 } /* set it */
 inst = LoadModule(filename, parms);

 return inst;
 }

#define MAX_LOADMODULE_DEPTH 25
static UINT Prevailing_ErrorMode[MAX_LOADMODULE_DEPTH];
static int LoadModule_Depth = 0;

HINSTANCE __export WINAPI Free_LoadModule(LPCSTR filename, LPVOID parms)
 {
 static int Failure_Depth = 0;
 static char filename_stack[MAX_LOADMODULE_DEPTH * _MAX_PATH];
 static char * filename_ptr = filename_stack;

 HINSTANCE inst;
 char * new_filename = filename_ptr;

 // We first set it up so that if the "direct" load fails, we don't
 // get an error message that would confuse and annoy the user.
 // If a box was supposed to pop up, and we can't redirect the
 // load, we will force it to pop up before we leave this procedure

 LoadModule_Depth++;

 UINT Old_ErrorMode = Call_SetErrorMode(SEM_NOOPENFILEERRORBOX);

 // If this is the first call to LoadModule, either from the user
 // or via WinExec or LoadLibrary, save the current error mode
 // so we can issue error messages correctly

 if(LoadModule_Depth < MAX_LOADMODULE_DEPTH)
 Prevailing_ErrorMode[LoadModule_Depth] = 
 Prevailing_ErrorMode[LoadModule_Depth - 1];

 // Prepare to maintain the error mode by SetErrorMode
 BOOL must_unhook_SetErrorMode = !p_SetErrorMode.set;
 if(!p_SetErrorMode.set && p_SetErrorMode.hook != NULL)
 { /* set it */
 ProcHook(p_SetErrorMode.hook);
 p_SetErrorMode.set = TRUE;
 } /* set it */

 /*
 Here's a tricky bit. 

 LoadModule can cause LoadLibrary to be called, which in turn calls
 LoadModule. But loading a DLL could cause loading of another DLL
 and we want to keep the chain going all the way. 
 */

 // If the LoadLibrary mapping option is set, we make sure that

 // LoadLibrary is hooked, and also that it is active
 // This prepares us for half of the recursion

 BOOL must_unhook_LoadLibrary = FALSE;

 if(p_LoadLibrary.enable && !p_LoadLibrary.set)
 { /* set it */
 ProcHook(p_LoadLibrary.hook);
 p_LoadLibrary.set = TRUE;
 must_unhook_LoadLibrary = TRUE;
 } /* set it */

 // Now call the *real* LoadModule, which may yet call LoadLibrary
 // or it may call LoadModule recursively.

 Failure_Depth = 0; // do we fail immediately or at lower level?
 inst = Call_LoadModule(filename, parms);

 if(inst < HINSTANCE_ERROR)
 { /* failed, do our special retry */
 // If we are not "active", we just return the failure code
 if(!p_LoadModule.active)
 { /* not active */
 goto bad_exit;
 } /* not active */

 // If we failed at a lower level, don't retry the operation
 if(Failure_Depth > 0)
 { /* already issued message */
 goto bad_exit;
 } /* already issued message */

 if(!RelativePath(filename))
 { /* not relative path */
 // It wasn't a relative path, we can't do anything
 // so just undo what is done and return to the caller with
 // the error code
 goto bad_exit;
 } /* not relative path */
 
 // Try to find a mapping for the file

 if(!GetMappedFile(filename, new_filename, _MAX_PATH))
 { /* no mapping found */
 // We don't know why it failed, but we can't recover
 // Log the "NoMap" entry for this one
 RecordFailure(filename, KEY_NOMAP, NULL, -1);
 goto bad_exit;
 } /* no mapping found */

 // We have now consumed some of our "filename stack". Update
 // the filename stack pointer to point just beyond this:

 filename_ptr += lstrlen(new_filename) + 1;
 
 // We found a mapping for the top-level file. It almost certainly
 // means that it was not on the path, so let's try to load it from
 // the mapped name


 // First, we have to unhook LoadModule which has been rehooked
 // by the 'pong' hook, then rehook the ponghook:

 
 HINSTANCE new_inst = Call_LoadModule(new_filename, parms);

 if(new_inst < HINSTANCE_ERROR)
 { /* failed secondary load */
 // This may have been due to a bad mapping, or a missing DLL
 // at a lower level. Assume it is a bad mapping (most likely)
 C_var_BadMap++;
 NotifyUpdate();
 RecordFailure(filename, KEY_BADMAP, new_filename, (int)new_inst);
 goto bad_exit;
 } /* failed secondary load */

 // It loaded! Congratulate ourselves and return the instance handle
 inst = new_inst;

 // Indicate that our remapped load succeeded
 C_var_Remaps++;
 NotifyUpdate();
 } /* failed, do our special retry */
 else
 { /* direct load succeeded */
 // Well! It worked the first time with the name we were given,
 // either because it was already in the PATH or it was an absolute
 // name. Makes no matter, record the success.
 C_var_Direct++;
 NotifyUpdate();
 } /* direct load succeeded */
exit:
 // Restore the filename stack pointer to its value when we came in
 filename_ptr = new_filename;

 if(p_Pong.set)
 { /* unhook Pong */
 PongUnhook();
 p_Pong.set = FALSE;
 } /* unhook Pong */

 // If we had hooked the LoadLibrary call, unhook it so it is back in
 // its original state.

 if(must_unhook_LoadLibrary && p_LoadLibrary.set)
 { /* unhook it */
 ProcUnhook(p_LoadLibrary.hook);
 p_LoadLibrary.set = FALSE;
 } /* unhook it */

 // Unhook SetErrorMode if we hooked it at this level
 if(must_unhook_SetErrorMode && p_SetErrorMode.set)
 { /* unhook it */
 ProcUnhook(p_SetErrorMode.hook);
 p_SetErrorMode.set = FALSE;
 } /* unhook it */

 // Hook LoadModule back into the chain
 if(!p_LoadModule.set)
 { /* re-set it */
 ProcHook(p_LoadModule.hook);
 p_LoadModule.set = TRUE;
 } /* re-set it */

 // Reset the error mode (if we haven't already)
 Call_SetErrorMode(Old_ErrorMode);

 // Decrement the depth count
 LoadModule_Depth--;

 // return handle or failure code
 return inst;

bad_exit:
 // This may look strange, but what we want to do is force the
 // dialog box to come up if it would have on a straight call

 
 // First, we set the depth based on the current LoadModule depth,
 // including a depth limit
 int depth = (LoadModule_Depth < MAX_LOADMODULE_DEPTH 
 ? LoadModule_Depth
 : MAX_LOADMODULE_DEPTH - 1);
 
 Call_SetErrorMode(Prevailing_ErrorMode[depth]);
 if(Prevailing_ErrorMode[depth] != SEM_NOOPENFILEERRORBOX)
 { /* force error message */
 HINSTANCE err;
 err = Call_LoadModule(filename, parms);
 } /* force error message */

 // Now record the failure depth so we can tell where we failed
 Failure_Depth = LoadModule_Depth;
 goto exit;
 }



Listing Eight

BOOL RelativePath(LPCSTR filename)
 {
 static char nearfile[_MAX_PATH];
 static char drive[_MAX_DRIVE];
 static char path[_MAX_DIR];
 static char file[_MAX_FNAME];
 static char ext[_MAX_EXT];

 if(lstrlen(filename) > _MAX_PATH)
 return FALSE;
 lstrcpy(nearfile, filename);
 _splitpath(nearfile, drive, path, file, ext);
 return !(strlen(drive) > 0 || strlen(path) > 0);
 }
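RelativePath above leans on the Win16 _splitpath helper. A portable sketch of the same test, assuming DOS-style names (a drive letter or any directory component means the name will not be found by a PATH search and so is not "relative" in this sense; relative_path is a hypothetical name):

```c
#include <stddef.h>
#include <string.h>

/* A name is "relative" if it carries no drive letter and no directory
   part, so the loader's PATH search applies to it. */
static int relative_path(const char *filename)
{
    if (filename == NULL || filename[0] == '\0')
        return 0;
    if (strlen(filename) > 1 && filename[1] == ':')
        return 0;                       /* has a drive letter, e.g. C: */
    if (strchr(filename, '\\') != NULL || strchr(filename, '/') != NULL)
        return 0;                       /* has a directory component */
    return 1;                           /* bare name: PATH applies */
}
```

This mirrors the listing's logic: _splitpath's drive component is nonempty exactly when a drive letter is present, and its path component is nonempty exactly when a separator appears.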



Listing Nine


IP instr disassembly
244: 45 inc bp 
245: 55 push bp 
246: 8bec mov bp,sp 
248: 1e push ds 
249: 1f pop ds <= Pong hook placed here
24a: 687502 push 0275
24d: 8b460a mov ax, [word ptr bp+0a]
250: 8b4e0c mov cx, [word ptr bp+0c]
253: e82f01 call 385



Listing Ten

=============================================================================
Prolog code
=============================================================================
; void __export WINAPI Free_Pong()
; {
 ?Free_Pong@@ZCXXZ:
 *** 000c0c 8c d8 mov ax,ds
 *** 000c0e 90 xchg ax,ax
 *** 000c0f 45 inc bp
 *** 000c10 55 push bp
 *** 000c11 8b ec mov bp,sp
 *** 000c13 1e push ds
 *** 000c14 8e d8 mov ds,ax
 *** 000c16 b8 00 00 mov ax,OFFSET L21402
 *** 000c19 9a 00 00 00 00 call FAR PTR __aFchkstk
 *** 000c1e 56 push si
 *** 000c1f 57 push di
=============================================================================
Epilog code
=============================================================================

; UINT off = OFFSETOF(p_Pong.proc);
 *** 000cc5 a1 02 00 mov ax,WORD PTR ?p_Pong@@3Uhook_entry@@A+2
 *** 000cc8 8b 16 04 00 mov dx,WORD PTR ?p_Pong@@3Uhook_entry@@A+4
 *** 000ccc 89 46 fa mov WORD PTR -6[bp],ax
; __asm { // In some places in the world, doing this would be a capital
; // offense. Void where prohibited by law.
; mov ax, off
 *** 000ccf 8b 46 fa mov ax,WORD PTR -6[bp]
; mov [word ptr BP+2],ax
 *** 000cd2 89 46 02 mov WORD PTR 2[bp],ax
; }
; }
 *** 000cd5 e9 00 00 jmp L20315
 L20315:
 *** 000cd8 5f pop di
 *** 000cd9 5e pop si
 *** 000cda 8d 66 fe lea sp,WORD PTR -2[bp]
 *** 000cdd 1f pop ds
 *** 000cde 5d pop bp
 *** 000cdf 4d dec bp

 *** 000ce0 cb ret OFFSET 0




Listing Eleven

static WORD pong_datasel; // DS: based selector for pong hook
static WORD pong_codesel; // CS: based selector for pong hook

void WritePong(unsigned char val)
 {
 unsigned char FAR * p;
 p = (unsigned char FAR *)MAKELP(pong_datasel, OFFSETOF(p_Pong.proc));
 *p = val; // change from JMP to CALL or back
 }

void PongHook()
 {
 ProcHook(p_Pong.hook);
 WritePong(0x9A); // JMP => CALL
 }

void PongUnhook()
 {
 WritePong(0xEA); // => JMP
 ProcUnhook(p_Pong.hook);

 }
void __export WINAPI Free_Pong()
 {
 PongUnhook();
 p_Pong.set = FALSE;

 if(!p_LoadModule.set)
 { /* rehook LoadModule */
 ProcHook(p_LoadModule.hook);
 p_LoadModule.set = TRUE;
 } /* rehook LoadModule */

 // Now reset the return address pointer to point to the place we
 // had set the PongHook
 UINT off = OFFSETOF(p_Pong.proc);
 __asm { // In some places in the world, doing this would be a capital
 // offense. Void where prohibited by law.
 mov ax, off
 mov [word ptr BP+2],ax
 }
 }
// The following lines are added to CPromptDlg::InitProcTable:

p_Pong.proc = (FARPROC) ((char _huge *)p_LoadModule.proc + pong_offset);

pong_codesel = SELECTOROF((p_Pong.proc));
pong_datasel = AllocCStoDSAlias(pong_codesel);

// The following line is added before the program terminates

FreeSelector(pong_datasel);




Listing Twelve

UINT Call_SetErrorMode(UINT mode)
 {
 UINT oldmode;
 BOOL set = p_SetErrorMode.set;

 if(p_SetErrorMode.set && p_SetErrorMode.hook != NULL)
 { /* unset it */
 ProcUnhook(p_SetErrorMode.hook);
 p_SetErrorMode.set = FALSE;
 } /* unset it */
 oldmode = SetErrorMode(mode);
 if(set && p_SetErrorMode.hook != NULL)
 { /* reset it */
 ProcHook(p_SetErrorMode.hook);
 p_SetErrorMode.set = TRUE;
 } /* reset it */
 return oldmode;
 
 }

UINT __export WINAPI Free_SetErrorMode(UINT mode)
 {
 // Set the prevailing mode to be the one we want
 int depth = (LoadModule_Depth < MAX_LOADMODULE_DEPTH 
 ? LoadModule_Depth
 : MAX_LOADMODULE_DEPTH - 1);
 Prevailing_ErrorMode[depth] = mode;

 int oldmode;
 oldmode = Call_SetErrorMode(mode);

 return oldmode;
 }



Listing Thirteen

static HWND C_var_PostWnd = NULL;
static BOOL C_var_Posted = FALSE;

void NotifyUpdate()
 {
 if(C_var_PostWnd != NULL && !C_var_Posted)
 { /* post it */
 PostMessage(C_var_PostWnd, UWM_UPDATE, 0, 0L); 
 C_var_Posted = TRUE;
 } /* post it */
 }
LONG CPromptDlg::OnUserUpdate(WPARAM wParam, LPARAM lParam)
 {
 C_var_Posted = FALSE; // allow more posts to come thru
 if(IsIconic())
 return 0; // don't update iconic window


 char val[20];

 wsprintf(val, "%ld", C_var_NoMap);
 c_NoMap.SetWindowText(val);

 // ... other values formatted and displayed here

 return 0;
 }
void CPromptDlg::OnSize(UINT nType, int cx, int cy)
{
 // If we have a minimized window, we don't want to spend any time 
 // updating its performance display, so NULL out the PostWnd used by
 // the callbacks

 if(nType == SIZE_MINIMIZED)
 C_var_PostWnd = NULL;
 else
 C_var_PostWnd = m_hWnd;
 
 if(nType == SIZE_MINIMIZED && m_Hide)
 { /* hide icon */
 ShowWindow(SW_HIDE);
 return;
 } /* hide icon */

 // Since it was not a minimization request, suggest that we should update
 // the performance display
 PostMessage(UWM_UPDATE, 0, 0L);

 // Now do usual WM_SIZE processing...
 CDialog::OnSize(nType, cx, cy);
 
}



























Special Issue, 1994
Exception Handlers and Windows Applications


Here's an invaluable debugging tool




Joseph Hlavaty


Joe is a systems programmer at a major hardware vendor. He is a graduate of
Georgetown University and currently lives and works in South Florida. He can
be contacted at 72370,1265.


Every Windows user has at one time or another faced the dreaded UAE
(Unrecoverable Application Error) or Application Execution Error. This is
certainly frustrating for the user and can result in lost time and lost work,
because the faulting application is removed from Windows. It can be
troublesome for the application developer, too, because the user may not be
able to effectively communicate the problem to the developer. Scenarios
leading up to such errors can often be difficult, if not impossible, to
reproduce.
While there are tools (such as Dr. Watson) that extract information about an
exception, there are no tools for debugging a currently trapping application.
Nor is there a general reference for writing exception handlers under Windows.
Other than a few ToolHelp functions, there is little available to help you
debug traps in Windows applications. Consequently, I've written TrapMan, the
Windows Trap Manager--a debugging tool for analyzing exceptions in Windows
applications. 
TrapMan runs in any currently available protected-mode version of Windows.
I've tested TrapMan extensively in Windows 3.0 and 3.1 (Standard and Enhanced
modes), Win-OS/2 2.0 (Standard-mode Windows 3.0-compatible support under OS/2
2.0), and Win-OS/2 2.1 (Standard- and Enhanced-mode Windows 3.1-compatible
support found in OS/2 2.1). TrapMan will not run in real mode--it is a
protected-mode-only application. Additionally, if you wish to debug a
currently faulting application caught in one of TrapMan's handlers, you must
be running a debugger capable of processing unowned INT 3hs in code. I prefer
Nu-Mega's Soft-Ice for Windows for debugging DOS-based versions of Windows and
the OS/2 kernel debugger for debugging OS/2-based versions (I use both almost
on a daily basis). Both debuggers can handle an INT 3h instruction in code
that they did not place there. 
Before discussing exception handlers, I'll review a number of concepts central
to their understanding, namely, the System VM in Windows, DPMI (DOS Protected
Mode Interface), protected-mode selectors, and interrupts and exceptions in
286 or greater (286, 386, i486, and Pentium) processors. The complete source
code and binaries for TrapMan (including DeadMan, a sample application
program) are provided electronically; see "Availability," page 3.


The System VM


System VM technically refers to the first VDM (or virtual-DOS machine) started
in Windows 3.0 or 3.1 in Enhanced mode. A VM (or virtual machine) is an
emulated (virtual) 8086 processor available as a special mode on 386 (and
higher) processors for use in Windows and Windows applications. Here, I'll use
the term "System VM" somewhat loosely, referring to some address space where
all Windows applications reside, which may or may not be equivalent to a
virtual machine. (For example, Windows 3.0 Standard mode does not use
virtual-8086 mode.) Why is this System VM so important? First, separate page
tables are kept for each VM. If all Windows applications were not run in the
same VDM, then they would be inaccessible to each other. Under OS/2 2.x, you
can run multiple copies of Windows. Each copy is run in a separate VM and
cannot access or interfere with any other.
There is another feature of System VM that makes programs such as TrapMan
possible. Protected-mode Intel processors such as the 386 can access memory
through one of two tables. These tables are the GDT (Global Descriptor Table)
and the LDT (Local Descriptor Table), and they are used by operating systems
to limit memory access among applications. Simply stated, a protected-mode
program has access to memory addressable by the GDT and by one LDT. No other
memory is accessible.
In OS/2, separate OS/2 programs are assigned separate LDTs, so one program
cannot accidentally or intentionally modify the memory of another. The
information in an individual program's LDT simply does not include information
about the memory of any other process unless it is explicitly shared by the
owning process.
As a second option, Intel specifications permit tasks to share a single LDT,
and this is how Windows operates under both OS/2 and DOS. Because of this,
TrapMan (and any other Windows program) can access memory in any Windows
program or within the Windows kernel itself. This forgoes protection from
interapplication corruption, but gives Windows applications great flexibility
in how they interact with each other and with Windows itself.


DPMI


The DOS Protected Mode Interface (DPMI) is a set of entry points that permit
protected-mode applications ("DPMI clients") to perform various tasks
involving selector allocation, interrupt hooking, and the like, without
affecting the integrity of the overall operating system.
Such tasks are generally considered privileged in multiprogram systems such as
Windows, and applications are not permitted to implement these tasks
themselves. However, they can request that a more privileged program (in this
case the "DPMI host") make these changes for them.
I'll use a small subset of the available DPMI calls: GetVersion(),
GetProcessorExceptionHandler(), SetProcessorExceptionHandler(),
AllocateLDTDescriptors(), FreeLDTDescriptor(), GetSegmentBaseAddress(),
SetSegmentBaseAddress(), and SetSelectorLimit(). These are only a few of the
50 or so DPMI calls available under the 0.90 specification. Microsoft
officially supports only a few DPMI functions (see the Microsoft SDK 3.1
Programmer's Reference, Chapter 20) and the Microsoft Windows Guide to
Programming warns "Do not use DPMI services for hooking interrupts or faults."
However, all DPMI functions used in TrapMan are available in all DPMI
implementations tested. Those more familiar with DPMI may notice that I do not
(and should not) call the DPMI functions for placing the processor in
protected mode. Windows does this before TrapMan (or any other application) is
loaded. 


Protected-Mode Selectors


In real mode, a segment register's value can be directly mapped to a physical
address. A CS:IP of 197:0 refers directly to address 1970 (197h << 4). In
protected mode, however, there is a level of indirection. The same CS:IP
cannot be directly mapped to a physical address. A segment register such as CS
is loaded with a segment-selector value in protected mode, and this value must
be translated to get a physical address. (Paging on 386-family processors
differs. With paging enabled, the value calculated is actually a linear
address that must be converted to a physical address. For the purposes of this
article, linear address=physical address.)
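The real-mode calculation above is just a shift and an add; as a one-line sketch:

```c
/* Real-mode address formation: physical = (segment << 4) + offset.
   A CS:IP of 197:0 therefore addresses 1970h. */
static unsigned long real_mode_addr(unsigned seg, unsigned off)
{
    return ((unsigned long)seg << 4) + off;
}
```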
A selector is read as 197 hex = 0000 0001 1001 0111 binary. The upper 13 bits,
0000 0001 1001 0, form the selector index (all numbers are in hex unless
otherwise noted); multiplied by 8, the index gives the descriptor's byte
offset, 190, within the table. Bit 2 is the table indicator (a 1 value
indicates that this is an LDT selector), and bits 1 and 0 give the requested
privilege level (binary 11 is ring 3, the least privileged). Inside the LDT
entry at offset 190 is a base address (for example, 0040 0000). It is this
base address that must be added to the offset
to get the resulting physical address. For debugging traps in protected mode,
a basic understanding of selectors is essential. However, much of the
difficult work of selector translation is done automatically by the processor.
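The selector's fields can be pulled apart with shifts and masks. A small sketch, applied to the article's example selector 197h (decode_selector and the field names are hypothetical):

```c
/* Decode a protected-mode segment selector into its three fields. */
typedef struct {
    unsigned index;   /* descriptor index (bits 15..3)                  */
    unsigned ti;      /* table indicator: 0 = GDT, 1 = LDT (bit 2)      */
    unsigned rpl;     /* requested privilege level (bits 1..0)          */
} selector_fields;

static selector_fields decode_selector(unsigned sel)
{
    selector_fields f;
    f.index = sel >> 3;        /* byte offset into the table is index * 8 */
    f.ti    = (sel >> 2) & 1;
    f.rpl   = sel & 3;
    return f;
}
```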


Protected-Mode Interrupts and Exceptions


Interrupts in protected mode are similar to their real-mode counterparts.
However, instead of the interrupt-vector table (IVT) in low memory, interrupts
are processed based on a protected-mode interrupt-descriptor table (IDT). For
this reason, protected-mode interrupts cannot be watched by changing the
interrupt vector table; instead, DOS or DPMI calls must be used to get or set
a protected-mode handler.
There are three classes of exceptions: FAULTS, TRAPS, and ABORTS. Basically, a
FAULT happens before any changes are made in the system. An example is the
general-protection fault, or "GPFault," as it is known to Windows users. Such
a fault might be the pseudocode fragment in Example 1 (which is not valid
Intel assembler), which attempts to get the address of the current INT 3h
handler in the real-mode interrupt-vector table (4 bytes, starting at physical
address 0Ch) by placing 0 in a segment register. This procedure is perfectly
valid in real mode; however, in protected mode a selector of 0 is not valid,
and the actual use of this invalid selector in the third instruction will
cause a fault before the third instruction is executed. CS:IP will still point
to the faulting instruction. Faults are restartable. It is the restartability
of faults that permits TrapMan (and Windows) to terminate faulting
applications.
A TRAP, on the other hand, is caught after any changes are made to the system.
An example of a trap is the special 1-byte INT 3h (0CCh) instruction used by
TrapMan to break to a debugger. The INT 3h handler is called after the INT 3h
is executed by the processor; CS:IP now points to the instruction after the
INT 3h.
Another type of exception is an ABORT. Since these are generally caused by
things such as hardware errors, I'll not discuss them here.
TrapMan was designed for debugging FAULTS. Some common faults found on
80286/80386/80486 Intel processors are listed in Table 1. 


TRAPMAN.C



TrapMan is written almost entirely in C (with the exception of some inline
assembler) using Microsoft C 6.0a. For reasons I'll discuss later, C is not
really the best choice for writing exception handlers. Because it is
comfortable for most programmers, however, I used C here to show the basic
ideas behind exception handlers and debugging with them.
The file trapman.c (Listing One) is fairly straightforward code. The first
thing unusual about TrapMan's source is in the global data area. To set and
reset the handlers for various exceptions as the user changes exception
handlers, you must always query the current handler for a particular exception
(such as trap D, which would be stored in the _Prev13 code pointer). For this
version of TrapMan, you can install up to four handlers independently (Traps
0, 6, 12, and 13, also known as DivideByZero, InvalidOpCode, StackFault, and
GPFault, respectively). Additionally, you keep flags marking whether or not
the Previous handler code is current (that is, whether or not we are currently
watching a particular exception). See Figure 1. 
Execution of TrapMan begins in the same manner as all Windows applications in
WinMain(). To avoid parsing TrapMan arguments, TrapMan accesses the C run-time
globals __argc and __argv (see Listing Two, page 71). If any arguments are
passed in on TrapMan's command line (argc >1), then argv[1] is taken as a
program name to be debugged and passed to WinExec(). This name should be a
fully qualified path to the application, if it is not on the path or in one of
the directories searched by WinExec().
Additionally, TrapMan will not permit multiple instances of itself to be run.
While this is not important in the C version, the versions given in future
modules have handlers that contain extensive amounts of self-modifying code.
You can't have multiple instances of an application modifying the same
code--it just doesn't work very well! For the same reason, all TrapMan code is
preload nondiscardable (although only the handlers and certain support
functions actually need to be nondiscardable). 
The instance initialization processing (InitInstance) does some important
bookkeeping work for TrapMan. First, it gets handles to the main window's Trap
and Option menus so that we can modify them later as the user selects and
deselects various menu items. It also sets up the default TrapMan
configuration. While this configuration is not used with the C handlers
described in this article, this section is left in both for completeness and
to give an idea of what the user interface of future versions will look like. 
All possible exit points to the application must be covered in the window
procedure for the main window (MainWndProc) because in all cases you must
reset any set trap handlers to their previous values before exiting the
application. 
For our purposes, I'm assuming that all previous handlers belong to Windows or
DPMI, so I won't check them for validity. If private exception handlers become
more common, you'll need to verify that the current handlers are your own
(that is, they have not been replaced by another app) before resetting them.
You'll also need to verify that the code is valid. (To do this, you'll need to
keep track of the signature for the previous handler to make sure that it has
not been replaced by another application's code.)
The PutInEditControl() routine creates a 10K buffer that it uses to manipulate
the contents of the edit control. The routine is sufficient for explanatory
purposes, although handlers in your applications may not use an edit control.
I felt it essential that users be able to update the TrapMan window's text as
required (with a description of events leading up to the trap, for example),
either before or after the trap occurred. An edit control seemed the most
straightforward manner to do this, but flicker is a problem. No matter where
the text cursor is in the buffer, new text from TrapMan is always placed at
the bottom of the buffer. This is done so that users can manipulate the buffer
while maintaining synchronous messaging from TrapMan or the operating system.
The SaveBuffer() routine simply writes the buffer to the given filename, and
it will warn the user before replacing an existing file on disk. The
HandleTraps() and HandleOptions() routines process WM_COMMAND messages for the
Trap and Options pulldowns, respectively.
TrapMan uses a number of DPMI calls. DPMI uses registers to pass arguments,
and all the functions we'll need will use the INT 31h interface. Note that
DPMI functions consistently return with the carry flag set to mark failure (as
does DOS). All of these functions assume that AX is to be used to return
16-bit values, and DX:AX to return 32-bit values. This is consistent with the
Windows API and with most DOS C compilers, and is done so that DPMI calls can
be conveniently made from C code. This module contains wrappers for the
following functions:
DPMIGetVersion returns the version number of the DPMI host. Windows 3.0/3.1
and OS/2 2.0 return 005a (90 decimal); OS/2 2.1 returns 005f (95 decimal), due
to some additional features.
DPMIGetProcessorExceptionHandlr returns the current exception handler for the
exception number passed in as its argument, or 0 for failure.
DPMISetProcessorExceptionHandlr returns 0 if okay. It takes two arguments: the
exception number and a far pointer to the routine that will handle it.
DPMIAllocateLDTDescriptors returns a base selector or 0=error. It takes one
argument: the number of descriptors to allocate (usually 1).
DPMIFreeLDTDescriptor returns the selector freed if successful, otherwise
returns 0. It takes one argument: a selector allocated via
DPMIAllocateLDTDescriptors.
DPMIGetSegmentBaseAddress returns the base address of the given selector.
DPMISetSegmentBaseAddress returns AX=0 if failure. It takes three arguments:
the selector whose base is to be modified, the hiword of the new base, and the
loword of the new base.
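As an illustration of this calling convention, the simplest wrapper might look like the following sketch (16-bit Microsoft C inline assembly; TrapMan's actual DPMI.c may differ in detail):

```c
WORD DPMIGetVersion(void)
{
    _asm mov ax, 0400h   ; INT 31h function 0400h: get DPMI version
    _asm int 31h
    /* AH=major, AL=minor; AX is also the C return register,
       so the version simply falls out as the return value */
}
```

A wrapper that takes arguments (DPMISetProcessorExceptionHandlr, say) would additionally load registers before the INT 31h -- BL with the exception number and CX:DX with the handler's selector:offset -- and test the carry flag on return.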
The HANDLER.C file contains the handlers for exceptions that TrapMan watches.
Note: 
Exception handlers are far procedures (they must far return--RETF--back to
DPMI). 
Exception handlers take no parameters (although DPMI does push a 16-bit
exception frame onto the stack before calling).
Exception handlers must be preloaded and nondiscardable.
When called, an exception handler has only cs:ip valid and pointing to the
handler's code. All other registers belong either to the faulting application
or to DPMI. The handler must not clobber them, so many exception handlers push
the client registers first. TrapMan's assembly-language handlers do the same.
At the INT 3h in our GPFault handler, for example, the DPMI fault-exception
frame begins at SS:SP+8. One GPFault had an exception frame that looked
something like Figure 2. At ErrorFrame+0, you find the return address to DPMI
(3b:0204); at ErrorFrame+4, the error code (a 0 selector); and at
ErrorFrame+6, the address of the faulting instruction (1c7f:45d).
Upon disassembling the faulting instruction address, you find a STOSB. Since
we know that a stosb instruction takes the value in AL and puts it in ES:DI,
we know that a bad ES value can cause the trap. We see that ES is 0--this is
the cause of the trap. Finally, at ErrorFrame+0Ah you find the flags, and at
ErrorFrame+0Ch, a far pointer to the application's stack (1c77:15e8 in this
case).


Using TrapMan to Debug a Trap or Fault 


Both Windows and Win-OS/2 exception handlers are 16-bit handlers. On entry to
a handler, all registers of the faulting application are preserved except CS,
IP, SS, and SP. These registers are available in the exception stack frame.
The stack frame begins at SS:SP and is laid out as in Table 2.
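For reference, the same layout can be written as a C struct overlay (a sketch; the DPMI_FAULT_FRAME name and field names are mine, lowest offset first):

```c
#include <stddef.h>

/* The 16-bit exception stack frame of Table 2, beginning at SS:SP. */
typedef struct {
    unsigned short ReturnIP;  /* 00h: return address back to DPMI */
    unsigned short ReturnCS;  /* 02h */
    unsigned short ErrorCode; /* 04h */
    unsigned short FaultIP;   /* 06h: faulting instruction */
    unsigned short FaultCS;   /* 08h */
    unsigned short Flags;     /* 0Ah */
    unsigned short FaultSP;   /* 0Ch: app stack at fault time */
    unsigned short FaultSS;   /* 0Eh */
} DPMI_FAULT_FRAME;
```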
The far pointer at SS:SP+0Ch is the application's stack pointer at fault time.
You can use it to trace back a C calling stack or to examine the parameters of
the faulting procedure (if a suitable stack frame has been set up) by dumping
data at the faulting app's SS, paired with either the current BP or the
faulting app's SP. The far pointer at SS:SP+06h is the address of the
instruction whose attempted execution generated the fault.
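The traceback itself is just a walk of the saved-BP chain. The sketch below simulates the faulting app's stack segment as a word array; WalkBPChain and its frame-layout assumptions (caller's BP at [BP], near return IP at [BP+2], a BP of 0 ending the chain) are mine, and real code would also bounds-check BP against the stack limits:

```c
/* Walk a chain of saved BPs in a simulated 64K stack segment
   (an array of 16-bit words indexed by offset/2). Returns the
   number of return addresses collected into retaddrs[]. */
int WalkBPChain(const unsigned short *stackseg, unsigned short bp,
                unsigned short *retaddrs, int max)
{
    int n = 0;
    while (bp != 0 && n < max) {
        retaddrs[n++] = stackseg[bp / 2 + 1]; /* near return IP at [BP+2] */
        bp = stackseg[bp / 2];                /* caller's BP at [BP]      */
    }
    return n;
}
```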
The error code is very similar to a selector in protected mode. The high 13
bits are the selector index (bits 3--15), and bit 2 is the table index.
However, instead of an RPL (requested privilege level), bits 0 and 1 have the
following meaning: Bit 0 (EXT) is set if the fault was called by an event
external to the program; bit 1 (IDT) is set if the selector index refers to a
gate descriptor in the IDT.
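That decoding is plain bit arithmetic; as a sketch (the macro names are mine):

```c
/* Decode a protected-mode error code (bit layout described above). */
#define ERR_EXT(e)    ((e) & 0x0001)          /* caused by external event */
#define ERR_IDT(e)    ((e) & 0x0002)          /* index refers to the IDT  */
#define ERR_TI(e)     ((e) & 0x0004)          /* table: 0 = GDT, 1 = LDT  */
#define ERR_INDEX(e)  (((e) >> 3) & 0x1FFF)   /* 13-bit selector index    */
```

For the all-zero error code in Figure 2, every field decodes to 0.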


Closing Notes


The DPMI offsets described here are valid only at entry to the exception
handler, as further modification of the stack will change SP. TrapMan's
exception handlers are written so that the DPMI stack and registers are valid
on the fault INT 3h (if Break On Fault option is checked).
You must be running a protected-mode debugger if you wish to stop on INT 3h
during faults. Otherwise, a debugger is not necessary to use TrapMan.
Currently, TrapMan does not add handlers for INT 3h, nor are there plans to do
so.
Certain portions of the Windows and Win-OS/2 debug kernels do not use
OutputDebugString() to write to the debugger. Their output will not be trapped
even if OutputDebugString() is hooked. The functionality to add this hook
(K328 or _DebugOutput, see Undocumented Windows, p.205) will be included in a
future version of TrapMan. Executing random sections of memory frequently
causes Trap D or Trap 7 faults.
Example 1: Pseudocode fragment that leads to a GPFault.
mov ds, 0
mov bx, 0Ch
mov OffInt3, word ptr ds:bx ; GPFault in protected mode!
 ; Trying to access selector of 0!
mov SegInt3, word ptr ds:bx+2
Table 1: Some common faults found on 80286, 80386, and 80486 Intel
microprocessors.
 Fault                              Description
 Interrupt 0:  Divide-by-zero       Trap 0
 Interrupt 6:  Invalid opcode       Trap 6
 Interrupt 8:  Double fault         Trap 8
 Interrupt 12: Stack                Trap C or Trap 12
 Interrupt 13: General protection   Trap D or Trap 13
Figure 1: TrapMan in action. Only the default handlers (Traps 6, 12, and 13)
are set.
This instance's DS is 12AF
The DPMI Version is 5A
Trapping apps will be terminated
Enabling breaking to debugger on traps
Enabling beeping on traps

Prev Trap 13 handler is 1175FA2
Trap 13 (general protection fault) handler installed
Prev Trap 12 handler is 1175F93
Trap 12 (stack fault) handler installed
Prev Trap 6 handler is 1175FB1
Trap 6 (invalid opcode) handler installed
Trap 13!
E:\DEADMAN\DEADMAN.EXE
DPMI app regs:
 CS:IP = 1307:056C
 SS:SP = 12FF:15AE
 with selector 7470 (flags=0212)
App regs:
 AX=15AC BX=15EC CX=0000 DX=1800
 IP=**** SP=**** BP=15AE SI=15EC DI=1800
 CS=**** SS=**** DS=12FF ES=12FF
Dump of 12FF:15AE:
00___02___04___06__+__08___0A___0C___0E====0123456789ABCDEF
7473 6E69 2167 2020 4144 474E 5245 2121 sting! DANGER!!
2021 5453 4341 204B 564F 5245 5257 5449 ! STACK OVERWRIT
Trap 6!
E:\DEADMAN\DEADMAN.EXE
DPMI app regs:
 CS:IP = 130F:04F4
 SS:SP = 131F:1596
 with selector 6E58 (flags=0287)
App regs:
 AX=159A BX=04F2 CX=131F DX=130F
 IP=**** SP=**** BP=159C SI=15D8 DI=1800
 CS=**** SS=**** DS=131F ES=131F
Dump of 131F:1596:
00___02___04___06__+__08___0A___0C___0E====0123456789ABCDEF
15D8 1800 131F 15AF 0278 130F 15D8 1800 ########x#######
0001 0201 130F 131F 15C9 27BB 04A7 0000 ###########'####
Figure 2: Typical GPFault exception frame.
87:fd0  0204 003b 0000 045d 1c7f 0287 15e8 1c77
Offset     0    2    4    6    8   Ah   Ch   Eh
Table 2: Exception stack frame.
 SS of faulting app   0Eh
 SP of faulting app   0Ch
 Flags                0Ah
 CS of faulting app   08h
 IP of faulting app   06h
 Error Code           04h
 Return CS            02h
 Return IP            00h

Listing One
/****************************************************************************
 FILE: trapman.c -- a trap manager for Windows 3.x and Win-OS/2
 Copyright (c) 1994, Joseph Hlavaty This product is CAREWARE. Please
 refer to the licensing agreement LICENSE.AGR on the diskette.
 PURPOSE: source file for trapman.exe, contains WinMain, window procedure
 and menu processing routines
****************************************************************************/
#define NOSOUND
#define NOCOMM
#define NODRIVERS
#define NOMINMAX
#define NOLOGERROR

#define NOPROFILER
#define NOLFILEIO
#define NOOPENFILE
#define NOATOM
#define NOLANGUAGE
#define NODBCS
#define NOKEYBOARDINFO
#define NOGDICAPMASKS
#define NOCOLOR
#define NODRAWTEXT
#define NOTEXTMETRIC
#define NOSCALABLEFONT
#define NOMETAFILE
#define NOSYSMETRICS
#define NOSYSTEMPARAMSINFO
#include "windows.h"
#include "trapman.h"
#include "DPMI.h"
#include <string.h>
#include <stdio.h>
#define SYSCOMMAND_MASK 0xFFF0
// this is our global data area
HANDLE hwndTrapMenu ; // for checking and unchecking our trap options
void (_far *Prev13) () = NULL;
void (_far *Prev12) () = NULL;
void (_far *Prev6) () = NULL;
void (_far *Prev0) () = NULL;
int WeGot13 = 0 ;
int WeGot12 = 0 ;
int WeGot6 = 0 ;
int WeGot0 = 0 ;
WORD wDPMIVersion = 0xFFFF ;
char *TrapMsgs[15] = {
 "Trap 0 (divide by zero fault) handler installed", // 0
 " **** undefined msg *** ", // 1
 " **** undefined msg *** ", // 2
 " **** undefined msg *** ", // 3
 " **** undefined msg *** ", // 4
 " **** undefined msg *** ", // 5
 "Trap 6 (invalid opcode) handler installed", // 6
 " **** undefined msg *** ", // 7
 " **** undefined msg *** ", // 8
 " **** undefined msg *** ", // 9
 " **** undefined msg *** ", // A
 " **** undefined msg *** ", // B
 "Trap 12 (stack fault) handler installed", // C
 "Trap 13 (general protection fault) handler installed" // D
 } ;
char szBuffer[255] ;
/****************************************************************************
 PROGRAM: Trapman.c
 PURPOSE: Creates an edit window
****************************************************************************/
HANDLE hInst;
HWND hEditWnd; /* handle to edit window */
HWND hwnd; /* handle to main windows */
/****************************************************************************
 FUNCTION: WinMain(HANDLE, HANDLE, LPSTR, int)
 PURPOSE: calls initialization function, processes message loop

****************************************************************************/
int PASCAL WinMain(hInstance, hPrevInstance, lpCmdLine, nCmdShow)
HANDLE hInstance;
HANDLE hPrevInstance;
LPSTR lpCmdLine;
int nCmdShow;
{
 MSG msg;
 if (!hPrevInstance)
 if (!InitApplication(hInstance))
 return (FALSE);
 if (!InitInstance(hInstance, nCmdShow))
 return (FALSE);
 while (GetMessage(&msg, NULL, NULL, NULL))
 {
 TranslateMessage(&msg);
 DispatchMessage(&msg);
 }
 return (msg.wParam);
}
/****************************************************************************
 FUNCTION: InitApplication(HANDLE)
 PURPOSE: Initializes window data and registers window class
****************************************************************************/
BOOL InitApplication(hInstance)
HANDLE hInstance;
{
 WNDCLASS wc;
 wc.style = NULL;
 wc.lpfnWndProc = MainWndProc;
 wc.cbClsExtra = 0;
 wc.cbWndExtra = 0;
 wc.hInstance = hInstance;
// wc.hIcon = LoadIcon(NULL, IDI_APPLICATION);
 wc.hIcon = LoadIcon(hInstance, MAKEINTRESOURCE( TRAPMANICON ));
 wc.hCursor = LoadCursor(NULL, IDC_ARROW);
 wc.hbrBackground = GetStockObject(WHITE_BRUSH);
 wc.lpszMenuName = "TrapManMenu";
 wc.lpszClassName = "TrapManWClass";
 return (RegisterClass(&wc));
}
/****************************************************************************
 FUNCTION: InitInstance(HANDLE, int)
 PURPOSE: Saves instance handle and creates main window
****************************************************************************/
BOOL InitInstance(hInstance, nCmdShow)
 HANDLE hInstance;
 int nCmdShow;
{
 RECT Rect;
 int OurDS ;
 hInst = hInstance;
 hwnd = CreateWindow(
 "TrapManWClass",
 "TrapMan",
 WS_OVERLAPPEDWINDOW,
 CW_USEDEFAULT,
 CW_USEDEFAULT,
 400, 100,

 NULL,
 NULL,
 hInstance,
 NULL
 );
 if (!hwnd)
 return (FALSE);
 GetClientRect(hwnd, (LPRECT) &Rect);
 /* Create a child window */
 hEditWnd = CreateWindow("Edit",
 NULL,
 WS_CHILD | WS_VISIBLE |
 ES_MULTILINE |
 WS_VSCROLL | WS_HSCROLL |
 ES_AUTOHSCROLL | ES_AUTOVSCROLL,
 0,
 0,
 (Rect.right-Rect.left),
 (Rect.bottom-Rect.top),
 hwnd,
 IDC_EDIT, /* Child control i.d. */
 hInst,
 NULL);
 if (!hEditWnd)
 {
 DestroyWindow(hwnd);
 return (NULL);
 }
 ShowWindow(hwnd, nCmdShow);
 UpdateWindow(hwnd);
 wsprintf((LPSTR) szBuffer,"PLEASE USE THE FULL VERSION OF TRAPMAN");
 PutInEditControl(szBuffer, 1) ;
 wsprintf((LPSTR) szBuffer," (INCLUDED ON DISK) FOR DEBUGGING. This version ");
 PutInEditControl(szBuffer, 1) ;
 wsprintf((LPSTR) szBuffer," is meant to be used only for a better");
 PutInEditControl(szBuffer, 1) ;
 wsprintf((LPSTR) szBuffer," understanding of the article.");
 PutInEditControl(szBuffer, 1) ;
 wsprintf((LPSTR) szBuffer," ");
 PutInEditControl(szBuffer, 1) ;
 _asm push ds
 _asm pop OurDS
 wsprintf((LPSTR) szBuffer,"This instance's DS is %0X", OurDS) ;
 PutInEditControl(szBuffer, 1) ;
 wDPMIVersion = DPMIGetVersion() ;
 wsprintf((LPSTR) szBuffer,"The DPMI Version is %0X", wDPMIVersion) ;
 PutInEditControl(szBuffer, 1) ;
// standard mode Win300 returns '90h' instead of 0x5A required by the DPMI ...
 if ((wDPMIVersion > 0x005a) &&
 (wDPMIVersion != 0x90)){ // 1.0+ not supported 0x5A is 0.9
 MessageBox(NULL, "Warning: Unknown DPMI Host!", "TRAPMAN",
 MB_ICONEXCLAMATION | MB_OK) ;
// return 0 ;
 }
 #define TRAPMENU 1 // the SECOND pull down (0 is first)
 hwndTrapMenu = GetSubMenu (GetMenu(hwnd), TRAPMENU);
 return (TRUE);
}

/****************************************************************************
 FUNCTION: MainWndProc(HWND, unsigned, WORD, LONG)
****************************************************************************/
long FAR PASCAL MainWndProc(hWnd, message, wParam, lParam)
HWND hWnd;
unsigned message;
WORD wParam;
LONG lParam;
{
 FARPROC lpProcAbout;
 switch (message) {
 case WM_SYSCOMMAND:
 switch (wParam & SYSCOMMAND_MASK) {
 // make sure we restore old trap handlers...
 case SC_CLOSE:
 SendMessage(hWnd, WM_COMMAND, IDM_EXIT, 0L) ;
 break;
 default:
 return (DefWindowProc(hWnd, message, wParam, lParam));
 }
 break ;
 case WM_COMMAND:
 switch (wParam) {
 case IDM_ABOUT:
 lpProcAbout = MakeProcInstance(About, hInst);
 DialogBox(hInst, "AboutBox", hWnd, lpProcAbout);
 FreeProcInstance(lpProcAbout);
 break;
 /* file menu commands */
 case IDM_NEW:
 if (IDYES == MessageBox(NULL, "Erase current edit buffer?",
 "TrapMan", MB_YESNO))
 SendMessage(hEditWnd, WM_SETTEXT, 0, (LONG) (LPSTR) "");
 break ;
 case IDM_SAVE:
 case IDM_SAVEAS:
 SaveBuffer("(Untitled).trp") ;
 break ;
 case IDM_OPEN:
 MessageBox (
 GetFocus(),
 "Command not implemented",
 "TrapMan",
 MB_ICONASTERISK | MB_OK);
 break;
 case IDM_EXIT:
 // reset the previous trap handlers...if any existed
 if (WeGot13 && Prev13)
 DPMISetProcessorExceptionHandlr( 13 , (void _far *) 
 Prev13 ) ;
 if (WeGot12 && Prev12)
 DPMISetProcessorExceptionHandlr( 12 , (void _far *) 
 Prev12 ) ;
 if (WeGot6 && Prev6)
 DPMISetProcessorExceptionHandlr( 6 , (void _far *) 
 Prev6 ) ;
 if (WeGot0 && Prev0)
 DPMISetProcessorExceptionHandlr( 0 , (void _far *) 
 Prev0 ) ;

 DestroyWindow(hWnd);
 break;
 /* trap menu commands */
 case IDM_GP:
 if (!WeGot13)
 {
 SetFaultProc(13, (long *) &Prev13,
 (void _far *) MyGPProc,
 TrapMsgs[13]) ;
 CheckMenuItem(hwndTrapMenu,wParam,
 MF_CHECKED | MF_BYCOMMAND) ;
 WeGot13 = 1 ;
 }
 else
 {
 CheckMenuItem(hwndTrapMenu,wParam,
 MF_UNCHECKED | MF_BYCOMMAND) ;
 WeGot13 = 0 ;
 if (Prev13)
 {
 PutInEditControl(" *** Resetting previous trap 13 handler", 1) ;
 DPMISetProcessorExceptionHandlr( 13 , (void _far *) 
 Prev13 ) ;
 Prev13 = NULL ;
 }
 }
 break;
 case IDM_STACK:
 if (!WeGot12)
 {
 SetFaultProc(12, (long *) &Prev12,
 (void _far *) MySPProc,
 TrapMsgs[12]) ;
 CheckMenuItem(hwndTrapMenu,wParam,
 MF_CHECKED | MF_BYCOMMAND) ;
 WeGot12 = 1 ;
 }
 else
 {
 CheckMenuItem(hwndTrapMenu,wParam,
 MF_UNCHECKED | MF_BYCOMMAND) ;
 WeGot12 = 0 ;
 if (Prev12)
 {
 PutInEditControl(" *** Resetting previous trap 12 handler", 1) ;
 DPMISetProcessorExceptionHandlr( 12 , (void _far *) 
 Prev12 ) ;
 Prev12 = NULL ;
 }
 }
 break;
 case IDM_INVALIDOP:
 if (!WeGot6)
 {
 SetFaultProc(6, (long *) &Prev6,
 (void _far *) MyInvalidOpProc,
 TrapMsgs[6]) ;

 CheckMenuItem(hwndTrapMenu,wParam,
 MF_CHECKED | MF_BYCOMMAND) ;
 WeGot6 = 1 ;
 }
 else
 {
 CheckMenuItem(hwndTrapMenu,wParam,
 MF_UNCHECKED | MF_BYCOMMAND) ;
 WeGot6 = 0 ;
 if (Prev6)
 {
 PutInEditControl(" *** Resetting previous trap 6 handler", 1) ;
 DPMISetProcessorExceptionHandlr( 6 , (void _far *) 
 Prev6 ) ;
 Prev6 = NULL ;
 }
 }
 break;
 case IDM_TRAPZERO:
 if (!WeGot0)
 {
 SetFaultProc(0, (long *) &Prev0,
 (void _far *) MyDivideByZeroProc,
 TrapMsgs[0]) ;
 CheckMenuItem(hwndTrapMenu,wParam,
 MF_CHECKED | MF_BYCOMMAND) ;
 WeGot0 = 1 ;
 }
 else
 {
 CheckMenuItem(hwndTrapMenu,wParam,
 MF_UNCHECKED | MF_BYCOMMAND) ;
 WeGot0 = 0 ;
 if (Prev0)
 {
 PutInEditControl(" *** Resetting previous trap 0 handler", 1) ;
 DPMISetProcessorExceptionHandlr( 0 , (void _far *) 
 Prev0 ) ;
 Prev0 = NULL ;
 }
 }
 break;
 case IDM_DEFAULT:
 if (!WeGot13)
 PostMessage(hWnd, WM_COMMAND, IDM_GP, 0L) ;
 if (!WeGot12)
 PostMessage(hWnd, WM_COMMAND, IDM_STACK, 0L) ;
 if (!WeGot6)
 PostMessage(hWnd, WM_COMMAND, IDM_INVALIDOP, 0L) ;
 break;
 case IDC_EDIT:
 if (HIWORD (lParam) == EN_ERRSPACE)
 {
 MessageBox (
 GetFocus ()
 , "Out of memory."
 , "TrapMan"

 , MB_ICONHAND | MB_OK
 );
 }
 break;
 }
 break;
 case WM_SETFOCUS:
 SetFocus (hEditWnd);
 break;
 case WM_SIZE:
 MoveWindow(hEditWnd, 0, 0, LOWORD(lParam), HIWORD(lParam), TRUE);
 break;
 case WM_DESTROY:
 PostQuitMessage(0);
 break;
 default:
 return (DefWindowProc(hWnd, message, wParam, lParam));
 }
 return (NULL);
}
/****************************************************************************
 FUNCTION: About(HWND, unsigned, WORD, LONG)
 PURPOSE: Processes messages for "About" dialog box
 MESSAGES:
 WM_INITDIALOG - initialize dialog box
 WM_COMMAND - Input received
****************************************************************************/
BOOL FAR PASCAL About(hDlg, message, wParam, lParam)
HWND hDlg;
unsigned message;
WORD wParam;
LONG lParam;
{
 switch (message)
 {
 case WM_INITDIALOG:
 return (TRUE);
 case WM_COMMAND:
 if (wParam == IDOK ||
 wParam == IDCANCEL)
 {
 EndDialog(hDlg, TRUE);
 return (TRUE);
 }
 break;
 }
 return (FALSE);
}
int SetFaultProc(unsigned char ThisTrap, long *PrevHandler,
 void _far *MyFaultHandler, char *Msg )
{
 *PrevHandler = (void _far *) DPMIGetProcessorExceptionHandlr( ThisTrap ) ;
 wsprintf((LPSTR) szBuffer,"Prev Trap %d handler is %0lX", ThisTrap, 
 *PrevHandler) ;
 PutInEditControl(szBuffer, 1) ;
 DPMISetProcessorExceptionHandlr(ThisTrap, MyFaultHandler ) ;
 PutInEditControl(Msg, 1) ;
 return TRUE ;
}

// passing in a NULL msg pointer should cause us to free any alloced buffer
BOOL PutInEditControl(char *Msg, int bWithReturn)
{
static HANDLE hBuff = NULL ;
static LPSTR lpBuff ;
 int iSizeOfBuff = 10 * 1024 ; // 10K
 if (NULL == hEditWnd) // edit window is non-existent
 return FALSE ;
 if (NULL == Msg)
 {
 if (hBuff)
 {
 GlobalFree(hBuff) ;
 hBuff = NULL ;
 }
 return 0 ;
 }
 if (NULL == hBuff)
 {
 hBuff = GlobalAlloc(GMEM_MOVEABLE, iSizeOfBuff) ;
 if (!hBuff)
 return 0 ;
 lpBuff = GlobalLock(hBuff) ;
 if (!lpBuff)
 return 0 ;
 }
 SendMessage(hEditWnd, WM_GETTEXT, iSizeOfBuff, (LONG) lpBuff) ;
 lstrcat(lpBuff, Msg) ;
 if (bWithReturn)
 {
 lstrcat(lpBuff, "\015\012") ;
 }
 SendMessage(hEditWnd, WM_SETTEXT, 0, (LONG) lpBuff) ;
 return TRUE ;
}
int KillCurrentDOSProcess()
{
 _asm mov ah, 4Ch
 _asm mov al, 0FFh
 _asm int 21h
 if (0)
 return 0 ; // get rid of no return code error message
}
// note this allocates and deletes a save buffer each time... We should really
// query the edit control to see how big it is, instead of assuming 10K is big
// enough. This is fine for now, as TrapMan shouldn't generate that much output!
BOOL SaveBuffer(char *filename)
{
 HANDLE hBuff ;
 PSTR pBuff ;
 FILE *flOutput ;
 int NoOfChars, iSizeOfBuff = 10 * 1024 ;
 hBuff = LocalAlloc(LMEM_MOVEABLE, iSizeOfBuff) ;
 if (!hBuff)
 return 0 ;
 pBuff = LocalLock(hBuff) ;
 if (!pBuff)
 return 0 ;
 SendMessage(hEditWnd, WM_GETTEXT, iSizeOfBuff, (LONG) (LPSTR) pBuff) ;

 if (pBuff)
 {
 flOutput = fopen(filename, "rb") ;
 if (flOutput)
 {
 // warn user file exists -- only continue if they permit it!
 fclose(flOutput) ;
 if (IDYES != MessageBox(NULL, "File exists -- overwrite?",
 "TrapMan", MB_YESNO | MB_ICONEXCLAMATION))
 {
 LocalUnlock(hBuff) ;
 LocalFree(hBuff) ;
 return FALSE ;
 }
 }
 flOutput = fopen(filename, "w") ;
 if (flOutput)
 {
 NoOfChars = strlen(pBuff) ;
 fwrite(pBuff, sizeof(char), NoOfChars, flOutput) ;
 fclose(flOutput) ;
 }
 LocalUnlock(hBuff) ;
 LocalFree(hBuff) ;
 return TRUE ;
 }
 return FALSE ; // otherwise: edit control error!
}
BOOL PrintOutTrapRegs( unsigned int BPofFrame )
{
// a NOP in the C version of TrapMan
 if (0)
 return 0 ;
}


Listing Two
#define argc __argc
#define argv __argv
extern int argc ;
extern char **argv ;
if (argc > 1) {
 int rc ;
 rc = WinExec(argv[1], SW_SHOW) ;
 if (rc < 0x20) {
 wsprintf( szBuffer, "Cannot load '%s', error code=%d",
 (LPSTR) argv[1], rc) ;
 MessageBox(NULL, szBuffer, "TrapMan", MB_ICONEXCLAMATION | MB_OK) ;
 }
 }



