% A typical header for a LaTeX file
% also sets up lozenges for abort, resume, and end
\documentstyle[twoside,12pt]{report}
\pagestyle{headings}
\topmargin = 0cm
\oddsidemargin = 0cm
\evensidemargin = 0cm
\headheight = 5mm
\headsep = 3mm
\textheight = 8in
\textwidth = 5.9in
\makeindex
\begin{document}
\input{ekberg$disk:[ekberg.tex]pretxt.tex} % titlepage set up
\begin{titlepage}
\begin{center}
{\tiny .} \\
\vspace{2in}
{\huge Performance Measurement Software} \\
\vspace{3in}
{\large Very Rough Draft} \\
\vspace{1in}
\today{} \\
\end{center}
\end{titlepage}
\clearpage
\setcounter{page}{1}
\pagenumbering{roman}
\tableofcontents
\clearpage
\setcounter{page}{0}
\pagenumbering{arabic}

\chapter{Introduction}

This document describes the performance software available for running
benchmark suites, manipulating the collected data, and reporting
information from the collected data.
These topics are initially discussed with regard to the three standard
benchmark suites, Gabriel, Zippel, and Interactive, and the existing
report formats, in order to provide general but essential information
for using the performance software.  The chapter on performance testing
methodology should also be read before running any suite, in order to
ensure accuracy of the collected data.  Additional chapters discuss
details of the performance software data structure, the different
approaches to defining your own suites as exemplified by the three
standard suites, and how to define personalized reports.

\chapter{Standard Benchmark Suites}

\section{Introduction}

This chapter discusses the functionality of the benchmarks that
comprise each of the standard benchmark suites: GABRIEL, ZIPPEL, and
INTERACTIVE.  The benchmarks in each suite were designed to place the
system under specifically controlled conditions, and the performance
software records system performance under these conditions.  Each suite
represents a unique class of benchmarks, each with its own method of
definition.  The methods for defining benchmark suites of each type are
discussed in the chapter Defining New Benchmark Suites.

\section{The Gabriel Benchmarks}

The GABRIEL benchmark suite, a subset of the Richard P. Gabriel
collection, includes the following benchmarks: Boyer, Browse, Deriv,
Dderiv, Fdderiv, Fft, Traverse, Destru, Puzzle, Triang, Tak, Takr, Stak,
Ctak, Takl, Fprint, Fread, Nprint, Tprint, Frpoly, and Div2.  Each of
these is a self-contained function or group of functions that performs
some specific task while the performance software meters the function's
performance.

The following descriptions of the Gabriel benchmarks are excerpts from
the book \underline{Performance and Evaluation of Lisp Systems} by
Richard P.
Gabriel, published by the MIT Press in 1985.

Boyer, essentially a theorem-proving benchmark, is a good mixture of
typical list-structure-manipulating operations with a proportional
number of function calls.  Boyer performs about three quarters of a
million CARs and a half million each of CDRs, NULLs, and ATOMs.

Browse performs many of the list-structure-manipulating operations
performed by Boyer, but the mixture of operations is proportionally
similar to that of an expert system.

Deriv is a symbolic derivative benchmark that uses a simple subset of
Lisp and does a lot of consing.  Dderiv is a variant of Deriv that is
table-driven, using property lists rather than Cond.  Fdderiv is a
variant of Dderiv that optimizes Funcall by using a Funcall-like
function that assumes it is being handed compiled code.

Fft is a benchmark written by Harry Barrow that tests a variety of
floating-point operations, including array references.

Traverse is a benchmark that creates and traverses a tree structure.

Destru benchmarks the destructive operations Rplaca and Rplacd.

Puzzle is Forest Baskett's benchmark program, which solves a
block-packing puzzle invented by John Conway using a two-dimensional
array.  Triang is a program that solves a board game using a
one-dimensional array.

The ``Tak'' set of Gabriel benchmarks is a family of heavily recursive
functions, called the Takeuchi functions.  Takr is a version of Tak that
was designed to thwart the effectiveness of cache memories.  Stak uses
special binding to pass arguments rather than the normal argument
passing technique.  Ctak is a version of Tak that uses CATCH and THROW
to return values instead of using the function-return mechanism.  Takl
is like Tak but does not perform any explicit arithmetic.

Fprint is a file print benchmark that prints to a file a test pattern
that is a six-level tree with a branching factor of six.  Nprint
prints the same test pattern as Fprint to a null stream in order to
measure the test pattern format time.
Tprint is a benchmark that tests
terminal output to a 10 inch by 10 inch window.  Fread measures reading
from a file; it requires the existence of Fprint.tst, which is created
by Fprint.

Div2 is a benchmark containing both a recursive and an iterative
test that divides by 2 using lists of n Nils.

Frpoly is a benchmark from Berkeley that computes powers of particular
polynomials represented as lists.

\section{The Interactive Benchmarks}

The INTERACTIVE benchmark suite was designed to simulate the real-world
environment of a typical development session.  In this suite, each
benchmark is the execution of a predefined system function or an
application.  The basis for the benchmark suite, the text file SCRIPT,
is read one top-level form at a time by the performance software.  Each
form defines a benchmark, containing its name and the function or
application to execute.  The benchmark is executed by stuffing the cdr
of this form into the keyboard buffer of the active window.  The
performance software times each benchmark's execution and records the
results in the benchmark's history structure.

The first series is the ``select-create'' benchmarks, which select
various applications in order to measure initial environment creation.
The following applications are included in this series: converse,
inspector, lisp listener, peek, backup, font editor, telnet, vt100, and
zmacs editor.

The next series is the ``select'' benchmarks, which execute the same
applications as the select-create benchmarks.  The purpose of this
series is to measure the loading performance of previously instantiated
windows.  Once loading performance has been measured, the ``break''
benchmarks monitor entering the break mode in the editor and lisp
listener.  Then performance for entering the error handler is measured
by benchmarks that force an error and make a call to ``ferror''.

The next series is designed to test ZMACS performance.  One benchmark
edits a nonexistent file.
Others consist of variations of file copy,
executed in order to measure remote-to-remote, remote-to-local, and
local-to-local file copy performance.  Another edits a new file and
enters text from the script file into the new buffer.  This benchmark is
followed by an apropos on the word ``comment''.  The following group
measures the performance of a cycle of file load, compile, and save.
This cycle is executed for a remote file and then for a local file.
Then a group designed to test source compare is executed.  Benchmarks in
this group include: a mark region / kill region, a source compare, a
source merge compare by form, a source merge compare by text, and
finally an edit buffers followed by a yank region.  Next, some
additional file benchmarks are executed that measure local and remote
file deletions, expunges, and dired's.  The final group of edit
benchmarks measures scrolling, meta-point, setting major and minor
modes, and listing and editing buffers.

The final series of Interactive benchmarks is executed in the Lisp
Listener.  First a simple function, Tak, is typed into the buffer,
evaluated, and then compiled and executed.  Finally, the hostat,
print-disk-label, and print-herald benchmarks are executed.

\section{The Zippel Benchmarks}

The ZIPPEL benchmark suite consists of two subsuites, Zippels and
Extensions.  The Zippels consist of eight groups of benchmarks:
Instructions, Calls, Arithmetics, Arrays, Lists, Strings, Flavors, and
Pathnames.  The Extensions consist of seven groups of benchmarks:
Resource, Hash, Graphics, Array-Push, Eval, Branch, and Binding.

The Instruction Benchmarks perform basic push, pop, load, and store
functions.  These basic operations include: push an immediate constant,
push a special variable, push an instance variable, pop a value off the
stack, store into a special variable, and store into an instance
variable.  The timings for these benchmarks are used as normalization
values for the other Zippel benchmarks.
One or more of these timings
are inherent in various benchmarks and are subtracted from their results
to improve accuracy.  This issue is discussed further in section 8.2,
Zippel Benchmark Definition.

The Call Benchmarks perform basic function calling operations with
variations on the number and type of arguments supplied, as well as the
number of values returned and whether they are stored or ignored.  Each
basic function call defines a Zippel benchmark.  Variations in this
group include: benchmarks that perform function calls with zero to seven
arguments and \&rest arguments, those that return zero to three values
which are ignored and one to four values which are stored, those that
make calls with one to three optional arguments, some of which are
supplied and others ignored, and finally one that performs a function
call with three keyword arguments.

The Arithmetic Benchmarks time arithmetic functions.  This group
includes basic addition, subtraction, multiplication, and division of
fixed, float, and big numbers.  More complex operations include Wallis,
Fibonacci, and Factorial functions.  There are also polynomial
operations such as exponentiation of univariate and multivariate
polynomials, and greatest common divisor of a multivariate polynomial.

The Array Benchmarks perform basic array access and store operations.
There are benchmarks that perform references and stores for one- and
two-dimensional arrays and array leaders, and for one-dimensional
displaced arrays, bit arrays, 8-bit arrays, and flonum arrays.  The
final array benchmarks perform reference, store, and setf on an array
leader.

The List Benchmarks perform storage allocation and list access
operations.  The first benchmarks include a cons of two integers, and
the list operation on 5, 10, 15, and 20 elements.
Other benchmarks include
car, cdr, cadr, rplaca, and rplacd on a list of 3 elements; nth 10,
nthcdr 10, last, nreverse, and reverse on a list of 26 elements; and
nconc, append, memq, member, assq, assoc, equal, and mapcar on lists of
sizes 4 and 26.

The next group of Zippels, the String Benchmarks, measures the
performance of string-manipulation functions.  This group includes three
subgroups of benchmarks: string, print, and read.  The string benchmarks
include string-search, string-equal, string=, string-lessp, string<,
string-search-char, string-search-not-char, substring, nsubstring,
string-up-down-case (which performs a string-down-case followed by a
string-up-case), and string and intern of a symbol.

The Print Benchmarks measure the performance of various print operations
and are all executed to a null stream.  The first benchmarks execute
format using both princ and prin1.  The next series executes prin1 for
the following data types: symbol, symbol with package, fixnum, flonum,
bignum, rational, string, and list.  The final benchmark does a
grind-top-level on a defun.

The Read Benchmarks read (via the read function) various s-expressions
from a stream whose output is bound to a string.  These s-expressions
include the following types: string, symbol, fixnum, flonum, list, and
defun.

The Flavor Benchmarks contain several groups that measure the
performance of various flavor mechanisms.  The first group is made up of
normalization benchmarks, which compute normalizations for function
calls, get-handler-for, and daemons.  The second group contains message
passing and method combination benchmarks.  The third group contains
benchmarks that measure accessing instance variables.  The fourth group
contains the Defflavor benchmarks.  These include simple Defflavors,
Defflavors varying the number of mixins, and Defflavors with gettable,
settable, and initable instance variables.
The fifth group measures
instantiating the various flavors defined by the Defflavor benchmarks.
The final group of flavor benchmarks measures the performance of
Defmethod, defining primary, before, and after methods.

The final group of the Zippels category, the pathname benchmarks,
measures the performance of various operations that use logical
pathnames.  The benchmarks in this group include Parse-pathname,
Set-default-pathname, Merge-pathname-defaults, Complete-pathname, open
local file for output, open local file for input, close a local input
file, and close a local output file.

The first group of the Extension category, the resource benchmarks,
includes benchmarks that measure Using-resource, Deallocate-resource,
and Allocate-resource.

The Hash benchmarks include benchmarks that measure Remhash, Swaphash,
Gethash, and Puthash for a hash table of size 1000.

The Graphic benchmarks measure the performance of various graphic
operations by sending the operation to a window bound to terminal-io.
The benchmarks in this group include Draw-String, Draw-Char, Draw-Line,
Draw-Filled-Circle, Draw-Hollow-Circle, Draw-Triangle, Draw-Rectangle,
and Bitblt, a bit block transfer operation.

The Array-Push benchmarks measure the performance of the functions
Array-Push-Extend and Array-Pop.

The next group, the Eval benchmarks, measures the performance of the
functions Eval, Symeval, and Fsymeval, eval of 0-, 1-, 2-, and
7-argument functions, and eval of 2, 3, and 7 functions.

The Branch benchmarks measure the performance of Branch not nil and
Branch fallthrough.

The Bind benchmarks measure the performance of Bind and Closure.

\chapter{Performance Data Structure}

\section{Introduction}

The performance data structure facilitates running benchmark suites and
saving the results, massaging selective benchmark results in order to
reduce noise errors, and comparing suites run on different machine
configurations.  Each benchmark's code and results are organized into
one or more classes as specified by the benchmark definitions.
The
benchmarks in each class contain the results for all known machines, and
each machine contains descriptive environment information for that
configuration.

The top level of the data structure is the special variable {\bf
*ALL-BENCHMARK-CLASSES*}, whose value is the names of all known
benchmark classes.  (Classes become known through {\bf Make-System} or
loading a results file.)  The value of each class is the list of
benchmark names belonging to that class.  Finally, the property list of
each benchmark name contains pairs of the property: machine name -
benchmark history.

The benchmark history associated with benchmark B and machine M is the
data set collected when running B on M.  A new member is added to the
set for each iteration of B.  The complete data set is known as the
benchmark history.  This property is the heart of the data structure and
is discussed in the next section.  An additional property that exists
for Gabriel and Zippel benchmarks is: benchmark - benchmark-code.  The
value of the keyword benchmark is the compiled benchmark code.  This
property is also discussed in a later section.

\section{Machine Name - Benchmark History}

In addition to serving as the indicator for the Machine Name - Benchmark
History property, Machine Name's property list contains all of the
environment information collected for that machine.  The property list
currently consists of one property, which has the keyword {\bf Herald}.
The value of Herald is an a-list with the following seven keys: User,
Time, Herald, System, Machine, Room, and Disk-Label.
The respective
values of the seven keys are as follows: the logged-in user when the
benchmarks were run, the universal time at which the benchmarks were
run, the Print-Herald results, the software-version for the environment
in which the benchmarks were run, the machine-version for the
environment in which the benchmarks were run, the Room results, and the
Print-Disk-Label results for each known unit in the disk system.

The value of Machine Name, the Benchmark History, is a structure that
contains the performance data collected when running this particular
benchmark on this particular machine.  The structure is the list
consisting of the following items: Name, Pretty-Name, Count,
Un-Normalized-Time, History, and Plist.  {\bf Name} is the symbol name
of the benchmark and is used in all references to the benchmark - both
code and data.  {\bf Pretty-Name} is the name used when printing
benchmark results.  {\bf Count} is the number of iterations the
benchmark has executed.  {\bf Un-Normalized-Time} is the total real time
the benchmark executed, without any normalization values subtracted.
{\bf History} is a structure consisting of the following items, by
default: Real-Time, Disk-Time, Page-Faults, Consing, Cpu-Time, and
Paging-Time.

The history items discussed thus far are the results of the meters
collected by default by the support software.  Additional values for
meters specified by the user will also be stored in this history list.
They come after the default meter values, in the order and format
specified by the user.  Each user-specified item in History must be
preceded by an identifier keyword.  Not only does this facilitate
interpreting the data structure, but it is needed by the reporting
software to identify the items.  The software knows the position of the
default meter values; therefore those values always have Nil keywords.

{\bf Plist} is a list that contains keyword - value pairs.  The value of
the keyword Normalization is the list of normalization benchmarks that
apply to this particular benchmark.
(Normalization is discussed in the
chapter ``Defining New Benchmark Suites'', in the section ``Zippel
Benchmarks''.)  The value of the keyword Classes is the list of
benchmark classes to which this benchmark belongs.

\section{benchmark - benchmark-code}

Another property that exists on the property list of benchmark name for
Gabriel and Zippel benchmarks is benchmark - benchmark-code.  The
keyword is benchmark and its value is a dtp-closure.  This property is
where the compiled benchmark code resides; it is common to all machines
known in the system.  Notice that each machine name resides on this
property list also, as the designator of its history for this benchmark.
Since the Interactive benchmarks are created from a script file, they do
not consist of compiled code; therefore they do not possess this
property.

\chapter{Performance Testing Methodology}

This chapter is intended to provide a sound starting point for doing
serious performance analysis of the system.  It is not possible to cover
all variables that could affect the results of a test; therefore anyone
doing testing must take great care to control any additional factors
that are particular to the environment being tested.

At all times keep in mind that test results are \underline{extremely}
sensitive to the environment in which they are run.  Careful adherence
to these guidelines is necessary to ensure repeatability of
measurements, which in turn allows for a productive discussion of the
results.  The exact methodology will change as the system evolves;
therefore only general guidelines are appropriate in this document.

Start from as clean an environment as possible, preferably a cold boot,
which will cause the fewest possible side effects.  Since it is not
always possible to start from a cold boot, load the minimum amount of
changes to the system needed to support the test, and carefully document
the details of your modifications.
In addition, archive the
following information with the data:
\begin{itemize}
\item (PRINT-HERALD) to document the patch level of all software pieces,
the total amounts of physical and virtual memory, and the machine on
which the test was run.
\item (ROOM) for additional virtual memory information.
\item (PRINT-DISK-LABEL) for every disk pack on the system.  This
documents the exact system, paging, and file system layout.
\end{itemize}
In addition, document detailed information about any customizations
included in login files, or their equivalents.

The timing function used should wrap the reading of clocks and meters as
well as the test function in a (WITHOUT-INTERRUPTS) form.  Whenever
possible, tests should be run with the network disabled (CHAOS:DISABLE)
in order to eliminate the effect of random network traffic processing
from the results.

\chapter{Running Benchmark Suites}

This chapter discusses the method for running the standard benchmark
suites: Gabriel, Zippel, and Interactive.  Although the procedure is
very similar for each of the suites, there are some minor differences;
therefore the general method is explained first, and then the issues
unique to each suite.  Before running a benchmark suite on your system,
it is most important to follow the guidelines presented in the chapter
Performance Testing Methodology, in order to ensure data integrity.

Once you have created the environment in which you wish to run a
benchmark suite, load the main defsystem from its host system by
executing {\bf (Load ``host:bench;defsystem'')}.  This is the only file
required to get you started.
The defsystem defines the
logical-pathname-host, specifying the host as {\bf ``Bench:''}, the
performance software directory as {\bf ``Bench:bench;''}, and the
benchmark result files directory as {\bf ``Bench:br;''}.

Once the defsystem is loaded, the easiest way to run any of the
benchmark suites is to execute {\bf (Do-Suite\_Name 'machine)}, where
Suite\_Name is the name of the suite to be run, and machine is a
descriptive name for the environment that has been booted.  For example,
executing (Do-Gabriel 'fan16-lru) would run the Gabriel benchmark suite
on a system booted with the fan16 load band and the lru microcode band.

The Do-Suite\_Name function performs a Make-System on the benchmark
suite and runs the benchmarks specified within.  Once the final
benchmark has finished running, all of the collected data is written to
the file {\bf Machine-Suite\_Name.Bench}.  For example, if the previous
Gabriel example were run, the results would be stored in the file
``fan16-lru-gabriel.bench''.

The Make-System call loads (compiling if necessary) the benchmarks and
other performance support files listed in the file {\bf
Suite\_Name-defsystem}.  For the three standard suites, Gabriel, Zippel,
and Interactive, the defsystem file names would be Gabriel-defsystem,
Instruction-defsystem, and Interactive-defsystem, respectively.

Once the Make-System has completed, Do-Suite\_Name performs
{\bf (Run-Suite\_Name machine)}, which executes the benchmarks in batch
mode, looping a fixed number of times.  Data is collected from meters
for real time, disk time, paging time, number of page faults, consing
time, and cpu time.  These are the standard meters that are collected by
default in the performance data structure.  If desired, you can specify
additional meters to be collected while the benchmarks are executing.
For specific details on how to accomplish this, refer to chapter 8,
Defining New Benchmark Suites.
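Putting the steps above together, a complete session might look like the
following sketch.  The host name and the machine symbol fan16-lru are
illustrative only, taken from the example above.

\begin{verbatim}
;; Sketch of a complete run, following the steps above.
;; The host name and FAN16-LRU are illustrative, not fixed values.
(load "host:bench;defsystem")  ; defines the Bench: logical host
(do-gabriel 'fan16-lru)        ; Make-System, run the suite in batch
                               ; mode, and save the collected data to
                               ; Bench:br;fan16-lru-gabriel.bench
\end{verbatim}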
The results file, created by the Run-Suite\_Name function's call to
Save-Results, consists of a machine property designating the type of
Lisp machine on which the suite was run, information documenting the
environment in which the suite was run, and the test results.  The
documenting information consists of the user identification, the
universal time at which the benchmarks were run, the print-herald
information, the software-version and machine-version for the
environment in which the benchmarks were run, the Room results (called
with no arguments), and the disk label.  Test results consist of a
record for each benchmark in the suite, containing the data collected
for that benchmark.  The result record is discussed in detail in
chapter 3, Performance Data Structure.

\chapter{Data Handling Techniques}

This chapter discusses the performance routines that are available for
improving the accuracy of the data that has been collected by running a
benchmark suite in a specific environment; there are several situations
in which you may need to do this.  Often data obtained from one or more
of the benchmarks is questionable, and you may want to rerun those
particular benchmarks, replacing the questionable data with the newly
obtained results.  Other times you may feel the standard deviation of
the data collected for a given benchmark is too large and that you need
to collect additional data in order to stabilize results for that
benchmark.  Finally, you may want to run some additional benchmarks,
appending those results to previously collected data.
Part
of the {\bf Run-Benchmarks} file, the routines available for
accomplishing these tasks include {\bf Perform-Benchmarks,
Save-Results, Update-Results}, and {\bf Restore-Results}.  Another
routine, {\bf Generate-Benchmark}, can be used to create compatible
benchmark result records from data available only in hard copy.

When you want to run one or more selected benchmarks, the performance
routine to use is {\bf Perform-Benchmarks}.  Perform-Benchmarks has four
parameters, which are defined as follows: (Benchmarks \&key (Repcount
10.)  (Report-P t) (Clear t)).  {\bf Benchmarks} is the list of names of
the benchmarks you want to run.  {\bf Repcount} is the number of times
to run each benchmark before the next one is run.  (More accurately, the
number of times a given benchmark is run will be the difference between
Repcount and the count of the number of runs previously recorded for
that benchmark.)  If {\bf Report-P} is true, once the final benchmark
has completed, Report-Benchmarks will print to the terminal the results
of all the benchmarks named in Benchmarks.  If true, {\bf Clear} will
forget any result data currently known for all the benchmarks named in
Benchmarks.

Once the desired benchmarks have been rerun, you can use the function
{\bf Update-Results} to replace the old results with the new results.
The function has three parameters, which are defined as follows:
(Pathname \&optional (Benchmark-List *All-Benchmarks*) (Name
My-Benchmark-Property)).  {\bf Pathname} is the name of the file where
the old results are stored; the new results will be stored in a
higher-version-numbered file with the same name.  {\bf Benchmark-List}
is the list of names of benchmarks for which results are to be updated.
This will generally consist of the names of the benchmarks that were
rerun; however, the default, *All-Benchmarks*, consists of all the
benchmarks that have been run (or loaded via Restore-Results) in the
current environment.
{\bf Name} is the benchmark property of the machine
on which the benchmarks were run.  The default, My-Benchmark-Property,
for an Explorer would be Explorer-Benchmark.

{\bf Update-Results} writes the environment documentation from the old
result file and for the current environment to the new result file.
Next the function reads in the benchmark records of the old result file
and writes them to the new file, discarding any record for a benchmark
named in Benchmark-List.  Once the old result file has been written, the
results of each benchmark named in Benchmark-List are appended to the
end of the new result file.

{\bf Save-Results} is similar to Update-Results except that there is no
old result file from which previous benchmark records are read.  The
parameters are the same as those for Update-Results and are used in the
same fashion.  Save-Results writes the current environment documentation
to the result file and then the results of each benchmark named in
Benchmark-List.

{\bf Restore-Results} is the function to use for loading benchmark
result files.  It has two parameters, which are defined as follows:
(Pathname \&optional Machine-Name).  The function loads each benchmark
record in the result file specified by {\bf Pathname} and stores it
under its benchmark name, a field from the benchmark record.  All of the
benchmark names are stored under {\bf Machine-Name}.  If the machine
name is omitted, it defaults to the name stored in the result file.  If
previous benchmark records exist for that machine name, restored records
replace old records with the same name, but old records that do not get
replaced still exist after Restore-Results completes.

{\bf Generate-Benchmark} is a routine that enables you to create a
benchmark structure from benchmark records in a text file and optionally
save the results to an output file.  This routine facilitates comparing
your own benchmarks to those from other sources, because it puts raw
data into a compatible result form.
Once this is accomplished, you can
use the reporting software to compare these new benchmarks to your own.
The parameters for the routine are as follows: (Machine Class-List
Input-Pathname \&optional Output-Pathname).  {\bf Machine} is the name
of the machine under which the benchmark records are to be stored.
{\bf Class-List} is the name of the benchmark suite to which the raw
data belongs, for example, Gabriel-Benchmarks.  Input-Pathname is the
text file containing the raw benchmark data.  The records in this file
must be in the format: benchmark-name cpu-time-in-seconds.  If
Output-Pathname is supplied, the benchmark records generated from the
raw data will be saved to this file.  Alternatively, you can use
Save-Results to save the results at a later time, if so desired.

\chapter{Reporting Results}

\section{Introduction}

The performance reporting software provides three levels of flexibility
in generating detailed reports from previous runs of benchmark suites.
At the top level, there are standardized reports for the standard
benchmark suites, which report details of collected meters for
individual machines and comparisons between two or three machines.  At
the second level, you can define and arrange personalized reports of
factual and comparison information from a multitude of predefined
columns.  (The standard reports were defined in this manner.)  At the
lowest level, you can define your own columns for a given report, which
can be mixed with any previously defined columns.  You may need to
define new columns in two situations: the data is there but you want to
report it differently, or you have defined new meters and now need to
report the newly collected data.

There are several analyze functions that the reporting software provides
for reporting data.  All of the default meter values and the
user-defined meter values can be accessed and reported in a variety of
statistical formats through the reporting software.
These issues are
also discussed in this chapter.

\section{Initial Setup}

The report software gets loaded on two occasions: in conjunction with
loading the runtime software for executing a benchmark suite, and
independently when the runtime software is not part of the environment.
If you have already run a benchmark suite in the current environment,
then the report software has been loaded.  Otherwise you must load the
report software by loading the defsystem and doing a Make-System.  The
specific steps are as follows:
\begin{itemize}
\item (Load ``lm:bench;defsystem'')
\item (Make-System 'Reports)
\end{itemize}

The directory where the report software expects the benchmark data files
to be is Bench:br; where Bench is defined as the logical host name.  All
of the reports are printed to the terminal, so it is necessary to
dribble them to an output file if you want to save them for later
reference.  Try to pick a descriptive dribble filename that indicates
the type of benchmark suite run and the machines that were compared.
For example, a good filename for a Gabriel report of two machines, a
Symbolics with system 6.0 and an Explorer with fan20, might be
``Sym6-ExpFan20-Gabriel.Report''.  Often the report format will be wider
than 80 columns, so it is best to use a printer that supports landscape
or compressed printing.  The Imagen font 5X5 can easily print ten
columns of information in portrait mode.

\section{Standardized Reports}

The correct syntax for executing one of the standardized reports is
{\bf (Report-Suite\_Name Machines Reports)}.  For the Gabriel and
Interactive suites, {\bf Suite\_Name} is Gabriel and Interactive,
respectively; for the Zippel suite, there is a report for each subsuite,
Zippel and Extensions, with the corresponding Suite\_Name.
{\bf Machines} is the required list of one, two, or three machine names,
each the name associated with the data file that contains the results of
running a given suite on that machine.  (Perhaps ``system'' would have
been a better nomenclature.)
For example, if Report-Gabriel is given the list '(Exp172), it knows to read the data file ``Bench:Br;Exp172-Gabriel.bench''.

{\bf Reports}, an optional parameter, is a list of names of reports that are to be generated for two machines.  Report-Suite\_Name uses a fixed set of reports for one or three machines.  Report-Gabriel, for example, uses the default reports Gabriel-Compare, Realtime-Compare, Cpu-Compare, and Paging-Compare for two machines.  For one machine it uses the report Gabriel-Report, and for three machines, the reports Gabriel-Compare3, Realtime-Compare3, and Paging-Compare3.  Each of these reports is composed of standard report columns, which provide an easy and flexible method for defining and arranging your own reports.

The standard categories made available by the report software include the following: cpu time, real time, disk response time, paging time, number of page faults, and number of consings.  For each of these categories it is possible to report, based on the data stored for each repetition of the given benchmark, the mean, median, standard deviation, minimum, and maximum.  You can also report percentage changes and ratios of these statistical items for any two machines.  The cross product of the standard categories and the statistical items listed above, plus benchmark name and number of repetitions performed, make up the predefined columns available for building reports.

\section {Defining Reports From Predefined Columns}

The second level of flexibility, designing reports using predefined columns, can best be explained by way of example.  Let's take a closer look at a dummy report called My-Compare.
The report prints factual and comparison information for two machines and is defined as follows:
\begin{verbatim}
(DEFREPORT My-Compare
  short-pretty-name 1 count 1 cpu-min 2 cpu-min
  (compare-1-2 cpu-min)
  blank-3 1 time-min 2 time-min)
\end{verbatim}
{\bf Defreport} is a macro that puts the name 'My-Compare in the list *All-Reports*, the list of all known reports.  It then defines a constant, {\bf My-Compare}, whose value is the list of symbols and lists appearing in the Defreport.  The symbols, and the symbols in each list, are defined by entries in the constant {\bf Report-Types}.  A symbol that is referenced in a Defreport but not defined in Report-Types constitutes an error.  Each entry (symbol or list) in the Defreport defines one column of the report, except for numbers, which are used to switch the machine being reported on.  A list references two entries in Report-Types and defines a column for the car of the list based on the value of the cdr of the list.  In this example, (compare-1-2 cpu-min) would define a column comparing machine one to machine two on minimum cpu time.

The report that would be generated by the call (Report-Gabriel '(Exp172 Exp200) 'my-compare) would be as follows (assuming that the only known Gabriel benchmarks were boyer and fft):
\begin{verbatim}
Benchmark Results    TI internal data    4/07/86 14:46:46    Page 1

                    EXP172    EXP200              EXP172    EXP200
                Num   Min       Min     2 rel      Min       Min
Benchmark Name  Run   Cpu       Cpu      to 1   RealTime  RealTime
GABRIEL-BOYER    5  26.62 s   26.78 s    0.99   26.62 s   26.78 s
GABRIEL-FFT      5  23.21 s   23.23 s    1.00   23.21 s   23.23 s
\end{verbatim}
If you wanted to produce a new type of report for the Gabriel suite, the above procedure would work just fine.
However, it is more likely that you will want to produce new report types for your own benchmark suites.  In addition to writing the Defreports, you will need a procedure very similar to Report-Gabriel in order to make use of your Defreports.  For this reason, we need to look at the Report-Gabriel function.

Report-Gabriel calls Load-Reports with Machine\_names and Suite\_name, which will load the data files defined by the two parameters.  For example, if the call (Load-Reports '(Exp172) 'Gabriel) is made, the function will read the data file ``Bench:Br;Exp172-Gabriel.bench''.  Load-Reports calls Restore-Results for each machine in Machine\_names.  Restore-Results loads the results for each machine, updating the global variable *gabriel-benchmarks*, which will contain a list of the names of all known Gabriel benchmarks.  Report-Gabriel then calls Run-Reports with the parameters (*gabriel-benchmarks* Machine\_names Report1 \&optional Report2 Report3).  The first two parameters have been discussed.  Report1, Report2, and Report3 are lists of Defreports to be produced for each machine in Machine\_names.  When there are one, two, or three machines, Run-Reports will call Report-Benchmarks for each report in Report1, Report2, or Report3, respectively.  Report-Benchmarks will print each report for each benchmark whose name appears in *Gabriel-Benchmarks*, provided every machine has a record for that benchmark.  If one or more machines has no data for a given benchmark, that benchmark is excluded from the report and a message appears at the end of the report.

\section {Defining New Report Columns}

The lowest level of flexibility allows you to define new columns in addition to the standard columns available for building report formats.  Once the new columns have been defined, you can use them in a Defreport as you would standard columns.

Each Defreport entry's underlying format and structure is defined through an entry by the same name in Report-Types.
Each entry in Report-Types is an alist of the form: (Name Format-string Top-header Bottom-header Analyze-Function . Args).  {\bf Name} is the report-type name used to reference the report type in a Defreport.  {\bf Format-string} is a format control string used for printing the headers and results.  {\bf Top-header} is a string used as the top line of the two-line label.  {\bf Bottom-header} is a string used as the bottom line of the two-line label.  {\bf Analyze-Function} is the function whose result is printed with Format-string as the report value.  The function is called with arguments Benchmark . Args, where Benchmark is the benchmark structure object being reported.  If an item in a Defreport is a list, for example (compare-1-2 cpu-min), then the cdr of the list is added to Args.

For example, two entries in Report-Types, cpu-min and compare-1-2, which were used in the previous example, My-Compare, were defined as follows:
\begin{verbatim}
(DefConstant Report-Types
  '((cpu-min " ~10a" "  Min  " "  Cpu   " analyze-time CPU Min)
    (compare-1-2 " ~5,2f" "2 rel" "to 1" compare-machines 1 2)))
\end{verbatim}
Reporting items in the benchmark history that were additional meter values specified by the user is similar to reporting default meter values, with one major difference: the meter label must be specified so that the reporting software will know which item you are referencing.  The item still has to be defined in the DefConstant Report-Types, and referenced in one or more Defreports, before it can become a column of output.

There are three generic report types that were designed to facilitate reporting user-specified meter values: Get, Time, and Fix.  Get returns a flonum value, Time returns a raw microsecond time in appropriate units of ns, us, ms, or seconds, and Fix returns a rounded fixnum.
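The Report-Types entries for these three generic types are not shown here, but based on the descriptions above they presumably resemble the following sketch.  The format strings and header strings are illustrative guesses only; the pairing of Get, Time, and Fix with Analyze-History, Analyze-Time, and Analyze-History-Fix follows from the return values described in the section, The Analyze Functions.
\begin{verbatim}
;; Illustrative sketch -- not the actual system definitions.
(get  " ~10f" " User " " Meter " analyze-history)
(time " ~10a" " User " " Time  " analyze-time)
(fix  " ~10d" " User " " Meter " analyze-history-fix)
\end{verbatim}
Note that these entries end with the analyze function alone; for the generic types, the Type and Need arguments are supplied by the cdr of the Defreport list.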
These DefConstant entries are adequate in most instances, which means generally all you have to do is make the Defreport references.

The Defreport entry must be a list whose car is the Report-Type name and whose cdr is the arguments to be passed to the print function.  For the three generic report types, the arguments are Type and Need.  {\bf Type} is the meter label that is used to specify the meter value in the history list.  {\bf Need} is one of the following: mean mode median std-dev min max total size samples count real-count un-normalized-time.  It is the computation that is to be performed on this data item.

The following example shows the report specification of a user-defined meter value.  The reference is to Report-Type Fix.  The meter label in the benchmark history is :extra-pdl.  The desired Need is mean.
\begin{verbatim}
(defreport gabriel-report
  med-pretty-name count cpu-mean
  cpu-dev cpu-min cpu-max disk-max
  (fix :extra-pdl mean))
\end{verbatim}

\section {The Analyze Functions}

The reporting software includes several general Analyze-Functions that are used by the standard reports and should cover 99 percent of all additional user needs.  These functions are: Analyze-History, Average-Disk-Response, Analyze-Time, Analyze-History-Fix, Compare-Machines, and Compare-Percent.  When using the standard Report-Types, these functions are of no concern to the user, since they are called automatically.  However, when defining new Report-Types, it is necessary for you to specify the analyze function.  The following descriptions should facilitate this process.

{\bf Analyze-History} is the heart of the analyze functions; all of the others listed above use it as well.  Analyze-History returns a value based on Need and Type.  {\bf Need} is the desired statistical analysis and can be one of the following: total mean mode median std-dev min max samples size count real-count un-normalized-time.
To understand the explanation of these statistical items, remember that there is a history entry for each iteration of a given benchmark on a given machine.  Each history entry contains a value for each of the meters collected on that iteration.
\begin{itemize}
\item {\bf Total} is the total of each recorded value of the specified meter type.  For example, if there were five iterations of benchmark X, then there will be five Cpu-Time meter values in the history structure, and these five values will be summed as Total.
\item {\bf Mean} is Total divided by the number of history items (iterations).
\item By definition, {\bf Mode} is the meter value appearing most frequently in the meter history.  As defined by the reporting software, however, it is one-half of the sum of the smallest and largest history items.
\item {\bf Median} is the meter value below and above which there is an equal number of values.
\item {\bf Std-Dev} is the statistical standard deviation of the meter item.
\item {\bf Min} is the minimum value for the meter item.
\item {\bf Max} is the maximum value for the meter item.
\item {\bf Samples} is the number of iterations the benchmark has currently been executed on a given machine.
\item {\bf Size} is the benchmark count divided by the number of recorded history items.  When the benchmark is first created, the benchmark count is specified.  Size is the ratio of this count to Samples.
\item {\bf Count} is the product of Samples and benchmark count divided by the number of history items.
\item {\bf Real-Count} is Count expressed as a decimal integer.
\item {\bf Un-normalized-time} is the accumulated Real Time for all iterations of the benchmark, without any normalization.  This number of iterations is the product of benchmark count and Samples.
\end{itemize}
{\bf Type} is the meter type from the benchmark history to which the statistical analysis is to be applied.  It can be one of the following: time disk-time paging-time faults cons cpu or a keyword.
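In conventional notation, some of the statistical items defined above reduce to simple formulas.  Writing $n$ for the number of recorded history items, the definitions given above for Mean, Mode (as implemented), and Size correspond to
\[
\mbox{Mean} = \frac{\mbox{Total}}{n}, \qquad
\mbox{Mode} = \frac{\mbox{Min} + \mbox{Max}}{2}, \qquad
\mbox{Size} = \frac{\mbox{benchmark count}}{\mbox{Samples}}.
\]
Note that this Mode is not the conventional statistical mode; it is simply the midpoint of the smallest and largest recorded values.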
The enumeration of each of the record values for the default types, plus each of the keyword types with its recorded value, constitutes one history item.  This item corresponds to one iteration of the benchmark.  If the repetition count specified when the benchmark was defined is greater than one, then each of the default type values is divided by the repetition count.
\begin{itemize}
\item {\bf Time} is the Real Time in microseconds, calculated from the counter si:%microsecond-time, that was required to execute the benchmark for one iteration.
\item {\bf Disk-Time} is the Disk Wait Time in microseconds, calculated from the counter si:%disk-wait-time, that was recorded while the benchmark was executing one iteration.
\item {\bf Paging-Time} is the Total Page Fault Time in microseconds, calculated from the counter si:%total-page-fault-time, that was recorded while the benchmark was executing one iteration.
\item {\bf Faults} is the Total Page Fault Count, calculated as the sum of the counters si:%count-disk-page-reads and si:%count-fresh-pages, while the benchmark was executing one iteration.
\item {\bf Cons} is the Total Cons Count, calculated from the function Area-Size, that was recorded while the benchmark was executing one iteration.
\item {\bf Cpu} is the Total Cpu Time, calculated as the difference between Time and Disk-Time, that was recorded while the benchmark was executing one iteration.
\item {\bf Keyword} is a meter label for a meter that was specified by the user when defining the benchmark.  The meter value immediately follows the Keyword in the history structure.
\end{itemize}
{\bf Average-Disk-Response} returns the quotient of disk-time and faults in appropriate units of ns, us, ms, or seconds.  Disk-time and faults are calculated by Analyze-History as described above.

{\bf Analyze-Time} returns any microsecond time in appropriate units of ns, us, ms, or seconds.
It gets the microsecond time from Analyze-History based on Type and Need as described above.

{\bf Analyze-History-Fix} uses the Round function to return, as a fixnum, the value it receives from Analyze-History.

{\bf Compare-Machines} returns the ratio of two recorded meter values it receives from Analyze-History, based on Type and Need as described above.  Normally this function is used for comparing the same history item of the same benchmark for two different machines.

{\bf Compare-Percent} returns the value of Compare-Machines as a percentage.

\chapter{Defining New Benchmark Suites}
\section{Introduction}

This chapter will explain in greater detail the three standard benchmark suites, Gabriel, Zippel, and Interactive.  The purpose of the explanation is to provide an understanding of how each suite was defined, so that additional suites can be defined using the same methods.  Each suite was defined from a collection of code segments that differed in format from the others.  For this reason each type of collection required a different method of defining it as a benchmark suite.

The Gabriel benchmarks, which consist of self-contained functions, were defined using the function {\bf Define-Gabriel-Benchmarks}.  The Zippel benchmarks consist of single operations that require certain preconditions in order to execute.  They were defined using the macro {\bf Define-Benchmark}.  The Interactive benchmarks consist of expressions that are stuffed into the keyboard buffer and executed.  They are defined ``on the fly'': as each expression is stuffed into the buffer and completes execution, the benchmark name and meter results are recorded.  These defining methods should be general enough to facilitate defining almost any collection of code segments as a benchmark suite.

Regardless of the type of benchmark definition that is used, the support software will collect the default meters (Disk-Wait-Time, Paging-Time, Hard-Page-Fault-Count, and Area-Size) and store the results in the benchmark's data structure.
The default meters are collected for each benchmark type, and additional meters may be specified when using the macro Define-Benchmark.  The data structure is identical for each type of benchmark and is discussed in detail in the chapter, Performance Data Structure.

\section{Gabriel Benchmarks}

The Gabriel benchmarks consist of one or more main functions and a start-up function whose sole purpose is to execute the benchmark.  Defining them requires writing the basic code for the benchmark, including a call to the start-up function as one of the arguments of the macro {\bf Timer}, and specifying the benchmark to the function {\bf Define-Gabriel-Benchmarks}.  The general procedure is to write all of the basic benchmarks first, including the Timer calls for each benchmark's initial function, and then make one call to Define-Gabriel-Benchmarks with a list of all benchmark names and specifications.  Since there are approximately 45 Gabriel benchmarks, it was decided to put each group in a separate package and file.  If you are defining only a few benchmarks, you may wish to put all of them in the same package.

The following example is part of the code that defines the benchmark Gabriel-Div2.  The function test1 is the top-level function for the benchmark.  (It calls div2, which is not shown here.)  Following test1 is the call to the macro Timer, with the macro expansion shown next to it.  Timer simply compiles the start-up function, timit1, whose sole task is calling test1.  The macro's parameters are Name, Form, and Args, where {\bf Name} is the name of the start-up function, {\bf Form} is the call to the benchmark's top-level function, and {\bf Args} is the arguments the start-up function might need.  The start-up function's name is insignificant; it is only used to specify the benchmark to Define-Gabriel-Benchmarks.
\begin{verbatim}
;;;-*- Mode:LISP; Package:BENCH-DIV2; Base:10 -*-

(defun test1 (l)
  (do ((i 300. (1- i)))
      ((= i 0))
    (div2 l)
    (div2 l)
    (div2 l)
    (div2 l)))

(timer timit1              (PROGN 'COMPILE
  (test1 l))      --->       (DEFUN TIMIT1 ()
                               (TEST1 L)))
\end{verbatim}
Once the caller function has been created by Timer, the benchmark has to be specified to Define-Gabriel-Benchmarks, which performs the remaining housekeeping tasks, associating the benchmark code with its true symbol name and assigning the benchmark name to the specified class.  The following example shows the call to Define-Gabriel-Benchmarks for the benchmark Div2 used in the previous example, and an additional benchmark, Frpoly.  This example is a little simplified, because the entire Gabriel suite was defined with only two calls to Define-Gabriel-Benchmarks, one for each class of Gabriels, *Gabriel-Benchmarks* and *Gabriel-Print*.

The parameters to Define-Gabriel-Benchmarks are Bench-Name, Repeat-Count, and Benchmarks.  {\bf Bench-Name} is the name of the class of benchmarks to which the benchmarks will be assigned.  Benchmarks belonging to more than one class must be specified by additional calls.  For example, the Gabriel print benchmarks were assigned to the sub-class *Gabriel-Print* by a second call to Define-Gabriel-Benchmarks with Bench-Name *Gabriel-Print*.  {\bf Repeat-Count} is stored on the benchmark symbol name's property list as the value of the property Benchmark-Repeat-Count.  The value is not currently used by the performance software.

The final parameter to Define-Gabriel-Benchmarks, {\bf Benchmarks}, is a list containing three members: Repeat-Count, Function-Definition, and GC-Flag.  {\bf Repeat-Count} applies to all of the benchmarks being defined in this list.  The value is optional and not currently used by the performance software.  {\bf GC-Flag} is the garbage collection flag and has possible values of gc, no-gc, and nil.
If gc is specified, then when the benchmark has executed, a quick and dirty garbage collection will be performed by resetting the temporary consing area.  The other two values, no-gc and nil, will prevent garbage collection from being done.

{\bf Function-Definition} is a list containing three members as well.  The first member is the true name of the benchmark, the name by which all references to the benchmark are made.  This benchmark name is added to the class name for the group of benchmarks (the parameter Bench-Name).  The second member is the function name that was given to the macro Timer when the benchmark code was written.  The third member is the parameters to be passed to this function when it is executed.
\begin{verbatim}
(define-gabriel-benchmarks '*gabriel-benchmarks* 1
  '(5 (gabriel-div2-1 bench-div2:timit1) gc
      (gabriel-frpoly-1-10 bench-frpoly:timit1 10) no-gc))
\end{verbatim}

\section{Zippel Benchmarks}

The benchmarks in each of the Zippel groups consist of single operations (such as function calls or flavor instantiations) that need additional code to control the environment in which they execute.  For this reason each benchmark is defined using the more versatile macro {\bf Define-Benchmark}.  The macro expands each benchmark into a self-contained function, providing code for controlling page faults, executing normalization functions, determining which meters are collected while the benchmark is running, and cleaning up after the benchmark terminates.  In addition, it provides code for saving the collected meter values.

The parameter list for Define-Benchmark is (name pretty-name repcount \&body others).  {\bf Name} is the name of the symbol containing the expanded benchmark.  The benchmark code is stored on Name's property list as the value of the property Benchmark.
Name is used in all references to the benchmark, both for the execution and the report functions.  {\bf Pretty-name} is the print name associated with the benchmark and is the name printed with the benchmark results.

{\bf Repcount} is the number of times that the benchmark code, defined in :body or :real-body, is to be executed.  This can be confusing, because you also specify the number of iterations at the top level when executing the benchmarks.  Repcount defines the number of ``internal'' iterations, which are executed for each iteration specified at the top level.  All of the default meter values collected are the sums over all of the internal iterations.  The values recorded in the history structure are first divided by Repcount to get the most accurate value for a single iteration.  In the following example, Fixnum-Addition, Repcount is specified as 3000.  This means that if you execute Run-Benchmarks on this benchmark with a loop count of 1, it will execute 3000 times.  The meter values recorded for this one iteration will first be divided by 3000.

{\bf Others} consists of one or more keyword/value pairs, where the legal keywords are: :classes, :body, :bindings, :declarations, :real-body, :cleanup-form, :normalization, :meters, and :allow-page-faults.  These keyword/value pairs provide the flexibility for executing the benchmark in a specifically controlled environment.  To better explain the format and purpose of these pairs, let's look at the definition of the benchmark Fixnum-Addition.
\begin{verbatim}
(define-benchmark Fixnum-Addition "Fixnum Addition" 3000
  :classes (*zippel-benchmarks* *fixnum-arithmetic* *arithmetic*)
  :allow-page-faults nil
  :normalization (+ (normalization push-local)
                    (normalization store-local))
  :bindings ((v1 val1) (v2 val2) result)
  :body (setq result (+ v1 v2)))
\end{verbatim}
The value for the keyword {\bf :classes} is a list of global names to which the symbol Name will be added.
Classes make it easy to organize the benchmarks into one or more groups, for convenience in running and reporting results for related benchmarks.  In the example, Fixnum-Addition belongs to three classes, which makes it easy to run or report the entire suite of Zippel benchmarks or only one of the subgroups.

When the value of the keyword {\bf :allow-page-faults} is true, Define-Benchmark adds no additional code for controlling page faults.  However, when the value is nil, the macro inserts code to keep executing the benchmark until no paging occurs during the current execution.  Only the meter values recorded during this final execution are returned.

Normalizations are benchmarks themselves, and are run in the same environment as the rest of the benchmarks.  The purpose of normalization is to measure operations that are inherent in certain types of benchmarks.  For example, pushing a local value onto the stack is an inherent operation in all arithmetic operations.  Its performance is measured in the normalization benchmark push-local, and the result is used to normalize the results of all arithmetic benchmarks.  In our example, the normalization benchmark store-local is also specified, because the code contains a setq, which is an inherent operation.  We are not interested in the setq time, only in the fixnum addition time.

Normalization benchmarks are defined using the same macro as the other Zippels, Define-Benchmark.  By convention, they are designated as such by adding ``-normalization'' to the symbol name.  For example, the store-local normalization benchmark is designated Store-Local-Normalization.  All of the normalization benchmarks should be run before the other benchmarks are executed.  As each benchmark completes execution, the normalization value will be subtracted from the benchmark's recorded real-time and cpu-time.

The value of the keyword {\bf :bindings} is a list of variables and initial bindings that will be needed in the benchmark code.
The Define-Benchmark macro simply binds them using a Let statement.

The value of the keyword {\bf :declarations} is a list of declarations to aid in compilation or optimization.  The Define-Benchmark macro simply uses these as arguments of a Declare statement.

The argument {\bf :meters} provides a method of specifying meters that you want to collect in addition to the default ones.  The default meters include Disk-Wait-Time, Paging-Time, Hard-Page-Fault-Count, and Area-Size.  It is often desirable to collect additional information, such as :extra-pdl data.  There are approximately 40 meters that are defined in the Explorer system and can be collected by the performance software.  See the comments concerning the DefSysConst A-Memory-Counter-Block-Names in Run-Benchmarks for a listing and description of these meters.

The following example, Slinghash, shows the format for specifying the :meters argument.  Each sublist, which specifies one meter, consists of three members: meter-label, meter-form, and difference-function.  The meter list may consist of as many sublists as required.

{\bf Meter-label} is a required keyword for identifying the value that is saved for this meter.  The default meters always use nil as the label for their values, because their position in the history list is known at compile time.  The example shows the specification of two meters.  Labels not only facilitate looking at the data structure, but are needed by the reporting software to distinguish multiple meter values.

{\bf Meter-form} is the form that should be used for collecting the meter.  For the meters known in A-Memory-Counter-Block-Names, the form will always be (si:read-meter counter-name).  The Define-Benchmark macro expands the code to collect the measurement once before and once after the :body code.  Once the benchmark has executed, the earlier value is subtracted from the later value, using the third argument, {\bf difference-function}.  The default difference-function is minus (-).
For time values, microsecond-time-difference is used, to compensate for potential clock wrap-around, as shown for the second meter in the example.  The result of the subtraction is the value that is saved in the history list, along with the optional meter-label.
\begin{verbatim}
(define-benchmark slinghash nil 500
  :classes (*extended-zippels* *hash-benchmarks*)
  :bindings ((var 0) (hash-table test-hash-table))
  :meters ((lru (si:read-meter 'si:%least-used-page)
                nil)
           (nil (si:read-meter 'si:%total-page-fault-time)
                microsecond-time-difference))
  :body (GETHASH (SETQ var (1+ var)) hash-table))
\end{verbatim}
If there is a meter that you plan to include for all benchmarks, you may wish to make it a default meter.  This will prevent you from having to specify it with the :meters argument in each benchmark definition.  You can change the default collected meters by modifying the global constant {\bf Benchmark-Meter-List}.  Each element of Benchmark-Meter-List is a sublist with three members: Meter-Label, Form, and Difference-Function.  The format and functionality of the constant is identical to that of the :meters argument used in Define-Benchmark.

The chapter, Performance Data Structure, gives more information on the format of the history list.  The chapter, Reporting Results, explains how to access user-specified meter values using the performance reporting software.

The value for the keywords {\bf :body} and {\bf :real-body} contains the section of code or operation which Define-Benchmark is to expand into a benchmark.  The two keywords are mutually exclusive; only one of them should be specified in a given definition.  The value of either of the two keywords can only contain a single top-level form (which could be a Progn if necessary).

{\bf :body} is used when defining less complex benchmarks that need no initialization code.
With this keyword specified, Define-Benchmark will expand the benchmark to execute Repcount times, collecting the default and additionally specified meters.

{\bf :real-body} is used when you need greater flexibility than :body allows.  Using this keyword allows you to specify initialization code that you want executed prior to the actual benchmark code and not included in the metering.  The following example of the Bitblt benchmark shows how :real-body is commonly used.  The call to tv:prepare-sheet is initialization code that will be executed before, and excluded from, the metered code.  Benchmark-body is the performance macro that provides code to execute the benchmark (in this example, the call to Bitblt) the number of specified times, and collect the default and additional meters.  Define-Benchmark uses this macro when :body is specified.
\begin{verbatim}
(define-benchmark bitblt "Bitblt a 100 x 100 square" 200
  :classes (*extended-zippels* *graphics-benchmarks*)
  :bindings ((scrn-ary (SEND tv:main-screen :screen-array)))
  :cleanup-form (SEND terminal-io :set-char-aluf alu)
  :real-body
  (tv:prepare-sheet (tv:main-screen)
    (benchmark-body (200)
      (BITBLT tv:alu-xor 100 100 scrn-ary 10 10 scrn-ary 200 200))))
\end{verbatim}
The value of the final keyword, {\bf :cleanup-form}, is the form to be executed once the benchmark has executed and all meters have been collected.  In the previous example, the clean-up form will set the character alu function for terminal-io back to the value of alu.

Once the code section has been expanded, Define-Benchmark puts the benchmark on the property list of the symbol Name as the value of the property Explorer-Benchmark.  This data structure is discussed in detail in the section, Performance Data Structure.  When Perform-Benchmarks is called with the list of benchmarks to be executed, it gets the benchmark code from each benchmark name's property list.
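Putting the pieces of this section together: conceptually, the function that Define-Benchmark produces behaves roughly like the following hand-written sketch for the Fixnum-Addition example.  This is illustrative pseudocode only, not the actual macro expansion; the real expansion also contains the page-fault control loop, the normalization bookkeeping, the full set of default meters, and the code that records the results in the history structure.
\begin{verbatim}
;; Illustrative sketch only -- not the actual macro expansion.
(defun fixnum-addition-sketch ()
  (let ((v1 val1) (v2 val2) result)            ; the :bindings
    (let ((before (si:read-meter 'si:%disk-wait-time)))
      (dotimes (i 3000)                        ; Repcount iterations
        (setq result (+ v1 v2)))               ; the :body form
      ;; difference-function applied to the before/after readings
      (- (si:read-meter 'si:%disk-wait-time) before))))
\end{verbatim}
The value obtained this way would then be divided by Repcount (here 3000) before being stored in the history structure, as described above.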
\section{Interactive Benchmark Definition}Unlike the Gabriel and Zippel benchmarks which are compiled code, eachInteractive benchmark is an s-expression which is stuffed into theactive keyboard buffer and interpreted.  The s-expressions are stored ina script file which is read one form at a time.  Once the benchmark hascompleted execution, the results are stored on the benchmark name'sproperty list.  The code for the benchmark is not stored as it is forthe other suites.Each s-expression in the script file defines an Interactive benchmarkand simulates a single action a user would perform, such as ''meta-xcompile buffer''.  Each one must be of the following list form:(Test-Name Strings \#/Char Symbols).  {\bfTest-Name} is the name to beassociated with the benchmark.  The meter results are stored onTest-Name's property list.  The remaining three form items can be in anynumber and any order that makes sense for the current operation.  Eachof them will be stuffed into the keyboard buffer and executed.{\bf\#/Char} designates using the reader macro, \#/ to specify thecharacter Char.  (Refer to chapter 21 of the Lisp Reference Manual formore information on reader macros.)  {\bfStrings} are just strings thatwill be entered into the buffer, such as ''compile buffer''.  Finally,the characters of {\bfSymbols}' print-names will be entered into thebuffer.  The following sample benchmark which measures the performanceof deleting local files will make the explanation more clear.\begin{verbatim}    (Edit-Delete-Local-File #\m-x "delete file" #\Return(String local-dir) "tree.*.*" #\Return #\Y)\end{verbatim}The value of Test-Name is Edit-Delete-Local-File which is the nameassociated with the benchmark.  The remainder of the expressiondetermines what will be stuffed into the buffer and executed.  In thisexample three actions will occur:\begin{itemize}\item Meta-X delete file RETURN is stuffed and executed.\item filename-to-be-deleted RETURN is stuffed and executed.      
      File-name is the concatenation of (String local-dir) and
      "tree.*.*".
\item Y is stuffed and executed.  This is in response to the delete
      files confirmation prompt.
\end{itemize}

Everything in the list form except Test-Name is included in the meter
timings.

The special symbol {\bf Wait} can be used between form objects as needed
to prevent losing characters stuffed into the buffer.  When the symbol
Wait is read, the stuff routine waits until the buffer is again
available before stuffing the next form or object.  In the previous
example it might have been better to put a Wait symbol before the
filename designation.  The special Test-Name symbol {\bf Ignore} can be
used to execute a form that is not a benchmark and for which you do not
want to save the meter results.  The form will be executed normally but
the results discarded.  Ignore facilitates clean-up between iterations
so the next loop will start in the same environment as the previous one.

The following example shows the use of the Ignore and Wait symbols.  The
benchmark named edit-clean is not really a benchmark; its purpose is to
clean up the ZMACS environment before executing the next benchmark.  The
Ignore prevents the timing results recorded during edit-clean from being
saved.  The first Wait symbol ensures that this form will not execute
until the previous benchmark has completed.  The second Wait symbol
ensures that the Return is not missed.

\begin{verbatim}
edit-clean
(select-editor #\system #\e)
(ignore " " wait #\c-x #\k wait #\return "no" #\return #\return #\abort)
\end{verbatim}

The script file execution is begun by the function Junior, which has the
parameter list (Pathname \&optional Clear-p Name (Repcount 1)).
{\bf Pathname} is the pathname of the script file to be executed.
{\bf Clear-p}, when true, causes the performance software to forget any
known history of benchmarks run for Name.
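The stuffing rules above (strings, characters, and symbols become
keystrokes, Wait pauses until the buffer drains, and a Test-Name of
Ignore discards the meter results) can be pictured with a small sketch.
This is an illustrative Python model, not the Explorer's Stuff-Input
code; the function name, the "\#$\backslash$" character encoding, and
the buffer-ready hook are all invented for the sketch:

```python
# Illustrative model of processing one script form; not the Explorer code.
# Characters are modeled as strings prefixed with "#\".
def process_form(form, buffer_ready=lambda: True):
    """Return (test_name, keystrokes, record_results) for one script form."""
    test_name, *items = form
    keystrokes = []
    for item in items:
        if item == "wait":
            # Wait: pause until the keyboard buffer is again available.
            while not buffer_ready():
                pass
        elif isinstance(item, str) and item.startswith("#\\"):
            keystrokes.append(item[2:])    # a single character object
        else:
            keystrokes.extend(str(item))   # string or symbol print-name
    record_results = test_name != "ignore" # Ignore: run but discard meters
    return test_name, keystrokes, record_results

name, keys, record = process_form(("ignore", "no", "#\\Return"))
print(name, keys, record)  # ignore ['n', 'o', 'Return'] False
```

Note that in the model, as in the real suite, Wait affects only the
pacing of stuffing, while Ignore affects only whether results are kept.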
{\bf Name} is the system name to be associated with this benchmark run
and result data file, as described in the chapter Running Benchmark
Suites.  {\bf Repcount} is the number of times the benchmark suite is to
be executed.  Junior's main task is to create the process Stuff, which
is the driver that executes the benchmarks.

Process Stuff opens the script file for input, then reads and processes
each form until end of file.  As each form is read, Stuff saves
Test-Name in the special variables *all-benchmarks* and
*interactive-benchmarks* and creates a benchmark structure for the
current benchmark.  Next it calls Stuff-Input on each element in the cdr
of the script form.  Finally it calls Record-Benchmark to save the meter
results for the current benchmark.

There is a debug mode for the Interactive benchmarks which can be useful
when testing new script files.  The third parameter to Junior,
{\bf Name}, in normal mode is a symbol indicating the environment and
the result file.  However, supplying this parameter as a list causes
Stuff-Input to go into debug mode.  In debug mode Stuff waits after each
character is stuffed, slowing the interaction so it is easier to see
what is going on.

The script file may contain top-level symbols instead of the lists
described above.  These symbols act as labels that can be used for
including or excluding portions of the script file during debugging.
The Name list is a list of labels, optionally preceded by a minus (-)
sign.  Stuff skips over sections of the script that are not preceded by
one of the labels in the Name list, or, if the first thing in the Name
list is a minus (-), it skips the sections of the script that are in the
Name list.  This feature can be used to speed up debugging by making it
easy to skip over portions of the script that are already debugged.

The following example should clarify the previously made points.  It
contains the call to the function Junior and a subsection of the script
file, Debug.Script.
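The label-selection rule can also be expressed compactly.  This is an
illustrative Python model, not the Explorer code; the function name
section\_enabled is invented, and "-" stands for the leading minus sign:

```python
# Illustrative model of Stuff's label filtering; not the Explorer code.
def section_enabled(label, name_list):
    """Decide whether the script section under `label` should run.

    name_list is a list of labels, optionally beginning with "-":
    without "-", only the listed sections run; with "-", the listed
    sections are skipped and every other section runs."""
    if name_list and name_list[0] == "-":
        return label not in name_list[1:]
    return label in name_list

print(section_enabled("edit-misc", ["edit-misc", "edit-dired"]))  # True
print(section_enabled("create", ["edit-misc", "edit-dired"]))     # False
print(section_enabled("create", ["-", "edit-misc"]))              # True
```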
Notice that the parameter Name is a list of labels as described above.
Finding the Name parameter specified as a list, Stuff will switch to
debug mode.  Notice also the matching label ``edit-misc'' that appears
before the series of Edit benchmarks.  When process Stuff reads in this
top-level label, it will search the Name list parameter for a match.
Finding the label in the Name list, it will begin stuffing and executing
the script forms in debug mode.  When Stuff reads the next top-level
label, create, it will not find a match in the Name list and will not
execute the Select series of benchmarks.  If the first symbol in the
Name list had been a minus sign, then Stuff would have skipped over the
Edit series and executed the Create series in debug mode.

\begin{verbatim}
(junior "lm:bench;debug.script" T '(edit-misc edit-dired) 1)

edit-misc
(edit-list-buffers #\c-x #\c-b wait " ")
(edit-select-buffer #\c-x #\b wait "tree" #\escape wait #\return)
(edit-scroll-forward #\c-v)
(edit-scroll-backward #\m-v)
(edit-select-previous-buffer #\c-m-l)

create
(select-converse-create #\system #\control-c)
(select-inspector-create #\system #\control-i)
(select-lisp-create #\system #\control-l)
\end{verbatim}

\end{document}