Zoem User's Manual

December 10, 2021

zoem-21-341

Zoem: <Dutch> The sound made by electrical devices and flying bugs. Pronounced: zoom or zum; the vowel is short.

Zoem is a general macro/programming language with filtering capabilities. It transforms text in two stages.

In the first stage the text is scanned for macro escape sequences. The core zoem language consists of so-called primitive macros. They provide a wide spectrum of functional behaviour, including data storage, arithmetic evaluation, I/O facilities, iteration and flow control, dictionary stacks, system interaction, and regular expressions. As with any macro/programming language, the real power comes from the ability to create new user-defined macros in terms of primitives and other user-defined macros. Macro expansion can be arbitrarily delayed, and inside-out evaluation is supported.

A useful feature is the combination of anonymous macros, lists, and the \apply#2 primitive, constituting a callback mechanism in zoem. Another feature is the automatic popping and pushing of dictionaries with the begin/end environment, providing shadowing and localization.

In the optional second stage, the text is filtered. Two filter scopes are distinguished. The first is called device scope and is always enclosed in a filter escape sequence. The second, plain scope, is everything else. Filtering mechanisms are provided for both. The filtering language is useful when the output is meant to be in an external format with device-specific escape sequences, such as HTML or troff. Conversions are specified in device-specific filtering rules that are applied in plain scope, so that the zoem input is device agnostic. By setting up different filtering rules the same input can be used to generate different outputs.

This manual covers the whole zoem language. A large part of it is only interesting for someone writing a reusable macro package. A smaller part is still interesting for someone who is just using such a package, but might want to add some utilities and shortcuts of his own. The part where file inclusion, macro definitions, and macro expansions are explained is required reading for anyone considering or planning to use zoem.

Zoem supports several programming constructs, immediate and postponed expansions, stream character filtering, easy I/O facilities, integer arithmetic, and a whole lot more. Its main aims are:

• Providing the building blocks for a structural and programmable approach. Section 3.1 contains an overview of the zoem primitives and their use.

• Accepting a pleasant syntax that does not require much thinking, favouring simplicity and rigor over looseness and context-dependent rules.

• Creation from the keyboard while minimizing keystrokes.

• Few meta characters. Zoem achieves this by having a single special character and a reasonably restricted syntax.

• Adding filtering capabilities so that multiple devices can be addressed.

3.1 Features

Listed below are some features of the zoem primitives. In practice there are two kinds of zoem files. The first is a zoem macro file, which should contain macros defined in terms of lower-level macros, with zoem primitives at the lowest level. The second is a zoem document file, which should import such a macro file and only use the high-level macros defined in that macro file. Additionally, a document file can define some high-level macros of its own, in terms of low-level macros, zoem primitives, or a mixture of both.

• Macros with arguments (\def#2, \set#2, and \setx#2). Overloading of key names is allowed, i.e. different keys with the same name are distinguished by the number of arguments they take. Zoem primitives look like regular macros, but usually they expand their arguments before use. User macros can be equipped with this behaviour by wrapping them in \apply#2.

• Support for a variable number of arguments (see the vararg section).

• Easy file, STDIN, STDOUT, and STDERR input/output, and nested file inclusion (\dofile#2, \write#3, \finsert#1, and \zinsert#1).

• Extensive support for arithmetic with the primitives \let#1, \f#2, \f#3, \fv#2, and \eqt#3.

• Operators returning booleans (\defined#2, \cmp#3, \eqt#3), and control operators acting on those (\while#2, \if#3).

• The for-like primitives \apply#2 and \table#5.

• Match and substitution capabilities using POSIX regexes (\inspect#4).

• The switch primitive \switch#2, employing the vararg construct.

• Localized expansions (\eval#1) and meta-zoem (\! and \!#1).

• A user dictionary stack that can be manipulated using \push#1 and \pop#1. An environment mechanism for doing \begin{stuff} .. \end{stuff} stuff (see \env#4). This mechanism creates name scopes by pushing and popping to/from the dollar dictionary stack. Environments may take arguments; one particularly useful application is that local variables (e.g. local to an itemize environment) can thus be specified by the user.

• Storage of data by multiple string indexing: arbitrary data can be stored in a tree by indexing nodes with (arbitrary) strings. Refer to the Tree data section.

• The ability to nicely format macros (see \formatted#1).

• Syntactic sugar for writing SGML-style mark-up and having it checked.

• Executing system commands, possibly sending data to STDIN and receiving data from STDOUT (\system#3).

• And more.

Invoking zoem from the command line

You use zoem by invoking it from the command line. The normal mode of operation is that you specify a file for zoem to chew up and spit out. This is called the entry file, and its name usually has the extension .azm. A common invocation looks as follows:

zoem -i mcl.azm

The -i flag specifies the entry file. It is not necessary to write the .azm extension; zoem will append it for you. The preceding could also have been entered as

zoem -i mcl

In either case, zoem will set the session key \__fnbase__ to the base name of the entry file, i.e. the name stripped of its .azm suffix and any leading path components. In this example, the key \__fnbase__ will get the value mcl. If you have an input file without the .azm extension, you need to use the -I option.

Zoem writes its output to a default output file which is named according to three rules. The rules are:

• If the -o flag was given a value, say somestr, zoem will write to the file named somestr.

• If -o was not supplied but the -d flag was used with an argument, say zyx, zoem will write to the file named \__fnbase__.zyx. The -d flag also results in the macro \__device__ being set to zyx.

• If neither -d nor -o was given, zoem will write to the file named \__fnbase__.ozm.
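The naming rules above can be summarised in a few lines of Python. This is only a sketch of the rules as stated in this section, not zoem's actual implementation; the function names fnbase and default_output are made up for illustration, and the special hyphen argument (STDIN/STDOUT) is not modelled.

```python
import os


def fnbase(entry):
    """Base name of the entry file: no leading path, no trailing .azm."""
    name = os.path.basename(entry)
    if name.endswith(".azm"):
        name = name[: -len(".azm")]
    return name


def default_output(entry, o=None, d=None):
    """Apply the three rules in order: -o wins, then -d, then .ozm."""
    if o is not None:
        return o
    if d is not None:
        return fnbase(entry) + "." + d
    return fnbase(entry) + ".ozm"


if __name__ == "__main__":
    print(default_output("mcl.azm"))            # neither -o nor -d
    print(default_output("mcl", d="html"))      # -d html
    print(default_output("mcl", o="out.txt"))   # -o out.txt
```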

It is possible to change the default output file from within the document; this is achieved with the \writeto#1 primitive. Zoem can mingle default output with output to other files; use the \write#3 primitive for that.

Both the -i and -o flag accept a hyphen as argument, meaning respectively that zoem will read from STDIN and write to STDOUT.

Specifying just zoem and entering a return will cause zoem to enter interactive mode, in which it reads from STDIN and writes to STDOUT. Interactive mode currently should catch any errors occurring, so it is a good way of experimenting and testing. By default, interactive mode reads chunks that are ended by a single dot on a line by itself. This behaviour can be changed by setting the session variable \__parmode__ using the -s option. Using zoem -l parmode shows the bits that can be set in this variable. It is for example possible to make zoem read paragraphs (ended by two or more consecutive newlines).

Note: There is a difference between specifying no output stream (i.e. not using the -o option) and explicitly specifying -o -. In the latter case, zoem will never enter interactive mode. Should you need to insert zoem into some pipe sequence, then you need to use -o -.

Tracing errors: If your document contains erroneous input (e.g. using a macro that was never defined), zoem will by default print the approximate corresponding line number of the current input file and the last key it saw, and then exit. If that does not suffice in tracking down the error, you have a choice of options. One possibility is to use one of the various tracing modes described below and in the zoem interpreter manual. Another possibility is to specify the -x option which says to enter interactive mode should an error occur. This enables you to inspect the values of keys defined or used in the problematic area. A selection of other options is given below. For the full story, consult the manual page of the zoem interpreter.

-h

lists all flags accepted by zoem with a short description for each.

-s foo=bar

Sets key \foo to bar.

-e <any>

Zoem will evaluate <any>, writing any result text to STDOUT, and then exit.

-E <any>

Zoem will evaluate <any>, writing any result text to STDOUT, and then proceed.

-x

If an error occurs, zoem stops processing and enters interactive mode.

-l <str>

lists all entities specified by <str>. It can be any of all, filter, legend, alias, session, trace, or zoem. Repeated use is allowed. In fact, zoem will only check whether the target is present as a substring, so

zoem -l legendzoem

will print the legend and the list of zoem primitives.

--trace

This traces (prints) all keys encountered, and prints possibly truncated arguments. Zoem has several other tracing flags; use the -h flag or refer to the zoem manual page for more information.

Tracing can be set from within the document using the \trace#1 primitive. All or part of the data tree can be output from within the document using the \%dump primitive (refer to the Tree data section).

--stats

When zoem is done, it prints statistics about the primitive name table and about the user name table.

Zoem syntax and parsing

5.1 Syntax and nomenclature

Zoem parses text which may contain zoem escape sequences; these are sequences that have special meaning and cause zoem to do special processing. Every escape sequence starts with a backslash, no exceptions. There are three kinds of sequences that are macros, which may or may not take arguments. These are zoem primitives, user keys, and dollar keys.

There are currently about sixty zoem primitives; these are listed in the section The zoem language. Sixty is quite a lot; it includes convenience sibling sets such as \set#2, \setx#2, \def#2, and \defx#2, and entries covering a variety of areas such as I/O, arithmetic, testing, control, string conversions, formatting, shadowing (scopes), and all the other stuff listed in the Topic index section.

Additionally there are a number of zoem built-ins that are defined in terms of primitives. Built-ins live in the same dictionary as primitives and behave the same in all aspects. The next section has some further remarks on the differences and resemblances between primitives/built-ins on the one hand and user macros on the other hand.

User keys and dollar keys are discussed in the Macro expansion section. Arguments are shipped by delimiting them with curly braces, as in

\thiskey{takes}{\bf{two}\it{arguments}}.

No characters are allowed in between (the delimiting curlies of) two arguments (but take note of the handy \formatted#1 primitive). See the Macro expansion section for more information. Zoem is very strict in the syntax it accepts, but it guarantees to accept a text in which each backslash \ is escaped as \\ (i.e. a text in which all consecutive runs of backslashes have even length).

An active backslash is any backslash that is not made inactive by an active backslash immediately preceding it. The first backslash seen by zoem (proceeding sequentially through the text) is active. This is one incomprehensible way of stating the obvious, and I bet you know what I mean anyway. An active backslash must always have a meaning known to zoem. If zoem does not get it, it will complain and exit. The meaning (i.e. class) of the escape sequence introduced by an active backslash is determined by the character immediately following it. A list is given below.
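The definition of an active backslash amounts to a single left-to-right scan. The Python sketch below only illustrates that definition; it is not taken from the zoem sources, and the function name active_backslashes is made up.

```python
def active_backslashes(text):
    """Return the indices of the active backslashes in text.

    A backslash is active unless the character immediately before it
    is itself an active backslash.
    """
    active = []
    prev_active = False
    for i, ch in enumerate(text):
        if ch == "\\" and not prev_active:
            active.append(i)
            prev_active = True
        else:
            # Either an ordinary character, or a backslash that was
            # deactivated by the active backslash preceding it.
            prev_active = False
    return active


if __name__ == "__main__":
    # In a\\b\c the first backslash escapes the second; the third is active.
    print(active_backslashes(r"a\\b\c"))
```

In a run of consecutive backslashes every other one is active, which is why a text in which all such runs have even length never introduces a macro or any other special sequence.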

Within arguments, curlies not functioning as argument delimiters must also be escaped if they are not balanced. It is best practice to escape all non-argument-delimiting curlies, but I never do so myself unless they are not balanced. An escaped curly is a curly preceded by an active backslash. An active curly is a curly that is not escaped. A pair of balanced curlies consists of an active left curly that matches an active right curly, where all escaped curlies in between are disregarded. A block is anything delimited by balanced curlies. The word scope is most often used to distinguish between device scope and plain scope; these are the two kinds of parse scopes. An environment scope refers to the stuff enclosed by instances of the \begin#2 and \end#1 primitives. So-called name scopes are entered and exited by \push#1 and \pop#1.
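As an illustration of the terminology, the following Python sketch checks whether the active curlies in a piece of text are balanced, disregarding escaped curlies. It is an assumption-laden model of the definitions above (the name curlies_balanced is invented), not zoem's parser.

```python
def curlies_balanced(text):
    """True if all active (unescaped) curlies in text pair up."""
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Active backslash: skip it together with the character it
            # escapes, so that \{ , \} and \\ never count as curlies.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # right curly without a matching left one
        i += 1
    return depth == 0


if __name__ == "__main__":
    print(curlies_balanced(r"{a \{ b}"))  # escaped curly is disregarded
    print(curlies_balanced("{a { b}"))    # one left curly stays open
```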

5.2 Primitives, built-ins and user macros

Zoem distinguishes between primitives and built-ins on the one hand and user macros on the other hand. Consider the following slightly contrived example.

\def{fib#1}{
   \push{fibonacci}
   \set{a}{1}
   \set{b}{1}
   \set{c}{0}
   \while{\let{\a <= \1}}{
      \setx{c}{\a}
      \setx{a}{\let{\a + \b}}
      \write{-}{txt}{\c\|}
      \setx{b}{\c}
   }
   \pop{fibonacci}
}

The example is contrived in that zoem is not the most appropriate language to compute Fibonacci numbers. The reason for using it is that extracts from existing macro packages require more context and are simply more boring.

In the example, the following macros are primitives: \push#1, \pop#1, \def#2, \set#2, \while#2, \let#1, \setx#2, and \write#3. The example defines a user macro \fib#1, which can be invoked e.g. as \fib{100}. Doing this either from a file or from interactive mode should give the output below.

1 2 3 5 8 13 21 34 55 89
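To make the control flow of \fib#1 easier to follow, here is the same loop transcribed into Python. It mirrors the \set#2, \setx#2, and \while#2 steps one for one and collects the values that the zoem version writes; the function name fib_trace is made up, and the dictionary push/pop has no counterpart here because the Python variables are already local.

```python
def fib_trace(limit):
    """Mirror of the \\fib#1 loop; returns the values it would write."""
    a, b, c = 1, 1, 0          # \set{a}{1} \set{b}{1} \set{c}{0}
    written = []
    while a <= limit:          # \while{\let{\a <= \1}}
        c = a                  # \setx{c}{\a}
        a = a + b              # \setx{a}{\let{\a + \b}}
        written.append(c)      # \write{-}{txt}{\c\|}
        b = c                  # \setx{b}{\c}
    return written


if __name__ == "__main__":
    print(" ".join(str(n) for n in fib_trace(100)))
```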

From the above it can be seen that a macro (primitive or user-defined) is in this text often referenced by its signature. The signature contains both the name of the macro and the number of arguments it takes, separated by the # (octothorpe) character. The octothorpe and ensuing integer are omitted if a macro does not take arguments. A new macro is defined by specifying the required signature (without the leading backslash) as the first argument of one of the definition macros. In this text a signature is usually prefixed with the backslash.
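The signature convention can be illustrated with a small Python helper that splits a signature into its name and argument count. This is a sketch for illustration only; parse_signature and make_signature are invented names, and the handling of a # inside a quoted key name is an assumption.

```python
def parse_signature(sig):
    """Split a signature such as 'fib#1' into (name, number of arguments)."""
    name, sep, count = sig.rpartition("#")
    if not sep or not count.isdigit():
        # No octothorpe with a trailing integer: a key without arguments.
        return sig, 0
    return name, int(count)


def make_signature(name, nargs):
    """Inverse operation: the octothorpe part is omitted for zero arguments."""
    return name if nargs == 0 else "%s#%d" % (name, nargs)


if __name__ == "__main__":
    print(parse_signature("fib#1"))
    print(parse_signature("a"))
    print(make_signature("write", 3))
```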

The example above also defines user macros a, b, and c. The \set#2 primitive will not warn when a previous definition exists. The \setx#2 acts similarly but will also first evaluate the value assigned to the macro. Finally, \push#1 and \pop#1 temporarily create a new dictionary in which the definitions are stored. This is one easy way to ensure that no other definitions are overwritten. This level of care is generally not required though.

The following are useful to know about primitives, built-ins and user macros.

•

Primitives and built-ins live in one namespace (or dictionary), user macros live in another.

•

A built-in is a special macro provided by zoem. It is defined in terms of one or more primitives, but its definition lives in the same dictionary as primitives do. It is called a built-in because its definition is built into the zoem interpreter. Example: begin#1 is defined as \begin{\1}{} (cf \begin#2). The full list can be obtained by issuing zoem -l alias.

•

Primitives and built-ins can be shadowed by user macros, but a warning will be issued. You can test this by issuing e.g. \def{def#2}{\set{\1}{\2}}. This particular piece of code redefines \def#2 as \set#2 by shadowing the primitive \def#2 as a user macro, losing the property of \def#2 that it warns if a key already exists.

•

Primitives and built-ins can always be accessed by prefixing the name with a right quote, as in \'def{foo}{bar}. The prefixed primitive syntax also has the advantage that it is slightly faster, although zoem speed is likely not something one should worry about.

•

It is impossible to access a user macro with the quote prefix syntax.

It is probably a good idea in macro packages that export functionality to use the primitive quote prefix syntax. This protects the package from user redefinitions. At the same time, the ability to shadow zoem primitives implies that user macros (also those exported by macro packages themselves) are protected against potential clashes with zoem primitives that may be introduced in later versions of the language. The Fibonacci example looks as follows using the quote prefix syntax.

\'def{fib#1}{
   \'push{fibonacci}
   \'set{a}{1}
   \'set{b}{1}
   \'set{c}{0}
   \'while{\'let{\a <= \1}}{
      \'setx{c}{\a}
      \'setx{a}{\'let{\a + \b}}
      \'write{-}{txt}{\c\|}
      \'setx{b}{\c}
   }
   \'pop{fibonacci}
}

The quote mechanism only works for zoem primitives and built-ins that follow the syntax of user macros. This includes names starting with a dollar $ or with a double quote ". The primitive \$#2 and the built-ins \"" and \""#1 are the only examples in this category. The quote mechanism does not work for special zoem primitives such as data keys (Tree data), delay keys (\!#1), or XML syntactic sugar (SGML/HTML/XML syntactic sugar cubes).

5.3 List of escape sequence classes

This is a list of escape sequence classes recognized by zoem, indexed by the (set of) character(s) triggering the class(es) — this assumes that the character in question is preceded by an active backslash.

$[_a-zA-Z]*

[dollar key] A sequence starting with a dollar sign possibly continued with underscores and alphanumeric characters. Introduces a dollar key. No dollar signs are allowed in the remainder, and the first non-alphanumeric non-underscore character terminates the sequence. The primary use of dollar keys is that they are set by \begin#2 and \end#1. Dollar keys live in the dollar dictionary stack, which is pushed and popped by \begin#2 and \end#1. Nested begin/end scopes can thus safely associate different values with the same key name. Refer also to the Macro expansion, The zoem language, and Dictionary stacks sections.

Note: \$#2 is the only zoem primitive starting with a dollar.

[_a-zA-Z][_a-zA-Z0-9]*

[user key/zoem primitive] A sequence starting with an underscore or an alphabetic character, with only underscores and alphanumeric characters in the remainder. Introduces a user key or a zoem primitive. The first non-alphanumeric non-underscore character terminates the sequence. These keys live in the user dictionary stack, which the user can control with the \push#1 and \pop#1 primitives. Refer also to the Macro expansion, The zoem language, and Dictionary stacks sections.

A sequence consisting of a single underscore (i.e. not followed by an alphanumeric character) introduces an anonymous key.

"

[user key or zoem built-in] Starts a user key, which is only different from the user keys mentioned above in that it looks different. The sequence is terminated by a closing ". In between, anything is allowed except a backslash or curly. This is used for creating mnemonic names such as

\"man::author"
\"man::section"
\"man::version"
\"html::title"
\"html::charset"

These keys live in the user dictionary stack. See the Macro expansion section and the Dictionary stacks section.

There are two zoem built-ins that take this particular form. These are \"" and \""#1. Both expand to nothing. The first can be used to temporarily separate two pieces of text, as they will be joined after expansion. The second can be used to quickly outcomment sections of text.

'

[primitive quote prefix] As seen above, primitives and user keys largely live in the same syntactic namespace. It is possible to unambiguously invoke a primitive by inserting a right quote to the left of the primitive key or built-in. Refer to section Primitives, built-ins and user macros.

The user keys live in a stack of dictionaries. Dictionaries can be pushed and popped using \push#1 and \pop#1. The default user dictionary is always present and acts as a global namespace. It is possible to retrieve a key \foo directly from the global namespace using the syntax \''foo, even if it is shadowed in stacked dictionaries. It is possible to set a key in the global namespace using \set#3.

%

[data key] A sequence starting with a percent sign. The percent sign is followed by a number of scopes. This is used to access multi-dimensional data storage. Such data is stored using the \def#2 primitive or one of its siblings. Refer also to section Tree data.

[1-9]

[positional parameter] A single (character encoding a) positive digit. The sequence backslash followed by digit is called a positional parameter. It is only interpreted in the second argument of \set#2, \setx#2, \def#2, and \defx#2, and in the definition part of an anonymous key (which can be an argument to \apply#2 and \inspect#4). In all these instances, the sequence denotes a positional parameter into which the corresponding argument will be interpolated when a key with arguments is used. It is allowed in other places though, as it is possible in zoem to create key definitions dynamically (see e.g. the \setx#2 primitive). Note that during interpolation, positional parameters that are enclosed by the delay scope \!{..} will not be interpolated (see \!#1). The status of this feature is not entirely clear.

<newline>

[strip newline] The newline will be stripped during the interpretation stage. If you want the newline to be stripped during the file-read preprocessing stage use the sequence \:{/}, which is a special case of the comment sequence (see below).

:

[preprocessing sequence] There are a few preprocessing sequences, which are evaluated during file read (cf the File read section). The most important preprocessing sequence is simply the sequence \: followed by whitespace, an alphanumeric character, or a backslash. It introduces a comment up till and excluding the next newline, which is stripped.

|

[zoem glyph] Comprises a special two-character sequence that can be given a device-specific meaning. It is customarily used to encode a line break. To zoem, this sequence is more or less the same as a 'normal' character. See the \special#1 primitive.

~

[zoem glyph] It is customarily used to encode a non-breaking space. See the entry above.

-

[zoem glyph] It is customarily used to encode a long dash. See two entries back.

\

[backslash] Denotes a literal backslash.

{

[left curly] Denotes a literal left curly.

}

[right curly] Denotes a literal right curly.

*

[glyph sequence] Starts a glyph sequence or constant sequence. Refer to the Device scope section and to the \constant#1 primitive.

<

[syntactic sugar] This introduces syntactic sugar for directly writing SGML-style mark-up, such as HTML and XML (e.g. DocBook). Refer to the SGML/HTML/XML syntactic sugar cubes section.

@

[at scope] Typically seen in macro package files only. Starts a special instance of device scope called at scope. The sequence \@ must immediately be followed by a pair of balanced curlies, so at scope always appears as \@{..}. Refer to the Scope dichotomy and Device scope sections for more information.

@e

[at HTML entity] \@e{ent} will expand to the HTML character entity &ent; — it is equivalent to, somewhat easier to type and minimally shorter than \@{&ent;}.

&

[and scope] Typically seen in macro package files only. May only be used within device scope, and implements a limited form of macro expansion within that scope. The sequence \& must immediately be followed by a pair of balanced curlies, so and scope always appears as \&{..}. Refer to the Scope dichotomy and Device scope sections for more information.

`

[back quote - formatting escape] Typically seen in macro package files only. Must be followed by a pair of balanced curlies enclosing a formatting sequence. This is only recognized within the \formatted#1 primitive. This primitive removes all literal whitespace it encounters in its argument; the formatting sequences are transformed to the whitespace characters encoded by them.

,

[comma - atomic separator] Typically seen in macro package files only. This is interpreted during filter time, and is always mapped to nothing. Use it for glueing things as in \foo\,1, which will result in theresultoffoo1.

!

[delay sequence] Typically seen in macro package files only. Introduces a zoem meta sequence. Such a sequence consists of a maximal run of consecutive exclamation marks, possibly followed by a single block. It can be used to delay interpretation during any number of interpretation stages, and is useful to delay interpretation, perhaps even in a nested fashion, for arguments in keys such as \apply#2 that expand one or more of their arguments before use. The run of exclamation marks (or 'bangs' as they are internally called) actually comprises an argument to the underlying primitive, so the two primitives (one taking a single block argument) internally have respective signatures !#1 and !#2. Externally though, they are just referred to as \! and \!#1.

=

[inline files] Rarely used feature. Starts either a sequence of the form \={fname}, which begins a so called inline file named fname, or a sequence of the form \==, which ends such an inline file. Refer to the File read section.

This leaves 0>()[]?^#/.; for future use. Hopefully very few of these will ever acquire meaning. If the sequence \# acquires meaning, it will probably be for encoding Unicode scalar values.

5.4 Parsing stages

Parsing is separated into three stages, conceptually. Zoem knows two different parse scopes, plain scope and device scope. These are mentioned below, and explained in the Scope dichotomy section. The three stages are:

• File read

• Macro expansion / file inclusion - only plain scope is seen.

• Filtering - both plain scope and device scope are filtered. Device directives that lay hidden in device scope are interpreted during output.

A file is read in chunks, if possible. The requirement is that chunks must end on lines and be in the outermost scope. The default minimum chunk size is approximately one megabyte. Chunks are processed by recursively chunking them into smaller chunks as dictated by macro expansion. As soon as a chunk is no longer subject to macro expansion it is immediately filtered and output.

Macro expansion is done recursively. Whenever a macro is encountered, it is replaced by its expansion, and the result is again fed to the parser. Evaluation is not necessarily lazy; that is, during macro expansion the expander may expand arguments before they are interpolated and substituted in the macro definition. This inside-out evaluation can recurse if necessary. Many zoem primitives evaluate one or more of their arguments before use. The default behaviour for user macros is lazy evaluation. This can be changed however by wrapping both the macro and its arguments in \apply#2. Expansion can be delayed using \! and \!#1, so different arguments can be treated differently.

It is important that the result from the second stage is still valid zoem input. If you re-feed it to zoem, file read and macro expansion are usually no-ops (unless some interpretation delay-magic was used), and the syntax is guaranteed to be acceptable to zoem. This is because device scope is not touched during the first two stages, and device-specific text (which most likely does not conform to zoem syntax) always lies hidden in that scope. There are three kinds of escape sequences introducing device scope; these are described in the Device scope section.

This is used for example when creating a table of contents; you can write expanded but unfiltered content to a file and read it in during the following run. It is important that such content is fully expanded, because you want things like index numbers and references as they are at the time of macro invocation. It is equally important that what you read back in is still valid zoem input; this is simply achieved by withholding filtering. When the table of contents is read in, it can be subjected to filtering, and this is the right way to do toc stuff in Zoem.

5.5 File read

File read - stripping comments, reading inline files.

Zoem searches for files included via \dofile#2 or one of its built-in aliases in a number of places if it cannot find the file in the current directory. The precise way of searching is documented in section File search path.

\:

In most cases, this sequence introduces a comment, namely where it is followed by whitespace, an alphanumeric character, or a backslash. It introduces a comment up until and excluding the next newline, which is stripped.

The sequence \:{/} introduces a comment up till and including the newline. This feature can be useful within the \protect#1 primitive, as it is the only way to delete actual newlines within the argument of that primitive.

The sequence \:{!} is replaced by a backslash. The single use currently known is to make it easy to quote zoem input containing comment sequences. This

\protect{\foo \bar \zut \:{!}: this will end up as a comment. }

will result in the following

\foo \bar \zut \: this will end up as a comment.

\={fname}

starts inline file named fname at the next line, removes remainder of line after the \={fname} sequence. When using the \dofile#2 primitive or one of its four built-in aliases, an inline file takes precedence over regular files in the file system, whether it is present (as a regular file) or not. See below. This feature can be used to ship zoem input in one piece while putting the macro parts at the end. \zinsert#1 can address inline files as well, but \finsert#1 cannot. The reason for this is that inline files have to satisfy zoem syntax, whereas \finsert#1 can be used to read arbitrary data.

The future will probably bring a zoem option that creates such a self-contained file automatically from the zoem entry file.

\==

ends inline file, removes remainder of line.

The above applies to any file read at any stage. Inline files may occur in any file included at any time, but they do not nest.
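The three \: forms described above can be modelled by a short scanner. The Python sketch below is one reading of the text, not zoem's file reader: the function name strip_comments is invented, inline files and any other preprocessing sequences are ignored, and it assumes that for the plain comment form the comment text is removed while the newline itself is kept.

```python
def strip_comments(text):
    """Apply the \\: comment rules to text (illustrative model only)."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "\\":
            out.append(text[i])
            i += 1
            continue
        # text[i] is an active backslash from here on.
        if text.startswith("\\:{!}", i):
            out.append("\\")              # \:{!} is replaced by a backslash
            i += 5
        elif text.startswith("\\:{/}", i):
            end = text.find("\n", i)      # comment including the newline
            i = n if end == -1 else end + 1
        elif (text.startswith("\\:", i) and i + 2 < n
              and (text[i + 2].isspace() or text[i + 2].isalnum()
                   or text[i + 2] == "\\")):
            end = text.find("\n", i)      # comment excluding the newline
            i = n if end == -1 else end
        else:
            out.append(text[i:i + 2])     # any other escape: copy verbatim
            i += 2
    return "".join(out)


if __name__ == "__main__":
    print(repr(strip_comments("a \\: note\nb")))
    print(repr(strip_comments("a\\:{/} note\nb")))
    print(repr(strip_comments("\\:{!}: quoted")))
```

Note that the backslash produced by \:{!} is not rescanned, which is what allows quoted input to keep its comment sequence.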

The zoem entry file is the single file that is specified on the command line. This is the main file, from which other files can be included if desired.

Zoem entry files usually have the extension .azm, which is mnemonic for A ZoeM file. This is required if the -i option is used. Arbitrary entry file names can be specified using the -I option. It is not uncommon to generate sibling files with .roff, .html, .zmt (zoem table of contents), and .zmr (zoem references) extensions; however, this is all configurable in user space and not part of zoem itself. There are no restrictions on names of files that are included from the entry file. Inclusion is done recursively.

The future will probably bring a second extension that is allowed, namely .ezm for Expanded ZoeM file, which is a self-contained file in which every included file is present as an inline file.

5.6 File search path

If zoem cannot find a file in the current directory, it attempts to find the file in one of three different ways. These are, in the order in which they are attempted:

• The environment variable $ZOEMSEARCHPATH is checked. It may contain a listing of paths separated by whitespace or colons.

• The zoem variable \__searchpath__ is checked. It must contain a listing of paths stored as a vararg, i.e. a sequence of paths where each path is delimited by curly brackets. DO NOT overwrite this variable, but rather append or prepend to it. Most likely zoem was configured and compiled locally on your system, in which case \__searchpath__ contains the path necessary to find the macro packages man.zmm, faq.zmm, and ref.zmm.

• The path of the file currently being parsed is used. Assume that file foo contains \import{/a/b/c/bar}. If file bar wants to include file zut, which is in the same /a/b/c/ directory, it need not prepend a path but can just issue \import{zut}. Should the previous search mechanisms fail to find zut, then zoem will as a last resort deduce the path from /a/b/c/bar. This feature is probably rarely needed, if ever at all.
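The search order described above can be modelled in a few lines of Python. This is only a sketch of the lookup order, not zoem's implementation; the function name find_file, its parameters, and the exists hook are hypothetical and serve illustration only.

```python
import os
import re

def find_file(name, cwd, env_value, zoem_searchpath, current_file,
              exists=os.path.isfile):
    # Return the first location where `name` is found, or None.
    candidates = [cwd]
    # $ZOEMSEARCHPATH: paths separated by whitespace or colons.
    candidates += [p for p in re.split(r"[\s:]+", env_value or "") if p]
    # \__searchpath__: a vararg, modelled here as a list of paths.
    candidates += list(zoem_searchpath)
    # Last resort: the directory of the file currently being parsed.
    if current_file:
        candidates.append(os.path.dirname(current_file))
    for directory in candidates:
        path = os.path.join(directory, name)
        if exists(path):
            return path
    return None
```

With file bar living in /a/b/c and nothing else configured, find_file("zut", ".", "", [], "/a/b/c/bar") would only locate zut through the last-resort step.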

Macro expansion

5.7

Macro expansion consists of recursive file inclusion and macro expansion. All zoem primitives and user keys are recursively expanded until none remains. Zoem primitives and user keys take one of the following forms:

\abc_0123_

A key with alphanumerics and underscores only. Ends with any other character. All zoem primitives but one have this form.

Note: \_ denotes an anonymous key, see the Anonymous keys section.

These keys live in the user dictionary stack. Initially, there is only one dictionary. The stack can be manipulated using the \push#1 and \pop#1 primitives.

\"abc::def-ghi.jkl,mno+qrs"

A quoted key. Almost anything in between quotes is allowed. Always ends with a quote. No zoem primitive has this form. These keys live in the same user dictionary stack as the keys above.

\$abc_0123_

A key introduced with a dollar sign. The name may further consist of alphanumerics and underscores and it ends with any other character. These keys live in the dollar dictionary stack. A dictionary is pushed with every occurrence of \begin#2, and that dictionary is popped with the corresponding occurrence of \end#1.

Further note: \$#2 is a zoem primitive.

All three types of keys may take arguments, and overloading is allowed:

\foo                          \: signature foo
\foo{bar}                     \: signature foo#1
\foo{bar}{bop}                \: signature foo#2
\$foo{bar}{baz}               \: signature $foo#2
\"foo::oof"{zut}{zit}{zot}    \: signature "foo::oof"#3

is an ensemble of valid and unique keys, which can be defined for example by

\def{foo}{FOO}
\def{foo#1}{The FOO of \1}
\def{foo#2}{The FOO of \1 and \2}
\def{$foo#2}{The $FOO of \1 and \2}
\def{"foo::oof"#3}{\foo{\1}{\2}\foo{\2}{\3}}

Additionally, zoem allows the definition of constant keys that map directly into device space and are ignored during macro expansion. Usage of such keys looks like \*{'e} or \*{(c)} and is detailed later on.

A sequence \k where k is in 1-9 is allowed within anonymous keys (as used for example in \apply#2 and \inspect#4) and in the definition argument of the \def#2 primitive and its siblings \defx#2, \set#2, and \setx#2. It indicates the position(s) where arguments should be interpolated. Note that during interpolation, positional parameters that are enclosed by the delay scope \!{..} will not be interpolated (see \!#1). The status of this feature is not entirely clear.
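To make overloading and positional interpolation concrete, here is a small Python model. It is a sketch under simplifying assumptions: it does a single interpolation pass, does not expand recursively, and ignores the delay scope \!{..}. The names definitions, signature, define, and expand are hypothetical.

```python
import re

definitions = {}  # maps a signature such as "foo#2" to its definition text

def signature(name, nargs):
    # A key taking k > 0 arguments has signature name#k, otherwise just name.
    return name if nargs == 0 else f"{name}#{nargs}"

def define(sig, body):
    definitions[sig] = body

def expand(name, *args):
    # Pick the overload by argument count, then interpolate \1 .. \9.
    body = definitions[signature(name, len(args))]
    return re.sub(r"\\([1-9])", lambda m: args[int(m.group(1)) - 1], body)

define("foo", "FOO")
define("foo#1", r"The FOO of \1")
define("foo#2", r"The FOO of \1 and \2")
print(expand("foo", "bar", "bop"))  # The FOO of bar and bop
```

Because the signature includes the argument count, foo, foo#1, and foo#2 are three distinct entries that never collide.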

A feature that should only rarely be needed is that zoem allows name scopes. Refer to the Dictionary stacks section.

File inclusion

5.8

There is one zoem primitive which has four different uses. For each of those uses, a built-in alias exists.

\dofile#2 use        alias            meaning
\dofile{expr}{!+}    \input{expr}     require file, interpret and output
\dofile{expr}{!-}    \import{expr}    require file, interpret only
\dofile{expr}{?+}    \read{expr}      permit absence, interpret and output
\dofile{expr}{?-}    \load{expr}      permit absence, interpret only

The \dofile#2 primitive and its four aliases are perhaps a little funny interface-wise; better ideas are welcome. The expr argument is digested, that is, expanded until no macros remain. It is thus possible to specify \__fnbase__.zmt and include a table of contents file that has been written to in a previous run. \dofile#2 and its aliases have the property that zoem really descends into the files, and on error will emit a message containing the approximate line number where it occurred.
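The two-character mode argument can be read as two independent switches: the first character says whether the file is required, the second whether the result is output. A Python sketch of that reading follows; the names ALIASES and decode_mode are hypothetical.

```python
ALIASES = {"!+": "input", "!-": "import", "?+": "read", "?-": "load"}

def decode_mode(mode):
    # Split a \dofile#2 mode such as "!+" into its two meanings.
    if mode not in ALIASES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "alias": ALIASES[mode],
        "required": mode[0] == "!",   # '?' permits absence of the file
        "output": mode[1] == "+",     # '-' interprets only
    }
```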

Additionally, the contents of a file can be placed inline using \finsert#1 and \zinsert#1.

Note: wherever key is written, it means that something of the form \foo, \$foo, or \"foo" has to be provided, so you would use \setx#2 e.g. as \setx{foo}{\finsert{\__fnbase__.zyx}}.

Protection

5.9

The \dofile#2 primitive requires that the files to be included satisfy zoem syntax. It will descend into the files and proceed parsing them.

The \finsert#1 and \zinsert#1 primitives do not descend, but rather act as if the contents of the file specified were pasted into the place of macro invocation.

\finsert#1 will protect the contents of the inserted file, that is, all backslashes and curlies are escaped by prepending them with a backslash. \zinsert#1 will include the file unchanged, assuming that its contents satisfy zoem syntax.
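Protection as described here amounts to a simple character-level escape. The Python function below is a sketch of that rule only; the name protect is borrowed from the \protect#1 primitive for readability and is not zoem code.

```python
import re

def protect(text):
    # Prepend a backslash to every backslash and every curly.
    return re.sub(r"([\\{}])", r"\\\1", text)

print(protect(r"\foo{bar}"))  # \\foo\{bar\}
```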

The \system#3 primitive is able to pipe data to a system command's STDIN stream and retrieve data from the command's STDOUT stream. This primitive will unprotect the data it sends, and it will protect the data it receives. Note the security implications of this feature as discussed at the \system#3 entry.

Data can be explicitly protected using the \protect#1 primitive.

Protected data can (currently) never be expanded again. This is because escaped backslashes are only interpreted at filter time, and never during expansion. If you only need temporary delay of expansion, use the \! primitive or the \!#1 primitive.

Scope dichotomy

5.10

Zoem knows two parse scopes: plain scope and device scope. The latter is also called 'at scope' because \@{..} is one (but not the only) way of entering device scope. In plain scope, every character represents itself as a glyph, i.e. as something that should show that way in print/on screen (after the zoem output/device input is fed to the device interpreter).

For example, if you write the less than sign < in plain scope, it should show up as a readable less than glyph, like in this very sentence. In order to make this happen, zoem provides the \special#1 primitive, so that the less than sign can be automatically mapped to the html entity sequence &lt;.

In device scope, nothing is mapped except for a double backslash should it occur. If you enter this particular sequence of mixed scope: \@{<b>}<hello world>\@{</b>} as zoem input, the zoem output/device input is (provided the \special#1 primitive was correctly used for the html device): <b>&lt;hello world&gt;</b> and what you finally see on screen is: <hello world> (in bold). In device scope, every character (except for the escape sequences available in that scope) represents itself as the character that should be present in the zoem output/device input. Device scope should normally only be seen in macros and not in running zoem input.

In plain scope you type characters just as you want to see them eventually, that is, when you read the document after the zoem output was run through a device interpreter (such as a browser or printer, or postscript previewer). The only exceptions are the backslash and the two curlies; these should be entered as \\, \{, and \}, respectively. Those escape sequences are interpreted as the characters or glyphs \, {, and }. For all characters, including these three, it is checked whether they should be further mapped according to the \special#1 primitive. If a mapping is found, it is retrieved and interpreted by the device scope filter. Read on.
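The division of labour between the two scopes can be illustrated with a toy filter in Python. It is a sketch only: the function name filter_text is hypothetical, mapped strings are emitted as-is rather than being filtered again, and escape sequences inside \@{..} are copied verbatim for brevity, which is simpler than what zoem does.

```python
def filter_text(text, special):
    # `special` maps character codes to device strings, as \special#1 does.
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\@{", i):
            # Device scope: copy up to the matching right curly, unmapped.
            depth, j = 1, i + 3
            while j < len(text) and depth:
                if text[j] == "\\":
                    j += 2            # skip an escaped character
                    continue
                depth += {"{": 1, "}": -1}.get(text[j], 0)
                j += 1
            if depth:
                raise ValueError("unbalanced at scope")
            out.append(text[i + 3:j - 1])
            i = j
        else:
            ch = text[i]
            if ch == "\\" and i + 1 < len(text) and text[i + 1] in "\\{}":
                i += 1
                ch = text[i]          # \\, \{ and \} stand for \, { and }
            # Plain scope: every glyph is checked against the mapping.
            out.append(special.get(ord(ch), ch))
            i += 1
    return "".join(out)

html = {38: "&amp;", 60: "&lt;", 62: "&gt;"}
print(filter_text("\\@{<b>}<hello world>\\@{</b>}", html))
```

The angle brackets inside \@{..} pass through untouched, while the same characters in plain scope are replaced by their entity sequences.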

Device scope

5.11

There are three kinds of strings which are interpreted by the generic device filter, and which are said to live in device scope:

• The strings embedded in \@{..} sequences.

• The strings mapped to by the \special#1 primitive, including mappings of the zoem glyphs \~, \|, and \-.

• The strings mapped to by the \constant#1 primitive.

In a macro package that is meant to work for multiple devices, every use of any of these constructs will typically be embedded in something that tests the value of the active device. This can be done using either \cmp#3 with \if#3, \switch#1, or \$#2, in conjunction with the pre-defined zoem key \__device__, containing the name of the active device (which can be specified on the command-line). The following are equivalent:

\if{\cmp{eq}{\__device__}{html}}{ \@{} }{}
\: is equivalent with
\${html}{ \@{} }

The \$#2 primitive is used if something needs to be done for one device only, and it may occasionally appear in documents. For example, the PUD man macros enable the creation of a table of contents (for both html and troff). My own convention is to have a table of contents only in html, and I specify this using the sequence

\${html}{\"man::maketoc"}

When zoem enters device scope, it outputs all characters literally, except that the backslash still has special meaning. It is used for encoding the backslash itself (as \\), and for encoding the two curlies { and } (as \{ and \}). This is the same as in plain scope (except that in plain scope the resulting character may again be mapped onto something else, for example, in troff the backslash also needs encoding as \\).

In device scope the sequence \" maps to a double quote. This additional feature makes zoem input friendlier to the cursor-movement features of some editors. It is not necessary though; simply using the double quote without escaping it is sufficient.

Additionally, the backslash can be followed by a single letter from a prescribed set listed below. Such a backslash+letter combination is called a device directive. By default, zoem will never print consecutive newlines, and it will never print consecutive spaces or spaces at the beginning of a line. The device directives allow this to be altered.

N    guarantee a newline
P    guarantee a paragraph skip (two consecutive newlines)
S    guarantee a space (except if next char is newline)
I    increase indent by one (indent is printed after each newline)
J    decrease indent by one
C    set indent to zero
n    print newline
s    print space
t    print tab
w    stop managing white space (squashing spaces and newlines)
W    start managing white space (use after w)
&    start and scope (see further below)
+    set the special level (see further below)

Note that the directives mainly affect the lay-out of the device text (which is zoem output), not the look of the interpreted device text. The 'N' directive is rather important when constructing troff macros, as many special troff commands are encoded by a dot as the first character on a line, i.e. a newline followed by a dot. Since troff attaches special meaning to two consecutive newlines as well (interpreting it as a paragraph break), zoem needs to be able to specify: print a newline only if the previous character was not a newline. This is exactly what the N directive means. The 'W' and 'w' directives are required for enabling the construction of a verbatim environment.
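The meaning of the N directive can be captured in a tiny Python model of an output buffer. This is a sketch, not zoem's filter; the class and method names are hypothetical, and treating empty output as already being at the start of a line is an assumption.

```python
class DeviceOutput:
    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)

    def guarantee_newline(self):
        # The N directive: print a newline only if the previous
        # character was not a newline. Treating empty output as
        # "already at line start" is an assumption of this sketch.
        current = self.result()
        if current and not current.endswith("\n"):
            self.parts.append("\n")

    def result(self):
        return "".join(self.parts)
```

Calling guarantee_newline twice in a row therefore never yields the two consecutive newlines that troff would read as a paragraph break.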

The sequence \&{<almost any>} can be used to avoid overly cumbersome constructions. It is for example illegal to write

\@{<table width="\width">}

In the early days of zoem, you had to write

\@{<table width="}\width\@{">}

— ugly by most standards. Today you write

\@{<table width="\&{\width}">}

This is not any shorter, but it is more pleasant to read. What happens is that the contents of the and scope \&{..} are first fully expanded in plain scope, after which the result is passed back to device scope. You have to be careful though. The content of \&{..} should never expand to something containing the at sequence \@{..}, because device scope is not allowed to nest. It should not expand to something containing the and sequence \&{..} either, as this sequence is illegal in plain scope.

Device scope resulting from mapping special characters

5.11.1

The first kind of zoem escape introducing device scope is \@{..}. The second kind comprises the \special#1 mappings, including the three zoem glyphs \~, \|, and \-. Conventionally, these are used to encode a non-breaking space (&nbsp; in html), a line break (<br> in html), and a long dash (emdash, not present in html). You would for example put

\if{\cmp{eq}{\__device__}{html}}{
   \special{
      {38} {&amp;}      \: 38 is ascii character '&'
      {60} {&lt;}       \: 60 -> '<'
      {62} {&gt;}       \: 62 -> '>'
      {-1} {&nbsp;}     \: the zoem escape \~
      {-2} {<br>\!N}    \: the zoem escape \|
      {-3} {-}          \: the zoem escape \-
   }
}{
}

All \special#1 definitions are interpreted in device scope. For every character encountered in plain scope, it is checked whether a \special#1 definition exists; if so, the corresponding string is retrieved and this is filtered through the device scope filter. Note that the three zoem glyphs described here may not be used in device scope; they can only be used in plain scope. In device scope you will have to write the explicit, device-specific sequence such as &nbsp; (in html).

The \special#1 primitive allows different levels of mappings to be defined simultaneously. Several definitions of the same character are allowed; these are placed on a stack particular to that character (cf. the \special#1 entry). When zoem encounters a character for which one or more mappings exist, it retrieves a mapping using the special level. This is an integer that has by default the value 1. Each open output stream has a unique special level associated with it. [Output streams exist for the default output file (see e.g. \writeto#1) and for each file opened by \write#3]. A mapping is retrieved using this rule: The deepest element is fetched for which the depth does not exceed the level. The most visible element (which is the element first occurring in the \special#1 invocation) has depth 1.

The presence of different levels comes in handy e.g. when the troff device is used. In some contexts, the double quote is a special character in troff (and a printable quote is then mysteriously represented by two consecutive double quotes), in most contexts it is not. This is combatted by including these two specifications in the \special#1 call preparing for troff output (note that 34 is the ASCII value representing the double quote):

\special{ ... {34}{"} {34}{""} {92}{\\e} ... }

The first pair shown simply maps the double quote onto itself, and the second pair maps it onto a double double quote. As long as the special level is 1, the second definition is not used. The backslash (with ASCII value 92) needs only one definition as it is escaped in the same way in all troff contexts.

The special level can be set using the \+ directive, which must be followed immediately by a digit in the range 0-9 enclosed by curly brackets, e.g. \@{\+{2}} will set the special level to 2. The special level can be set to 0 (zero) and this means that no character will be mapped.
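The retrieval rule (fetch the deepest element whose depth does not exceed the level, with level 0 disabling mapping) can be written down in Python as follows. This is a sketch of the rule as stated; the function name lookup_special is hypothetical, and the stack is modelled as a list whose first element has depth 1.

```python
def lookup_special(stack, level):
    # Return the mapping chosen for a character, or None if it is not mapped.
    if level <= 0 or not stack:
        return None
    # Deepest element whose depth (1-based) does not exceed the level.
    return stack[min(level, len(stack)) - 1]

quote = ['"', '""']        # the two troff definitions for character 34
backslash = ["\\\\e"]      # a single definition for character 92
print(lookup_special(quote, 1), lookup_special(quote, 2))
```

A character with a single definition, such as the backslash above, yields the same mapping at every level of 1 or higher.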

Example: Double quotes need to be escaped in certain troff contexts. This is achieved by the following.

\@{\+{2}"} ... funny quote context ... \@{"\+{1}}

Such a context is typically encapsulated by a macro defined in a package; its definition should never be visible to the user of the package. Note that the double quotes embedded in at scope in the example above are not susceptible to special mapping, since mapping is only applied in plain scope.

Device scope resulting from glyph definitions

5.11.2

The third kind of device scope strings are those mapped to by the \constant#1 primitive. An example of (toy) usage is this:

\constant{
   {'e}     {&eacute;}    \: Use e.g. as \*{'e}l\*{`e}ve (élève)
   {(c)}    {&copy;}      \: Use e.g. as \*{(c)} DEEDEE (© DEEDEE)
   {+-}     {&plusmn;}    \: Use e.g. as \*{+-} a few (± a few)
}

This is largely convenient syntactic sugar. These constants could also have been defined as

\def{"'e"}{\@{&eacute;}}
\def{"(c)"}{\@{&copy;}}
\def{"+-"}{\@{&plusmn;}}

The idea is that the \*{..} namespace is used for glyph-like device-specific bindings, whereas the \".." namespace is used for semantic purposes that are device-independent, but nothing prohibits you from fiddling with this.

SGML/HTML/XML syntactic sugar cubes

5.12

Zoem provides a shorthand for entering SGML-style tags. It is checked by zoem for well-formedness of the resulting SGML code, and it can be freely mixed with other modes of entering tags. Normally you would have to enter SGML-style tags in device scope, or write a macro for doing that. For example, a macro x#2 that expands \x{b}{be bold} to \@{<b>}be bold\@{</b>} is a likely candidate. However, this would be inelegant for constructions that span a long distance, and it does not provide for letting zoem expressions expand within an xml tag.

Zoem provides the \< token. It can be used in several ways:

\<foo>             \: foo can be an expression.
Some <content>     \: of course, expressions may occur here as well.
over here.
\<bar>{zut}        \: bar can be an expression too;
                   \: this syntax will close itself.
Some <content>     \: again, expressions may occur here as well.
over there.
\<>                \: this is a closing tag for the first foo. \</foo> works too.
\<tim x=y/>        \: zoem knows this closes itself.
\<*br>             \: zoem converts this to <br>

Suppose that foo, bar, and zut are zoem expressions expanding to strings FOO, BAR, and ZUT respectively (FOO and BAR might be of the form tag a="b" c="d"). Provided that the characters <, >, and & are automatically mapped in plain scope (as a result of correct \special#1 usage), the above will result in

<FOO>
Some &lt;content&gt;
over here.
<BAR>
ZUT
</BAR>
Some &lt;content&gt;
over there.
</FOO>
<tim x=y/>
<br>

The foo part inside the \<foo> syntax should never expand to something containing a >. This is entirely the responsibility of the user or macro package author.

Both kinds of syntax, \<foo> and \<bar>{zut}, are kept as they are during the expansion stage, and they can be subjected to multiple levels of expansion (which may be the case if such syntax is used inside e.g. \setx#2 or \apply#2). It is only at the output stage that the syntax is transformed to actual SGML code and that well-formedness is checked. So, the two examples just seen will first transform to \<FOO> and \<BAR>{ZUT} (please note that foo, bar, and zut all denote expressions here). If they are at that point no longer subject to expansion they enter the output stage where they are converted to <FOO> and <BAR>ZUT</BAR> (plus some additional formatting/indenting) respectively.

Zoem pushes on a stack of opening tags whenever it encounters \<foo> syntax during the output stage. It naturally knows that a tag can be followed by attributes. It also knows that a tag such as \<tag a=b/> closes itself (XML syntax), and the same applies for DTD tags such as \<!ENTITY ...>. As a special case, \<*tag foo bar zut> is converted to <tag foo bar zut> to allow encoding of HTML tags such as <meta>, <link>, and <hr>. This syntax is mandatory for tags that will not be closed. Note that you should only use \<p> if you are going to use \</p> or \<> as well (because zoem requires closing tags for opening tags). That said, the syntax \<p>{ paragraph content } is preferable in most cases.

Zoem does not know about other ways of entering tags, so \@{<body>} would not affect the stack just mentioned. \<> automatically closes the top level opening tag from the stack. Again, syntax such as \@{</body>} does not interact with the stack.

It is possible to explicitly close a tag by simply using \</foo> syntax. Zoem will check whether the closing tag matches the top level opening tag. As seen before, \<> does the same thing, but rather than doing a check, zoem will use the top level opening tag to construct the corresponding closing tag.
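The bookkeeping described in this section can be modelled by a small Python class. It is a sketch of the stack discipline only, not zoem's output stage; the class TagStack and its methods are hypothetical, and the tag name is assumed to be the first whitespace-delimited word of the tag content.

```python
class TagStack:
    def __init__(self):
        self.open_tags = []

    def open(self, content):
        # Model of \<content>: returns the SGML text to emit.
        if content.startswith("*"):
            return f"<{content[1:]}>"      # \<*tag ..> is never closed
        if content.endswith("/") or content.startswith("!"):
            return f"<{content}>"          # closes itself, nothing is pushed
        self.open_tags.append(content.split()[0])
        return f"<{content}>"

    def close(self, name=None):
        # Model of \<> (name is None) and of \</name>.
        if not self.open_tags:
            raise ValueError("no open tag to close")
        top = self.open_tags[-1]
        if name is not None and name != top:
            raise ValueError(f"closing {name} but {top} is open")
        self.open_tags.pop()
        return f"</{top}>"
```

Tags written directly in device scope, such as \@{<body>}, never reach such a stack, which is why they cannot be closed with \<>.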

Zoem miscellanea

Key signatures

6.1

Several keys take another key as argument, e.g. they store a value in a second key or check whether the second key exists. The full list of these meta keys is \def#2, \defx#2, \set#2, \setx#2, \undef#1, \defined#2, \apply#2, and \inspect#4. In all cases, the argument key is passed as the first argument, by means of the key signature.

For a key \key taking k (k>0) arguments its signature is key#k. The signature of a key \key taking no arguments is simply key. The rule is: Key usage always includes a single leading backslash (this activates the key). When a key is subject of inspection, it is always referred to by its signature.

Throughout this text, a key with signature key#k is mentioned by means of its key mention \key#k, that is, for extra clarity the backslash is prepended to the signature.

As explained in Primitives, built-ins and user macros, almost all primitives can be specified using quote syntax. The quote syntax is integrated with signatures. This means that primitives that expect a signature (such as \def#2, \undef#1, and \apply#2) accept quoted signatures too when the signature refers to a primitive.

Anonymous keys

6.2

A single underscore introduces an anonymous key. It is optionally followed by a #k tag (for k in 1..9), denoting the number of arguments the anonymous key takes. An occurrence of the latter is called a tagged anonymous key. The first argument to the key should be a key definition, the other arguments are the arguments for that key definition. If a tag is present, it is used for verifying that the anonymous key is used properly.

\_{\1 the \2}{row}{boat}
\_#2{\1 the \2}{row}{boat}

results in

row the boat
row the boat

Anonymous keys may occur in the first argument of \apply#2, within the first argument of \inspect#4, and they may occur freely in running text. The presence of a tag is required when an anonymous key is used within either of \apply#2 and \inspect#4. An example of usage in \apply#2:

\apply{_#2{\1 kisses \2\|}}{{bill}{max}{max}{bill}}

bill kisses max
max kisses bill

or even

\set{%foo}{{{\1 hugs \2\|}}}
\apply{_#2\%{foo}}{{bill}{max}{max}{bill}}

bill hugs max
max hugs bill

Note that in order to store a block with \set#2, an extra pair of curlies has to be used, as blocks can only be passed as a sub-argument of a single-element vararg. Also note that in a vararg it is allowed to put white space in between the constituting elements.
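The way \apply#2 walks through its vararg in batches can be sketched in Python. The names interpolate and apply_key are hypothetical, the line break glyph \| is modelled as a plain newline, and raising an error for a vararg that does not divide evenly is an assumption of the sketch rather than documented zoem behaviour.

```python
import re

def interpolate(body, args):
    # Replace \1 .. \9 by the corresponding argument.
    return re.sub(r"\\([1-9])", lambda m: args[int(m.group(1)) - 1], body)

def apply_key(body, k, vararg):
    # Apply an anonymous key taking k arguments to successive batches.
    if len(vararg) % k:
        raise ValueError("vararg does not split into batches")  # assumption
    return "".join(interpolate(body, vararg[i:i + k])
                   for i in range(0, len(vararg), k))

print(apply_key("\\1 kisses \\2\n", 2, ["bill", "max", "max", "bill"]), end="")
```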

Tree data

6.3

Data can be organized in a global tree with a specialized use of \set#2 and its siblings, as shown further below. The data is retrieved from the tree using so-called data keys. Such a key is started using a percent sign, immediately followed by zero, one, or more blocks, e.g. \%, \%{..}, and \%{...}{...}{...} would all be allowable invocations. If more than one block follows the percent sign, there must be no interleaving white space.

The underlying primitive has signature %#1, as the trailing scopes are congregated into a single argument before they are further processed. The two sibling primitives %free#1 and %dump#1 serve for freeing and dumping parts or whole of the tree, as described further below.

When applied to data keys, \set#2 and its siblings set one or more values in a global multi-dimensional associative array that we shall refer to here as ROOT. Please note that ROOT is for explanatory purposes only. This associative array is best viewed as a tree, in which every node can have branches to higher nodes. A node may or may not contain a value. Let us denote the value contained by a node some-node as *(some-node). The fact that beta is a node one branch higher than alpha, which is in turn one branch higher than ROOT, is denoted as ROOT->"alpha"->"beta". In this path notation, strings indexing nodes in the trees are written in between quotes. This has the advantage that the empty string, which is a valid index string, has the representation "". Combining these conventions, we write the value associated with beta as *(ROOT->"alpha"->"beta"). Consider these examples.

\set{%{foo}{bar}{zut}}{lez}
\: now *(ROOT->"foo"->"bar"->"zut") is "lez"

\set{%{foo}{bar}{zut}}{{a}{b}{x}{y}}
\: now *(ROOT->"foo"->"bar"->"zut"->"a") is "b"
\: and *(ROOT->"foo"->"bar"->"zut"->"x") is "y"
\: and *(ROOT->"foo"->"bar"->"zut") still is "lez"

\: some special cases
\set{%{foo}{bar}{zut}}{{{c}}}
\: now *(ROOT->"foo"->"bar"->"zut") is "{c}"
\set{%{foo}{bar}{zut}}{{c}}
\: now *(ROOT->"foo"->"bar"->"zut") is "c"
\set{%{foo}{bar}{zut}}{c}
\: now *(ROOT->"foo"->"bar"->"zut") is "c"

\set{%{foo}{bar}{zut}}{{c}{d}{e}}
\: This does nothing, because the second argument
\: must either be an *even* vararg, a 1-element vararg,
\: or a simple argument.

\set{%{{tiger}}}{in the woods}
\: now *(ROOT->"{tiger}") is "in the woods"
\set{%{tiger}}{in the woods}
\: now *(ROOT->"tiger") is "in the woods"
\set{%tiger}{on the loose}
\: now *(ROOT->"tiger") is "on the loose"
\: stripping curlies from a vararg with one argument
\: does not make a difference with the exception of
\: the case shown below.

\set{%{}}{empty}
\: now *(ROOT->"") is "empty".
\set{%}{root}
\: now *(ROOT) is "root".

Take note of the rule governing the second argument. If the first non-white space character is a left curly, \set#2 expects a vararg. The vararg must either be even or it must contain exactly one argument.

An even vararg is interpreted as a sequence of key-value pairs. Each key induces a new branch from the node specified in the first argument, and each value is associated with the node at the end of that branch. If the vararg contains exactly one argument, that argument is simply used as a value. This is the only way to specify a block as the value. If the first non-white space character is not a curly, \set#2 will simply interpret the second argument as a value to append to the node specified in the first argument. It is possible to sidestep these issues by using \set#3 and the directive u as argument to the modes key. This will cause the value to be copied without further interpretation as a vararg or block.
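The rules for the second argument can be summarised with a small Python model of the tree. It is a sketch for explanation only: Node, descend, tree_set, and tree_get are hypothetical names, a vararg is modelled as a Python list and a simple argument as a string, and whether zoem creates the path before rejecting an odd vararg is not something the sketch claims to know.

```python
class Node:
    def __init__(self):
        self.value = None
        self.children = {}

def descend(node, path):
    # Walk down from node, creating branches as needed.
    for index in path:
        node = node.children.setdefault(index, Node())
    return node

def tree_set(root, path, arg):
    # arg is a string (simple argument) or a list (vararg).
    if isinstance(arg, str):
        descend(root, path).value = arg
    elif len(arg) == 1:
        descend(root, path).value = arg[0]      # the only way to store a block
    elif len(arg) % 2 == 0:
        node = descend(root, path)
        for key, value in zip(arg[::2], arg[1::2]):
            descend(node, [key]).value = value  # key-value pairs
    else:
        return False                            # odd vararg: nothing is set
    return True

def tree_get(root, path):
    node = root
    for index in path:
        node = node.children.get(index)
        if node is None:
            return None
    return node.value
```

In this model, tree_set(root, ["foo", "bar", "zut"], ["a", "b", "x", "y"]) leaves the value at zut alone and adds the two branches a and x above it, mirroring the second example in the listing.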

If you want the data to be stored to be expanded before it is bound, use \setx#2 or \defx#2.

Data is retrieved simply by prefixing the path with the \% token. Example:

\set{%{foo}{bar}{zut}}{{a}{b}{x}{y}}
\%{foo}{bar}{zut}{a}
\%{foo}{bar}{zut}{x}

will print b and y. If the path is not an existing path in the current tree it will simply be ignored, although an error message will be emitted.

Whole or part of the data tree can be freed using the \%free primitive. Again, simply append the access sequence in which you are interested. For freeing the entire tree, use \%free without trailing scopes. The \%free primitive largely exists for testing purposes to ensure that zoem gets its internal data manipulation right.

Whole or part of the data tree can be output for debugging purposes using the \%dump primitive. Simply append the access sequence in which you are interested. For printing the entire tree, use \%dump without trailing scopes. This can be used for debugging if your data manipulation does not work out as expected. There is no result text as far as usual processing is concerned. The underlying primitive dumps its findings to STDOUT in a line-based textual representation of the data-tree.

Of blocks and varargs

6.4

A block is a string beginning with a left curly and ending with a right curly, the curlies being balanced. This is a convenient naming convention. Blocks can be used in constructing anonymous keys; refer to the Anonymous keys section.

Some keys take a vararg argument, which is a single argument (enclosed by curlies as are all arguments), which can contain any number of sub-arguments, that is, a list consisting of blocks. White space may occur in between the blocks. The \special#1 and \constant#1 keys both take a single vararg argument, and the \apply#2 and \switch#2 keys each take a vararg as their second argument. For \apply#2 the first argument is a key that is applied to subsequent batches of arguments from that vararg. The \table#5 primitive takes a vararg as its last argument. For examples, see the A zoem tour section and the \apply#2 entry.

An even vararg is a vararg with an even number of elements; an odd vararg is a vararg with an odd number of elements.

User keys may check whether an argument is a vararg by employing the \nargs#1 primitive. This can be used to take different actions depending on the structure of the argument.
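Splitting a vararg into its blocks is a matter of matching balanced curlies. The Python function below is a sketch of that idea; parse_vararg is a hypothetical name, and it makes no claim about what \nargs#1 returns for arguments that are not varargs.

```python
def parse_vararg(text):
    # Return the contents of the top-level blocks in text.
    blocks, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1                    # white space between blocks is allowed
            continue
        if text[i] != "{":
            raise ValueError("not a vararg")
        depth, j = 1, i + 1
        while j < len(text) and depth:
            if text[j] == "\\":
                j += 2                # an escaped curly does not count
                continue
            depth += {"{": 1, "}": -1}.get(text[j], 0)
            j += 1
        if depth:
            raise ValueError("unbalanced curlies")
        blocks.append(text[i + 1:j - 1])
        i = j
    return blocks

elements = parse_vararg("{bill}{max} {max}{bill}")
print(len(elements), len(elements) % 2 == 0)  # 4 True
```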

Session keys

6.5

This is a compact listing of session keys, created by issuing zoem -l session.

Predefined session variables
\$__args__         (local to env) key/value pairs given to \begin#2
\$__xargs__        (local to env) key/value pairs given to \begin#2, expanded
\__device__        name of device (given by -d)
\__fnbase__        base name of entry file (given by -i/-I)
\__fnentry__       name of entry file (given by -i/-I)
\__fnin__          name of current input file
\__fnout__         name of current output file
\__fnpath__        path component of entry file (given by -i/-I)
\__fnwrite__       arg1 to \write#3, accessible in arg3 scope
\__lc__            expands to a left curly (only for magic)
\__line__          index of current input line
\__parmode__       paragraph slurping mode for interactive sessions
\__rc__            expands to a right curly (only for magic)
\__searchpath__    search path for macro packages (e.g. man.zmm)
\__split__         user space toggle for chapter mode indicator
\__sysval__        exit status of last system command
\__version__       version of zoem, formatted as e.g. 2003, 2004-010
\__zoemput__       result text of last \try#1 key
\__zoemstat__      status of last \try#1 key

This manual is littered with examples of the usage of \__device__. The \__fnbase__ key is useful for creating sibling files of the entry file, i.e. a table of contents file or a file containing reference information. I have the habit of naming those \__fnbase__.zmt and \__fnbase__.zmr, respectively. The \__fnin__ key is useful for emitting log, warning, or error messages particular to the file currently being parsed. The \__parmode__ macro affects the way in which zoem reads chunks in interactive mode (refer to Section Invoking zoem from the command line). The \__searchpath__ macro is one of the ways in which zoem can be instructed to search for files in a set of locations, when confronted with \dofile#2 or one of its built-in aliases. Section File search path has more information about the mechanism of file location. See also the zoem manual.

Built-in macros

6.6

This is the result from doing zoem -l builtin:

Built-in aliases
done         maps to  \'throw{done}
ifdef#3      maps to  \'if{\defined{\1}{\2}}{\3}{}
ifdef#4      maps to  \'if{\defined{\1}{\2}}{\3}{\4}
input#1      maps to  \'dofile{\1}{!+}
import#1     maps to  \'dofile{\1}{!-}
read#1       maps to  \'dofile{\1}{?+}
load#1       maps to  \'dofile{\1}{?-}
begin#1      maps to  \'begin{\1}{}
env#3        maps to  \'env{\1}{}{\2}{\3}
system#2     maps to  \'system{\1}{\2}{}
system#1     maps to  \'system{\1}{}{}
throw#1      maps to  \'throw{\1}{}
inform#1     maps to  \'write{stderr}{device}{\1\@{\N}}
append#2     maps to  \'set{{modes}{a}}{\1}{\2}
appendx#2    maps to  \'set{{modes}{ax}}{\1}{\2}
seq#4        maps to  \'set{\1}{\2}\'while{\'let{\'get{''}{\1}<\3}}{\4\'setx{\1}{\'let{\'get{''}{\1}+1}}}
update#2     maps to  \'set{{modes}{e}}{\1}{\2}
updatex#2    maps to  \'set{{modes}{ex}}{\1}{\2}
""           maps to  (nothing)
""#1         maps to  (nothing)
group#1      maps to  \1
PI           maps to  3.1415926536
E            maps to  2.71828182846
$#1          maps to  \'switch{\__device__}{\1}
$#3          maps to  \'switch{\__device__}{{\1}{\2}{\3}}

Dictionary stacks

6.7

By default, when using \def#2 and \set#2, keys and their values are put in a global user dictionary. It could be useful to shadow keys by entering a new name scope. Zoem facilitates this by providing the \push#1 and \pop#1 keys. These push and pop a new dictionary onto/from the user dictionary stack.

A second dictionary stack is the dollar dictionary stack, which contains all keys that start with a dollar sign. The \begin#2 primitive pushes a dollar dictionary each time it is invoked, and that dictionary is popped by the corresponding \end#1 invocation. This is typically useful for creating nested environments that need access to the same type of information - by storing such information in dollar keys, it can be shadowed and recovered. Refer to the \begin#2 entry.

When a key is used its definition is searched in all dictionaries, starting from the top-level dictionary. The key \undef#1 has only access to the top-level dictionary, and will never delete a key in any other dictionary.
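The lookup and deletion rules for a dictionary stack can be modelled in Python as follows. This is a sketch: DictStack and its methods are hypothetical, the name argument of \push#1 and \pop#1 is ignored, and storing new definitions in the top-level dictionary is an assumption based on the shadowing behaviour described above.

```python
class DictStack:
    def __init__(self):
        self.stack = [{}]             # initially there is one dictionary

    def push(self):
        self.stack.append({})

    def pop(self):
        if len(self.stack) == 1:
            raise ValueError("cannot pop the last dictionary")
        self.stack.pop()

    def define(self, sig, body):
        self.stack[-1][sig] = body    # assumption: goes into the top-level dictionary

    def lookup(self, sig):
        # Search all dictionaries, starting from the top-level one.
        for dictionary in reversed(self.stack):
            if sig in dictionary:
                return dictionary[sig]
        return None

    def undef(self, sig):
        # Only the top-level dictionary is ever touched.
        return self.stack[-1].pop(sig, None) is not None
```

After a push, a key defined in the new dictionary shadows an older definition with the same signature, and the older definition becomes visible again after the matching pop.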

A zoem tour

What follows is an informal tour through zoem's offerings. The next section contains a comprehensive overview of the zoem primitives.

Let us start with how filtering in plain space is configured. The following was obtained from the PUD man macros.

\switch{\__device__}{
   {html}{
      \special{
         {38} {&amp;}
         {60} {&lt;}
         {62} {&gt;}
         {-1} {&nbsp;}       \: the zoem escape \~
         {-2} {<br>\!N}      \: the zoem escape \|
         {-3} {-}            \: the zoem escape \-
      }
   }
   {roff}{
      \special{
         {46} {\\.}
         {96} {\\`}
         {92} {\\\\}         \: a single backslash
         {-1} {\\ }          \: the zoem escape \~
         {-2} {\!N.br\!N}    \: the zoem escape \|
         {-3} {\\-}          \: the zoem escape \-
      }
   }
   {\write{stderr}{txt}{No such device: \__device__\|}
    \write{\__fnbase__.err}{txt}{No such device: \__device__\|}
    \exit
   }
}

Take note of the number of backslashes. In order to print a backslash in troff, the troff input must contain two consecutive backslashes. In order to specify a backslash in zoem, we must also provide two, thus we need four backslashes in all (in order to create this example I needed eight backslashes in the zoem input).

Also note the use of the \switch#2 primitive, which takes an expression in the first argument and an arbitrary number of pairs plus an optional clause in the second argument. The optional clause was in this case used as a failure test.

\special#1 is an example of a zoem key taking an argument that may contain arbitrarily many sub-arguments (i.e. a vararg). In this particular case the sub-arguments must be paired, each pair defining how certain characters that are special to the device must be represented.

The \write#3 and \exit primitives need little comment; they work as expected. Zoem opens output files as needed, and closes them when it is done. The file name - is equivalent to either STDOUT or STDIN (depending on context); the file name stderr denotes STDERR.

\exit is considered a failure (and will cause zoem to stop and complain), but \throw{done} is not. \throw#2 with argument done will merely quit parsing the current stack. So if you specify it at top level in a file (that is, not nested in a key that does its own parsing, such as \setx#2), zoem will stop parsing the current file and transfer control back to the file from which it was included.

The previous example introduces the keys \__device__ and \__fnbase__. They are so-called session variables, described in the section Session keys.

A sibling primitive to \special#1 is \constant#1. The following is an example of use.

\constant{
   {'e}  {é}    \: Use e.g. as \*{'e}l\*{`e}ve (élève)
   {(c)} {©}    \: Use e.g. as \*{(c)} DEEDEE (© DEEDEE)
   {+-}  {±}    \: Use e.g. as \*{+-} a few (± a few)
}

The main thing to note here is that the target string (e.g. é) is always interpreted in device space. In the reference string (e.g. 'e, (c) and +- in the example above) almost anything is allowed, including backslash-escaped characters and balanced curlies. The latter are not recommended though.

There are three zoem tokens representing the characters that have meaning to zoem syntax, the backslash and the two curlies. Those zoem tokens are just like any other plain characters: they can be mapped in plain space, and they are printed literally in device space.

\\    \: A backslash; possibly mapped in plain space.
\{    \: A curly; possibly mapped in plain space.
\}    \: A curly; possibly mapped in plain space.
\,    \: The atomic separator (vanishes).

These tokens are mapped only during the (final) filter stage. The atomic separator can be useful when you want to glue together items some of which will be the result of macro expansion.

\def{foo}{bar}
\foo\,1     \: \foo1 would be the key \foo1

This will result in bar1. The tokens \\, \{, and \} are really the corresponding ordinary characters. They can be mapped in plain space via \special#1 using their ASCII values 92, 123, and 125 as was seen above for the backslash. In device space, they will result in \, {, and }.

Let us now continue with device scope by implementing a \bf#1 key. Below you find two possible definitions:

\def{bf#1}{\@{<b>}\1\@{</b>}}    \: OK
\def{bf#1}{\@{<b>\1</b>}}        \: Wrong! Wrong!

The second is wrong because the contents of \1 end up in device space. If the expansion of \1 still contains keys they will not be expanded (and cause a fatal syntax error when device space is filtered), and additionally any special characters in \1 will not be mapped.

The zoem language

Alphabetic index

8.1

\! \!#1 \$#2 \apply#2 \begin#2 \catch#2 \cmp#3 \constant#1 \def#2 \defx#2 \defined#2 \dofile#2 \dowhile#2 \__env__#1 \end#1 \env#4 \eqt#3 \branch#1 \eval#1 \exit \f#2 \f#3 \fv#2 \finsert#1 \format#2 \formatted#1 \get#2 \if#3 \inspect#4 \length#1 \let#1 \nargs#1 \pop#1 \protect#1 \push#1 \register#2 \set#2 \setx#2 \set#3 \special#1 \switch#2 \system#3 \table#5 \textmap#2 \throw#2 \tr#2 \trace#1 \try#1 \undef#1 \vanish#1 \while#2 \whilst#2 \write#3 \writeto#1 \zinsert#1

Topic index

8.2

This is an overlapping categorization in topics.

Using and inspecting keys

\apply#2 \constant#1 \def#2 \defined#2 \defx#2 \inspect#4 \pop#1 \push#1 \set#2 \setx#2 \table#5 \undef#1

These primitives affect or use either user keys that are stored in the user dictionary, dollar keys that are stored in the dollar dictionary, or anonymous keys. Dictionaries are discussed in Section 6.7.

Control, booleans, testing and comparison

\$#2 \apply#2 \branch#1 \cmp#3 \defined#2 \dowhile#2 \eqt#3 \if#3 \length#1 \register#2 \switch#2 \undef#1 \while#2

Expansion, delay

\! \!#1 \apply#2 \defx#2 \eval#1 \setx#2

Meta-zoem, introspection, exceptions, errors

\catch#2 \exit \throw#2 \try#1

Execution, tracing

\trace#1

Input/output

\dofile#2 \finsert#1 \format#2 \register#2 \vanish#1 \write#3 \writeto#1 \zinsert#1

Filtering

\special#1 \vanish#1

Environment scopes

\begin#2 \end#1 \env#4

Name scopes

\pop#1 \push#1

Data storage

\%, \%free, and \%dump are primitives described elsewhere — refer to the Tree data section.

String conversions

\format#2 \inspect#4 \length#1 \textmap#2 \tr#2

Arithmetic

\f#2 \f#3 \fv#2 \let#1

Glyphs

\constant#1

Syntactic sugar

\formatted#1 \vanish#1 \<>#1 \<>#2

Primitives

8.3

Zoem primitives may expand (which is the same as evaluate) one, several, or all of their arguments before using them. Such arguments are enclosed by double angle brackets in the listing below. The inside-out type evaluation is done recursively and works for arbitrary levels of nesting. An argument which is first expanded and is then interpreted as a label is thus written <<label>> in the primary entry. In the definition text accompanying the entry, the expanded argument is simply referred to as <label>, so the extra pair of brackets is dropped.

Each primitive below has a little paragraph with the caption Result text. It gives a summary of 'what comes out'. Note that the result of macro expansion is always passed to the parser again, so the result text is again subject to expansion.

\<>#1

\<>#2

These are special. Refer to section SGML/HTML/XML syntactic sugar cubes. The angle brackets are part of the syntax, do not confuse them with the angle brackets used below to enclose arguments. These primitives are respectively used as \<*any*> and \<*any*>{*any*}, so the positioning of arguments is different from all other zoem primitives.

\!

This primitive is triggered by an active backslash followed by a consecutive run of exclamation marks that is not followed by an opening curly. The sequence is called a delay sequence, and its arity is the number of exclamation marks in the run. A single exclamation mark is stripped (i.e. the arity is decremented) and the sequence is no longer subject to the current expansion stage. It is used to construct valid zoem input, which is usually redirected to file with the copy filter, stored using \setx#2, or used in nested occurrences of \apply#2, \inspect#4, and other primitives. Other uses are possible; the main thing is that one should keep a clear view of the meta-programming implied by \!. Refer also to the \eval#1 primitive.

Example The primitive \eval#1 evaluates its argument a single time, and passes it on for further evaluation. The following are fully equivalent:

\set{foo}{zut}
\eval{\!foo}
\foo

whereas \!foo would pass the sequence \foo to the filtering stage, where it will yield a (non-fatal) error message. Similarly, \eval{\eval{\!!foo}} is equivalent to the above.

Result text A delay sequence of decremented arity.
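The stripping of one exclamation mark per expansion stage can be mimicked with a small Python model. This is a sketch under simplifying assumptions, not zoem's parser: it handles only delay sequences without a block, it treats every backslash as active (so escaped backslashes are not recognised), and it expands nothing else. The name strip_one_delay is hypothetical.

```python
import re

# A backslash, a run of exclamation marks, and neither a further mark
# nor an opening curly after the run (the block form is left alone).
_DELAY = re.compile(r"\\(!+)(?![!{])")

def strip_one_delay(text):
    """Decrement the arity of every block-less delay sequence by one."""
    def shorter(match):
        remaining = len(match.group(1)) - 1
        return "\\" + "!" * remaining
    return _DELAY.sub(shorter, text)
```

Applying the function twice to a sequence with two exclamation marks leaves a plain key, which mirrors the way each expansion stage peels off one level of delay.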

\!#1: \!{<any>}

This primitive is triggered by an active backslash followed by a consecutive run of exclamation marks, which is in turn followed by a block. The block is called a delay scope. The arity of the delay scope is the number of exclamation marks in the run. A single exclamation mark is stripped (i.e. the arity is decremented); if no further exclamation marks remain (i.e. the arity becomes zero) then the introducing backslash and the delimiting curlies are stripped as well. The result (including the contents of the block) is passed on and is no longer subject to the current expansion stage. The same observations hold as those made for the previous entry. Refer also to the \eval#1 primitive.

Additionally, blocks that are protected by the delay primitive will be skipped during parameter interpolation.

Result text <any>, if the arity of the delay scope just found was equal to one; otherwise, <any> put in a decremented delay scope. In both cases <any> will no longer be subject to the current expansion stage.

\$#2: \${<str>}{<any>}

This is a shortcut for activating output for a particular device. If \__device__ expands to <str>, <any> is passed on for expansion, otherwise it is ignored. The following two are equivalent:

\${html}{Seen only in the html device}
\if{\cmp{eq}{\__device__}{html}}{Seen only in the html device}{}

Result text Either none or <any>.

\apply#2: \apply{<<key-sig|anon-key>>}{<<vararg>>}

The first argument is expanded before use. It should expand either to the signature of a user key, primitive or builtin taking arguments, or to a tagged anonymous key. Examples of the first are foo#k and "bar::baz"#k; the latter takes the form _#k{..}. If you use an anonymous key containing macro sequences, be sure to escape whole or part of the anonymous key, depending on your needs. The expansion of _#2\!{{<any>}} for example, will result in _#2{<any>}. Primitives can be used in both quoted and regular syntax.

The second argument should result in a vararg. \apply#2 extracts k elements at a time from <vararg>, and applies the key resulting from the first argument to each vector of k elements successively. Any elements remaining in <vararg> are ignored.

Result text Entirely depending on the key specified in the first argument.
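The way \apply#2 walks through its vararg can be pictured with a Python model. This is only a sketch of the chunking rule described above, not zoem's implementation: the name apply_key is hypothetical, a Python callable stands in for the key, a list of strings stands in for the vararg, and k is assumed to be positive.

```python
def apply_key(key, k, vararg):
    """Call key on successive groups of k elements and join the results.

    Elements left over after the last complete group are ignored.
    """
    usable = len(vararg) - len(vararg) % k
    pieces = [key(*vararg[i:i + k]) for i in range(0, usable, k)]
    return "".join(pieces)
```

With k equal to 2 and five elements, the key is called twice and the fifth element is dropped.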

\begin#2: \begin{<label>}{<<vararg>>}

\begin#2 pushes a new dictionary onto the dollar dictionary stack which is popped by the matching \end#1.

\begin#1 is an alias which invokes \begin#2 with an empty <vararg> argument.

\begin#2 pushes the begin expression associated with <label> via \env#4. The <label> part is not expanded. The second argument <vararg> consists of consecutive scopes denoting key-value pairs. It is expanded before use and is allowed to be empty. The keys (which are the odd-numbered scopes, starting with one) in <vararg> must be such that prepending a dollar sign ($) to them yields a valid key signature. That signature will be used to set a dollar key in the newly pushed dollar dictionary that expands to the value part associated with the key (specified as the consecutive even-numbered scope in <vararg>). Alternatively, since the 07-333 release, it is also possible to explicitly include the dollar sign in the keys rather than having them prepended.

The \env#4 invocation that defines the environment likely sets defaults for the dollar keys (via the second argument of \env#4) that can be set as described above.
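The interplay of defaults and per-invocation values can be sketched in Python. The model below is an assumption-laden illustration, not zoem's implementation: the name dollar_dict is hypothetical, key-value pairs are given as lists of tuples, and it is assumed that the pairs passed to \begin#2 are applied after the defaults from \env#4.

```python
def dollar_dict(defaults, overrides):
    """Build the dollar dictionary for one environment invocation.

    defaults models the pairs given to the env definition, overrides
    models the pairs given when the environment is begun.
    """
    def as_dollar(key):
        # A key may carry the dollar sign already; otherwise prepend it.
        return key if key.startswith("$") else "$" + key

    merged = {}
    for pairs in (defaults, overrides):
        for key, value in pairs:
            merged[as_dollar(key)] = value
    return merged
```

Because each invocation gets a fresh dictionary, values set for a nested environment do not leak into the enclosing one.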

The pushing of a dictionary provides a means for shadowing and localization with nested \begin#2 statements. By associating dollar keys with an environment, these keys can be given different meanings in nested environments - the previous meaning will be restored once an environment is closed. The advantages are that the environment does not have to exercise \push#1 and \pop#1 itself, that the user dictionary stack is not unnecessarily extended (saving look-up time), and that the 'dollar' look of a key such as \$align signals that it will automagically work in nested environments. Of course, the latter is still the responsibility of the author of the environment.

Result text The string associated with <label> via \env#4.

Example I The zoem faq macros define a faqsec environment, for which two additional arguments are required. It is used for example as

\begin{faqsec}{{ref}{misc}{cap}{Miscellaneous questions}}
...
\end{faqsec}

The ref key introduces the label, the cap key introduces the caption.

Example II The itemize environment in the zoem generic package allows one additional and optional argument. This argument, if present, must contain a vararg, and is used to set options related to the itemize environment. It is used for example as

\begin{itemize}{
   {interitem}{0}
   {flow}{compact}
   {mark}{\*{itembullet}}
   {align}{right}
}
...
\end{itemize}

Internally, the itemize environment maps these options to dollar keys. Because a unique dollar dictionary is associated with each environment, this makes it possible for nested itemize instances to have separate namespaces.

Result text The (unexpanded) string stored in the third argument of the corresponding \env#4 invocation.

\catch#2: \catch{<type>}{<any>}

This will process <any>. Depending on <type> the result is accepted as successful. If <type> is towel, any occurrence of \throw{towel} in <any> is caught, and the truncated result is further processed. For towel, zoem errors are not caught but cascade/escalate further down/up. If <type> is error, any error in <any> is caught, and the truncated result is further processed. If <type> is done, no exception is accepted. It is possible though to use \throw{done}{..}, which will stop processing without generating an exception. See also \throw#2.

Output will be truncated in case an error or exception was caught. The status, currently one of done, towel, or error, is written in the session macro \__zoemstat__.

Result text The possibly truncated result of expanding <any> in case of a caught exception or error, else the full result.

\cmp#3: \cmp{<str>}{<<any>>}{<<any>>}

The last two arguments are expanded. Their results are compared as strings. The first argument must be one of the labels lt|lq|eq|gq|gt|ne|cp. In case it equals one of the six labels lt|lq|eq|gq|gt|ne, this primitive puts in place the associated boolean as a string (i.e. either 0 or 1). In case the label equals cp, it puts in place the result of the string compare (as a string), namely one of -1, 0, or 1.

Result text Either the string encoding of a boolean (0 or 1) or the string encoding of the ternary value resulting from a string compare (-1 or 0 or 1).
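The seven labels can be summarised with a Python model. It is a sketch, not zoem's implementation: the name cmp_label is hypothetical, and Python compares strings by code point, which is assumed here to agree with zoem's string compare for plain ASCII input.

```python
import operator

# The six boolean tests, keyed by the labels that select them.
_TESTS = {
    "lt": operator.lt, "lq": operator.le, "eq": operator.eq,
    "gq": operator.ge, "gt": operator.gt, "ne": operator.ne,
}

def cmp_label(label, a, b):
    """Return "0" or "1" for a boolean label, or "-1", "0", "1" for cp."""
    if label == "cp":
        return str((a > b) - (a < b))
    return "1" if _TESTS[label](a, b) else "0"
```

Passing integers instead of strings gives a model of the numeric comparison performed by \eqt#3.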

\constant#1: \constant{<vararg>}

<vararg> must have an even number of arguments. These are interpreted as pairs. The first of each pair must enclose a string that does not contain any of the characters *, \, {, or }, say string <keystr>. The second encloses a string that will be interpreted in device space, say string <valstr>. When a sequence \*{<keystr>} is encountered, it is interpreted as \@{<valstr>}. This is done at filter time only; the sequence is skipped during macro expansion. It is not allowed to use a sequence \*{<keystr>} in device scope, e.g. \@{\*{<keystr>}} is illegal. For further information see section Device scope.

Result text None.

\def#2: \def{<key-sig|data-seq>}{<any>}

Bind the second argument to the key or access sequence in the first argument. This primitive will complain if a binding exists already, but it will overwrite the previous binding and continue anyway. Use \set#2 if you do not want to be warned about overwriting. Examples of usage:

\def{foo}{FOO}
\def{foo#1}{The FOO of \1}
\def{foo#2}{The FOO of \1 and \2}
\def{$foo#2}{The $FOO of \1 and \2}
\def{"foo::oof"#3}{\foo{\1}{\2}\foo{\2}{\3}}

These examples are all of type key-sig. See the Tree data section for examples of type data-seq (this pertains to multi-dimensional data storage). See the Macro expansion section for the forms that keys may take. A key signature is the name of a key with appended to it the number of arguments that the key takes, if any. If the key takes no arguments, then the key signature is identical to the key name.

If you want the value to be bound to be expanded before binding it, use either \defx#2 or \setx#2. This works the same for data keys.

Result text None.

See also \set#3.

\defx#2: \defx{<key-sig>}{<<any>>}

The second argument is expanded and stored in the key <key-sig>. This primitive will complain if a binding for that key exists already, but it will overwrite the previous value anyway.

Result text None.

See also \set#3.

\defined#2: \defined{<type>}{<<access>>}

<type> is one of the strings key, lkey, data, primitive, builtin, zoem or ENV. The second argument is expanded before use. For the type key, the <access> argument is looked up as a key signature in either the user dictionary stack or the dollar dictionary stack. For the type lkey, it is looked up only in the top level dictionary. For the type data, the <access> argument is interpreted as a data access sequence. For the type primitive, the <access> argument is looked up in the zoem primitive table. For the type builtin, the <access> argument is looked up in the zoem builtin macro table. The type zoem corresponds with the union of the types primitive and builtin. For the type ENV, it is checked whether it exists as an environment variable (which can be retrieved using the \__env__#1 primitive). This primitive pushes the string 1 if the result is indeed a valid reference; it pushes the string 0 otherwise.

Result text The string encoding of a boolean (0 or 1).

\dofile#2: \dofile{<<file name>>}{<char[!?]><char[+-]>}

Open a file and process its contents while keeping track of line numbers. Depending on the second argument, absence of the file is either allowed or not, and its interpreted contents are output or not. The fact that <file name> is first expanded allows you to specify file names such as \__fnbase__.zyx.

Zoem may search for a file in several locations until it is found. The process of locating a file is described in section File search path.

When found, the file is opened according to the specification in the second argument. This argument must contain exactly two characters, the first one of [!?], the second one of [+-]. The first character indicates whether the file is allowed to be absent. A '!' implies that absence is fatal, a '?' permits absence. The latter is useful e.g. when creating a Table Of Contents file. The second character indicates whether the interpreted file should be filtered and output or not ('+' for yes and '-' for no). Macro packages typically need interpretation only, whereas concatenation of document parts (c.q. chapters) stored in different files requires that the interpreted content is also filtered and output. The following aliases are available:

\input{fname}     \dofile{fname}{!+}
\import{fname}    \dofile{fname}{!-}
\read{fname}      \dofile{fname}{?+}
\load{fname}      \dofile{fname}{?-}
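The two-character mode argument and the four aliases can be decoded as follows. This Python sketch only restates the rules given above; the names parse_dofile_mode and ALIASES and the dictionary keys in the result are hypothetical.

```python
# Alias name -> mode string, as in the alias list above.
ALIASES = {"input": "!+", "import": "!-", "read": "?+", "load": "?-"}

def parse_dofile_mode(mode):
    """Decode a mode such as "!+" into its two settings."""
    if len(mode) != 2 or mode[0] not in "!?" or mode[1] not in "+-":
        raise ValueError("mode must be one of [!?] followed by one of [+-]")
    return {
        "absence_is_fatal": mode[0] == "!",   # '?' permits a missing file
        "output": mode[1] == "+",             # '-' interprets without output
    }
```

For example, the mode behind \load permits a missing file and produces no output, which is what a macro package that may or may not be installed would want.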

The contents of <file name> cannot be captured. If you need to capture the contents of a file, use \finsert#1 or \zinsert#1.

Result text Technically none. Of course the processing of <file name> may result in output, depending on the mode of opening. However, this result text cannot be captured. For example,

\setx{foo}{\dofile{bar}{!+}}

will result in the file <bar> being processed and output via the standard output mechanisms, while the key \foo will have the empty string as value.

\dowhile#2: \dowhile{<any>}{<condition>}

<any> is expanded and concatenated to the result text until <condition> expands to something that is nonzero when interpreted as an integer. <any> is expanded at least one time.

\__env__#1: \__env__{<name>}

Looks up <name> in the environment.

Result text The corresponding value if <name> exists in the environment, the empty string otherwise.

Note This primitive yields identical results for names not in the environment and names in the environment for which the value is empty. Use \defined#2 to check whether <name> exists.

\end#1: \end{<label>}

Expands the end definition associated with <label> via \env#4.

Result text The string associated with <label> via \env#4.

\env#4: \env{<label>}{<<any1>>}{<any2>}{<any3>}

Stores expanded <any1> and unexpanded <any2> for later use with \begin#2 (when given argument <label>) and unexpanded <any3> for later use with \end#1 (when given argument <label>).

<any1> may contain a vararg denoting key-value pairs. These will be set for each \begin#2 invocation in the corresponding dollar dictionary. It provides a convenient mechanism to set default values for keys that can be passed in the second argument of \begin#2. Note that keys are passed as regular macro signatures, but they are then transformed to dollar keys by prepending a dollar sign. Environments are tightly linked to the dollar dictionary stack. Read more about this in the description of \begin#2.

With each \begin#2 invocation, after <any1> is processed as indicated above, <any2> will be pushed onto the input stream. Before this, \begin#2 defines the keys \$__args__ and \$__xargs__. These contain respectively the vararg that was passed as the second argument of \begin#2 and the same vararg after it was expanded. These keys can be used in <any2>. One possible usage is to pass the key-values on to other environment invocations. This is a likely scenario in case one environment is a thin customization wrapper around a full-fledged base environment.

Result text None.

\eqt#3: \eqt{<str>}{<<num1>>}{<<num2>>}

The last two arguments are expanded. Their results are compared as numbers. The first argument must be one of the labels lt|lq|eq|gq|gt|ne|cp. In case it equals one of the six labels lt|lq|eq|gq|gt|ne, this primitive pushes the associated boolean as a string (i.e. either 0 or 1). In case the label equals cp, this primitive pushes the result of the numeric compare, namely one of -1 (if the result of <num1> is smaller than the result of <num2>), 0 (equal to), or 1 (greater than).

Result text Either the string encoding of a boolean (0 or 1) or the string encoding of the ternary value resulting from a numeric compare (-1 or 0 or 1).

\branch#1: \branch{<<vararg>>}

Two arguments are successively taken from <vararg>. The first is expanded and then evaluated as an integer. If the integer is nonzero, the second argument is expanded and everything else is ignored. Otherwise the procedure is repeated. If no (odd) argument matches, and the <vararg> has an odd number of arguments, the last argument is put in place. It can be considered a default, else, or failure clause.
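The selection rule of \branch#1 can be modelled in Python. The sketch is illustrative and not zoem's implementation: the name branch is hypothetical, and zero-argument callables stand in for unexpanded blocks so that the model, like the description above, expands a block only when it is actually needed.

```python
def branch(vararg):
    """Return the expansion of the first clause whose test is nonzero.

    vararg is a list of zero-argument callables. Tests sit at the odd
    positions (counting from one), clauses at the even positions, and a
    trailing unpaired element acts as the default clause.
    """
    for i in range(0, len(vararg) - 1, 2):
        if int(vararg[i]()) != 0:
            return vararg[i + 1]()
    if len(vararg) % 2 == 1:
        return vararg[-1]()
    return ""
```

Tests after the first nonzero one are never expanded, so side effects in later tests do not take place.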

\eval#1: \eval{<<any>>}

Expands <any> and passes it on for further evaluation. This can come in handy when complicated requirements demand zoem acrobatics. This primitive used to be implemented as a macro; it is fully equivalent to

\set{eval#1}{\apply{_#1{\!1}}{\1}}

The above macro works as follows. First, \apply#2 expands both of its arguments. The second argument is the data it received from \eval, i.e. the latter's single argument. At this stage, the data is thus expanded for the first time. \apply#2 also expands its first argument. The sequence \!1 is contracted to the sequence \1. The \1 needed by \apply#2 needs to be protected from the interpolation that occurs when eval#1's argument is interpolated. Expansion of \apply#2's first argument thus yields the anonymous key _#1{\1}, a key that simply copies its argument and passes it on for further expansion.

Example \foo and \eval{\!foo} are fully equivalent. \!foo on the other hand, expands to \foo and is then passed to the filter stage.

See also \set#3.

\setx#2: \setx{<key-sig>}{<<any>>}

The second argument is expanded and stored in the key <key-sig>. Besides simply storing the expansion of an expression, it can also be used to do trickier things such as

\def{bar}{klaas}
\setx{foo#2}{\bar says \1 and \2}
\foo{x}{y}

The output is

klaas says x and y

If you need lambda-like capabilities, take note that you can use \!k or \!{\k} to construct a positional parameter \k, if you want to interpolate arguments into a key that will later take other arguments. Like this:

\: is there any use for this wacky stuff?
\def{lambda#2}{\setx{\1#1}{\2 says \!1}}
\lambda{foo}{bar}
\foo{moo}

The output is

bar says moo

Take care: the \dofile#2 key outputs to the default output file. If you need to include the contents of a file within a \setx#2 call, you need to use \finsert#1 or \zinsert#1 in conjunction with \setx#2.

Result text None.

See also \set#3.

\set#3: \set{<{modes}{..}{if}{..}{unless}{..}{dict}{..}{start}{..}{width}{..}>}{<key-sig>}{<any>}

This primitive encompasses all of the previous four as well as providing additional modes of operation. The first argument is a vararg storing key-value pairs. The possible keys are modes, if, unless, dict, start and width. All of these are optional.

The <modes> value, if present, must be a string over the following characters.

a   append to the key, do not overwrite, create if not existing.
c   conditionally; only set if not already defined.
e   existing; update existing key, possibly in lower dictionary.
g   global; set in the global (bottom) user dictionary.
u   unary; do not interpret vararg in <any> as key-value list (data keys only).
v   vararg; interpret vararg in <any> as key-value list (regular keys only).
w   warn if key exists (like \def#2 and \defx#2).
x   expand argument (like \setx#2 and \defx#2).

The u directive applies when setting data. The v directive applies when setting regular keys. In this case, <key-sig> must be empty, and <any> is treated as a vararg of repeated key-value pairs. Directives can be combined as needed.

Note that keys in the global user dictionary can be accessed with the syntax \''foo, even if other dictionaries are pushed.

The if and unless directives can be used to trigger action (i.e. definition of keys) only if the corresponding clause evaluates to non-zero or zero, respectively.

The dict directive must be followed by a dictionary name in the subsequent block. The dictionary stack will be searched for a dictionary with this name. The type of dictionary is derived from the key signature. This is either the dollar dictionary used for the dictionary stack associated with \begin#2 (if the key starts with a dollar sign $), or the default user dictionary.

The start and width directives only work if a single key is set. Their values should evaluate to integers <start> and <width>. The key will be set to its old value with a segment of length <width> starting at offset <start> replaced by <any>. Offsets are zero-based and units are in bytes.
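The effect of start and width amounts to a byte-level splice. The Python sketch below models that rule only; it is not zoem's implementation, the name splice is hypothetical, and values are passed as bytes objects because the units are bytes rather than characters.

```python
def splice(old, start, width, new):
    """Replace width bytes of old, beginning at offset start, by new.

    Offsets are zero-based. A width of zero inserts new without
    removing anything.
    """
    return old[:start] + new + old[start + width:]
```

For instance, replacing the five bytes at offset 6 of "hello world" by "zoem" gives "hello zoem".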

Result text None.

See also \def#2, \defx#2, \set#2, \setx#2.

\special#1: \special{<<vararg>>}

<vararg> is expanded before use. It must have an even number of arguments. These are interpreted as pairs. The first of each pair must enclose an integer in the range 0-255 or one of the special token identifiers -1, -2 or -3. The integers in the range 0-255 are interpreted as character indices. The characters indexed -1, -2 and -3 correspond with the zoem glyphs \~, \|, and \- respectively. The second element in each pair defines the string to which the character specified by the first element must be mapped. This string is interpreted in device scope. See the Device scope section for simple uses.

A key may occur multiple times. The corresponding definitions are stacked away and will be accessed according to the current special level (cf. the section on mapped characters in device scope).

Repeated use of \special#1 does not cause the removal of previous definitions, with one exception: If \special#1 is invoked with no arguments at all then all definitions are removed.

Note Be sure to use delay sequences as appropriate, noting that vararg is expanded. Below is how Portable Unix Documentation encodes a line break in troff:

{-2} {\!N.br\!N}

Zoem interprets the value and accordingly associates the device scope sequence \N.br\N with the token \|. The escape sequence \N will thus be processed during the filter stage as is appropriate. Without the delay sequence zoem would try to expand \N during processing of the \special#1 primitive.

Result text None.

\switch#2: \switch{<<pivot>>}{<vararg>}

The first argument is expanded. Subsequently, two arguments are successively taken from <vararg>. The first is expanded and string compared with <pivot>. If they match, the second argument is expanded and everything else is ignored. If they do not match, the procedure is repeated. If no (odd) argument matches, and the <vararg> has an odd number of arguments, the last argument is put in place. It can be considered a failure clause. This primitive does not have fall-through behaviour; at most one branch will be handed to the parser.

Different cases can be grouped in a vararg. If \switch#2 recognizes that the test argument can be parsed as a vararg it will extract all the corresponding sub-arguments. If the pivot matches any of these the branch corresponding to the test argument will be taken.

Result text Either the first block associated with a matching case of <pivot>, the failure clause, or nothing at all.
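The matching rule of \switch#2, including grouped cases and the failure clause, can be modelled in Python. This is a sketch, not zoem's implementation: the name switch is hypothetical, a grouped case is represented by a Python list, and branches are plain strings rather than blocks that still need expansion.

```python
def switch(pivot, vararg):
    """Return the branch of the first case matching pivot.

    vararg alternates cases and branches. A case is either a string or
    a list of strings (a grouped case). A trailing unpaired element is
    the failure clause. There is no fall-through.
    """
    for i in range(0, len(vararg) - 1, 2):
        case, branch = vararg[i], vararg[i + 1]
        candidates = case if isinstance(case, (list, tuple)) else [case]
        if pivot in candidates:
            return branch
    return vararg[-1] if len(vararg) % 2 == 1 else ""
```

A call with cases html and roff plus a fifth element behaves like the device switch shown in the tour: an unknown device selects the fifth element.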

\system#3: \system{<cmd>}{<<args>>}{<<data>>}

By default this primitive is disallowed. The first argument is the name of a system command. The environment variable PATH is used in tracing the location of the command. If <data> is non-empty, it is first unprotected, that is, escaped backslashes and curlies are unescaped. The resulting text is then fed to command <cmd> with arguments <args>. The latter, if non-empty, must be specified as a vararg, even if only a single argument is present. Should execution of the command (be it with or without data) result in output on STDOUT then the latter is captured, backslashes and curlies are escaped (i.e. the output is protected), and the result is put in place. STDERR is the same as it is for the parent (zoem).

Note The security implications of this feature. By default zoem will ignore \system. The command line option --unsafe will cause zoem to prompt for user confirmation (if prompting is not possible it will ignore again) for each encountered \system invocation. The option --unsafe-silent will silently allow all \system invocations. The option --allow=cmd[:cmd]* explicitly specifies which commands to allow silently. It is also possible to use this option repeatedly rather than separate different commands by colons.

If the zoem command line option --system-honor is used, zoem will exit if a system command fails or is ignored.

A simple exit status is written in the variable \__sysval__: it is zero (0) on success, and one (1) on failure.

Refer to the manual page of the zoem interpreter for more information on --unsafe, --unsafe-silent, --allow=, and --system-honor.

Built-in macros \system#2 and \system#1 exist. The former drops the <data> argument, the latter also drops the <args> argument.

Example

\system{sort}{{-n}}{\finsert{foo}}
\system{ls}{{-l}{-a}}
\system{date}

Result text The output captured (and then protected) from <cmd>'s STDOUT, if any.

\table#5: \table{<<row length>>}{<any1>}{<any2>}{<any3>}{<<vararg>>}

The first argument is expanded and interpreted as an integer, say k. Successively, vectors of k elements are shifted from <vararg>. Each vector is bordered on the left with <any1>, bordered on the right with <any3>, and all elements in the vector are separated with <any2>.

This primitive is perhaps not really needed as its functionality is largely covered by \apply#2.

Result text The blocks from <vararg> interspersed in a table-like manner with <any1>, <any2>, and <any3>.
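The layout produced by \table#5 can be pictured with a Python model. It is a sketch, not zoem's implementation: the name table is hypothetical, and since the description does not say what happens to a final incomplete vector, the model simply requires the number of elements to be a multiple of k.

```python
def table(k, left, sep, right, vararg):
    """Lay out vararg in rows of k elements.

    Each row is bordered by left and right, and its elements are
    separated by sep.
    """
    if k <= 0 or len(vararg) % k != 0:
        raise ValueError("this model needs a positive k dividing the length")
    rows = []
    for i in range(0, len(vararg), k):
        rows.append(left + sep.join(vararg[i:i + k]) + right)
    return "".join(rows)
```

With k equal to 2, borders [ and ], and separator |, the elements a, b, c, d come out as [a|b][c|d].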

\textmap#2: \textmap{<vararg>}{<<any>>}

Apply one or more transformations to <any> and put the result in place. <vararg> takes a succession of key-value pairs. The associated transformations are applied in order. The supported transformations are:

{word}{ucase}

Uppercase <any>

{word}{lcase}

Lowercase <any>

{number}{roman}

Convert number to Roman

{number}{alpha}

Convert number to letters

{repeat}{<num>}

Concatenate <num> copies of <any>

{caesar}{<num>}

Apply caesar encryption

{vigenere}{<key>}

Apply vigenere encryption

{vigenerex}{<key>}

Apply vigenere encryption and include space

The roman transformation can e.g. be used to equip the Aephea itemize environment with roman numbering. To get uppercase roman, do this:

\textmap{{number}{roman}{word}{ucase}}{\your_nice_counter}

The alpha transformation maps its argument to a string over the alphabet _a-z, i.e. the set of all lowercase letters with the underscore added. This set is simply used for counting in base 27, with the underscore playing the role of zero.
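Counting in base 27 with the underscore as zero can be sketched in Python as follows. This is a model of the description only; the name to_alpha is hypothetical, and the output for the number zero (a single underscore) is an assumption.

```python
# Digit 0 is the underscore, digits 1..26 are the letters a..z.
DIGITS = "_abcdefghijklmnopqrstuvwxyz"


def to_alpha(n):
    """Write the non-negative integer n in base 27 over the alphabet _a-z."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return DIGITS[0]  # assumption: zero is rendered as "_"
    out = []
    while n > 0:
        n, remainder = divmod(n, 27)
        out.append(DIGITS[remainder])
    # Digits were produced least significant first.
    return "".join(reversed(out))
```

Under this model 1 maps to a, 26 to z, 27 to a_ and 28 to aa.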

Result text The transformed text.

\throw#2: \throw{<towel|error|done>}{<<any>>}

Quit parsing and unwind the stack until some occurrence of \catch#2 or \try#1 captures this throw.

The throw \throw{done} is also unconditionally caught by \while#2 and \eval#1. If \throw{done} is encapsulated by none of these four primitives it means that processing of the current file is stopped, and processing at the including file, if applicable, is resumed.

<any> is digested; if it has positive length the result is issued as a diagnostic.

Note Many primitives evaluate one or more of their arguments before use, as indicated in this manual. An occurrence of \throw{done} in such an argument, if not caught, will be treated like an error. It is possible to use \throw{done} in such an instance by encapsulating the argument in \eval#1.

Result text None, affects the result text of the embedding scopes.

\tr#2: \tr{<vararg>}{<<any>>}

<vararg> contains key-value pairs. The accepted keys are from and to which must always occur together, and delete and squash. The values of these keys must be valid translation specifications. This primitive transforms <any> by successively applying translation, deletion and squashing in that order. Only the transformations that are needed need be specified.

The syntax accepted as translation specification is almost fully compliant with the syntax accepted by tr(1), with three exceptions. First, repeats are introduced as [*a*20] rather than [a*20]. Second, ranges can (for now) only be entered as X-Y, not as [X-Y]. X and Y can be entered in either octal or hexadecimal notation (see further below). As an additional feature, the magic repeat operator [*a#] stops on both class and range boundaries. Finally, character specifications can be complemented by preceding them with the caret ^. See further below for examples where these features are used.

Preprocessing The values (not the data <any>) are subjected to UNIX tilde expansion as described in the zoem manual.

Syntax Specifications may contain ranges of characters such as a-z and 0-9. Posix character classes are allowed. The available classes are

[:alnum:] [:alpha:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]

Characters can be specified using octal notation, e.g. \012 encodes the newline. Use \173 for the opening curly, \175 for the closing curly, \134 for the backslash, and \136 for the caret if it is the first character in a specification. DON'T use \\, \{, or \} in this case! Hexadecimal notation is written as \x7b (the left curly in this instance).

Result text The expanded <any> subjected to the tr operator as specified.

Example The following was entered in interactive mode.

\tr{ {from}{[:lower:][:upper:][:digit:][:space:][:punct:]} {to}{[*L#][*U#][*D#][*S#][*P#]}}{ !"#$%&'()*+,-./0123456789:;<=>?@ ABCDEFGHIJKLMNOPQRSTUVWXYZ [\\]^_` abcdefghijklmnopqrstuvwxyz \{|\}~]}
.
----------------------------------------
SSPPPPPPPPPPPPPPPDDDDDDDDDDPPPPPPPSUUUUUUUUUUUUUUUUUUUUUUUUUUSPPPPPPSLLLLLLLLLLLLLLLLLLLLLLLLLLSPPPPP
----------------------------------------
\tr{ {squash}{^} {from}{[:lower:][:upper:][:digit:][:space:][:punct:]} {to}{[*L#][*U#][*D#][*S#][*P#]}}{ !"#$%&'()*+,-./0123456789:;<=>?@ ABCDEFGHIJKLMNOPQRSTUVWXYZ [\\]^_` abcdefghijklmnopqrstuvwxyz \{|\}~]}
.
----------------------------------------
SPDPSUSPSLSP

Note how the magic repeat operator [*#] stops on class boundaries.
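The effect of the example can be mimicked in Python to make the class-by-class mapping explicit. This is a model of the outcome only, not of zoem's translation engine; the names CLASSES, classify and squash_all are hypothetical, and the character classes are taken to be those of the C locale.

```python
import string
from itertools import groupby

# Each POSIX class in the 'from' specification maps onto one letter,
# mirroring [*L#][*U#][*D#][*S#][*P#] in the 'to' specification.
CLASSES = [
    (string.ascii_lowercase, "L"),
    (string.ascii_uppercase, "U"),
    (string.digits, "D"),
    (string.whitespace, "S"),
    (string.punctuation, "P"),
]


def classify(text):
    """Replace every character by the letter of the class it belongs to."""
    out = []
    for ch in text:
        for members, label in CLASSES:
            if ch in members:
                out.append(label)
                break
        else:
            out.append(ch)  # characters outside all classes pass through
    return "".join(out)


def squash_all(text):
    """Collapse each run of identical characters, like {squash}{^}."""
    return "".join(ch for ch, _ in groupby(text))
```

For the input ab CD 12 !? the model gives LLSUUSDDSPP, and LSUSDSP after squashing.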

\trace#1: \trace{<<int>>}

The argument is expanded and interpreted as an integer. This integer encodes an ensemble of flags controlling the trace output. The different modes are exactly the same as those that can be set from the command line. Refer to the corresponding section for more information. Special values are 0 (switch off all tracing), -1 (switch on all tracing modes in short mode), -2 (switch on all tracing modes in long mode), -3 (switch to the previous tracing value), and -4 (emit a listing of tracing bits). The value -3 can be useful to switch tracing on for a short while and then off again if you need to debug your document. Additionally and redundantly, \trace#1 puts the previous tracing value in place.

Result text The previous tracing value.

\try#1: \try{<any>}

This will process the content and write the output in the macro \__zoemput__. Output will be truncated in case \throw#2 was used or an error occurred. The status, currently one of done, towel, or error, is written in the session macro \__zoemstat__.

Result text None.

\undef#1: \undef{<key-sig>}

Deletes the key with signature <key-sig> from the top level dictionary. Complains (but does not fail) if the key does not exist in that dictionary. It is possible to specify that a regular key (i.e. not a dollar key) must be looked up in the global dictionary by prefixing its signature with two single quotes.

Result text None.

\vanish#1: \vanish{<any>}

This will only process the content for its side effects. Any result text is disregarded. This allows easy free-style commenting of sequences of definitions. By comparison, \formatted#1 provides the means to give a formatted presentation of the definitions themselves.

Result text None.

\while#2: \while{<condition>}{<any>}

While <condition> expands to something that is nonzero when interpreted as an integer, <any> is expanded and concatenated to the result text. The following piece of zoem asks the user for an integer and writes all Fibonacci numbers smaller than that integer plus one extra to STDOUT.

\import{ctr.zmm}     \: import ctr macros.
\def{fib#1}{
   \ctrset{a}{1}
   \ctrset{b}{1}
   \ctrset{c}{0}
   \while{\eqt{lq}{\ctrput{c}}{\1}}{
      \ctrset{c}{\ctrput{a}}
      \ctrset{a}{\let{\ctrput{a}+\ctrput{b}}}
      \write{-}{txt}{\ctrput{c}\|}
      \ctrset{b}{\ctrput{c}}
   }
}
\write{-}{txt}{Enter a number please, }
\write{-}{txt}{then press <cr> and <ctl-d>\|}
\setx{num}{\finsert{-}}     \: this reads from STDIN.
\fib{\num}

Note The strings built up by \while#2 are internally concatenated until it is done, so the result from \while#2 can be captured. This will make \while#2 work for \setx#2. If you want to output 100M worth of lines or paragraphs in a while loop, either embed the stuff to be output in a \write#3 call and make sure that no whitespace results from the loop (for example by using \formatted#1), or simply use \whilst#2. With \write#3 you can specify a file name to which results should be output (use \__fnout__ for the current default output file) whereas \whilst#2 simply outputs to the current default stream.

\whilst#2: \whilst{<condition>}{<any>}

While <condition> expands to something that is nonzero when interpreted as an integer, <any> is expanded and immediately output to the current default output stream.

Result text None; everything is sent to the default output stream right away. By contrast, output from \while#2 can be captured, for example by a \setx#2 invocation.
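The difference between the two loops can be modelled in Python. This is a sketch of the behaviour described above, not zoem's implementation; the names while_collect and whilst_stream are hypothetical, and condition and body stand in for the expansion of <condition> and <any>.

```python
import sys


def while_collect(condition, body):
    """Model of while: pieces are kept in memory and returned as one string."""
    parts = []
    while condition() != 0:
        parts.append(body())
    return "".join(parts)  # the caller can capture this result


def whilst_stream(condition, body, emit=sys.stdout.write):
    """Model of whilst: each piece is output at once, nothing is returned."""
    while condition() != 0:
        emit(body())
    return ""  # there is no result text to capture
```

The first form holds the entire output in memory until the loop ends, which is what makes capturing possible; the second form never holds more than a single piece, which is why it suits very large outputs.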

\write#3: \write{<<file name>>}{<str>}{<<any>>}

The first argument is expanded and used as a file name. It is a fatal error if the file has not been opened by a previous \write#3 call and cannot be opened for writing. Two special file names or 'streams' are recognized, namely - and stderr. They map to STDOUT and STDERR. The third argument is expanded, filtered, and written to file. The second argument indicates the filter to be used. It must be one of the (literal) strings copy, device, or txt.

The copy filter does not filter anything at all (neither plain scope nor device scope) and does not touch any of the zoem escape sequences remaining after expansion.

The device filter does a full-fledged filtering of both parse scopes. It respects the settings according to the \special#1 primitive. The write primitive associates unique metadata with each file it opens, so at directives such as \N, \W, and \+ for different output files do not interfere with one another. Refer to the Device scope section for more information on at directives.

The txt filter maps \\ to \ (i.e. a single backslash), \~ to a single space, \- to a single hyphen, \, (the atomic separator) to nil, \| to a newline, \{ to {, and \} to }. It copies everything else verbatim.
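The mapping performed by the txt filter is small enough to be written out as a table. The following Python sketch models it in a single left-to-right pass; it is an illustration of the rules listed above rather than zoem's code, and the names TXT_MAP and txt_filter are hypothetical.

```python
import re

# Escape sequence -> replacement, exactly the pairs listed for the txt filter.
TXT_MAP = {
    "\\\\": "\\",   # \\ becomes a single backslash
    "\\~": " ",     # \~ becomes a single space
    "\\-": "-",     # \- becomes a single hyphen
    "\\,": "",      # \, (the atomic separator) becomes nil
    "\\|": "\n",    # \| becomes a newline
    "\\{": "{",
    "\\}": "}",
}

# A backslash followed by one of the seven recognised characters.
_ESCAPE = re.compile(r"\\[\\~\-,|{}]")


def txt_filter(text):
    """Apply the txt mapping; everything else is copied verbatim."""
    return _ESCAPE.sub(lambda match: TXT_MAP[match.group(0)], text)
```

Because the scan is left to right and non-overlapping, the sequence \\~ comes out as a backslash followed by a tilde, not as a backslash followed by a space.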

Result text Technically none, as the output of \write#3 cannot be captured.

\writeto#1: \writeto{<<file-name>>}

Closes the current default output stream, and changes it to point to file <file name>. Useful when splitting a document into chapters, or god forbid, nodes.

Notes If the file name contains a path separator, zoem will refuse to carry on, because this may pose a risk that sensitive files are overwritten - in case someone has written a malicious zoem input file to do just that. If the option --unsafe is used, zoem will query the user what to do. If the option --unsafe-silent is used, zoem will merrily buzz on without querying. The path separator is entirely UNIX-centric, i.e. a forward slash.

Zoem will recognize if \writeto#1 is issued more than once for the same file <file-name>. On the first occasion, it will simply open the file and truncate any previous contents. On the second occasion and onwards, it will append to the file. There is currently no option to vary this behaviour. Zoem will not recognize the fact that different strings might refer to the same file (e.g. foo and ./foo). Whenever it encounters a file name not seen before, it will try to open the file in write mode.

In interactive mode, \writeto#1 has no effect for text entered in plain mode. It does have effect in case \write#3 is issued with \__fnout__ as the file name argument, since \writeto#1 resets the \__fnout__ macro.
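The truncate-then-append rule, keyed on the literal file name string, can be sketched in Python as follows. This models the behaviour described above and is not taken from zoem's source; the class name OutputRegistry and the method mode_for are hypothetical.

```python
class OutputRegistry:
    """Remember which file name strings have been used as output before."""

    def __init__(self):
        self._seen = set()

    def mode_for(self, name):
        """Return the open mode for name: 'w' the first time, 'a' afterwards."""
        if name in self._seen:
            return "a"  # seen before: append to the existing contents
        # Names are compared as plain strings, so "foo" and "./foo" differ.
        self._seen.add(name)
        return "w"      # first occurrence: truncate and write
```

In this model a second use of foo appends, while a first use of ./foo truncates the very same file, which is the pitfall pointed out above.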

Result text None.

\zinsert#1: \zinsert{<<file name>>}

The contents of file <file name> are put in place unaltered enclosed by the \!#1 delay primitive. The contents must necessarily satisfy zoem syntax. If the file can not be opened, the empty string results. See also \finsert#1.

Example

\setx{foo}{\zinsert{mydata}}
\setx{foo}{\eval{\zinsert{mydata}}}

In the first case, \foo will contain the exact contents of file mydata. Those contents are first enclosed within the \!#1 primitive by \zinsert#1. The resulting text is evaluated by \setx#2 - the only thing this does is strip the enclosing \!{} scope.

In the second case, \foo will contain the evaluated contents of file mydata, as \eval#1 adds an additional layer of evaluation.

Note This primitive is able to read inline files, unlike \finsert#1.

Result text The contents of file <file name>, or the empty string if the file can not be opened.

Pitfalls

This is a young section, with only a few entries so far.

• [\system#3] Beware that the argument/option list (the second argument of \system#3) is encoded as a vararg. If you have a single argument, it is easy to forget the enclosing curlies. In that case, zoem ignores the argument altogether.

Zoem protects the data returned by \system#3. So you may e.g. think (as I once did) that

\system{date}{{%e}{%B}{%Y}}

is a neat way to create a vararg, but you will end up with something like

\{24\}\{April\}\{2004\}

• [Package authors] Beware of using and scope within at scope within a \write#3 invocation that uses the copy filter.

\write{file}{copy}{ foo \@{ zut \&{ bar }}}

The bar part will not be evaluated as the copy filter does not apply the filtering stage. If the stuff written is read back in from some other part of the document, or from another document altogether, the bar part will be evaluated in a different context than the one in which it was created.

• \throw#2 with argument done can be used to halt processing of the current file. Refer to the \throw#2 description for the associated requirements.

Glossary

For user keys, dollar keys, and dictionary stacks, refer to the Dictionary stacks section and the Macro expansion section. For data keys, refer to the Tree data section.

For key signatures and key mention, refer to the Key signatures section. For anonymous keys: the Anonymous keys section.

Session variables are described in the Session keys section.

For varargs, arguments in which a variable number of sub-arguments can be stored, and for blocks: the Of blocks and varargs section.

For plain scope, device scope, at scope, and glyph sequences: The Scope dichotomy and Device scope sections.

For file read and inline files: the File read section.

Sometimes zoem protects or unprotects data. Refer to the Protection section.