Chapter 10
Strictness, Macros and Efficiency
|
|
Programming in a functional language means that one should focus on algorithms and without worrying about all kinds of efficiency details. However, when large applications are being written it may happen that this attitude results in a program that is unacceptably inefficient in time and/or space.
In this Chapter we explain several kinds of annotations and directives that can be defined in Clean. These annotations and directives are designed to give the programmer some means to influence the time and space behavior of Clean applications.
Clean is by default a lazy language: applications will only be evaluated when their results are needed for the final outcome of the program. However, lazy evaluation is in general not very efficient. It is much more efficient to compute function arguments in advance (strict evaluation) when it is known that the arguments will be used in the function body. By using strictness annotations in type definitions the evaluation order of data structures and functions can be changed from lazy to strict. This is explained in Section 10.1.
One can define constant graphs on the global level also known as Constant Applicative Forms (see Section 10.2). Unlike constant functions, these constant graphs are shared such that they are computed only one. This generally reduces execution time possibly at the cost of some heap space needed to remember the shared graph constants.
Macro’s (Section 10.3) are special functions that will already be substituted (evaluated) at compile-time. This generally reduces execution time (the work has already been done by the compiler) but it will lead to an increase of object code.
Clean uses by default a lazy evaluation strategy: a redex is only evaluated when it is needed to compute the final result. Some functional languages (e.g. ML, Harper et al.) use an eager (strict) evaluation strategy and always evaluate all function arguments in advance.
Lazy evaluation has the following advantages (+) / disadvantages (-) over eager (strict) evaluation:
• only those computations which contribute to the final result are computed (for some algorithms this is a clear advantage while it generally gives a greater expressive freedom);
• one can work with infinite data structures (e.g. [1..]);
• it is unknown when a lazy expression will be computed (disadvantage for debugging, for controlling evaluation order);
• strict evaluation is in general much more efficient, in particular for objects of basic types, non-recursive types and tuples and records which are composed of such types;
-/+ in general a strict expression (e.g. 2 + 3 + 4) takes less space than a lazy one, however, sometimes the other way around (e.g. [1..1000]);
Each expression in a function definition is considered to be either strict (appearing in a strict context: it has to be evaluated to strong root normal form) or lazy (appearing in a lazy context : not yet to be evaluated to strong root normal form) The following rules specify whether or not a particular expression is lazy or strict:
• a non-variable pattern is strict;
• an expression in a guard is strict;
• the expressions specified in a strict let-before expression are strict;
• the root expression is strict;
• the arguments of a function or data constructor in a strict context are strict when these arguments are being annotated as strict in the type definition of that function (manually or automatically) or in the type definition of the data constructor;
• all the other expressions are lazy.
Evaluation of a function will happen in the following order: patterns, guard, expressions in a strict let before expression, root expression (see also 3.1).
The space occupied by Clean structures depend on the kind of structures one is using, but also depends on whether these data structures appear in a strict or in a lazy context. To understand this one has to have some knowledge about the basic implementation of Clean (see Plasmeijer and Van Eekelen, 1993).
Graphs (see Chapter 1) are stored in a piece of memory called the heap. The amount of heap space needed highly depends on the kind of data structures that are in use. Graph structures that are created in a lazy context can occupy more space than graphs created in a strict context. The garbage collector in the run-time system of Clean automatically collects graphs that are not being used. The arguments of functions being evaluated are stored on a stack. There are two stacks: the A-stack, which contains references to graph nodes stored in the heap and the BC-stack which contains arguments of basic type and return addresses. Data structures in a lazy context are passed via references on the A-stack. Data structures of the basic types (Int, Real, Char or Bool) in a strict context are stored on the B-stack or in registers. This is also the case for these strict basic types when they are part of a record or tuple in a strict context.
Data structures living on the B-stack are passed unboxed. They consume less space (because they are not part of a node) and can be treated much more efficiently. When a function is called in a lazy context its data structures are passed in a graph node (boxed). The amount of space occupied is also depending on the arity of the function.
In the table below the amount of space consumed in the different situations is summarised (for the lazy as well as for the strict context). For the size of the elements one can take the size consumed in a strict context.
Type |
Arity |
Lazy context (bytes) |
Strict context (bytes) |
Comment |
Int, Bool |
- |
8 |
4 |
|
Int (0≤n≤32), Char |
- |
- |
4 |
node is shared |
Real |
- |
12 |
8 |
|
Small Record |
n |
4 + S size elements |
S size elements |
total length≤12 |
Large Record |
n |
8 + S size elements |
S size elements |
|
Tuple |
2 |
12 |
S size elements |
|
|
>2 |
8 + 4*n |
S size elements |
|
{a} |
n |
20 + 4*n |
12 + 4*n |
|
!Int |
n |
20 + 4*n |
12 + 4*n |
|
!Bool,!Char |
n |
20 + 4*ciel(n/4) |
12 + 4*ciel(n/4) |
|
!Real |
n |
20 + 8*n |
12 + 8*n |
|
!Tuple, !Record |
n |
20 + size rec/tup*n |
12 + size rec/tup*n |
|
Hnf |
0 |
- |
4 + size node |
node is shared |
|
1 |
8 |
4 + size node |
|
|
2 |
12 |
4 + size node |
also for [a] |
|
>2 |
8 + 4*n |
4 + size node |
|
Pointer to node |
- |
4 |
4 |
|
Function |
0,1,2 |
12 |
- |
|
|
>3 |
4 + 4*n |
- |
|
Strict arguments of functions can sometimes be handled much more efficiently than lazy arguments, in particular when the arguments are of basic type.
Example: functions with strict arguments of basic type are more efficient.
Ackerman:: !Int !Int -> Int
Ackerman 0 j = j+1
Ackerman i 0 = Ackerman (i-1) 1
Ackerman i j = Ackerman (i-1) (Ackerman i (j-1))
The computation of a lazy version of Ackerman 3 7 takes 14.8 seconds + 0.1 seconds for garbage collection on an old fashion MacII (5Mb heap). When both arguments are annotated as strict (which in this case will be done automatically by the compiler) the computation will only take 1.5 seconds + 0.0 seconds garbage collection. The gain is one order of magnitude. Instead of rewriting graphs the calculation is performed using stacks and registers where possible. The speed is comparable with a recursive call in highly optimised C or with the speed obtainable when the function was programmed directly in assembly.
So, lazy evaluation gives a notational freedom (no worrying about what is computed when) but it might cost space as well as time. In Clean the default lazy evaluation can therefore be turned into eager evaluation by adding strictness annotations to types.
Strict |
= |
! |
This can be done in several ways:
• The Clean compiler has a built-in strictness analyzer based on abstract reduction (Nöcker, 1993) (it can be optionally turned off). The analyzer searches for strict arguments of a function and annotate them internally as strict (see 10.1.1). In this way lazy arguments are automatically turned into strict ones. This optimization does not influence the termination behavior of the program. It appears that the analyzer can find much information. The analysis itself is quite fast.
• The strictness analyzer cannot find all strict arguments. Therefore one can also manually annotate a function as being strict in a certain argument or in its result (see 10.1.1).
• By using strictness annotations, a programmer can define (partially) strict data structures (Nöcker and Smetsers, 1993; see 10.1.3). Whenever such a data structure occurs in a strict context (see 10.1.1), its strict components will be evaluated.
• The order of evaluation of expressions in a function body can also be changed from lazy to strict by using a strict let-before expression (see 3.5.4).
One has to be careful though. When a programmer manually changes lazy evaluation into strict evaluation, the termination behavior of the program might change. It is only safe to put strictness annotations in the case that the function or data constructor is known to be strict in the corresponding argument which means that the evaluation of that argument in advance does not change the termination behavior of the program. The compiler is not able to check this.
Constant graphs can also be defined on a global level (for local constant graphs see 3.6).
GraphDef |
= |
Selector =[:] GraphExpr ; |
A global graph definition defines a global constant (closed) graph, i.e. a graph which has the same scope as a global function definition (see 2.1). The selector variables that occur in the selectors of a global graph definition have a global scope just as globally defined functions.
Special about global graphs (in contrast with local graphs) is that they are not garbage collected during the evaluation of the program A global graph can be compared with a CAF (Constant Applicative Form): its value is computed at most once and remembered at run-time. A global graph can save execution-time at the cost of permanent space consumption.
Syntactically the definition of a graph is distinguished from the definition of a function by the symbol which separates left-hand side from right-hand side: "=:" is used for graphs while "=>" is used for functions. However, in general the more common symbol "=" is used for both type of definitions. Generally it is clear from the context what is meant (functions have parameters, selectors are also easy recognisable). However, when a simple constant is defined the syntax is ambiguous (it can be a constant function definition as well as a constant graph definition).
To allow the use of the "=" whenever possible, the following rule is followed. Locally constant definitions are by default taken to be graph definitions and therefore shared, globally they are by default taken to be function definitions (see 3.1) and therefore recomputed. If one wants to obtain a different behavior one has to explicit state the nature of the constant definition (has it to be shared or has it to be recomputed) by using "=:" (on the global level, meaning it is a constant graph which is shared) or "=>" (on the local level, meaning it is a constant function and has to be recomputed).
Global constant graph versus global constant function definition: biglist1 is a graph which is computed only once, biglist3 and biglist2 is a constant function which is computed every time it is applied.
biglist1 = [1..10000] // a constant function (if defined globally)
biglist2 =: [1..10000] // a graph
biglist3 => [1..10000] // a constant function
A graph saves execution-time at the cost of space consumption. A constant function saves space at the cost of execution time. So, use graphs when the computation is time-consuming while the space consumption is small and constant functions in the other case.
Macros are functions (rewrite rules) which are applied at compile-time instead of at run-time. Macro’s can be used to define constants, create in-line substitutions, rename functions, do conditional compilation etc. With a macro definition one can, for instance, assign a name to a constant such that it can be used as pattern on the left-hand side of a function definition.
At compile-time the right-hand side of the macro definition will be substituted for every application of the macro in the scope of the macro definition. This saves a function call and makes basic blocks larger (see Plasmeijer and Van Eekelen, 1993) such that better code can be generated. A disadvantage is that also more code will be generated. Inline substitution is also one of the regular optimisations performed by the Clean compiler. To avoid code explosion a compiler will generally not substitute big functions. Macros give the programmer a possibility to control the substitution process manually to get an optimal trade-off between the efficiency of code and the size of the code.
MacroDef |
= |
[MacroFixityDef] |
|
|
DefOfMacro |
MacroFixityDef |
= |
(FunctionName) [Fix][Prec] ; |
DefOfMacro |
= |
Function {Variable} :== FunctionBody ; |
|
|
[LocalFunctionAltDefs] |
The compile-time substitution process is guaranteed to terminate. To ensure this some restrictions are imposed on Macro’s (compared to common functions). Only variables are allowed as formal argument. A macro rule always consists of a single alternative. Furthermore,
8) Macro definitions are not allowed to be cyclic to ensure that the substitution process terminates.
Example of a macro definition.
Black :== 1 // Macro definition
White :== 0 // Macro definition
:: Color :== Int // Type synonym definition
Invert:: Color -> Color // Function definition
Invert Black = White
Invert White = Black
Example: macro to write (a?b) for lists instead of [a:b] and its use in the function map.
(?) infixr 5 // Fixity of Macro
(?) h t :== [h:t] // Macro definition of operator
map:: (a -> b) [a] -> [b]
map f (x?xs) = f x ? map f xs
map f [] = []
Notice that macros can contain local function definitions. These local definitions (which can be recursive) will also be substituted inline. In this way complicated substitutions can be achieved resulting in efficient code.
Example: macros can be used to speed up frequently used functions. See for instance the definition of the function foldl in StdList.
foldl op r l :== foldl r l // Macro definition
where
foldl r [] = r
foldl r [a:x] = foldl (op r a) x
sum list = foldl (+) 0 list
After substitution of the macro foldl a very efficient function sum will be generated by the compiler:
sum list = foldl 0 list
where
foldl r [] = r
foldl r [a:x] = foldl ((+) r a) x
The expansion of the macros takes place before type checking. Type specifications of macro rules are not possible. When operators are defined as macros, fixity and associativity can be defined.
Here are some additional suggestions how to make your program more efficient:
• Use the Clean profiler to find out which frequently called functions are consuming a lot of space and/or time. If you modify your program, these functions are the one to have a good look at.
• Transform a recursive function to a tail-recursive function.
• It is better to accumulate results in parameters instead of in the right-hand side results.
• It is better to use records instead of tuples.
• Arrays can be more efficient than lists since they allow constant access time on their elements and can be destructive updated.
• When functions return multiple ad-hoc results in a tuple put these results in a strict tuple instead (can be indicated in the type).
• Use strict data structures whenever possible.
• Export the strictness information to other modules (the compiler will warn you if you don’t).
• Make function strict in its arguments whenever possible.
• Use macros for simple constant expressions or frequently used functions.
• Use CAF’s and local graphs to avoid recalculation of expressions.
• Selections in a lazy context can better be transformed to functions which do a pattern match.
• Higher order functions are nice but inefficient (the compiler will try to convert higher order function into first order functions).
• Constructors of high arity are inefficient.
• Increase the heap space in the case
that the garbage collector takes place too often.