The previous chapter on overloading explains how type classes can be used to define different functions or operators that have the same name and perform similar tasks, albeit on objects of different types. These tasks are supposed to be similar, but in general they are not exactly the same. The corresponding function bodies are often slightly different because the data structures on which the functions work differ. As a consequence, one has to explicitly specify an implementation for every concrete instance of an overloaded function.
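As an illustration, the following is a minimal sketch of such hand-written instances for equality. The List and Tree types and their constructor names follow the types discussed in this chapter; the module name and the Start rule are assumptions made only to keep the sketch self-contained.

```clean
module eqinstances

import StdEnv

// User-defined types, named after the List and Tree types of this chapter
:: List a = Nil | Cons a (List a)
:: Tree a = Leaf a | Node (Tree a) (Tree a)

// Equality on lists: same constructor -> compare arguments pairwise
instance == (List a) | == a
where
    (==) Nil         Nil         = True
    (==) (Cons x xs) (Cons y ys) = x == y && xs == ys
    (==) _           _           = False

// Equality on trees follows exactly the same scheme
instance == (Tree a) | == a
where
    (==) (Leaf x)     (Leaf y)     = x == y
    (==) (Node lx rx) (Node ly ry) = lx == ly && rx == ry
    (==) _            _            = False

Start :: Bool
Start = Cons 1 (Cons 2 Nil) == Cons 1 (Cons 2 Nil)
```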
In the example above the programmer explicitly defines equality on lists and trees. For each new data type that we want to compare for equality, we have to define a similar instance. Moreover, if such a data type changes, the corresponding instance should be changed accordingly as well. Though the instances are similar, they are not the same since they operate on different data types.
What is similar in these instances? Both instances pattern match on their arguments. If the constructors are the same, they compare the constructor arguments pairwise. For constructors with no arguments True is returned. For constructors with several arguments, the results on the arguments are combined with &&. If the constructors are not the same, False is returned. In other words, equality on a data type is defined by looking at the structure of the data type. More precisely, it is defined by induction on the structure of types. There are many more functions than just equality that expose the same kind of similarity in their instances. Below is the mapping function that can be defined for type constructors of kind *->*.
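A sketch of such a mapping class and its instances could look as follows. The class name Functor is the one used later in this chapter; the member name fmap, the module name and the Start rule are assumptions of this sketch.

```clean
module functor

import StdEnv

:: List a = Nil | Cons a (List a)
:: Tree a = Leaf a | Node (Tree a) (Tree a)

// A type constructor class: f ranges over type constructors of kind *->*
class Functor f
where
    fmap :: (a -> b) (f a) -> f b

instance Functor List
where
    fmap f Nil         = Nil
    fmap f (Cons x xs) = Cons (f x) (fmap f xs)

instance Functor Tree
where
    fmap f (Leaf x)   = Leaf (f x)
    fmap f (Node l r) = Node (fmap f l) (fmap f r)

Start :: List Int
Start = fmap inc (Cons 1 (Cons 2 Nil))
```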
Again, both instances are similar: they pattern match on the constructors and map the constructor arguments pairwise. The results are packed back into the same constructor.
Generic programming enables the programmer to capture this kind of similarity and to define a single implementation for all instances of such a class of functions. To do so we need a universal structural representation of all data types. A generic function can then be defined once and for all on that universal representation. A specific type is handled using its structural representation. In the rest of this section we will explain roughly how it all works. First we focus on the universal structural representation of types; then we show how a generic function can be defined on it; and finally we show how the generic definition is specialized to a concrete type. See also (Alimarine & Plasmeijer, 2001, A Generic Programming Extension for Clean).
In CLEAN data types are algebraic: they are built up from sums of products of type terms. For example, the List type is a sum of two things: a nullary product for Nil and a binary product of the head and the tail for Cons. The Tree type is a sum of a unary product of elements for Leaf and a binary product of trees for Node. Having this in mind we can uniformly represent CLEAN data types using binary sums and products.
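The three building blocks can be declared as ordinary algebraic types. The definitions below are shown for reference only: the StdGeneric library module already provides types with these names, so they should not be redeclared in a module that imports it.

```clean
// Shown for reference; StdGeneric provides these types
:: UNIT       = UNIT              // nullary product
:: PAIR a b   = PAIR a b          // binary product
:: EITHER l r = LEFT l | RIGHT r  // binary sum
```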
The UNIT type represents a nullary product. The PAIR type is a binary product. The EITHER type is a binary sum. We do not need a type for nullary sums, because CLEAN data types have at least one alternative. As one can imagine, we want the sum-product representation of types to be equivalent (i.e. isomorphic) to the original data types. In the following example we give representations for List and Tree with conversion functions that implement the required isomorphisms.
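A minimal sketch of these representations is given here. The names ListS, TreeS and the four conversion functions are chosen for this illustration and are not prescribed by the language. Note that only the top level of a value is converted: the recursive occurrences keep their original type.

```clean
module structure

import StdEnv, StdGeneric

:: List a = Nil | Cons a (List a)
:: Tree a = Leaf a | Node (Tree a) (Tree a)

// Structural representation of List: UNIT for Nil, a PAIR for Cons
:: ListS a :== EITHER UNIT (PAIR a (List a))

listToStruct :: (List a) -> ListS a
listToStruct Nil         = LEFT UNIT
listToStruct (Cons x xs) = RIGHT (PAIR x xs)

listFromStruct :: (ListS a) -> List a
listFromStruct (LEFT UNIT)         = Nil
listFromStruct (RIGHT (PAIR x xs)) = Cons x xs

// Structural representation of Tree: the element for Leaf, a PAIR for Node
:: TreeS a :== EITHER a (PAIR (Tree a) (Tree a))

treeToStruct :: (Tree a) -> TreeS a
treeToStruct (Leaf x)   = LEFT x
treeToStruct (Node l r) = RIGHT (PAIR l r)

treeFromStruct :: (TreeS a) -> Tree a
treeFromStruct (LEFT x)           = Leaf x
treeFromStruct (RIGHT (PAIR l r)) = Node l r

// Round trip: converting to the structure and back yields the original value
Start :: List Int
Start = listFromStruct (listToStruct (Cons 1 (Cons 2 Nil)))
```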
As we said, all algebraic types can be represented in this way. Basic types are not algebraic, but there are only a few of them: they are represented by themselves. Arrow types are represented by the arrow type constructor (->). To define a function generically on the structure of a type it is enough to define instances on the components the structure can be built from. These are binary sums, binary products, basic types, and the arrow type.
Having defined instances on the structure components we can generate instances for all other types automatically.
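The idea can be sketched with ordinary overloading. Equality is defined once on UNIT, PAIR and EITHER, and the instance for List simply converts its arguments to the structural representation. The instance for List is written by hand here to show what a generated instance could look like; the names ListS and listToStruct are assumptions of this sketch.

```clean
module structeq

import StdEnv, StdGeneric

:: List a = Nil | Cons a (List a)

:: ListS a :== EITHER UNIT (PAIR a (List a))

listToStruct :: (List a) -> ListS a
listToStruct Nil         = LEFT UNIT
listToStruct (Cons x xs) = RIGHT (PAIR x xs)

// Instances on the structure components
instance == UNIT
where
    (==) UNIT UNIT = True

instance == (PAIR a b) | == a & == b
where
    (==) (PAIR a1 b1) (PAIR a2 b2) = a1 == a2 && b1 == b2

instance == (EITHER a b) | == a & == b
where
    (==) (LEFT x)  (LEFT y)  = x == y
    (==) (RIGHT x) (RIGHT y) = x == y
    (==) _         _         = False

// What a generated instance could look like: compare the representations
instance == (List a) | == a
where
    (==) xs ys = listToStruct xs == listToStruct ys

Start :: Bool
Start = Cons 1 (Cons 2 Nil) == Cons 1 (Cons 2 Nil)
```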
Similarity is not limited to the instances of a single class: instances of different classes can resemble each other as well.
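A sketch of a mapping class for type constructors of kind *->*->* illustrates this. The class name Bifunctor is the one used in this chapter; the Tree2 type, its constructors, the member name bmap and the Start rule are assumptions of this sketch.

```clean
module bifunctor

import StdEnv

// A binary tree with values of type a in the tips and of type b in the nodes
:: Tree2 a b = Tip a | Bin b (Tree2 a b) (Tree2 a b)

// f ranges over type constructors of kind *->*->*
class Bifunctor f
where
    bmap :: (a1 -> b1) (a2 -> b2) (f a1 a2) -> f b1 b2

instance Bifunctor Tree2
where
    bmap f1 f2 (Tip x)     = Tip (f1 x)
    bmap f1 f2 (Bin x l r) = Bin (f2 x) (bmap f1 f2 l) (bmap f1 f2 r)

Start :: Tree2 Int Bool
Start = bmap inc not (Bin True (Tip 1) (Tip 2))
```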
The instance in the example above works in a similar way as the instance of Functor: it also maps substructures component-wise. Both Functor and Bifunctor provide a mapping function. The difference is that one provides mapping for type constructors of kind *->* and the other for type constructors of kind *->*->*. In fact, instances of mapping functions for all kinds are similar.
The generic feature of CLEAN is able to derive instances for types of different kinds from a single generic definition. Such generic functions are known as kind-indexed generic functions (Alimarine & Plasmeijer, 2001, A Generic Programming Extension for Clean). Actually, a generic function in CLEAN stands for a set of classes and instances of the same function for different kinds. Since CLEAN allows functions to be used in a curried manner (see 3.7.1), the compiler is in general not able to deduce which kind of map is meant. Therefore the kind of a generic function application has to be specified explicitly.
To define a generic function the programmer has to provide two things: the base type of the generic function and the base cases (instances) of the generic function.
GenericsDef         = GenericDef ;
                    | GenericCase;
                    | DeriveDef ;
GenericDef          = generic FunctionName TypeVariable+ [GenericDependencies] :: FunctionType
GenericDependencies = | {FunctionName TypeVariable+ }-list
GenericCase         = FunctionName {|GenericTypeArg|} {Pattern}+ = FunctionBody
GenericTypeArg      = GenericMarkerType [of Pattern]
                    | TypeName
                    | TypeVariable
GenericMarkerType   = CONS
                    | OBJECT
                    | RECORD
                    | FIELD
In the generic definition, recognised by the keyword generic, first the type of the generic function has to be specified. The type variables mentioned after the generic function name are called generic type variables. Similar to type classes, they are substituted by the actual instance type. A generic definition actually defines a set of type constructor classes. There is one class for each possible kind in the set. Such a generic function is sometimes called a kind-indexed class. The classes are generated using the type of the generic function. The classes always have one class variable, even if the generic function has several generic variables. The reason for this restriction is that the generic function can be defined by induction on one argument only.
Roughly the algorithm for deriving classes is the following.
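The following sketch illustrates the outcome for two hypothetical generic functions, gEq and gMap. It assumes the scheme suggested by the notation GenFun{|k|} used later in this chapter: for kind * the class type is the type of the generic function with every generic variable replaced by the class variable, and for every argument kind an extra function argument of the corresponding type is added. The class declarations are written in comments because they are conceptual: the compiler generates them and the programmer does not write them.

```clean
module genclasses

import StdGeneric

// Hypothetical generic functions
generic gEq a    :: a a -> Bool
generic gMap a b :: a -> b

// Classes conceptually generated for gEq, one per kind:
//
//   class gEq{|*|}       t :: t t -> Bool
//   class gEq{|*->*|}    t :: (a a -> Bool) (t a) (t a) -> Bool
//   class gEq{|*->*->*|} t :: (a a -> Bool) (b b -> Bool) (t a b) (t a b) -> Bool
//
// Classes conceptually generated for gMap. There is a single class
// variable t, although gMap has two generic type variables:
//
//   class gMap{|*|}       t :: t -> t
//   class gMap{|*->*|}    t :: (a -> b) (t a) -> t b
//   class gMap{|*->*->*|} t :: (a1 -> b1) (a2 -> b2) (t a1 a2) -> t b1 b2
```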
The programmer provides a set of base cases for a generic function. Based on its base cases a generic function can be derived for other types. See the next section for a detailed discussion of the types for which a generic function can and cannot be derived. Here we discuss what can be specified as the type argument in the definition of a generic base case:
Generic structural representation types: UNIT, PAIR, EITHER, CONS, OBJECT, RECORD and FIELD. The programmer must always provide these cases, as they cannot be derived by the compiler. Without these cases a generic function cannot be derived for practically any type.
Basic type. If a generic function is supposed to work on types that involve basic types, instances for basic types must be provided.
Type variable. A type variable stands for all types of kind *. If a generic function has a case for a type variable, then by default all types of kind * are handled by that instance. The programmer can override the default behavior by defining an instance on a specific type.
Arrow type (->). If a generic function is supposed to work with types that involve the arrow type, an instance on the arrow type has to be provided.
Type constructor. A programmer may provide instances on other types. This may be needed for two reasons:
1. The instance cannot be derived, for the reasons explained in the next section.
2. The instance can be generated, but the programmer is not satisfied with the generic behavior for this type and wants to provide a specific behavior.
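A sketch of base cases for a hypothetical generic equality gEq shows the two most common kinds of type argument: basic types and the structural representation types. The base cases for type constructors such as PAIR receive one extra function argument per type argument, here named eqx, eqy, eql, eqr and eq.

```clean
module geq

import StdEnv, StdGeneric

// Hypothetical generic equality
generic gEq a :: a a -> Bool

// Base cases on basic types
gEq{|Int|}    x y = x == y
gEq{|Char|}   x y = x == y
gEq{|Bool|}   x y = x == y
gEq{|Real|}   x y = x == y

// Base cases on the structural representation types
gEq{|UNIT|}   UNIT UNIT = True
gEq{|PAIR|}   eqx eqy (PAIR x1 y1) (PAIR x2 y2) = eqx x1 x2 && eqy y1 y2
gEq{|EITHER|} eql eqr (LEFT x)  (LEFT y)  = eql x y
gEq{|EITHER|} eql eqr (RIGHT x) (RIGHT y) = eqr x y
gEq{|EITHER|} eql eqr _         _         = False
gEq{|CONS|}   eq (CONS x)   (CONS y)   = eq x y
gEq{|OBJECT|} eq (OBJECT x) (OBJECT y) = eq x y
gEq{|RECORD|} eq (RECORD x) (RECORD y) = eq x y
gEq{|FIELD|}  eq (FIELD x)  (FIELD y)  = eq x y

Start :: Bool
Start = gEq{|*|} 1 1
```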
The user has to tell the compiler which instances of generic functions on which types are to be generated. This is done with the derive clause.
DeriveDef     = derive FunctionName {DerivableType}-list
              | derive class ClassName {DerivableType}-list
DerivableType = TypeName
              | PredefinedTypeConstructor
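A minimal sketch of the derive clause follows. It repeats a condensed version of a hypothetical generic equality gEq, with only the base cases needed here, so that the example is self-contained. The types List and Tree are user-defined; [] and (,) are predefined type constructors.

```clean
module derivedemo

import StdEnv, StdGeneric

:: List a = Nil | Cons a (List a)
:: Tree a = Leaf a | Node (Tree a) (Tree a)

// Hypothetical generic equality with the base cases needed below
generic gEq a :: a a -> Bool
gEq{|Int|}    x y = x == y
gEq{|UNIT|}   UNIT UNIT = True
gEq{|PAIR|}   eqx eqy (PAIR x1 y1) (PAIR x2 y2) = eqx x1 x2 && eqy y1 y2
gEq{|EITHER|} eql eqr (LEFT x)  (LEFT y)  = eql x y
gEq{|EITHER|} eql eqr (RIGHT x) (RIGHT y) = eqr x y
gEq{|EITHER|} eql eqr _         _         = False
gEq{|CONS|}   eq (CONS x)   (CONS y)   = eq x y
gEq{|OBJECT|} eq (OBJECT x) (OBJECT y) = eq x y

// Ask the compiler to generate the instances for these types
derive gEq List, Tree
derive gEq [], (,)

Start :: Bool
Start = gEq{|*|} (Cons 1 (Cons 2 Nil)) (Cons 1 (Cons 2 Nil))
```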
A generic function can be automatically specialized only to algebraic types that are not abstract in the module where the derive directive is given. A generic function cannot be automatically derived for the following types:
Generic structure representation types: UNIT, PAIR, EITHER, CONS, OBJECT, RECORD, FIELD. See also the previous section. It is impossible to derive instances for these types automatically because they are themselves used to build the structural representation of types that is needed to derive an instance. Deriving instances for them would yield non-terminating cyclic functions. Instances on these types must be provided by the user. Derived instances of algebraic types call these instances.
Arrow type (->). An instance on the arrow type has to be provided by the programmer if he or she wants the generic function to work with types containing arrows.
Basic types like Int, Char, Real, Bool. In principle it is possible to represent all these basic types as algebraic types, but that would be very inefficient. The user can provide a user-defined instance on a basic type.
Array types, as they are not algebraic. The user can provide a user-defined instance on an array type.
Synonym types. The user may instead derive a generic function for the types involved on the right-hand side of the type synonym definition.
Abstract types. The compiler does not know the structure of an abstract data type, which is needed to derive the instance.
Quantified types. The programmer has to manually provide instances for type definitions with universal and existential quantification.
The compiler issues an error if a required instance for a type is not available. Required instances are determined by the overloading mechanism.
A derive class for a class and a type derives, for that type, all generic functions that occur directly in the context of the class definition (see 6.1).
A generic function in Clean stands for a set of overloaded functions. There is one function in the set for each kind. When a generic function is applied, the compiler must select one overloaded function in the set. The compiler cannot derive the required kind automatically. For this reason a kind has to be provided explicitly at each generic function application. Between the brackets {| and |} one can specify the intended kind. The compiler then resolves overloading of the selected overloaded function as usual.
GenericAppExpression = FunctionName {|TypeKind|} GraphExpr
TypeKind             = *
                     | TypeKind -> TypeKind
                     | IntDenotation
                     | (TypeKind)
                     | {|TypeKind|}
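The sketch below applies a hypothetical generic mapping function gMap at two different kinds. The wrapper names fmap and bmap, the List type and the Start rule are assumptions of this sketch. The base case on the type variable c acts as the default for all types of kind *.

```clean
module gmapdemo

import StdEnv, StdGeneric

:: List a = Nil | Cons a (List a)

// Hypothetical generic mapping function
generic gMap a b :: a -> b
gMap{|c|}      x                = x   // default for all types of kind *
gMap{|UNIT|}   x                = x
gMap{|PAIR|}   fx fy (PAIR x y) = PAIR (fx x) (fy y)
gMap{|EITHER|} fl fr (LEFT x)   = LEFT (fl x)
gMap{|EITHER|} fl fr (RIGHT x)  = RIGHT (fr x)
gMap{|CONS|}   f (CONS x)       = CONS (f x)
gMap{|OBJECT|} f (OBJECT x)     = OBJECT (f x)

derive gMap List, (,)

// Mapping for type constructors of kind *->*
fmap :: (a -> b) (f a) -> f b | gMap{|*->*|} f
fmap f x = gMap{|*->*|} f x

// Mapping for type constructors of kind *->*->*
bmap :: (a1 -> b1) (a2 -> b2) (f a1 a2) -> f b1 b2 | gMap{|*->*->*|} f
bmap f1 f2 x = gMap{|*->*->*|} f1 f2 x

Start :: (List Int, (Int, Bool))
Start = (fmap inc (Cons 1 (Cons 2 Nil)), bmap inc not (1, True))
```

The kind between {| and |} selects which member of the set of overloaded functions is meant; the same name gMap serves both wrappers.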
As outlined above, the structural representation of types lacks information about specific constructors and record fields, such as name, arity etc. This is because this information is not really part of the structure of types: different types can have the same structure. However, some generic functions need this information. Consider, for example, a generic toString function that converts a value of any type to a string. It needs to print constructor names. For that reason the structural representation of types is extended with special constructor and field markers that enable us to pass information about fields and constructors to a generic function.
The markers themselves do not contain any information about constructors and fields. Instead, the information is passed to instances of a generic function on these markers.
GenericTypeArg    = GenericMarkerType [of Pattern]
                  | TypeName
                  | TypeVariable
GenericMarkerType = CONS
                  | OBJECT
                  | RECORD
                  | FIELD
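A minimal sketch of a generic function that uses constructor information is given here. The function gShow, the Tree type and the Start rule are hypothetical. The pattern after of is bound to the descriptor of the constructor, from which the sketch only uses the gcd_name field.

```clean
module gshow

import StdEnv, StdGeneric

:: Tree a = Leaf a | Node (Tree a) (Tree a)

// Hypothetical generic function that prints a value with constructor names
generic gShow a :: a -> String

gShow{|Int|}    x                = toString x
gShow{|UNIT|}   _                = ""
gShow{|PAIR|}   fx fy (PAIR x y) = fx x +++ " " +++ fy y
gShow{|EITHER|} fl fr (LEFT x)   = fl x
gShow{|EITHER|} fl fr (RIGHT x)  = fr x
gShow{|OBJECT|} fx (OBJECT x)    = fx x

// "of d" binds the constructor descriptor; gcd_name is the constructor name
gShow{|CONS of d|} fx (CONS x)   = "(" +++ d.gcd_name +++ " " +++ fx x +++ ")"

derive gShow Tree

Start :: String
Start = gShow{|*|} (Node (Leaf 1) (Leaf 2))
```

The CONS marker itself carries no data; the descriptor is supplied by the compiler to the instance on CONS for each constructor of the derived type.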
Uniqueness is very important in Clean. The generic extension can deal with uniqueness. The mechanism that derives generic types for different kinds is extended to deal with uniqueness information. Roughly speaking it deals with uniqueness attribute variables in the same way as it does with normal generic variables.
Uniqueness information specified in the generic function is used in typing of generated code.
Current limitations with uniqueness: generated types for higher order types require local uniqueness inequalities which are currently not supported.
Generic type variables may not be used in the context of a generic function, for example:
generic f1 a :: a -> Int | Eq a; // incorrect
generic f2 a :: a -> Int | gEq{|*|} a; // incorrect
However, a generic function that depends on other generic functions can be defined by adding a | followed by the required generic functions, separated by commas.
GenericDef          = generic FunctionName TypeVariable+ [GenericDependencies] :: FunctionType
GenericDependencies = | {FunctionName TypeVariable+ }-list
For example, to define h using g1 and g2:
generic g1 a :: a -> Int;
generic g2 a :: a -> Int;
generic h a | g1 a, g2 a :: a -> Int;
h{|PAIR|} ha g1a g2a hb g1b g2b (PAIR a b)
= g1a a+g2b b;
The algorithm for generating classes described in 7.2 is extended to add the dependent generic function arguments after each argument added by (GenFun{|k|} b1 .. bn).
Generic declarations and generic cases - both provided and derived - can be exported from a module. Exporting a generic function is done by giving the generic declaration in the DCL module. Exporting provided and derived generic cases is done by means of derive.
GenericExportDef        = GenericDef ;
                        | derive FunctionName {DeriveExportType [UsedGenericDependencies]}-list ;
                        | derive class ClassName {DerivableType}-list ;
                        | FunctionName {|GenericExportTypeArg|} {Pattern}+ = FunctionBody
GenericDef              = generic FunctionName TypeVariable+ :: FunctionType
DeriveExportType        = TypeName
                        | GenericMarkerType [of UsedGenericInfoFields]
                        | PredefinedTypeConstructor
                        | TypeVariable
UsedGenericInfoFields   = {[{FieldName}-list]}
                        | Variable
UsedGenericDependencies = with {UsedGenericDependency}
UsedGenericDependency   = Variable
                        | _
GenericExportTypeArg    = GenericMarkerType [of Pattern]
                        | UNIT | PAIR | EITHER
A generic function cannot be derived for an abstract data type, but it can be derived in the module where the abstract type is defined. Thus one may export the derived instance along with the abstract data type.
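A definition module along these lines could look as follows. The module name tree, the generic function gEq and the type Tree are hypothetical. The corresponding implementation module is assumed to contain the full definition of Tree, the base cases of gEq and its own derive gEq Tree.

```clean
definition module tree

import StdGeneric

// Export the generic declaration itself
generic gEq a :: a a -> Bool

// Export the provided base cases by means of derive
derive gEq Int, UNIT, PAIR, EITHER, CONS, OBJECT

// Tree is abstract here: its constructors are visible only in tree.icl
:: Tree a

// Export the instance that is derived in tree.icl along with the type
derive gEq Tree
```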
Additional information can be specified to make it possible for the compiler to optimize the generated code.
The used generic info fields for generic instances of OBJECT, CONS, RECORD and FIELD can be specified by adding: of {FieldName}-list, at the end of the derive statement.
For example for:
gToString {|FIELD of {gfd_name}|} fx sep (FIELD x) = gfd_name +++ "=" +++ fx x
add: of {gfd_name} in the definition module:
derive gToString FIELD of {gfd_name}
and the function will be called with just a gfd_name, instead of a GenericFieldDescriptor record.
Whether generic function dependencies for generic instances are used can be specified by adding: with followed by the list of dependencies. An _ indicates an unused dependency; a lower-case identifier indicates a (possibly) used dependency.
For example for:
generic h a | g1 a, g2 a :: a -> Int;
h {|OBJECT of {gtd_name}|} _ g1_a _ (OBJECT a) = g1_a gtd_name a;
add: with _ g1 _ in the definition module:
derive h OBJECT of {gtd_name} with _ g1 _;
h for OBJECT will be called without a function argument for h (for a of OBJECT), with g1 and without g2, because h and g2 are not used by the implementation.
Generic cases for the generic representation types (UNIT, PAIR, EITHER, OBJECT, CONS, RECORD, FIELD) may be defined in definition modules (instead of a derive) using the same syntax as used in implementation modules.