|
|
THIS CHAPTER IS UNDER CONSTRUCTION!!!
In the previous Chapter on overloading it is explained how type classes can be used to define different functions or operators that have the same name and perform similar tasks albeit on objects of different types. These tasks are supposed to be similar, but they are in general not exactly the same. The corresponding function bodies are often slightly different because the data structures on which the functions work differ. As a consequence, one has to explicitly specify an implementation for every concrete instance of an overloaded function.
Equality class. The equality function on lists and trees. The programmer has to specify explicitly the function bodies for each concrete instantiation of equality.
:: List a = Nil a
| Cons a (List a)
:: Tree a = Leaf a
| Node (Tree a) (Tree a)
class Eq a
where
(==) infix 2 :: a a -> Bool
instance Eq (List a) | Eq a
where
(==) infix 2 :: (List a) (List a) -> Bool | Eq a
(==) Nil Nil = True
(==) (Cons x xs) (Cons x xs)= x == y && xs == ys
(==) _ _ = False
instance Tree a | Eq a
where
(==) infix 2 :: (Tree a) (Tree a) -> Bool | Eq a
(==) (Leaf x) (Leaf y) = x == y
(==) (Node lx rx) (Node ly ry) = lx == ly && rx == ry
(==) _ _ = False
In the example above the programmer explicitly defines equality on lists and trees. For each new data type that we want to compare for equality, we have to define a similar instance. Moreover, if such a data type changes, the corresponding instance should be changed accordingly as well. Though the instances are similar, they are not the same since they operate on different data types.
What is similar in these instances? Both instances perform pattern match on the arguments. If the constructors are the same, they compare constructor arguments pairwise. For constructors with no arguments True is returned. For constructors with several arguments, the results on the arguments are combined with &&. If constructors are not the same, False is returned. In other words, equality on a data type is defined by looking at the structure of the data type. More precisely, it is defined by induction on the structure of types. There are many more functions than just equality that expose the same kind of similarity in their instances. Below is the mapping function that can be defined for type constructors of kind *->*.
The Functor provides the mapping function for a type constructor of kind *->* (see also 6.4).
class Functor f
where
fmap :: (a -> b) (f a) -> (f b)
instance Functor List
where
fmap :: (a -> b) (List a) -> (List b)
fmap f Nil = Nil
fmap f (Cons x xs) = Cons (f x) : Cons (fmap f xs)
instance Functor Tree
where
fmap :: (a -> b) (Tree a) -> (Tree b)
fmap f (Leaf a) = Leaf (f a)
fmap f (Node l r) = Node (fmap f l) (fmap f r)
Again, both instances are similar: they perform pattern match on constructors and pairwise mapping of their arguments. The results are packed back in the same constructor.
Generic programming enables the programmer to capture this kind of similarities and define a single implementation for all instances of such a class of functions. To do so we need a universal structural representation of all data types. A generic function can then be defined ones and forall on that universal representation. A specific type is handled using its structural representation. In the rest of this section we will explain roughly how it all works. First we focus on the universal structural representation of types; then we show how a generic function can be defined on it; and at last we show how the generic definition is specialized to a concrete type. See also (Alimarine & Plasmeijer, 2001, A Generic Programming Extension for Clean).
In Clean data types are algebraic: they are built in of sums of products of type terms. For example, the List type is a sum of two things: nullary product for Nil and a binary product of the head and the tail for Cons. The Tree type is a sum of unary product of elements for Leaf and binary product of trees for Node. Having this in mind we can uniformly represent Clean data types using binary sums and products.
Binary sum and product types defined in StdGeneric.dcl. These types are needed to represent Clean types as sums of products of types for the purpose of generic programming.
:: UNIT a = UNIT a
:: PAIR a b = PAIR a b
:: EITHER l r = LEFT l | RIGHT r
The UNIT type represents a nullary product. The PAIR type is a binary product. The EITHER type is a binary sum. We do not need a type for nullary sums, as in Clean data types have at least one alternative. As one can imagine, we want sum-product representation of types to be equivalent (i.e. isomorphic) to the original data types. In the following example we give representations for List and Tree with conversion functions that implement the required isomorphisms.
Sum-product representation of the structure of List and Tree with conversion functions that implement isomorphisms between the types and their sum-product representations.
:: ListS a :== EITHER UNIT (PAIR a (List a))_
listToStruct :: (List a) -> _ListS a
listToStruct Nil = LEFT UNIT
listToStruct (Cons x xs) = RIGHT (PAIR x xs)
listFromStruct :: (ListS a) -> _List a
listFromStruct (LEFT UNIT) = Nil
listFromStruct (RIGHT (PAIR x xs) = Cons x xs
:: TreeS a :== EITHER a (PAIR (Tree a) (Tree a))
treeToStruct :: (Tree a) -> _TreeS a
treeFromStruct :: (TreeS a) -> _Tree a
As we said, all algebraic types can be represented in this way. Basic types are not algebraic, but there are only few of them: they are represented by themselves. Arrow types are represented by the arrow type constructor (->). To define a function generically on the structure of a type it is enough to define instances on the components the structure can be built from. These are binary sums, binary products, basic types, and the arrow type.
Equality on sums, products and primitive types. Equality cannot be feasibly defined for the arrow type, so the instance is omitted.
instance UNIT
where
(==) infix 2 :: UNIT UNIT -> Bool
(==) UNIT UNIT = True
instance PAIR a b | Eq a & Eq b
where
(==) infix 2 :: (PAIR a b) (PAIR a b) -> Bool
(==) (PAIR a1 b1) (PAIR a2 b2) = a1 == a2 && b1 == b2
instance EITHER a b | Eq a & Eq b
where
(==) infix 2 :: (EITHER a b) (EITHER a b) -> Bool
(==) (LEFT x) (Leaf y) = x == y
(==) (RIGHT x) (RIGHT y) = x == y
(==) x_ y_ = False
instance Int
where
(==) infix 2 :: Int Int -> Bool
(==) x y = eqInt x y // primitive equality on integers
Having defined instances on the structure components we can generate instances for all other types automatically.
Equality for lists and trees that can be automatically generated.
Arguments are first converted to the structural representations which are then compared.
instance Eq (List a) | Eq a
where
(==) infix 2 :: (List a) (List a) -> Bool | Eq a
(==) xs ys = listToStruct xs == listToStruct ys
instance Tree a | Eq a
where
(==) infix 2 :: (Tree a) (Tree a) -> Bool | Eq a
(==) xs ys = treeToStruct xs == treeToStruct ys
Not only instances of one class expose similarity in the definition of instances.
The Bifunctor provides the mapping function for a type constructor of kind *->*->*. Instances are defined in a similar way to instances of Functor.
:: Tree2 a b = Tip a
| Bin b (Tree a b) (Tree a b)
class Bifunctor f
where
bmap :: (a1 -> b1) (a2 -> b2) (f a1 a2) -> (f b1 b2)
instance Bifunctor Tree
where
bmap :: (a1 -> b1) (a2 -> b2) (Tree2 a1 a2) -> (Tree b1 b2)
bmap f1 f2 (Tip x) = Tip (f1 x)
bmap f1 f2 (Bin x l r) = Bin (f2 x) (bmap f1 f2 l) (bmap f1 f2 r)
The instance in the example above also works in a similar way as the instance of Functor: it also maps substructures component-wise. Both Functor and Bifunctor provide mapping function. The difference is that one provides mapping for type constructors of kind *->* and the other for type constructors of kind *->*->*. In fact instances of mapping functions for all kinds are similar.
The generic feature of Clean is able to derive instances for types of different kinds from a single generic definition. Such generic functions are known as kind-indexed generic functions (Alimarine & Plasmeijer, A Generic Programming Extension for Clean). Actually, a generic function in Clean stands for a set of classes and instances of the same function for different kinds. Since Clean allows function to be used in a Curried manner (see 3.7.1), the compiler is in general not able to deduce which kind of map is meant. Therefore the kind of a generic function application has to be specified explicitly.
To define a generic function the programmer has to provide to things: the base type of the generic function and the base cases (instances) of the generic function.
GenericsDef |
= |
GenericDef ; |
|
| |
GenericCase; |
|
| |
DeriveDef ; |
GenericDef |
= |
generic FunctionName TypeVariable+ :: FunctionType |
GenericCase |
= |
FunctionName {|GenericTypeArg|} {Pattern}+ = FunctionBody |
GenericTypeArg |
= |
CONS [of {Pattern}] |
|
| |
FIELD [of {Pattern}] |
|
| |
TypeConstructorName |
|
| |
TypeVariable |
Example. The generic definition of equality.
generic gEq a :: a a -> Bool
gEq {|Int|} x y = x == y
gEq {|Char|} x y = x == y
gEq {|Bool|} x y = x == y
gEq {|Real|} x y = x == y
gEq {|UNIT|} UNIT UNIT = True
gEq {|PAIR|} eqx eqy (PAIR x1 y1) (PAIR x2 y2) = eqx x1 x2 && eqy y1 y2
gEq {|EITHER|} eql eqr (LEFT x) (LEFT y) = eql x y
gEq {|EITHER|} eql eqr (RIGHT x) (RIGHT y) = eqr x y
gEq {|EITHER|} eql eqr x y = False
gEq {|CONS|} eq (CONS x) (CONS y) = eq x y
gEq {|FIELD|} eq (FIELD x) (FIELD y) = eq x y
Example. The generic definition of map.
generic gMap a b :: a -> b
gMap {|c|} x = x
gMap {|PAIR|} fx fy (PAIR x y) = PAIR (fx x) (fy y)
gMap {|EITHER|} fl fr (LEFT x) = LEFT (fl x)
gMap {|EITHER|} fl fr (RIGHT x) = RIGHT (fr x)
gMap {|CONS|} fx (CONS x) = CONS (fx x)
gMap {|FIELD|} fx (FIELD x) = FIELD (fx x)
Bidirectional mapping/ illustrate arrows
generic gMap a b :: a -> b
gMap {|c|} x = x
gMap {|PAIR|} fx fy (PAIR x y) = PAIR (fx x) (fy y)
gMap {|EITHER|} fl fr (LEFT x) = LEFT (fl x)
gMap {|EITHER|} fl fr (RIGHT x) = RIGHT (fr x)
gMap {|CONS|} fx (CONS x) = CONS (fx x)
gMap {|FIELD|} fx (FIELD x) = FIELD (fx x)
In the generic definition, recognised by the keyword generic, first the type of the generic function has to be specified. The type variables mentioned after the generic function name are called generic type variables. Similar to type classes, they are substituted by the actual instance type. A generic definition actually defines a set of type constructor classes. There is one class for each possible kind in the set. Such a generic funcion is sometimes called a kind-indexed class. The classes are generated using the type of the generic function. The classes always have one class variable, even if the generic function has several generic variables. The reason for this restriction is that the generic function can be defined by induction on one argument only.
Classes that are automatically generated for the generic map function given above.
class gMap{|*|} t :: t -> t
class gMap{|*->*|} t :: (a -> b) (t a) -> t b
class gMap{|*->*->*|} t :: (a1 -> b1) (a2 -> b2) (t a1 a2) -> t b1 b2
…
Roughly the algorithm for deriving classes is the following.
Algorithm for generating classes.
Suppose we have a generic function genFun with type GenFun.
:: GenFun a1 … an :== …
Generic genFun a1 … an :: GenFun a1 .. an
A class for kind k.
class genFun{|k|} t :: GenFun{|k|} t … t
Is derived by induction on the structure of kind
:: GenFun{|*|} a1 … an :== GenFun a1 … an
:: GenFun{|k->l|} a1 … an :==
A.b1 … bn: (GenFun{|k|} b1 … bn) -> GenFun{|l|} (a1 b1) … (an bn)
The programmer provides a set of basic cases for a generic function. Based on its basic cases a generic function can be derived for other types. See the next section for detailed discussion on types for which a generic function can and cannot be derived. Here we discuss what can be specified as the type argument in the definition of a generic base case
A Generic structural representation type: UNIT, PAIR, EITHER, CONS and FIELD. The programmer must always provide these cases as they cannot be derived by the compiler. Without these cases a generic function cannot be derived for basically any type.
Basic type. If a generic function is supposed to work on types that involve basic types, instances for basic types must be provided.
Type variable. Type variable stands for all types of kind *. If a generic function has a case for a type variable it means that by default all types of kind star will be handled by that instance. The programmer can override the default behavior by defining an instance on a specific type.
Arrow type (->). If a generic function is supposed to work with types that involve the arrow type, an instance on the arrow type has to be provided.
Type constructor. A programmer may provide instances on other types. This may be needed for two reasons:
1. The instance cannot be derived for the reasons explained in the next section.
2. The instance can be generated, but the programmer is not satisfied with generic behavior for this type and wants to provide a specific behavior.
The user has to tell the compiler instances of which generic functions on which types are to be generated. This is done with the derive clause.
DeriveDef |
= |
derive FunctionName TypeConstructorName+ |
Deriving instances of generic mapping and generic equality for List , Tree and standard list
derive gEq List, Tree, []
derive gMap List, Tree, []
A generic function can be automatically specialized only to algebraic types that are not abstract in the module where the derive directive is given. A generic function cannot be automatically derived for the following types:
Generic structure representation types: PAIR, EITHER, UNIT, CONS, FIELD. See also the previous section. It is impossible to derive instances for these types automatically because they are themselves used to build structural representation of types that is needed to derive an instance. Deriving instances for then would yield non-terminating cyclic functions. Instances on these types must be provided for the user. Derived instances of algebraic types call these instances.
Arrow type (->). An instance on the arrow type has to be provided by the programmer, if he or she wants the generic function to work with types containing arrows.
Basic types like Int, Char, Real, Bool. In principle it is possible to represent all these basic types as algebraic types but that would be very inefficient. The user can provide a user defined instance on a basic type.
Array types as they are not algebraic. The user can provide a user defined instance on an array type.
Synonym types. The user may instead derive a generic function for the types involved on the right-hand-side of a type synonym type definition.
Abstract types. The compiler does not know the structute of an abstract data type needed to derive the instance.
Quantified types. The programmer has to manually provide instances for type definitions with universal and existential quantification.
The compiler issues an error if there is no required instance for a type available. Required instances are determined by the overloading mechanism.
The generic function in Clean stands for a set of overloaded functions. There is one function in the set for each kind. When a generic function is applied, the compiler must select one overloaded function in the set. The compiler cannot derive the required kind automatically. For this reason a kind has to be provided explicitly at each generic function application. Between the brackets {| and |} one can specify the intended kind. The compiler then resolves overloading of the selected overloaded function as usually.
GenericAppExpression |
= |
FunctionName TypeKind GraphExpr |
TypeKind |
= |
{|* {-> *} |} |
Example: a generic equality operator can be defined as equality on types of kind *.
(===) infix 2 :: a a -> Bool | gEq{|*|} a
(===) x y = gEq{|*|} x y
Example: a mapping function fmap for functors and bmap for bifunctors can be defined in terms of the generic mapping function defined above just by specializing it for the appropriate kind.
fmap :: (a -> b) (f a) -> (f b) | gMap{|*->*|} f
fmap f x y = gMap{|*->*|} f x y
bmap :: (a1 -> b1) (a2 -> b2) (f a1 a2) -> (f b1 b2) | gMap{|*->*->*|} f
bmap f1 f2 x y = gMap{|*->*->*|} f1 f2 x y
Equality makes sense not only on for kind *. In the example we alter the standard way of comparing elements. Equality for kind * and *->* are explicitly used.
eqListFsts :: [(a, b)] [(a, c)] -> Bool | gEq{|*|} a
eqListFsts xs ys = gEq{|*->*|} (\x y -> fst x === fst y) ys
eqFsts :: (f (a, b)) (f (a, c)) -> Bool | gEq{|*->*|} f & gEq{|*|} a
eqFsts xs ys = gEq{|*->*|} (\x y -> fst x === fst y) ys
Examples of generic applications
Start =
( gEq{|*|} [1,2,3] [2,3,3] // True
, [1,2,3] === [1,2,3] // True
, gEq{|*->*|} (\x y -> True) [1,2,3] [4,5,6] // True
)
As it was outlined above, the structural representation of types lacks information about specific constructors and record fields, such as name, arity etc. This is because this information is not really part of the structure of types: different types can have the same structure. However, some generic functions need this information. Consider, for example a generic toString function that converts a value of any type to a string. It needs to print constructor names. For that reason the structural representation of types is extended with special constructor and field markers that enable us to pass information about fields and constructors to a generic function.
Definition of the constructor and field marker types (in StdGeneric.dcl).
:: CONS a = CONS a
:: FIELD a = FIELD a
The markers themselves do not contain any information about constructors and fields. Instead, the information is passed to instances of a generic function on these markers.
Examples of structural representation with constructor and field information
:: ListS a :== EITHER (CONS UNIT) (CONS (PAIR a (List a)))
:: TreeS a :== EITHER (CONS a) (CONS (PAIR (Tree a) (Tree a)))
:: Complex = { re :: Real, im :: Real }
:: ComplexS :== PAIR (FIELD Real) (FIELD Real)
GenericTypeArg |
= |
CONS [of {Pattern}] |
|
| |
FIELD [of {Pattern}] |
|
| |
TypeConstructorName |
|
| |
TypeVariable |
Definition of the constructor and field descriptors (StdGeneric.dcl).
Constructor descriptor is passed after of in the CONS case of a generic function.
:: ConsDescriptor =
{ gcd_name :: String
, gcd_arity :: Int
}
Field descriptor is passed after of in the FIELD case of a generic function.
:: FieldDescriptor =
{ gfd_name :: String
}
Generic pretty printer.
generic gToString a :: String a -> String
gToString {|Int|} sep x = toString x
gToString {|UNIT|} sep x = x
gToString {|PAIR|} fx fy sep (PAIR x y) = fx sep x +++ sep +++ fy sep y
gToString {|EITHER|} fl fr sep (LEFT x) = fl sep x
gToString {|EITHER|} fl fr sep (RIGHT x) = fr sep x
gToString {|CONS of c|} fx sep (CONS x)
| c.gcd_arity == 0
= c.gcd_name
| isEmpty c.gcd_fields
= “(“ +++ c.gcd_name +++ “ “ +++ fx “ “ x +++ “)”
| otherwise
= “{“ +++ c.gcd_name +++ “ | “ +++ fx “, “ x +++ “}”
gToString {|FIELD of f|} fx sep (FIELD x) = f.gfd_name +++ “=” +++ fx x
toStr :: a -> String | gToString{|*|} a
toStr x = gToString{|*|} ““ x
Currently generic support for uniqueness is rudimentary and experimental.
Uniqueness is very important in Clean. The generic extension can deal with uniqueness. The mechanism that derives generic types for different kinds is extended to deal with uniqueness information. Roughly speaking it deals with uniqueness attribute variables in the same way as it does with normal generic variables.
The type of standard mapping for lists with uniqueness
map :: (.a -> .b) ![.a] -> [.b]
Generic mapping with uniqueness. The instance on lists has the same uniqueness typing as the standard map
generic gMap a b :: .a -> .b
Uniqueness information specified in the generic function is used in typing of generated code.
Generated classes
class gMap{|*|} t :: .t -> .t
class gMap{|*->*|} t :: (.a -> .b) (.t .a) -> .t .b
class gMap{|*->*->*|} t :: (.a1 -> .b1) (.a2 -> .b2) (.t .a1 .a2) -> .t .b1 .b2
Current limitations with uniqueness: generated types for higher order types require local uniqueness inequalities which are currently not supported.
Counter Example due to limitation in the current version of Clean.
class gMap{|(*->*)->*|} t ::
(A. (a:a) (b:b): (.a -> .b) -> (f:f a:a) -> g:g a:a, [f <= a, g <= b])
(.t .f) -> .t .g
, [f <= t, g <= t]
Generic declarations and generic cases – both provided and derived – can be exported from a module. Exporting a generic function is done by giving the generic declaration in the DCL module. Exporting provided and derived generic cases is done by means of derive.
GenericExportDef |
= |
GenericDef ; |
|
| |
DeriveDef ; |
GenericDef |
= |
generic FunctionName TypeVariable+ :: FunctionType |
DeriveDef |
= |
derive FunctionName TypeConstructorName+ |
Example. Exporting of generic mapping. Definition as given in module GenMap.dcl
generic gMap a b :: .a -> .b
derive gMap c, PAIR, EITHER, CONS, FIELD, []
A generic function cannot be derived for an abstract data type, but it can be derived in the module where the abstract type defined. Thus, when one may export derived instance along with the abstract data type.