type_traits: GCC vs Clang
Recently I was writing another article for the Polish developer journal "Programista" (Eng. "Programmer"). This time I decided to focus on the type_traits library: what we can find inside, how it is implemented, and finally what we can expect in the future (mainly in terms of compile-time reflection).
While writing I was not referring to any particular implementation, but the Clang and GCC implementations were open on the second monitor the whole time. I noticed some differences that I'd like to document here.
First things first. The whole post is based on:
- GCC (libstdc++v3): c6292d1d5dc2c42aaaa2731ea8721dbe4b6fcb6f
- Clang (libcxx): 216318
The first difference is that GCC's implementation builds on small logical helper templates such as __or_:
    template<typename...>
    struct __or_;

    template<>
    struct __or_<>
    : public false_type
    { };

    template<typename _B1>
    struct __or_<_B1>
    : public _B1
    { };
They are used in the following fashion:
    /// is_unsigned
    template<typename _Tp>
    struct is_unsigned
    : public __and_<is_arithmetic<_Tp>, __not_<is_signed<_Tp>>>::type
    { };
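To see how such helpers fit together, here is a minimal, self-contained sketch. It is not libstdc++'s code: the namespace `compose_sketch` and the names `or_`, `and_` and `not_` are made up for illustration, and the recursive cases are one plausible way to write them using `std::conditional`.

```cpp
#include <type_traits>

namespace compose_sketch {  // hypothetical namespace, not part of any library

// or_: an empty disjunction is false; otherwise stop at the first true argument.
template <typename...> struct or_;
template <> struct or_<> : std::false_type {};
template <typename B1> struct or_<B1> : B1 {};
template <typename B1, typename B2, typename... Bn>
struct or_<B1, B2, Bn...>
    : std::conditional<B1::value, B1, or_<B2, Bn...>>::type {};

// and_: an empty conjunction is true; otherwise stop at the first false argument.
template <typename...> struct and_;
template <> struct and_<> : std::true_type {};
template <typename B1> struct and_<B1> : B1 {};
template <typename B1, typename B2, typename... Bn>
struct and_<B1, B2, Bn...>
    : std::conditional<B1::value, and_<B2, Bn...>, B1>::type {};

// not_: negates a single trait.
template <typename B>
struct not_ : std::integral_constant<bool, !B::value> {};

// The trait reads almost like the sentence "arithmetic and not signed".
template <typename T>
struct is_unsigned
    : and_<std::is_arithmetic<T>, not_<std::is_signed<T>>>::type {};

}  // namespace compose_sketch

static_assert(compose_sketch::is_unsigned<unsigned int>::value, "unsigned int is unsigned");
static_assert(!compose_sketch::is_unsigned<int>::value, "int is signed");
static_assert(!compose_sketch::is_unsigned<double>::value, "double is signed");
static_assert(!compose_sketch::is_unsigned<void*>::value, "pointers are not arithmetic");
```

Because the unselected branch of `std::conditional` is only named and never used as a base class, the helpers short-circuit: the remaining arguments are not instantiated once the result is known.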
At first glance everything looks nice. However, you won't find such things in Clang's type_traits implementation. Clang's approach is to use standard template mechanisms, like template specialization:
    // is_unsigned

    template <class _Tp, bool = is_integral<_Tp>::value>
    struct __libcpp_is_unsigned_impl : public integral_constant<bool, _Tp(0) < _Tp(-1)> {};

    template <class _Tp>
    struct __libcpp_is_unsigned_impl<_Tp, false> : public false_type {}; // floating point

    template <class _Tp, bool = is_arithmetic<_Tp>::value>
    struct __libcpp_is_unsigned : public __libcpp_is_unsigned_impl<_Tp> {};

    template <class _Tp> struct __libcpp_is_unsigned<_Tp, false> : public false_type {};

    template <class _Tp> struct _LIBCPP_TYPE_VIS_ONLY is_unsigned : public __libcpp_is_unsigned<_Tp> {};
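The two-level dispatch above is easier to follow when stripped of library macros. The sketch below mirrors the same idea under made-up names (`dispatch_sketch`, `is_unsigned_impl` and `opaque` are all hypothetical): a defaulted `bool` parameter selects a specialization, and the expression `T(0) < T(-1)` is only evaluated for integral types, where converting -1 to an unsigned type wraps around to the maximum value.

```cpp
#include <type_traits>

namespace dispatch_sketch {  // hypothetical namespace, not part of any library

// Integral case: T(-1) wraps to the largest value for unsigned types,
// so T(0) < T(-1) is true exactly when T is unsigned.
template <class T, bool = std::is_integral<T>::value>
struct is_unsigned_impl : std::integral_constant<bool, (T(0) < T(-1))> {};

// Arithmetic but not integral means floating point, which is never unsigned.
template <class T>
struct is_unsigned_impl<T, false> : std::false_type {};

// Outer gate: only arithmetic types are forwarded to the comparison above.
template <class T, bool = std::is_arithmetic<T>::value>
struct is_unsigned : is_unsigned_impl<T> {};

// Everything else is rejected without ever forming T(0).
template <class T>
struct is_unsigned<T, false> : std::false_type {};

// A type for which T(0) would not even compile; the gate keeps it safe.
struct opaque {};

}  // namespace dispatch_sketch

static_assert(dispatch_sketch::is_unsigned<unsigned int>::value, "unsigned int is unsigned");
static_assert(!dispatch_sketch::is_unsigned<int>::value, "int is signed");
static_assert(!dispatch_sketch::is_unsigned<double>::value, "double is floating point");
static_assert(!dispatch_sketch::is_unsigned<dispatch_sketch::opaque>::value, "class type is rejected by the gate");
```

The outer gate is what makes the trait usable with arbitrary types: without it, instantiating the comparison for a class such as `opaque` would be a hard error rather than `false`.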
There's no doubt - GCC's version is more human-friendly. We can read it like a book and everything is clear. Reading Clang's version is much harder. The code is bloated with template machinery and there are a lot of quirky helpers.
On the other hand, Clang's developers stick to things that are shipped with the C++ compiler. Therefore, at least in theory, compilation should take less time for Clang's implementation. It's not that easy to test this, but there's a fancy new tool out there - templight. It is a tool that allows us to debug and profile template instantiations ;)
After some time playing with templight I got the following results for a simple program that just includes the type_traits library, in two versions: GCC's and Clang's.
Metric | GCC | Clang |
Template instantiations* | 3 | 5 |
Template memoizations* | 59 | 45 |
Maximum memory usage | ~957kB | ~2332kB |
When we sum instantiations and memoizations (62 for GCC's version versus 50 for Clang's), it turns out that the assumption was right - Clang's implementation of the type_traits library should be slightly faster to compile. But of course it depends on how the compilers are implemented.
Another interesting fact is that compiling Clang's type_traits consumes a lot more memory than compiling GCC's version (~2332kB versus ~957kB).
Both versions of type_traits were compiled using Clang 3.6 (SVN) with the templight plugin, so there's no data for GCC as a compiler. It may be that, when compiled with GCC, GCC's version of type_traits does better.
The second thing I noticed is that Clang's type_traits strives to rely on compiler built-ins as little as possible, while GCC's uses them freely. Good examples are the implementations of std::is_class and std::is_enum.
GCC's approach here is simply to use compiler built-ins such as __is_class and __is_enum.
    /// is_class...
    template<typename _Tp>
    struct is_class
    : public integral_constant<bool, __is_class(_Tp)>
    { };

    /// is_enum
    template<typename _Tp>
    struct is_enum
    : public integral_constant<bool, __is_enum(_Tp)>
    { };

    { "__is_class", RID_IS_CLASS, D_CXXONLY },
    { "__is_enum", RID_IS_ENUM, D_CXXONLY },
Clang, on the contrary, uses metaprogramming tricks developed by the community. In the case of std::is_class, the implementation is based on function overloading: the first overload of __is_class_imp::__test accepts a pointer to member, so the compiler will choose it only if the type is a class or a union. A second check (!is_union) is therefore needed to exclude unions. Simple and brilliant.
    namespace __is_class_imp
    {
    template <class _Tp> char __test(int _Tp::*);
    template <class _Tp> __two __test(...);
    }

    template <class _Tp> struct _LIBCPP_TYPE_VIS_ONLY is_class
        : public integral_constant<bool, sizeof(__is_class_imp::__test<_Tp>(0)) == 1 && !is_union<_Tp>::value> {};
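The overload trick can be tried in isolation. The following is a sketch under assumed names (`is_class_sketch`, `two`, `test`, `widget`, `variant` and `color` are all invented here), with `std::is_union` standing in for libc++'s own `is_union`. Note that telling unions apart from classes still needs compiler support, so even this version is not completely free of built-ins.

```cpp
#include <type_traits>

namespace is_class_sketch {  // hypothetical namespace, not part of any library

// A return type whose size can never be mistaken for sizeof(char).
struct two { char pad[2]; };

// Viable only when "int T::*" is a valid type, i.e. when T is a class or a union.
// For any other T, substitution fails and this overload is silently dropped.
template <class T> char test(int T::*);

// Fallback: matches anything, but an ellipsis is the worst possible conversion,
// so it is chosen only when the overload above is not viable.
template <class T> two test(...);

// sizeof inspects the chosen overload's return type without ever calling it.
template <class T>
struct is_class
    : std::integral_constant<bool,
          sizeof(test<T>(0)) == 1 && !std::is_union<T>::value> {};

struct widget {};
union variant { int i; float f; };
enum color { red };

}  // namespace is_class_sketch

static_assert(is_class_sketch::is_class<is_class_sketch::widget>::value, "a struct is a class");
static_assert(!is_class_sketch::is_class<is_class_sketch::variant>::value, "a union is filtered out");
static_assert(!is_class_sketch::is_class<is_class_sketch::color>::value, "an enum has no members");
static_assert(!is_class_sketch::is_class<int>::value, "int has no members");
```

Neither `test` overload needs a definition, because `sizeof` is an unevaluated context.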
In the case of enums there's also an interesting bit in Clang's implementation:
    template <class _Tp> struct _LIBCPP_TYPE_VIS_ONLY is_enum
        : public integral_constant<bool, !is_void<_Tp>::value &&
                                         !is_integral<_Tp>::value &&
                                         !is_floating_point<_Tp>::value &&
                                         !is_array<_Tp>::value &&
                                         !is_pointer<_Tp>::value &&
                                         !is_reference<_Tp>::value &&
                                         !is_member_pointer<_Tp>::value &&
                                         !is_union<_Tp>::value &&
                                         !is_class<_Tp>::value &&
                                         !is_function<_Tp>::value > {};
Voila! No built-ins needed ;)
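The "whatever is left must be an enum" idea can be checked with the standard traits. This is only a sketch: the namespace `is_enum_sketch` and the test types are invented, and the standard library's traits are used in place of libc++'s internals. One deliberate difference is labeled in the code: std::nullptr_t belongs to none of the ten excluded categories, so the sketch adds an eleventh check that the excerpt above does not show (whether libc++ handled that case elsewhere is not visible from the excerpt).

```cpp
#include <cstddef>
#include <type_traits>

namespace is_enum_sketch {  // hypothetical namespace, not part of any library

// Rule out every other type category; what remains is an enumeration.
template <class T>
struct is_enum
    : std::integral_constant<bool,
          !std::is_void<T>::value &&
          !std::is_integral<T>::value &&
          !std::is_floating_point<T>::value &&
          !std::is_array<T>::value &&
          !std::is_pointer<T>::value &&
          !std::is_reference<T>::value &&
          !std::is_member_pointer<T>::value &&
          !std::is_union<T>::value &&
          !std::is_class<T>::value &&
          !std::is_function<T>::value &&
          // Assumption: extra check added by this sketch, not in the quoted excerpt.
          !std::is_same<typename std::remove_cv<T>::type, std::nullptr_t>::value> {};

enum color { red, green };
enum class mode : unsigned char { fast, slow };
struct widget {};

}  // namespace is_enum_sketch

static_assert(is_enum_sketch::is_enum<is_enum_sketch::color>::value, "unscoped enum");
static_assert(is_enum_sketch::is_enum<is_enum_sketch::mode>::value, "scoped enum");
static_assert(!is_enum_sketch::is_enum<int>::value, "int is integral");
static_assert(!is_enum_sketch::is_enum<is_enum_sketch::widget>::value, "class type");
static_assert(!is_enum_sketch::is_enum<std::nullptr_t>::value, "nullptr_t needs its own check");
```

The price of avoiding a dedicated built-in is that one query fans out into many other traits, each of which has to be instantiated first.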
The last thing I'd like to mention here is std::is_function. GCC's approach here is a bit ridiculous, because it needs a lot of specializations (there are 24 of them so I don't want to include 'em all).
    template<typename _Res, typename... _ArgTypes>
    struct is_function<_Res(_ArgTypes......) volatile &&>
    : public true_type { };

    template<typename _Res, typename... _ArgTypes>
    struct is_function<_Res(_ArgTypes...) const volatile>
    : public true_type { };

    template<typename _Res, typename... _ArgTypes>
    struct is_function<_Res(_ArgTypes...) const volatile &>
    : public true_type { };

    template<typename _Res, typename... _ArgTypes>
    struct is_function<_Res(_ArgTypes...) const volatile &&>
    : public true_type { };
I think that the six-dot construct _ArgTypes...... (it looks like double or nested unpacking, but it is actually a pack expansion followed by a C-style variadic ellipsis) is a good example of how to make an implementation look like a nightmare. When you have a hammer in your hand, everything looks like a nail, doesn't it ;)?
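The six dots are less mysterious once the optional comma is written out: `_Res(_ArgTypes......)` names the same kind of type as `_Res(_ArgTypes..., ...)`, that is, a function taking the expanded pack followed by C-style varargs. The sketch below uses the comma form and covers only a handful of cases; the namespace `is_function_sketch` is invented. The count of 24 is consistent with 2 parameter-list forms (with and without the trailing ellipsis) times 4 cv-qualifier combinations times 3 ref-qualifier options, although the source does not spell that out.

```cpp
#include <type_traits>

namespace is_function_sketch {  // hypothetical namespace, not part of any library

// Anything not matched by a specialization below is not a function type.
template <class>
struct is_function : std::false_type {};

// Plain function type: R(Args...).
template <class R, class... Args>
struct is_function<R(Args...)> : std::true_type {};

// C-style variadic function type, e.g. int(const char*, ...).
// "Args..., ..." is the spelled-out form of the six-dot "Args......".
template <class R, class... Args>
struct is_function<R(Args..., ...)> : std::true_type {};

// const-qualified variants; a full version repeats this for every
// cv-qualifier and ref-qualifier combination.
template <class R, class... Args>
struct is_function<R(Args...) const> : std::true_type {};

template <class R, class... Args>
struct is_function<R(Args..., ...) const> : std::true_type {};

}  // namespace is_function_sketch

static_assert(is_function_sketch::is_function<int(int, double)>::value, "plain function type");
static_assert(is_function_sketch::is_function<int(const char*, ...)>::value, "C-style variadic");
static_assert(is_function_sketch::is_function<void() const>::value, "const-qualified function type");
static_assert(!is_function_sketch::is_function<int (*)(int)>::value, "pointer to function is not a function");
static_assert(!is_function_sketch::is_function<int>::value, "int is not a function");
```

Each qualifier combination is a distinct function type, which is why an approach based purely on partial specialization has to enumerate them all.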
* Memoization - "Memoization means we are _not_ instantiating a template because it is already instantiated (but we entered a context where we would have had to if it was not already instantiated)."