Specializations of forwarding template functions

Funny how, just after studying a subject, one keeps finding cases that lend themselves to it, but maybe it’s just me being a man with a hammer. Anyway, in my previous post I wrote about the rules that give c++ its perfect forwarding mechanism, and today I would like to look at a particular dark corner: I will try to specialize forwarding function templates.

Suppose that you wanted to have a function template whose signature is a perfectly forwarding one:

class probe;

    template<class T>
struct wrapper;

    template<class T>
auto produce_wrapped(T&& t) -> wrapper<T>;

Let me start by fixing the obvious straight away. In case T is inferred as an l-value reference, the result is not going to be what you likely want. Similarly with cv-qualification: you’d probably want wrapper<probe> rather than wrapper<const probe> or wrapper<const probe&>. So you need to remove the reference and remove the cv-qualification. You can do that either by applying to T both the remove_reference and remove_cv traits (in that order, since remove_cv does nothing to a reference), or, in the future C++20, by applying remove_cvref, or, if you agree not to look the way of pointers and arrays, by applying decay. So we really start with

class probe;

    template<class T>
struct wrapper;

    template<class T>
auto produce_wrapped(T&& t) -> wrapper<std::decay_t<T>>;
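To see what T is actually deduced as, and what the traits do to it, one can let the compiler report the types. This is only a sketch: type_tag, deduced_as and deduced_t are hypothetical helpers made up for this illustration and are not part of the design above.

```cpp
#include <type_traits>
#include <utility>

class probe {};

// Hypothetical helpers, for illustration only.
template<class T>
struct type_tag { using type = T; };

// Same parameter pattern as produce_wrapped; it only reports the deduced T.
template<class T>
auto deduced_as(T&&) -> type_tag<T>;   // never defined, used only inside decltype

template<class Arg>
using deduced_t = typename decltype(deduced_as(std::declval<Arg>()))::type;

// An lvalue argument makes T an lvalue reference; an rvalue leaves T bare.
static_assert(std::is_same<deduced_t<probe&>,       probe&>::value,       "");
static_assert(std::is_same<deduced_t<const probe&>, const probe&>::value, "");
static_assert(std::is_same<deduced_t<probe>,        probe>::value,        "");

// remove_cv alone does nothing to a reference: the reference must go first.
static_assert(std::is_same<std::remove_cv_t<const probe&>, const probe&>::value, "");
static_assert(std::is_same<std::remove_cv_t<std::remove_reference_t<const probe&>>,
                           probe>::value, "");

// decay gives the same answer for class types...
static_assert(std::is_same<std::decay_t<const probe&>, probe>::value, "");
// ...but it also turns arrays into pointers, unlike the two-step version.
static_assert(std::is_same<std::decay_t<int(&)[3]>, int*>::value, "");
static_assert(std::is_same<std::remove_cv_t<std::remove_reference_t<int(&)[3]>>,
                           int[3]>::value, "");
```

The last two assertions are the "pointers and arrays" caveat: for class types such as probe the choice between decay and the two-step removal makes no difference.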

Now, let’s think about how we could specialize produce_wrapped. Why specialize and not overload, use specialized traits from the template or whatever else? Well, chiefly because it’s interesting on its own, and by way of extenuation I can cite a case where there are many probe classes, in the hundreds, that have befriended the produce_wrapped template which, in turn, now needs to peek into the bowels of the object before forwarding it. And there’s no talk of touching hundreds of files at once.

  template<class T>
struct wrapper;

  template<class T>
auto produce_wrapped(T&& t) -> wrapper<std::decay_t<T>>;

class probe
  {
    std::string s_;
  public:
    probe(std::string s) : s_(std::move(s)) {}
    template<class T> friend auto produce_wrapped(T&&) -> wrapper<std::decay_t<T>>;
  };

class guinnea_pig
  {
    int x_;
  public:
    guinnea_pig(int x) : x_(x) {}
    template<class T> friend auto produce_wrapped(T&&) -> wrapper<std::decay_t<T>>;
  };


    template<class T>
struct wrapper
  {
  T t_;
  bool good_ = false;
  };

So now let’s say that good probe objects have strings shorter than 100, good guinnea_pig objects have odd ints, and let’s log what’s inside. Since the logic is different, we won’t do that in the principal template but in specializations. We also cannot delegate the work to some traits, as it is only this particular function template that can access the necessary data.

But how do we actually write that (necessarily full) specialization? There’s no template parameter T to be used in the parameter list or in the std::forward call. That should not come as a surprise, as the compiler will first produce a candidate function by using the principal template definition, make all the type substitutions, and only then look whether there is a specialization. We have to start with an empty template<> and then write everything by hand. We know all the necessary rules from the previous post, but I think it will be instructive to go slow now, and study an example call:

auto foo(const probe& p) -> void
  {
  produce_wrapped(p);
  }

What will be the compiler’s reasoning here? p is an l-value reference to const of type probe, so, according to the special rule for the r-value reference pattern in template function argument, substitution in the principal template will infer T as const probe& and the reference collapsing rule will turn T&& into the same const probe&. And that’s what we need to give to the specialization:

    template<>
auto produce_wrapped(const probe& t) -> wrapper<probe>
  {
  return {t, t.s_.size() < 100};
  }

No need to decay your const probe& if you can just write what is needed. Apparently, there’s enough information here for the compiler to figure out that this is a specialization for T equal to const probe&, but if you want to be verbose, you can say

    template<>
auto produce_wrapped<const probe&>(const probe& t) -> wrapper<probe>
  {
  std::cerr << t.s_ << "\n";
  return {t, t.s_.size() < 100};
  }

But l-value references to const are boring, and it’s not what forwarding is about, we want r-values, you’ll say. So, let’s do r-values.

auto foo(const probe& p) -> void
  {
  produce_wrapped(p);
  }
auto main(void) -> int
  {
  probe p("regular probe");
  foo(p);
  produce_wrapped(probe("rvalue probe"));
  return 0;
  }

When the compiler sees this code, it will infer T to be probe and T&& to be probe&&. But there is no such specialization, so there’s a linker error on missing symbol definition. Ah then, we have to write that specialization separately.

    template<>
auto produce_wrapped(probe&& t) -> wrapper<probe>
  {
  std::cerr << t.s_ << "\n";
  const bool good = t.s_.size() < 100;   // read the string before t is moved from
  return {std::move(t), good};
  }

Again, if you want to be verbose, you can put produce_wrapped<probe> explicitly. So we are done, you might think. Until you try the next call

auto foo(const probe& p) -> void
  {
  produce_wrapped(p);
  }
auto main(void) -> int
  {
  probe p("regular probe");
  foo(p);
  produce_wrapped(probe("rvalue probe"));
  produce_wrapped(p);                     // doesn't link
  return 0;
  }

Why doesn’t it link? When you overload functions and you have a version for const type& (but no version for type&), you can call it with non-const objects as the reference to const can bind to the non-const object, and the overload resolution will select it. In our case, however, overload resolution works on the principal template level instantiated with probe& for both T and T&&. And there is no code for such a specialization, we have to provide yet another one.

    template<>
auto produce_wrapped(probe& t) -> wrapper<probe>
  {
  std::cerr << t.s_ << "\n";
  return {t, t.s_.size() < 100};
  }
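Putting the three specializations into one translation unit shows that each call lands in the specialization matching the deduced T exactly, with no fallback from probe& to const probe&. A sketch under assumptions: last_called and observe_dispatch are hypothetical additions used only to observe the dispatch, and the logging to std::cerr is left out.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

template<class T>
struct wrapper;

template<class T>
auto produce_wrapped(T&& t) -> wrapper<std::decay_t<T>>;   // no primary definition

class probe
{
  std::string s_;
public:
  probe(std::string s) : s_(std::move(s)) {}
  template<class T> friend auto produce_wrapped(T&&) -> wrapper<std::decay_t<T>>;
};

template<class T>
struct wrapper
{
  T t_;
  bool good_ = false;
};

// Hypothetical bookkeeping, only to observe which specialization ran.
std::string last_called;

template<>
auto produce_wrapped<const probe&>(const probe& t) -> wrapper<probe>
{
  last_called = "const probe&";
  return {t, t.s_.size() < 100};
}

template<>
auto produce_wrapped<probe&>(probe& t) -> wrapper<probe>
{
  last_called = "probe&";
  return {t, t.s_.size() < 100};
}

template<>
auto produce_wrapped<probe>(probe&& t) -> wrapper<probe>
{
  last_called = "probe&&";
  const bool good = t.s_.size() < 100;   // evaluate before moving from t
  return {std::move(t), good};
}

// Hypothetical driver: one call per value category, recording the dispatch.
auto observe_dispatch() -> std::string
{
  std::string log;
  probe p("regular probe");
  const probe& cp = p;
  auto w = produce_wrapped(cp);             log += last_called + "; ";
  assert(w.good_);                          // the string is shorter than 100
  produce_wrapped(p);                       log += last_called + "; ";
  produce_wrapped(probe("rvalue probe"));   log += last_called;
  return log;
}
```

Calling observe_dispatch() yields "const probe&; probe&; probe&&". Removing any one of the three specializations brings back the linker error for the corresponding call.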

So we’re done, you may think, three verbose specializations for a perfect forwarding in this setup. And if you’re doing moderately reasonable things, you may be right, but I am after perfect forwarding, and this is the place things go zonkers. Remember the expression ‘cv-qualified’? Well, there are four of these and they combine with two distinct reference types, l-value and r-value, which leaves us with eight necessary boilerplate specializations to cover all the cases. And keep in mind that the cv-qualified references come with a lovely diamond-shaped reference-will-bind-to dependence diagram (and, you know, in c++ diamonds are forever), and to complete the tangle, special rules for binding const (but not volatile) l-value references. Even the standard ends up going bananas about it. Let me write one of the specializations:

    template<>
auto produce_wrapped(const volatile probe& t) -> wrapper<probe>
  {
  const std::string& psr = const_cast<const probe&>(t).s_;
  std::cerr << psr << "\n";
  return {t, psr.size() < 100};
  }

std::string has no operations that would work on volatile objects, so one has to cast even for the simplest things. And for that to even compile, you’d have to complete probe with copy constructors that would unambiguously cover all the calls, and the dependencies do not make it all that easy: for instance, if you have probe(const probe&) and probe(const volatile probe&), then they will compete for a call with probe&, so you need another one (but they will not compete for a probe&& call, const probe& will be selected, go figure). You probably don’t care about the vileatile qualification or about const r-value references, but I’m just trying to be perfect, that is, to provide a forwarding mechanism regardless of other considerations, and probe’s constructor is out of immediate scope.

And there’s another shortfall with probe, in that it befriends the entire produce_wrapped template, even for unrelated types, that is, for instance produce_wrapped<guinnea_pig> is a friend of probe. If we tried to be well behaved and do as the common advice says, we would have to befriend the particular specializations, and yes, all the eight separately. It’s a rule-of-eight that emerges here: eight specializations, eight friends, eight constructors, lots, lots of code.
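The reference-will-bind-to diagram and the special rule for const l-value references can be checked with std::is_convertible, which for a reference target asks exactly whether that reference can bind to the given kind of argument. A small sketch; the variable template binds is a hypothetical shorthand introduced here.

```cpp
#include <type_traits>

class probe {};

// Hypothetical shorthand: can a reference of type To bind to an argument
// whose type and value category are described by From?
template<class From, class To>
constexpr bool binds = std::is_convertible<From, To>::value;

// The diamond for lvalues: probe& sits at the top, const volatile probe& at the bottom.
static_assert(binds<probe&, const probe&>,                   "");
static_assert(binds<probe&, volatile probe&>,                "");
static_assert(binds<const probe&, const volatile probe&>,    "");
static_assert(binds<volatile probe&, const volatile probe&>, "");

// The two middle corners do not bind to each other, and nothing sheds a qualifier.
static_assert(!binds<const probe&, volatile probe&>, "");
static_assert(!binds<volatile probe&, const probe&>, "");
static_assert(!binds<const probe&, probe&>,          "");

// The special rule: an rvalue binds to const probe&, but not to the
// volatile-qualified or the unqualified lvalue references.
static_assert(binds<probe&&, const probe&>,           "");
static_assert(!binds<probe&&, const volatile probe&>, "");
static_assert(!binds<probe&&, probe&>,                "");

// And an rvalue reference never binds to an lvalue.
static_assert(!binds<probe&, probe&&>, "");
```

Ordinary overloads benefit from all these bindings. Explicit specializations of a forwarding template do not, because the specialization is selected by the exact deduced T, which is why every corner of the diagram needs its own specialization.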

But what are the alternatives, if, after all, we were allowed to change the friend declaration in the many probe classes to suit our design? Putting that in different terms, what are the ways to perfectly forward values of a single type, as opposed to writing a greedy template? I can offer two designs that seem to work. One would be to actually overload produce_wrapped_overload, but since for forwarding we need the argument to be a template, we would need to SFINAE-enable every overload for just the type it was designed for:

class probe
  {
    std::string s_;
  public:
    probe(std::string s) : s_(std::move(s)) {}
    template<class T> friend auto produce_wrapped_overload(T&&)
    -> std::enable_if_t
         <
         std::is_same<std::decay_t<T>, probe>::value, 
         wrapper<std::decay_t<T>>
         >;
  };
  
class guinnea_pig
  {
    int x_;
  public:
    guinnea_pig(int x) : x_(x) {}
    template<class T> friend auto produce_wrapped_overload(T&&)
    -> std::enable_if_t
         <
         std::is_same<std::decay_t<T>, guinnea_pig>::value, 
         wrapper<std::decay_t<T>>
         >;
  };
  
    template<class T> 
auto produce_wrapped_overload(T&& t)
    -> std::enable_if_t
         <
         std::is_same<std::decay_t<T>, probe>::value, 
         wrapper<std::decay_t<T>>
         >
  {
  using underlying = std::remove_volatile_t<std::remove_reference_t<T>>;
  using workable = std::conditional_t<std::is_lvalue_reference<T>::value,
                                      std::add_lvalue_reference_t<underlying>,
                                      std::add_rvalue_reference_t<underlying>
                                     >;
  workable tr = const_cast<workable>(t);
  const auto& psr = tr.s_;
  std::cerr << psr << "\n";
  const bool good = psr.size() < 100;   // decide before t is possibly moved from
  return {std::forward<T>(t), good};
  }

    template<class T> 
auto produce_wrapped_overload(T&& t)
    -> std::enable_if_t
         <
         std::is_same<std::decay_t<T>, guinnea_pig>::value, 
         wrapper<std::decay_t<T>>
         >
  {
  using underlying = std::remove_volatile_t<std::remove_reference_t<T>>;
  using workable = std::conditional_t<std::is_lvalue_reference<T>::value,
                                      std::add_lvalue_reference_t<underlying>,
                                      std::add_rvalue_reference_t<underlying>
                                     >;
  workable tr = const_cast<workable>(t);
  const auto& psr = tr.x_;
  std::cerr << psr << "\n";
  return {std::forward<T>(t), bool(psr % 2)};
  }

There are additional hoops to jump through to make it work for the most general case, when T is cv-qualified in any possible sense, and it is not as straightforward as one would wish, since the standard trait remove_volatile only removes the qualifier from an actual type and not from a reference to said type. Your compile times are going to take a hit, though, since at every call every overload will have to be tried and only one will survive. Another approach would be to use a traits class:

    template<class T>
struct forwarding_trait;

class probe
  {
    std::string s_;
  public:
    probe(std::string s) : s_(std::move(s)) {}
    friend class forwarding_trait<probe>;
  };
  
class guinnea_pig
  {
    int x_;
  public:
    guinnea_pig(int x) : x_(x) {}
    friend class forwarding_trait<guinnea_pig>;  
  };

    template<class T>
using non_vola_ref = std::conditional_t
  <
  std::is_lvalue_reference<T>::value,
  std::add_lvalue_reference_t<std::remove_volatile_t<std::remove_reference_t<T>>>,
  std::add_rvalue_reference_t<std::remove_volatile_t<std::remove_reference_t<T>>>
  >;

    template<>
struct forwarding_trait<probe>
  {
      template
        <
        class T, 
        class = std::enable_if_t<std::is_same<probe, std::decay_t<T>>::value>
        >
  static auto process(T&& t) -> wrapper<probe>
    {
    const auto& tr = const_cast<non_vola_ref<T>>(t);
    const auto& psr = tr.s_;
    std::cerr << psr << "\n";
    const bool good = psr.size() < 100;   // decide before t is possibly moved from
    return {std::forward<T>(t), good};
    }
  };

    template<>
struct forwarding_trait<guinnea_pig>
  {
      template
        <
        class T, 
        class = std::enable_if_t<std::is_same<guinnea_pig, std::decay_t<T>>::value>
        >
  static auto process(T&& t) -> wrapper<guinnea_pig>
    {
    const auto& tr = const_cast<non_vola_ref<T>>(t);
    const auto& psr = tr.x_;
    std::cerr << psr << "\n";
    return {std::forward<T>(t), bool(psr % 2)};
    }
  };

    template<class T>
auto produce_wrapped_trait(T&& t) -> wrapper<std::decay_t<T>>
  {
  return forwarding_trait<std::decay_t<T>>::process(std::forward<T>(t));
  }

There’s also enable_if inside, but it is just a safety measure, to prevent you from calling forwarding_trait<T>::process with objects of an unrelated class U. Otherwise, the compiler will know precisely where to look for your specialization. I wonder if there are other ways; do you happen to know any, or care to comment otherwise?
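The non_vola_ref alias used in both designs can be checked in isolation. The sketch below copies the alias from the listing above and uses an empty stand-in for probe; it shows why the reference has to be peeled off before remove_volatile and put back afterwards.

```cpp
#include <type_traits>

class probe {};   // empty stand-in, enough for type computations

// Copied from the listing above.
template<class T>
using non_vola_ref = std::conditional_t
  <
  std::is_lvalue_reference<T>::value,
  std::add_lvalue_reference_t<std::remove_volatile_t<std::remove_reference_t<T>>>,
  std::add_rvalue_reference_t<std::remove_volatile_t<std::remove_reference_t<T>>>
  >;

// remove_volatile does not look through a reference...
static_assert(std::is_same<std::remove_volatile_t<volatile probe&>,
                           volatile probe&>::value, "");

// ...so the alias strips the reference, drops volatile, and restores the reference.
static_assert(std::is_same<non_vola_ref<volatile probe&>,       probe&>::value,       "");
static_assert(std::is_same<non_vola_ref<const volatile probe&>, const probe&>::value, "");
static_assert(std::is_same<non_vola_ref<const probe&>,          const probe&>::value, "");

// For rvalue arguments T is deduced without a reference, and the alias
// produces an rvalue reference.
static_assert(std::is_same<non_vola_ref<probe>,          probe&&>::value,       "");
static_assert(std::is_same<non_vola_ref<volatile probe>, probe&&>::value,       "");
static_assert(std::is_same<non_vola_ref<const probe>,    const probe&&>::value, "");
```

Constness is deliberately kept, so the const_cast in the listings only ever removes volatile.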

PS. WordPress now has an editor that plays tricks with angle brackets, so I write my posts in a plain text editor. The c++ brackets then need to be converted to html entities, but brackets on known html tags must not. If you ever need to do the same, here’s my sed, just complete the list of allowed html tags (here a, em, code) if you need:

sed 's/<\(\/\)\?\(a\|em\|code\)\([^<>]*\)>/\x02\1\2\3\x03/g;s/\&/\&amp;/g;s/</\&lt;/g;s/>/\&gt;/g;s/\x02/</g;s/\x03/>/g'

An expressive language they are, regexes, should recall next time I moan about the intricacies of c++.


On a forwarding bug

Recently I got caught by a funny bug with forwarding references that I want to share here. In theory, I knew all the principles that led to it, but then it turned out that it was not easy for me to see how they apply precisely and to understand their full extent. Maybe reading this can save a few hours of your time someday.

Let’s start with a code snippet.

  struct element
    {
    int v=0;
    };

    template<class What>
  struct holder
    {
    What what;
    holder(What&& w) : what(w) {}
    };

    template<class T>
  auto make_holder(T&& t) -> auto
    {
    return holder<T>(std::forward<T>(t));
    }

  auto ret_holder1(void) -> auto
    {
    element vl{17};
    auto h = make_holder(std::move(vl));
    // use h
    return h;
    }

  auto ret_holder2(void) -> auto
    {
    element vl{17};
    auto h = make_holder(vl);
    return h;
    }

  auto main(void) -> int
    {
    auto rh1 = ret_holder1();
    auto rh2 = ret_holder2();
    return 0;
    }

This builds and runs, but some of it is broken. Do you feel uneasy that this compiles in the first place? Can you see the bug(s) already?

get<N>(tuple) considered harmful

I have been toying around with a custom implementation of tuple and came across this pitfall. Suppose you need a tool that would provide a tuple that would be the tail of a given tuple. What’s a tuple? It can be std::tuple or it can be mytuple, so the tool should best be parameterized on the tuple’s template (as a template template parameter) together with the tuple’s type pack:

    template<template <class...> class Tuple, class ArgHead, class... ArgTail>
auto tuple_tail(Tuple<ArgHead,ArgTail...>& t) -> Tuple<ArgTail...>;

The implementation must construct a tuple with pack expansion involving only the tail arguments. Types are easy, but to get the values from t we need an index sequence going from 1 to sizeof...(T)-1. It is straightforward to get a sequence like that with the use of std::index_sequence_for, but it starts at 0, which we need to drop. We then send the index sequence together with the tuple t to a helper function that does the actual job. All of that should be fairly straightforward c++ for anybody with a couple of years of experience (did I say a couple? I meant a couple of dozen, of course…):
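A minimal sketch of such a tool follows, under stated assumptions. It is not the code from the full post: it shifts a 0-based sequence by one instead of dropping the leading 0, it reads elements with a qualified std::get and therefore only works for std::tuple, and the names tuple_tail_impl and tail_demo are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Helper: Is runs over 0 .. sizeof...(ArgTail)-1, so Is + 1 skips the head.
template<template <class...> class Tuple, class ArgHead, class... ArgTail, std::size_t... Is>
auto tuple_tail_impl(Tuple<ArgHead, ArgTail...>& t, std::index_sequence<Is...>)
  -> Tuple<ArgTail...>
{
  // Assumption: qualified std::get, which restricts this sketch to std::tuple.
  return Tuple<ArgTail...>(std::get<Is + 1>(t)...);
}

template<template <class...> class Tuple, class ArgHead, class... ArgTail>
auto tuple_tail(Tuple<ArgHead, ArgTail...>& t) -> Tuple<ArgTail...>
{
  return tuple_tail_impl(t, std::index_sequence_for<ArgTail...>{});
}

// Hypothetical usage: the tail of (1, 2.5, 'x') is (2.5, 'x').
auto tail_demo() -> std::tuple<double, char>
{
  std::tuple<int, double, char> t{1, 2.5, 'x'};
  auto tail = tuple_tail(t);
  assert(std::get<0>(tail) == 2.5);
  assert(std::get<1>(tail) == 'x');
  return tail;
}
```

How the element access should be spelled so that it also works for a custom tuple is precisely the part this sketch sidesteps.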

On would–be const constructors

In this post I would like to discuss the idea of having constructors, or some equivalent, designed to build only objects that are const. My motivating example is an attempt to implement matrices and matrix views. I believe this is a canonical example where temptation to have const constructors appears in const methods of the principal class, the chief context in which one has limited data that suffice to only build a const–restricted object. I will try to be clever implementing them, and then show why it does not really work, and what lessons should be learnt from that exercise.

Generating set partitions – replacing virtual functions with protected inheritance

The purpose of this long post is twofold: first, I really am in need of an efficiently implemented set partition enumerating library, and second, I would like to use this work as a pretext to share and discuss the designs that I considered. The latter is the main part here, as the algorithms themselves are taken from other people’s work, and the c++ discussion does not require a thorough understanding of them.

A set partition is a way to split the elements of a finite set into disjoint subsets, called blocks. A way to model it is to take the set of natural numbers {0,1,2,…,n-1} as the initial set, and give each member a block number. Thus, for n=4 the partition (0,1,0,1) defines two blocks, block 0 containing elements 0 and 2, and block 1 containing elements 1 and 3. Of course, had we taken partition (1,0,1,0), we would have got the same split, so to avoid repetition there has to be a normalization rule: we start with block 0, and if a new block is needed, it gets the next number unused so far. A nice overview of the algorithms that accomplish this task has been gathered by Michael Orlov (section Technical reports at https://www.cs.bgu.ac.il/~orlovm/papers/ ); there’s even an accompanying implementation, but I think that with modern c++ one can now do better. And I need it for a student of mine who is doing research on some industrial process optimisation, as well as for purely mathematical research: set partitions have a lot to do with probability, see for instance the explanatory paper by Roland Speicher: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.9723
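The normalization rule can be written down as a small relabelling function. This is only an illustration of the rule; normalize_partition is a hypothetical name and does not come from the library discussed in the post.

```cpp
#include <map>
#include <vector>

// Relabels block numbers in order of first appearance: the first block seen
// becomes 0, the next new one 1, and so on.
auto normalize_partition(const std::vector<int>& blocks) -> std::vector<int>
{
  std::map<int, int> renamed;           // old block number -> new block number
  std::vector<int> result;
  result.reserve(blocks.size());
  for (int b : blocks)
  {
    auto it = renamed.find(b);
    if (it == renamed.end())
    {
      const int next_unused = static_cast<int>(renamed.size());
      it = renamed.emplace(b, next_unused).first;
    }
    result.push_back(it->second);
  }
  return result;
}
```

With this rule, both (1,0,1,0) and (0,1,0,1) normalize to (0,1,0,1), so each split has exactly one representative.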

1. Elementary solution for all partitions

Partitions of a set of fixed size can be looked upon as an ordered collection of immutable elements, and the most natural way to implement this kind of object in c++ is to have a collection object equipped with iterators, technically speaking const_iterators. Actually, most work is done in iterators in this case; the collection object is just needed for the convenience of having begin and end methods, which allows for instance the range for loop syntax. Michael Orlov’s paper details four algorithms: initialization of first and last partition and generation of next and previous partition, in terms of two sequences, a current partition k and a running maximum of used block numbers m. I will not analyse the algorithms themselves, it is not necessary for the c++ discussion that follows, and Michael has already done a perfect job, please consult his text. So, my first take at implementing the unrestricted set partitions is as follows:
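For orientation only, here is a sketch of the "next partition" step in terms of the two sequences k and m. It is the textbook successor rule for normalized partitions, written under the assumption that m[i] holds the largest block number among k[0..i]; it is not the implementation from the full post, nor necessarily identical to the code accompanying the paper. The names partition_state, first_partition, next_partition and count_partitions are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// k[i] is the block number of element i; m[i] is the maximum of k[0..i].
struct partition_state
{
  std::vector<int> k;
  std::vector<int> m;
};

// First partition: all elements in block 0.
auto first_partition(std::size_t n) -> partition_state
{
  return {std::vector<int>(n, 0), std::vector<int>(n, 0)};
}

// Advances to the next partition in lexicographic order.
// Returns false when the current one was the last.
auto next_partition(partition_state& s) -> bool
{
  const std::size_t n = s.k.size();
  for (std::size_t i = n; i-- > 1; )          // i runs from n-1 down to 1
  {
    if (s.k[i] <= s.m[i - 1])                 // element i may move one block up
    {
      ++s.k[i];
      s.m[i] = std::max(s.m[i], s.k[i]);
      for (std::size_t j = i + 1; j < n; ++j) // reset everything to the right
      {
        s.k[j] = 0;
        s.m[j] = s.m[i];
      }
      return true;
    }
  }
  return false;
}

// Counts all partitions of an n-element set by walking through them.
auto count_partitions(std::size_t n) -> std::size_t
{
  partition_state s = first_partition(n);
  std::size_t count = 1;
  while (next_partition(s))
    ++count;
  return count;
}
```

For three elements the walk visits (0,0,0), (0,0,1), (0,1,0), (0,1,1), (0,1,2), five partitions in total. Wrapping this state in a const_iterator is the part the post is really about.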

Introduction to random number generation in c++

This article is intended as a concise introduction to using the (pseudo)-random number generation facilities present in c++11. What I write about certainly isn’t new, but I need an easy-reading place I can point people to. That does not mean there are no problems with the standard services: some are fundamental, and are mentioned in the articles I cite; others are programmatic and could use some, well, circumventing, which comes with all the c++ intricacy delight, but I will leave that out, waiting for a future second installment on the subject.

Structure of the problem

So you want a random number from a computer, right? You think it really is possible? After all, if your computer, a deterministic machine designed to follow your program to the letter, started doing things at random, you would most likely declare it broken and either try to repair it or scrap it. No way. And yet, you play games, you do your Monte-Carlo simulations, and all that claims to be using some sort of randomness. So where is it to be found in your computer? The answer has three parts.

Nested class privacy leak: bug or feature?

Most programmers know classes in c++ can be nested, and the nested ones can be made private if they are not needed outside the implementation of the host class. The canonical example is a linked list: