Monday, March 12, 2012

C++'11: Ranged for loops

First, for orientation, consider the traditional for loop:
for(int i = 0; i < 10; i++)
array[i] = 0;
The standard states that
for ( for-init-statement condition; expression) statement 
is equivalent to
{
for-init-statement
while ( condition ) {
statement
expression ;
}
}
No big surprises there, just reordering where the pieces fit. Now consider the new range-based for loop, which is defined similarly:
for ( for-range-declaration : expression ) statement 
is equivalent to
{
auto && __range = (expression);
for ( auto __begin = begin-expr, __end = end-expr;
__begin != __end;
++__begin)
{
for-range-declaration = *__begin;
statement
}
}
Not quite so straightforward, but not to worry, because it probably behaves the way you would expect. The __ variables are defined only to explain the semantics; they don't really exist. The begin-expr and end-expr depend on the type that expression evaluates to. If expression is an array, they are the beginning and one past the end of the array, respectively. If the length of the array is not known, the loop is invalid and should not compile. If expression is a class with begin or end member functions, they are __range.begin() and __range.end(). If there are no member functions by those names, argument-dependent lookup (ADL) is used to find the free functions that make begin(__range) and end(__range) valid, with namespace std treated as an associated namespace.
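To make the ADL case concrete, here is a minimal sketch. The namespace demo, the type Primes, and the two sum functions are hypothetical names invented for this illustration. Primes has no begin() or end() members, so the loop has to find the free functions through ADL. The second function writes out the expansion by hand, with plain names standing in for the __ variables.

```cpp
namespace demo {  // hypothetical namespace, for illustration only

// A tiny fixed-size container with no begin()/end() member functions.
struct Primes {
    int values[4];
};

// Free functions in the same namespace as Primes, so ADL can find them.
inline const int* begin(const Primes& p) { return p.values; }
inline const int* end(const Primes& p)   { return p.values + 4; }

}  // namespace demo

// Sum using the range-based for loop.
inline int sumRangeFor(const demo::Primes& primes)
{
    int sum = 0;
    for (int p : primes) sum += p;
    return sum;
}

// Sum using a hand-written version of the expansion shown above.
inline int sumExpanded(const demo::Primes& primes)
{
    int sum = 0;
    {
        auto&& range = primes;
        for (auto it = begin(range), last = end(range); it != last; ++it) {
            int p = *it;  // plays the role of the for-range-declaration
            sum += p;
        }
    }
    return sum;
}
```

Both functions should return the same value for any Primes object, since the first is defined in terms of the second.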

Here is an array example:
int array[6] = {2,3,5,7,11,13};
for( const auto prime : array) cout << prime << endl;
Here is a class example:
void printIncrement( vector<int>& vect )
{
for(auto x : vect) cout << x << endl;
for(auto &x : vect) ++x; //a reference is needed so that ++ changes the element stored in the vector
}

Sunday, March 11, 2012

C++'11: Nested Template >>

When the first edition of the Annotated C++ Reference Manual was written, it introduced templates to the C++ world. The < and > brackets were chosen because Bjarne felt they would probably be easier to parse, since () was already overloaded. He said later that he realized using () would not have been that bad, but by then it was too late. As a result, this oddity was put into the language:

typedef std::vector< std::vector< int > > type1; // Okay

typedef std::vector< std::vector< int >> type2; //Error

It becomes a more difficult problem to fix when you remember that the compiler front end is usually done in stages. First the compiler does lexical analysis to break the program into tokens, then does syntax analysis to check the grammar, and then does the type checking. Usually at that point it has an abstract syntax tree that is passed off to the compiler back end. I know this is a simplification and many compilers will blur each of these stages, but this model should work.

When lexical analysis is done, the input is broken into tokens using the longest possible match. That is why long inta gives you a long named "inta" and not a long int named "a". This is usually called "Maximum Munch".
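A classic way to see the longest-match rule in action is the expression a+++b. The function name munchDemo below is made up for this sketch. The lexer produces the tokens a, ++, +, b, so the expression means (a++) + b and never a + (++b).

```cpp
// The lexer reads "a+++b" as  a  ++  +  b,  i.e. (a++) + b.
// As a result a is incremented and b is left alone.
inline int munchDemo(int& a, int& b)
{
    return a+++b;
}
```

Called with a = 1 and b = 2, this returns 3 and leaves a == 2 and b == 2, which is how you can tell the post-increment was applied to a.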

In [n1757] there were three alternative suggestions to fix this problem. Here they are, out of order.

Proposal 3

Get rid of the >> token. This makes a single > the maximum munch. It would also mean that a right shift operation would be two > tokens, so inta > > 3 would be a valid right shift operation. This approach was suggested just for the sake of completeness in the submission, not because anyone would actually want to do it. It causes too many other technical complications and it just looks funny.

Proposal 1

Keep the >> as its own token, but decree that if there is an open < token, then the >> token will be counted as two closing > tokens. This seems like the obvious solution, but it does have an unintended backwards compatibility side effect. Consider the following program:

#include <iostream>
template<int I> struct X {
static int const c = 2;
};
template<> struct X<0> {
typedef int c;
};
template<typename T> struct Y {
static int const c = 3;
};
static int const c = 4;
int main() {
std::cout << (Y<X<1> >::c >::c>::c);
std::cout << (Y<X< 1>>::c >::c>::c);
}

Under the current rules this outputs 03, but under this proposed change this program would output 00 because the right shift operator turns into the closing template brackets.


Proposal 2

GCC and EDG C++ have implemented a variation of Proposal 1 for error recovery purposes. They treat the >> as a right shift if they can, but otherwise use it to close the template arguments. This means A<B<int>> would be valid but A<B<1>> would still be invalid. This solution would only partly fix the problem, and leave an even more arcane one in its place.

Solution:

The committee went with Proposal 1. See 14.2/3 of the spec for the official wording of the change. This also applies to the <...> in the casting operators. It is worth pointing out that a similar issue can come up when using the >>= or even the >= operators, although using those in a template type declaration would be so uncommon that they decided not to do anything about it at this time.
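Here is a short sketch of what the adopted rule means in practice. The template Constant, the typedefs, and gridRows are hypothetical names used only for illustration. The nested >> now closes both argument lists, and a genuine right shift inside a template argument has to be wrapped in parentheses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical template, only here to carry a shift inside a template argument.
template <int N> struct Constant { static const int value = N; };

// C++11: ">>" closes both template argument lists.
typedef std::vector<std::vector<int>> Grid;

// A real right shift inside a template argument needs parentheses now.
// Without them the first ">" of ">>" would end the argument list.
typedef Constant<(8 >> 1)> Four;

inline std::size_t gridRows()
{
    // The same rule applies to the <...> of the cast operators.
    Grid g = static_cast<std::vector<std::vector<int>>>(Grid(3));
    return g.size();
}
```

With these definitions Four::value is 4 and gridRows() returns 3.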

Monday, February 27, 2012

C++'11: auto keyword

auto keyword
Forget everything you know about the auto keyword. Section 7.1.1 of the standard forgot it like this:

The [deleted: auto and] register specifier[deleted: s] shall be applied only to names of objects declared in a block (6.3) or to function parameters (8.4). [deleted: They specify] It specifies that the named object has automatic storage duration (3.7.2). An object declared without a storage-class-specifier at block scope or declared as a function parameter has automatic storage duration by default. [deleted: Note: hence, the auto specifier is almost always redundant and not often used; one use of auto is to distinguish a declaration-statement from an expression-statement (6.8) explicitly. end note]

Standards committee members searched through millions of lines of code and found only a few rare examples of its use, so they hijacked it for something everyone will use. It now declares a variable whose type is deduced from its initializer. Borrowing heavily from the change submission [n1984], we will look at a few code examples:

auto x = 3.14; // x has type double

int foo();
auto x1 = foo(); //x1:int
const auto& x2 = foo(); //x2: const int&
auto& x3 = foo(); //x3: int&: error, cannot bind a reference to a temporary

float& bar();
auto y1 = bar(); //y1: float. Value semantics is the default (notice y1 is not of type float&)
const auto& y2 = bar(); //y2: const float&
auto& y3 = bar(); //y3: float&

A* fii();
auto* z1 = fii(); //z1: A*
auto z2 = fii(); //z2: A*
auto* z3 = bar(); //error, bar does not return a pointer type

The way it deduces the type is the same as how templates do it, i.e. in the code snippet:
const auto &i = expr;
The type of i is the deduced type of the parameter u in the call f(expr) of the following invented function template:
template <class U> void f(const U& u);
(see section 7.1.5.4 of the standard for the note on this)
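The deduction rules can be checked at compile time. The sketch below reuses the foo and bar signatures from the examples above, gives them made-up bodies, and adds a hypothetical function autoDeductionDemo that uses static_assert with std::is_same to confirm each deduced type.

```cpp
#include <type_traits>

inline int foo() { return 42; }                            // returns by value
inline float& bar() { static float f = 1.5f; return f; }   // returns by reference

inline bool autoDeductionDemo()
{
    bar() = 1.5f;             // reset, so the function can be called repeatedly

    auto        x1 = foo();   // int
    const auto& x2 = foo();   // const int&, keeps the temporary alive
    auto        y1 = bar();   // float: the reference is dropped, y1 is a copy
    auto&       y3 = bar();   // float&: refers to the static inside bar()

    static_assert(std::is_same<decltype(x1), int>::value, "x1 is int");
    static_assert(std::is_same<decltype(x2), const int&>::value, "x2 is const int&");
    static_assert(std::is_same<decltype(y1), float>::value, "y1 is float");
    static_assert(std::is_same<decltype(y3), float&>::value, "y3 is float&");

    y1 = 2.0f;                                  // changes only the copy
    bool copyIsIndependent = (bar() == 1.5f);
    y3 = 3.0f;                                  // writes through to the original
    return copyIsIndependent && bar() == 3.0f && x1 == x2;
}
```

If any of the deduced types were different, the static_assert lines would stop the build. The run-time part shows the practical consequence: assigning to y1 leaves the original alone, while assigning to y3 changes it.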

You are even allowed to have multiple declarations on one line, but they must all be of the same type, just as if you specified the type. For example:
int i;
auto a = 1, *b = &i; //int and int*


The interesting thing that is missing is the ability to declare an array with auto. This is because an array decays into a pointer to its first element, which makes it difficult to specify the behavior in a way that is both consistent with the language and simple to understand. See section 8.3.4 paragraph 1, where it is explicitly banned.
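A small sketch of how auto interacts with an existing array follows. The function name arrayAutoDemo is invented for this example. Initializing a plain auto from an array gives a pointer, while auto& keeps the full array type.

```cpp
#include <type_traits>

inline int arrayAutoDemo()
{
    int primes[3] = {2, 3, 5};

    auto  p = primes;   // the array decays: p is int*
    auto& r = primes;   // no decay: r is int(&)[3]
    // auto a[3] = {2, 3, 5};   // not allowed: auto cannot be an array element type

    static_assert(std::is_same<decltype(p), int*>::value, "p is a pointer");
    static_assert(std::is_same<decltype(r), int(&)[3]>::value, "r is an array reference");

    // First element through the pointer, plus the element count through the reference.
    return p[0] + static_cast<int>(sizeof(r) / sizeof(r[0]));
}
```

The function returns 5: the first prime (2) plus the number of elements (3), which is only recoverable through r because p has lost the array bound.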

Sunday, February 19, 2012

C++'11: C99 compatibility

When I was first learning how to program, I stumbled across the now defunct website programing.com [sic] where someone posed the question "What is the difference between C and C++?" The first response to the question said that "The difference is in C you say printf, but in C++ you say cout." I thought it was so funny that I showed all my software friends.

C compatibility has been one of the greatest advantages of the C++ language. Without it the language would have been (as Bjarne put it in D&E) stillborn. It has also been one of the biggest sources of problems in the language. I frequently deal with these kinds of issues when integrating COTS C code with C++. The ISO C standard received its first major revision in 1999, a year after C++'98 was released. It was updated again in 2011, around the same time as C++'11. The two committees coordinated many of their changes to continue their symbiosis.

For an interesting discussion on incompatibility between the languages see appendix C of the standards document. The appendix is informative and not part of the actual C++ language definition but it is still worth your time if you work much with C and C++ together.

long long
long long was officially added to define an integer that is at least 64 bits long. Along with that come unsigned long long and the literal suffixes LL/ll and ULL/ull. printf also has the ll length specifier. long long has been the de facto standard for a while, so it will probably be a surprise to some that it was not officially in the language before now. It was actually proposed back in 1995, but at the time was rejected because C had not yet officially adopted it. Here is a little code example:
long long myNumber = -1234567LL;
unsigned long long uMyNumber = 8446744073709551616ULL; //That is a large number
printf("%lld and %llu\n", myNumber, uMyNumber);
See [n1811] for more information.
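For a version that can be compiled and checked, here is a hedged sketch. The helper formatPair is a made-up name. It formats the two values with the ll length specifier, and a static_assert confirms the 64-bit minimum at compile time.

```cpp
#include <climits>
#include <cstdio>
#include <string>

// long long is guaranteed to have at least 64 bits.
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long is at least 64 bits");

// Formats a signed and an unsigned value using %lld and %llu.
inline std::string formatPair(long long a, unsigned long long b)
{
    char buffer[64];  // large enough for two 20-character numbers plus " and "
    std::snprintf(buffer, sizeof buffer, "%lld and %llu", a, b);
    return buffer;
}
```

Calling formatPair(-1234567LL, 8446744073709551616ULL) yields the string "-1234567 and 8446744073709551616".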

Extended Integer Types
The C++ language did not adopt any actual extended integer types, i.e. 128-bit integers or larger. It did, however, define how they must behave if an implementation decides to provide them. In a nutshell, they will behave basically how you would expect them to when it comes to things like implicit conversions. For the most part a normal developer should not need to worry about it or even know about this language extension, because any use of it would not be portable. It should be avoided unless there is a good reason to use it. The C language, as well as the ECMA C++ Binding for CLI standard (that is, the European Computer Manufacturers Association), have already adopted it, so for the sake of not creating a C++ dialect the committee picked it up too. See [n1988] for more information.

Other Stuff...
A bunch of other miscellaneous things were carried over. Maybe the biggest was the update to the C libraries; a quick scan of [n1568] will show you more detail than you probably care to know about some of the compatibility library updates. I am choosing to gloss over changes like the updates to the preprocessor and C-style strings in hopes that they will be touched on in another post. One important difference to point out, though, is that C++ did not adopt Variable Length Arrays. Sorry, I guess you will just have to continue using new[] for a run-time length, or better yet use std::vector from the standard library (the new std::array class needs its size at compile time).
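Since Variable Length Arrays did not make it into C++11, a buffer whose size is only known at run time is usually handled with std::vector. This is a minimal sketch; makeSquares is a hypothetical function name.

```cpp
#include <cstddef>
#include <vector>

// C99 would allow:  int squares[n];   (a variable length array)
// C++11 does not, so the run-time size goes to std::vector instead.
inline std::vector<int> makeSquares(std::size_t n)
{
    std::vector<int> squares(n);  // size chosen at run time, elements zeroed
    for (std::size_t i = 0; i < n; ++i)
        squares[i] = static_cast<int>(i * i);
    return squares;
}
```

Unlike a raw new[], the vector releases its memory automatically and carries its own size, so the caller can ask for makeSquares(4).size() instead of passing the length around separately.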


C++'11: __cplusplus macro updated

The macro __cplusplus is now defined to have a value of 201103L. This is outlined in section 16.8 [cpp.predefined] of the standard. This value is expected to be updated in each release. As the standards document says:
"It is intended that future versions of this standard will replace the value of this macro with a greater value. Non-conforming compilers should use a value with at most five decimal digits."
If you use GCC, though, be aware that they only recently fixed the bug that caused the value to always be set to 1 instead of the previous standard's value, 199711L. See their bug report here for more details and some interesting insight into why something as simple as #define __cplusplus 199711L is not always so simple.
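A typical use of the macro is to select code at compile time. The sketch below uses only the two values mentioned above; the function name standardName and the strings it returns are made up for illustration. The last branch would catch a compiler that reports a smaller value, such as the 1 that older GCC versions used.

```cpp
#include <string>

// Reports which standard the compiler claims to implement,
// based on the value of __cplusplus.
inline std::string standardName()
{
#if __cplusplus >= 201103L
    return "C++11 or later";
#elif __cplusplus == 199711L
    return "C++98/03";
#else
    return "pre-standard or non-conforming";
#endif
}
```

Because the value is expected to grow with each release, comparing with >= rather than == keeps the check working on newer compilers.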