MSSQL Pivot

Pivots (turning a column’s values into actual columns) is a very common activity. Spreadsheet programs have robust support for it. But Standard SQL? Not so much.

The Problem

Create Table orders (
  orderNumber int,
  sku char(3),
  quantity int,
  salesPerson varchar(10)
);

insert into orders values
( 1, 'ZR34', 2, 'Mary'),
( 1, 'AS99', 1, 'Mary'),
( 2, 'ZR34', 1, 'Jim'),
( 2, 'MB01', 1, 'Jim');

The ubiquitous order table with the SKU, quantity and sales person. To keep this simple, I did not normalize. If that bothers you, then think of the orders table as the results of joining between the all the bits.

The ask, is to product a report that shows, for each sales persons, how many of each SKU they sold.

 sku	Mary	Jim	Kiki
AS99	1	0       0
MB01	0       1	0
ZR34	2	1	0

(Kiki was apparently on vacation.)

Now if your most people, you download the data to a spreadsheet and call it a day. But we’re not most people. We have a hammer (SQL) so we are going to hammer this flat. Plus, we know that we’ll get the same request next week, and the next, etc. And who has time for that?

Standard SQL

If we were to do this in standard SQL, it would look like:

select sku, sum([Mary]) as "Mary", sum([Jim]) as "Jim", 
        sum([Kiki]) as "Kiki"
from (
   select
      sku,
      case when salesPerson = 'Jim' then quantity else 0 end as [Jim],
      case when salesPerson = 'Mary' then quantity else 0 end as [Mary],
      case when salesPerson = 'Kiki' then quantity else 0 end as [Kiki]
   from orders
   ) as bySP
group by bySP.sku

A new case statement is required for every salesperson. That’s no fun.

MSSQL

MSSQL has a pivot  statement that makes this a bit less painful

select sku, [Mary], [Jim], [Kiki]
from (select sku, quantity, salesPerson from orders) s
pivot (sum(quantity) for salesPerson in ( [Mary], [Jim], [Kiki])) pvt

Some notes about the syntax:

  • The alias for the pivot (eg pvt) is required.
  • The alias for the subselect is also required, even though  it isn’t used.
  • The values that form the in-list are not strings – they are column names.
  • You can use * in the outer select’s return list.

You can use a bare table in the from clause, but be careful. Any column (like sku) that is not aggregated or used as the pivot column becomes a defacto group-by. In our example orderNumber becomes another row label

select *
from orders
pivot (sum(quantity) for salesPerson in ( [Mary], [Jim], [Kiki])) pvt

Leading to:

orderNumber	sku	Mary	Jim	Kiki
1	        AS99	1	NULL	NULL
2	        MB01	NULL	1	NULL
1	        ZR34	2	NULL	NULL
2	        ZR34	NULL	1	NULL

It would be nice if we could do something like this:

-- THIS DOESN'T WORK
select *
from (select sku, quantity, salesPerson from orders) s
pivot sum(quantity) 
for salesPerson in ( select distinct salesPerson from orders ) as pt

But unfortunately, the list of values needs to be given explicitly. The one good thing about this is that you will have a column for a value, even if there is no rows that match (think of poor Kiki).

Dynamic SQL

But this can be done using Dynamic SQL and exec. I’ve built dynamic queries such as this for the standard sql case and it is no fun. Doing so for the pivot operator is a piece of cake.

First create a salesForce table so Kiki will make an appearance.

create table salesForce(
	name varchar(10)
);

insert into salesForce values ('Kiki'), ('Mary'), ('Jim')

Then use a cursor to build our column list and presto!

declare sp cursor for select [name] from salesForce
declare @list varchar(500) = ''
declare @query varchar(1000)
declare @aName varchar(10)

open sp
Fetch next from sp into @aName
while @@FETCH_STATUS = 0
begin
	if @list != ''
		set @list = @list + ', '

	set @list = @list + '[' + @aName + ']'
	fetch next from sp into @aName
end

close sp
deallocate sp

set @query = 'select sku, ' + @list + 
	' from (select sku, quantity, salesPerson from orders) s ' +
	'pivot (sum(quantity) for salesPerson in (' + @list + ')) pvt'

print @query
exec(@query)

Unfortunately, to get rid of the nulls means keeping two parallel list – one for the select that has the isnull and one for the pivot value list.

I hope this helps you use the pivot statement effectively.

Advertisements

Identifier Parsing in Boost.Spirit X3 – custom parser

This time around, we will use a custom parser to handle the keywords.

I really hadn’t planned on making this a series, but there you go. This will be the last – I think.

Upgrades

I started from the code from the last post, but did make a minor adjustment. I made underbar (‘_’) a valid character in an identifier.

auto const ualnum = alnum | char_('_');
auto const reserved = lexeme[symtab >> !ualnum];
auto const ident = lexeme[ *char_('_') >> alpha >> *ualnum ] - reserved;

Custom Parser

Parsers in X3 are classes that have a parse template function with a specific signature.

template<typename Iterator, typename Context, typename RContext, typename Attribute>
    bool parse(Iterator &first, Iterator const& last, Context const& context, 
               RContext const& rcontext, Attribute& attr) const

first and last are input iterators that contain the stream of characters (or whatever the iterators are iterating) to match.

context and rcontext contain various client and system supplied information.

attr is the attribute – the value that the parser will pass back on success. In our case, this will just be the keyword itself.

And that really is it. The code itself is straightforward and pretty much mirrors what the “standard” version does – match the given string, then check that next character (if it exists) is not a letter, number, or underbar.

After that, it is just a matter of using the new parser keyword in the mkkw lambda.

Now that wasn’t bad, was it?

Here is the finished code.

 

Identifier Parsing Redux

The ink hadn’t dried[1] on my Identifier Parsing post when I realized that there was indeed a better way to handle multiple keywords.

In that post I stated that a symbols<T> parser would not help because it suffered the same problem as lit(). Which is true.

What I missed was that, of course, you could use the same trick with symbols as you did with lit() to make it work.

Like this:

symbols<int>symtab;
symtabs.add("var")("func");
auto const reserved = lexeme[symtab >> !alnum];
auto const ident = lexeme[  +alnum - reserved ]

That does what we need.

AND, we can fix up our lambda to automatically register new keywords.

auto mkkw = [](std::string kw) {
    symtab.add(kw);
    return lexeme[x3::lit(kw) >> !alnum];
};

Now, we can happily make up keywords and keep the rest of the parser in sync.

I will place a V4 in the Github repository.


 

[1] Yea, I know. Work with me.

The Tools

I thought I would take a moment and document my current development environment.

The main computer runs windows 10. However, the development computer is actually a VirtualBox ArchLinux client running on that Win 10 box. I use X11 Forwarding to display back to an cygwin/X server running on the host.

I am currently using Codelite as my IDE, though it has some rough edges. GNU compiler suite rounds things out.

 

Identifier Parsing in Boost.Spirit X3

In Boost.Spirit X3, parsing identifiers is a bit tricky.

If you are used to the distinction between lexical analysis and syntactical analysis (as I am), Spirit can take some getting used. Lexical analysis is done in the same grammar as the syntactical analysis. So the ubiquitous IDENT token type is now a grammar rule.

To be sure, it doesn’t have to be this way. Spirit parsers work on iterator pairs, so it would be very feasible to write a lexical analyzer that makes a token object stream available via a ForwardIterator and write your grammar rules based on that.

That has some benefits. It definitely would keep the grammar a bit cleaner since you don’t have to write rules around lexical issues. And you don’t have to care about the skip parser (the user-supplied grammar for recognizing spaces, etc that you don’t want to be considered as important input).

But the separation also has a cost. You will frequently have to create a feedback loop where the parser must give guidance to the lexer in order to reduce ambiguity in the grammar. Which means that the two are going have to share a good bit of state. So, if they are going to be all in each other’s business anyway, you might as well use Spirit to write both.

But using Spirit for both ALSO has a bit of cost. Things can get tricky. Lets take a look.

A First Simple Attempt

Below is the code for a a very simple grammar – input is the keyword “var” followed by an identifier – that duo can be repeated as many time as you want. Everything is space separated.

Note that the code is also available via Github

Lets go to the bottom for a minute and look at main(). This won’t change so we’ll get it out of the way. Most of this is self explanatory, but I want to mention two things.

The first is the if/else if/else. In general, this is how you will need to check the outcome of your parse. The return value will be true unless special error handling says otherwise. Possibly we will delve into that in a future post. So, the main way to know if you got what you wanted is to see if all the input was consumed.

Now let’s look at the call to the Spirit parsing engine.

bool r = x3::phrase_parse(iter, end_iter, program, x3::ascii::space);

The two iterators say what to parse, program is the parser to use, and space is the skip parser.

The skip parser is a special parser that Spirit calls between calls to other parsers. So, if you have a line

A >> B >> *C

then, Spirit will act as if you has specified

skip >> A >> skip >> B >> *(skip >> C)

This will be important in a few paragraphs.

So, now the actual parser:

auto const kw_var = x3::lit("var");
auto const ident = lexeme[ alpha >> *alnum ];
auto const stmt = kw_var >> ident;
auto const program = +stmt;

A “program” is one or statements each of which is the the keyword “var” followed by an identifier which made up of one or more alphanumeric characters.

Le me talk about the “lexeme” parser for a minute. Remember that Spirit inserts a call to the skip parser between invocations of other parsers? If we had written ident as simply

auto const ident = alpha >> *alnum;

that would have been transformed into

auto const ident = alpha >> *( skip >> alnum );

The effect would be that the ident parser would happily consume all the input (assuming no punctuation characters) since “space” (our skip parser) soaks up all the spaces, tabs, etc.

“lexeme” turns off skip processing for the parser it encloses. So, with lexeme, ident will stop at the first non-alphanumeric character – including spaces. lexeme is a way to say that spaces (or what ever the skip parser normally consumes) is important.

So, does our parser work? It does! It even fails as expected.

$ ./parse_ident_v1 "var foo var bar"
parsing : var foo var bar
Good input
$ ./parse_ident_v1 "var foo var"
parsing : var foo var
Failed: didn't parse everything
stopped 3 characters from the end ( 'v' )

Or does it?

$ ./parse_ident_v1 "varfoo varbar"
parsing : varfoo varbar
Good input

A Second Attempt

The problem is that the string parser “lit” is, well, too literal. We asked it to check if the three characters ‘v’, ‘a’, ‘r’ were in the input stream – and they were. What we really wanted is that those characters were in the stream AND they were not followed by anything that we allow in an identifier.

auto const kw_var = lexeme[x3::lit("var") >> !alnum]

Note that we now have to use “lexeme” otherwise we would skip spaces before checking to see if there was a alphanumeric character.

$ ./parse_ident_v2 "varfoo varbar"
parsing : varfoo varbar
Failed: didn't parse everything
stopped 13 characters from the end ( 'v' )

Much better. Note that it stopped at the “v” in “varfoo”.

But we still have a problem.

$ ./a.out "var var"
parsing : var var
Good input

Hmmm..

A Third Attempt

If the language you are parsing doesn’t reserve keywords, then we’re done. But most languages have at least a few reserved words that cannot be used as general identifiers.

The first fix actually helps us here as well.  The fix is as easy as :

auto const ident = lexeme[ (alpha >> *alnum) - kw_var ]

Note that the “except” operator ( “-” my term) may not work the way you think. What does NOT happen is that the first parser does its thing and then the “except” parser checks to see if it matches and fails if it does.

The order of operations is reversed from what might be implied by the expression. The “except” parser runs first. If it matches, then the whole expression fails. If it does NOT match, then the primary parser ( +alnum in this case) runs.

Because of this, something like:

auto const ident = lexeme[ +alnum - lit("var") ]

would fail to match the string “variable” for the same reason that “varfoo” succeeded in our first attempt.

With that in mind let’s do one more thing – let’s cater to the case of multiple keywords.

Multiple Keywords

It would be tempting – after glancing at the Spirit X3 docs – to try to use x3::symbols for this. It would indeed be nice if it worked:

x3:symbols symtab.add("var")("func");
auto const kw_var = lexeme[ lit("var") >> !alnum];
auto const kw_func = lexeme[lit("func") >> !alnum];
auto const ident = lexeme[ (alpha >> *alnum) - symtab ]

Unfortunately, this suffers from “lit”-erally the same problem. symtab is (for parsing purposes) equivalent to:

symtab = lit("var") | lit("func")

So the string “var variable” would – again – be rejected.

We will need to define a new parser “reserved” that wraps up all the reserved words.

auto const kw_var = lexeme[ lit("var") >> !alnum];
auto const kw_func = lexeme[lit("func") >> !alnum];

auto const reserved = kw_var | kw_func;

auto const ident = lexeme[ (alpha >> *alnum) - reserved ]

And presto!

So, here is the final code. I have also made a little helper lambda to define keywords. I tried to figure out a way to have it also add the new keyword to the reserved parser, but couldn’t come up with anything. I’m happy to listen to any suggestions.

Final Thoughts

Doing lexical analysis in Spirit is not much different for actually writing one freehand. The same thought processes are involved. Just be aware of the skip processor and you’ll do fine.

The Onyx Project

I have decided to design, and build my own computer language. It will be called Onyx.

I agree, that is a pretty big task.

So we will take a bit at time.

Design Criteria

I don’t really have a whole lot yet. You might think of onyx as C++ EXCEPT – meaning, it acts like C++ EXCEPT these differences. Over time the list will grow and I’ll turn it into a full-feature language spec.

Here are the “excepts” so far.

Variable/Function declaration

Onyx has a postfix declaration for the type e.g.

var foo int;                         // int foo;
var foo * ( * int, *() ) * int;      // int* (*foo)(int*, *());
var baz * [] * int;                  // int *(*baz)[]; I think.
// functions are similar
func doit ( a * int, b MyClass *) YourClass *

Yes, there will be keywords introducing variables and function.

I haven’t decided what to do about const yet.

var foo const int = 3;
// Q: allow abbreviation as : ??
const foo int = 3;

Classes

Haven’t really given this much thought yet. Points up for thought:

  • different inheritance types (public/protected/private) or not?
  • What are the visibility levels – all three?? I’m leaning towards only two – public and protected (though I will probably relabel them).
  • virtual inheritance?

Of course, the biggest questions is – can objects live on the stack (Like C++) or only as references/pointers (like java). I want objects on the stack, but I want to see if we can do something about the awful mess of T vs T* vs T& vs T&&.

My current thinking is that there are no pointers – only T and &T, but that references would be “spelled” “*”.  So getting to members through a pointer would use dot notation.

var a * MyClass = &anObj; // fine initializing a to reference anObj
var b * MyClass = &otherObj;
a.m1 = b.m1 + 3;  // update a single member
a = b;            // copy - otherObj now has anObj's data.
a = &b;           // update a to reference otherObj (not b)
                  // a would have to have type * * MyClass for that.

Move semantics would definitely be baked in from the being.

Operator Overloading

Indeed. But I’m thinking of stealing a page from python’s playbook  and have special names rather than trying to use the actual lexical symbol. To help, however, they would be names that would otherwise not be valid function identifiers. Maybe something like this..

class MyClass {
     func op.assign() {};
     func op.plus() {};
     func op.create() {};  // constructor
     func op.destroy() {}; // destructor

The downside of this is that it would be a little weird redefining op.plus to do string concatenation.

Templates

I’d like to generalize the if constexpr paradigm to include the whole function/class, so that its entire existence can depend on an “if” of some sort.

tif isarraytype(T) {
   func foo(T) int ( /* some stuff */ };
} else {
   func foo(T) int { /* different stuff */ };
}

This may not be doable or even make sense. How does it play with normal overloading?

Where are the template parameter lists placed? I’m think of putting them at the class keyword rather than the name.

Module System

Modeled after python. But with some compiler support for “Module providers” so that the name of the file would not have to be tied to the file name (e.g. for versioning), but that would be the default.

Modules would be compiled once, of course. only things specifically marked would be exported. Would be nice to have a tool that could generate a reference guide to a module based on its exported symbol table.

Phase I implementation

The initial parser/compiler will be built in C++ use Boost.Spirit X3. This phase will be more of a translator that will spit out C++ code to implement the semantics. Module files will be some dense representation of the parse tree.

Phase II implementation

This will still utilize Spirit, but will target LLVM IR as the target output.

Conclusion

Thanks for making this far. If you have any thoughts on any of the above, I would be happy to hear them.

There is obviously a ton of work to do. And not much spare time. So, this will definitely be a slow leisurely endeavor.

Day One

So, I decide to join the blogging crowd. It remains to be seen if it sticks.

Why?

I’ve been doing some home-brew software development lately and found that there were some things I wanted to say and no place to say them. So here is that place.

This won’t be all about software development. There will undoubtedly be some music-ish things thrown in. And things about music software and probably software music too.

What does the name mean?

Not much. Vamping is used in music to mean an small mostly improvised section of music over a very small chord set used to either kill time or to support a soloist.

Apparently it also means to put a new upper on a shoe.

The main reason I picked the name is that it didn’t appear to be used by anybody.

To quote Vonnegut – “And so it goes”