Dotfile Maintenance

Here is a party I’m a bit late in joining. And it is one of those ideas that makes you smack your head wondering why you didn’t think of it.

Config file tweaks

For years, I’ve maintained a tar.Z bundle that contains a .profile, .cshrc, .bashrc, .vimrc etc. You can tell how old it is by the fact that I still maintain it with compress rather than gzip. When I started it, gzip did not exist. And even after it did, I could not be sure that a system would have it installed. There are a few zillion copies of it lying around on various drives and removable media. Needless to say, each one is slightly different and it may be hard to figure out which is the “latest”.

But with the advent of GitHub, why not use that to store and version control those files?

GitHub has a nice roundup of various systems to do just that.

The idea is rather simple. The “actual” dotfiles live in a git repository that can be cloned and updated. What lives in your home directory is a symlink.

It can get fancier, with some of the frameworks allowing you to store the files as fragments that get built into a single unit. That allows you to group all the settings for e.g. fzf in one place rather than having to go hunt them down in all the various places they might reside (.vimrc, .bash_profile, etc).

Or, if you don’t like all the symlinking, you can make your entire home directory the repository.

Note that you might still need to render the repository down to a tarball to get it onto a system if the company admin does not allow connections to GitHub – a not unusual security stance to take, mostly to keep things from moving from the system to GitHub.

Getting Started

For something simple to get your feet wet, I would suggest Jeff Coffler’s skeleton.

Jeff has organized things by system (*nix/Win/Mac) and then by subsystem (bash, git, vim), though that second level can actually be organized any way you want.

Fork his repository into one of your own and clone it to your system. Copy your original dotfiles into the working directory (remember to rename them to remove the dot) and run bootstrap.sh. Commit; push; you’re done.

One particular file to watch out for is nix/git/gitconfig. It has Jeff’s name and email and some other stuff that you will probably not want. So, be sure to copy your own config over the top of it.

Dot directories

Edit: This fix has been added to the upstream version now.

There is a minor flaw in the bootstrap.sh script that makes it play rough if you have a dot directory (e.g. .vim) that is a symlink. The script will delete the symlink as “stale” since it assumes all such dot things should be files. You can grab the version from my dotfiles as a fix. I will be working to get Jeff to make the change in his copy as well.

Vim packages as submodules

This is a good tutorial on how to manage your vim packages as submodules of your dotfiles repo. I really don’t have anything to add. I am using this method to handle NERDtree and lightline.


Dairying in Mongolia

This will be a bit off from the normal fodder for this blog, but I thought it was interesting.

Let’s start with a bit of biology.

Babies naturally produce an enzyme called lactase that allows them to digest the main sugar component of milk – lactose. Many (globally it would be most) people lose that ability near puberty. This leads to lactose intolerance and the GI issues it can cause. Many people of European heritage, however, are lactase persistent and maintain the ability to digest lactose – to the delight of the dairy industry.

The standard story in archaeology about dairy husbandry is that lactase persistence will naturally follow along the path of the spread of dairy farming. After all, it would be a great advantage to be able to consume the additional calories and proteins – as an adult – that milk and other dairy products represent. So those that are lactase persistent would be better able to spread their genes and the trait would come to dominate the population.

And when we look at the spread of dairy animals in Africa and the Levant, that is pretty much what we see.

And then, there are the Mongols.

According to this article from HeritageDaily, the Mongols have been drinking milk as adults for 3,000 years without gaining a majority of lactase persistence in the population.

So, the question is, why?

Have there been continuous influxes of people that are not lactase persistent – essentially swamping any shifts in allele frequency? Have they come up with some other genetic variant that mutes the normal response to lactose?

Hopefully somebody will look into that.

Musical Sensor

Vibrating cantilevers have a long history of being used as sensors, but almost always in the micro-domain. The cantilever is frequently etched into silicon or another substrate. The weight of even a single molecule can be measured or detected.

The idea is fairly simple. Change the distribution of weight on the cantilever, and the frequency of vibration will change.
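
For a rough sense of why this works (this is just the textbook mass-on-a-spring model, not anything taken from the paper): the resonant frequency of a cantilever is approximately

f \approx \frac{1}{2\pi}\sqrt{\frac{k}{m_{\mathrm{eff}}}}

where k is the stiffness and m_eff is the effective vibrating mass. Adding mass lowers the frequency, and for a small added mass the shift is roughly \Delta f \approx -\tfrac{f}{2}\,\frac{\Delta m}{m_{\mathrm{eff}}}, which is the change these sensors read out.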

It can even be used as a motion detector since acceleration in the same plane as the natural vibration will either start the cantilever vibrating or will change the frequency of the vibration.

A research group at the University of California, Riverside was attempting to find a cheap, easy way to detect counterfeit or adulterated medicines.

Adulterated medicines will almost always have a different density from the “real” product. So, if a fixed volume is tested, then the weight will be different. Cantilever-based sensors can be very sensitive to such changes.

But what could be used to create the cantilevers?

The mbira is a musical instrument that uses cantilevers to produce tones (similar to a music box). This became the inspiration for the paper Musical Instruments as Sensors.

Of course, there is also the need to capture the frequency and compare to a standard. Since the mbira produces tones in the audible range, why not use the recording capabilities of a smartphone?

The researchers created a site which allows a user to upload recordings and have them analyzed. According to the paper, the analysis software is written (at least in part) in Python.

Bibliography

Another writeup

Github for the Grover Lab

Spirit X3 – Separate Lexer Part II

Last time, we looked at the lexer and supporting stuff. This time, we will look at the primitive parser and final usage.

The full code is in GitHub.

tok

The tok parser is quite simple. Give it the TokenType to look for and it returns true if that is indeed the next token in the stream.

This is the beauty of the separate lexer. The lexer is responsible for the hard work of classifying the characters and splitting them up into logical chunks. Parsing can then concentrate on a higher level of syntactic analysis – how those chunks are organized.

There is one subtlety in tok’s code. We must be sure to consume the token (advance the iterator) if and only if we actually match. Since we are only assuming a ForwardIterator, we can’t depend on a decrement operation in order to “unconsume”. So we make sure the increment is only done in the true leg of the match logic.

There is a bit of a flaw in the interface of tok. Currently it makes an undocumented assumption that the iterator’s value type has an istype operation. Once Concepts are standardized (hopefully in C++20), we will be able to document this. As it is, passing, say, a char iterator would cause a very hard-to-debug instantiation error.
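
Given the TokenType and token types from the lexer (covered in Part I), a minimal version of such a primitive parser might look like this. This is my own sketch, not the exact code from the repository; in particular, the lexeme member name and the has_attribute flag are assumptions.

#include <boost/spirit/home/x3.hpp>
#include <string>

namespace x3 = boost::spirit::x3;

// A minimal primitive parser: match one token of a given TokenType (sketch).
struct tok_parser : x3::parser<tok_parser>
{
    using attribute_type = std::string;
    static bool const has_attribute = true;

    explicit tok_parser(TokenType tt) : m_tt(tt) {}

    template <typename Iterator, typename Context,
              typename RContext, typename Attribute>
    bool parse(Iterator& first, Iterator const& last,
               Context const&, RContext&, Attribute& attr) const
    {
        // assumes *first has an istype() member and a lexeme string
        if (first == last || !first->istype(m_tt))
            return false;
        x3::traits::move_to(first->lexeme, attr);
        ++first;                       // consume only on a successful match
        return true;
    }

    TokenType m_tt;
};

inline tok_parser tok(TokenType tt) { return tok_parser{tt}; }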

Finally, we define two specializations for operator>>. These will allow us to simplify the parser expressions we write. If we were doing this for real, we would also specialize operator| at the very least.
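
Something along these lines would do it (a rough sketch building on the tok_parser above; the real versions live in parser.hpp in the repository):

#include <type_traits>

// Allow a bare TokenType on either side of >> by wrapping it in tok() (sketch).
template <typename P,
          typename = std::enable_if_t<std::is_base_of<x3::parser_base, P>::value>>
auto operator>>(P const& p, TokenType t) { return p >> tok(t); }

template <typename P,
          typename = std::enable_if_t<std::is_base_of<x3::parser_base, P>::value>>
auto operator>>(TokenType t, P const& p) { return tok(t) >> p; }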

Main

There should be no surprises here. The grammar is straightforward. Those helper specializations come in handy, letting us string together token types rather than having to explicitly wrap everything in a tok().

auto vardef = tok(tokVar) >> tokIdent >> tokSemi;

rather than

auto vardef = tok(tokVar) >> tok(tokIdent) >> tok(tokSemi);

And we’re done.

But we can do a bit better.

Attributes

The way tok() is currently specified, it exposes a string as its attribute. So the synthesized attribute for, e.g., vardef would be something like an array of strings. But this is less than desirable. Normally, we would not be concerned about the attribute of, say, the var keyword. If we DID want to know, we could capture the difference as separate rules.

There will certainly be exceptions. For instance, if we had two keywords for a type (say int, and float), we would definitely care which of these we parsed, but not want to have a separate rule for each.

The solution is to define two different parsers – one which returns an attribute (the string) and one which does not. And in v2/parser.hpp, that is what has been done.

tok_sym

This is the version that returns the string. It is a renamed copy of the original tok.

tok_kw

This is the new one that does not return an attribute. There are only two changes from the original.

The attribute is specified as unused:

using attribute_type = x3::unused_type;

and the assignment to attr has been removed.
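
In terms of the earlier tok sketch, the whole of tok_kw’s parse member collapses to something like this (again a sketch, not the repository code):

template <typename Iterator, typename Context,
          typename RContext, typename Attribute>
bool parse(Iterator& first, Iterator const& last,
           Context const&, RContext&, Attribute&) const
{
    if (first == last || !first->istype(m_tt))
        return false;
    ++first;      // consume the token, but synthesize nothing
    return true;
}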

Now the v2/main can say…

auto vardef = tok_kw(tokVar) >> tokIdent >> tok_kw(tokSemi);

And now, the synthesized attribute for vardef is a simple string.

Other Improvements

We could make other improvements:

  • We could get keywords to automatically use tok_kw by using two different enum classes.
  • We could provide a third variety that returns the TokenType as the attribute (to solve the int/float problem).

I’ll leave those as exercises for the reader.

Spirit X3 – Separate Lexer Part I

Back in this post, I said this about Spirit:

…it would be very feasible to write a lexical analyzer that makes a token object stream available via a ForwardIterator and write your grammar rules based on that.

But is it? Really?

The short answer is – yes, it’s feasible, but probably not a good idea.

The long answer is the journey we’ll take over the next two posts.

Overall Design

The first part will be a stand-alone lexer (named – of course – Lexer) that will take a pair of iterators to a character stream and turn it into a stream of Tokens. The token stream will be available through an iterator interface. We’ll look at it in more detail in a moment.

The Spirit framework can be thought of as having 4 categories of classes/objects:

  • rules
  • combinators (“|”, “>>”, etc)
  • directives (lexeme and friends)
  • primitive parsers (char_, etc)

Only the primitive parsers truly care about the type you get when dereferencing the iterators. Unfortunately, they really care (and rightly so). So, that means we will need to write replacements for them. Fortunately, we do not have to replace any of the rule or combinator infrastructure or this would be undoable – even on a dare.

To recap – we will be writing the following classes:

  • Lexer – the tokenizer
  • Token – the class that represents the lexed tokens.
  • tok – a primitive Token parser.

We will look at Token and Lexer in this post and tok in the next.

All code can be found in the GitHub repository.

Token Class

Looking at lexer.hpp, the first thing we see is the enum TokenType. No surprises here except possibly the fact that we need a special tokEOF to signal the end of the input. This will also act as the marker for the end iterator.

struct token is also fairly simple. It will hold the TokenType, iterators to where in the input it was found, and the actual lexeme. The lexeme won’t be of much use except in the case of tokIdent.

I intentionally made these small so that we could pass tokens around by value most of the time.  The embedded iterators are not really necessary for this project, but would be if this were fleshed out more with good parse error reporting.

The most important things are the istype member function and mkend(). istype() will be what the parser uses to decide if there is a match. mkend() is a static helper to generate an EOF token.
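
Putting that together, the token type looks roughly like this (a sketch: the member names and the parenthesis token identifiers are my own, not necessarily those in the repository):

#include <string>

// Token kinds; tokEOF doubles as the end-of-stream marker (sketch).
enum TokenType { tokVar, tokIdent, tokSemi, tokLParen, tokRParen, tokEOF };

struct token
{
    TokenType                   type;
    std::string::const_iterator start;   // where in the input it was found
    std::string::const_iterator end;
    std::string                 lexeme;  // mostly interesting for tokIdent

    bool istype(TokenType t) const { return type == t; }

    static token mkend() { return token{tokEOF, {}, {}, ""}; }  // the special EOF token
};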

Lexer Class

Let’s start off in the header file – lexer.hpp.

To keep this simple, I decided to hardcode the fact that we are using std::string::const_iterators as input.

The lexer class itself is simply a shell. It holds on to the input iterators and uses them to create its own iterators as requested. begin() and end() are the only reason the outer class exists.
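
In outline, the shell looks something like this (a sketch with illustrative member names):

class Lexer
{
public:
    class iterator;   // the forward iterator described below

    Lexer(std::string::const_iterator first, std::string::const_iterator last)
        : m_first(first), m_last(last) {}

    iterator begin() const;   // an iterator positioned on the first token
    iterator end() const;     // an iterator holding the tokEOF marker

private:
    std::string::const_iterator m_first;
    std::string::const_iterator m_last;
};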

Lexer::iterator

Let’s look at this in some detail.

using self_type = iterator;
using value_type = token;
using reference = value_type &;
using pointer = value_type *;
using iterator_category = std::forward_iterator_tag;
using difference_type = int;

These types are required to allow our iterator to play nice with STL algorithms. The STL templates consult these typedefs to know what types to instantiate for temporary values, etc. We could use these to make the lexer class hyper-general and match any value type for which operator== is defined.

But let’s not.

self_type operator++();
self_type operator++(int junk);
reference operator*();
pointer operator->();
bool operator==(const self_type& rhs) const { return m_curr_tok == rhs.m_curr_tok; };
bool operator!=(const self_type& rhs) const { return !(m_curr_tok == rhs.m_curr_tok); };

These are the operators that are needed to make it a ForwardIterator – increment, dereference, and equality.

Note that, in general, you will also want to supply a const_iterator as well. The only difference would be that operator* and operator-> would return const versions.
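
For completeness, here is roughly what those definitions amount to (a sketch; it assumes the iterator caches the current token in the m_curr_tok member seen in the equality operators above and refills it with the get_next_token helper we will meet shortly):

Lexer::iterator::self_type Lexer::iterator::operator++()
{
    m_curr_tok = get_next_token();   // advance by lexing the next token
    return *this;
}

Lexer::iterator::self_type Lexer::iterator::operator++(int)
{
    self_type tmp = *this;           // post-increment: hand back the old position
    ++(*this);
    return tmp;
}

Lexer::iterator::reference Lexer::iterator::operator*()  { return m_curr_tok; }
Lexer::iterator::pointer   Lexer::iterator::operator->() { return &m_curr_tok; }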

Now let’s head over to the implementation – lexer.cpp.

skip_space

This is a utility that – as the name on the box says – skips spaces. It also helpfully returns an indication if the end of input was reached. In an effort to be somewhat standard, isspace is used to decide whether a character needs to be skipped.

get_next_token

Here is the heart of the lexer. get_next_token returns by value the next token that it can get out of the input, or returns tokEOF if it reaches the end of input or cannot make a valid token out of the current position.

After skipping spaces, it checks to see if the current character is a “punctuation” token – in this case a semicolon or a parenthesis.

If not, it gathers up the next batch of consecutive alphanumeric characters and checks to see if they are a keyword. If not, it brands it an identifier.
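
Stitched together, the whole function comes out something like this (a sketch in terms of the token type sketched earlier; the member names m_curr and m_last, and the "var" keyword check, are illustrative rather than taken from the repository):

#include <cctype>

token Lexer::iterator::get_next_token()
{
    // skip_space is assumed to return true when it runs off the end of input
    if (skip_space(m_curr, m_last))
        return token::mkend();

    auto start = m_curr;

    // single-character "punctuation" tokens
    switch (*m_curr)
    {
    case ';': ++m_curr; return token{tokSemi,   start, m_curr, ";"};
    case '(': ++m_curr; return token{tokLParen, start, m_curr, "("};
    case ')': ++m_curr; return token{tokRParen, start, m_curr, ")"};
    }

    // gather a run of alphanumeric characters
    std::string word;
    while (m_curr != m_last && std::isalnum(static_cast<unsigned char>(*m_curr)))
        word += *m_curr++;

    if (word.empty())
        return token::mkend();           // nothing recognizable left

    // keyword check; otherwise it is an identifier
    TokenType type = (word == "var") ? tokVar : tokIdent;
    return token{type, start, m_curr, word};
}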

And that’s about it for the lexer.

Next time, we’ll look at the parsing primitive and put it all together.

Static Exceptions

Dynamic Exceptions have their flaws. Herb Sutter has proposed a replacement known as Static Exceptions. Let’s look at it a bit.

Before we do, we need to look at the C++11 feature std::error_code.

std::error_code and Friends

Anyone who has done any coding in C knows about good old errno, the global int that many system functions will set to signal a problem.  This, of course, has many problems, not the least of which is that different platforms could and did use different integer values to represent the same error.

To bring some order to the chaos, std::error_code was added along with its friend std::error_category.

An error code is actually two numbers – an integer saying which exact error and a “category” or domain for that error. Thus the math related errors and the filesystem errors could have the same integer value, but different domains. A domain or category is nothing but a pointer to a singleton object.

For a bit more, go look at the cplusplus.com writeup as well as a tutorial on creating your own error codes from the folks behind Outcome.  And here is another writeup on a use of custom error codes.
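
To make the “integer plus category” idea concrete, here is a minimal custom error domain along the lines of those tutorials (a sketch; the parse_errc enum and all the names are mine, not taken from the linked articles):

#include <string>
#include <system_error>

enum class parse_errc { bad_digit = 1, empty_input };

// The singleton "domain" object mentioned above.
struct parse_category_t : std::error_category
{
    const char* name() const noexcept override { return "parse"; }
    std::string message(int ev) const override
    {
        switch (static_cast<parse_errc>(ev))
        {
        case parse_errc::bad_digit:   return "unexpected non-digit character";
        case parse_errc::empty_input: return "empty input";
        }
        return "unknown parse error";
    }
};

inline const std::error_category& parse_category()
{
    static parse_category_t instance;
    return instance;
}

inline std::error_code make_error_code(parse_errc e)
{
    return {static_cast<int>(e), parse_category()};
}

// Let std::error_code be constructed directly from parse_errc.
namespace std { template <> struct is_error_code_enum<parse_errc> : true_type {}; }

// Usage: std::error_code ec = parse_errc::empty_input;
//        ec.message() -> "empty input", ec.category().name() -> "parse"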

For our purposes, std::error_code has four really nice properties:

  • It is small – the size of two pointers. It could in theory be passed around in CPU registers.
  • Creating one cannot possibly throw an exception.
  • Copying and/or moving can be done with a memcpy or just two memory read/writes.
  • It does not require any RTTI – no dynamic casting is required – only (possibly) a static_cast between integer types.

Dynamic Exceptions considered harmful

Sutter does a much better job than I can of enumerating the problems with the current exception system. So go read the paper.

And error-return schemes such as Expected or Outcome aren’t much better.

Static Exceptions

Sutter’s proposal is to do something like the following.

Introduce a new keyword throws.

If you define a function as:

T my_function() throws;

Then, behind the scenes, the compiler will act as if the function was defined as:

variant<T, std::error_code> my_function();

In the body of the function anything that looks like:

throw e;

gets translated to a simple

return e;

And at the call site

try {
    x = my_function();
} catch (e) {
    /* try to recover */
}

Will get translated into something like:

x = my_function();
if (compiler_magic::is_error(x)) {
     /* try to recover */
}

This eliminates the hand-rolled “if checks” that have to be written to use something like Outcome. And it propagates. If you don’t handle the error at the call site, there will still be a check, but it will simply return the error to move the exception outward.
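
You can get a feel for the lowering by hand-rolling it in today’s C++ (this is only an emulation of the idea to show the shape of the generated code, not the proposal’s actual machinery or syntax):

#include <string_view>
#include <system_error>
#include <variant>

// What "int parse_digit(std::string_view) throws;" would boil down to.
std::variant<int, std::error_code> parse_digit(std::string_view s)
{
    if (s.empty())
        return std::make_error_code(std::errc::invalid_argument);  // plays the role of "throw e;"
    if (s[0] < '0' || s[0] > '9')
        return std::make_error_code(std::errc::invalid_argument);
    return s[0] - '0';                                             // ordinary return
}

int main()
{
    auto r = parse_digit("7");

    // The compiler-generated check that the try/catch would become.
    if (std::holds_alternative<std::error_code>(r))
    {
        /* try to recover */
        return 1;
    }

    int x = std::get<int>(r);
    return x == 7 ? 0 : 1;
}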

The paper is filled with more details about the interplay between the proposed mechanism and the current exception system, noexcept, and other details the language lawyers need to care about.

Onyx

I have decided to make this the standard exception-handling mechanism in Onyx. There are details to be worked out. In particular, in the early stages, I will literally have to rewrite the return types in order to “reduce” Onyx to C++.

But it will be fun to try out.