Lab

20.10.2016 ― Fame by name

Towards eRum

Background

Little is known about infographics with R. They tend to be something of a curse word in the academic approach to statistics: too close to the masses and too popular with them, while taking things less seriously. They put reality into abstract settings, maybe even executed from the design point of view.

But then again, the motivation and goal for such grotesque output might also be different: just making a point could be enough. Weird shapes and distorted forms of data could be used not to replace standard graphics, but to offer another way to see the same stuff. Sometimes it is easier to understand the key point when it is clearly highlighted with a good visualization. Actually, it could graphically scream THIS IS MY MESSAGE right in front of you. An example shall follow.

About R package names

How do people name their children in general? Did your name travel all the way from your ancestors to the present day, or were you born by chance at the peak of the LOTR hype? Maybe the name variety in your country has already settled down to a couple of options? [1] Maybe your parents wanted to test the humor of the local DBA in the social security system and named you 'Drop Table'? There are many ways to end up with the final choice.

But when it comes to actual R packages, there are many reasons for selecting a name, such as naming after the creators or honoring the content. Sometimes it's a straightforward abbreviation, a very descriptive content word or just an R adaptation from The Da Vinci Code [2]. As long as it is something an R user (with reasonably sized memory capacity) is able to remember, the choice just might be good enough. The formal standard is light: the Writing R Extensions manual only asks for ASCII letters, numbers and dots, at least two characters, a letter at the start and no dot at the end, which explains why I haven't seen any name starting with a number or a special character. I guess the name should be given artistic liberties, as variation makes random library browsing a very interesting hobby.
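The naming rule from the Writing R Extensions manual (ASCII letters, numbers and dots, at least two characters, starting with a letter, not ending in a dot) can be sketched as a small check. The function name `is_valid_pkg_name` and the candidate names are made up for illustration; this is only a rough sketch, not how CRAN itself validates submissions.

```r
# Rule from the "Writing R Extensions" manual:
# ASCII letters, digits and dots only; at least two characters;
# starts with a letter; does not end in a dot.
# (is_valid_pkg_name is a hypothetical helper, not part of any package.)
is_valid_pkg_name <- function(x) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", x, perl = TRUE)
}

# Hypothetical candidates, for illustration only
candidates <- c("ggplot2", "data.table", "2fast", "_hidden", "r", "trailing.")
valid <- is_valid_pkg_name(candidates)
print(setNames(valid, candidates))
```

Only the first two candidates pass; the rest break the rule by their first character, their length or their trailing dot.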

About keyboards

QWERTY is not the only option for a keyboard. I had never thought about it, but realized this while reading Donald A. Norman's [3] legendary book, The Design of Everyday Things, which was written decades ago. Despite the time interval, and turning the focus from my ignorance to the topic: does Dvorak sound familiar? Legend has it that this keyboard layout should be faster and more ergonomic than QWERTY. Even if in theory it's the better solution, in real life this challenger never replaced the QWERTY keyboard. As Norman puts it later in the book, redesigning an old idea and selling it in a new format is much harder than bringing a totally new concept to the market. Sad, since development might not happen if things are stuck at version 1.0 by general vote.

Since the infographic was supposed to target the masses, the execution should be done with the market leader's version. Thus the output device of R will be turned into a QWERTY keyboard.
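One way to treat a keyboard as a plotting surface is to store each letter key with a grid position. The sketch below covers only the three letter rows; the half-key offset per row is an assumption made to mimic the stagger of a physical keyboard, and the object names are invented for this example.

```r
# Letter rows of a standard QWERTY layout
qwerty_rows <- c("qwertyuiop", "asdfghjkl", "zxcvbnm")

# One data frame row per key: the letter, its keyboard row and its column.
keys <- do.call(rbind, lapply(seq_along(qwerty_rows), function(i) {
  letters_in_row <- strsplit(qwerty_rows[i], "")[[1]]
  data.frame(
    key = letters_in_row,
    row = i,
    # shift each lower row half a key to the right (assumed offset)
    col = seq_along(letters_in_row) + (i - 1) * 0.5,
    stringsAsFactors = FALSE
  )
}))

print(head(keys))
```

With coordinates like these, each key could be drawn as a rectangle with base graphics (`rect()` and `text()`) and filled with a color that encodes its value.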

Execution

Keyboards are, by default, usually sold in lame colors. Therefore, the version to be published with the eRum slides should be colorful and celebrate the results. I've found many uses for the RColorBrewer package: not just taking advantage of the pre-defined color palettes, but making your own is easy and neat. Just pick a few shades and the functions will fill in the rest of the shades between them. Worth trying every time. Yet, when scaling the colors by equal intervals, it might be smart to pick colors that won't distort the values. The palette must be reliable across its levels.
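As a minimal sketch of the "pick a few shades and interpolate" idea: base R's `grDevices::colorRampPalette()` builds the in-between shades, and `cut()` assigns values to equal-width levels. The two hex colors and the sample values are invented here; the end shades could just as well come from `RColorBrewer::brewer.pal()`. Whether the original graphic was built exactly this way is an assumption.

```r
# Two hand-picked end shades (hypothetical choice, light to dark)
shades <- c("#FFF7BC", "#D95F0E")

# colorRampPalette() returns a function that interpolates between them
make_palette <- grDevices::colorRampPalette(shades)
palette7 <- make_palette(7)
print(palette7)

# Map values to palette levels using 7 equal-width intervals
values <- c(0.5, 3, 12, 7.2, 9.9)   # hypothetical occurrence rates (%)
level <- cut(values, breaks = 7, labels = FALSE)
key_colors <- palette7[level]
print(data.frame(values, level, key_colors))
```

A single-hue ramp from light to dark keeps the ordering of the levels readable, which is the "reliable palette" concern mentioned above.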

The available R packages can be found here [4]. The idea of the execution was to check whether package names are distributed the same way as normal names/words. Thus finding a semi-random electronic dictionary with 300k+ words looked more than OK as the base for the comparison data. After excluding from the dictionary the words starting with R-illegal characters and creating an infographic with two keyboards, the results looked as above [5]. The darker the key's background color, the higher the occurrence rate of that letter in the data.
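The comparison can be sketched as follows, assuming it is made on initial letters (as the conclusion suggests). The helper `initial_share` and both sample vectors are hypothetical; the real data would be the full package list and the full dictionary.

```r
# Share of each initial letter (a-z) among a vector of names, in percent.
# (initial_share is a hypothetical helper written for this sketch.)
initial_share <- function(x) {
  initials <- tolower(substr(x, 1, 1))
  # drop entries that do not start with an ASCII letter
  initials <- initials[initials %in% letters]
  counts <- table(factor(initials, levels = letters))
  round(100 * counts / sum(counts), 1)
}

# Tiny invented samples standing in for the real data
pkg_names  <- c("Rcpp", "rmarkdown", "raster", "ggplot2", "shiny", "zoo")
dict_words <- c("apple", "river", "stone", "sun", "'tis", "3D", "tree", "echo")

comparison <- data.frame(
  letter     = letters,
  packages   = as.numeric(initial_share(pkg_names)),
  dictionary = as.numeric(initial_share(dict_words))
)

# Show only the letters that occur in at least one sample
print(comparison[comparison$packages > 0 | comparison$dictionary > 0, ])
```

The two share columns are what a color scale would then encode, one keyboard per column.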

The conclusion is that R packages tend to be named with the initial R quite often. Which is a bit trivial: they are R packages, after all, so they already have the R in the content and as the language! Yay! It could be a tactical move to name a package with an initial other than R, as things tend to melt into the mass. If anyone uses the UI for installing and loading, a package located at the beginning or at the end of the alphabet is easy to find. Or, with a rather rare initial letter, an internet search might find a unique name more easily than one of ten package doppelgängers.

End result


Do use R, please.
But maybe less with naming?

Giving just an idea here...