Thanks to all who offered their help in reply to my last post about visualizing typological data. Most of you agreed that the recent hype about visualization and infographics almost completely neglects qualitative small-n data in favour of quantitative large-n sets. There are various reasons for this state of affairs. For one, much of this hype is driven by the fact that large government statistics are being put in the public domain (or by the availability of equally large private statistics generated through web 2.0). Secondly, small-n data lends itself nicely to narrative writing, rendering visualization a less pressing requirement. Lastly, such qualitative data tends to be far more complex than statistics, and its complexity cannot easily be reduced through statistical generalization.

Reducing the complexity of small-n data is not impossible, however - and usually takes the form of typologies. The need to visualize these, and particularly to visualize them in an interactive way, arises from the fact that such typologies often suggest a rigidity which is never there in the data (as I wrote here). The reason for this deception is basically that the underlying cluster analyses - be they statistically aided or intuitive - always generate an x-fold typology if you ask them to do so, even if the dissimilarities between types are marginal in comparison to their similarities. The irreducibility of the original data behind typologies is therefore what I would love to visualize, to give readers of my upcoming book [1] a hands-on feeling for the flexibility of the typology of Muslim peace activists which I propose therein.

Now with your help, I figured out two visualization strategies that might work (though neither is perfect): Sankey flow diagrams and Node-Edge graphs. It is not accidental that both options rely on network-like data: typologies club certain people (or entities more generally) together because they share similarities with each other while being dissimilar from other clubs of people (or entities). These similarities and dissimilarities can easily be transformed into links between network nodes - some links between people are weaker, some stronger (which is why they are grouped together in an empirical typology). How do Sankey flow diagrams and Node-Edge graphs capture and visualize this?

First to the Sankeys. A nice web adaptation (and explanation), called Fineo, has been developed by the DensityDesign Lab at Politecnico di Milano, Italy. The visualization adorns the picture for this post (to your top right): basically, one can feed the application the categorical index which one usually creates in the first step towards any typology (rows being people or entities, columns being the various categories, and the fields themselves recording how each person fares in each category). If one adds one column for "Type" (the type each person is assigned to as an outcome of the typologizing process), one can visualize how the typology relates to any one or two given categories - and see how a thick stream might link one type with one particular value for this category, while other people of the same type link to other values (have a look at the picture if this was unclear). Fineo lets one select which categories to look at, and filter both the typology and the category if one is interested in some select values only. Neat!
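To make the idea of "streams" concrete, here is a minimal sketch of the counting that sits behind such a diagram. It is not Fineo's own code; the rows, the category name `category_x` and the helper `sankey_flows` are hypothetical and only illustrate how a categorical index with a "Type" column turns into stream widths.

```python
from collections import Counter

# Hypothetical categorical index: one row per person, one column per category,
# plus a "Type" column holding the type assigned by the typology.
index = [
    {"person": "A", "Type": "Type 1", "category_x": "high"},
    {"person": "B", "Type": "Type 1", "category_x": "high"},
    {"person": "C", "Type": "Type 1", "category_x": "low"},
    {"person": "D", "Type": "Type 2", "category_x": "low"},
]

def sankey_flows(rows, source="Type", target="category_x"):
    """Count how many rows link each source value to each target value.

    Each count corresponds to the width of one stream in the diagram.
    """
    return Counter((row[source], row[target]) for row in rows)

flows = sankey_flows(index)
for (src, dst), width in sorted(flows.items()):
    print(f"{src} -> {dst}: {width}")
```

In this toy index, Type 1 sends a thick stream to "high" and a thin one to "low", which is exactly the kind of within-type variation the diagram is meant to expose.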

A second visualization option would be a Node-Edge graph, the traditional representation of network data. Using Gephi and gexf-js, one can picture the relationship between each pair of persons (or entities) in one's typology as an edge: if the two entities so linked belong to the same type, the edge would be coloured to visually convey the typology. More importantly, however, the thickness of the edge can be made to depend on the dissimilarity statistics which one most likely generated from the categorical index in a second stage of typology extraction (for instance via Gower metrics [2]). This, too, makes for a nice visual which intuitively conveys the links within as well as between types.
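A rough sketch of how such an edge list could be derived is given below. It assumes purely categorical variables, for which the Gower dissimilarity reduces to the share of categories on which two people differ. The people, their values and the helper names are hypothetical. Turning the dissimilarity into a weight of one minus that value, so that thicker edges mean more similar people, is my own choice for illustration, not something Gephi prescribes.

```python
from itertools import combinations

# Hypothetical data: each person has an assigned type and three categorical values.
people = {
    "A": {"Type": "Type 1", "traits": ("high", "urban", "yes")},
    "B": {"Type": "Type 1", "traits": ("high", "urban", "no")},
    "C": {"Type": "Type 2", "traits": ("low", "rural", "no")},
}

def gower_categorical(x, y):
    """Gower dissimilarity for categorical variables only:
    the fraction of variables on which the two records differ."""
    mismatches = sum(a != b for a, b in zip(x, y))
    return mismatches / len(x)

def build_edges(people):
    """Return one edge per pair of people, weighted by similarity."""
    edges = []
    for (p, p_data), (q, q_data) in combinations(people.items(), 2):
        d = gower_categorical(p_data["traits"], q_data["traits"])
        edges.append({
            "source": p,
            "target": q,
            "weight": 1 - d,  # assumption: thicker edge = more similar
            "same_type": p_data["Type"] == q_data["Type"],  # drives edge colour
        })
    return edges

edges = build_edges(people)
for edge in edges:
    print(edge)
```

The resulting rows could then be written to a spreadsheet with source, target and weight columns and loaded into a network tool, with `same_type` deciding the edge colour.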

It is still too early to release the draft visualizations for the typology developed in my book to the general public, but drop me a line and I will share the link. I am glad to have found at least some solution to the dilemma of interactively presenting small-n typologies. At last!