First an apology to my readers: this "weekly blog" turned monthly ever since I started writing up this paper, that one, a resubmit and my PhD in general. Add to this Easter holidays and incessant networking now that I am back in Europe - you get the picture. More: I am afraid this state of affairs is likely to continue for a while. But one particular project reached a milestone worth reporting: sharing my dataset on religion and politics in Uttar Pradesh - under an open license.
Ever since my name-matching and mapping efforts bore fruit, I have wanted to share the resulting massive dataset with the interested research public. Partly because I received significant public funding for this (by virtue of being an Associate in Oxford), and believe that data from publicly funded research should be publicly available. Partly because I know that I would likely never exhaust the potential of these data, given my own (diverging) research interests. At a recent workshop in London, I finally spoke with a number of young colleagues who had done similar mindless copy-paste jobs with Indian Election Commission data that I had already spent many weeks on - and it just seemed such a waste of energy that we all did the same thing in parallel, energy which should go into better analyses instead. For a brief while, I then considered offering the dataset to the Centre for the Study of Developing Societies in Delhi as an institutional home - but they have fairly restrictive access policies and charge for their data (this is partly because they - unlike me - have less generous public funding, so my decision for open data should not be misconstrued as a head-on criticism of their approach). For my own data, I wanted to find a truly open home.
Over the coming weeks, I will thus make an innovative and unique (methinks at least...) micro-level dataset on religion and politics in Uttar Pradesh available under an open license. The archive's architecture is online already, as are GIS shapefiles at polling station, assembly and parliamentary constituency level. The first actual datafiles will follow soon, pending final error checks (this is the hard part of sharing data, particularly outside loose academic collaborations: to make it usable and comprehensible for "outsiders").
A key decision was to put this dataset under an open license (kindly developed by the Open Knowledge Foundation): you are allowed to use and redistribute this data, but on the condition that you a) attribute properly (which should be good academic practice anyhow) and b) share alike. And that's the catch: if you combine this dataset with your own variables, you need to make those publicly available on comparable terms, too. This Trojan-horse approach has already worked in other knowledge domains (not least in creating most of the open-source software I use for my daily work), and I hope it inspires others to be more forthcoming with data-sharing (once a critical mass of relevant data are not merely in the public domain, but under similar open licenses, it will become hard to keep data locked up somewhere, even for those who still prefer this approach). After a sensible embargo period (my definition: until a first publication is in print, so that one can publish from one's own data first, which seems a reasonable prerogative), and as long as one makes caveats transparent (and data thus not only available, but usable), I can't see any good reason to limit access to data. Anyway, I decided to try it - and am curious to see how it works...