First, an apology to my readers: this "weekly blog" has turned monthly ever since I started writing up this paper, that one, a resubmission and my PhD in general. Add to this the Easter holidays and incessant networking now that I am back in Europe, and you get the picture. What is more, I am afraid this state of affairs is likely to continue for a while. But one particular project has reached a milestone worth reporting: I am sharing my dataset on religion and politics in Uttar Pradesh under an open license.

Once you have observed something or interviewed somebody, how do you get from this pile of data to a compelling research report? Today's post sums up the gist of a lecture I recently gave on this question as part of my field methods teaching at SIT Study Abroad. It looks at four stages in turn: explorative analysis, draft writing, confirmative analysis and editing. These stages are, of course, less a sequence than a cycle, through which you go back and forth until you a) arrive at a decent report or b) simply run out of time. That is the chief reason why it is advisable to start the cycle as soon as possible, rather than letting the pile of data grow so scarily tall in the first place...

Always, always, always look closely at raw data before doing any statistics! This was the most important lesson my statistics teacher tried to impress upon me back in my undergraduate training. Funny things can go wrong when handling large datasets, so switch on your common sense and compare input with output, or so he said. He has just been proven right once more. I spent two weeks paying for my negligence, and the following three blog posts had to be corrected:

Mapping Lucknow: party strongholds
Mapping Lucknow: Muslim life
Residential segregation

What happened? Two weeks ago, I decided to wrap up my work with the electoral rolls, which had kept me occupied for many weeks. While copying all the files into a common folder to clean up the mess on my pendrive, I noticed an odd irregularity in the polling station names. I looked closer. And it all blew up.

In order to create the maps and statistics mentioned above, I had to integrate datasets from four different years: election results from 2007, 2009 and 2012, polling station localities from 2009, and electoral rolls revised in 2011. I knew that 2007 would be tricky, since constituency boundaries were redrawn in the 2008 delimitation exercise. I did not expect 2009, 2011 and 2012 to be a problem, though. Consequently, I simply merged these datasets on the unique polling booth ID assigned by the Election Commission. Silly me.
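In hindsight, a cheap check would have caught this: before trusting a merge on booth IDs, compare a second field that ought to agree across years, such as the polling station name. The following is a minimal Python sketch under assumed conditions: each year's data has been reduced to a dictionary mapping booth ID to station name. The function names, the ID format and the toy values are all hypothetical and do not reflect the Election Commission's actual file layout.

```python
from typing import Dict, List, Tuple


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting noise is ignored."""
    return " ".join(name.lower().split())


def check_booth_join(
    year_a: Dict[str, str],
    year_b: Dict[str, str],
) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Compare two {booth_id: station_name} tables before merging them.

    Returns booth IDs found only in year_a, booth IDs found only in year_b,
    and (id, name_a, name_b) triples where the ID exists in both years
    but the station names disagree, a hint that IDs may have been reassigned.
    """
    only_a = sorted(year_a.keys() - year_b.keys())
    only_b = sorted(year_b.keys() - year_a.keys())
    mismatched = [
        (booth_id, year_a[booth_id], year_b[booth_id])
        for booth_id in sorted(year_a.keys() & year_b.keys())
        if normalise(year_a[booth_id]) != normalise(year_b[booth_id])
    ]
    return only_a, only_b, mismatched


# Toy data, invented for illustration only (hypothetical IDs and names).
rolls_2011 = {
    "170-001": "Primary School Ward 1",
    "170-002": "Girls Inter College",
    "170-003": "Municipal Office",
}
results_2012 = {
    "170-001": "primary school  ward 1",  # same booth, different formatting
    "170-002": "Municipal Office",         # same ID, different station
    "170-004": "Community Hall",
}

only_a, only_b, mismatched = check_booth_join(rolls_2011, results_2012)
for booth_id, name_a, name_b in mismatched:
    print(f"{booth_id}: {name_a!r} (2011) vs {name_b!r} (2012)")
```

A name mismatch does not by itself prove that an ID was reassigned, since station names also get respelled between lists. But a long list of mismatches is exactly the kind of warning worth reading before running any statistics on the merged data.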