Almost half a year after my book on Muslim peace activists in Gujarat hit the shelves, a first review appeared. This is what Moinuddin Ahmad had to say in The Statesman, one of India's oldest English-language newspapers, read particularly in West Bengal:

This book is a rare collection of brave accounts of the people, asserting different identities, marked by the author. It may inspire people in various capacity to engage with the society and work for peace. The author tries to elucidate a thin line between the "religious" and the "secular" in many aspects of daily life, and how it gets blurred when people with different confessions work for same cause, as was done in Gujarat. This work by Raphael Susewind definitely opens a window through which Muslim society and peace-builders can be seen with a nuanced perspective.

I find it quite interesting to see how he focusses on my blurring the line between the faith-based and the secular, a way in which I haven't thought about my work so far (perhaps because that line for me was always kind of obviously fake), but which makes obvious sense, particularly in the context of the Indian debate. Read the whole review online...

A few weeks ago, I wrote about additional estimates for the accuracy of my namematching algorithm, and also commented once more on the test corpus of names used to establish these. In the meantime, Gilles let me access his dataset of the social profiles of all MLAs in Uttar Pradesh since independence in full, i.e. including a manually coded variable for religion (thanks again!). Using his data, I was able to alleviate some potential biases in my original test corpus (particularly in terms of non-Muslim names), making my accuracy estimates more robust still.

The new test corpus consists of three raw name lists: the Haj Qurrah for 2012 (which by law only includes Muslim names, and should be fairly representative of those, as argued earlier), the undergraduate admissions list of Lucknow University under SC quota (which by law excludes Muslims, but has a bias towards lower economic strata of non-Muslims as well as towards the young), and finally the names of all MLAs since independence (both Muslim and non-Muslim, and arguably with a bias towards higher economic strata as well as older people). The first two lists provide names and fathers' names, the third has name and gender. In the overall corpus, the ratio of Muslims to non-Muslims is roughly 50:50 (since the Qurrah is fairly extensive); for the following figures, I weighted the corpus to reflect the religious demographic of UP (which does not affect sensitivity and specificity, but renders predictive values more meaningful).
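To see why weighting leaves sensitivity and specificity untouched but changes the predictive values, here is a minimal sketch in Python. The confusion counts are made up for illustration and are not my actual results, the function name `reweighted_metrics` is hypothetical, and the prevalence of roughly 19% is an assumed stand-in for the Muslim population share of UP.

```python
def reweighted_metrics(tp, fn, tn, fp, prevalence):
    """Accuracy metrics for a binary classifier (Muslim vs. non-Muslim).

    tp, fn, tn, fp are raw counts from a test corpus of any composition;
    prevalence is the share of Muslims in the target population.
    """
    # Sensitivity and specificity are computed within each true class,
    # so the ratio between the two classes in the corpus does not matter.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Predictive values depend on the class ratio; Bayes' rule lets us
    # evaluate them at the prevalence of the target population.
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Illustrative counts only (NOT the real corpus).
balanced = reweighted_metrics(tp=900, fn=100, tn=950, fp=50, prevalence=0.50)
weighted = reweighted_metrics(tp=900, fn=100, tn=950, fp=50, prevalence=0.19)

for label, result in (("50:50 corpus", balanced), ("weighted to UP", weighted)):
    print(label, {k: round(v, 3) for k, v in result.items()})
```

With these invented counts, sensitivity and specificity come out identical in both runs, while the positive predictive value drops and the negative predictive value rises once Muslims are treated as the minority they are in the wider population.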

It's almost a year since I began working on a namematching algorithm to approximate Muslim population share in Lucknow's mohallas by exploiting the religious connotations of names on the electoral rolls of these areas. This has worked out quite well, and since led to a number of follow-up analyses, several conference papers, new collaborations, an article under review, two more in the pipeline - and last but not least the publication of a large dataset on religion and politics in Uttar Pradesh (featured in my second-to-last post).

One thing kept worrying me, though: the scope of the algorithm varied quite a bit. Across UP's assembly constituencies, for instance, it sometimes managed to categorize 95% of the electorate - and sometimes only 70%. While accuracy for those names which were identified seemed alright, missing values of up to a third were worrisome. Overwhelmingly, however, they occurred because names in the electoral rolls were simply not covered by indiachildnames.com. There is little I could do about that, I thought.
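For readers who wonder what "scope" means in practice, here is a rough sketch of how such a coverage figure can be computed per constituency. This is not my actual code: the function name `coverage_by_constituency`, the shape of the input, and the tiny lookup table are all hypothetical stand-ins, and real matching involves more than an exact lookup of lower-cased names.

```python
from collections import defaultdict


def coverage_by_constituency(voters, lookup):
    """Share of voters per constituency whose name is found in the lookup.

    voters: iterable of (constituency, name) pairs (hypothetical format)
    lookup: dict mapping lower-cased names to a category (hypothetical)
    """
    total = defaultdict(int)
    matched = defaultdict(int)
    for constituency, name in voters:
        total[constituency] += 1
        # A name counts as covered only if the name list knows it at all;
        # everything else ends up as a missing value.
        if name.strip().lower() in lookup:
            matched[constituency] += 1
    return {c: matched[c] / total[c] for c in total}


# Toy data, invented purely for illustration.
toy_lookup = {"mohammad": "Muslim", "ramesh": "non-Muslim"}
toy_voters = [
    ("A", "Mohammad"),
    ("A", "Ramesh"),
    ("B", "Ramesh"),
    ("B", "Xyz"),  # not in the lookup, so it stays uncategorized
]
print(coverage_by_constituency(toy_voters, toy_lookup))
```

A constituency where many names are spelled in ways the name list does not know will show low coverage, even if the names that do match are classified accurately.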

Whenever I discuss my name-matching algorithm and derivative work, one question comes up: how well does it work outside UP, at other times, for other groups of people? And: what if your test corpus of names (Haj pilgrims and SC students) were not representative of names more widely (a concern particularly strong with the SC list)? Unfortunately, I have no hard and fast answer to these questions; they bother me, too.

But now, I have some fresh indicators at least - drawn from work-in-progress by Francesca Jensenius and a team around Christophe Jaffrelot spearheaded by Gilles Verniers. They try to look into social profiles of MLAs in India since independence - and as a prerequisite came up with a list of the names of all contestants in all elections in all states. Unlike the SC names, this corpus is arguably more elite, and it moves beyond UP, thus nicely complementing my own. On the downside, this list includes neither gender nor fathers' names, and first names are frequently abbreviated - much less material for my algorithm to work with. Most importantly, I only have the bare names from them, not the manual classification (which, as I understand, is still work in progress - once this is done, I could calculate actual sensitivity, specificity, PPV and NPV).

First an apology to my readers: this "weekly blog" turned monthly ever since I started writing up this paper, that one, a resubmit and my PhD in general. Add to this Easter holidays and incessant networking now that I am back in Europe - you get the picture. More: I am afraid this state of affairs is likely to continue for a while. But one particular project reached a milestone worth reporting: sharing my dataset on religion and politics in Uttar Pradesh - under an open license.