Today, I follow up on my initial post on names ("What's in a name?"), which later inspired the map of Muslim Lucknow and my ongoing election analyses. The key idea back then was: if micro-level datasets on religion are unavailable, can we not create our own by making informed guesses about the religion of registered voters, whose lists are readily available? This methodology and its surprisingly high accuracy have created quite some excitement over the past months, and a "research note" on it is on its way to publication (here). It thus seems about time to clarify the limits of this strategy: what is not in a name?

One thing that is not (or at least not clearly enough) is sectarian affiliation. Quite a few people who got excited about my earlier posts asked whether the same strategy would also work to separate Shia from Sunni based on their names. This would open up interesting analyses, in the case of Lucknow in particular (see here), but I honestly did not think it would fly. People insisted, so I gave it a shot, which by and large confirmed my hesitation: inferring sectarian belonging from names is fraught with difficulties. That much is clearly not in a name.

But the exercise is still reliable enough to give a rough guess of the Shia population share in Lucknow. This is a topic of perennial debate, with estimates among Lucknowites of either sect ranging from 10% (the figure reported in the colonial Census[1]) to 25-30% (a figure mentioned in recent research[2]). Before I tell you which of the two my own analysis supports, some dry technicalities.

For identifying Muslim names, I relied on an existing master name list. A comparable list is not available for Shia vs. Sunni names, the conventions for which also tend to be more localized, given the very specific sectarian history of Lucknow. To come up with a master name list, I thus asked four local experts in my network (two Sunni, two Shia) to classify the most common names, i.e. those borne by at least 0.1% of the Muslim population in Lucknow, into five categories: clearly Shia, slightly Shia, undecidable, slightly Sunni and clearly Sunni (see picture for today's post). The four experts agreed in most of their judgements, or rather: they often disagreed about the strength of a name's sectarian connotation, but rarely about its general direction (leading to a good enough inter-coder agreement: Krippendorff's alpha[3] stands at 0.43). For the analysis below, I removed from the list the few names on which the experts disagreed about the direction (and not just the strength), and averaged the expert opinions for all remaining names.
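To make the list-building step concrete, here is a minimal Python sketch. It assumes a numeric coding of the five categories from -2 (clearly Sunni) to +2 (clearly Shia); that coding, the helper names `build_master_list`, `average_opinion` and `krippendorff_alpha_interval`, and the use of the interval metric for alpha are my illustrative assumptions. The post does not say which metric produced the 0.43, and a nominal or ordinal alpha on the same ratings would come out differently.

```python
from statistics import mean

# Assumed numeric coding of the five categories (positive = Shia, negative = Sunni).
SCALE = {"clearly Shia": 2, "slightly Shia": 1, "undecidable": 0,
         "slightly Sunni": -1, "clearly Sunni": -2}

def build_master_list(codings):
    """codings: dict mapping a name to the category labels given by each expert.

    Drops names on which experts disagree about the direction (Shia vs. Sunni),
    not merely the strength, and returns name -> list of numeric ratings.
    """
    master = {}
    for name, labels in codings.items():
        scores = [SCALE[label] for label in labels]
        if any(s > 0 for s in scores) and any(s < 0 for s in scores):
            continue  # directional disagreement: leave the name out
        master[name] = scores
    return master

def average_opinion(master):
    """Average expert rating per name, on the -2 (Sunni) .. +2 (Shia) scale."""
    return {name: mean(scores) for name, scores in master.items()}

def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric delta(a, b) = (a - b) ** 2.

    units: one list of numeric ratings per name; names with fewer than two
    ratings are not pairable and are ignored.
    """
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    # Observed disagreement: ordered within-name pairs, each name weighted 1/(m_u - 1).
    d_obs = sum(
        sum((a - b) ** 2 for i, a in enumerate(u) for j, b in enumerate(u) if i != j)
        / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: ordered pairs of all pairable values, ignoring names.
    d_exp = sum(
        (a - b) ** 2 for i, a in enumerate(values) for j, b in enumerate(values) if i != j
    ) / (n * (n - 1))
    if d_exp == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1 - d_obs / d_exp
```

Under this coding, a name rated "slightly Shia" by one expert and "slightly Sunni" by another is dropped, whereas a name rated "clearly Shia" by three experts and "slightly Shia" by the fourth stays in with an average of 1.75.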

Overall, the exercise revealed a lot of undecidability; few names came out as clearly sectarian. On the Shia side, these were the male names Haider and Abbas and the surnames Saidi, Agq, Abbas, Abdi, Husain, Jafri and Mahdi. On the Sunni side, they were the male name Usman, the female name Aysha, and the surnames Usmani, Farukhi, Khan, Ullah, Uddin, Ansari, Siddiqi, Beg, Faridi and Kidwai. Some of these names are fairly common, though, and once "slightly" Shia or Sunni names are included, one can actually get somewhere. But where? And how reliably?

As a next step, I created three certainty benchmarks. Firstly, I assumed a person to be Shia or Sunni with strong certainty if at least one of their names was, on average, classified by my experts as "clearly" Shia or Sunni, while the average classification of all of their names simultaneously leaned more than "slightly" towards the same side. Secondly, I assumed a person to be Shia or Sunni with medium certainty if at least one of their names was, on average, classified as "slightly" Shia or Sunni, as long as the average classification of all of their names did not lean towards the opposite side. Thirdly, I assumed a person to be Shia or Sunni with weak certainty if at least one of their names was classified as Shia or Sunni by at least one expert, again as long as the average classification of all of their names did not lean towards the opposite side. Finally, I computed the percentage of Muslims classified this way twice: once using only the voters' own names, and once also taking into account their fathers' names. The latter helps in the case of women, since female names carry less clear connotations, but it also blurs the effect of cross-sectarian marriages.
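The three benchmarks can be read as a small decision procedure. The Python sketch below is hypothetical: it assumes the same -2 (Sunni) to +2 (Shia) coding of expert ratings, and the cut-offs for "clearly" (an average of at least 1.5), "slightly" (at least 0.5) and "leans more than slightly" (an overall average above 1) are my guesses rather than thresholds stated in the post. The function name `classify_voter` is also illustrative. Names missing from the master list are ignored, each voter is assigned the highest certainty level they reach, and a voter who qualifies for both sides at the same level is left unclassified.

```python
from statistics import mean

def classify_voter(names, ratings, clear=1.5, slight=0.5, lean=1.0):
    """Return (sect, certainty) for one voter, or None if unclassifiable.

    names:   the voter's names (own names, optionally followed by the father's)
    ratings: name -> list of expert ratings on the assumed -2 (Sunni) .. +2 (Shia) scale
    clear, slight, lean: assumed cut-offs for "clearly", "slightly" and
                         "leans more than slightly"
    """
    known = [n for n in names if n in ratings]  # unlisted names carry no signal
    if not known:
        return None
    avg = [mean(ratings[n]) for n in known]     # average expert opinion per name
    overall = mean(avg)                         # average classification of all names
    for level in ("strong", "medium", "weak"):  # report the highest level reached
        hits = []
        for sign, sect in ((1, "Shia"), (-1, "Sunni")):
            if level == "strong":
                ok = any(sign * a >= clear for a in avg) and sign * overall > lean
            elif level == "medium":
                ok = any(sign * a >= slight for a in avg) and sign * overall >= 0
            else:  # weak: any single expert leaned this way on any of the names
                ok = (any(sign * r > 0 for n in known for r in ratings[n])
                      and sign * overall >= 0)
            if ok:
                hits.append(sect)
        if len(hits) == 1:
            return hits[0], level
    return None
```

To take fathers' names into account, pass the voter's own names followed by the father's names as `names`.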

With these benchmarks in place, I was able to classify 12% of Muslim voters as Shia or Sunni with strong certainty, another 15% with medium certainty, and a final 22% with weak certainty; 51% of Muslim voters could not be classified either way. Once I took fathers' names into account, these percentages rose to 21% with strong, another 36% with medium and a final 31% with weak certainty, leaving me clueless about only 12% of all Muslims. It is hard to say which benchmark is best: there is always a trade-off between classifying more people less reliably and classifying fewer people more reliably, then generalizing to the larger population.
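Tallying how many voters land at each certainty level is simple bookkeeping. In this sketch, `coverage` is a hypothetical helper, each voter's result is assumed to be either `None` (unclassified) or a `(sect, level)` tuple, and the toy results are invented for illustration, not taken from the voter data.

```python
from collections import Counter

LEVELS = ("strong", "medium", "weak")

def coverage(results):
    """Percentage of voters at each certainty level, plus the unclassified rest."""
    counts = Counter("unclassified" if r is None else r[1] for r in results)
    n = len(results)
    return {key: 100 * counts[key] / n for key in LEVELS + ("unclassified",)}

# Invented toy results: one entry per voter.
toy = [("Shia", "strong"), ("Sunni", "strong"), ("Sunni", "medium"), None]
print(coverage(toy))  # {'strong': 50.0, 'medium': 25.0, 'weak': 0.0, 'unclassified': 25.0}
```

Because every voter falls into exactly one bucket, the four percentages always add up to 100, as the reported figures do (12 + 15 + 22 + 51 and 21 + 36 + 31 + 12).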

Now for the interesting part, and the ultimate goal of this complicated exercise: the estimate of the Shia (and, by inversion, Sunni) population share among Lucknow's Muslims that emerges from these classifications. Looking only at voters classified with strong certainty, the Shia share stands at 33%. If I also include those classified with medium certainty, it falls to 25%, and including those classified with weak certainty raises it again to 32%. If I take fathers' names into account, the percentages stand at 35%, 25% and 36%, respectively. This indicates that cross-sectarian marriages are either uncommon or cancel each other out at the aggregate level, so I can safely use fathers' names, considering that doing so leaves far fewer people unclassified.
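The shares above are cumulative: each step adds a less certain group to those already counted, which is how the share can fall and then rise again. A hypothetical sketch of that computation, again assuming per-voter results of the form `(sect, level)` or `None`; the helper name and the toy results are invented for illustration.

```python
def cumulative_shia_share(results, levels=("strong", "medium", "weak")):
    """Shia share (%) among voters classified at or above each certainty level.

    results: per voter, None (unclassified) or a (sect, level) tuple.
    The key "medium" therefore means strong + medium, "weak" means all three.
    """
    shares = {}
    included = set()
    for level in levels:
        included.add(level)
        sects = [r[0] for r in results if r is not None and r[1] in included]
        shares[level] = 100 * sects.count("Shia") / len(sects) if sects else None
    return shares

# Invented toy results, for illustration only.
toy = [("Shia", "strong"), ("Sunni", "strong"), ("Sunni", "medium"),
       ("Sunni", "medium"), ("Shia", "weak"), None]
print(cumulative_shia_share(toy))  # {'strong': 50.0, 'medium': 25.0, 'weak': 40.0}
```

Unclassified voters never enter the denominator, so each share is a share of classified Muslims only.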

At first sight, this looks like a surprisingly clear endorsement of recent research findings[2], but there is still quite some methodological wariness involved. Could my methodology lead me to overestimate Shia percentages? It might. First of all, my four experts could have got it collectively wrong. There is no way to control for this other than to trust them. Secondly, there might be a bias because Shia names are easier to identify than Sunni names, or the other way round. This does not seem to be the case, though, at least not outrageously so: across all four experts and all names, names were classified as clearly or slightly Shia 140 times, but 366 times as clearly or slightly Sunni. With this ratio of 28:72, it is unlikely that a bias in the master name list distorted my results (importantly, though, this does not account for how frequent these names are among voters; doing so would make the argument circular). Shia do, however, care more than Sunni about names with a clear sectarian connotation (a minority thing, as many kept telling me, and something confirmed in the experts' classifications as well). Might this tendency have distorted the results? It doesn't seem so: while my estimates of the Shia population share are indeed highest when I only take "clearly" Shia or Sunni names into account, they are almost equally high at the opposite end of certainty, with even very "slightly" Shia or Sunni names included in the calculation.
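The 28:72 ratio follows directly from the two counts in the text, pooled over all four experts and all names:

```python
# Counts of "clearly or slightly" classifications, taken from the text.
shia_ratings, sunni_ratings = 140, 366
total = shia_ratings + sunni_ratings            # 506 directional classifications
shia_pct = round(100 * shia_ratings / total)    # 27.67 rounds to 28
sunni_pct = round(100 * sunni_ratings / total)  # 72.33 rounds to 72
print(f"{shia_pct}:{sunni_pct}")  # prints 28:72
```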

But there are more potential flaws. For instance, all of the above covers only voters, and Shia and Sunni might be registered as voters to different extents. We also don't know whether the 12 to 88 per cent of Lucknow Muslims (depending on the benchmark) not covered by the calculation because their names are indistinguishable follow the same pattern, even though the fact that all the estimates listed above lie roughly between a quarter and a third makes me optimistic. Most importantly, however, I have no body of names of people with known sectarian affiliation against which to test this probabilistic strategy, the way I could test the Muslim / non-Muslim algorithm.

It is not all gloomy, though: probability theory gives my conscience some relief. All of these flaws become less of an issue the more I aggregate, since the larger the group, the more likely it is that each wrongly identified Shia is offset by a wrongly identified Sunni.
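The aggregation argument can be illustrated with a small simulation. The sketch below is hypothetical throughout: it assumes a true Shia share of 30% and a 20% chance of labelling a Shia voter as Sunni, and it sets the reverse error rate so that both kinds of error cancel in expectation, which is the condition the argument relies on. The function name `simulate_mean_error` and all parameter values are illustrative. It then measures how far the estimated share strays from the truth, on average, in areas of 50, 500 and 5,000 classified voters.

```python
import random

def simulate_mean_error(area_size, true_share=0.3, shia_to_sunni=0.2,
                        trials=200, seed=1):
    """Mean absolute error of the estimated Shia share for areas of a given size.

    Assumes (hypothetically) that errors balance in expectation:
    true_share * shia_to_sunni == (1 - true_share) * sunni_to_shia.
    """
    sunni_to_shia = true_share * shia_to_sunni / (1 - true_share)
    rng = random.Random(seed)  # seeded for reproducibility
    total_error = 0.0
    for _ in range(trials):
        labelled_shia = 0
        for _ in range(area_size):
            is_shia = rng.random() < true_share
            flipped = rng.random() < (shia_to_sunni if is_shia else sunni_to_shia)
            labelled_shia += is_shia != flipped  # labelled Shia iff exactly one holds
        total_error += abs(labelled_shia / area_size - true_share)
    return total_error / trials

for size in (50, 500, 5000):
    print(size, round(simulate_mean_error(size), 4))
```

In this setup the average error shrinks roughly with the square root of the area size, so estimates for large aggregates are much steadier than those for small neighbourhoods.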

Which, in reverse, means: mapping at high resolution will be fairly unreliable, and will certainly not be possible at the same level of accuracy I achieved for my Muslim map (I will give it a shot next week, though). Separating Shia from Sunni is on the brink of breaking the general analytical strategy of name-based classification: that much is really not in a name. Still, I feel confident enough about one thing: it seems likely that about a quarter to a third of Lucknow's Muslims are Shia, which is more than the ten percent it used to be in the 1900s...

I would like to acknowledge the use of the Oxford Supercomputing Centre (OSC) in carrying out this work.

  • 1. Hasan, M. (1997). Traditional rites and contested meanings: Sectarian strife in colonial Lucknow. In V. Graff (Ed.), Lucknow: Memories of a city. Delhi: Oxford University Press.
  • 2. Verniers, G. (2012). A minority within a minority: The Shias of Kashmiri Mohalla, Lucknow. In L. Gayer & C. Jaffrelot (Eds.), Muslims in Indian cities: Trajectories of marginalisation. London: Hurst.
  • 3. Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills: Sage.