If you were to meet the stereotypical Muslim couple in Lucknow, she would be called Noor, he Mohammad (though he would prefer to abbreviate his name to Mohd or Md), and their last name would be Ali - these are the three most frequent Muslim names on Lucknow's electoral rolls right now. The runners-up are Nasreen, Naseem, Sima and Parveen on the female side; Abdul, Ahmad, Ali and Saif on the male side; and Ahmad, Bano, Khatun and Khan as far as last names go.[1] What is interesting is to see how the frequency of these names changes over time - and how gender plays a role in this. Let's have a look at male first names first (click on the image for a larger version):

  • 1. I should clarify that single names - if found on the electoral rolls - count as last names here...

Luckily, the server farms in Oxford remained unaffected by the huge electricity blackout that has rocked North India for the last 48 hours. Ever since my post on Muslim names, they have stoically crawled through Lucknow's current electoral rolls to guess the religious community each voter belongs to. They identified 98% of all voters at a pace of roughly half a million names a day (for comments on the accuracy of this matching exercise, see below). They thus enabled me to generate a draft map of Muslim life in Lucknow by calculating population shares (voter population shares, to be precise) at polling booth level and linking them to polling station localities (drawn from the National Informatics Centre).
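For the technically curious, the share calculation itself is simple. What follows is only a minimal sketch in Python, not the code that actually ran on the servers: the function name, the booth IDs, the community labels and the tiny sample list are all made up for illustration. I also assume here that voters whose community could not be guessed are left out of the denominator, which is one possible convention among several.

```python
from collections import defaultdict


def muslim_share_by_booth(voters):
    """Return {booth_id: share of identified voters labelled 'Muslim'}.

    `voters` is an iterable of (booth_id, community) pairs, where
    community is a label such as "Muslim" or "Hindu", or None if the
    name matching could not assign one (hypothetical data layout).
    """
    # booth_id -> [number of Muslim voters, number of identified voters]
    counts = defaultdict(lambda: [0, 0])
    for booth_id, community in voters:
        if community is None:
            # Assumption: unidentified voters are excluded from the share.
            continue
        counts[booth_id][1] += 1
        if community == "Muslim":
            counts[booth_id][0] += 1
    return {booth: muslim / total for booth, (muslim, total) in counts.items()}


# Made-up sample: two booths, one voter left unidentified.
sample_voters = [
    ("B1", "Muslim"), ("B1", "Hindu"), ("B1", None),
    ("B2", "Muslim"), ("B2", "Hindu"), ("B2", "Hindu"), ("B2", "Hindu"),
]
shares = muslim_share_by_booth(sample_voters)
```

In this toy example, booth B1 ends up with a share of 0.5 and booth B2 with 0.25. Linking the shares to localities would then be a matter of joining on the booth ID, which I leave out here.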

Apart from the river (in blue), my own home (red dot), and the MODIS built-up area polygon already featured in earlier maps, the following visualization shows "Muslim name" density (background shade), Haj pilgrims (green circles; drawn from the 2012 Qurrah), and major Islamic institutions in Lucknow. The map is interactive: do zoom in for increasing detail, and click on the mosques to get to the respective institution's website (there is also a larger version):

Can one infer the religious community to which an Indian belongs from his or her name? Intuitively, the answer would be yes: Indians and those familiar with the country certainly develop a pretty good sense for such inferences. And even though names remain only one among several clues (including dress, language, etc.), names alone are sadly often reason enough to discriminate against people (for instance to deny Muslims housing). But most Indians also know the flurry of probing questions along the lines of "What's your name?" - "X" - "No, your full name?" - "X Y" - "Where are you from?" - "Z" - "No, I mean: Hindu?". Clearly, names are not always reliable indicators of an individual's community.

Today's post sheds a probabilistic light on this problem. First, I discuss why it could be useful to infer communities from names. Next, I introduce a name matching algorithm which I developed to achieve this task (building on others' earlier efforts, and available for download below under the GNU Affero GPL). Finally, I give a first indication of how well my algorithm works: what's in a name? Your comments are of course highly appreciated - and I apologize in advance for a rather technical post (which is in fact as much a writeup for my own memory as it is meant for you to read). Once I develop empirical applications of this software, I promise more lively prose...