Can one infer from an Indian's name the religious community to which he or she belongs? Intuitively, the answer would be yes: Indians and those familiar with the country certainly develop a pretty good sense for such inferences. And even though names remain only one among several clues (including dress, language, etc.), names alone are sadly often reason enough to discriminate against people (for instance, to deny Muslims housing). But most Indians also know the flurry of probing questions along the lines of "What's your name?" - "X" - "No, your full name?" - "X Y" - "Where are you from?" - "Z" - "No, I mean: Hindu?". Clearly, names are not always reliable indicators of an individual's community.
Today's post sheds a probabilistic light on this problem. First, I discuss why it could be useful to infer communities from names. Next, I introduce a name matching algorithm which I developed to achieve this task (building on others' earlier efforts, and available for download below under the GNU Affero General Public License). Finally, I give a first indication of how well my algorithm works: what's in a name? Your comments are of course highly appreciated - and I apologize in advance for a rather technical post (which is in fact as much a writeup for my own memory as it is meant for you to read). Once I develop empirical applications of this software, I promise more lively prose...






