It's almost a year since I began working on a name-matching algorithm to approximate the Muslim population share in Lucknow's mohallas by exploiting the religious connotations of names on the electoral rolls of these areas. This has worked out quite well, and has since led to a number of follow-up analyses, several conference papers, new collaborations, an article under review, two more in the pipeline - and last but not least the publication of a large dataset on religion and politics in Uttar Pradesh (featured in my second-to-last post).
One thing kept worrying me, though: the scope of the algorithm varied quite a bit. Across UP's assembly constituencies, for instance, it sometimes managed to categorize 95% of the electorate - and sometimes only 70%. While the accuracy for those names that were identified seemed alright, missing values of up to a third were worrisome. Overwhelmingly, however, they occurred because names in the electoral rolls were simply not covered by indiachildnames.com. There was little I could do about that, I thought.