


Raphael Susewind


Field Methods, Volume 27, Number 4, p.319-332 (2015)



Data, PhD


Fine-grained data on religious communities are often considered sensitive in South Asia and consequently remain inaccessible. Yet without such data, statistical research on communal relations and group-based inequality remains superficial, hampering the development of appropriate policy measures to prevent further social exclusion on the basis of religion. The open-source algorithm introduced in this article provides a workaround by probabilistically exploiting the communal connotations of names; it transforms name lists, which are readily available, into a new source of demographic data. The algorithm proves highly accurate in identifying Muslim population shares in Uttar Pradesh, India’s most populous state, but could be employed more widely across South Asia. It potentially enables more detailed analyses in economics, development studies, and political science as well as better sampling procedures in sociology and anthropology. This article describes the algorithm, evaluates its accuracy, reflects on ethical implications, and introduces a sample data set; the software itself is available in an online supplement to this article.