Homophily in Voting Behavior [replication data]

Replication data for the publication Coufalová, L., Mikula, Š., & Ševčík, M. (2023). Homophily in voting behavior: Evidence from preferential voting. Kyklos, 76(2), 281-300.

The primary data set used in our empirical analysis is an electoral database that contains secondary information on candidates (name, age, political party affiliation, occupation, municipality of residence) together with the electoral outcomes, including the number of preferential votes received. Electoral data in the estimation file are aggregated by municipality, i.e., there is one row (observation) for each combination of candidate, municipality, and election.

The electoral database contains exact data on candidates' ages, which we classify into seven categories (18–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80–89). Candidates list one or more occupations on the ballot. We manually encode these occupations into 11 categories according to the ISCO classification. In addition to the ISCO level 1 groups, we include a special category for “occupations” that are not listed in ISCO (e.g., “self-employed”, “motorbike rider”, or “anti-communist resistance fighter”). Another category consists of candidates who are likely economically inactive, with occupations such as “retiree”, “student”, or “mother”. We encode these occupations (6.6% of observations) as missing values. When a candidate lists more than one occupation, we use the first one.
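The coding rules above (seven age bins, first-listed occupation only, inactive occupations set to missing) can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: the occupation lookup tables are hypothetical stand-ins for the manual coding, and the comma separator between listed occupations is assumed.

```python
from typing import Optional

# Seven age bins from the text: 18-29, 30-39, ..., 80-89.
AGE_BINS = [(18, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]


def age_category(age: int) -> Optional[str]:
    """Return the age-category label, or None if the age falls outside 18-89."""
    for lo, hi in AGE_BINS:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return None


# Hypothetical lookup tables standing in for the authors' manual coding.
ISCO_MAJOR_GROUP = {"teacher": "2", "physician": "2", "bricklayer": "7"}
NON_ISCO = {"self-employed", "motorbike rider", "anti-communist resistance fighter"}
INACTIVE = {"retiree", "student", "mother"}


def occupation_category(listed: str) -> Optional[str]:
    """Code only the first listed occupation; inactive occupations become missing (None)."""
    first = listed.split(",")[0].strip().lower()  # assumed comma-separated list
    if first in INACTIVE:
        return None
    if first in NON_ISCO:
        return "non-ISCO"  # special category for occupations outside ISCO
    return ISCO_MAJOR_GROUP.get(first)  # None if absent from this toy table
```

Because only the first occupation is used, "retiree, teacher" is coded as missing while "teacher, retiree" is coded as ISCO major group 2.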

Information on education and gender must be inferred from the candidates' names and descriptions. Tertiary education is signaled by the presence of academic degrees.
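A minimal sketch of the degree-based rule for tertiary_educ follows. The set of Czech degrees and titles is an assumption (non-exhaustive), and tokenizing on whitespace and commas is a simplification of however the names actually appear on the ballot.

```python
# Assumed (non-exhaustive) set of Czech academic degrees and titles.
DEGREES = frozenset({
    "Bc.", "BcA.", "Mgr.", "MgA.", "Ing.", "MUDr.", "MDDr.", "MVDr.", "JUDr.",
    "PhDr.", "RNDr.", "PharmDr.", "PaedDr.", "ThDr.", "Ph.D.", "CSc.", "DrSc.",
    "doc.", "prof.",
})


def has_tertiary_education(name_on_ballot: str) -> bool:
    """True if any whitespace- or comma-separated token is a known academic degree."""
    tokens = name_on_ballot.replace(",", " ").split()
    return any(token in DEGREES for token in tokens)
```

Degrees written after the name (e.g. ", Ph.D.") are caught because commas are turned into separators before splitting.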

We infer the candidates' genders from their names. We match names from the electoral database with a database published by the Ministry of the Interior of the Czech Republic, which includes the frequency of each name for each gender. We assign each candidate the gender that is more frequent for his/her name. This is a reliable measure since the Czech Republic is linguistically and ethnically highly homogeneous, first names commonly given to both genders are rare, and gender is further signaled by a suffix on the surname (“-ová” for women).
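The majority-gender rule can be sketched as below. The name-frequency excerpt is hypothetical and its counts are illustrative only (not the Ministry's figures); the fallback to the “-ová” suffix for unknown or tied names is an assumption added for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical excerpt of a name-frequency table:
# first name -> (number of men, number of women). Counts are illustrative only.
NAME_FREQ: Dict[str, Tuple[int, int]] = {
    "Jan": (1000, 0),
    "Jana": (0, 1000),
    "Saša": (60, 40),  # a rare name given to both genders
}


def infer_gender(first_name: str, surname: str) -> Optional[str]:
    """Assign the gender that is more frequent for the first name."""
    counts = NAME_FREQ.get(first_name)
    if counts is not None and counts[0] != counts[1]:
        return "male" if counts[0] > counts[1] else "female"
    # Assumed fallback for unknown or tied names: the "-ová" surname suffix.
    if surname.endswith("ová"):
        return "female"
    return None
```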

The estimation sample contains voting data for elections to the Chamber of Deputies held in 1996, 1998, 2002, 2006, 2010, 2013, 2017, and 2021. The sample covers all free elections to the Chamber of Deputies that took place after the fall of the Iron Curtain and the communist regime in 1989 and the dissolution of Czechoslovakia in 1993.

For each candidate-municipality pair, we construct one variable per candidate characteristic (education, occupation, age category, gender) that captures the candidate's similarity to the population of the given municipality. These variables are defined as the percentage of the municipality's adult population that shares the specific variant of the given characteristic with the candidate (e.g., belongs to the same age group). This yields a continuous measure for each of these four characteristics, with a theoretical range from 0 to 100%. We also define an indicator variable that takes the value one for the municipality in which the candidate lives.
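For a municipality and a candidate characteristic, the measure is 100 × (adults sharing the candidate's category) / (all adults). A minimal sketch, assuming the municipal counts for a characteristic partition the whole adult population (so their sum is the adult population); the function names are hypothetical.

```python
from typing import Dict, Optional


def similarity(pop_counts: Dict[str, int], candidate_value: Optional[str]) -> Optional[float]:
    """Percentage (0-100) of a municipality's adults sharing the candidate's category.

    pop_counts is assumed to partition the municipality's whole adult population
    by the given characteristic, so its sum is the adult population.
    """
    if candidate_value is None:  # e.g. occupation coded as missing
        return None
    adults = sum(pop_counts.values())
    if adults == 0:
        return None
    return 100.0 * pop_counts.get(candidate_value, 0) / adults


def homo_municipality(residence_id: str, municipality_id: str) -> int:
    """1 if the row's municipality is the candidate's municipality of residence."""
    return int(residence_id == municipality_id)
```

For instance, if 250 of a municipality's 1,000 adults are aged 40–49, a 45-year-old candidate gets a similarity of 25.0 for age in that municipality.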

For the construction of these variables, we use two municipality-level data sources. For age and gender structure, we use annual data compiled by the Czech Statistical Office from census and administrative records. Data on education and occupation are available only from decennial censuses. We associate each election with the nearest census: we use the 2001 census to calculate similarity measures for the 1996, 1998, and 2002 elections, and the 2011 census for the remaining elections.
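The election-to-census assignment for the education and occupation measures reduces to a small lookup. The sketch below encodes only the mapping stated in the text; age and gender, which use annual data, are not covered, and the function name is hypothetical.

```python
# Chamber of Deputies elections in the estimation sample.
ELECTION_YEARS = (1996, 1998, 2002, 2006, 2010, 2013, 2017, 2021)


def census_for_election(year: int) -> int:
    """Census year used for the education and occupation similarity measures."""
    if year not in ELECTION_YEARS:
        raise ValueError(f"{year} is not an election year in the sample")
    return 2001 if year <= 2002 else 2011
```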

The replication data files/tables (DOI 10.5281/zenodo.7070555) contain the following variables:

  • pref...number of preferential votes
  • homo_municipality...indicator variable for a candidate running in the municipality of his/her residence
  • homo_(education/occupation/age/gender)...percentage of the population sharing the characteristic of the candidate
  • KOD_OBEC...municipality ID in the CISOB classification
  • total_fe...ID of candidate-election pair
  • maxPORCISLO...number of candidates on the ballot
  • cluster_ID...ID for error term clustering
  • POC_HLASU...total number of votes cast for the party in the given municipality
  • small_municipality...indicator variable for a municipality with a population below the median (defined separately for each year and constituency)
  • year...election year (election ID)
  • VOLKRAJ...constituency ID
  • PORCISLO...position of the candidate on the ballot
  • VEK...age
  • agecat...age category
  • MANDAT...indicator variable for elected candidates
  • tertiary_educ...indicator variable for candidates with tertiary education
  • gender...male/female
  • ger_share...indicator variable for a municipality dominated by ethnic Germans in 1930
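Two properties of the variables listed above can be illustrated with a short sketch: the small_municipality median split within each year and constituency, and the one-row-per-candidate-municipality-election structure keyed by total_fe and KOD_OBEC. The population input is hypothetical (no population variable is listed), and an unweighted median over municipalities is an assumption.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Hashable, Iterable, Mapping, Tuple

# Hypothetical input key: (year, VOLKRAJ, KOD_OBEC) -> municipal population.
PopKey = Tuple[int, Hashable, Hashable]


def small_municipality_flags(population: Dict[PopKey, int]) -> Dict[PopKey, int]:
    """1 if a municipality's population is below the median of its year-constituency cell."""
    cells = defaultdict(list)
    for (year, volkraj, _obec), n in population.items():
        cells[(year, volkraj)].append(n)
    medians = {cell: median(sizes) for cell, sizes in cells.items()}
    # Strictly below the median, so a municipality exactly at the median gets 0.
    return {key: int(n < medians[key[:2]]) for key, n in population.items()}


def assert_one_row_per_candidate_municipality(rows: Iterable[Mapping[str, object]]) -> None:
    """Check that (total_fe, KOD_OBEC) identifies rows (candidate x election x municipality)."""
    seen = set()
    for row in rows:
        key = (row["total_fe"], row["KOD_OBEC"])
        if key in seen:
            raise ValueError(f"duplicate row for {key}")
        seen.add(key)
```

Because total_fe already identifies a candidate-election pair, adding KOD_OBEC should be enough to identify each row of the estimation file.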
