The Grass Is Not Greener on the Other Side [replication data]
Replication data for Coufalová, L., & Mikula, Š. (2023). The grass is not greener on the other side: the role of attention in voting behavior. Public Choice, 194(1), 205-223.
The primary data set used in the paper is an electoral database which contains candidates’ rankings (i.e., positions on the ballot paper) and secondary information on candidates that was made available to voters on the ballot (name, age, political party affiliation, occupation, municipality of residence) along with electoral outcomes, including the number of preferential votes received. The source data (broken down by election) are available on the website of the Czech Statistical Office.
The electoral database contains exact data on the ages of the candidates, which we classify into seven categories (18–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80–89). Candidates list one or more occupations on the ballot. We manually encoded these occupations according to the ISCO classification into 11 categories. In addition to the ISCO level 1 groups, we also use a special category for “occupations” that are not listed in ISCO, such as “student”, “mother”, or “retiree”. If a candidate listed more than one occupation on the ballot, we use the first one.
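The age banding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `age_category` and the string labels are assumptions made for this example, and the actual coding of `agecat` in the data file may differ.

```python
def age_category(age: int) -> str:
    """Return the ten-year age band for a candidate's exact age.

    The labels ("18-29", "30-39", ...) are hypothetical; only the band
    boundaries come from the description of the data.
    """
    if not 18 <= age <= 89:
        raise ValueError(f"age {age} is outside the 18-89 range covered by the categories")
    if age < 30:
        # The first band starts at 18 rather than 20, so it is handled separately.
        return "18-29"
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"
```

For example, `age_category(45)` falls into the 40–49 band.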
Information on education and gender must be inferred from the candidate’s name and description. Tertiary education is signaled by the presence of academic titles, which are commonly used in formal communication (written and spoken) in the Czech Republic. We use academic titles to classify candidates into four categories that correspond to the levels of education defined by the International Standard Classification of Education (ISCED). In addition to ISCED 5 (Short-cycle tertiary education), 6 (Bachelor’s or equivalent), 7 (Master’s or equivalent), and 8 (Doctorate or equivalent), we also define a special category “other tertiary” for graduate degrees that are not traditional in the Czech Republic and have no close equivalent (such as MBA, LLM, etc.). The Czech academic titles “profesor” and “docent” (titles associated with the positions of full and associate professor) are assigned to the ISCED 8 category. Candidates can be classified in multiple education categories simultaneously.
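A title-based classification of this kind might look like the sketch below. The lookup table is an illustrative assumption covering only a handful of common titles; it is not the authors' coding table, and the category labels and the function name `education_categories` are hypothetical. The function returns a set because a candidate can fall into several categories at once.

```python
# Hypothetical, deliberately incomplete mapping from titles to categories.
TITLE_TO_CATEGORY = {
    "bc.": "ISCED6",          # Bachelor's or equivalent
    "mgr.": "ISCED7",         # Master's or equivalent
    "ing.": "ISCED7",         # Master's or equivalent
    "ph.d.": "ISCED8",        # Doctorate or equivalent
    "doc.": "ISCED8",         # "docent" is placed in ISCED 8
    "prof.": "ISCED8",        # "profesor" is placed in ISCED 8
    "mba": "other_tertiary",  # degrees with no close Czech equivalent
    "llm": "other_tertiary",
}


def education_categories(name_with_titles: str) -> set[str]:
    """Collect every education category signaled by titles around a name."""
    # Titles may appear before the name or after it, separated by commas.
    tokens = name_with_titles.replace(",", " ").split()
    return {
        TITLE_TO_CATEGORY[token.lower()]
        for token in tokens
        if token.lower() in TITLE_TO_CATEGORY
    }
```

A candidate listed as `"Ing. Jan Novák, Ph.D., MBA"` (a made-up name) would be placed in three categories, while a name without any title yields an empty set.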
We infer candidates’ gender using their first names. We match the name listed in the electoral database with a database published by the Ministry of the Interior of the Czech Republic, which includes the frequency of each name for each gender. We assign each candidate the gender that is more frequent for his or her first name. Please note that the database of names is no longer available due to GDPR.
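The majority rule for gender can be sketched as below. Since the original name database is no longer available, the counts here are made-up placeholders, and the function name `infer_gender`, the input layout, and the handling of unknown or tied names (returning `None`) are all assumptions of this example rather than the authors' procedure.

```python
from typing import Optional

# Made-up frequencies for illustration only: name -> (male count, female count).
EXAMPLE_NAME_COUNTS = {
    "Jan": (1000, 0),
    "Jana": (0, 900),
    "Nikola": (150, 600),
}


def infer_gender(first_name: str, name_counts: dict) -> Optional[str]:
    """Return the gender that is more frequent for the given first name."""
    counts = name_counts.get(first_name)
    if counts is None:
        return None  # name not found in the lookup table
    male_count, female_count = counts
    if male_count == female_count:
        return None  # no majority; left undecided in this sketch
    return "male" if male_count > female_count else "female"
```

With these placeholder counts, `infer_gender("Nikola", EXAMPLE_NAME_COUNTS)` resolves to the more frequent gender for that name.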
We augment the electoral database with hand-collected data on ballot layout—specifically, we add the line number of the last candidate printed on the front side of each ballot.
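Given the line number of the last candidate on the front side, an indicator for candidates printed on the reverse side could be derived as in the sketch below. It assumes that ballot positions and line numbers are counted on the same scale starting from 1; the function name `on_reverse_side` and its parameters are hypothetical and need not match how the `otherside` variable was actually constructed.

```python
def on_reverse_side(position: int, last_front_line: int) -> int:
    """Return 1 if the candidate is printed after the last front-side line, else 0.

    Assumes `position` (the candidate's ballot position) and
    `last_front_line` are counted on the same scale.
    """
    return int(position > last_front_line)
```

Under this assumption, the candidate at position 21 on a ballot whose front side ends at line 20 is the first one on the reverse side.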
The resulting database contains constituency-level data for elections to the lower house of the Czech Parliament that took place in 2006, 2010, 2013, and 2017.
The replication data file (DOI 10.5281/zenodo.7218946) contains the following variables:
- year ... Election year/Election ID,
- ballot ... Ballot ID -- a combination of KSTRANA (election-specific party ID), VOLKRAJ (constituency ID) and year,
- pref_hlasy ... Number of preferential votes (n),
- otherside ... Indicator variable for candidates listed on the reverse side,
- male ... Indicator variable for males,
- agecat ... Age category,
- VEK ... Age,
- ISCO ... ISCO level 1 occupation category ("other" -- not categorized in ISCO),
- maxBallot ... Total number of candidates on the ballot,
- POC_HLASU ... Total number of votes cast for the party,
- dff ... Distance to break,
- ISCED* ... Indicator variables for ISCED categories.