Introduction
The Program for the International Assessment of Adult
Competencies (PIAAC) is a comprehensive international survey of adult skills. It
measures adults’ proficiency across a range of key information-processing skills
and assesses these adult skills consistently across participating countries.
PIAAC is administered every 10 years and has had two cycles so far. For PIAAC
Cycle 1, the United States participated in three rounds of data collection
between 2011 and 2018. A total of 38 countries participated in these three
rounds of PIAAC Cycle 1. More detailed information can be found in the PIAAC 2012/2014/2017: Main Study, National Supplement, and PIAAC 2017 Technical Report.
PIAAC Cycle 2 began in 2022–23, with 31 countries
participating in the first round.
The assessment focused on the key cognitive and workplace skills necessary for
individuals to participate successfully in the economy and society of the 21st
century. This multicycle study is a collaboration between the governments of
participating countries, the Organisation for Economic Co-operation
and Development (OECD), and a consortium of various international organizations,
referred to as the PIAAC Consortium. In the United States, PIAAC is sponsored by
the National Center for Education Statistics (NCES) in the Institute of
Education Sciences of the U.S. Department of Education.
An important element of the value of PIAAC is its
collaborative and international nature. Internationally, PIAAC was developed
collaboratively by participating countries’ representatives from ministries and
departments of education and labor as well as by OECD staff through an extensive series of
international meetings and workgroups. All PIAAC countries must follow common
standards and procedures. As a result, PIAAC can provide a reliable and
comparable measure of adult skills in the adult population (ages 16–65) of
participating countries.
This Methodology and Technical Notes document provides
an overview of the technical aspects of PIAAC Cycle 2, with a particular focus
on the U.S. implementation.
More detailed information on these topics can be found in the upcoming PIAAC Cycle 2 U.S. technical report.
International Requirements for Sampling, Data Collection, and Response Rates
The PIAAC Consortium oversees all PIAAC activities on
behalf of OECD and provides support to participating countries in all aspects of
PIAAC. Each country is responsible for conducting PIAAC in compliance with the
PIAAC Technical Standards and Guidelines (OECD 2022) provided by the Consortium
to ensure that the survey design and implementation yield high-quality and
internationally comparable data. The standards were generally based on
agreed-upon policies or best practices to follow when conducting the study, and
all participating countries were required to follow them to have their data
included in the OECD reports and data products.
To ensure all participating countries met the
standards, the Consortium implemented a comprehensive quality control process to
monitor all aspects of the study,
including sample selection and monitoring, background questionnaire (BQ)
adaptations, instrument translation, interviewer training, data collection,
coding and data processing, data delivery, and weighting and variance
estimation. The requirements regarding the target populations, sampling design,
sample size, exclusions, and the definition of response rates are described next.
International Target Population
The PIAAC target population consisted of all
noninstitutionalized adults between the ages of 16 and 65 (inclusive) who
resided in the country (whose usual place of residency is in the country) at the
time of data collection. Adults were included regardless of citizenship,
nationality, or
language.
The target population included
- Full-time and part-time members of the military who did not reside in military barracks or on military bases;
- Adults in noninstitutional collective dwelling units (DUs) or group quarters, such as workers’ quarters or halfway homes; and
- Adults living at school in student group quarters, such as dormitories.
In countries where persons were selected from a
registry, age at the mid-point of data collection was used to determine
eligibility. In countries where persons were selected using a screener
questionnaire, age was defined as of the day the screener was conducted.
Sampling Design
It is not feasible to assess every adult in each
participating country. Therefore, a representative sample of adults needed to be
selected from a list of adults in the target population, i.e., from a sampling
frame. The sampling frames for all countries were required to include 95 percent
or more of the PIAAC target population. That is, the undercoverage rate,
combined over all stages of sampling, could not exceed 5 percent.
In some countries, a central population registry
constituted the frame, and individuals were sampled directly from the frame. In
other countries, including the United States, a multistage sample design was
used, with the frame built from other sources, for example, lists of primary
sampling units, secondary sampling units, dwelling units, and individuals within
dwelling units.
The sampling frame at each stage was required to
include any information necessary for sample design, sample selection,
and estimation purposes, as well as sufficiently reliable information to sample
individual units and ultimately to locate individuals for the interview and
assessment.
Other requirements for each country’s sampling design
included the following:
- The sampling frame(s) had to be up-to-date and contain only one unique record for each sampling unit.
- For multistage area sample designs in which a population registry was not used, countries were required to have a frame of DUs within the selected geographic clusters.
- Countries with central population registers were required to have a sampling coordination strategy in place to spread the response burden more equally across the population.
Sample Sizes
The minimum sample size requirement for PIAAC Cycle 2
was between 4,000 and 5,000 completed cases
per reporting language for the PIAAC target population, with the specific
requirement depending on the number of sampling stages for the country, which is
related to the predicted design effect for the country. The overall goal of the
sample design was to obtain a nationally representative sample of the target
population in each participating country that was proportional to the population
across the country, in other words, a self-weighting sample design (Kish 1965).
Countries with highly clustered samples or with a high
degree of variation in sampling rates due to either oversampling or variation in
household size were required to increase the sample size requirements to account
for the higher expected design effects compared to other countries with equal
probability samples and the same number of sampling stages. Countries had the
option to increase the sample size to obtain reliable estimates for groups of
special interest (e.g., 16- to 29-year-olds) or for geographic regions (e.g.,
states and provinces) or to extend the age range (e.g., age 66 or over).
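For illustration, the relationship between the expected design effect and the number of completed cases can be sketched as follows (in Python). The target effective sample size of 4,000 and the design-effect values are hypothetical and are not the PIAAC requirements themselves.

```python
import math

def required_cases(target_effective_n: float, expected_deff: float) -> int:
    """Completed cases needed so that the effective sample size, n / deff,
    reaches the target (Kish 1965)."""
    return math.ceil(target_effective_n * expected_deff)

# Illustrative design effects: 1.0 for an equal-probability, unclustered design;
# larger values for clustered designs or designs with variable sampling rates.
for deff in (1.00, 1.10, 1.25):
    print(f"deff={deff:.2f}: {required_cases(4000, deff):,} completed cases")
```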
Exclusions
The PIAAC target population excluded adults in
institutional collective DUs or group quarters such as prisons, hospitals,
and nursing homes, as well as adults residing in military barracks and on
military bases.
The Consortium reviewed and approved any additional
exclusions to the PIAAC target population, regardless of whether they exceeded
the 5 percent threshold noted above. Country-specific exclusions were only
implemented because of operational or resource considerations, for instance,
excluding persons in hard-to-reach areas.
Defined Response Rates
Although the Consortium did not establish set
participation or response rate standards for all participating countries, each
country was required to specify sample size goals for each stage of data
collection (screener if applicable, BQ, and assessment). Other requirements
included the following:
- Each country should specify its assumptions about nonresponse and ineligibility rates.
- The sample size should be adjusted to account for expected nonresponse.
- For countries with a screener, sample size goals should be constructed for the screener to account for ineligibility and screener nonresponse, as well as nonresponse to the BQ and the assessment.
A completed case is one that met all of the following criteria:
- Responses to key background questions in the full BQ, including age, gender, highest level of schooling, employment status, and country of birth (native/nonnative), were collected.
- The tablet tutorial section was attempted.
- The locator was attempted.
Sampling in the United States
The U.S. PIAAC Cycle 2 National Sample Design
The target population for U.S. PIAAC Cycle 2 consisted
of noninstitutionalized adults ages 16–74 who resided in the United States at
the time of the interview. The 16–65 age group is consistent with the
international target population, and the 66–74 age group was added as a national
option. Adults were included regardless of citizenship, nationality, or
language.
To select a nationally representative sample, U.S.
PIAAC used a four-stage stratified cluster sample design. This method involved
(1) selecting primary sampling units (PSUs) consisting of counties or groups of
contiguous counties; (2) selecting secondary sampling units (SSUs) consisting of
area blocks; (3) selecting DUs (for example,
single-family homes or apartments selected from address listings); and (4)
selecting eligible persons within DUs. Random selection methods were used at
each stage of sampling. Initial sample sizes were determined based on a goal of
5,000 respondents ages 16–65 per PIAAC standards, plus an additional 1,020
respondents ages 66–74. During data collection, response rates and sample yields
were monitored and calculated by key demographic and subgroup characteristics.
These sampling methods and checks ensured that the sample requirements were met
and that reliable statistics based on a nationally representative sample could
be produced.
First Stage
The PSU sampling frame was constructed from the list
of counties and population estimates in the Vintage 2020 Census Population
Estimates, joined with additional county-level data for stratification. To form
PSUs, small counties were combined with adjacent counties until they reached a
minimum population size of 15,000 eligible adults; most PSUs consisted of a
single county.
The four largest PSUs were selected with certainty
(i.e., with a probability of 1). The remaining PSUs were grouped into major
strata formed by Census region, metro status,
and literacy level, where literacy level was based on results from PIAAC
2012/14/17. Within each major stratum, PSUs were further grouped into minor
strata formed from one or more proficiency-related variables from the 2015–19
American Community Survey (ACS; U.S. Census Bureau 2020) related to education,
ethnicity, poverty, employment status, marital status, occupation, and health
insurance status.
Once the strata were formed, one PSU was selected per stratum using a
probability-proportional-to-size (PPS) technique.
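For illustration, a minimal sketch of selecting one PSU from a minor stratum with probability proportional to size is shown below. The measure-of-size values are hypothetical; the actual selection applied the full stratification described above.

```python
import numpy as np

def select_one_pps(measures_of_size, rng):
    """Draw one unit with probability proportional to its measure of size (MOS)."""
    mos = np.asarray(measures_of_size, dtype=float)
    return int(rng.choice(len(mos), p=mos / mos.sum()))

rng = np.random.default_rng(42)
# Hypothetical non-certainty PSUs in one minor stratum, with eligible-adult
# counts serving as the measure of size.
stratum_mos = [52_000, 87_500, 61_200, 140_300]
print(select_one_pps(stratum_mos, rng))
```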
Second Stage
The sampling frame of SSUs was constructed from
block-level data in the Census 2020 PL-94 redistricting file, with blocks
combined to reach a minimum size of 120 DUs. Within a PSU, SSUs were sorted
geographically and selected using a systematic PPS technique. This approach
allowed for a diverse sample of SSUs spread across the PSU.
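A minimal sketch of systematic PPS selection within a PSU follows; the SSU dwelling-unit counts are hypothetical, and the units are assumed to be already sorted geographically.

```python
import numpy as np

def systematic_pps(mos, n_select, rng):
    """Systematic probability-proportional-to-size selection: cumulate the
    measures of size of the (geographically sorted) units and take hits at a
    fixed interval from a random start."""
    cum = np.cumsum(np.asarray(mos, dtype=float))
    interval = cum[-1] / n_select
    hits = rng.uniform(0, interval) + interval * np.arange(n_select)
    return np.searchsorted(cum, hits, side="right")  # indices of selected SSUs

rng = np.random.default_rng(7)
ssu_dwelling_counts = [130, 245, 160, 410, 120, 300, 180, 220]  # hypothetical SSUs
print(systematic_pps(ssu_dwelling_counts, n_select=3, rng=rng))
```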
Third Stage
The sampling frame at the third stage, a list of DUs,
was formed from a combination of residential address lists from the U.S. Postal
Service (also known as address-based sampling lists) and lists of DUs made by
field staff (also known as traditional listing) for each sampled SSU. Within an
SSU, DUs were sorted geographically and selected using a systematic random
sample. This resulted in an initial self-weighting sample of DUs (i.e., each DU
had the same overall probability of selection). The initial sample was randomly
divided into a main sample for initial release and a reserve sample to be used
as needed.
Fourth Stage
The fourth stage sampling frame, a list of
individuals, was created through information collected in a screener
questionnaire, in which a household respondent was asked to list people who
lived in the dwelling and had no usual place of residence elsewhere. Individuals
were then selected using a stratified simple random sample, with strata based on
age group (16–65 and 66–74). In the first stratum, one or two 16- to 65-year-olds
were selected depending on household size. Selecting two persons in larger
households (households with four or more 16- to 65-year-olds) helped reduce the
variation due to unequal probabilities of selection. One 66- to 74-year-old
was selected from the second stratum. Therefore, an eligible household could
have one to three individuals selected for the survey.
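The within-household selection rule can be sketched as follows; the household roster is hypothetical, and the sketch assumes only the stated rule of selecting two 16- to 65-year-olds when four or more are on the roster.

```python
import numpy as np

def select_persons(roster_ages, rng):
    """Stratified simple random sampling within a screened household.
    Stratum 1 (ages 16-65): select 1 person, or 2 if the stratum has 4+ members.
    Stratum 2 (ages 66-74): select 1 person, if any."""
    ages = np.asarray(roster_ages)
    core = np.flatnonzero((ages >= 16) & (ages <= 65))
    older = np.flatnonzero((ages >= 66) & (ages <= 74))
    selected = []
    if core.size:
        n_core = 2 if core.size >= 4 else 1
        selected += list(rng.choice(core, size=n_core, replace=False))
    if older.size:
        selected.append(int(rng.choice(older)))
    return sorted(int(i) for i in selected)  # roster positions of selected persons

rng = np.random.default_rng(3)
print(select_persons([14, 17, 23, 41, 44, 70], rng))  # hypothetical roster ages
```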
The U.S. PIAAC Cycle 2 Supplemental State Sample
In addition to the national sample described above,
the U.S. PIAAC Cycle 2 included a supplemental sample in particular states. The
purpose of the supplemental sample was to increase the number and diversity of
sampled counties to improve model-based state-
and county-level estimates. After the national sample of PSUs was selected,
supplemental PSUs were selected so that each state had at least two sampled PSUs
in the combined sample. Then SSUs, DUs, and eligible adults were selected within
the PSUs using the same sampling methods as described for the national sample.
About 2 months into data collection (November 2022), collection for the state
supplemental sample was halted due to funding, resulting in an incomplete
supplemental sample. The U.S. PIAAC Cycle 2 sample was designed to be nationally
representative regardless of whether the supplemental state sample was included.
Therefore, it was possible to combine the incomplete supplemental sample with
the national sample, maintaining a nationally representative sample and
improving the diversity for small area estimation purposes.
Questionnaire and Assessment Development
Background Questionnaire
The PIAAC BQ collected detailed information to support
a wide range of contextual analyses. It facilitates the examination of how skill
proficiency is distributed across various sociodemographic groups of the
population. It also allows for insights into how skills are associated with
outcomes and how they are used in personal and professional contexts. Finally,
it facilitates the investigation of how proficiency is related to investments in
education and training, shedding light on the process of skills formation.
PIAAC Cycle 2 was designed to allow results to be as
comparable as possible with those of PIAAC Cycle 1. At the same time, the survey
instruments were improved in several dimensions.
Revisions to the PIAAC BQ focused on
- Adaptation to international standards, such as the International Standard Classification of Education 2011, the framework used to compare statistics on the educational systems of countries worldwide (UNESCO Institute for Statistics 2012);
- Adaptation to changes in the technological environment;
- Enriched information on the working environment and the use of high-performance work practices to make best use of workers’ skills;
- More detailed information on the pathways respondents followed through their educational careers; and
- A new (optional) section on social and emotional skills. (This option was not included in the U.S. version of the BQ.)
The PIAAC Cycle 2 questionnaire included the following topics:
- Personal and background characteristics;
- Education and training;
- Current employment status and work history;
- Use of skills, skills mismatches, and the working environment;
- Noneconomic outcomes; and
- Social and emotional skills.
The international version of the BQ is available.
U.S. BQ Adaptations
The Consortium developed the PIAAC international
master version of the BQ, which was the basis for the U.S. national BQ. Several
questions were adapted from the international version of the questionnaire to be
appropriate in the U.S. educational and cultural context.
Individual questions were evaluated for analytic relevance and respondent burden
(e.g., recall, clarity, salience),
resulting in several additions and deletions for the field test instrument, with
further revisions for the main study. Participating countries were allowed to
add up to 5 minutes of country-specific items. Instead of including a new
section on social and emotional skills,
which was optional for countries, the U.S. national BQ was modified to include a
21-question module on financial literacy, Section L.
Direct Assessment
The PIAAC Cycle 2 direct assessment (literacy,
numeracy, and adaptive problem solving) tasks focused on respondents’ ability to use
information-processing strategies to solve problems they encounter in their
everyday lives. For more details, see the PIAAC Cycle 2 assessment frameworks.
The assessment tasks and materials were designed to measure a broad set of
foundational skills required to successfully interact with the range of
real-life tasks and materials that adults encounter in everyday life. Completing
these tasks does not require specialized content knowledge or narrowly specific
skills. The skills assessed in PIAAC are considered general skills
required in a very broad range of situations and domains. The PIAAC assessment
was not designed to identify any minimum level of skills that adults must have
to fully participate in society. A feature of the PIAAC assessment common to all
three skill domains is the need to reflect the changing nature of information in
today’s societies due to the prevalence of data-intensive and complex digital
environments. Therefore, many PIAAC assessment tasks are embedded in these kinds
of environments.
For PIAAC Cycle 2, the constructs of literacy, numeracy, and
adaptive problem solving were refined to better reflect the evolution of skills
in complex digital environments. Each domain is briefly described below (OECD
2021).
Literacy is accessing, understanding, evaluating,
and reflecting on written texts in order to achieve one’s goals, to develop
one’s knowledge and potential,
and to participate in society. PIAAC also evaluates adults’ ability to read
digital texts and traditional print-based texts. The revised construct reflects
the growing importance of reading in digital environments, which poses different
cognitive demands and challenges, and the increasing need to interact with
online texts. For PIAAC Cycle 2, some literacy tasks involved multiple sources
of information, including static and dynamic texts that respondents had to
consult to respond. The texts were presented in multiple text formats, including
continuous (e.g., sentences, paragraphs), non-continuous (e.g., charts, tables),
and mixed text, and
reflected a range of genres.
Numeracy is accessing, using,
and reasoning critically with mathematical content, information, and ideas
represented in multiple ways in order to engage in and manage the mathematical
demands of a range of situations in adult life. It is an essential skill in an
age when individuals encounter an increasing amount and wide range of
quantitative and mathematical information in their daily lives. Numeracy is a
skill parallel to reading literacy, and it is important to assess how these
competencies interact because they are distributed differently across subgroups
of the population. For PIAAC Cycle 2, the assessment of numeracy covered
engagement with mathematical information in digital environments. It also
included an assessment of numeracy components, focused on some of the skills
essential for achieving automaticity and fluency in managing mathematical and
numerical information.
Adaptive problem solving (APS) involves the
capacity to achieve one’s goals in a dynamic situation, in which a method for
solution is not immediately available. It requires engaging in cognitive and
metacognitive processes to define the problem, search for information, and apply
a solution in a variety of information environments and contexts. The assessment
explicitly considers individuals’ ability to solve multiple problems in
parallel, which requires individuals to manage the order in which they approach
a list of problems and to monitor opportunities that arise for solving different
problem sets. The assessment of APS in PIAAC Cycle 2 aimed to highlight the
respondents’ ability to react to unforeseen changes and emerging new
information. Results from PIAAC Cycle 2 were not comparable to the assessment of
problem solving in technology-rich environments in PIAAC Cycle 1.
As the objective of PIAAC is to assess how the adult
population is distributed over a wide range of proficiency in each of the
domains assessed, the tasks were designed to capture different levels of
proficiency and vary in difficulty. An adaptive assessment design was employed
in literacy and numeracy to ensure respondents were presented with items that
were challenging for their level of proficiency without being too easy or too
difficult.
Data Collection
The main study data collection was conducted between
September 1, 2022,
and June 16, 2023. A total of 4,637 respondents across the United States
completed the BQ, with 4,574 of them also completing the assessment. This number
includes the core national sample of adults ages 16 to 65 for PIAAC Cycle 2 and
the supplemental sample of adults ages 66 to 74, which was of special interest
to NCES. Although the United States fell short of the designated PIAAC goal for
the number of completed cases due to the low participation rate, the minimum
required for the psychometric modeling was met.
Each sampled household was administered a screener to
determine the eligibility of household members to participate in the survey.
Within households, each selected person completed an
interviewer-administered BQ,
followed by a self-administered tablet-based assessment. Sampled persons who
completed the assessment received an incentive of $100. Sampled households that
had not been contacted in person received a paper version of the screener
questionnaire with an unconditional incentive of $5.
Data Collection Instruments
Before contacting anyone at the sampled address,
interviewers were required to complete a short series of questions called the DU
Observations related to the sampled address. The interviewers completed these
questions using their study iPhone. The information from the DU Observations was
used in nonresponse bias analysis (NRBA) to evaluate whether nonrespondents
lived in homes and environments similar to those of respondents and thus helped
address the generalizability of the data collected from respondents to the whole
population.
The PIAAC household interview was composed of three
distinct instruments: the screener, BQ, and the direct assessment. A short,
self-administered questionnaire called the doorstep interview was also available
for respondents who did not speak English or Spanish, which were the two
languages in which the screener and BQ were available.
(See Figure 1 for an overview of the flow of respondents through the survey.)
Figure 1. Routing flow through the PIAAC instrumentation
Screener
Household members who were 16–74 years old were
eligible to be selected, with up to two persons selected in households with four
or more eligible adults. Interviewers used the screener—a computer-assisted
personal interviewing (CAPI) instrument—to collect the first name, age, and
gender of each household member. The CAPI system conducted a within-household
sampling procedure to select sampled person(s) to participate in the study. In
the United States, the screener was available in English and Spanish.
Partway through data collection, a secondary mode of
screener data collection was added. All households that had received at least
four in-person contact attempts,
but had not yet responded or participated, were sent a paper version of the
screener along with a $5 incentive and a postage-paid envelope to return the
completed questionnaire. Information from the paper screeners was entered into
the tablet to select eligible persons for study participation.
Background Questionnaire
Each sampled person completed the BQ,
which collected respondent information on the following areas: socio-economic
and demographic background; education and training; employment status and work
history; current work or past job; skills used at work and in everyday life;
work practices and the work environment; attitudes and activities; background,
including parents’ education and occupation; and financial literacy. The BQ was
developed as an interviewer-administered CAPI instrument and was conducted on
the interviewer’s tablet. In the United States, the PIAAC Cycle 2 U.S. main
study BQ was available in English and Spanish.
Direct Assessment
Each sampled person completed the assessment using a
tablet. In the United States, the direct assessment was only available in
English. The assessment began with a tablet tutorial to make sure respondents
understood how to interact with the device and the interface. The tutorial
included short video animations that demonstrated actions respondents would use
to complete the assessment items,
such as tapping, dragging and dropping, and highlighting text. It also included
examples of screen layouts and response option formats for the various
assessment tasks. After practicing the tutorial, the sampled person completed
the locator (also referred to as Stage 1),
which was composed of eight numeracy and eight literacy items. The sampled
person then was routed to a combination of literacy, numeracy, or APS tasks of
different difficulty levels.
The APS assessment items were divided into five
clusters, with respondents exposed to two randomly selected clusters of items.
The literacy and numeracy assessments used a hybrid multistage adaptive/linear
design. The adaptive component of the design was based on six different testlets
administered in Stage 2, with three low-difficulty testlets and three
high-difficulty testlets. Assignment to Stage 2 testlets depended on performance
on the locator test and personal characteristics collected in the BQ. Stage 3
also featured six testlets: two of low difficulty, two of medium difficulty, and
two of high difficulty. The assignment to testlets in Stage 3 was driven by
performance in Stage 2. Finally, a linear component was introduced to ensure
that each item was attempted by a sufficient number of respondents from a wide
proficiency range.
The OECD developed the criteria for determining the adaptive design routing through the assessment paths based on respondent performance. Respondents who failed the locator were routed to the
Components section,
which measured basic numeracy and reading skills. Twenty-five
percent of the respondents who did well on the locator were also randomly routed
to the Components section before completing the assessment items, while the
majority of these respondents (75 percent) were routed directly to literacy,
numeracy, or APS. Respondents who performed well on the locator received a
combination of two of the following direct assessment instruments—the two-stage,
adaptive modules of literacy or numeracy testlets; the two-stage, linear modules
of literacy or numeracy testlets; or the linear APS clusters. Respondents who
passed the locator but performed relatively poorly received a combination of two
of the following direct assessment instruments: the two-stage, adaptive modules
of literacy or numeracy testlets, the two-stage, linear modules of literacy or
numeracy testlets, or
the linear APS clusters.
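A simplified sketch of this routing is given below. The locator pass threshold and the mechanics of the 25/75 split are placeholders; the actual routing criteria were developed by the OECD and also drew on proficiency-based performance and BQ characteristics, and the follow-up path for respondents who failed the locator is not shown.

```python
import random

def route_respondent(locator_score, rng):
    """Simplified illustration of the locator-based routing described above.
    The pass threshold of 6 (out of 16 locator items) is a placeholder."""
    path = ["tablet tutorial", "locator"]
    if locator_score < 6:
        path.append("components")        # basic reading and numeracy skills
        return path                      # (subsequent routing not shown)
    if rng.random() < 0.25:              # 25 percent of passers also take components
        path.append("components")
    modules = ["adaptive literacy/numeracy testlets",
               "linear literacy/numeracy testlets",
               "linear APS clusters"]
    path += rng.sample(modules, k=2)     # combination of two instrument types
    return path

print(route_respondent(locator_score=12, rng=random.Random(11)))
```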
After the completion of the direct assessment, a set
of Effort and Performance questions asked respondents about the effort they put
into completing the assessment and how well they thought they performed on the
assessment.
Doorstep Interview
The doorstep interview was a short questionnaire
available on the tablet for sampled persons who had a language barrier and were
unable to complete the BQ in English or Spanish. The doorstep interview was
designed to obtain key information on the characteristics of respondents who
would have been classified as literacy‐related
nonrespondents in the first cycle. These individuals were essential to the
population model for the estimation of proficiencies,
and some information related to their background characteristics helped improve
the population model and contributed to the analysis and reporting of key
findings.
Interviewers used a language identification card,
which listed the languages in which the doorstep interview was available, to
ascertain the language spoken by the sampled person. The questionnaire was then
presented to the sampled person on the tablet in their preferred language. The
short series of questions collected information on respondent gender, age, years
of education, current employment status, country of birth, and number of years
in the United States (if nonnative). In the United States, the doorstep
interview was available in 11 languages: Arabic, Chinese (simplified), Chinese
(traditional), Farsi, Korean, Punjabi, Russian, Somali, Spanish, Urdu, and
Vietnamese.
Post-Interview Questionnaire
After the interview was completed, interviewers
completed a brief post-interview questionnaire to record where the interview
took place, whether the sampled person requested assistance with the BQ or
assessment (from the interviewer or other household members), or if there were
any events that may have interrupted or distracted the sampled person during the
interview.
Field Staff Training
To ensure that all interviewers were trained
consistently across participating countries, the Consortium provided a
comprehensive interviewer training package,
including manuals and training scripts to be used by national training teams.
Countries could adapt training materials to their national context as needed.
The Consortium recommended that countries provide all interviewers with
approximately 20 hours of training,
which included general interviewer training and PIAAC-specific
training content. All interviewers in the United States received 2 weeks of
training (approximately 40 hours).
As a result of the COVID-19 pandemic, countries were
allowed to adapt the field interviewer training program from the PIAAC Cycle 1
in-person model to a hybrid model,
with training sessions delivered both in person and virtually. The interviewer
training program in the United States included virtual delivery of
administrative procedures, general interviewing techniques, and introductory
training sessions. In-person training maximized trainee involvement and
emphasized gaining respondent cooperation skills, answering questions about the
study, and practicing the administration of all interview components (i.e.,
the screener, BQ, doorstep interview, direct assessment, and the post-interview
questionnaire).
To ensure that the interviewer training conducted by
national teams met the requirements specified in the PIAAC Technical Standards
and Guidelines (OECD 2022), each country, including the United States,
submitted a summary training report within a month of completing national
training and a final training report within a month of the end of data
collection, reporting on any additional attrition trainings held during the field period.
Fieldwork Monitoring
The requirements for monitoring data collection
throughout the field period were specified in the PIAAC Technical Standards and
Guidelines (OECD 2022). These included monthly submission of sample monitoring
and survey operations reports during data collection. The Consortium provided an
international dashboard and specifications for management and monitoring reports
to be used by national teams overseeing data collection. These reports provided
information about interviewer productivity, time of interview, overall interview
timing and timing of individual instruments completed, time elapsed between
interviews, and validation reports. The Consortium required validation of 10
percent of each interviewer’s finalized cases. Countries were also required to
monitor the quality of interviewer fieldwork by reviewing two audio-recordings
of interviews (BQ) completed by each interviewer and the data quality of
completed interviews. Finally, all national teams were required to attend a
series of quality control calls with the Consortium to report on the status of
data collection.
The United States submitted the required monthly
reports and completed four quality control calls with the Consortium during the
field period. Monitoring of fieldwork was implemented using a corporate
dashboard that displayed key quality performance indicators to track interviewer
productivity (interviews started too early or too late in the day, multiple
interviews completed in a day, time elapsed between interviews, etc.) and data
quality (short BQ and/or assessment timings, BQ item response rate for completed
interviews, etc.). Additional automated validation of all completed screeners
and interviews was completed using geospatial location software,
which captured geospatial data by using the GPS feature on the interviewers’
mobile devices.
As required by the Consortium, each interviewer’s 3rd
and 10th BQ interview was reviewed, and corrective feedback was provided as
needed. Telephone validation of 10 percent of each interviewer’s finalized cases
was also implemented. Data quality checks included consistency checks on
respondent age and gender obtained in the screener versus the BQ, checks on
open-ended responses, missing data, and data frequencies.
U.S. Response Rates
This section provides information on the coverage of
the target population, weighted response rates, and the total number of
households and persons for U.S. PIAAC Cycle 2. For information on the other
PIAAC Cycle 2 participating countries, please refer to the upcoming OECD PIAAC Cycle 2
technical report.
As table 1 shows, the U.S. PIAAC Cycle 2 covered
nearly 100 percent of the target population, with an exclusion rate of 0.5
percent. The U.S. PIAAC Cycle 2 achieved an overall weighted response rate of 28
percent, which is the
product of 50 percent, 56 percent, and 99 percent for the screener, BQ,
and assessment, respectively. The overall response rate ranges from 27 percent
to 73 percent across countries that participated in the PIAAC Cycle 2, including
four countries with an overall response rate below 30 percent and one country
above 70 percent. As table 2 shows, the U.S. PIAAC Cycle 2 sample included
16,414 households and 7,754 individuals. Among the 4,637 individuals who
responded to the BQ, 4,574 also responded to the assessment. The response rates
in table 1 are based on the PIAAC core national sample only because the data
collection for the state supplemental sample ended prematurely. The sample size
and numbers of respondents in table 2 include the core national sample and state
supplemental sample. Both tables are for the population ages 16–74.
Tables 1 and 2 are also available in Excel.
Table 1. Target population coverage and weighted response rates for U.S. PIAAC Cycle 2

Country | Percentage of target population coverage | Overall exclusions from national target population | Weighted screener response rate | Weighted BQ response rate | Weighted assessment response rate | Overall weighted response rate |
---|---|---|---|---|---|---|
United States | 99.6% | 0.5% | 50.2% | 56.1% | 98.9% | 27.8% |

Table 2. Number of households and persons in the sample and number of BQ and assessment respondents for U.S. PIAAC Cycle 2

Country | Households in sample | Persons in sample | BQ respondents | Assessment respondents |
---|---|---|---|---|
United States | 16,414 | 7,754 | 4,637 | 4,574 |
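As a check on table 1, the overall weighted response rate is the product of the stage-level rates; applying the rounded rates from the table reproduces the reported overall rate up to rounding:

$$ RR_{\text{overall}} = RR_{\text{screener}} \times RR_{\text{BQ}} \times RR_{\text{assessment}} = 0.502 \times 0.561 \times 0.989 \approx 0.28. $$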
Data Cleaning and Coding
To ensure the delivery of a high-quality, clean data
file in a standardized layout, all countries participating in PIAAC Cycle 2 were
required to use the Data Management Expert (DME), a data management software
package. The DME was used in conjunction with the International Association for
the Evaluation of Educational Achievement (IEA)-supplied Data Management Manual
and Technical Standards & Guidelines (OECD 2022) to
- Integrate screener, BQ, and assessment data;
- Clean and verify data through edit checks;
- Export data for coding (e.g., occupation) and import coded data; and
- Produce the final dataset for delivery.
Data cleaning ensured that all information in the
database conformed to the internationally defined data structure,
the national adaptations to questionnaires were reflected appropriately in
codebooks and documentation,
and all variables selected for international comparisons were comparable across
systems. Data edits fell into two categories. Validation checks verified that
case IDs were unique and that values conformed to expected values/ranges. Record
consistency checks identified linkage problems between data tables and potential
issues in the sample or survey control file data. Throughout data collection,
the record consistency checks were closely monitored for discrepancies in
demographic information between the screener and the BQ. Because these
discrepancies could be a signal that a person other than the selected household
member had erroneously completed the BQ and assessment, it was critical to
resolve these issues as early as possible.
The entire suite of edit checks was run periodically
and at the conclusion of data collection. Data fixes were applied where
appropriate, and
reasons for acceptable anomalies were determined. All issues and their outcomes
were documented and submitted to IEA with the data file delivery. IEA conducted
further review and cleaning, resolving issues as needed.
The DME also facilitated the integration of coded data
for verbatim responses related to occupation, industry, language, and country.
IEA provided the coding schemes to be used:
- 2008 International Standard Classification of Occupations (ISCO-08; International Labour Organization 2012) was used to code occupations reported in the BQ. Occupational coding was done to the four-digit level when enough information was available.
- International Standard Industrial Classification of All Economic Activities (ISIC), Revision 4 (United Nations Statistics Division 2007) was followed to assign industry codes. Industry coding was done to the four-digit level when enough information was available.
- ISO 639-2 alpha-3 was used for languages learned at home during childhood and languages spoken at home.
- The UN M.49 coding scheme was used to code the country of birth and the country of highest education.
One additional coded variable—identifying the
respondent’s geographic region using the OECD TL2 coding scheme (OECD 2013)—was
suppressed in the U.S. PIAAC dataset because the U.S. sample was not designed to
be representative at the regional or state level.
Verbatim responses from the BQ were exported from the
DME and coded in a separate coding software system. All coding was 100 percent
verified or double-coded to ensure accuracy and consistency across coding staff. The coded data were then
imported into the DME to their appropriate tables for delivery along with the
other study data.
After importing the coded data and reviewing all data
edit checks, additional frequency review and data reconciliation checks were
performed to ensure data were loaded correctly and were in sync with disposition
codes from the PIAAC Study Management System (SMS). The SMS final disposition
codes were compared against the aggregate of data available for each case; some
technical problem cases were discovered by identifying disparities between the
disposition codes and the lack of or incompleteness of data in the DME. Cases
with disparities were reviewed closely; in some instances,
this review yielded the recovery of the missing data. Throughout the process,
possible errors were investigated, documented, and resolved before the delivery
of the final dataset to IEA. The remaining discrepancies were documented in the
final delivery notes delivered along with the final data files.
Weighting in the United States
While the U.S. PIAAC sample is nationally
representative, analysis of the sample data requires the use of weights that
facilitate accurate estimation of population characteristics. The weights were
constructed to account for the complex sample design and nonresponse patterns
and were further adjusted through a calibration and trimming process that used
external population data from the ACS (U.S. Census Bureau 2020) to potentially
improve the accuracy of weighted estimates. The weights also accounted for the
combining of the national sample and the state supplemental sample into a single
sample.
For the national sample, sampling weights were
constructed at each stage of the four-stage sample design. For the sampling of
PSUs, SSUs, DUs, and individuals within DUs, sampling weights were derived as
the inverse of the probability of random selection with adjustments to account
for nonresponse. For sampled DUs that did not yield a complete screener
questionnaire due to nonresponse and for sampled individuals who did not
complete a BQ, nonresponse adjustments were applied to the weights so that
nonrespondents could be represented by respondents with similar characteristics.
For cases where nonresponse was attributable to literacy-related reasons, the
nonresponse adjustments specifically used doorstep interview respondents to
represent nonrespondents (Van de Kerckhove,
Krenzke and Mohadjer 2020). The stage-specific, nonresponse-adjusted sampling
weights were then combined into a single overall sampling weight for the
national sample.
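A minimal sketch of this weighting logic is shown below. The selection probabilities, adjustment cells, and response indicators are hypothetical, and the actual adjustments were applied separately at each sampling stage with much richer cell definitions (including the doorstep interview adjustment noted above).

```python
import numpy as np
import pandas as pd

# Hypothetical sampled persons: overall selection probability, nonresponse
# adjustment cell, and whether the BQ was completed.
persons = pd.DataFrame({
    "selection_prob": [0.0004, 0.0004, 0.0006, 0.0006, 0.0006],
    "adj_cell":       ["A",    "A",    "B",    "B",    "B"],
    "responded":      [True,   False,  True,   True,   False],
})
persons["base_weight"] = 1.0 / persons["selection_prob"]  # inverse of selection probability

# Within each cell, inflate respondents' weights so they also represent the
# cell's nonrespondents (total weight in cell / respondent weight in cell).
total_w = persons.groupby("adj_cell")["base_weight"].sum()
resp_w = persons[persons["responded"]].groupby("adj_cell")["base_weight"].sum()
factor = persons["adj_cell"].map(total_w / resp_w)
persons["nr_adjusted_weight"] = np.where(persons["responded"],
                                         persons["base_weight"] * factor, 0.0)
print(persons[["base_weight", "responded", "nr_adjusted_weight"]])
```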
The construction of weights for the supplemental
sample began with poststratification weighting to ensure that the weighted
distribution of respondents aligned with population benchmarks obtained from the
ACS 2022 1-year Public Use Microdata Sample (U.S. Census Bureau 2024). The
poststrata were defined to incorporate respondent characteristics related to the
sample design and proficiency outcomes while satisfying minimum sample size
requirements. Poststratification was used as the initial basis of weighting
because—as a result of the disruptions in data collection—the stage-specific sampling probabilities could not account
for the actual process by which households and persons were contacted for the
survey.
The weights for the national and supplemental state
samples were further adjusted so that the samples could be combined. These
adjustments were applied for every respondent in the supplemental sample and for
every respondent in every PSU of the national sample,
except for the four PSUs that were sampled with certainty because those PSUs
were not eligible to be sampled for the supplemental state sample. The sample
combination adjustments consisted of two steps. In the first step, the same
poststratification process adjustment applied to the state supplemental sample
was applied to the nonresponse-adjusted sampling weights of cases in the
national sample from PSUs not selected with certainty. In the second step,
weights from both samples were scaled by compositing factors, with the values of
these factors determined using the method described by Krenzke and Mohadjer
(2020). Following the compositing adjustment, the weights from the combined
sample could be used to estimate national population characteristics.
The weights of the combined sample underwent raking
and trimming adjustments. The purpose of the raking adjustment was to align
weighted sample distributions with external population benchmarks derived from
ACS data (U.S. Census Bureau 2020) while potentially reducing sampling variance
and possible bias from factors such as nonresponse. The purpose of the trimming
adjustment was to reduce variation in the weights and thereby reduce the
variance of weighted estimates. The variables used for the raking adjustment
were related to age, gender, race and ethnicity, educational attainment, country
of birth (United States or outside the United States), and place of residence.
The largest weights for doorstep interview respondents were trimmed before the
other raking and trimming adjustments were made.
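A minimal sketch of the raking step (iterative proportional fitting) is shown below; the margins, categories, and benchmark totals are hypothetical and far simpler than the raking dimensions listed above.

```python
import numpy as np

def rake(weights, margins, targets, n_iter=50):
    """Iteratively scale weights so that weighted totals match each margin's
    benchmark totals (iterative proportional fitting)."""
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(n_iter):
        for cats, target in zip(margins, targets):
            for category, benchmark in target.items():
                mask = (cats == category)
                current = w[mask].sum()
                if current > 0:
                    w[mask] *= benchmark / current
    return w

# Hypothetical sample of six respondents with two raking margins.
gender = np.array(["F", "M", "F", "M", "F", "M"])
age = np.array(["16-30", "16-30", "31-65", "31-65", "66-74", "66-74"])
initial_weights = np.full(6, 100.0)
raked = rake(initial_weights, [gender, age],
             [{"F": 320, "M": 300}, {"16-30": 180, "31-65": 260, "66-74": 180}])
print(raked.round(1), raked.sum())
```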
The final raking adjustment yielded the final weights
for the combined sample. The final weights are accompanied by 80 sets of
replicate weights constructed using Fay’s method of balanced repeated
replication (Judkins 1990). Each set of replicate weights underwent the same
stages of weighting adjustments described above so that the replicate weights
could be used to estimate variances accounting for both the complex design of
the U.S. PIAAC sample and the many adjustments used to produce the final
weights.
Changes to the Assessment Administration, Content, and Scaling between Cycles 1 and 2
Differences in scores between PIAAC Cycle 1 (2012/14 and 2017) and PIAAC Cycle 2 (2023) are discussed in the U.S. PIAAC Highlights Web Report. Comparisons of 2023 results with 2012/14 and 2017 PIAAC assessments need to be made with caution due to changes in the assessment and scaling methodology.
Key changes for Cycle 2 include:
- Move to exclusive use of tablets for the administration of the survey and assessment in the United States, whereas in previous PIAAC assessments, respondents were offered an option of responding by laptop or in a paper-and-pencil format.
- Framework changes resulting in more items, as well as more interactive items. The literacy and numeracy frameworks were updated for Cycle 2 to reflect the technological and social developments that affect the nature and practice of numeracy and literacy skills and methodological developments in the understanding of the skills measured.
- Design changes that allowed for greater accuracy in routing participants to different paths based on their proficiency. In Cycle 1, a computer or paper-based “core” (i.e., 4 literacy and 4 numeracy items at a very low level) was used to assess whether an individual had sufficient basic skills to take the full assessment. In Cycle 2, a tablet “locator” test (with 8 literacy and 8 numeracy items ranging from very low to medium levels) was used to route participants to different paths based on their level of proficiency.
- Changes in scaling methodology to include the reading and numeracy components data in the proficiency estimates. To improve the precision of the estimates of proficiency at the bottom of the skills distribution, Cycle 2 incorporated performance in the component assessments in estimating the literacy and numeracy proficiency of respondents.
Data Limitations
As with any survey, PIAAC data are subject to both sampling and nonsampling
errors. Sampling error arises because only a sample of the target population is
surveyed rather than the entire population. Nonsampling error can occur during
data collection and data processing. Researchers should take both sources of
error into consideration when producing estimates using PIAAC data.
Sampling Errors
Sampling error is the uncertainty in an estimate that
arises when not all units in the target population are measured. This
uncertainty, also referred to as sampling variance, is usually expressed as the
standard error of a statistic estimated from sample data. There are two commonly
used approaches for estimating variance for complex surveys: replication and
Taylor series (linearization). The replication approach was used for PIAAC
because of the need to accommodate the complexities of the sample design, the
generation of plausible values (PVs), and the impact of the weighting
adjustments. The specific replication approach used for calculating standard
errors in PIAAC Cycle 2 was balanced repeated replication with Fay’s adjustment
(factor = 0.3).
For estimates that do not involve PVs, the estimates
of standard errors are based entirely on sampling variance. For estimates
involving PVs, calculations of standard errors must account for both the
sampling variance and the variance due to imputation of PVs. The imputation
variance reflects uncertainty due to inferring adults’ proficiency estimates
from their observed performance on a subset of assessment items and other
proficiency-related information.
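The sketch below illustrates how these two components are commonly combined: sampling variance is estimated from Fay-adjusted replicate estimates (factor = 0.3), and imputation variance from the variation across plausible-value estimates. The number of plausible values and the example figures are hypothetical, and the exact formulas used for PIAAC should be taken from the technical report; the combination shown follows the approach commonly used in large-scale assessments.

```python
import numpy as np

FAY_K = 0.3   # Fay adjustment factor used for the PIAAC Cycle 2 replicate weights

def brr_fay_sampling_variance(full_estimate, replicate_estimates, k=FAY_K):
    """Sampling variance from balanced repeated replication with Fay's adjustment,
    computed over the replicate estimates (80 replicates in U.S. PIAAC)."""
    reps = np.asarray(replicate_estimates, dtype=float)
    return np.sum((reps - full_estimate) ** 2) / (len(reps) * (1.0 - k) ** 2)

def combine_pv_variance(pv_estimates, pv_sampling_variances):
    """Combine sampling and imputation variance across M plausible values
    (the Rubin-style combination commonly used in large-scale assessments)."""
    est = np.asarray(pv_estimates, dtype=float)
    m = len(est)
    sampling_var = np.mean(pv_sampling_variances)   # average BRR variance over PVs
    imputation_var = np.var(est, ddof=1)            # variance across PV estimates
    total_var = sampling_var + (1.0 + 1.0 / m) * imputation_var
    return est.mean(), np.sqrt(total_var)           # point estimate and standard error

# Hypothetical full-sample and replicate estimates for a single statistic.
rng = np.random.default_rng(1)
full_est = 271.8
rep_ests = full_est + rng.normal(0, 1.0, size=80)
print(brr_fay_sampling_variance(full_est, rep_ests))

# Hypothetical inputs: one mean per plausible value and its BRR sampling variance.
pv_means = [271.8, 272.3, 271.5, 272.0, 271.9]
pv_vars = [1.21, 1.18, 1.25, 1.20, 1.22]
print(combine_pv_variance(pv_means, pv_vars))
```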
Standard errors for all BQ items in the U.S. national
public-use file (PUF) can be found in the forthcoming PUF compendia. The
compendia are intended to help PUF users become familiar with the contents of
the PUF and to verify that they are performing PUF analyses correctly.
Details of estimating standard errors in the PIAAC Cycle 2 U.S. results can be found in appendix E (Data User Guide) of the
upcoming PIAAC Cycle 2 U.S. technical report.
Population
The results presented for the 2012/14 and 2017 PIAAC assessments are for those individuals who could respond to PIAAC in either English or Spanish. The results for PIAAC Cycle 2 also included adults who did not speak English or Spanish, who were given a short, self-administered survey of background information in the language they identified as the one they best understood. This allowed for an estimation of their proficiency. (See Doorstep Interview in the Data Collection section for more information.)
Nonsampling Errors
Nonsampling error can result from factors such as
undercoverage of the target population, nonresponse by sampled households and
persons, differences between respondents’ interpretations of the survey
questions and the questions’ intended meaning, data preparation errors, and differences in the assessments and scoring methodology between cycles.
Unlike sampling errors, nonsampling errors are often difficult to measure.
Although PIAAC strives to minimize errors through quality control and weighting
procedures, some errors inevitably remain in the data.
Missing Data
PIAAC used a standard scheme for missing values.
Designated missing codes were used to indicate don’t know, refused, not stated
or inferred, and valid skips. The assessment items also included a missing code
for not administered. For more details on the missing codes, please see appendix
E (Data User Guide) of the upcoming PIAAC Cycle 2 U.S. technical report.
The key BQ variables (e.g., age, gender, highest
education level, employment status, country of birth) had either no or very
little missing data. For a complete list of item response rates, please see the
upcoming PIAAC Cycle 2 U.S. technical report.
Confidentiality and Disclosure Limitations
The NCES Standard 4-2, Maintaining Confidentiality
(NCES 2002) provides guidelines for limiting the risk of data disclosure for
data released by NCES. Confidentiality analyses were conducted on the PIAAC data
in accordance with the NCES Standard. The analyses included a three-step process
to reduce disclosure risk: (1) determine the disclosure risk arising from
existing external data, (2) apply data treatments using a method called data
swapping, and (3) coarsen the data (e.g., top- and bottom-coding, categorization of continuous
data). Swapping, which involves the random exchange of data elements between like
cases, was designed not to significantly affect estimates of means and variances for the whole
sample or reported subgroups (Krenzke et al. 2006). Careful consideration was
given to protect respondent privacy while preserving data utility to the
greatest extent possible. Please refer to the upcoming PIAAC Cycle 2 U.S.
technical report for more details on the data confidentiality process.
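For illustration, a toy version of the swapping idea is sketched below: a small random subset of cases is paired and their values for one variable are exchanged. Actual disclosure-limitation swapping pairs cases that are alike on matching characteristics and uses controlled, undisclosed swap rates; neither is reproduced here.

```python
import numpy as np

def swap_values(values, swap_rate, rng):
    """Toy data swapping: randomly pair a fraction of cases and exchange their
    values for one variable. (The matching of 'like' cases used in practice
    is omitted here.)"""
    v = np.array(values, copy=True)
    n_pairs = max(1, int(len(v) * swap_rate / 2))
    idx = rng.choice(len(v), size=2 * n_pairs, replace=False)
    first, second = idx[:n_pairs], idx[n_pairs:]
    v[first], v[second] = v[second].copy(), v[first].copy()  # exchange paired values
    return v

rng = np.random.default_rng(5)
ages = np.arange(16, 76)                    # hypothetical variable to be swapped
print(swap_values(ages, swap_rate=0.05, rng=rng)[:10])
```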
The following files, included in the PIAAC data dissemination products, were produced following the aforementioned three-step process:
- U.S. national PUF (ages 16–74);
- international PUF (ages 16–65); and
- U.S. national restricted-use file (RUF) (ages 16–74).
The RUF contains noncoarsened, swapped data,
and the PUF contains coarsened, swapped data. Data were also added to two
web-based data tools, the NCES International Data Explorer (IDE) and the OECD
IDE, following the confidentiality procedures established for disseminating data
via data tools. Both the NCES and OECD IDE enable the user to create statistical
tables and charts for adults ages 16–65, while the NCES IDE also facilitates
analyses on 66- to 74-year-olds.
PIAAC Cycle 2 participants were informed that their
privacy was protected throughout all phases of the study and that the information
they provided could be used only for statistical purposes and would not be disclosed,
or used, in identifiable form for any other purpose except as required by law
(20 U.S.C. §9573 and 6 U.S.C. §151). Individuals are never identified in any
releases (data files, reports, tables, etc.) because reported statistics only
refer to the United States as a whole or to national subgroups. Participants’
names, addresses, and
any contact information collected during the interviews are excluded from the
final datasets.
All individuals who worked on PIAAC field data
collection, including
supervisors and interviewers,
were required to sign a PIAAC confidentiality agreement. All employees who
worked on any aspect of the study, including the management of data collection,
data creation, data dissemination, data analysis, and reporting,
signed an affidavit of nondisclosure.
Statistical Procedures
Test of Significance
Patterns observed in the sample may not be present in
the population. For example, in the sample, the average literacy score for one
region may by chance be higher than for other regions, but,
in fact, that region
might have a lower average literacy score in the population. Statistical
significance tests are commonly used by analysts to help assess whether a
pattern observed in the sample is also present in the population. The result of
a test is said to be statistically significant if the pattern in the sample is
determined to be unlikely to have occurred if the pattern was not also present
in the population (i.e., unlikely to have been a matter of random chance). However, when an observed difference among groups in the
sample is described as statistically significant, that does not necessarily mean
that the difference among the groups in the population is meaningfully large.
The NCES Statistical Standards (NCES 2012) require reported analyses to focus on
differences that are substantively important rather than merely statistically
significant, and the standards note that “it is not necessary, or desirable, to
discuss every statistically significant difference” in reported analyses.
Results of statistical significance tests should be reported with underlying
estimates and accompanying measures such as standard errors, coefficients of
variation, or confidence intervals (Wasserstein and Lazar 2016).
Statistical significance tests rely on estimates of
sampling variance. As such, it is necessary to use the variance estimation
methods described in the upcoming PIAAC Cycle 2 U.S. technical report. Analysts
should use the provided replicate weights; for analyses involving PVs, analysts
should also use the variance estimation formula provided in the technical report
that accounts for the imputation variance of the PVs. These variance estimation
methods are highly flexible and may be used for several kinds of statistical
significance tests.
Throughout PIAAC reports, t-tests are used to assess
the statistical significance of differences in proficiency scores between two
groups or two periods of time. The reports use two types of t-tests. The first
type of t-test compares estimates from two groups within the U.S. PIAAC sample,
denoted $\hat{\theta}_1$ and $\hat{\theta}_2$. An example of this
type of t-test is a comparison of average literacy scores of men and women in
the United States (NCES 2018). Because of the complex sample design and
imputation process, the two groups’ estimates are not independent. The standard
error of the difference between the two groups’ estimates, denoted
$SE(\hat{\theta}_1 - \hat{\theta}_2)$, must be directly
estimated using the provided replicate weights. Given the two groups’ estimates
and the estimated standard error of their difference, the t-statistic is
computed as follows:

$$ t = \frac{\hat{\theta}_1 - \hat{\theta}_2}{SE(\hat{\theta}_1 - \hat{\theta}_2)} $$
A second type of t-test compares estimates from two
independent samples, such as the PIAAC U.S. sample and another country’s sample,
but it is not generally applicable to comparing estimates from two groups within
the PIAAC U.S. sample. For example, the results of this type of t-test are
included in the PIAAC International Highlights Web Report (NCES 2020) for
comparisons of the U.S. average literacy score to that of Japan. This test is
also applicable to the comparison of estimates from PIAAC U.S. samples from
different cycles. For this test, the difference between the two independent
estimates $\hat{\theta}_1$ and $\hat{\theta}_2$ has a sampling variance equal to
the sum of the two estimates’ variances, so the t-statistic for a statistical
significance test may be computed as follows:

$$ t = \frac{\hat{\theta}_1 - \hat{\theta}_2}{\sqrt{SE(\hat{\theta}_1)^2 + SE(\hat{\theta}_2)^2}} $$
Because of the complex sample design, the degrees of
freedom for the reference t-distribution used in tests of significance should be
much smaller than the total number of respondents in the sample. The upcoming
PIAAC U.S. technical report contains guidance on the determination of the degrees of freedom to
use for t-tests and other types of statistical significance tests.
When comparing over time, there is additional uncertainty associated with linking the skills scales of Cycle 2 to those of Cycle 1, because the assessment frameworks and assessment items, although similar, are not identical. This uncertainty is expressed as a “linking error” that is independent of the sample size. The linking error is taken into account when testing the statistical significance of differences in proficiency scores between cycles. The value of the linking error is 3.27 for literacy and 2.95 for numeracy. Details of how standard errors are estimated for the PIAAC Cycle 2 U.S. results can be found in appendix D (Data User Guide) of the upcoming PIAAC Cycle 2 U.S. technical report.
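For illustration, under the common formulation in which the linking error is treated as an additional, independent variance component, the standard error for a cross-cycle comparison would take the following form (whether PIAAC Cycle 2 uses exactly this form is documented in appendix D of the technical report):

$$SE\left(\hat{\theta}_{C2} - \hat{\theta}_{C1}\right) = \sqrt{SE\left(\hat{\theta}_{C2}\right)^2 + SE\left(\hat{\theta}_{C1}\right)^2 + LE^2}$$

where LE is 3.27 for literacy and 2.95 for numeracy, and the t-statistic is the estimated cross-cycle difference divided by this standard error.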
Nonresponse Bias Analysis
Nonresponse bias analysis (NRBA) is used to evaluate
the possible extent of bias originating from differences in proficiency between
those who responded to the survey and those who did not. The proficiency of
survey nonrespondents is unknown, so it is not possible to have an exact measure
of bias in the proficiency estimates. Instead, NRBA provides a way of using
known information about survey respondents and nonrespondents to evaluate the
potential risk for bias in the data.
To reduce nonresponse bias, adaptive survey design
procedures6 were developed by the PIAAC Consortium and followed by the United
States during its data collection (see chapter 3 of the PIAAC Cycle 2 U.S.
technical report). PIAAC Technical Standards and Guidelines (OECD 2022) also
required NRBA for all countries, with additional analyses for those with a
response rate7 below 70 percent.8 With a response rate of 28 percent,9 the United
States conducted the two required sets of analyses: (1) a basic NRBA to identify
differences in respondent and nonrespondent characteristics so that the
weighting process could adjust for the differences, and (2) an extended NRBA to
assess the potential for bias in the final, weighted PIAAC proficiency
estimates.
Basic Nonresponse Bias Analysis
The basic NRBA was used to identify and correct for
bias in respondent characteristics. The analysis was used to inform nonresponse
weighting adjustments for the core national sample and was performed on the
national sample of adults ages 16 to 74, excluding the state supplemental
sample. For this analysis, a classification tree method was applied to divide
the sample into subgroups with different response rates. One tree was fit for
the screener stage and another for the BQ stage. The subgroups were formed using
characteristics that were known for both respondents and nonrespondents and were
related to proficiency,10 such as educational attainment,11 DU type, and urban/rural
designation. Based on the analysis, the strongest predictor of screener response
status was census region, with lower response rates for households in the
Northeast and Midwest. The strongest predictor of BQ response status was the
percentage of the population below 150 percent of the poverty threshold, with
lower response rates for persons in census tracts with lower poverty rates. The
subgroups formed by the classification trees were then used in weighting, with
respondents of similar characteristics representing nonrespondents within the
subgroup. The purpose of the adjustment was to correct for the under- or
overrepresentation of respondents in the identified subgroups, potentially
reducing nonresponse bias in the proficiency estimates. The analysis and
adjustment were based on a limited set of demographic variables, so potential
over- or underrepresentation of certain subgroups might still be present.
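As an illustration of the general approach (not the Consortium’s implementation), the sketch below fits a classification tree that predicts response status from characteristics known for the full sample and then uses the tree’s leaves as nonresponse adjustment cells. The variable names, tree settings, and adjustment formula are assumptions for illustration only.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def nonresponse_adjustment_cells(frame, predictors, responded_col,
                                 base_weight_col, min_leaf=50):
    """Form nonresponse adjustment cells from a classification tree.

    responded_col is a 0/1 indicator of response status.  The tree
    splits the eligible sample into leaves (subgroups) with different
    response rates; within each leaf, respondents' base weights are
    inflated so that respondents also represent the nonrespondents in
    that leaf.
    """
    # One-hot encode categorical predictors; numeric predictors pass through.
    X = pd.get_dummies(frame[predictors], drop_first=True)
    tree = DecisionTreeClassifier(min_samples_leaf=min_leaf, random_state=0)
    tree.fit(X, frame[responded_col])

    cells = frame.copy()
    cells["cell"] = tree.apply(X)  # leaf index serves as the adjustment cell

    # Adjustment factor: total base weight of all eligible cases in the cell
    # divided by the total base weight of respondents in the cell.
    totals = cells.groupby("cell")[base_weight_col].sum()
    resp_totals = (cells[cells[responded_col] == 1]
                   .groupby("cell")[base_weight_col].sum())
    factors = (totals / resp_totals).rename("adj_factor").reset_index()

    cells = cells.merge(factors, on="cell", how="left")
    cells["nr_adjusted_weight"] = cells[base_weight_col] * cells["adj_factor"]
    # Nonrespondents carry no weight after the adjustment.
    cells.loc[cells[responded_col] == 0, "nr_adjusted_weight"] = 0.0
    return cells
```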
Extended Nonresponse Bias Analysis
The United States performed the extended NRBA after
weights and proficiency scores (PVs) were produced. The purpose was to provide
an indication of data quality by evaluating the effect of the data collection
procedures, adaptive survey design, and weighting adjustments on nonresponse
bias and the potential for bias in the final proficiency estimates. Highlights
from two key analyses are provided below, with complete results found in the
upcoming PIAAC Cycle 2 U.S. technical report.
The level-of-effort analysis evaluates how
proficiency estimates change as the number of contacts increases. If
nonrespondents are assumed to be similar to hard-to-reach respondents, the
analysis can provide an indication of the potential for nonresponse bias. This
analysis was performed using the final sample of adults ages 16 to 74, which
included the core national sample and the state supplement. Individuals who
responded on the first attempt (10 percent of respondents) scored 12–14 points
lower than the overall average for the three proficiency domains (literacy,
numeracy, and APS). Cumulatively, those who responded on either the first or
second attempt (37 percent) scored 5–6 points lower than the overall average. By
the fourth attempt (cumulatively 57 percent of respondents), the average
proficiency score was within 1 point of the overall average for each of the
three domains. The results indicated the strong potential for nonresponse bias
if only one or two attempts had been made.
Therefore, it could be inferred that contact protocols requiring multiple
contact attempts likely reduced nonresponse bias in the final PIAAC outcomes.
The analysis relied on respondent data and the assumption that nonrespondents
were similar to hard-to-reach respondents. The actual proficiency of the
nonrespondents and the effect of nonresponse on the overall proficiency
estimates are unknown.
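A simplified version of this level-of-effort computation is sketched below: the weighted mean proficiency of the cumulative set of respondents obtained by each successive contact attempt is compared with the overall weighted mean. The column names are placeholders, and a full analysis would use all plausible values and replicate weights rather than the single score column shown here.

```python
import numpy as np
import pandas as pd

def level_of_effort(respondents, attempt_col, score_col, weight_col):
    """Cumulative weighted mean proficiency by the number of contact
    attempts needed to obtain the interview, compared with the overall
    weighted mean.  A large gap for low-effort cases suggests that
    stopping after one or two attempts would have yielded biased
    estimates.
    """
    overall = np.average(respondents[score_col], weights=respondents[weight_col])
    rows = []
    for k in sorted(respondents[attempt_col].unique()):
        cum = respondents[respondents[attempt_col] <= k]
        cum_mean = np.average(cum[score_col], weights=cum[weight_col])
        rows.append({
            "attempts": k,
            "cumulative_pct_of_respondents": 100 * len(cum) / len(respondents),
            "cumulative_mean": cum_mean,
            "diff_from_overall": cum_mean - overall,
        })
    return pd.DataFrame(rows)
```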
Explained variation in outcomes (EVO) is a measure
that describes how much information is known about the proficiency of the target
population based on the respondent data (i.e., proficiency scores) and the
additional characteristics used in weighting adjustments. The EVO can range from
0 to 100 percent, with a higher EVO indicating that there is more information
about the proficiency level of the target population and less potential for
nonresponse bias. The EVO is approximately equal to RR + (1 – RR)*R2,
where RR is the response rate (28 percent for the United States),
and R2 is based on a regression model with the proficiency
score as the dependent variable and the weighting variables as predictors. The
regression R2 indicates the strength of the relationship
between the weighting variables and the proficiency score and can be thought of
as the amount of information about proficiency that is explained by the
weighting variables.
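For a sense of the magnitudes involved, with the U.S. response rate of 28 percent and an illustrative regression R² of 0.40 (a hypothetical value used only for this example), the formula gives

$$\mathrm{EVO} \approx 0.28 + (1 - 0.28) \times 0.40 \approx 0.57$$

that is, about 57 percent, which falls within the 56–59 percent range reported for the United States below.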
For the United States, the EVO was calculated based on
the national sample
of adults ages 16 to 74, excluding the state supplemental sample because
response rates could not be calculated for the state supplemental sample. The United States’ EVO ranged from 56 to 59 percent for the three proficiency
outcomes. This meant that data from respondents, together with the weighting
variables for nonrespondents, were estimated to explain 56–59 percent of the
proficiency distribution among the eligible sample cases, compared with the 28
percent explained by respondent data alone. The results indicated that the
weighting variables contributed valuable information about the nonrespondents’
proficiency, making the weighting adjustment effective in reducing bias in the
proficiency estimates. An EVO of 56–59 percent would be equivalent to a response
rate of 56–59 percent where no weighting adjustments were performed to reduce
nonresponse bias (or where the weighting variables have no relationship to
proficiency, i.e., R² = 0), and the data should be considered
with the same level of caution. Based on international criteria for PIAAC, an
EVO threshold of 50 percent was used to distinguish between a high level of
caution and a moderate level of caution regarding the potential for nonresponse
bias, with the United States falling within the moderate range. Given that the
EVO was below 100 percent, the weighting variables did not provide complete
information about the proficiency level of the nonrespondents, and the potential
for nonresponse bias remains.
In general, lower response rates are associated with a
higher risk for nonresponse bias if nonrespondents are very different from
respondents and if the weighting was not effective in reducing those
differences. The extended NRBA provides evidence that data collection
procedures, adaptive survey design, and weighting adjustments were effective in
reducing nonresponse bias. There were no indications of serious concerns in the
final estimates. However, it is not possible to know or quantify the actual
extent of nonresponse bias, and data users should be aware of the potential for
bias in the final PIAAC estimates.
References
International Labour Organization. (2012).
International Standard Classification of Occupations 2008 (ISCO-08): Structure,
group definitions and correspondence tables. Retrieved from
https://www.ilo.org/publications/international-standard-classification-occupations-2008-isco-08-structure.
Judkins, D.R. (1990). Fay’s method for variance
estimation. Journal of Official Statistics, 6(3): 223–239.
Kish, L. (1965). Survey Sampling. New York: John Wiley & Sons.
Krenzke, T., Roey, S., Dohrmann, S.M., Mohadjer, L.,
Huang, W-C., Kaufman, S., and Seastrom, M. (2006). Tactics for Reducing the Risk
of Disclosure Using the NCES DataSwap Software. Proceedings of the American
Sociological Association: Survey Research Methods Section. Philadelphia:
American Sociological Association.
Krenzke, T., and Mohadjer, L. (2020). Application of
probability-based link-tracing and non-probability approaches to sampling
out-of-school youth in developing countries. Journal of Survey Statistics and
Methodology. Retrieved from
https://doi.org/10.1093/jssam/smaa010.
National Center for Education Statistics. (2002).
Maintaining Confidentiality: NCES Standard: 4-2. In NCES Statistical
Standards. Retrieved from
https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2003601.
National Center for Education Statistics. (2012).
2012 Revision of NCES Statistical Standards: Final. Retrieved from
https://nces.ed.gov/statprog/2012/.
National Center for Education Statistics. (2018).
Data Point: Literacy and Numeracy Skills of U.S. Men and Women (NCES
2018-164). Retrieved from
https://nces.ed.gov/pubs2018/2018164/index.asp.
National Center for Education Statistics. (2020).
PIAAC International Highlights Web Report (NCES 2020-127). Retrieved from
https://nces.ed.gov/surveys/piaac/international_context.asp.
Organisation for Economic Co-operation and
Development. (2013). Large Regions, TL2: Demographic Statistics.
Retrieved from
https://www.oecd-ilibrary.org/urban-rural-and-regional-development/data/large-regions-tl2/demographic-statistics_data-00520-en.
Organisation for Economic Co-operation and
Development. (2021). The Assessment Frameworks for Cycle 2 of the Programme
for the International Assessment of Adult Competencies. Retrieved from
https://doi.org/10.1787/4bc2342d-en.
Organisation for Economic Co-operation and
Development. (2022). Cycle 2 PIAAC Technical Standards and Guidelines.
Retrieved from
https://www.oecd.org/content/dam/oecd/en/about/programmes/edu/piaac/technical-standards-and-guidelines/cycle-2/PIAAC_CY2_Technical_Standards_and_Guidelines.pdf.
UNESCO Institute for Statistics. (2012).
International Standard Classification of Education 2011. Retrieved from
https://spca.education/wp-content/uploads/2024/03/international-standard-classification-of-education-isced-2011-en.pdf.
United Nations Statistics Division. (2007).
International Standard Industrial Classification of All Economic Activities
Revision 4, Series M: Miscellaneous Statistical Papers, No. 4 Rev. 4. New
York: United Nations. Retrieved from
https://unstats.un.org/unsd/classifications/Family/Detail/27.
U.S. Census Bureau. (2020). American Community
Survey 2015-2019 5-Year Data Release. Retrieved from
https://www.census.gov/newsroom/press-kits/2020/acs-5-year.html.
U.S. Census Bureau. (2024). PUMS Data.
Retrieved from
https://www.census.gov/programs-surveys/acs/microdata/access.2022.html#list-tab-735824205.
Van de Kerckhove, W., Krenzke, T., and Mohadjer, L.
(2020). Addressing Outcome-Related Nonresponse Through a Doorstep Interview. In
JSM Proceedings, Survey Research Methods Section (715–724). Alexandria,
VA: American Statistical Association. Retrieved from
http://www.asasrms.org/Proceedings/y2020/files/1505350.pdf.
Wasserstein, R., and Lazar, N. (2016). The ASA
Statement on p-Values: Context, Process, and Purpose. The American
Statistician, 70(2): 129–133. Retrieved from
https://doi.org/10.1080/00031305.2016.1154108.