    PIAAC Highlights of U.S. National Results



    Introduction

    The Program for the International Assessment of Adult
    Competencies (PIAAC) is a comprehensive international survey of adult skills. It
    measures adults’ proficiency across a range of key information-processing skills
    and assesses these adult skills consistently across participating countries.
    PIAAC is administered every 10 years and has had two cycles so far. For PIAAC
    Cycle 1, the United States participated in three rounds of data collection
    between 2011 and 2018. A total of 38 countries participated in these three
    rounds of PIAAC Cycle 1. More detailed information can be found in the
    PIAAC 2012/2014/2017: Main Study, National Supplement, and PIAAC 2017 Technical Report.

    PIAAC Cycle 2 began in 2022–23, with 31 countries
    participating
    in the first round.
    The assessment focused on the key cognitive and workplace skills necessary for
    individuals to participate successfully in the economy and society of the 21st
    century. This multicycle study is a collaboration between the governments of
    participating countries, the Organisation for Economic Co-operation
    and Development (OECD), and a consortium of various international organizations,
    referred to as the PIAAC Consortium. In the United States, PIAAC is sponsored by
    the National Center for Education Statistics (NCES) in the Institute of
    Education Sciences of the U.S. Department of Education.

    An important element of the value of PIAAC is its
    collaborative and international nature. Internationally, PIAAC was developed
    collaboratively by participating countries’ representatives from ministries and
    departments of education and labor as well as by OECD staff through an extensive series of
    international meetings and workgroups. All PIAAC countries must follow common
    standards and procedures. As a result, PIAAC can provide a reliable and
    comparable measure of adult skills in the adult population (ages 16–65) of
    participating countries.

    This Methodology and Technical Notes document provides
    an overview, with a particular focus on the U.S. implementation, of the
    following technical aspects of PIAAC Cycle 2:

    • International requirements for sampling, data collection, and response rates;

    • Sampling in the United States;

    • Questionnaire and assessment development;

    • Data collection;

    • U.S. response rates;

    • Data cleaning and coding;

    • Weighting in the United States;

    • Changes to the assessment administration, content, and scaling between Cycles 1 and 2;

    • Data limitations;

    • Confidentiality and disclosure limitations; and

    • Statistical procedures.

    More detailed information on these topics can be found in the upcoming PIAAC Cycle 2 U.S. technical report.

    International Requirements for Sampling, Data Collection, and Response Rates

    The PIAAC Consortium oversees all PIAAC activities on
    behalf of OECD and provides support to participating countries in all aspects of
    PIAAC. Each country is responsible for conducting PIAAC in compliance with the
    PIAAC Technical Standards and Guidelines (OECD 2022) provided by the Consortium
    to ensure that the survey design and implementation yield high-quality and
    internationally comparable data. The standards were generally based on
    agreed-upon policies or best practices to follow when conducting the study, and
    all participating countries were required to follow them to have their data
    included in the OECD reports and data products.

    To ensure all participating countries met the
    standards, the Consortium implemented a comprehensive quality control process to
    monitor all aspects of the study,
    including sample selection and monitoring, background questionnaire (BQ)
    adaptations, instrument translation, interviewer training, data collection,
    coding and data processing, data delivery, and weighting and variance
    estimation. The requirements regarding the target populations, sampling design,
    sample size, exclusions, and defining participation rates are described next.

    International Target Population

    The PIAAC target population consisted of all
    noninstitutionalized adults between the ages of 16 and 65 (inclusive) who
    resided in the country (whose usual place of residency is in the country) at the
    time of data collection. Adults were included regardless of citizenship,
    nationality, or
    language.

    The target population included

    • Full-time and part-time members of the military who did not reside in military barracks or on military bases;

    • Adults in noninstitutional collective dwelling units (DUs) or group quarters, such as workers’ quarters or halfway homes; and

    • Adults living at school in student group quarters, such as dormitories.

    In countries where persons were selected from a
    registry, age at the midpoint of data collection was used to determine
    eligibility. In countries where persons were selected using a screener
    questionnaire, age was determined as of the day the screener was conducted.

    Sampling Design

    It is not feasible to assess every adult in each
    participating country. Therefore, a representative sample of adults needed to be
    selected from a list of adults in the target population, i.e., from a sampling
    frame. The sampling frames for all countries were required to include 95 percent
    or more of the PIAAC target population. That is, the undercoverage rate,
    combined over all stages of sampling, could not exceed 5 percent.

    In some countries, a central population registry
    constituted the frame, and individuals were sampled directly from the frame. In
    other countries, including the United States, a multistage sample design was
    used, with the frame built from other sources, for example, lists of primary
    sampling units, secondary sampling units, dwelling units, and individuals within
    dwelling units.

    The sampling frame at each stage was required to
    include any information necessary for sample design, sample selection,
    and estimation purposes, as well as sufficiently reliable information to sample
    individual units and ultimately to locate individuals for the interview and
    assessment.

    Other requirements for each country’s sampling design
    included the following:


    • The sampling frame(s) had to be up-to-date and contain only one unique record for each sampling unit.

    • For multistage area sample designs in which a population registry was not used, countries were required to have a frame of DUs within the selected geographic clusters.

    • Countries with central population registers were required to have a sampling coordination strategy in place to spread the response burden more equally across the population.

    Sample Sizes

    The minimum sample size requirement for PIAAC Cycle 2
    was between 4,000 and 5,000 completed cases
    per reporting language for the PIAAC target population, with the specific
    requirement depending on the number of sampling stages for the country, which is
    related to the country's predicted design effect. The overall goal of the
    sample design was to obtain a nationally representative sample of the target
    population in each participating country that was proportional to the population
    across the country, in other words, a self-weighting sample design (Kish 1965).

    Countries with highly clustered samples or with a high
    degree of variation in sampling rates due to either oversampling or variation in
    household size were required to increase the sample size requirements to account
    for the higher expected design effects compared to other countries with equal
    probability samples and the same number of sampling stages. Countries had the
    option to increase the sample size to obtain reliable estimates for groups of
    special interest (e.g., 16- to 29-year-olds) or for geographic regions (e.g.,
    states and provinces) or to extend the age range (e.g., age 66 or over).
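    The general logic linking a predicted design effect to the number of completed cases can be sketched as follows. The code below is illustrative only: the Kish approximation for the design effect due to unequal weights, the 4,000-case target, and the example weights are stated assumptions rather than the specific rules PIAAC used to set each country's requirement.

```python
# Illustrative only: how a predicted design effect inflates the number of
# completed cases needed to reach a given effective sample size. The target
# of 4,000 and the example weights are assumptions, not PIAAC parameters.

def kish_deff_from_weights(weights):
    """Kish approximation to the design effect due to unequal weights:
    deff_w = n * sum(w_i^2) / (sum(w_i))^2."""
    n = len(weights)
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return n * s2 / (s1 * s1)

def required_completes(target_effective_n, predicted_deff):
    """Completed cases needed so the sample carries roughly the information
    of `target_effective_n` simple random cases."""
    return int(round(target_effective_n * predicted_deff))

if __name__ == "__main__":
    example_weights = [1.0, 1.4, 0.8, 2.1, 1.2, 0.9]   # hypothetical weights
    print(f"deff from unequal weights: {kish_deff_from_weights(example_weights):.3f}")
    print("completes needed:", required_completes(4000, 1.25))
```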


    Exclusions

    The PIAAC target population excluded adults in
    institutional collective DUs or group quarters such as prisons, hospitals,
    and nursing homes, as well as adults residing in military barracks and on
    military bases.

    The Consortium reviewed and approved any additional
    exclusions to the PIAAC target population, regardless of whether they exceeded
    the 5 percent threshold noted above. Country-specific exclusions were only
    implemented because of operational or resource considerations, for instance,
    excluding persons in hard-to-reach areas.

    Defined Response Rates

    Although the Consortium did not establish set
    participation or response rate standards for all participating countries, each
    country was required to specify sample size goals for each stage of data
    collection (screener if applicable, BQ, and assessment). Other requirements
    included the following:

    • Each country should specify its assumptions about nonresponse and ineligibility rates.

    • The sample size should be adjusted to account for expected nonresponse.

    • For countries with a screener, sample size goals should be constructed for the screener to account for ineligibility and screener nonresponse, as well as nonresponse to the BQ and the assessment.

    A completed case is one that met all of the following criteria:


    • Responses to key background questions in the full BQ, including age, gender, highest level of schooling, employment status, and country of birth (native/nonnative) were collected.

    • The tablet tutorial section was attempted.

    • The locator was attempted.

    Sampling in the United States

    The U.S. PIAAC Cycle 2 National Sample Design

    The target population for U.S. PIAAC Cycle 2 consisted
    of noninstitutionalized adults ages 16–74 who resided in the United States at
    the time of the interview. The 16–65 age group is consistent with the
    international target population, and the 66–74 age group was added as a national
    option. Adults were included regardless of citizenship, nationality, or
    language.

    To select a nationally representative sample, U.S.
    PIAAC used a four-stage stratified cluster sample design. This method involved
    (1) selecting primary sampling units (PSUs) consisting of counties or groups of
    contiguous counties; (2) selecting secondary sampling units (SSUs) consisting of
    area blocks; (3) selecting DUs (for example,
    single-family homes or apartments selected from address listings); and (4)
    selecting eligible persons within DUs. Random selection methods were used at
    each stage of sampling. Initial sample sizes were determined based on a goal of
    5,000 respondents ages 16–65 per PIAAC standards, plus an additional 1,020
    respondents ages 66–74. During data collection, response rates and sample yields
    were monitored and calculated by key demographic and subgroup characteristics.
    These sampling methods and checks ensured that the sample requirements were met
    and that reliable statistics based on a nationally representative sample could
    be produced.

    First Stage

    The PSU sampling frame was constructed from the list
    of counties and population estimates in the Vintage 2020 Census Population
    Estimates, joined with additional county-level data for stratification. To form
    PSUs, small counties were combined with adjacent counties until they reached a
    minimum population size of 15,000 eligible adults; most PSUs consisted of a
    single county.

    The four largest PSUs were selected with certainty
    (i.e., with a probability of 1). The remaining PSUs were grouped into major
    strata formed by Census region, metro status,
    and literacy level, where literacy level was based on results from PIAAC
    2012/14/17. Within each major stratum, PSUs were further grouped into minor
    strata formed from one or more proficiency-related variables from the 2015–19
    American Community Survey (ACS; U.S. Census Bureau 2020) related to education,
    ethnicity, poverty, employment status, marital status, occupation, and health
    insurance status.
    Once the strata were formed, one PSU was selected per stratum using a
    probability-proportional-to-size (PPS) technique.
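    A minimal sketch of selecting one PSU per stratum with probability proportional to size is shown below, using the cumulative-size method. The stratum contents, county identifiers, and measures of size are hypothetical; this is not the production sampling code.

```python
# Probability-proportional-to-size (PPS) selection of one PSU per stratum,
# cumulative-size method. All data below are hypothetical.
import random

def select_one_pps(units):
    """units: list of (psu_id, measure_of_size). Returns one psu_id chosen
    with probability proportional to its measure of size."""
    total = sum(size for _, size in units)
    draw = random.uniform(0, total)
    cumulative = 0.0
    for psu_id, size in units:
        cumulative += size
        if draw <= cumulative:
            return psu_id
    return units[-1][0]  # guard against floating-point edge cases

stratum = [("county_A", 52_000), ("county_B", 18_000), ("county_C", 30_000)]
print(select_one_pps(stratum))
```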

    Second Stage

    The sampling frame of SSUs was constructed from
    block-level data in the Census 2020 PL-94 redistricting file, with blocks
    combined to reach a minimum size of 120 DUs. Within a PSU, SSUs were sorted
    geographically and selected using a systematic PPS technique. This approach
    allowed for a diverse sample of SSUs spread across the PSU.
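    The systematic PPS step can be sketched as follows: the geographically sorted frame is traversed at a fixed interval of cumulative size from a random start, which spreads the selections across the PSU. The frame and sizes below are hypothetical.

```python
# Systematic PPS selection from a geographically sorted frame. Hypothetical data.
import random

def systematic_pps(frame, n_select):
    """frame: list of (ssu_id, measure_of_size), already sorted geographically.
    Returns n_select ssu_ids selected with probability proportional to size."""
    total = sum(size for _, size in frame)
    interval = total / n_select
    start = random.uniform(0, interval)
    hits = [start + k * interval for k in range(n_select)]
    selected, cumulative, i = [], 0.0, 0
    for ssu_id, size in frame:
        cumulative += size
        while i < n_select and hits[i] <= cumulative:
            selected.append(ssu_id)
            i += 1
    return selected

frame = [(f"ssu_{j}", random.randint(120, 600)) for j in range(50)]
print(systematic_pps(frame, n_select=8))
```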

    Third Stage

    The sampling frame at the third stage, a list of DUs,
    was formed from a combination of residential address lists from the U.S. Postal
    Service (also known as address-based sampling lists) and lists of DUs made by
    field staff (also known as traditional listing) for each sampled SSU. Within an
    SSU, DUs were sorted geographically and selected using a systematic random
    sample. This resulted in an initial self-weighting sample of DUs (i.e., each DU
    had the same overall probability of selection). The initial sample was randomly
    divided into a main sample for initial release and a reserve sample to be used
    as needed.

    Fourth Stage

    The fourth stage sampling frame, a list of
    individuals, was created through information collected in a screener
    questionnaire, in which a household respondent was asked to list people who
    lived in the dwelling and had no usual place of residence elsewhere. Individuals
    were then selected using a stratified simple random sample, with strata based on
    age group (16–65 and 66–74). In the first stratum, one or two 16- to 65-year-olds
    were selected depending on household size. Selecting two persons in larger
    households (households with four or more 16- to 65-year-olds) helped reduce the
    variation due to unequal probabilities of selection. One 66- to 74-year-old
    was selected from the second stratum. Therefore, an eligible household could
    have one to three individuals selected for the survey.
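    The within-dwelling-unit selection described above can be illustrated with the sketch below: one or two persons from the 16–65 stratum (two only when the household has four or more eligible 16- to 65-year-olds) and one person from the 66–74 stratum. The roster format and names are assumptions for illustration.

```python
# Illustrative within-household person selection following the strata
# described above. Roster format and cutoffs mirror the text; data are fake.
import random

def select_persons(roster):
    """roster: list of (name, age) for household members with no usual
    residence elsewhere. Returns the sampled persons."""
    younger = [p for p in roster if 16 <= p[1] <= 65]
    older = [p for p in roster if 66 <= p[1] <= 74]
    selected = []
    if younger:
        k = 2 if len(younger) >= 4 else 1   # two selections in larger households
        selected += random.sample(younger, k)
    if older:
        selected += random.sample(older, 1)
    return selected

household = [("R1", 17), ("R2", 44), ("R3", 46), ("R4", 21), ("R5", 70)]
print(select_persons(household))
```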

    The U.S. PIAAC Cycle 2 Supplemental State Sample

    In addition to the national sample described above,
    the U.S. PIAAC Cycle 2 included a supplemental sample in particular states. The
    purpose of the supplemental sample was to increase the number and diversity of
    sampled counties to improve model-based state-
    and county-level estimates. After the national sample of PSUs was selected,
    supplemental PSUs were selected so that each state had at least two sampled PSUs
    in the combined sample. Then SSUs, DUs, and eligible adults were selected within
    the PSUs using the same sampling methods as described for the national sample.
    About 2 months into data collection (November 2022), collection for the state
    supplemental sample was halted due to funding, resulting in an incomplete
    supplemental sample. The U.S. PIAAC Cycle 2 sample was designed to be nationally
    representative regardless of whether the supplemental state sample was included.
    Therefore, it was possible to combine the incomplete supplemental sample with
    the national sample, maintaining a nationally representative sample and
    improving the diversity for small area estimation purposes.


    Questionnaire and Assessment Development

    Background Questionnaire

    The PIAAC BQ collected detailed information to support
    a wide range of contextual analyses. It facilitates the examination of how skill
    proficiency is distributed across various sociodemographic groups of the
    population. It also allows for insights into how skills are associated with
    outcomes and how they are used in personal and professional contexts. Finally,
    it facilitates the investigation of how proficiency is related to investments in
    education and training, shedding light on the process of skills formation.

    PIAAC Cycle 2 was designed to allow results to be as
    comparable as possible with those of PIAAC Cycle 1. At the same time, the survey
    instruments were improved in several dimensions.

    Revisions to the PIAAC BQ focused on

    • Adaptation to international standards, such as the International Standard Classification of Education 2011, the framework used to compare statistics on the educational systems of countries worldwide (UNESCO Institute for Statistics 2012);

    • Adaptation to changes in the technological environment;

    • Enriched information on the working environment and the use of high-performance work practices to make best use of workers’ skills;

    • More detailed information on the pathways respondents followed through their educational careers; and

    • A new (optional) section on social and emotional skills. (This option was not included in the U.S. version of the BQ.)

    The PIAAC Cycle 2 questionnaire included the following topics:

    • Personal and background characteristics;

    • Education and training;

    • Current employment status and work history;

    • Use of skills, skills mismatches, and the working environment;

    • Noneconomic outcomes; and

    • Social and emotional skills.

    The international version of the BQ is available.

    U.S. BQ Adaptations

    The Consortium developed the PIAAC international
    master version of the BQ, which was the basis for the U.S. national BQ. Several
    questions were adapted from the international version of the questionnaire to be
    appropriate in the U.S. educational and cultural context.
    Individual questions were evaluated for analytic relevance and respondent burden
    (e.g., recall, clarity, salience),
    resulting in several additions and deletions for the field test instrument, with
    further revisions for the main study. Participating countries were allowed to
    add up to 5 minutes of country-specific items. Instead of including a new
    section on social and emotional skills,
    which was optional for countries, the U.S. national BQ was modified to include a
    21-question module on financial literacy, Section L.

    Direct Assessment

    The PIAAC Cycle 2 direct assessment (literacy,
    numeracy, and adaptive problem solving) tasks focused on respondents’ ability to use
    information-processing strategies to solve problems they encounter in their
    everyday lives. For more details, see the PIAAC Cycle 2 assessment frameworks.
    The assessment tasks and materials were designed to measure a broad set of
    foundational skills required to successfully interact with the range of
    real-life tasks and materials that adults encounter in everyday life. Solving
    these tasks does not require specialized content knowledge or other specialized
    skills; the skills assessed in PIAAC are considered general skills
    required in a very broad range of situations and domains. The PIAAC assessment
    was not designed to identify any minimum level of skills that adults must have
    to fully participate in society. A feature of the PIAAC assessment common to all
    three skill domains is the need to reflect the changing nature of information in
    today’s societies due to the prevalence of data-intensive and complex digital
    environments. Therefore, many PIAAC assessment tasks are embedded in these kinds
    of environments.

    For PIAAC Cycle 2, the constructs of literacy, numeracy, and
    adaptive problem solving were refined to better reflect the evolution of skills
    in complex digital environments. Each domain is briefly described below (OECD
    2021).

    Literacy is accessing, understanding, evaluating,
    and reflecting on written texts in order to achieve one’s goals, to develop
    one’s knowledge and potential,
    and to participate in society. PIAAC also evaluates adults’ ability to read
    digital texts and traditional print-based texts. The revised construct reflects
    the growing importance of reading in digital environments, which poses different
    cognitive demands and challenges, and the increasing need to interact with
    online texts. For PIAAC Cycle 2, some literacy tasks involved multiple sources
    of information, including static and dynamic texts that respondents had to
    consult to respond. The texts were presented in multiple text formats, including
    continuous (e.g., sentences, paragraphs), non-continuous (e.g., charts, tables),
    and mixed text, and
    reflected a range of genres.

    Numeracy is accessing, using,
    and reasoning critically with mathematical content, information, and ideas
    represented in multiple ways in order to engage in and manage the mathematical
    demands of a range of situations in adult life. It is an essential skill in an
    age when individuals encounter an increasing amount and wide range of
    quantitative and mathematical information in their daily lives. Numeracy is a
    skill parallel to reading literacy, and it is important to assess how these
    competencies interact because they are distributed differently across subgroups
    of the population. For PIAAC Cycle 2, the assessment of numeracy covered
    engagement with mathematical information in digital environments. It also
    included an assessment of numeracy components, focused on some of the skills
    essential for achieving automaticity and fluency in managing mathematical and
    numerical information.

    Adaptive problem solving (APS) involves the
    capacity to achieve one’s goals in a dynamic situation, in which a method for
    solution is not immediately available. It requires engaging in cognitive and
    metacognitive processes to define the problem, search for information, and apply
    a solution in a variety of information environments and contexts. The assessment
    explicitly considers individuals’ ability to solve multiple problems in
    parallel, which requires individuals to manage the order in which they approach
    a list of problems and to monitor opportunities that arise for solving different
    problem sets. The assessment of APS in PIAAC Cycle 2 aimed to highlight the
    respondents’ ability to react to unforeseen changes and emerging new
    information. Results from PIAAC Cycle 2 are not comparable to the assessment of
    problem solving in technology-rich environments from PIAAC Cycle 1.

    As the objective of PIAAC is to assess how the adult
    population is distributed over a wide range of proficiency in each of the
    domains assessed, the tasks were designed to capture different levels of
    proficiency and vary in difficulty. An adaptive assessment design was employed
    in literacy and numeracy to ensure respondents were presented with items that
    were challenging for their level of proficiency without being too easy or too
    difficult.

    Data Collection

    The main study data collection was conducted between
    September 1, 2022,
    and June 16, 2023. A total of 4,637 respondents across the United States
    completed the BQ, with 4,574 of them also completing the assessment. This number
    includes the core national sample of adults ages 16 to 65 for PIAAC Cycle 2 and
    the supplemental sample of adults ages 66 to 74, which was of special interest
    to NCES. Although the United States fell short of the designated PIAAC goal for
    the number of completed cases due to the low participation rate, the minimum
    required for the psychometric modeling was met.

    Each sampled household was administered a screener to
    determine the eligibility of household members to participate in the survey.
    Within households, each sampled person selected completed an
    interviewer-administered BQ,
    followed by a self-administered tablet-based assessment. Sampled persons who
    completed the assessment received an incentive of $100. Sampled households that
    had not been contacted in person received a paper version of the screener
    questionnaire with an unconditional incentive of $5.

    Data Collection Instruments

    Before contacting anyone at the sampled address,
    interviewers were required to complete a short series of questions called the DU
    Observations related to the sampled address. The interviewers completed these
    questions using their study iPhone. The information from the DU Observations was
    used in nonresponse bias analysis (NRBA) to evaluate whether nonrespondents
    lived in homes and environments similar to those of respondents and thus helped
    address the generalizability of the data collected from respondents to the whole
    population.

    The PIAAC household interview was composed of three
    distinct instruments: the screener, BQ, and the direct assessment. A short,
    self-administered questionnaire called the doorstep interview was also available
    for respondents who did not speak English or Spanish, which were the two
    languages in which the screener and BQ were available.
    (See Figure 1 for an overview of the flow of respondents through the survey.)

    Figure 1. Routing flow through the PIAAC instrumentation


    Screener

    Household members who were 16–74 years old were
    eligible to be selected, with up to two persons selected in households with four
    or more eligible adults. Interviewers used the screener—a computer-assisted
    personal interviewing (CAPI) instrument—to collect the first name, age, and
    gender of each household member. The CAPI system conducted a within-household
    sampling procedure to select sampled person(s) to participate in the study. In
    the United States, the screener was available in English and Spanish.

    Partway through data collection, a secondary mode of
    screener data collection was added. All households that had received at least
    four in-person contact attempts,
    but had not yet responded or participated, were sent a paper version of the
    screener along with a $5 incentive and a postage-paid envelope to return the
    completed questionnaire. Information from the paper screeners was entered into
    the tablet to select eligible persons for study participation.

    Background Questionnaire

    Each sampled person completed the BQ,
    which collected respondent information on the following areas: socio-economic
    and demographic background; education and training; employment status and work
    history; current work or past job; skills used at work and in everyday life;
    work practices and the work environment; attitudes and activities; background,
    including parents’ education and occupation; and financial literacy. The BQ was
    developed as an interviewer-administered CAPI instrument and was conducted on
    the interviewer’s tablet. In the United States, the PIAAC Cycle 2 U.S. main
    study BQ was available in English and Spanish.

    Direct Assessment

    Each sampled person completed the assessment using a
    tablet. In the United States, the direct assessment was only available in
    English. The assessment began with a tablet tutorial to make sure respondents
    understood how to interact with the device and the interface. The tutorial
    included short video animations that demonstrated actions respondents would use
    to complete the assessment items,
    such as tapping, dragging and dropping, and highlighting text. It also included
    examples of screen layouts and response option formats for the various
    assessment tasks. After practicing the tutorial, the sampled person completed
    the locator (also referred to as Stage 1),
    which was composed of eight numeracy and eight literacy items. The sampled
    person then was routed to a combination of literacy, numeracy, or APS tasks of
    different difficulty levels.

    The APS assessment items were divided into five
    clusters, with respondents exposed to two randomly selected clusters of items.
    The literacy and numeracy assessments used a hybrid multistage adaptive/linear
    design. The adaptive component of the design was based on six different testlets
    administered in Stage 2, with three low-difficulty testlets and three
    high-difficulty testlets. Assignment to Stage 2 testlets depended on performance
    on the locator test and personal characteristics collected in the BQ. Stage 3
    also featured six testlets: two of low difficulty, two of medium difficulty, and
    two of high difficulty. The assignment to testlets in Stage 3 was driven by
    performance in Stage 2. Finally, a linear component was introduced to ensure
    that each item was attempted by a sufficient number of respondents from a wide
    proficiency range.

    The OECD developed the criteria for determining the adaptive design routing through the assessment paths based on respondent performance. Respondents who failed the locator were routed to the
    Components section,
    which measured basic numeracy and reading skills. Twenty-five
    percent of the respondents who did well on the locator were also randomly routed
    to the Components section before completing the assessment items, while the
    majority of these respondents (75 percent) were routed directly to literacy,
    numeracy, or APS. Respondents who performed well on the locator received a
    combination of two of the following direct assessment instruments—the two-stage,
    adaptive modules of literacy or numeracy testlets; the two-stage, linear modules
    of literacy or numeracy testlets; or the linear APS clusters. Respondents who
    passed the locator but performed relatively poorly received a combination of two
    of the following direct assessment instruments: the two-stage, adaptive modules
    of literacy or numeracy testlets, the two-stage, linear modules of literacy or
    numeracy testlets, or
    the linear APS clusters.
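    The routing logic described above can be sketched in simplified form. The locator cut points, the 25 percent random assignment, and the module labels below are placeholders; the actual OECD routing criteria also draw on background-questionnaire characteristics and are not reproduced here.

```python
# Highly simplified sketch of the adaptive routing described above.
# Cut points and module names are placeholders, not the OECD rules.
import random

def route(locator_score, fail_cutoff=4, high_cutoff=10):
    """Return the sequence of assessment sections for a respondent, given a
    locator score out of 16 (8 literacy + 8 numeracy items)."""
    if locator_score < fail_cutoff:                 # failed the locator
        return ["components"]
    path = []
    if locator_score >= high_cutoff and random.random() < 0.25:
        path.append("components")                   # 25% of high performers
    # Two modules drawn from adaptive or linear literacy/numeracy testlets,
    # or the linear APS clusters.
    modules = ["adaptive_literacy", "adaptive_numeracy",
               "linear_literacy", "linear_numeracy", "aps"]
    path += random.sample(modules, 2)
    return path

print(route(locator_score=11))
```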

    After the completion of the direct assessment, a set
    of Effort and Performance questions asked respondents about the effort they put
    into completing the assessment and how they thought they performed on the
    assessment.

    Doorstep Interview

    The doorstep interview was a short questionnaire
    available on the tablet for sampled persons who had a language barrier and were
    unable to complete the BQ in English or Spanish. The doorstep interview was
    designed to obtain key information on the characteristics of respondents who
    would have been classified as literacy‐related
    nonrespondents in the first cycle. These individuals were essential to the
    population model for the estimation of proficiencies,
    and some information related to their background characteristics helped improve
    the population model and contributed to the analysis and reporting of key
    findings.

    Interviewers used a language identification card,
    which listed the languages in which the doorstep interview was available, to
    ascertain the language spoken by the sampled person. The questionnaire was then
    presented to the sampled person on the tablet in their preferred language. The
    short series of questions collected information on respondent gender, age, years
    of education, current employment status, country of birth, and number of years
    in the United States (if nonnative). In the United States, the doorstep
    interview was available in 11 languages: Arabic, Chinese (simplified), Chinese
    (traditional), Farsi, Korean, Punjabi, Russian, Somali, Spanish, Urdu, and
    Vietnamese.

    Post-Interview Questionnaire

    After the interview was completed, interviewers
    completed a brief post-interview questionnaire to record where the interview
    took place, whether the sampled person requested assistance with the BQ or
    assessment (from the interviewer or other household members), or if there were
    any events that may have interrupted or distracted the sampled person during the
    interview.

    Field Staff Training

    To ensure that all interviewers were trained
    consistently across participating countries, the Consortium provided a
    comprehensive interviewer training package,
    including manuals and training scripts to be used by national training teams.
    Countries could adapt training materials to their national context as needed.
    The Consortium recommended that countries provide all interviewers with
    approximately 20 hours of training,
    which included general interviewer training and PIAAC-specific
    training content. All interviewers in the United States received 2 weeks of
    training (approximately 40 hours).

    As a result of the COVID-19 pandemic, countries were
    allowed to adapt the field interviewer training program from the PIAAC Cycle 1
    in-person model to a hybrid model,
    with training sessions delivered both in person and virtually. The interviewer
    training program in the United States included virtual delivery of
    administrative procedures, general interviewing techniques, and introductory
    training sessions. In-person training maximized trainee involvement and
    emphasized gaining respondent cooperation skills, answering questions about the
    study, and practicing the administration of all interview components (i.e.,
    the screener, BQ, doorstep interview, direct assessment, and the post-interview
    questionnaire).

    To ensure that the interviewer training conducted by
    national teams met the requirements specified in the PIAAC Technical Standards
    and Guidelines (OECD 2022), each country, including the United States,
    submitted a summary training report within a month of completing national
    training and a final training report within a month of ending data collection to
    report on additional attrition trainings held during the field period.

    Fieldwork Monitoring

    The requirements for monitoring data collection
    throughout the field period were specified in the PIAAC Technical Standards and
    Guidelines (OECD 2022). These included monthly submission of sample monitoring
    and survey operations reports during data collection. The Consortium provided an
    international dashboard and specifications for management and monitoring reports
    to be used by national teams overseeing data collection. These reports provided
    information about interviewer productivity, time of interview, overall interview
    timing and timing of individual instruments completed, time elapsed between
    interviews, and validation reports. The Consortium required validation of 10
    percent of each interviewer’s finalized cases. Countries were also required to
    monitor the quality of interviewer fieldwork by reviewing two audio-recordings
    of interviews (BQ) completed by each interviewer and the data quality of
    completed interviews. Finally, all national teams were required to attend a
    series of quality control calls with the Consortium to report on the status of
    data collection.

    The United States submitted the required monthly
    reports and completed four quality control calls with the Consortium during the
    field period. Monitoring of fieldwork was implemented using a corporate
    dashboard that displayed key quality performance indicators to track interviewer
    productivity (interviews started too early or too late in the day, multiple
    interviews completed in a day, time elapsed between interviews, etc.) and data
    quality (short BQ and/or assessment timings, BQ item response rate for completed
    interviews, etc.). Additional automated validation of all completed screeners
    and interviews was completed using geospatial location software,
    which captured geospatial data by using the GPS feature on the interviewers’
    mobile devices.

    As required by the Consortium, each interviewer’s 3rd
    and 10th BQ interview was reviewed, and corrective feedback was provided as
    needed. Telephone validation of 10 percent of each interviewer’s finalized cases
    was also implemented. Data quality checks included consistency checks on
    respondent age and gender obtained in the screener versus the BQ, checks on
    open-ended responses, missing data, and data frequencies.

    U.S. Response Rates

    This section provides information on the coverage of
    the target population, weighted response rates, and the total number of
    households and persons for U.S. PIAAC Cycle 2. For information on the other
    PIAAC Cycle 2 participating countries, please refer to the upcoming OECD PIAAC Cycle 2
    technical report.

    As table 1 shows, the U.S. PIAAC Cycle 2 covered
    nearly 100 percent of the target population, with an exclusion rate of 0.5
    percent. The U.S. PIAAC Cycle 2 achieved an overall weighted response rate of 28
    percent, which is the
    product of 50 percent, 56 percent, and 99 percent for the screener, BQ,
    and assessment, respectively. The overall response rate ranges from 27 percent
    to 73 percent across countries that participated in the PIAAC Cycle 2, including
    four countries with an overall response rate below 30 percent and one country
    above 70 percent. As table 2 shows, the U.S. PIAAC Cycle 2 sample included
    16,414 households and 7,754 individuals. Among the 4,637 individuals who
    responded to the BQ, 4,574 also responded to the assessment. The response rates
    in table 1 are based on the PIAAC core national sample only because the data
    collection for the state supplemental sample ended prematurely. The sample size
    and numbers of respondents in table 2 include the core national sample and state
    supplemental sample. Both tables are for the population ages 16–74.

    Tables 1 and 2 are also available in Excel.






    Table 1. Coverage of target population and response rates: U.S. PIAAC Cycle 2, 2022–23

    Country | Percentage of target population coverage | Overall exclusions from national target population | Weighted screener response rate | Weighted BQ response rate | Weighted assessment response rate | Overall weighted response rate
    United States | 99.6% | 0.5% | 50.2% | 56.1% | 98.9% | 27.8%






    Table 2. Sample size and number of respondents: U.S. PIAAC Cycle 2, 2022–23

    Country | Households in sample | Persons in sample | BQ respondents | Assessment respondents
    United States | 16,414 | 7,754 | 4,637 | 4,574

    Data Cleaning and Coding

    To ensure the delivery of a high-quality, clean data
    file in a standardized layout, all countries participating in PIAAC Cycle 2 were
    required to use the Data Management Expert (DME), a data management software
    package. The DME was used in conjunction with the International Association for
    the Evaluation of Educational Achievement (IEA)-supplied Data Management Manual
    and Technical Standards & Guidelines (OECD 2022) to


    • Integrate screener, BQ, and assessment data;
    • Clean and verify data through edit checks;
    • Export data for coding (e.g., occupation) and import coded data; and
    • Produce the final dataset for delivery.

    Data cleaning ensured that all information in the
    database conformed to the internationally defined data structure,
    the national adaptations to questionnaires were reflected appropriately in
    codebooks and documentation,
    and all variables selected for international comparisons were comparable across
    systems. Data edits fell into two categories. Validation checks verified that
    case IDs were unique and that values conformed to expected values/ranges. Record
    consistency checks identified linkage problems between data tables and potential
    issues in the sample or survey control file data. Throughout data collection,
    the record consistency checks were closely monitored for discrepancies in
    demographic information between the screener and the BQ. Because these
    discrepancies could be a signal that a person other than the selected household
    member had erroneously completed the BQ and assessment, it was critical to
    resolve these issues as early as possible.

    The entire suite of edit checks was run periodically
    and at the conclusion of data collection. Data fixes were applied where
    appropriate, and
    reasons for acceptable anomalies were determined. All issues and their outcomes
    were documented and submitted to IEA with the data file delivery. IEA conducted
    further review and cleaning, resolving issues as needed.

    The DME also facilitated the integration of coded data
    for verbatim responses related to occupation, industry, language, and country.
    IEA provided the coding schemes to be used:


    • 2008 International Standard Classification of Occupations (ISCO-08; International Labour Organization 2012) was used to code occupations reported in the BQ. Occupational coding was done to the four-digit level when enough information was available.

    • International Standard Industrial Classification of All Economic Activities (ISIC), Revision 4 (United Nations Statistics Division 2007) was followed to assign industry codes. Industry coding was done to the four-digit level when enough information was available.

    • ISO 639-2 alpha-3 was used for languages learned at home during childhood and languages spoken at home.

    • The UN M.49 coding scheme was used to code the country of birth and the country of highest education.

    One additional coded variable—identifying the
    respondent’s geographic region using the OECD TL2 coding scheme (OECD 2013)—was
    suppressed in the U.S. PIAAC dataset because the U.S. sample was not designed to
    be representative at the regional or state level.

    Verbatim responses from the BQ were exported from the
    DME and coded in a separate coding software system. All coding was 100 percent
    verified or double-coded to ensure accuracy and consistency across coding staff. The coded data were then
    imported into the DME to their appropriate tables for delivery along with the
    other study data.

    After importing the coded data and reviewing all data
    edit checks, additional frequency review and data reconciliation checks were
    performed to ensure data were loaded correctly and were in sync with disposition
    codes from the PIAAC Study Management System (SMS). The SMS final disposition
    codes were compared against the aggregate of data available for each case; some
    technical problem cases were discovered by identifying disparities between the
    disposition codes and the lack of or incompleteness of data in the DME. Cases
    with disparities were reviewed closely; in some instances,
    this review yielded the recovery of the missing data. Throughout the process,
    possible errors were investigated, documented, and resolved before the delivery
    of the final dataset to IEA. The remaining discrepancies were documented in the
    final delivery notes delivered along with the final data files.


    Weighting in the United States

    While the U.S. PIAAC sample is nationally
    representative, analysis of the sample data requires the use of weights that
    facilitate accurate estimation of population characteristics. The weights were
    constructed to account for the complex sample design and nonresponse patterns
    and were further adjusted through a calibration and trimming process that used
    external population data from the ACS (U.S. Census Bureau 2020) to potentially
    improve the accuracy of weighted estimates. The weights also accounted for the
    combining of the national sample and the state supplemental sample into a single
    sample.

    For the national sample, sampling weights were
    constructed at each stage of the four-stage sample design. For the sampling of
    PSUs, SSUs, DUs, and individuals within DUs, sampling weights were derived as
    the inverse of the probability of random selection with adjustments to account
    for nonresponse. For sampled DUs that did not yield a complete screener
    questionnaire due to nonresponse and for sampled individuals who did not
    complete a BQ, nonresponse adjustments were applied to the weights so that
    nonrespondents could be represented by respondents with similar characteristics.
    For cases where nonresponse was attributable to literacy-related reasons, the
    nonresponse adjustments specifically used doorstep interview respondents to
    represent nonrespondents (Van de Kerckhove,
    Krenzke and Mohadjer 2020). The stage-specific, nonresponse-adjusted sampling
    weights were then combined into a single overall sampling weight for the
    national sample.
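    The two basic steps described above, inverse-probability base weights followed by a within-cell nonresponse adjustment, can be sketched as follows. The cell definitions and toy data are assumptions; the actual adjustments used richer information, including doorstep interview respondents for literacy-related nonresponse.

```python
# Minimal sketch of base weighting and nonresponse adjustment within cells.
# Cell definitions and data are hypothetical.
from collections import defaultdict

def base_weight(p_selection):
    """Inverse of the probability of selection."""
    return 1.0 / p_selection

def nonresponse_adjust(cases):
    """cases: list of dicts with keys 'weight', 'cell', 'responded'.
    Respondent weights are inflated by (total weight in cell) /
    (respondent weight in cell) so respondents represent nonrespondents."""
    totals, resp_totals = defaultdict(float), defaultdict(float)
    for c in cases:
        totals[c["cell"]] += c["weight"]
        if c["responded"]:
            resp_totals[c["cell"]] += c["weight"]
    return [{**c, "weight": c["weight"] * totals[c["cell"]] / resp_totals[c["cell"]]}
            for c in cases if c["responded"]]

cases = [
    {"weight": base_weight(0.001), "cell": "urban", "responded": True},
    {"weight": base_weight(0.001), "cell": "urban", "responded": False},
    {"weight": base_weight(0.002), "cell": "rural", "responded": True},
]
print(nonresponse_adjust(cases))
```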

    The construction of weights for the supplemental
    sample began with poststratification weighting to ensure that the weighted
    distribution of respondents aligned with population benchmarks obtained from the
    ACS 2022 1-year Public Use Microdata Sample (U.S. Census Bureau 2024). The
    poststrata were defined to incorporate respondent characteristics related to the
    sample design and proficiency outcomes while satisfying minimum sample size
    requirements. Poststratification was used as the initial basis of weighting
    because—as a result of the disruptions in data collection—the stage-specific sampling probabilities could not account
    for the actual process by which households and persons were contacted for the
    survey.

    The weights for the national and supplemental state
    samples were further adjusted so that the samples could be combined. These
    adjustments were applied for every respondent in the supplemental sample and for
    every respondent in every PSU of the national sample,
    except for the four PSUs that were sampled with certainty because those PSUs
    were not eligible to be sampled for the supplemental state sample. The sample
    combination adjustments consisted of two steps. In the first step, the same
    poststratification process adjustment applied to the state supplemental sample
    was applied to the nonresponse-adjusted sampling weights of cases in the
    national sample from PSUs not selected with certainty. In the second step,
    weights from both samples were scaled by compositing factors, with the values of
    these factors determined using the method described by Krenzke and Mohadjer
    (2020). Following the compositing adjustment, the weights from the combined
    sample could be used to estimate national population characteristics.
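    The general idea of compositing can be illustrated with the sketch below, in which weights from the two overlapping samples are scaled by factors that sum to one so the combined weights still estimate national totals. Setting the factors proportional to each sample's effective sample size is one common approach used here purely for illustration; the actual factors followed the method described by Krenzke and Mohadjer (2020) and may have been computed differently.

```python
# Illustration only: compositing weights from two overlapping samples using
# factors proportional to effective sample size. Not the cited method.

def effective_n(weights):
    return sum(weights) ** 2 / sum(w * w for w in weights)

def composite(national_weights, supplemental_weights):
    n_eff_nat = effective_n(national_weights)
    n_eff_sup = effective_n(supplemental_weights)
    lam = n_eff_nat / (n_eff_nat + n_eff_sup)       # compositing factor
    return ([w * lam for w in national_weights],
            [w * (1 - lam) for w in supplemental_weights])

nat, sup = composite([900.0, 1100.0, 950.0], [400.0, 380.0])
print(sum(nat) + sum(sup))   # combined weights still sum to a national total
```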

    The weights of the combined sample underwent raking
    and trimming adjustments. The purpose of the raking adjustment was to align
    weighted sample distributions with external population benchmarks derived from
    ACS data (U.S. Census Bureau 2020) while potentially reducing sampling variance
    and possible bias from factors such as nonresponse. The purpose of the trimming
    adjustment was to reduce variation in the weights and thereby reduce the
    variance of weighted estimates. The variables used for the raking adjustment
    were related to age, gender, race and ethnicity, educational attainment, country
    of birth (United States or outside the United States), and place of residence.
    The largest weights for doorstep interview respondents were trimmed before the
    other raking and trimming adjustments were made.
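    Raking (iterative proportional fitting) can be sketched as below: weights are repeatedly scaled so that weighted margins match external control totals for each raking dimension in turn. The dimensions, categories, and control totals are hypothetical, not the actual ACS benchmarks.

```python
# Minimal raking (iterative proportional fitting) sketch. Hypothetical margins.
from collections import defaultdict

def rake(cases, margins, dims, iterations=25):
    """cases: list of dicts with a 'weight' key and one key per dimension.
    margins: {dim: {category: control_total}}. Adjusts weights in place."""
    for _ in range(iterations):
        for dim in dims:
            weighted = defaultdict(float)
            for c in cases:
                weighted[c[dim]] += c["weight"]
            for c in cases:
                c["weight"] *= margins[dim][c[dim]] / weighted[c[dim]]
    return cases

cases = [{"weight": 100.0, "age": "16-29", "sex": "F"},
         {"weight": 120.0, "age": "16-29", "sex": "M"},
         {"weight": 150.0, "age": "30-65", "sex": "F"},
         {"weight": 130.0, "age": "30-65", "sex": "M"}]
margins = {"age": {"16-29": 200.0, "30-65": 300.0},
           "sex": {"F": 260.0, "M": 240.0}}
rake(cases, margins, dims=["age", "sex"])
print([round(c["weight"], 1) for c in cases])
```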

    The final raking adjustment yielded the final weights
    for the combined sample. The final weights are accompanied by 80 sets of
    replicate weights constructed using Fay’s method of balanced repeated
    replication (Judkins 1990). Each set of replicate weights underwent the same
    stages of weighting adjustments described above so that the replicate weights
    could be used to estimate variances accounting for both the complex design of
    the U.S. PIAAC sample and the many adjustments used to produce the final
    weights.
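    Using the 80 replicate weights to estimate a variance under Fay's method follows the standard formula: the statistic is re-estimated with each set of replicate weights and the squared deviations from the full-sample estimate are combined with the factor 1 / (R(1 - K)^2), with K = 0.3 as noted later in this document. The sketch below assumes a weighted mean as the statistic of interest.

```python
# Sketch of variance estimation with replicate weights under Fay's method.

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def fay_brr_variance(values, full_weights, replicate_weight_sets, fay_k=0.3):
    """Var(theta_hat) = 1 / (R * (1 - K)^2) * sum_r (theta_r - theta_hat)^2."""
    theta_full = weighted_mean(values, full_weights)
    r = len(replicate_weight_sets)
    total = sum((weighted_mean(values, rw) - theta_full) ** 2
                for rw in replicate_weight_sets)
    return total / (r * (1.0 - fay_k) ** 2)
```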

    Changes to the Assessment Administration, Content, and Scaling between Cycles 1 and 2

    Differences in scores between PIAAC Cycle 1 (2012/14 and 2017) and PIAAC Cycle 2 (2023) are discussed in the U.S. PIAAC Highlights Web Report. Comparisons of 2023 results with 2012/14 and 2017 PIAAC assessments need to be made with caution due to changes in the assessment and scaling methodology.

    Key changes for Cycle 2 include:

    • Move to exclusive use of tablets for the administration of the survey and assessment in the United States, whereas in previous PIAAC assessments, respondents were offered an option of responding by laptop or in a paper-and-pencil format.

    • Framework changes resulting in more items, as well as more interactive items. The literacy and numeracy frameworks were updated for Cycle 2 to reflect the technological and social developments that affect the nature and practice of numeracy and literacy skills and methodological developments in the understanding of the skills measured.

    • Design changes that allowed for greater accuracy in routing participants to different paths based on their proficiency. In Cycle 1, a computer or paper-based “core” (i.e., 4 literacy and 4 numeracy items at a very low level) was used to assess whether an individual had sufficient basic skills to take the full assessment. In Cycle 2, a tablet “locator” test (with 8 literacy and 8 numeracy items ranging from very low to medium levels) was used to route participants to different paths based on their level of proficiency.

    • Changes in scaling methodology to include the reading and numeracy components data in the proficiency estimates. To improve the precision of the estimates of proficiency at the bottom of the skills distribution, Cycle 2 incorporated performance in the component assessments in estimating the literacy and numeracy proficiency of respondents.

    Data Limitations

    As with any survey, PIAAC data are subject to errors
    caused by sampling and nonsampling reasons. Sampling error is due to sampling a
    proportion of the target population instead of including everyone in the survey.
    Nonsampling error can happen during data collection and data processing.
    Researchers should take errors into consideration when producing estimates using
    PIAAC data.

    Sampling Errors

    Sampling error is the uncertainty in an estimate that
    arises when not all units in the target population are measured. This
    uncertainty, also referred to as sampling variance, is usually expressed as the
    standard error of a statistic estimated from sample data. There are two commonly
    used approaches for estimating variance for complex surveys: replication and
    Taylor series (linearization). The replication approach was used for PIAAC
    because of the need to accommodate the complexities of the sample design, the
    generation of plausible values (PVs), and the impact of the weighting
    adjustments. The specific replication approach used for calculating standard
    errors in PIAAC Cycle 2 was balanced repeated replication with Fay’s adjustment
    (factor = 0.3).

    For estimates that do not involve PVs, the estimates
    of standard errors are based entirely on sampling variance. For estimates
    involving PVs, calculations of standard errors must account for both the
    sampling variance and the variance due to imputation of PVs. The imputation
    variance reflects uncertainty due to inferring adults’ proficiency estimates
    from their observed performance on a subset of assessment items and other
    proficiency-related information.
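    The combination rule commonly used with plausible values adds the average sampling variance across the PVs to the between-PV (imputation) variance inflated by (1 + 1/M), where M is the number of plausible values. The sketch below illustrates that rule; the authoritative formulas for PIAAC Cycle 2 are in appendix E (Data User Guide) of the upcoming technical report.

```python
# Sketch of combining sampling and imputation variance across plausible values.

def pv_total_variance(pv_estimates, pv_sampling_variances):
    """pv_estimates: point estimate computed from each plausible value.
    pv_sampling_variances: replication-based sampling variance from each PV."""
    m = len(pv_estimates)
    theta_bar = sum(pv_estimates) / m
    u = sum(pv_sampling_variances) / m                              # sampling
    b = sum((t - theta_bar) ** 2 for t in pv_estimates) / (m - 1)   # imputation
    return u + (1.0 + 1.0 / m) * b
```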

    Standard errors for all BQ items in the U.S. national
    public-use file (PUF) can be found in the PUF compendia, forthcoming. The purpose of the compendia is to support PUF users so that they can gain knowledge of the
    contents of the PUF and can use the compendia results to ensure they are
    performing PUF analyses correctly.

    Details of estimating standard errors in the PIAAC Cycle 2 U.S. results can be found in appendix E (Data User Guide) of the
    upcoming PIAAC Cycle 2 U.S. technical report.

    Population

    The results presented for the 2012/14 and 2017 PIAAC assessments are for those individuals who could respond to PIAAC in either English or Spanish. The results for PIAAC Cycle 2 also included adults who did not speak English or Spanish, who were given a short, self-administered survey of background information in the language they identified as the one they best understood. This allowed for an estimation of their proficiency. (See Doorstep Interview in the Data Collection section for more information.)

    Nonsampling Errors

    Nonsampling error can result from factors such as
    undercoverage of the target population, nonresponse by sampled households and
    persons, differences between respondents’ interpretations of the survey
    questions and the questions’ intended meaning, data preparation errors, and differences in the assessments and scoring methodology between cycles.
    Unlike sampling errors, nonsampling errors are often difficult to measure.
    Although PIAAC strives to minimize errors through quality control and weighting
    procedures, some errors inevitably remain in the data.

    Missing Data

    PIAAC used a standard scheme for missing values.
    Designated missing codes were used to indicate don’t know, refused, not stated
    or inferred, and valid skips. The assessment items also included a missing code
    for not administered. For more details on the missing codes, please see appendix
    E (Data User Guide) of the upcoming PIAAC Cycle 2 U.S. technical report.

    The key BQ variables (e.g., age, gender, highest
    education level, employment status, country of birth) had either no or very
    little missing data. For a complete list of item response rates, please see the
    upcoming PIAAC Cycle 2 U.S. technical report.

    Confidentiality and Disclosure Limitations

    The NCES Standard 4-2, Maintaining Confidentiality
    (NCES 2002) provides guidelines for limiting the risk of data disclosure for
    data released by NCES. Confidentiality analyses were conducted on the PIAAC data
    in accordance with the NCES Standard. The analyses included a three-step process
    to reduce disclosure risk: (1) determine the disclosure risk arising from
    existing external data, (2) apply data treatments using a method called data
    swapping, and (3) coarsen the data (e.g., top- and bottom-coding, categorization of continuous
    data). Swapping, which involves random swapping of data elements between like
    cases, was designed not to significantly affect estimates of means and variances for the whole
    sample or reported subgroups (Krenzke et al. 2006). Careful consideration was
    given to protect respondent privacy while preserving data utility to the
    greatest extent possible. Please refer to the upcoming PIAAC Cycle 2 U.S.
    technical report for more details on the data confidentiality process.
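    The general idea of data swapping can be illustrated with the sketch below: values of a variable are exchanged between randomly chosen pairs of "like" cases so that individual records are perturbed while aggregate estimates are largely preserved. The pairing rule and number of swaps are hypothetical; the actual swap rates and matching variables are confidential.

```python
# Illustration only of random data swapping between like cases.
import random

def swap_variable(records, key, match_keys, n_swaps_per_group=1):
    """records: list of dicts. Swaps `key` between a few randomly chosen pairs
    of records that agree on all `match_keys`."""
    groups = {}
    for idx, rec in enumerate(records):
        groups.setdefault(tuple(rec[k] for k in match_keys), []).append(idx)
    for members in groups.values():
        random.shuffle(members)
        for i in range(min(n_swaps_per_group, len(members) // 2)):
            a, b = members[2 * i], members[2 * i + 1]
            records[a][key], records[b][key] = records[b][key], records[a][key]
    return records
```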

    The following files, included in the PIAAC data dissemination products, were produced following the aforementioned three-step process:


    • U.S. national PUF (ages 16–74);
    • international PUF (ages 16–65); and
    • U.S. national restricted-use file (RUF) (ages 16–74).

    The RUF contains noncoarsened, swapped data,
    and the PUF contains coarsened, swapped data. Data were also added to two
    web-based data tools, the NCES International Data Explorer (IDE) and the OECD
    IDE, following the confidentiality procedures established for disseminating data
    via data tools. Both the NCES and OECD IDE enable the user to create statistical
    tables and charts for adults ages 16–65, while the NCES IDE also facilitates
    analyses on 66- to 74-year-olds.

    PIAAC Cycle 2 participants were informed that their
    privacy was protected throughout all phases of the study and information they
    provided could only be used for statistical purposes and would not be disclosed,
    or used, in identifiable form for any other purpose except as required by law
    (20 U.S.C. §9573 and 6 U.S.C. §151). Individuals are never identified in any
    releases (data files, reports, tables, etc.) because reported statistics only
    refer to the United States as a whole or to national subgroups. Participants’
    names, addresses, and
    any contact information collected during the interviews are excluded from the
    final datasets.

    All individuals who worked on PIAAC field data
    collection, including
    supervisors and interviewers,
    were required to sign a PIAAC confidentiality agreement. All employees who
    worked on any aspect of the study, including the management of data collection,
    data creation, data dissemination, data analysis, and reporting,
    signed an affidavit of nondisclosure.

    Statistical Procedures

    Test of Significance

    Patterns observed in the sample may not be present in
    the population. For example, in the sample, the average literacy score for one
    region may by chance be higher than for other regions, but,
    in fact, that region
    might have a lower average literacy score in the population. Statistical
    significance tests are commonly used by analysts to help assess whether a
    pattern observed in the sample is also present in the population. The result of
    a test is said to be statistically significant if the pattern in the sample is
    determined to be unlikely to have occurred if the pattern was not also present
    in the population (i.e., unlikely to have been a matter of random chance). However, when an observed difference among groups in the
    sample is described as statistically significant, that does not necessarily mean
    that the difference among the groups in the population is meaningfully large.
    The NCES Statistical Standards (NCES 2012) require reported analyses to focus on
    differences that are substantively important rather than merely statistically
    significant, and the standards note that “it is not necessary, or desirable, to
    discuss every statistically significant difference” in reported analyses.
    Results of statistical significance tests should be reported with underlying
    estimates and accompanying measures such as standard errors, coefficients of
    variation, or confidence intervals (Wasserstein and Lazar 2016).

    Statistical significance tests rely on estimates of
    sampling variance. As such, it is necessary to use the variance estimation
    methods described in the upcoming PIAAC Cycle 2 U.S. technical report. Analysts
    should use the provided replicate weights; for analyses involving PVs, analysts
    should also use the variance estimation formula provided in the technical report
    that accounts for the imputation variance of the PVs. These variance estimation
    methods are highly flexible and may be used for several kinds of statistical
    significance tests.
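As a rough illustration of how these pieces fit together, the sketch below combines replicate-weight sampling variance with the between-PV (imputation) variance for a single statistic. It assumes a Fay-adjusted replicate design; the adjustment factor, the number of replicates, and the function names are illustrative assumptions, and the authoritative formula is the one given in the technical report.

```python
import numpy as np

def pv_estimate_and_se(pv_estimates, replicate_estimates_by_pv, fay_k=0.3):
    """Combine sampling variance (replicate weights) with imputation variance (PVs).

    pv_estimates: length-P array of full-sample estimates, one per plausible value.
    replicate_estimates_by_pv: P x R array of the same statistic recomputed under
        each replicate weight (an illustrative Fay-adjusted scheme is assumed).
    fay_k: illustrative Fay adjustment factor, not necessarily the PIAAC value.
    """
    pv_estimates = np.asarray(pv_estimates, dtype=float)
    reps = np.asarray(replicate_estimates_by_pv, dtype=float)
    P, R = reps.shape

    # Sampling variance for each PV from the replicate deviations, then averaged.
    sampling_var_by_pv = ((reps - pv_estimates[:, None]) ** 2).sum(axis=1) / (R * (1 - fay_k) ** 2)
    sampling_var = sampling_var_by_pv.mean()

    # Imputation variance across the PVs (Rubin-style combination).
    imputation_var = pv_estimates.var(ddof=1)
    total_var = sampling_var + (1 + 1 / P) * imputation_var
    return pv_estimates.mean(), np.sqrt(total_var)
```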

    Throughout PIAAC reports, t-tests are used to assess
    the statistical significance of differences in proficiency scores between two
    groups or two periods of time. The reports use two types of t-tests. The first
    type of t-test compares estimates from two groups within the U.S. PIAAC sample,
denoted $\hat{\theta}_1$ and $\hat{\theta}_2$. An example of this
    type of t-test is a comparison of average literacy scores of men and women in
    the United States (NCES 2018). Because of the complex sample design and
    imputation process, the two groups’ estimates are not independent. The standard
    error of the difference between the two groups’ estimates, denoted
$se(\hat{\theta}_1 - \hat{\theta}_2)$, must be directly
    estimated using the provided replicate weights. Given the two groups’ estimates
    and the estimated standard error of their difference, the t-statistic is
    computed as follows:

\[ t = \frac{\hat{\theta}_1 - \hat{\theta}_2}{se(\hat{\theta}_1 - \hat{\theta}_2)} \]
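A hedged sketch of this first type of t-test is shown below: the difference is recomputed under each replicate weight, and the standard error of the difference is estimated from the replicate variation. The Fay adjustment factor and the function names are illustrative assumptions rather than the prescribed PIAAC computation.

```python
import numpy as np

def dependent_t(theta1, theta2, rep_theta1, rep_theta2, fay_k=0.3):
    """t-statistic for two non-independent group estimates from the same sample.

    rep_theta1, rep_theta2: estimates for each group recomputed under each
        replicate weight. The SE of the difference comes from the replicate
        variation of the differences, not from the two groups separately.
    fay_k is an illustrative Fay adjustment factor.
    """
    diff = theta1 - theta2
    rep_diff = np.asarray(rep_theta1, dtype=float) - np.asarray(rep_theta2, dtype=float)
    R = rep_diff.size
    se_diff = np.sqrt(((rep_diff - diff) ** 2).sum() / (R * (1 - fay_k) ** 2))
    return diff / se_diff
```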

    A second type of t-test compares estimates from two
    independent samples, such as the PIAAC U.S. sample and another country’s sample,
    but it is not generally applicable to comparing estimates from two groups within
    the PIAAC U.S. sample. For example, the results of this type of t-test are
    included in the PIAAC International Highlights Web Report (NCES 2020) for
    comparisons of the U.S. average literacy score to that of Japan. This test is
    also applicable to the comparison of estimates from PIAAC U.S. samples from
different cycles. For this test, the difference between the two independent estimates
$\hat{\theta}_1$ and $\hat{\theta}_2$ has a sampling
variance equal to the sum of the two estimates’ variances, so the
    t-statistic for a statistical significance test may be computed as follows:

\[ t = \frac{\hat{\theta}_1 - \hat{\theta}_2}{\sqrt{\mathrm{var}(\hat{\theta}_1) + \mathrm{var}(\hat{\theta}_2)}} \]

    Because of the complex sample design, the degrees of
    freedom for the reference t-distribution used in tests of significance should be
    much smaller than the total number of respondents in the sample. The upcoming
    PIAAC U.S. technical report contains guidance on the determination of the degrees of freedom to
    use for t-tests and other types of statistical significance tests.

When comparing results over time, there is additional uncertainty associated with linking the proficiency scales of Cycle 2 and Cycle 1, because the assessment frameworks and assessment items, although similar, are not identical. This uncertainty is expressed as a “linking error” that is independent of the size of the sample. The linking error is added to the standard error of the difference when testing the statistical significance of differences in proficiency scores across cycles. The value of the linking error is 3.27 for literacy and 2.95 for numeracy. Details on estimating standard errors for the PIAAC Cycle 2 U.S. results can be found in appendix D (Data User Guide) of the upcoming PIAAC Cycle 2 U.S. technical report.
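The following sketch illustrates one common way of folding the linking error into a cross-cycle comparison: the squared linking error is added to the sum of the two sampling variances before forming the t-statistic shown above. The function name and the example values are illustrative; the exact formula used for PIAAC is given in the technical report.

```python
import math

LINKING_ERROR = {"literacy": 3.27, "numeracy": 2.95}  # values stated in the text above

def trend_t(est_c2, se_c2, est_c1, se_c1, domain="literacy"):
    """t-statistic for a Cycle 2 vs. Cycle 1 difference, with linking error added
    as an extra variance component (a common approach, assumed here)."""
    le = LINKING_ERROR[domain]
    se_diff = math.sqrt(se_c2 ** 2 + se_c1 ** 2 + le ** 2)
    return (est_c2 - est_c1) / se_diff

# Illustrative (made-up) estimates and standard errors:
print(round(trend_t(265.0, 1.2, 270.0, 1.1, "literacy"), 2))
```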

    Nonresponse Bias Analysis

    Nonresponse bias analysis (NRBA) is used to evaluate
    the possible extent of bias originating from differences in proficiency between
    those who responded to the survey and those who did not. The proficiency of
    survey nonrespondents is unknown, so it is not possible to have an exact measure
    of bias in the proficiency estimates. Instead, NRBA provides a way of using
    known information about survey respondents and nonrespondents to evaluate the
    potential risk for bias in the data.

To reduce nonresponse bias, adaptive survey design
procedures were developed by the PIAAC Consortium and followed by the United
States during its data collection (see chapter 3 of the PIAAC Cycle 2 U.S.
technical report). PIAAC Technical Standards and Guidelines (OECD 2022) also
required NRBA for all countries, with additional analyses for those with a
response rate below 70 percent. With a response rate of 28 percent, the United
    States conducted the two required sets of analyses: (1) a basic NRBA to identify
    differences in respondent and nonrespondent characteristics so that the
    weighting process could adjust for the differences, and (2) an extended NRBA to
    assess the potential for bias in the final, weighted PIAAC proficiency
    estimates.

    Basic Nonresponse Bias Analysis

    The basic NRBA was used to identify and correct for
    bias in respondent characteristics. The analysis was used to inform nonresponse
    weighting adjustments for the core national sample and was performed on the
    national sample of adults ages 16 to 74, excluding the state supplemental
    sample. For this analysis, a classification tree method was applied to divide
    the sample into subgroups with different response rates. One tree was fit for
    the screener stage and another for the BQ stage. The subgroups were formed using
    characteristics that were known for both respondents and nonrespondents and were
related to proficiency, such as educational attainment, DU type, and urban/rural
    designation. Based on the analysis, the strongest predictor of screener response
    status was census region, with lower response rates for households in the
    Northeast and Midwest. The strongest predictor of BQ response status was the
    percentage of the population below 150 percent of the poverty threshold, with
    lower response rates for persons in census tracts with lower poverty rates. The
    subgroups formed by the classification trees were then used in weighting, with
    respondents of similar characteristics representing nonrespondents within the
    subgroup. The purpose of the adjustment was to correct for the under- or
    overrepresentation of respondents in the identified subgroups, potentially
    reducing nonresponse bias in the proficiency estimates. The analysis and
    adjustment were based on a limited set of demographic variables, so potential
    over- or underrepresentation of certain subgroups might still be present.
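The sketch below illustrates the general classification-tree approach in miniature: response status is modeled on frame characteristics known for both respondents and nonrespondents, and each terminal leaf defines a subgroup (weighting cell) with its own response rate. The data, variable names, and tree settings are hypothetical and are not the actual PIAAC inputs, software, or settings.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical sample-frame data: one row per sampled case, with characteristics
# available for both respondents and nonrespondents (illustrative only).
frame = pd.DataFrame({
    "census_region": ["Northeast", "South", "Midwest", "West"] * 50,
    "urban_rural": ["urban", "rural"] * 100,
    "pct_below_150_poverty": [10, 25, 40, 15] * 50,
    "responded": [0, 1, 1, 0, 1, 1, 0, 1] * 25,
})
X = pd.get_dummies(frame[["census_region", "urban_rural", "pct_below_150_poverty"]])
y = frame["responded"]

# A shallow tree keeps the leaves interpretable as nonresponse-adjustment cells.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

# Each leaf's response rate can inform the nonresponse weighting adjustment:
frame["cell"] = tree.apply(X)
print(frame.groupby("cell")["responded"].mean())
```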


    Extended Nonresponse Bias Analysis

    The United States performed the extended NRBA after
    weights and proficiency scores (PVs) were produced. The purpose was to provide
    an indication of data quality by evaluating the effect of the data collection
    procedures, adaptive survey design, and weighting adjustments on nonresponse
    bias and the potential for bias in the final proficiency estimates. Highlights
    from two key analyses are provided below, with complete results found in the
    upcoming PIAAC Cycle 2 U.S. technical report.

    The level-of-effort analysis evaluates how
    proficiency estimates change as the number of contacts increases. If
    nonrespondents are assumed to be similar to hard-to-reach respondents, the
    analysis can provide an indication of the potential for nonresponse bias. This
    analysis was performed using the final sample of adults ages 16 to 74, which
    included the core national sample and the state supplement. Individuals who
    responded on the first attempt (10 percent of respondents) scored 12–14 points
    lower than the overall average for the three proficiency domains (literacy,
    numeracy, and APS). Cumulatively, those who responded on either the first or
    second attempt (37 percent) scored 5–6 points lower than the overall average. By
    the fourth attempt (cumulatively 57 percent of respondents), the average
    proficiency score was within 1 point of the overall average for each of the
    three domains. The results indicated the strong potential for nonresponse bias
    if only one or two attempts had been made.
    Therefore, it could be inferred that contact protocols requiring multiple
    contact attempts likely reduced nonresponse bias in the final PIAAC outcomes.
    The analysis relied on respondent data and the assumption that nonrespondents
    were similar to hard-to-reach respondents. The actual proficiency of the
    nonrespondents and the effect of nonresponse on the overall proficiency
    estimates is unknown.
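A simplified sketch of a level-of-effort analysis follows: it computes the weighted cumulative mean score for respondents reached by each contact attempt and compares it with the overall weighted mean. The data are simulated, and a full analysis would use the final weights, replicate weights, and all plausible values rather than a single score.

```python
import numpy as np
import pandas as pd

# Simulated respondent-level data: attempt on which the case responded, a final
# weight, and one proficiency score (illustrative only).
rng = np.random.default_rng(0)
n = 1000
resp = pd.DataFrame({
    "attempt": rng.integers(1, 9, n),
    "weight": rng.uniform(50, 150, n),
    "literacy": rng.normal(260, 45, n),
})

overall = np.average(resp["literacy"], weights=resp["weight"])
for k in range(1, int(resp["attempt"].max()) + 1):
    sub = resp[resp["attempt"] <= k]
    cum_mean = np.average(sub["literacy"], weights=sub["weight"])
    share = sub["weight"].sum() / resp["weight"].sum()
    print(f"attempts <= {k}: {share:5.1%} of weight, "
          f"mean = {cum_mean:6.1f}, diff from overall = {cum_mean - overall:+5.1f}")
```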

    Explained variation in outcomes (EVO) is a measure
    that describes how much information is known about the proficiency of the target
    population based on the respondent data (i.e., proficiency scores) and the
    additional characteristics used in weighting adjustments. The EVO can range from
    0 to 100 percent, with a higher EVO indicating that there is more information
    about the proficiency level of the target population and less potential for
nonresponse bias. The EVO is approximately equal to RR + (1 – RR) × R²,
where RR is the response rate (28 percent for the United States),
and R² is based on a regression model with the proficiency
score as the dependent variable and the weighting variables as predictors. The
regression R² indicates the strength of the relationship
between the weighting variables and the proficiency score and can be thought of
as the amount of information about proficiency that is explained by the
weighting variables.
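Because the EVO formula is stated directly above, a short illustrative check can make the arithmetic concrete; the R² value here is assumed for illustration only.

```python
def evo(response_rate, r_squared):
    """Explained variation in outcomes: EVO ≈ RR + (1 - RR) * R^2."""
    return response_rate + (1 - response_rate) * r_squared

# With the U.S. response rate of 0.28, an EVO of 0.56-0.59 corresponds to an R^2
# of roughly 0.39-0.43 for the weighting-variable regression (illustrative check):
print(round(evo(0.28, 0.40), 2))  # ~0.57
```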

    For the United States, the EVO was calculated based on
    the national sample
    of adults ages 16 to 74, excluding the state supplemental sample because
    response rates could not be calculated for the state supplemental sample. The United States’ EVO ranged from 56 to 59 percent for the three proficiency
    outcomes. This meant that data from respondents, together with the weighting
    variables for nonrespondents, were estimated to explain 56–59 percent of the
    proficiency distribution among the eligible sample cases, compared with the 28
    percent explained by respondent data alone. The results indicated that the
    weighting variables contributed valuable information about the nonrespondents’
    proficiency, making the weighting adjustment effective in reducing bias in the
    proficiency estimates. An EVO of 56–59 percent would be equivalent to a response
    rate of 56–59 percent where no weighting adjustments were performed to reduce
    nonresponse bias (or where the weighting variables have no relationship to
proficiency, i.e., R² = 0), and the data should be considered
    with the same level of caution. Based on international criteria for PIAAC, an
    EVO threshold of 50 percent was used to distinguish between a high level of
    caution and a moderate level of caution regarding the potential for nonresponse
    bias, with the United States falling within the moderate range. Given that the
    EVO was below 100 percent, the weighting variables did not provide complete
    information about the proficiency level of the nonrespondents, and the potential
    for nonresponse bias remains.

    In general, lower response rates are associated with a
    higher risk for nonresponse bias if nonrespondents are very different from
    respondents and if the weighting was not effective in reducing those
    differences. The extended NRBA provides evidence that data collection
    procedures, adaptive survey design, and weighting adjustments were effective in
    reducing nonresponse bias. There were no indications of serious concerns in the
    final estimates. However, it is not possible to know or quantify the actual
    extent of nonresponse bias, and data users should be aware of the potential for
    bias in the final PIAAC estimates.

    References

International Labour Organization. (2012). International Standard Classification of Occupations 2008 (ISCO-08): Structure, group definitions and correspondence tables. Retrieved from https://www.ilo.org/publications/international-standard-classification-occupations-2008-isco-08-structure.

Judkins, D.R. (1990). Fay’s method for variance estimation. Journal of Official Statistics, 6(3): 223–239.

Kish, L. (1965). Survey Sampling. New York: John Wiley & Sons.

Krenzke, T., Roey, S., Dohrmann, S.M., Mohadjer, L., Huang, W-C., Kaufman, S., and Seastrom, M. (2006). Tactics for Reducing the Risk of Disclosure Using the NCES DataSwap Software. Proceedings of the American Statistical Association: Survey Research Methods Section. Philadelphia: American Statistical Association.

Krenzke, T., and Mohadjer, L. (2020). Application of probability-based link-tracing and non-probability approaches to sampling out-of-school youth in developing countries. Journal of Survey Statistics and Methodology. Retrieved from https://doi.org/10.1093/jssam/smaa010.

National Center for Education Statistics. (2002). Maintaining Confidentiality: NCES Standard 4-2. In NCES Statistical Standards. Retrieved from https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2003601.

National Center for Education Statistics. (2012). 2012 Revision of NCES Statistical Standards: Final. Retrieved from https://nces.ed.gov/statprog/2012/.

National Center for Education Statistics. (2018). Data Point: Literacy and Numeracy Skills of U.S. Men and Women (NCES 2018-164). Retrieved from https://nces.ed.gov/pubs2018/2018164/index.asp.

National Center for Education Statistics. (2020). PIAAC International Highlights Web Report (NCES 2020-127). Retrieved from https://nces.ed.gov/surveys/piaac/international_context.asp.

Organisation for Economic Co-operation and Development. (2013). Large Regions, TL2: Demographic Statistics. Retrieved from https://www.oecd-ilibrary.org/urban-rural-and-regional-development/data/large-regions-tl2/demographic-statistics_data-00520-en.

Organisation for Economic Co-operation and Development. (2021). The Assessment Frameworks for Cycle 2 of the Programme for the International Assessment of Adult Competencies. Retrieved from https://doi.org/10.1787/4bc2342d-en.

Organisation for Economic Co-operation and Development. (2022). Cycle 2 PIAAC Technical Standards and Guidelines. Retrieved from https://www.oecd.org/content/dam/oecd/en/about/programmes/edu/piaac/technical-standards-and-guidelines/cycle-2/PIAAC_CY2_Technical_Standards_and_Guidelines.pdf.

UNESCO Institute for Statistics. (2012). International Standard Classification of Education 2011. Retrieved from https://spca.education/wp-content/uploads/2024/03/international-standard-classification-of-education-isced-2011-en.pdf.

United Nations Statistics Division. (2007). International Standard Industrial Classification of All Economic Activities Revision 4, Series M: Miscellaneous Statistical Papers, No. 4 Rev. 4. New York: United Nations. Retrieved from https://unstats.un.org/unsd/classifications/Family/Detail/27.

U.S. Census Bureau. (2020). American Community Survey 2015-2019 5-Year Data Release. Retrieved from https://www.census.gov/newsroom/press-kits/2020/acs-5-year.html.

U.S. Census Bureau. (2024). PUMS Data. Retrieved from https://www.census.gov/programs-surveys/acs/microdata/access.2022.html#list-tab-735824205.

Van de Kerckhove, W., Krenzke, T., and Mohadjer, L. (2020). Addressing Outcome-Related Nonresponse Through a Doorstep Interview. In JSM Proceedings, Survey Research Methods Section (715–724). Alexandria, VA: American Statistical Association. Retrieved from http://www.asasrms.org/Proceedings/y2020/files/1505350.pdf.

Wasserstein, R., and Lazar, N. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2): 129–133. Retrieved from https://doi.org/10.1080/00031305.2016.1154108.
