Frequently asked questions about Income Distributions and Dynamics in America

What questions can the IDDA statistics help answer?

The IDDA statistics offer unique insights into two important features of incomes in the United States: First, how incomes are distributed—who has high income, who has low income, who has intermediate income. Second, how earners move within the income distribution over time.

To better understand how income is distributed, the distributional statistics include the dollar value of income for many percentiles of the income distribution across and within states and demographic groups. This portion of the data will allow researchers to answer detailed questions about earnings, relative earnings, and income inequality. For example, how do earnings for young Hispanic women in New Mexico compare with those in Arizona? Or, does the ranking of states by earnings inequality for prime-age workers change over time?

To better understand income mobility, the dynamic statistics measure the probability that a household at a certain initial income level and with specific demographic characteristics will move to a different income level within 1 or 5 years. This helps to identify which groups of people have more or less income mobility. And, with two decades of data, IDDA can help answer questions on how incomes fluctuate over time. For instance, how has the share of women with incomes above the 99th percentile changed over time?

For more examples of how IDDA statistics can be used to understand incomes across America, read our articles on how incomes have evolved by race and ethnicity within Minnesota, how recessions and recoveries impact inequality across demographic groups, and how incomes of U.S.-born and foreign-born earners at the very top of the income distribution have evolved.

What makes the IDDA statistics unique?

The IDDA resource is a powerful set of income-related statistics that are derived from individual U.S. tax records over a 20-plus-year horizon.

IDDA statistics summarize extensive data from restricted IRS and Census Bureau records to give a novel level of fidelity to income outcomes across demographic groups in the United States. This makes insights from administrative data broadly accessible in a way that they usually are not, including descriptions of the incomes of very high earners.

Furthermore, the statistics were constructed starting with all U.S. individual tax forms, rather than starting with a limited sample, making the statistics more comprehensive than other sources. That comprehensiveness means it is possible to share income and earnings statistics at the intersection of geographic and demographic characteristics, such as age, sex, and race and ethnicity. Read more about what’s in IDDA.

What is the scope of the data?

The IDDA database offers over 6 million statistics on income and earnings from 1998 to 2019. The data include four income and earnings measures: total compensation, wage compensation, nonwage income, and adjusted gross income.

(See a description of these measures.) IDDA allows users to examine these measures by state, race and ethnicity, gender, age, and U.S.- or foreign-born status, as well as intersections of these characteristics.

IDDA does not include person-level microdata. Rather, the datasets contain summary statistics derived from individual tax records. In other words, individual incomes are not visible in IDDA. All statistics are averages of a minimum-size group of people with shared characteristics.

What is the IDDA Native areas data, and what types of Native land areas does this include?

The IDDA Native areas data include over 70,000 statistics on income distributions, income mobility, and migration for both Native and non-Native populations living in Native areas. The Native areas geography in IDDA includes all Census Bureau–defined tribal areas as well as Native Hawaiian trust lands, delineated as of 2017, combined into a single geography.

These statistics are further broken out by age, sex, and their intersections with Native identity. The Native areas geography in IDDA includes, for example, federally and state-recognized Indian reservations, tribal statistical areas, and the trust lands of Native Hawaiians.

The Census Bureau regularly solicits updates to these boundaries from tribal governments. The IDDA Native areas module includes individuals whose residential address falls within a Census block that is designated as a tribal area. IDDA does not provide information on specific Native areas or American Indian reservations, but rather provides a detailed look at incomes in tribal communities across the U.S.

The Native geographies defined by the Census Bureau are broad, but imperfect. Native people have many ways of defining Indian Country. Learn more about Native land areas and Native identity in IDDA.

What population does IDDA represent?

The IDDA statistics cover all individuals who filed a Form 1040 or received a Form W-2 in a given year.

The statistics are not weighted to be representative of the full population. In other words, the statistics are representative of tax filers nationally and within each state, but they are not representative of all U.S. or state residents.

Who is included in the IDDA statistics? Who is excluded?

The IDDA statistics are based on data that include Form 1040 tax returns filed with the IRS between 1998 and 2019 as well as W-2 forms received by employees between 2005 and 2019.

Individuals with W-2s are included in the individual-level earnings measures. The largest groups of people excluded from the individual-level earnings statistics are likely people who are exclusively self-employed or working as independent contractors who do not receive W-2s.

Households with at least one filed Form 1040 are included in the household-level income measures. According to the IRS, approximately 157 million Form 1040s were filed in 2019. Self-employment and contract employment incomes are reported as part of a household’s adjusted gross income on the 1040 form. The federal government does allow households with an adjusted gross income below a certain threshold ($12,200 for a single filer with no dependents in 2019) to not file taxes, although they may choose to file because some benefit programs require tax records. The largest groups excluded from IDDA household-level income statistics are likely those with very low incomes and those over age 65 who receive Social Security payments as their only form of income, because these groups are not required to file.

The demographic information in IDDA is drawn from administrative data sources that are intended to be universal. We make use of multiple sources for key information, such as race and ethnicity, to maximize the coverage of the dataset. However, some records that could not be linked to demographic information were excluded from analysis. This includes tax records that are not associated with a Social Security number. In addition, records were excluded from the 1040 data if they did not link to a valid address or if the number of filers associated with a single address was unusually high.

How is a “household” constructed in IDDA?

In the IDDA statistics, a household is constructed by aggregating all 1040 tax filings from a single address.

The household-level total is then assigned to each adult listed on a 1040 form filed with the common address. This means that IDDA’s household-level statistics represent the total income available to an individual within a household.

As an example, a household could include three adults: two married individuals filing a joint 1040 with $40,000 of income plus a third adult (a roommate or a parent, for instance) filing an individual 1040 with $15,000 of income. The household total income of $55,000 is assigned to each individual, the level at which demographic characteristics are reported.
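The household example above can be reproduced with a short sketch. This is an illustration only, not the code used to build IDDA: the address IDs, person labels, and the `household_income_by_adult` helper are hypothetical, and the real construction applies linkage and exclusion rules that are not shown here.

```python
from collections import defaultdict

# Hypothetical 1040 filings: (address ID, income on the return, adults listed)
filings = [
    ("addr-1", 40_000, ["spouse_a", "spouse_b"]),  # joint return
    ("addr-1", 15_000, ["roommate"]),              # individual return
    ("addr-2", 30_000, ["solo_filer"]),
]

def household_income_by_adult(filings):
    """Sum 1040 income per address, then assign that total to every adult there."""
    totals = defaultdict(int)
    for address, income, _adults in filings:
        totals[address] += income
    return {
        adult: totals[address]
        for address, _income, adults in filings
        for adult in adults
    }

assigned = household_income_by_adult(filings)
print(assigned["roommate"])  # 55000
```

Each of the three adults at the first address is assigned the same $55,000 household total, which is why household-level statistics describe the income available to an individual rather than the income that individual reported.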

How are race and ethnicity defined?

IDDA uses seven race and ethnicity categories: Hispanic, non-Hispanic American Indian or Alaska Native, non-Hispanic Asian, non-Hispanic Black, non-Hispanic Native Hawaiian or other Pacific Islander, non-Hispanic White, and non-Hispanic other or multiple races.

However, statistics are not always published for the non-Hispanic other and multiple races group.

Race and ethnicity data come from the Census Bureau’s 2020 Best Race and Ethnicity Administrative Records Composite File. This database compiles race and ethnicity data from multiple sources, including the decennial census, the Indian Health Service, and records from the Temporary Assistance for Needy Families program and the Department of Housing and Urban Development. If an individual does not have data in the Census Bureau’s Best Race and Ethnicity file, race and ethnicity data are taken from the American Community Survey (ACS) or decennial census. This allows IDDA to capture demographic characteristics as accurately as possible while maximizing the coverage of the dataset. Observations were dropped if race and ethnicity information was missing.

How are American Indian or Alaska Native individuals represented in IDDA statistics?

Wherever possible, IDDA includes income statistics for American Indian or Alaska Native individuals intersected with state, sex, and age. In addition, IDDA reports statistics for the Native and non-Native populations living in Census Bureau–delineated tribal areas and Native Hawaiian trust lands.

Race and ethnicity data in the Native areas statistics come from the same sources as in the core dataset, but they are coded using a more inclusive definition of Native identity that includes individuals who report one of their races as American Indian, Alaska Native, Native Hawaiian, or other Pacific Islander, regardless of Hispanic ethnicity. The statistics were put together in this way to prioritize providing income statistics for populations that often are not captured in other databases.

More information on the availability of IDDA statistics by race and ethnicity can be found in the technical documentation. Also, see our articles on Native identity and Native areas in IDDA and how incomes have evolved for Native peoples and for these places.

How is sex defined?

Sex is defined as one of two categories, male or female.

This information comes from an individual’s initial Social Security application, unless the individual has recorded a change to their sex with the Social Security Administration.

Where does the underlying data come from?

The data used to create the IDDA statistics were collected from IRS 1040 and W-2 filings as well as from demographic records collected by the Census Bureau and Social Security Administration.

Databases were linked using unique, anonymized, individual “protected identification keys” (known as “PIKs”) rather than using Social Security numbers, taxpayer identification numbers, or other personal identifying information. This ensures the confidentiality of individuals within the sample.

What are the differences between the W-2 and 1040 statistics in IDDA?

The statistics derived from 1040 forms capture the income available to all persons living in a household at a common address.

These statistics are constructed by aggregating all 1040 filings within a year for an address to generate total household income measures, including wage income and adjusted gross income.

The W-2 data capture an individual’s wage earnings during a given year as reported by their employer. The W-2 data are at the level of the individual, not the household.

What types of income are included in the dataset?

IDDA includes six measures of pretax individual and household income. Incomes summarized in IDDA do not account for differences in taxes paid on these incomes.

IDDA does not include nontaxable transfer incomes, a substantial income source for low-income households and individuals.

The three primary income measures available in IDDA are:

Total compensation, which includes wages, tips, bonuses, and other compensation reported in Box 1 of Form W-2, plus deferred compensation reported in Box 12 of Form W-2 (such as 401k contributions). Together, this represents what an individual earns through formal employment. In the interactive chart and map tool, we refer to this as “individual earnings.”
Household gross income, which sums up all the taxable income reported on Form 1040s filed by individuals residing at the same address. This includes wage and salary earnings as well as taxable nonwage income (defined below). In the interactive chart and map tool, we refer to this as “household total income.”
Household total nonwage income, a broad category of taxable income that includes self-employment income, interest and dividends, unemployment insurance, and the taxable components of Social Security and other retirement income reported on Form 1040. It does not include Supplemental Nutrition Assistance Program (SNAP) benefits or most public assistance.

The datasets available for download include some additional components of individual- and household-level income, including individual deferred compensation and household wage income. Read the definition of each income measure.

How is the IDDA dataset organized?

The dataset is organized into five modules, each of which captures income outcomes reflected in both 1040 and W-2 tax filings at the national and state level. Furthermore, each module offers statistics by demographic characteristics such as age, race and ethnicity, and sex.

The percentiles of income module captures income levels across the distribution. It shows the income of individuals or households at the 10th, 25th, 50th, 75th, 90th, 95th, 98th, 99th, 99.9th, 99.99th, and 99.999th percentiles of the distribution.
The top income shares module provides two ways of measuring the concentration of incomes at the top of the distribution. One is the proportion of income that is held by the top earners within a demographic group. This can answer questions such as, What percentage of total male income is held by the top 10 percent of male earners? The other is the proportion of income above a certain percentile that is held by members of a demographic group. For example, How much of the income held by the top 10 percent of all earners is held by men?
The top income population shares module provides the demographic composition of a slice of the income distribution. This can help answer questions such as, What percentage of the top 10 percent of all earners are Hispanic, or female, or 35–44 years old?
The income change distributions module summarizes nominal changes in income over a 1- or 5-year period for individuals starting in a given income bin, typically an income quartile. It reports the mean and specific percentiles of income changes between the 10th and 90th percentiles within a demographic group. For example, among individuals who started in the lowest quartile of income in 2018, the mean income change after 1 year was $5,024. Fifty percent of individuals had income growth of less than $2,000 (the 50th percentile of income growth) while 90 percent of individuals had income growth of less than $16,410 (the 90th percentile of income growth).
The income transition matrix module shows the probability that an individual starting in a given income bin moves to a particular income bin after 1 or 5 years. This provides measures of upward and downward income mobility for individuals in a particular demographic group. It also includes the probability that an individual starting in a given income bin is not in the W-2 or 1040 data after 1 or 5 years.

Why focus on top income shares?

Top income shares are one common way of measuring inequality both within and across demographic groups. Detailed data on top income shares are part of what makes the IDDA statistics unique.

High-income earners often underreport income levels in surveys, which can lead to inaccurate estimates of the top share of the income distribution. The IDDA statistics reduce this error by relying on administrative tax data rather than self-reported income.

By making statements such as, “The top 1 percent of the income distribution held 17 percent of total income” (which IDDA tells us was the case in 2019), we can quantify how concentrated incomes are at the top of the distribution. This has implications for redistributive policy. Observing the share of top income held by a particular demographic group can speak to the opportunity that group has to prosper in the economy.

What is the distinction between the “top income shares” and “top income population shares” modules?

The top income population shares module answers the question, What share of individuals in the top X percent of earners belong to a certain demographic group? Then, for the same underlying population, the top income shares module helps answer the question, What share of income held by the top X percent is held by members of a certain demographic group?

For example, nationally in 2019, men made up 69 percent of the top 10 percent of earners based on total W-2 compensation. In the same year, men earned 73 percent of the total W-2 compensation held by the top 10 percent of earners. That means men held a slightly larger share of income than their share of the population in the top 10 percent group.

The top income shares module also provides similar measures within demographic groups. These answer the question, What share of all income earned by a particular demographic group accrues to the top X percent of individuals in that demographic group?
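To make the three measures concrete, here is a minimal sketch on made-up data. The earner list, the function names, and the choice of the top 40 percent are all assumptions for illustration. IDDA itself publishes only the resulting summary statistics, not person-level records.

```python
# Hypothetical earners: (demographic group, income)
earners = [
    ("men", 200), ("men", 150), ("women", 100), ("men", 90), ("women", 80),
    ("women", 70), ("men", 60), ("women", 50), ("men", 40), ("women", 30),
]

def top_group(people, top_percent):
    """Return the top `top_percent` percent of people, ranked by income."""
    ranked = sorted(people, key=lambda p: p[1], reverse=True)
    n_top = max(1, len(ranked) * top_percent // 100)
    return ranked[:n_top]

def population_share(people, group, top_percent):
    """Share of individuals in the overall top X percent who belong to `group`."""
    top = top_group(people, top_percent)
    return sum(1 for g, _ in top if g == group) / len(top)

def income_share_across(people, group, top_percent):
    """Share of the income held by the overall top X percent that `group` holds."""
    top = top_group(people, top_percent)
    return sum(inc for g, inc in top if g == group) / sum(inc for _, inc in top)

def income_share_within(people, group, top_percent):
    """Share of the group's own income held by the group's top X percent."""
    members = [p for p in people if p[0] == group]
    top = top_group(members, top_percent)
    return sum(inc for _, inc in top) / sum(inc for _, inc in members)

print(population_share(earners, "men", 40))     # 0.75
print(income_share_across(earners, "men", 40))  # about 0.81
print(income_share_within(earners, "men", 40))  # about 0.65
```

In this toy data, men are 75 percent of the top group but hold about 81 percent of its income, the same kind of gap as the 69 percent and 73 percent figures quoted above.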

How do the “income change distributions” and “income transition matrix” modules help us understand income mobility?

Both the income change distributions module and the income transition matrix module leverage longitudinal data that follow individual tax filers over time. They provide different ways of describing how incomes change for individuals based on where they start in the income distribution and by their demographic characteristics.

The income change distributions summarize income changes in nominal, dollar-value terms. This provides a unique level of detail within an initial-year income bin, breaking apart what “strong” and “weak” income growth look like for individuals with similar initial earnings. For example, what is the 90th percentile of income growth among Hispanic workers who start in the second income quartile (25th–50th percentile) of the national distribution? IDDA tells us that from 2018 to 2019, it was $42,170.

In contrast, the income transition matrix module measures the probability that individuals move between income quartiles over time (after 1 year or 5 years). These probabilities provide more standard measures of upward and downward income mobility for different demographic groups. For example, how likely are Hispanic workers to move up from the second income quartile (25th–50th percentile) to the highest quartile (above the 75th percentile of income) from one year to the next?

In both modules, income bins are defined within a geography but across demographic groups. In other words, whether an individual falls in a given income quartile is based on the overall U.S.-level or state-level income distribution.
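The sketch below shows how both kinds of mobility statistics could be computed from a small longitudinal panel. It is a simplified illustration under several assumptions: the panel is invented, quartiles are assigned by rank in the overall distribution, and everyone is observed in both years, so the "not in the data" outcome that IDDA also reports is left out.

```python
from collections import Counter

# Hypothetical panel: person -> (income in start year, income one year later)
panel = {
    "a": (10, 12), "b": (20, 45), "c": (30, 28), "d": (40, 90),
    "e": (50, 52), "f": (60, 35), "g": (70, 75), "h": (80, 85),
}

def quartile_bins(incomes):
    """Map each person to a quartile (0 = lowest) by rank in the overall distribution."""
    ranked = sorted(incomes, key=incomes.get)
    return {person: rank * 4 // len(ranked) for rank, person in enumerate(ranked)}

def transition_matrix(panel):
    """Row i, column j: probability of moving from start quartile i to end quartile j."""
    start = quartile_bins({p: v[0] for p, v in panel.items()})
    end = quartile_bins({p: v[1] for p, v in panel.items()})
    counts = Counter((start[p], end[p]) for p in panel)
    row_totals = Counter(start.values())
    return [[counts[(i, j)] / row_totals[i] for j in range(4)] for i in range(4)]

def income_changes(panel, start_bin):
    """Sorted nominal income changes for people who start in `start_bin`."""
    start = quartile_bins({p: v[0] for p, v in panel.items()})
    return sorted(v[1] - v[0] for p, v in panel.items() if start[p] == start_bin)

matrix = transition_matrix(panel)
print(matrix[1])                 # [0.5, 0.0, 0.0, 0.5]
print(income_changes(panel, 1))  # [-2, 50]
```

The second row of the matrix says that half of the people who start in the second quartile end up in the top quartile. The income change list shows the dollar amounts behind those moves, which is the extra detail the income change distributions module adds.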

What is “relative income”?

Relative income (or “relative earnings”) is one way to measure income inequality across demographic groups.

At a given point in the income distribution, relative income is simply the income of one demographic group divided by income for a reference group, expressed as a percent. Often, the reference group is selected to reflect a group of earners that has historically been advantaged in the economy. For example, the gender earnings gap is often expressed as how much women earn as a percentage of how much men earn. As measured in IDDA, in 2019, women in the U.S. earned 72 percent of what men earned at the median, so the relative income of women was 72 percent. This value is computed by taking the 50th percentile of individual total compensation among women ($30,330) and dividing it by the 50th percentile of individual total compensation among men ($42,170). However, relative income could be constructed using a different denominator, such as the U.S. population or a different demographic group.
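The calculation can be written out in a few lines. The `relative_income` function is a hypothetical helper, not part of any IDDA tool, and the two medians are the 2019 values quoted above.

```python
def relative_income(group_value, reference_value):
    """Return the group's income as a percent of the reference group's income."""
    if reference_value <= 0:
        raise ValueError("reference_value must be positive")
    return 100.0 * group_value / reference_value

# 50th percentile of individual total compensation in 2019, from the text above
women_median = 30_330
men_median = 42_170

women_relative = relative_income(women_median, men_median)
print(round(women_relative))  # 72
```

Swapping in a different `reference_value`, such as the median for the whole U.S. population, gives relative income against that denominator instead.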

Users can explore relative income for different demographic groups across U.S. states in IDDA’s interactive chart and map toolkit by selecting “Yes” next to the “Compare these values with ... ?” prompt.

For a step-by-step guide on how to analyze relative earnings by race and ethnicity using IDDA, see the instructions at the end of the article.

Why do I see missing values in the dataset?

Some statistics in IDDA are suppressed to preserve individuals’ confidentiality.

IDDA statistics are based on samples that meet a minimum size so as to prevent a user from determining an individual’s identity from the data. Further details of the suppression process can be found in Table 2 of the IDDA Technical Documentation.

How does IDDA preserve individuals’ confidentiality?

IDDA includes summary statistics derived from individual tax records, not person-level microdata. The data displayed are averages of incomes across households within a portion of the income distribution. At no point does IDDA display income measures for a specific individual or household.

To ensure confidentiality, IDDA suppresses statistics based on small samples, where it could be possible to “back out” an individual’s identity. For example, statistics for a very small demographic group at or above the 99th income percentile in a sparsely populated state may be missing from the IDDA dataset. With such a small sample, it could be possible to draw conclusions about a specific household’s income. Therefore, these statistics are removed.
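As a rough illustration of minimum-size suppression, the sketch below blanks out any statistic computed from too few people. The threshold of 10, the cell labels, and the `publish` function are assumptions made for this example. IDDA’s actual suppression rules are described in the technical documentation and may differ.

```python
MIN_CELL_SIZE = 10  # hypothetical threshold, not IDDA's actual rule

def publish(cells, min_size=MIN_CELL_SIZE):
    """Replace a statistic with None when its cell contains too few people."""
    return {
        key: (value if count >= min_size else None)
        for key, (count, value) in cells.items()
    }

# Hypothetical cells: (state, group, statistic) -> (number of people, value)
cells = {
    ("State A", "group 1", "p50"): (5_000, 41_000),
    ("State B", "group 2", "p99"): (3, 910_000),
}

published = publish(cells)
print(published[("State B", "group 2", "p99")])  # None
```

A suppressed cell shows up as a missing value in the downloaded data, which is why some combinations of state, demographic group, and percentile have no statistic.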

How do IDDA income statistics compare with Current Population Survey statistics?

The traditional source for measuring income outcomes in the U.S. is the Current Population Survey Annual Social and Economic Supplement (CPS ASEC), a survey that is fielded in March of each year and collects detailed income information from a nationally representative sample of over 75,000 households.

The CPS ASEC is a survey, while IDDA is derived from administrative data. Because IDDA is constructed from the universe of tax records in the U.S., it can report income statistics for much smaller demographic and geographic groups than the CPS ASEC, while preserving both confidentiality and statistical accuracy. Survey income measures are subject to misreporting: Respondents, particularly high earners, tend to underreport incomes on surveys, either to appear more average or because they don’t consider all their sources of income. Some respondents refuse to report income data, and those incomes are imputed by a statistical model. Research has also shown that misreporting and nonresponse in the CPS ASEC differ across race, gender, and income level and are affected by social dynamics between interviewers and respondents.

Incomes in the public version of the CPS are also top coded, meaning that all incomes above a certain threshold are replaced with a cutoff value (the top code). For these reasons, the CPS ASEC might underreport some measures of income inequality.
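A small numerical sketch shows why top coding can understate concentration at the top. The incomes and the cap of 150 are invented, and the simple capping rule follows the description above rather than any specific CPS procedure.

```python
def top_code(incomes, cap):
    """Replace every income above `cap` with the cap value."""
    return [min(x, cap) for x in incomes]

def top_share(incomes, n_top):
    """Share of total income held by the `n_top` highest earners."""
    ranked = sorted(incomes, reverse=True)
    return sum(ranked[:n_top]) / sum(ranked)

# Hypothetical incomes for ten earners (total 1,000)
incomes = [20, 30, 40, 50, 60, 70, 80, 90, 100, 460]

true_share = top_share(incomes, 1)                  # 460 / 1000
coded_share = top_share(top_code(incomes, 150), 1)  # 150 / 690

print(round(true_share, 2), round(coded_share, 2))  # 0.46 0.22
```

Here the top earner holds 46 percent of all income, but after top coding the measured share falls to about 22 percent.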

One advantage of the CPS ASEC is that it contains rich information on respondents’ work histories, occupation and industry, education, and household relationships, which is outside of the scope of the tax data IDDA leverages.

How do the IDDA income statistics compare with the Federal Reserve Economic Data (FRED)?

What makes IDDA distinct from the income measures offered in the Federal Reserve Economic Data (FRED) is the amount of information it provides about incomes across the income distribution.

FRED does include income measures, including real median income by state and estimates of real disposable household income. These measures are often based on survey data, such as the Current Population Survey or surveys fielded by the Bureau of Economic Analysis. In contrast, IDDA provides income measures not just at the median, but all along the income distribution. Moreover, these income measures can be analyzed for specific demographic groups. While data sources such as FRED offer a general picture of the income distribution, IDDA offers the ability to answer specific questions at the intersections of time, place, age, and race and ethnicity. For instance, IDDA shows that in 2019, Asian men in California at the 10th percentile of their group’s income distribution earned $8,391. This degree of granularity is not available in FRED.

How do IDDA income statistics complement other data sources?

Looking at IDDA statistics in combination with data from other popular data sources, such as the Federal Reserve Economic Data (FRED), the Survey of Consumer Finances (SCF), and the Regional Economic Accounts from the Bureau of Economic Analysis, can answer additional questions about the sources of income growth, mobility, and inequality.

For example, the SCF includes survey measures of household wealth that are not captured in the IDDA statistics, including retirement savings, property assets, and credit card debt. SCF data are available by race and ethnicity, age, and income percentile, so together IDDA and SCF statistics can help provide a picture of income and wealth for various demographic groups. The BEA regional economic data allow users to look at macroeconomic variables, such as gross domestic product (GDP) growth, alongside the income distributions reported in IDDA. This could help illuminate the dynamics between economic growth and the income distribution at the state level.

Does IDDA contain information on individual or household wealth?

The W-2 and 1040 data capture household income, not wealth.

The forms do not include information on household assets such as property or savings, which are fundamental in measuring a household’s wealth. However, a measure of wealth may be reflected in nonwage household income reported on Form 1040. Nonwage income can come from passive income obtained from household wealth, such as rental properties or financial assets.

Can I use IDDA to study poverty?

IDDA cannot be used to study poverty directly. To assess poverty levels, the Census Bureau uses the official poverty measure (OPM), which varies with family size and composition.

The income measure used in the OPM includes cash transfers from public assistance programs that are not included in the IDDA income measures. In addition, IDDA statistics do not capture family size or composition. To better understand poverty in the United States, see the annual Census Bureau poverty report.

What are the limitations of IDDA statistics?

The incomes summarized in IDDA do not account for the amount of taxes households pay, which can vary greatly even among households with similar incomes. Incomes in IDDA also miss nontaxable transfer incomes, which can be a substantial income source for low-income households and individuals.

The IDDA statistics are not weighted to be representative of the full U.S. population. In other words, the adjusted gross income statistics in a state are representative of 1040 filers in that state, but they are not representative of the full population of that state’s residents.

How can I access the IDDA statistics?

IDDA datasets can be downloaded from the IDDA data center.

How should I cite the IDDA statistics?

For academic and research reports, we recommend the following citation format:

Kondo, Illenin, Kevin Rinz, Brandon Hawkins, Natalie Gubbay, John Voorheis and Abigail Wozniak. (2023). “Granular Income Inequality and Mobility using IDDA.” U.S. Census Bureau Center for Economic Studies Working Paper CES-23-55. Associated dataset: version 1.0. https://doi.org/10.21034/data.idda

For policy briefs or articles in the popular press, we recommend the following citation format:

Income Distributions and Dynamics in America, Federal Reserve Bank of Minneapolis, https://minneapolisfed.org/institute/income-distributions-and-dynamics-in-america

Where can I find the codebook and technical details about the data?

The codebook and technical documentation are both available in the IDDA data center.

Where can I send questions about the IDDA dataset?

Please email the Opportunity & Inclusive Growth Institute at MplsInstitute@mpls.frb.org with any inquiries.
