Misty L. Heggeness is a Visiting Scholar at the Federal Reserve Bank of Minneapolis’ Opportunity & Inclusive Growth Institute. She has conducted research at multiple federal agencies over the past decade and has participated in household survey data collection efforts abroad. Her research focuses on the biomedical research workforce, gender economics, economic demography, and inequality. She is grateful for valuable feedback from Pierre Azoulay, Jeremy Berg, Donna K. Ginther, Julia Lane, and Abigail Wozniak. All opinions and any errors expressed here are solely hers and do not reflect any official position of the Federal Reserve System or the Federal Reserve Bank of Minneapolis.
During a time of national crisis in this global pandemic, federal data collection and statistical production methods can be severely challenged. Equally true, in times of crisis, nations need reliable, up-to-date data to inform policy and decision-making. We must take notice of current gaps and step up to the challenge. Government agencies should work together to share, harmonize, and integrate existing and novel data. I outline weaknesses of current data collection methods and chart a path forward into a 21st-century statistical world. In this time of COVID-19, now more than ever, we need to focus on multi-use efficiencies and harness linked real-time big data to improve the production of official statistics and policy research.
The coronavirus is here and our need for real-time data has never been more apparent. While viral infectious diseases cannot spread through electronic files, they wreak havoc on systems occupied by humans like scientific labs, statistical agencies, and other organizations that rely on data. At a moment like this, we would all benefit from re-examining our approach to data collection, storage, and use.
Lessons from the Past: Is History Repeating Itself?
The Great Depression of the 1930s forced federal statistical agencies to innovate data collection because (1) the administration needed more up-to-date statistics on unemployment than a once-a-decade count could provide and (2) the federal government did not have the resources to conduct a full count enumeration every year (Anderson 2015). Household survey sampling methodology was the major innovation of that time, allowing the federal government to produce annual unemployment rates by sampling a subset of representative households.
Almost 100 years later, we are again living in unprecedented times. Will history repeat itself? As we experience the COVID-19 pandemic, scholars and policymakers have become obsessed with producing real-time statistics on unemployment claims (Goldsmith-Pinkham and Sojourner 2020) and unemployment rates (Wolfers 2020). This innovative work is happening outside of the federal government. It should spur federal leaders to take up a call to action to innovate their own statistical systems. Wozniak (2020) has recommended innovations in nationally representative surveys that would increase accuracy of reporting on the spread of COVID-19.
One cannot help but wonder: Will this crisis force a new round of innovation in data collection and statistical production within federal systems? Will COVID-19 spur statistical innovation on a global level and move federal statistical agencies towards statistical production using multi-source real-time data?
The Current Environment
Take the U.S. Census Bureau. Mail delivery has begun. With an active pandemic, it is more important today than ever for people to fill out their Census response online, as the agency may face significant challenges collecting data in the field from those who do not respond. It has already announced delays in field operations (Lo Wang 2020). If Census staff cannot go out in the field, we might be left with a 2020 Census driven solely by individuals’ willingness to respond the first time we contact them via snail mail. This most likely means that difficult-to-count subgroups will be underrepresented. The effect will be long lasting, as the data collected in today’s census will be used to represent communities for the decade to come.
In an effort to think ahead and innovate, agencies like the Census Bureau continue to experiment with alternative options for counting individuals that use already existing data from prior decennial censuses, household surveys, administrative records1, and third-party data2. In this day and age, however, when so many of us leave data crumbs everywhere, we need to ask what government systems and the public are willing to tolerate to facilitate the reuse of already existing data to inform policymakers and leaders. It could be the difference between high- and poor-quality data or, even worse, high-quality and no data.
Many countries are behind in data innovation, not because of lack of capacity, but because of fears around privacy and confidentiality. Statistical agencies have an enormous responsibility – to protect the privacy and confidentiality of all persons who have put their trust in them and shared personal information. Like most statistical agencies around the globe, the U.S. has laws governing the protection and use of data. The Privacy Act of 1974 (5 U.S.C. § 552a) governs the collection, maintenance, use, and dissemination of information about individuals maintained in systems of records by federal agencies.3 Title 13 of the U.S. Code governs the compiling, administration, and use of data collected by the U.S. Census Bureau4, and Title 26 of the U.S. Code governs the collection and use of administrative tax records by the U.S. Internal Revenue Service and collaborators.5 Some of these rules come with punishments like hefty monetary fines or significant prison time for anyone who violates them; statistical agencies take their data privacy and protection responsibilities very seriously.
The world of big data today, however, means that non-statistical organizations also collect a plethora of data points. Across the globe, statistical agencies are struggling to address the challenges associated with: (1) the accumulation of administrative data in non-statistical organizations, (2) online survey functionality and safety, (3) next-generation tools for privacy protection, and (4) plummeting survey response rates.
At one end of the spectrum, we can look to Nordic countries, which have opted to harmonize and integrate birth, death, health system records, and other administrative data in lieu of traditional census taking every decade (United Nations Economic Commission for Europe 2007). Today, they have the capacity to carry out a full count enumeration without going door to door and risking infection during a major health crisis.
At the other end of the spectrum, countries where data linkage is prohibited or discouraged rely on costly repeat manual collection, struggle to collect and integrate data amid complex privacy concerns, lack capacity, and make decisions based on fear and mistrust. But creating capacity and building tolerance for safe and secure data linkage allows governments to respond better during a crisis and creates an environment where scientists and researchers can access linked data to answer complex questions and solve difficult problems in time to inform decision makers.
What Can We Learn from Nordic Countries?
Nordic countries have something from which the rest of us could learn: the ability to balance access, privacy, and confidentiality while allowing researchers to use linked data to conduct studies that advance our knowledge of how policies and other forces, like the spread of disease, affect human wellbeing and health. In fact, in the world of academic economists, tag-teaming with a researcher from Sweden or Norway is the new obsession, considered a prime advantage both to advancing one’s career and to developing innovative economic knowledge, in a way not possible in many other countries given the constraints around data linkage, availability, and accessibility.
Through my prior experience as a labor economist at the National Institutes of Health, I know that when social scientists are given access to administrative records collected by scientific agencies, it helps inform policymakers as to what advances innovation, who benefits the most, and where scientific workforce bottleneck issues exist. Linking these records to other types of data (for example, from employers to study what occupations or industries scientists fall into when they leave academia) provides additional power to understand the dynamics of the world we live in. Enhancing our ability to strengthen data linkage will not only help inform during a time of crisis, but it will improve our overall ability to generate data driven decisions and inform policy. Some might say the possibilities would be endless.
Why Do the U.S. and Other Countries Struggle to Advance?
There is one major challenge, though, to successfully executing this kind of data integration in the U.S. and other countries: human distrust and fear overpower our ability to make it happen. Data policy professionals are generally not trained in social science research, and they continually conflate individual privacy protections with severe restrictions on access to data for research. This is not a problem unique to the United States. Statistical agencies in South America, Europe, and elsewhere also struggle to gain the trust of their non-statistical counterparts and private sector companies.
Non-statistical agencies have a lot to lose by not taking advantage of the opportunity to link their data to other essential resources, and a lot to lose by restricting access to social scientists with expertise in tools that disentangle causality and isolate a “true” effect of a program or policy. Through my experience in non-statistical agencies, I have witnessed the strong desire of people trained in other fields to jump into the world of statistical inference and social science research as if it required no additional training on their part. In many cases, the results are uninformative at best and dangerously misleading at worst.
I offer four suggestions to support next steps in improving the environment for us all.
- Policy makers should re-evaluate statistical laws and policies developed decades ago and update them to support an ever-moving, evolving, and advancing 21st-century data reality.
- Non-statistical agencies must come to the table and, at a minimum, learn about how statistical agencies keep their data secure, and be willing to engage in an earnest conversation about methods for the use of their data for social science research. The methods must be fair and equitable, which may, at times, make non-statistical agencies and their policy leaders uncomfortable.
- Statistical agencies need to do a better job of streamlining the system so that external collaborators with the correct security clearances are able to access the data for research in a timely fashion.
- When studying social phenomena, policy leaders and social science experts trained in causal inference techniques must work together. Think about it this way: Would you want an economist to perform your open-heart surgery?
Call to Action: Make Data Linkable and Accessible
I end with a call to action that describes where the community can focus its next steps. The example, specific to the U.S., is equally applicable in countries around the globe.
Health and science organizations like the National Institutes of Health, the National Science Foundation, and the Centers for Medicare and Medicaid Services must make the process of accessing their administrative data for social science research accessible and transparent. Third-party data organizations like Zillow and Google should partner with federal statistical agencies to make their data linkable with survey data and work toward developing methods that follow statistical guidelines for producing real-time statistics.
Following guidance from the Commission on Evidence-Based Policymaking and the Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act; Pub. L. 115-435) is a great place to start engaging in these activities. Sparking conversations with federal statistical agencies or other groups like the Coleridge Initiative or the NORC data enclave and learning from their expertise and experiences can inform organizations in how to develop an accessible, secure, and transparent system that can benefit everyone. At a minimum, bureaucratic distrust from agency-to-agency within the federal system needs to be minimized.
Non-statistical scientific agencies have a responsibility to harness all data for research that informs the policies they create. They should enter into agreements with statistical agencies to create a repository of data that, with the necessary protections and reviews, can be linked to other data sources and used to advance our knowledge base. One concrete example of how to go about this is to move administrative records into the Federal Statistical Research Data Centers (FSRDCs), which already store a plethora of survey and other data in a linkable format. The FSRDC system includes protections that require the development of project proposals, which are reviewed by experts in the relevant field for feasibility and appropriate use of data.
This type of activity would allow our government to link multiple sources of data and get closer to a system in which administrative data are used more effectively, respondent burden is decreased, and researchers are better able to understand society, predict future pandemics, and isolate the causal effects of major policies, programs, and investments.
Every day without action is a day lost in advancing our understanding of key issues for communities, like how ‘social distancing’ mitigates the spread of infectious diseases, how federal scientific funding supports or detracts from innovation, or how healthcare service provisions can best enhance overall well-being. Trusting each other, developing systems that allow research to take place in secure environments, and providing access to those with the tools to conduct the relevant research must be a priority for us all.
Intriguing innovation often takes place in moments of deep discomfort and major crisis. If there is any positive side to today’s crisis, it is the opportunity to spur innovation within federal statistical agencies. That would be a significant advance, one major upside to this national tragedy, and history would once again repeat itself.
Anderson, Margo J. 2015. “The American Census: A Social History.” 2nd edition. New Haven, CT: Yale University Press.
Goldsmith-Pinkham, Paul, and Aaron Sojourner. 2020. “Predicting Initial Unemployment Insurance Claims using Google Trends.” Unpublished Working Paper (April 4). Accessed April 7, 2020, https://paulgp.github.io/GoogleTrendsUINowcast/google_trends_UI.html.
Lo Wang, Hansi. 2020. “Census Field Operations Further Delayed until April 15 by COVID-19 Pandemic.” NPR News, March 28. Accessed April 7, 2020, https://www.npr.org/sections/coronavirus-live-updates/2020/03/28/823295346/census-field-operations-further-delayed-until-april-15-by-covid-19-pandemic.
United Nations Economic Commission for Europe. 2007. “Register-Based Statistics in the Nordic Countries: Review of Best Practices with Focus on Population and Social Statistics.” New York: United Nations.
Wolfers, Justin. 2020. “The Unemployment Rate is Probably Around 13 Percent.” New York Times The Upshot, April 3. Accessed April 7, 2020, https://www.nytimes.com/2020/04/03/upshot/coronavirus-jobless-rate-great-depression.html.
Wozniak, Abigail. 2020. “Tracking COVID-19 Symptoms and Impact in Real Time: A Survey-Based Surveillance System.” Minneapolis, MN: Federal Reserve Bank Opportunity and Inclusive Growth Institute Policy Brief. Accessed April 7, 2020, https://www.minneapolisfed.org/article/2020/tracking-covid-19-symptoms-and-impact-in-real-time-a-survey-based-surveillance-system.