Can I Use Large Language Models and Other AI Tools (such as ChatGPT or Google Gemini) with HRS Data?
General Statement: Currently, large language models (LLMs) and other AI tools may not be used to manage, process, or analyze data distributed by HRS.
Under HRS Data Use Agreements, researchers are forbidden to distribute data or other materials we supply (apart from codebooks and metadata, described below) to other members, organizations, or individuals. This means that using LLMs with HRS data violates all existing data use agreements.
For purposes of this policy, LLMs are classified into three categories:
- Type 1: LLMs that retain user-provided data for any purpose, including training the LLM (e.g., GPT, Llama)
- Type 2: LLMs that are licensed by an institution and have conditions of use that do not permit the retention of user-provided data (e.g., the University of Michigan’s Maizey)
- Type 3: Type 2 LLMs that are isolated within a secure network with no access to the Internet
Data Use by LLM Type
| LLM Type | HRS Data Use | Reason |
|---|---|---|
| Type 1 | None | Type 1 LLMs ingest and make use of the data. This counts as redistributing the data to the company operating the LLM, so it is not permitted. |
| Type 2 | None | Type 2 LLMs do not retain or make use of the data, so this does not count as redistribution. However, they are not isolated from broader networks or the Internet, so they would not comply with data security plans for Restricted-Use data. At present, we are also not accepting requests for this kind of data use for Public-Use data. |
| Type 3 | None | Type 3 LLMs do not retain or make use of the data, and they are also isolated on individual machines or within secure networks; however, LLMs are not available within the MiCDA VDE or LINKAGE enclave, so this is not an option for Restricted-Use data. At present, we are also not accepting requests for this kind of data use for Public-Use data. |
Study-Level Metadata and Documentation
It is permissible to use LLMs and AI tools with our public-facing documentation, codebooks, and study-level metadata, including group or population estimates. However, using person-level data with these tools is not permissible.
Thanks to ICPSR and Sebastian Karcher at Syracuse University for originating this taxonomy of LLMs.
