Uploaded on May 12, 2023
OpenAI updates data usage policies after data breach, states content sent via API not used to train LLMs; ChatGPT not included in the update.
ChatGPT, Translation, and Confidentiality — ‘We May Use the Data’
OpenAI has no specific terms of use for its consumer services, which is how
the company classifies ChatGPT. Most people would expect the user terms of
service to say what happens to content submitted to ChatGPT for translation,
but they do not. Instead, the answer is found in OpenAI’s Data Controls
Frequently Asked Questions (FAQs) and various linked documents.
The terms of use, or what the company calls data usage policies, govern its API
services. These policies have changed since March 2023, when OpenAI confirmed
a data breach caused by a bug in an open-source library that ChatGPT uses.
Prompted by public criticism of the breach, the company updated the policies
to address data confidentiality and security concerns.
OpenAI’s policy on data submitted by customers via its API is that the data
will not be used to train or improve its models.
OpenAI stated that a vulnerability in the Redis open-source library used by
ChatGPT caused some active users’ chat history to become visible to other users
active at the same time. It also acknowledged that some payment information
from premium users was leaked in March, but played down the potential
consequences of this breach.
Enter at Your Own MT Risk
One of the documents linked in the data usage policies is a general statement of
how data is used when submitted to OpenAI’s consumer services: “When you use
our non-API consumer services ChatGPT or DALL-E, we may use the data you
provide us to improve our models. You can switch off training in ChatGPT
settings (under Data Controls) to turn off training for any conversations created
while training is disabled …”
These policies make no distinction between the free service and the paid
subscription, called ChatGPT Plus. The paid version simply keeps the service
available during periods of high demand, and OpenAI says it is faster and
offers priority access to new features.
For API usage, OpenAI states in its data usage policies, “OpenAI will not use data
submitted by customers via our API to train or improve our models, unless you
explicitly decide to share your data with us for this purpose. You can opt-in to
share data. Any data sent through the API will be retained for abuse and misuse
monitoring purposes for a maximum of 30 days, after which it will be deleted
(unless otherwise required by law).”
Language translation is just one of the many tasks ChatGPT is capable of
performing, and there is no specific mention of content submitted for that
purpose in the updated data usage policies.
Users who search the Support area of OpenAI’s site (called “Advice and answers
from the OpenAI Team”) for any specific mention of translation are redirected
to the Data Controls FAQs.
Lock That Door
It is still too early in the evolution of LLMs to see large-scale use of their
translation capabilities, but there have been some early integrations with
translation management systems, which will depend on robust encryption and
safety features like two-factor authentication to secure the data submitted.
As an example of easy yet risky access, Samsung employees were using ChatGPT
for translation and other tasks until the company’s leadership prohibited use
of the AI tool altogether in April 2023, citing security concerns.
Unfortunately for the general, non-paying public, sensitive information cannot
be considered secure when submitted to free translation services. In an example
that predates LLM-based translation by a few years, confidential texts
submitted to Translate.com’s free service surfaced in search engines like
Google and Microsoft’s Bing in 2017, and the company admitted that translations
were “sent to our community to improve accuracy.”
Other providers, like DeepL Translate, have also made the news over questions
about data management. DeepL has separate terms and conditions for the free MT
product and the Pro version. Under a section titled “Processing of the
submitted Texts” (a header that looks like unedited German-to-English MT), the
terms for the free version state that content uploaded for translation, as well
as the translations generated and post-edited, are processed for an
unspecified amount of time to train neural networks and translation algorithms.
What these MT providers have in common with ChatGPT, as far as translation is
concerned, is that users are made responsible for their use of data, confidential
or otherwise. In all cases, users are also allowing companies to use data for
various purposes unless they opt out.