ChatGPT and data: Everything you need to know

We take a look at what ChatGPT does with data and how users can stay safe



Since OpenAI unleashed ChatGPT onto the world, opinion has been split between those who believe it will radically improve the way we live and work and those who worry about its potential for disruption, particularly to the privacy of individuals and organizations.

There have already been incidents where sensitive data has been leaked and employees have landed in hot water after entering confidential company information into the chatbot, with some countries even issuing temporary bans on its use over data protection concerns.

So, what does ChatGPT do with your data and how can you use it securely?

Where does ChatGPT get data from?

ChatGPT is an artificial intelligence (AI) tool powered by machine learning (ML), meaning it uses ML algorithms to understand and respond to user prompts in a conversational manner. To do this, it has been “trained” on vast quantities of information scraped from the internet, including around 570 GB of data from books, Wikipedia, articles and other online content.

Training on this volume of information gives it the ability to answer questions, write essays, create and debug code, solve complex mathematical equations and even translate between languages.

However, as a natural language processing tool it works on probability, answering questions by predicting what the next word in a sentence should be based on the millions of examples it has been trained on. This means the information it provides can be inaccurate or incomplete, and because most of its training data was produced before 2021, it cannot reliably answer questions about more recent events.
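To make that probabilistic mechanism concrete, here is a minimal, illustrative Python sketch of how a language model turns raw scores for candidate next words into probabilities and samples one. The vocabulary and scores are invented for the example; this is not ChatGPT’s actual code.

```python
# Toy illustration of next-word prediction: a language model assigns a
# probability to each candidate next token and picks from that distribution.
# This is NOT ChatGPT's real code, just a sketch of the underlying idea.
import math
import random

def softmax(scores):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign to continuations of the prompt
# "The capital of France is".
candidates = ["Paris", "Lyon", "London", "banana"]
logits = [6.0, 2.5, 1.0, -3.0]

probs = softmax(logits)
for word, p in zip(candidates, probs):
    print(f"{word}: {p:.3f}")

# Sampling from the distribution: usually "Paris", occasionally something else,
# which is one reason outputs can be inaccurate or inconsistent.
print(random.choices(candidates, weights=probs, k=1)[0])
```

Because the model is choosing likely words rather than looking up verified facts, fluent-sounding answers can still be wrong.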

How secure is ChatGPT?

ChatGPT saves the prompts you enter and its responses to keep training its algorithms. Even if you delete your conversations, the bot can still use this data to improve its AI. This presents a risk if users enter sensitive personal or company information that would be appealing to malicious parties in the case of a breach.

Additionally, it stores other personal details when in use, such as your approximate location, IP address, payment details and device information (although most websites store this type of information for analytics purposes so this is not unique to ChatGPT).

The data collection methods deployed by OpenAI have raised concerns among some researchers, as the scraped data can include copyrighted material. In an article on The Conversation, Uri Gal, Professor in Business Information Systems at the University of Sydney, called ChatGPT “a privacy nightmare”, stating that “if you’ve ever written a blog post or product review, or commented on an article online, there’s a good chance this information was consumed by ChatGPT.”

ChatGPT and cybersecurity

There is also evidence that ChatGPT is already being used for malicious purposes. Its ability to write code means it can be used to create malware, build dark web sites and launch cyber attacks.

At a recent CS Hub advisory board meeting, members reported seeing ChatGPT deployed to engineer highly sophisticated phishing attacks, with attackers using it to polish their language, since poor spelling and grammar are often tell-tale signs of a phishing attempt. They also reported that it is being used to help malicious actors better understand the psychology of intended recipients, with the aim of putting them under duress so that phishing attacks are more effective.

In March 2023, more than a thousand AI experts, including OpenAI co-founder Elon Musk, signed an open letter calling for an immediate pause of at least six months on training AI systems more powerful than GPT-4, to allow researchers time to better understand the risks they pose and how to mitigate them.

What data breaches have there been so far?

OpenAI has confirmed that a bug in an open-source library used by the chatbot caused a data leak in March 2023, enabling some users to view parts of other active users’ chat histories. The bug may also have exposed payment-related information belonging to 1.2 percent of ChatGPT Plus subscribers who were active during a specific time period.

OpenAI issued a statement saying it believed the number of users whose data was revealed was “extremely low”, as affected users would have needed to open a subscription confirmation email or click through certain functions in a specific sequence during a particular timeframe. Nevertheless, ChatGPT was taken offline for several hours while the bug was patched.

Prior to this, Samsung experienced three separate incidents in which confidential company information was entered into the chatbot, including Samsung source code, the transcript of an internal meeting and a test sequence for identifying defective chips, leading to disciplinary investigations.

As far as we know the data was not leaked, but as mentioned above, anything entered into ChatGPT can be stored and used to train its models, so the proprietary information entered by Samsung staff could, in theory, resurface in responses to other users.

What does OpenAI say about the security of its data?

OpenAI says it carries out annual testing to identify security weaknesses and prevent these from being exploited by malicious actors. It also runs a ‘bug bounty program’, inviting researchers and ethical hackers to test the security of the system for vulnerabilities in exchange for cash rewards.
It is worth noting that, according to OpenAI, data submitted through its API is not used to train its models by default, unlike data entered into the consumer ChatGPT interface.
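For context, this is roughly what submitting a prompt through the API rather than the ChatGPT web interface looks like; a minimal sketch using the openai Python package’s interface as it stood in mid-2023, with a placeholder model name and prompt.

```python
# Minimal sketch of calling the OpenAI API directly (openai Python package,
# pre-1.0 interface as of mid-2023). Unlike the consumer ChatGPT interface,
# OpenAI states that API submissions are not used to train its models by default.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # keep credentials out of source code

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[
        {"role": "user", "content": "Summarize our data-retention policy in one sentence."}
    ],
)

print(response["choices"][0]["message"]["content"])
```

The different default treatment of API traffic is one reason some organizations route employee use of generative AI through managed integrations rather than the public chatbot.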

To find out exactly how secure data is within ChatGPT we went straight to the source and asked how great a risk there is of it experiencing a data breach.



ChatGPT responded by saying that it does not have direct control over the security of the systems that store and handle its data. “However, OpenAI, the organization behind ChatGPT, takes data security and privacy seriously,” it said. “They have implemented measures to protect the data and minimize the risk of a data breach.

“That being said, no system is entirely immune to potential security vulnerabilities,” it continued. “There is always a small inherent risk of a data breach or unauthorized access. However, organizations like OpenAI employ various security practices, including encryption, access controls, and regular security audits, to mitigate these risks and ensure the confidentiality and integrity of the data.”

What can users do to keep their data safe?

As with all digital applications, ChatGPT’s own advice is that “if you have concerns about the privacy or security of your interactions with ChatGPT, it's advisable to avoid sharing any personally identifiable or sensitive information. While OpenAI aims to provide a secure environment, it's essential to exercise caution when interacting with AI systems or any online platform.”
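One practical way to act on that advice is to strip obvious personal identifiers from text before pasting it into a prompt. Below is a minimal, illustrative sketch; the redact helper and its patterns are hypothetical examples, and regex-based masking is far from exhaustive, so it complements rather than replaces caution about what is shared.

```python
# Illustrative sketch: mask obvious personal identifiers before sending text to
# ChatGPT or any other online service. Regex-based redaction is crude and will
# miss plenty of sensitive content -- when in doubt, do not share it at all.
import re

REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),   # email addresses
    (re.compile(r"\b(?:\d[ -]*?){13,16}\b"), "[CARD]"),    # card-like digit runs
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),     # phone-like numbers
]

def redact(text: str) -> str:
    """Replace matches of the patterns above with placeholder tokens."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

prompt = ("Contact Jane at jane.doe@example.com or +44 20 7946 0958 "
          "about card 4111 1111 1111 1111.")
print(redact(prompt))
# -> Contact Jane at [EMAIL] or [PHONE] about card [CARD].
```

Note that names and other free-text identifiers slip straight through, which is why the safest policy remains simply not to enter sensitive material in the first place.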

Other general advice is to create a strong and unique password and close the application after using it, especially on shared devices.

It is also possible to opt out of allowing ChatGPT to store data by completing an online form.

In April this year, OpenAI introduced a new feature enabling users to turn off their chat history. Conversations started while chat history is disabled are not used to train the algorithm and do not appear in the history sidebar, with the caveat that they are still retained for 30 days before being permanently deleted.

The future of ChatGPT and cybersecurity

Jonathan Jackson, director of sales engineering APJ at BlackBerry Cybersecurity, believes that cyber attacks linked to ChatGPT are inevitable in the near future. “There are plenty of benefits to using this kind of advanced technology and we are just scratching the surface of its potential, but we also cannot ignore the ramifications,” he wrote for Cyber Security Hub in February 2023. “As the platform matures and hackers become more experienced, it will become more difficult to defend without also using AI to level the playing field.”

He added that AI is increasingly being used to create convincing phishing messages that trick people into providing sensitive information or installing malware, while AI tools can also launch distributed denial of service (DDoS) attacks, overwhelming an organization’s systems with traffic to disrupt its operations.


Calls to regulate generative AI are certainly increasing. As well as the open letter calling for a six-month moratorium on training more powerful AI systems (which now has more than 30,000 signatories), ‘godfather of AI’ Geoffrey Hinton made headlines when he stepped down from his position at Google so that he could speak freely about the potential dangers of the technology.

Governments around the world are also discussing regulations. The European Union has been the first to propose a major law called the AI Act, which assigns applications of AI to different risk categories. For example, an application that uses personal data to run social scoring of the type the Chinese government deploys would pose an “unacceptable risk” and would be banned.

The ban on the use of ChatGPT in New York City schools has now been lifted, while the US government has released its “Blueprint for an AI Bill of Rights”, which advises organizations on how to use AI ethically.

Meanwhile in the UK, prime minister Rishi Sunak, previously an enthusiastic backer of AI, recently announced a change in approach that will require the technology to be introduced “safely and securely with guard rails in place”. Between February and April 2023, the UK’s Home Office ran an open review of the Computer Misuse Act 1990, to propose changes that would ensure it covers AI-powered cyber attacks.

While we are only just starting to scratch the surface of what AI is capable of, cyber security professionals and governments need to get ahead of this fast-developing technology before it is too late to avert a serious attack.
