Cyber security risks with AI language models
One of the hottest topics in artificial intelligence (AI) right now is the rapid advance of large language models such as GPT (Generative Pre-trained Transformer) and its chatbot adaptation ChatGPT from OpenAI. In this blog post I will take a closer look at ChatGPT and the risks that come with using AI language models.
Written by Christoffer Brax, AI Systems Architect @ Combitech
An AI language model is designed to process and generate human-like text. Language models are trained on large datasets of text and can predict the likelihood of a word or sequence of words appearing in a given context. They are used for a variety of tasks, including language translation, text generation, and natural language processing.
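To make the idea of "predicting the likelihood of a word in a given context" concrete, here is a minimal sketch of a next-word predictor. It is only a toy bigram model over a made-up corpus, not how GPT works internally (GPT uses a neural network trained on billions of tokens), but the core task is the same: estimate which word is likely to come next.

```python
from collections import Counter, defaultdict

# Toy corpus; a real model is trained on vastly more text.
corpus = (
    "the cat sat on the mat "
    "the dog sat on the rug "
    "the cat chased the dog"
).split()

# Count bigrams: how often each word follows each other word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict(word):
    """Return the most likely next word and its estimated probability."""
    counts = bigrams[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict("sat"))  # "on" always follows "sat" in this corpus
```

Running `predict("sat")` here gives `("on", 1.0)`, because every occurrence of "sat" in the corpus is followed by "on". A large language model does the same kind of estimation, but over sequences of many words rather than single-word contexts.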
What is ChatGPT?
The primary difference between ChatGPT and other language models is its ability to maintain context and carry a conversation. ChatGPT can keep track of previous statements and responses in a conversation and use this information to generate more appropriate and coherent responses. This allows ChatGPT to have more natural and "human-like" conversations.
Since the release of ChatGPT, people all over the world have been amazed by the capability of the model in many diverse fields. It can answer questions about most topics, generate code examples for programming questions, write recruitment ads, rewrite songs and poems, and much more.
What also distinguishes ChatGPT from previous language models is that it is trained on a much larger dataset, in many languages. This is very important for smaller languages (such as Swedish) that previously didn't have any good language models. Compared to the chatbots previously used on webpages to answer questions, ChatGPT can answer questions about almost anything, not just a limited set of pre-defined questions and keywords.
Compared to a web search engine, ChatGPT can answer a query with well-formulated text instead of just returning a list of webpages that probably contain the requested information. However, a search engine constantly updates its database of webpages, whereas ChatGPT needs computationally heavy and time-consuming retraining to update its knowledge.
ChatGPT can on the surface be compared to digital assistants (such as Amazon’s Alexa or Apple’s Siri) but with a completely different internal representation of knowledge.
New technology, new risks
As with all new technologies, AI introduces a lot of possibilities as well as new risks. Looking at cybersecurity, with new language models such as ChatGPT, there is of course a risk of cyber criminals using the same technology for unlawful purposes.
Language models could be used to automatically create very convincing misinformation, customized for specific individuals on a large scale. They can also be used to generate tailored phishing emails and interactive phishing chats on instant-messaging platforms. This kind of phishing may be much harder for the victims of the scams to detect than "ordinary" phishing.
Already, people have a hard time realizing they are chatting with a bot and not a real human. The bot can be very convincing and use all the "tools" available to persuade a human to, for example, reveal sensitive information.
Another risk is that language models can be used to create new malware. We have already seen that ChatGPT can analyze source code and find vulnerabilities, and it can also generate code to exploit them. Cyber criminals can use this to find new ways to attack systems. On the other hand, the same capability can be used to improve the quality of software and find bugs and potential vulnerabilities at an early stage. Just as ChatGPT can be exploited, it can also increase the security of IT systems.
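As an illustration of the kind of vulnerability such code analysis can surface, here is a hypothetical snippet (not taken from any real codebase) showing a classic SQL injection bug and its fix. This is the sort of flaw a language model, like other static-analysis tools, may be able to point out:

```python
import sqlite3

# Vulnerable: building SQL by string formatting lets an attacker
# inject arbitrary SQL through the username parameter.
def find_user_unsafe(conn, username):
    query = f"SELECT id FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

# Fixed: a parameterized query ensures user input is treated as data,
# never parsed as SQL.
def find_user_safe(conn, username):
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

# The payload turns the unsafe query's WHERE clause into a tautology,
# so every row in the table is returned.
payload = "' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2 -- all rows leak
print(len(find_user_safe(conn, payload)))    # 0 -- no user has that name
```

The same example cuts both ways, as the text notes: an attacker could ask a model to generate the exploit payload, while a defender could ask it to spot the string-formatted query during code review.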
Use with caution
There are a few things to consider when using ChatGPT (or other language models):
- Bias: Language models can reflect the biases present in the data they are trained on. It's important to be aware of this and take steps to mitigate any potential biases in the model's output.
- Limitations: While ChatGPT is a powerful language model, it is still limited by the data it was trained on and may not be able to generate responses to all prompts or questions.
- Ethics: It's important to consider the ethical implications of using AI language models, particularly in situations where the model's output could have significant impacts on people's lives.
- Accuracy: It has been demonstrated that ChatGPT and other language models do not always generate accurate responses. It's therefore important to always double-check the output and ensure that it is correct and appropriate.
In general, it's important to use ChatGPT and other language models responsibly and with an awareness of their limitations and potential impacts.
Disclaimer: Some parts of this blog post might have been generated with a language model.