An Introduction to Artificial Intelligence: Part 2
What it is, what generative AI is, the emergence of ChatGPT and why it's taken the world by storm.
Hi, and welcome to the second post in our Classroom series that covers an Introduction to Artificial Intelligence.
Why now?
One of the big questions around generative AI is: why is this happening now, all of a sudden? Why has this become such a hot topic, and why has the technology taken the world by storm recently? There are four main reasons for that:
01 - Scaling Laws
The first reason is something called scaling laws. This is a piece of research that OpenAI did back in January 2020.
The hypothesis was that as you scale the size of the models, i.e. the number of parameters, you would see an increase in the capabilities that the models have. So the race has really been on to create bigger and bigger models that then start to display more and more capabilities.
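To make the idea concrete, here's a minimal Python sketch of the kind of power-law relationship the scaling laws research describes, where test loss falls smoothly and predictably as model size grows. The constants are illustrative approximations of the paper's fit, not exact values from OpenAI.

```python
# A minimal sketch of the power-law relationship reported in OpenAI's
# "Scaling Laws for Neural Language Models" (Kaplan et al., 2020):
# test loss falls smoothly as a power law of model size N (parameters).
# The constants below are illustrative approximations, not official values.

def loss_from_params(n_params: float,
                     n_c: float = 8.8e13,    # assumed fit constant (parameters)
                     alpha: float = 0.076):  # assumed power-law exponent
    """Predicted test loss for a model with n_params parameters."""
    return (n_c / n_params) ** alpha

for n in [1.17e8, 1.5e9, 1.75e11, 1.76e12]:  # GPT-1, GPT-2, GPT-3, rumoured GPT-4
    print(f"{n:>10.2e} params -> predicted loss {loss_from_params(n):.2f}")
```

The point of the curve is simply that bigger models keep getting better in a predictable way, which is why the race to scale has been so intense.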
02 - Data Availability
In order to scale models you need a lot of data available to produce models of larger and larger sizes, and it's only recently that such large amounts of data have been available. You can see in the table above the list of datasets that GPT-3 was trained on. It was trained on something called Common Crawl, which is effectively a large amount of text from the wider Internet, along with a dataset called WebText2, a large collection of books and Wikipedia. In total the model was trained on about 500 billion tokens, and the final model had 175 billion parameters.
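If you're wondering what a "token" actually is: it's the basic unit of text a model reads, usually a word or a fragment of a word. Below is a small sketch using OpenAI's open-source tiktoken library (assuming it's installed) to see how a sentence breaks down into tokens.

```python
import tiktoken  # OpenAI's open-source tokeniser library

# The "gpt2" encoding is the byte-pair-encoding family used by GPT-2/GPT-3.
enc = tiktoken.get_encoding("gpt2")

text = "Large language models are trained on hundreds of billions of tokens."
tokens = enc.encode(text)

print(len(text.split()), "words ->", len(tokens), "tokens")
print(tokens[:10])                             # integer token ids
print([enc.decode([t]) for t in tokens[:10]])  # the text each id maps back to
```

A rough rule of thumb is that a token works out at about three-quarters of an English word, so 500 billion tokens is a genuinely enormous amount of text.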
03 - Compute Power
In order to train models on large datasets you need a lot of compute power. The hardware that these models are trained on is the graphics processing unit (GPU). GPUs are not the usual computer chips that run your laptops or desktops, and their production has been relatively constrained over the past few years. This is partly because of Covid and partly because it has become harder to produce more powerful chips as the technology has shrunk.
For a good overview of the issues surrounding the production of computer chips I highly recommend reading Chip War by Chris Miller.
So the whole industry has been relatively hardware constrained in training these models.
04 - Transformer Models
Lastly, there is the transformer model architecture that Google developed back in 2017. This was a big step change in how you build and train large language models; without the transformer we wouldn't have the large language models we see today. It took Google releasing that architecture in 2017 to really unleash the opportunity to scale models in the way that we've seen.
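For the curious, here's a toy NumPy sketch of the core operation inside a transformer, scaled dot-product self-attention, in which every token looks at every other token and builds a weighted mix of their representations. It's a single attention head with random weights, purely to illustrate the mechanism, not a faithful reimplementation of any production model.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (Vaswani et al., 2017)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # how strongly each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ v                           # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                          # toy sizes; real models use thousands
x = rng.normal(size=(seq_len, d_model))          # embeddings for a 4-token sequence
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # -> (4, 8): one updated vector per token
```

The key design win is that this operation is highly parallelisable on GPUs, which is exactly what made scaling to billions of parameters practical.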
Emergence
When GPT-3 was released back in 2020, it was the first large language model to surpass 100 billion parameters, which is a really interesting threshold because it's where you start to see a lot of emergent behaviour from these models.
I love the graphic above, from a Google blog, which shows all the different emergent capabilities that start to present themselves as you increase the number of parameters in a large language model.
You can see how much more competent the models get, but also the breadth of things they're able to do, as the number of parameters scales. Emergence is an interesting area and it's something we see in nature: a flock of birds forming a murmuration is a form of emergence.
It is essentially complexity coming out of a lot of small, simple decisions. And that's the kind of thing that we're seeing in these large language models and the reason why they've become seemingly so useful overnight.
Hallucinations
With the scale that brings emergence, you also get hallucinations. Essentially, hallucinations are when a generative AI model gives an answer that isn’t based on anything in its training data. The challenge is that generative AI models present their hallucinations very confidently, which can be very misleading, so it’s an area that is getting a lot of scrutiny right now. Hallucinations are one of the major reasons there are a lot of disclaimers on using generative AI models and every user should be aware that they can sometimes get factually incorrect answers.
I actually think hallucinations are a really interesting area. Broadly speaking, the industry is looking at ways to minimise or eliminate hallucinations, but I like to think of them as a generative AI model’s imagination. I believe we should be researching them more thoroughly so we can understand them better and look at how we can harness them for more creativity. However, we absolutely need a mechanism to flag to users when a model is hallucinating so we’re being completely transparent.
What is ChatGPT?
ChatGPT was built by OpenAI, a company that is only seven years old and has fewer than 400 employees, which is tiny in comparison to the large digital platforms it now coexists with.
OpenAI was founded in 2015 as a non-profit organisation and their mission is to ensure that artificial general intelligence (AGI), by which they mean highly autonomous systems that outperform humans at most economically valuable work, benefits all of humanity. The founders of OpenAI included Elon Musk (who is no longer involved), Sam Altman, their current CEO, Ilya Sutskever, Greg Brockman, Andrej Karpathy and Peter Thiel, amongst others.
OpenAI transitioned to a for-profit organisation in 2019 because of the cost of the hardware required to train their generative AI models - they just couldn't sustain those costs as a non-profit. The main beneficiary of this change was Microsoft, who invested one billion dollars and then an additional ten billion dollars in early 2023. This increased their stake in OpenAI to about 49%, making them by far the largest outside investor, and is the reason the two companies work so closely together.
OpenAI released their first Generative Pre-trained Transformer model (GPT-1) in 2018, along with a research paper that demonstrated how a Large Language Model (LLM) could acquire general knowledge through pre-training on a large, diverse set of text. GPT-1 had 117 million parameters and could take a query (prompt) of up to 512 tokens.
GPT-2 was released about a year later; it had 1.5 billion parameters and could take double the number of tokens in a query. That was followed a year later again by GPT-3, which had 175 billion parameters and was trained on data up to October 2019.
GPT-3.5 was released in March 2022. It was a similar size to GPT-3 but could take double again the number of tokens in a query, and was trained on data up to September 2021. OpenAI also did something called fine-tuning, which means adjusting a trained model so that it is better suited to a particular task - in this case, chat. This was the model that ChatGPT was launched on in November 2022.
When ChatGPT was launched, it was the fastest-growing technology that humanity had ever created, reaching 100m users in only two months. This has since been surpassed by Meta's release of Threads, but that built off the Instagram install base of c.2.4bn users, so is a slightly unfair comparison. ChatGPT's user base has since doubled to 200m monthly users as of May 2023.
GPT-4
More recently, in March 2023, we had the release of GPT-4. Unlike previous GPT models, OpenAI hasn’t officially released many details on GPT-4. However, speculation and research by the community point to some interesting features.
What we do know so far:
The number of tokens that GPT-4 can take has been hugely increased to 32k in its largest variant, eight times the 4k limit of GPT-3.5.
GPT-4 is multimodal, which means it is able to take in more than just text and to output more than just text as well. For example, it can accept images as inputs and generate captions, classifications and analyses. It can also output code, tables, data visualisations and many other types of content.
OpenAI also added plugins to GPT-4. Plugins are tools designed specifically for generative AI and help ChatGPT access up-to-date information, run computations, or use third-party services. These tools have a wide variety of use cases. For example, you can ask ChatGPT to look up flight times, book you a table at a local restaurant or read a PDF document.
One of the plugins that OpenAI initially released (but has since temporarily disabled) was Browse with Bing, which allowed ChatGPT to access open webpages.
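To give a feel for how plugins work in general terms, here's a toy sketch of the underlying tool-use pattern: the model produces a structured request naming a tool and its arguments, the client executes the real tool, and the result is passed back into the conversation. The tool registry and the lookup_flights function are hypothetical, purely for illustration; they aren't part of OpenAI's actual plugin interface.

```python
# A toy sketch of the general "tool use" pattern behind plugins: the model
# emits a structured request naming a tool, the client runs the real tool,
# and the result is fed back into the conversation. The TOOLS registry and
# lookup_flights are hypothetical examples, for illustration only.

TOOLS = {
    "lookup_flights": lambda origin, dest: f"3 flights found from {origin} to {dest}",
}

def handle_model_output(model_output: dict) -> str:
    """Dispatch a structured tool call produced by the model."""
    tool = TOOLS[model_output["tool"]]
    return tool(**model_output["arguments"])

# Imagine the model replied with this structured call instead of plain text:
fake_model_output = {"tool": "lookup_flights",
                     "arguments": {"origin": "LHR", "dest": "JFK"}}
print(handle_model_output(fake_model_output))
```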
What we think we know so far:
It's thought that GPT-4 is an order of magnitude bigger than GPT-3.5 with 1.76 trillion parameters.
It’s believed that the 1.76 trillion parameters aren’t one single model but multiple models that are all fine-tuned and trained slightly differently. This is called a Mixture of Experts model.
It's likely that GPT-4 has at least one model fine-tuned for safety and another specifically for coding tasks. Exactly how the models combine and work together is currently unknown, but it's something I've speculated about in a previous post.
This is a really interesting approach that OpenAI have taken with GPT-4 and points to a future where multiple models are working together to get to the best answer, which has been shown by research to improve results.
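OpenAI hasn't published GPT-4's architecture, so the following is only a toy sketch of the general mixture-of-experts idea: a small router network scores each token, the top-scoring experts process it, and their outputs are blended using the router's weights. The sizes and the simple linear "experts" below are illustrative assumptions, not anything we actually know about GPT-4.

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Route each token to its top_k experts and mix their outputs."""
    logits = x @ router_w                          # router scores: (tokens, n_experts)
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        best = np.argsort(probs[i])[-top_k:]       # indices of the top_k experts
        gate = probs[i][best] / probs[i][best].sum()
        for g, e in zip(gate, best):
            out[i] += g * (token @ experts[e])     # weighted sum of expert outputs
    return out

rng = np.random.default_rng(1)
d, n_experts = 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # each "expert" is a tiny linear layer
router_w = rng.normal(size=(d, n_experts))
tokens = rng.normal(size=(3, d))
print(moe_layer(tokens, experts, router_w).shape)  # -> (3, 8)
```

The appeal of this design is that only a fraction of the total parameters is used for any given token, so a very large model can still run at a reasonable cost per query.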
In summary, GPT-4 has moved the capabilities of generative AI on another step with the introduction of multimodality, plugins and the mixture of experts approach.
Performance
GPT-4 has shown itself to be incredibly capable across a large variety of tests and is currently the top-performing large language model. As you can see from the graphic above, GPT-4 can already match top human performance in the SATs, with many other knowledge domains showing significant improvements over GPT-3.5 (ChatGPT). Every week more research is published demonstrating GPT-4's ability to match or even surpass human performance across a wide variety of disciplines. We've seen examples of GPT-4's capabilities in creative thinking, the humanities and social sciences, law and plenty of others.
Where next?
During the course of Sam Altman’s global tour advocating for increased regulation of generative AI models, he dropped a few hints on the future plans for GPT-4 and beyond:
OpenAI want to increase the speed of GPT-4 and decrease the costs (currently GPT-3.5 is 4x faster and 10x cheaper than GPT-4).
OpenAI want to increase the query size even further, potentially up to 1m tokens, going well beyond the 100k token context of their rival Anthropic's Claude 2.
OpenAI wants to continue to develop the plugin ecosystem and the multimodal capabilities of GPT-4.
OpenAI have also hinted at a ‘stateful’ version of the GPT-4 API that remembers conversation history.
OpenAI have also stated that they believe the scaling laws for large language models still hold, so I expect GPT-5 to continue the trend of building larger models.
The Hype is Real
ChatGPT took the world by storm when it was released in November 2022 because it was built on a generative AI model of over 100bn parameters that showed a wide variety of emergent capabilities. It also had a simple and intuitive interface and had been fine-tuned specifically for chat.
But ChatGPT is far from the only generative AI model that has seen a huge amount of success. Below is a list of some of the other generative AI platforms that have gained a lot of traction:
Claude 2 - a chat model from Anthropic, similar to ChatGPT.
Llama 2 - a large language model that has been publicly released by Meta.
Stable Diffusion - a text-to-image model.
Bard - a chat model from Google, similar to ChatGPT.
DALL-E 2 - a text-to-image model.
Midjourney - a text-to-image model.
Jasper.ai - a model fine-tuned for marketing applications.
Character.ai - a platform for creating different characters you can interact with.
So why is there so much hype around generative AI? A big reason is that generative AI models have now reached a size and capability where they are pretty good at a wide variety of things. This is a huge difference from previous artificial intelligence models, which were very good at only one thing. But what is the impact of this going to be?
Carl Frey, the Director of the Future of Work at the University of Oxford summed this up perfectly in a webinar when he said that generative AI would reduce the barriers to entry for knowledge work in the same way GPS and Uber had reduced the barriers to entry for taxi drivers. This idea is really exciting because it will be the first time that AI has been able to disrupt knowledge work, which he hopes will lead to people being able to accomplish much more with their time.
Another reason there has been a lot of hype around generative AI is the huge amount of investment and resources being thrown at the technology right now. And much of this activity is in the open source community, thanks to the leak of the LLaMA model from Meta.
Prior to March 2023, building Large Language Models (LLMs) was the domain of large, resource-rich companies. For example, GPT-3 cost c.$4.6m and 30 years of processing time to train. That all changed when Meta's GPT-like LLM was leaked to the open source community a week after it was officially released. This led to an immediate levelling of the playing field, with many open source developers now able to run LLaMA locally. It greatly accelerated and diversified generative AI development and has led to a huge proliferation of new generative AI platforms, products and services.
The generative AI open source community has been moving at such a pace that it raised a huge amount of concern with engineers at Google. There was a famous thread on Twitter discussing this and predicting that the open source community would win out over time.
Where the open source community is absolutely excelling is in getting more and more capability out of smaller models. What makes this fascinating is that I think within the next year we'll see models that are small enough to fit on a mobile phone and have a similar capability to GPT-3.5. This could be a game changer, especially in countries where mobile internet connectivity isn't great: we'll soon have a huge knowledge base on our phones without having to rely on an internet connection.
Summary
This brings us to the end of the second instalment in our Classroom series where we’ve introduced Artificial Intelligence. We’ve covered why generative AI has suddenly become an incredibly important technology in 2023, we’ve taken a deep look at OpenAI and ChatGPT, covered other generative AI platforms and demonstrated the importance of the open source community.
In the next article we’ll be covering some of the recent research around generative AI and looking at the impact that the technology is already having on consumers, businesses and society.
Further Reading
If you’re interested in diving deeper into some of the topics covered in this article, below are some links to other interesting reads/watches/listens:
“The future is already here, it’s just not evenly distributed.“
William Gibson
This article was researched and written with help from ChatGPT, but was lovingly reviewed, edited and fine-tuned by a human.