Broadly generalizing, the prominent foundational models were trained on enormous amounts of (supposedly) public and open-source data. However, two characteristics of the training inputs are important to keep in mind as you use these models: the currency of the training data, and the provenance of the content.
Let's take GPT-3 and GPT-4 as examples:
Both GPT-3 and GPT-4 have an updatability problem: their training data reportedly ends around September 2021, which can lead to inaccurate or incomplete responses, especially when you ask about more recent events or information. We're still a long way from any guarantee that a change in the world is reflected in model behavior.
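As a quick illustration, here is a minimal sketch of probing that cutoff by asking about a post-cutoff event. It assumes the `openai` Python package (v1+) and an `OPENAI_API_KEY` in the environment; the model name and prompt are illustrative choices, not a prescription:

```python
# Minimal sketch: probing a model's knowledge cutoff.
# Assumes the `openai` Python package (v1+) with OPENAI_API_KEY set in the
# environment; the model name and prompt below are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model choice
    messages=[
        {
            "role": "user",
            "content": "What were the most significant world events of 2023?",
        }
    ],
)

# A model whose training data ends in September 2021 will typically refuse,
# hedge, or hallucinate here rather than answer accurately.
print(response.choices[0].message.content)
```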
They also have an issue of provenance. In traditional web search, we can follow a result to the specific page that produced it and verify whether the information is correct. In contrast, when we use LLMs, we get only the output response. The model might provide a provenance string telling us where it supposedly found the information, but we still can't assess whether we can trust it.
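To make that contrast concrete, here is a hedged sketch of one weak mitigation: asking the model to cite a source URL, then checking only that the URL resolves. Note that a reachable URL tells us nothing about whether the page actually supports the claim, and the `ask_with_source` helper below is hypothetical:

```python
# Sketch: a weak provenance check. Even when the model emits a source URL,
# we can at best confirm the URL exists -- not that it supports the claim.
# The ask_with_source helper is hypothetical, for illustration only.
import re

import requests
from openai import OpenAI

client = OpenAI()

def ask_with_source(question: str) -> str:
    """Ask the model to answer and cite one supporting URL."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[
            {
                "role": "user",
                "content": f"{question}\nCite one supporting URL in your answer.",
            }
        ],
    )
    return response.choices[0].message.content

answer = ask_with_source("When was the transformer architecture introduced?")
urls = re.findall(r"https?://\S+", answer)

for url in urls:
    try:
        status = requests.head(url, timeout=5, allow_redirects=True).status_code
        print(f"{url} -> HTTP {status}")  # reachable != trustworthy
    except requests.RequestException as err:
        print(f"{url} -> unreachable ({err})")  # possibly hallucinated
```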