
Have there been attempts to train LLMs using generated "I don't know" data to decrease hallucinations?


Correct me if I am wrong, but I assume hallucinations can be blamed primarily on the training data underrepresenting responses that explicitly state the speaker doesn't know a given fact. On the internet (the source of most training data, AFAIK), either you know a fact and share it, or you don't know it and simply say nothing about your lack of knowledge. LLMs seem to replicate this pattern, rarely answering with "I don't know".

It would be fairly easy to generate a large set of training data consisting of countless questions that probably have no known answer, each paired with a simple "I don't know" response.

Examples: "What is the average height of elephants in northern India? I don't know", "What is a unique thing about Soest, the Netherlands? I don't know", "What did the US president eat for breakfast yesterday? I don't know".
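To make this concrete, here is a minimal sketch of how such a dataset might be mass-produced. Everything in it is illustrative rather than a prescribed pipeline: the templates, the placeholder values, the `idk_dataset.jsonl` filename, and the prompt/completion JSONL layout are all assumptions, chosen because many supervised fine-tuning setups accept something like that format.

```python
import json
import random

# Hypothetical templates for questions that probably have no known answer.
# All names and values here are illustrative placeholders.
TEMPLATES = [
    "What is the average height of elephants in {place}?",
    "What is a unique thing about {place}?",
    "What did {person} eat for breakfast yesterday?",
]
PLACES = ["northern India", "Soest, the Netherlands", "rural Patagonia"]
PERSONS = ["the US president", "the mayor of Reykjavik"]

def make_example() -> dict:
    """Fill a random template and pair it with an 'I don't know' completion."""
    template = random.choice(TEMPLATES)
    question = template.format(
        place=random.choice(PLACES),
        person=random.choice(PERSONS),  # unused kwargs are ignored by str.format
    )
    return {"prompt": question, "completion": "I don't know."}

# Write prompt/completion pairs as JSONL, a format many supervised
# fine-tuning pipelines accept (an assumption; adapt to your trainer).
with open("idk_dataset.jsonl", "w") as f:
    for _ in range(1000):
        f.write(json.dumps(make_example()) + "\n")
```

In practice one would want far more template variety, but the shape of the data is the same as the examples above: an unanswerable question paired with an explicit admission of not knowing.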

Have there been any attempts to train an LLM on such data, and if so, did it decrease the rate of hallucinations?

