Wikipedia licenses its content
On Jan 15, Wikipedia's owner confirmed that it has agreed licensing arrangements with Microsoft, Meta and Amazon. It had previously agreed licensing with Google back in 2022, along with Perplexity and Mistral AI.
Why does this matter? Garbage in, Garbage out.
The quality of the replies, advice and actions taken by large language models are as good as the training data they have been using.
Wikipedia's content is created and maintained by about 250,000 volunteer editors globally, who write, edit and fact-check the information. It isn't perfect, but it is seen as being a better source than most.
With its 65 million articles across over 300 languages, most of the tech companies have been using Wikipedia as a source of training data, which is reflected in the frequency of it being cited.
But the 'scraping' of high volumes of freely available Wikipedia knowledge, isn't free. It has driven up server demand and costs at the non-profit foundation, that relies on small donations from the public.
Wikipedia has been pushing for greater adoption of its enterprise product, which gives better commercial access for their training needs.
It seems as though the tech giants have 'done the right thing'.
Source:
BESCI AI OPINION
It is good to see the AI companies 'doing the right thing', on the other hand, it is a reminder that your Large Language Model is trained by the internet and the people that hang out and write on it.