Generative A.I. is an enticing but scary proposition for businesses, with as many risks as potential benefits.
Prosus, an Amsterdam-based tech and media investor and operator that I met with recently, is a good example of how a business can deploy generative A.I. while being smart about minimizing the dangers (which range from "hallucinations" to potential copyright infringement).
The company, which is majority owned by South African media company Naspers and is publicly listed on Euronext with a $147 billion market cap, is not an A.I. novice: It created a center of A.I. expertise in 2018, recognizing that the technology could provide benefits across its global portfolio of companies, which ranges from food delivery startups to edtech firms.
At first, the team of roughly 15 machine learning researchers and engineers focused on developing tools for fraud detection and recommendation engines, Euro Beinat, who heads the team, tells me. Prosus has a systematic process for deciding which models it will deploy in production, built around a key business KPI for each model. The company constantly pits the model currently in production against challenger models, often built using different designs. If a challenger beats the incumbent on the KPI target, it replaces the production system and the cycle repeats, Beinat says, creating continuous process improvement.
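Prosus hasn’t published its evaluation code, but the champion-challenger loop Beinat describes might look something like this minimal Python sketch, in which the models, the holdout data, and the KPI (stubbed here as simple accuracy) are all hypothetical stand-ins:

```python
# Minimal sketch of a champion/challenger cycle. The models, holdout data,
# and KPI are hypothetical stand-ins; Prosus has not published this code.
from typing import Callable, Sequence

def kpi(model: Callable[[str], str], holdout: Sequence[dict]) -> float:
    """Score a model on the business KPI. Stubbed here as accuracy on
    labeled examples; a real KPI might be fraud caught per case reviewed."""
    hits = sum(model(ex["input"]) == ex["label"] for ex in holdout)
    return hits / len(holdout)

def select_champion(champion, challengers, holdout):
    """Promote whichever model scores best on the KPI. Re-running this
    every evaluation cycle gives a continuous improvement loop."""
    best, best_score = champion, kpi(champion, holdout)
    for challenger in challengers:
        score = kpi(challenger, holdout)
        if score > best_score:  # challenger beats the incumbent
            best, best_score = challenger, score
    return best, best_score
```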
When Google made its Transformer-based language model BERT (a small language model compared to the behemoths that power ChatGPT) available as open-source software in late 2018, Prosus’s A.I. team realized it had enough data across the group’s portfolio companies to train several bespoke versions. It created a “Food BERT” that helped categorize tens of millions of menu items (were they vegetarian? Indian or Japanese? Spicy or sweet?) from all of the restaurants served by Prosus’s various food delivery startups. The magic of Food BERT is that it could arrive at accurate classifications without having to rely on keywords, and it could do so seamlessly across multiple languages, Paul van der Boor, Prosus’s senior director of data science, says. And once Food BERT created this “food knowledge graph,” Prosus could use it to power a recommendation engine that would suggest restaurants, or additional menu items, to customers based on what kind of food they were in the mood for.
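Food BERT itself is internal, but serving a classifier like it with the open-source Hugging Face transformers library could look roughly like the sketch below; the public multilingual base checkpoint is a stand-in for a model that would, in practice, be fine-tuned on labeled menu items:

```python
# Sketch of serving a Food BERT-style classifier with Hugging Face
# transformers. "bert-base-multilingual-cased" is a public base model;
# imagine it fine-tuned on millions of labeled menu items, as Prosus did
# with its own data. Untuned, the classification head here would emit
# meaningless labels.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="bert-base-multilingual-cased",
)

# One multilingual vocabulary covers all of these items, so no per-language
# keyword lists are needed, which is the property van der Boor highlights.
for item in ["Paneer tikka masala", "Yasai tempura", "Erwtensoep"]:
    print(item, classifier(item))
```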
Prosus also created its own “Fin BERT,” which could be used to do sentiment analysis of financial news releases and earnings calls. Beinat says it was created to be a tool for the Prosus investor relations team, which used it to edit the scripts Prosus’s executives read on earnings calls, ensuring the language was as positive (or at least neutral) as possible.
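A version of FinBERT was later released publicly by Prosus (it’s on Hugging Face as ProsusAI/finbert, labeling financial text as positive, negative, or neutral), so a sentiment check along these lines can be reproduced in a few lines of Python; the draft sentence below is invented:

```python
# Scoring draft earnings-call language with the publicly released FinBERT
# (Hugging Face model "ProsusAI/finbert"). The sample sentence is invented.
from transformers import pipeline

sentiment = pipeline("text-classification", model="ProsusAI/finbert")

draft = "Revenue grew 12% year over year, though margins contracted slightly."
print(sentiment(draft))
# An editor could flag any script sentence that scores strongly negative
# and reword it, which is how Beinat describes the IR team using the tool.
```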
Those days already seem like ancient history compared to what is possible with today’s ultra-large language models and generative A.I., van der Boor says. First, Prosus has used large language models (LLMs) in ways somewhat similar to the use cases it found for smaller language models such as BERT: It used LLMs to classify content and types of students for its edtech companies, and to categorize items for sale on its various digital classifieds marketplaces, where the descriptions are written by users. Van der Boor says these use cases are relatively safe “because we're trying to understand what our users are putting on the marketplaces, or what are they learning, as opposed to immediately giving answers to questions.” In other words, here LLMs are being used to analyze existing unstructured data.
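That kind of zero-shot LLM labeling is straightforward to sketch; the categories, prompt, and model below are illustrative, not Prosus’s actual setup:

```python
# Zero-shot classification of a user-written listing with an LLM. The
# categories, prompt, and model name are illustrative, not Prosus's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = ["electronics", "furniture", "vehicles", "clothing", "other"]

def classify_listing(description: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # deterministic labels are easier to aggregate
        messages=[
            {"role": "system",
             "content": "Classify this marketplace listing into exactly one "
                        "of: " + ", ".join(CATEGORIES)
                        + ". Reply with the category only."},
            {"role": "user", "content": description},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_listing("Barely used oak dining table, seats six"))
```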
To explore LLMs’ more generative side, Prosus created a digital assistant chatbot called Plus One that runs in its group-wide Slack channels, initially built on OpenAI’s GPT-3 model. “We called it Plus One because it's like having an extra team member you can tap on the shoulder and say, ‘Hey, tell me something more about this,’” Beinat says.
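A bare-bones assistant in that spirit can be wired up with the open-source slack_bolt library and an LLM API; everything here (tokens, model choice, behavior) is illustrative rather than Prosus’s actual implementation:

```python
# Bare-bones Slack assistant in the spirit of Plus One, using the public
# slack_bolt and openai libraries. Tokens, model, and behavior are
# illustrative; Prosus's internal bot is far more elaborate.
import os

from openai import OpenAI
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])
llm = OpenAI()

@app.event("app_mention")
def answer(event, say):
    # When someone @-mentions the bot, send their message to the LLM and
    # reply in the same thread.
    reply = llm.chat.completions.create(
        model="gpt-4o-mini",  # Plus One began on GPT-3; any chat model works
        messages=[{"role": "user", "content": event["text"]}],
    )
    say(reply.choices[0].message.content, thread_ts=event.get("ts"))

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```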
Prosus has experimented with several different LLMs, as well as with text-to-image generators, from different vendors and open-source hubs. In some cases, it will use a particular LLM to answer a particular kind of question because the company has found that model performs better for that sort of query. Plus One has thousands of users across the Prosus group of companies. It can be used to find and analyze documents in internal databases. Or it can transcribe meetings and then answer questions about them, generating key takeaways and action points as well as extracting key numbers mentioned in the meeting.
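The meeting workflow amounts to two chained model calls: speech-to-text, then summarization. A minimal sketch, assuming OpenAI’s Whisper and chat APIs as stand-ins for whatever Plus One uses internally:

```python
# Transcribe a recording, then ask a chat model for takeaways, action
# points, and numbers. Whisper and the chat model are assumed stand-ins,
# not Plus One's actual stack; "meeting.m4a" is a placeholder file.
from openai import OpenAI

client = OpenAI()

with open("meeting.m4a", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio
    ).text

summary = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "From this meeting transcript, list the key takeaways, "
                    "action points, and any numbers mentioned."},
        {"role": "user", "content": transcript},
    ],
)
print(summary.choices[0].message.content)
```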
To gather feedback about how the bot was performing, Prosus asked employees to select an emoji for each answer Plus One generated: a thumbs up for a good answer, a heart for a great answer, a thumbs down for a bad answer, and a Pinocchio emoji if Plus One returned inaccurate or invented information. At first, Pinocchios happened about 15% of the time, van der Boor says. But Prosus began to learn which questions were most likely to result in Pinocchio answers and took steps to steer Plus One toward more accurate responses through the use of different meta-prompts, as well as by prefiltering the data the model drew on to find information and post-filtering its answers to weed out obvious problems. Through these techniques, Prosus has reduced Plus One’s Pinocchios to just under 5% of the answers it gives. “It's still not zero, right? That's a known problem. But [the methods for reducing inaccuracies] are a kind of learning that we can then transfer back to the company,” he says.
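The feedback tally itself is simple to sketch; the reaction names and in-memory counter below are assumptions (the Pinocchio emoji is presumably a custom one), and a real system would log per-question data to learn which queries fail:

```python
# Sketch of the emoji feedback tally. Reaction names and storage are
# assumptions; a production system would persist per-question records.
from collections import Counter

FEEDBACK = {"+1": "good", "heart": "great",
            "-1": "bad", "pinocchio": "fabricated"}

reactions: Counter = Counter()

def record(emoji: str) -> None:
    """Call once per emoji reaction left on a Plus One answer."""
    if emoji in FEEDBACK:
        reactions[FEEDBACK[emoji]] += 1

def pinocchio_rate() -> float:
    """Share of rated answers flagged as invented: the metric Prosus
    drove from roughly 15% down to just under 5%."""
    total = sum(reactions.values())
    return reactions["fabricated"] / total if total else 0.0
```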
One key was building a pipeline in which the language model first categorizes the question before trying to answer it, Beinat says. If the question refers to an internal process or document, the prefiltering pipeline prompts the model to use a vector database or some other search function to find relevant company data, and only then to summarize the information. This tends to produce much more accurate results than simply asking the LLM to answer the question immediately from its pre-training. And, as an added precaution, Prosus doesn’t allow Plus One to produce any analysis for users. It can recommend documents that a person can use themselves to find specific information or produce certain analytics, but it won’t do any calculations on its own. The reason, van der Boor says, is that Prosus is concerned about inaccuracy. “We’ve really looked at this because it is a common use case that people say they want to see,” he says. “But we’ve learned the hard way in a sense, because we are trying to do this at scale, that this is just not there yet.”
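Here is roughly what that classify-then-retrieve pipeline could look like, with the open-source chromadb library standing in for whatever vector database Prosus actually uses, and the prompts and model names invented for illustration:

```python
# Classify-then-retrieve sketch of the pipeline Beinat describes. chromadb
# stands in for Prosus's unnamed vector database; prompts and models are
# invented for illustration.
import chromadb
from openai import OpenAI

llm = OpenAI()
docs = chromadb.Client().create_collection("internal_docs")
# In practice the collection would be populated with company documents, e.g.
# docs.add(ids=["hr-1"], documents=["Expense reports are due monthly..."])

def ask(question: str) -> str:
    # Step 1: categorize the question before attempting any answer.
    kind = llm.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Reply 'internal' if this question concerns company "
                        "processes or documents, otherwise 'general'."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content.strip().lower()

    if kind == "internal":
        # Step 2: retrieve relevant passages, then summarize only those,
        # rather than letting the model answer from its pre-training.
        hits = docs.query(query_texts=[question], n_results=3)
        context = "\n".join(hits["documents"][0])
        prompt = ("Using only the material below, answer the question.\n\n"
                  f"{context}\n\nQuestion: {question}")
    else:
        prompt = question

    return llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
```

Grounding the summarization step in retrieved passages, rather than in whatever the model absorbed during training, is what makes the answers auditable and is the main reason the approach cuts down on invented responses.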
When it comes to using generative A.I. for images, potential copyright infringement “remains an unresolved concern,” Beinat says. But he points out that in most cases where the company has experimented with A.I.-created images (for instance, generating images of menu items for restaurants that did not have photos with their online menus), the images needed were highly generic. He also says that having so many Plus One users across the entire Prosus group means that potential legal or ethical pitfalls, such as copyright issues, get spotted faster. “Plus One is a mechanism for collective learning, but also a mechanism for, let’s say, collective checking,” Beinat says.
There are lessons here for any company. Perhaps the biggest is that companies should start playing around with the technology today, but be cautious about putting it into customer-facing applications or relying on it for anything mission-critical.
With that, here’s the rest of this week’s A.I. news.
Jeremy Kahn
@jeremyakahn
jeremy.kahn@fortune.com