Access to information is relatively easy for city dwellers, who have knowledge at their fingertips. That is much less the case beyond urban boundaries.
Rural communities frequently depend on community radio, neighbourhood newspapers, and volunteer organisations for hyper-local information. But the corpus of knowledge produced by these entities often remains localised and is absent from the internet, making it difficult for people to access it again later. Language barriers add to the challenge.
Students of the International Institute of Information Technology-Bangalore (IIIT-B) have devised a solution by developing a search interface for colloquial audio content in the Kannada language.
Called Graama-Kannada Audio Search, the interface allows the user to search for and access hyperlocal information from the Tumakuru region in audio format.
A search interface for rural communities
The framework was developed by Sharath Srivatsa (PhD Scholar, IIIT-B), Aparna M. (M.S. by Research Scholar, IIIT-B) and Sai Madhavan G. (iMTECH student, IIIT-B) under the guidance of Srinath Srinivasa (Professor and Dean (R&D), Web Science Lab, IIIT-B) and with the help of T. B. Dinesh (iruWay Rural Research Lab, Janastu).
Namma Halli Radio is a community-owned WiFi mesh radio run by the NGO Janastu in the Tumakuru region. Over the years, the radio built up an audio corpus rich with information on local customs, cultures, festivals, Covid-19 awareness and so on. But because this data was not on the internet, people could not access the information later.
The IIIT-B team worked with the community radio and fed its audio corpus into their search model. The audio was transcribed into text using automatic speech recognition (ASR) models. When a user searches for a keyword, it is matched against this transcribed text to deliver results.
The user can search using keywords typed in Kannada or English and obtain results in audio format. The audio clips are timestamped to mark the exact point at which the keyword occurs.
“For example, someone wants to search for a specific term, say Red Cross. They can just type in the word in English or Kannada. And they’ll be provided with all the audio from the Namma Halli corpus where the word occurs. They can even just jump to the time where the word occurs,” explains Aparna M., one of the team members who developed the interface.
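The lookup described here can be pictured with a minimal sketch. It assumes the ASR step yields transcript segments that each carry a clip name and a start time; the `Segment` type, the `search` function, the file names and the sample sentences are all hypothetical and are not taken from the Graama-Kannada code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    clip: str     # hypothetical audio file name
    start: float  # segment start time in seconds
    text: str     # ASR transcript of the segment


def search(segments, keyword):
    """Return (clip, start) for every segment whose transcript contains the keyword."""
    needle = keyword.casefold()
    return [(s.clip, s.start) for s in segments if needle in s.text.casefold()]


# Made-up transcript data, for illustration only.
segments = [
    Segment("clip_01.mp3", 0.0, "welcome to namma halli radio"),
    Segment("clip_01.mp3", 12.5, "the red cross camp opens tomorrow"),
    Segment("clip_02.mp3", 3.0, "ration shop timings have changed"),
]

hits = search(segments, "Red Cross")
```

Each hit pairs a clip with the time at which the matching segment begins, which is the information a player would need to jump straight to the keyword.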
The missing colloquial
Artificial Intelligence (AI) models rely on the data fed to them to produce outputs. Any bias in that data is reflected in the models too, and as a result AI models often fail to reflect the heterogeneity of the human population.
Models like Graama Kannada become relevant here.
The Graama Kannada search interface could help add colloquial dialects to language models, which have been trained either in English or in sanitised, formal versions of Kannada.
“The problem with LLMs (large language models) is that they are mostly built for a very formal type of Kannada like what is spoken on the All India Radio or seen in a newspaper. They don’t work very well when a person uses colloquial style language to search something,” Ms. Aparna explains.
“The main focus of our work is to build models that will be suitable for colloquial content. Since we have access to the community radio’s audio corpus, the model that we have built has given us better accuracy for the Tumakuru dialect,” she notes.
The application currently supports only text-based search, but the team says it plans to add audio-based search very soon.
“In the future, if someone wants to do a voice search, even if they speak in the Tumakuru dialect, our model will be better at processing it compared to other existing models. The same process can be repeated for other dialects too,” says Ms. Aparna.
A window to regional cultures
While the interface has been developed primarily with community members in mind, Ms. Aparna notes that it would also work as a window for the general public to get more local information about an area.
The web application provides a list of the most searched words such as Tumakuru, Turuvekere, Gruha Bandhana (quarantine), Dinasi (ration), Lasike (vaccine), Muneshwara Swamy (a temple in Tumakuru) and so on.
“This way even if a person is not very familiar with the community, they can understand what the corpus is about by looking at the words that we have given. These keywords can be like a clue to the community to them,” explains Ms. Aparna.
No mean task
The project started at the beginning of 2022 as part of the PhD work of Sharath Srivatsa, the team lead, in collaboration with Janastu. The biggest challenge before the team was converting the audio obtained from the community radio to text.
“Our idea was to convert the audio to text and then do all the processing on the text. But getting a model to do that was very hard. For low-resource languages (languages with less data available on the internet for training AI systems) like Kannada with dialectal variations, most automatic speech recognition (ASR) models don’t work,” explains Mr. Srivatsa.
Towards the end of 2022, OpenAI introduced the Whisper model for ASR and speech translation. In 2023 Meta also introduced its own multilingual model. The team started experimenting with them and found better results.
But there were still challenges, a major one being spelling mistakes.
“When the audio was converted to text, it had spelling mistakes. For English the word error rate is just around 10% in ASR models given that it is the standard language and spoken across the world. But when it comes to low resource languages, models are not so optimal and efficient. We got around 60% word error rate and out of that 80% was spelling mistake. That is, when the audio was converted to text, it had spelling mistakes.”
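For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the team's evaluation code; the function name and the sample sentences are made up. The last line reads the quoted figures as 80% of the errors being spelling mistakes, which would put misspellings at roughly 48 for every 100 reference words.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: two of the six reference words are misspelt in the output.
wer = word_error_rate("the red cross camp opens tomorrow",
                      "the red cros camp open tomorrow")

# Share of reference words affected by spelling errors, under the reading above.
spelling_error_share = 0.60 * 0.80
```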
The team realised this could become a problem. If the user typed the correct spelling but the transcript carried a wrong spelling of the same word, the model would fail to match them and would return no results.
To address this, the team relaxed the matching criterion and adopted fuzzy matching, in which the input text is compared against transcript text that is either an exact match or a very close one.
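The article does not say which fuzzy-matching method the team used. As one possible illustration, the sketch below scores each transcript word against the query with `difflib.SequenceMatcher` from the Python standard library and keeps words above a similarity threshold. The `fuzzy_search` function, the 0.8 threshold and the romanised sample words (including the misspelling "lasika") are assumptions made for the example.

```python
from difflib import SequenceMatcher


def fuzzy_search(words, query, threshold=0.8):
    """Return (word, time) pairs whose spelling is close enough to the query.

    `words` is a list of (word, time_in_seconds) pairs from a transcript.
    The threshold is an illustrative value, not one reported by the team.
    """
    q = query.casefold()
    hits = []
    for word, time in words:
        score = SequenceMatcher(None, q, word.casefold()).ratio()
        if score >= threshold:
            hits.append((word, time))
    return hits


# Made-up transcript words: "lasika" stands in for an ASR misspelling of "lasike".
transcript = [("lasike", 4.2), ("lasika", 9.8), ("dinasi", 15.0)]
matches = fuzzy_search(transcript, "lasike")
```

With a threshold this high, the exact word and its one-letter variant are both returned while an unrelated word is not. Lowering the threshold would tolerate more transcription errors at the cost of more false matches.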
Simple UI
Once a working model was in place, they started working on the website.
“We made a very simple web application with minimal features. But we made sure that the UI was accessible enough by having Kannada and English words,” says Sai Madhavan, who worked on the project as part of his internship.
“You can search in English or Kannada. If you do it in English, there is this button for transliterating it from English to Kannada. Let’s say you search the name of a temple. Even with an approximate spelling, it will show you all the audio clips in the corpus that contain that word and the timestamp. So, you can seek to that particular timestamp, and you will be able to hear in what context it is being spoken about,” he adds.
Analysing contrasting worldviews
Apart from adding voice search, the team also plans to add a question-and-answer feature to the model, which would allow it to give full-fledged text answers as LLMs such as ChatGPT do.
The team is also trying to analyse the contrast in worldviews between urban and rural communities.
“Information regarding modern societies and what they believe in is well documented and available as well-structured content on the internet. But that’s not the case with low-resource communities. So, we will collect some more corpus on it and try to find out more about their world views and unique beliefs. We want to mine such things and show them in our UI,” says Mr. Srivatsa.
Ms. Aparna elaborates: “We have compared our corpus from the rural region with news corpus in more formal Kannada to find that there is a significant difference in the worldviews. For example, let’s take the word development. Rural community people talk about words like panchayat or Gowda of the village and so on in the context of it. Whereas the mainstream corpus got us results like development, Bangalore and so on.”
The team hopes that in a world where AI models push dominant mainstream views, their efforts will help add more diversity to the mix.