Sindhi becomes the first language from Pakistan to be selected for digitization


Karachi: In May this year, the Sindhi language achieved a historic milestone by becoming the first language from Pakistan to be selected for digitization by Universal Dependencies, a joint project of Stanford University and Google.

Universal Dependencies (UD), developed in 2005, is an ongoing project working to convert languages into machine-readable formats. To date, it has selected 100 languages, including Sindhi, out of the 6,000 human languages spoken globally.

Urdu, the national language of Pakistan, was also picked up by the project, but it was proposed for inclusion by contributors from India.

“Sindhi’s inclusion in UD digital platform is a major breakthrough,” Dr. Mazhar Ali Dootio, a computational linguist, who was among those who applied for Sindhi’s registration in the UD, told Geo News.

Dr. Dootio added that digitalization is depleting the world's languages faster than ever in human history. “English is already digitized, therefore it is secure,” he said. “Including Sindhi in the UD framework means that it will now become a universal language and is on the path of digitalization.”

According to the United Nations, at least 43% of the estimated 6,000 languages spoken in the world are endangered. The United Nations Educational, Scientific and Cultural Organization (UNESCO) warns that half of these languages will be extinct by the end of this century.

Sindhi is one of the world’s oldest languages, written right to left in a Perso-Arabic script. Linked to the Indus Civilization, the language’s first recorded script samples were found during excavation of the UNESCO heritage site of Mohenjo Daro in Sindh.

“Historians are working to decode the script found from Mohenjo Daro,” Professor Dr. Muhammad Ali Manjhi, chairman of the Sindhi Language Authority, an autonomous institution in Sindh, told Geo News. He added that Sindhi was declared the official language of Sindh during the British era.

“After which Sindhi got much prosperity,” he explained. “We saw lots of books and literature published in the language during British era and afterwards.”

According to the U.K.-based World Mapper, which offers a collection of world maps, Sindhi is spoken by approximately 24 million people in at least nine territories.

After the UD addition, Sindhi will now be accessible online for translation through more than 150 treebanks, said Dr. Dootio.

“Sindhi can be digitized in the next 10 years,” he added. “If the government helps, which it is not at the moment, the process can be completed earlier, in six to seven years. Once Sindhi is digitized, it will join English as one of the few languages which are completely digitized.”
