Cambodia's Word Puzzle: Digitising the Khmer Language

The ancient tongue of Khmer must be hauled into the computer age for this south-east Asian country to prosper

Khmer, the main language of Cambodia, officially has the longest alphabet in the world: The Guinness Book of Records credits it with 74 characters. Modern usage draws on 33 consonants and 23 vowels, which is still a substantial number. Khmer is also the second most widely spoken Austroasiatic language, after Vietnamese.

Khmer script is also complex, both linguistically and in its written form: many words have several accepted spellings, and the script has many ligatures, which makes encoding it for computers quite involved.

The challenges of ‘translating’ Khmer into one universally recognised computer script are manifest. On top of these obstacles, several programs that successfully digitised the language were not developed using Unicode, so there was always a high risk of ending up with too many different scripts and little or no compatibility across operating platforms.
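To give a sense of what a Unicode encoding of Khmer involves, here is a minimal Python sketch. It is not taken from KhmerOS or from any of the programs mentioned above; it uses only Python's standard unicodedata module to list the code points behind the word for "Khmer" itself. The helper names describe and in_khmer_block are illustrative choices.

```python
import unicodedata

# The word "Khmer" written in Khmer script, given as explicit code points
# so the example does not depend on the reader's fonts or editor.
KHMER_WORD = "\u1781\u17d2\u1798\u17c2\u179a"

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

def in_khmer_block(ch):
    """True if ch lies in the main Unicode Khmer block, U+1780 to U+17FF."""
    return 0x1780 <= ord(ch) <= 0x17FF

if __name__ == "__main__":
    for code, name in describe(KHMER_WORD):
        print(code, name)
```

The word is stored as five code points. The second, KHMER SIGN COENG, has no shape of its own: it tells rendering software to draw the consonant that follows in its subscript form, which is one reason displaying Khmer correctly takes more than a simple one-character, one-shape mapping.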

In the past 10 years there has been a concerted effort to produce a Khmer computer script which can be used easily in government, businesses and schools throughout what is a predominantly rural, agricultural south-east Asian country with a population of almost 15 million people. These social factors are what make a reliable digital version of Khmer so crucial if the country is to develop economically after its decimation at the hands of the Khmer Rouge in the 1970s.

Since 2004, the Open Institute, a non-governmental, not-for-profit organisation in Cambodia, has been working to take the Khmer language digital and to improve the digital lexicography of this ancient language, which dates back to around 600 AD.

Its flagship programme, and the one that underpins all of its projects, is the Khmer Software Initiative (KhmerOS), which aims to provide computer tools in Khmer that everybody can use. Although funding for the programme ceased in 2012, the institute continues to provide support on very limited resources. That includes its Open Schools Program, which uses the work done by KhmerOS to train teachers, who in turn educate and empower arguably the most important demographic when it comes to technology: children and young people in the country.

Statistics from the World Bank show that in 2012 gross primary school enrolment in Cambodia stood at 124%, 12 percentage points higher than the figure for the East Asia and Pacific region as a whole in 2011. (It’s not unusual for the gross enrolment ratio to exceed 100%: it counts every enrolled pupil, including those younger or older than the official primary school age, against the population of official primary school age alone.)
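The arithmetic behind a ratio above 100% can be sketched in a few lines of Python. The figures here are invented purely for illustration and are not World Bank data; the function name gross_enrolment_ratio is likewise just an illustrative choice.

```python
def gross_enrolment_ratio(enrolled_all_ages, official_age_population):
    """Gross enrolment ratio as a percentage.

    The numerator counts every pupil enrolled in primary school, whatever
    their age; the denominator counts only children of official primary age.
    """
    if official_age_population <= 0:
        raise ValueError("official-age population must be positive")
    return 100 * enrolled_all_ages / official_age_population

# Hypothetical numbers, chosen only to show the effect.
official_age_children = 1_000   # children of official primary school age
enrolled_official_age = 950     # of those, pupils actually enrolled
enrolled_other_ages = 290       # under- or over-aged pupils also enrolled

ratio = gross_enrolment_ratio(
    enrolled_official_age + enrolled_other_ages, official_age_children
)
print(f"{ratio:.0f}%")  # prints 124%
```

In this made-up case not every child of primary age is in school, yet the ratio still passes 100% because older and younger pupils are counted in the numerator only.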

However, the World Bank also found that in Cambodia, only 3% of women and 6% of men surveyed for its 2010 Demographic and Health Survey had been educated beyond secondary school level.

Against this background, having one universal digital script is even more important, both to make communication easier for all Cambodians and to advance the country’s economic development. With so few Cambodian adults making the onward journey from primary education, the digitisation of Khmer is essential, because Khmer is the only language most of the population speaks.

Younger people and those living in Cambodia’s major cities may be relatively proficient in English, but Javier Sola, director of the Open Institute, warns: “It is definitely not true that English is replacing Khmer. The number of Cambodians who know sufficient English is extremely low. It is more common in NGO circles, but the number of people who can have a business conversation in English is less than one per cent of the population.”

The KhmerOS project translates free and open source applications (for example, word processing, email and internet software) into Khmer. As well as helping Cambodians move forward in technological ability and digital communication, the digitisation of Khmer is helping to keep the language alive and to preserve the country’s long, rich history. The work matters not only because it supports the digital use of Khmer, through a spellchecker and an electronic dictionary, but also because it digitises the surviving lexicographic resources; almost all such material was lost after the Khmer Rouge genocide.

It’s also a long-term project. At the moment, the reference dictionary for the Khmer language defines only 17,000 of the almost 50,000 words in current use. Over time, and with dedication and patience, it will be possible to produce an online Khmer dictionary that defines all 50,000.

The challenge for Cambodia now is to equip its population, and educate it, to use the digital version of Khmer not just on computers but also on mobile phones and handheld devices.

Sola explains that there’s still some way to go:

“The lack of support for Khmer scripts in phones (only 29% of phones in 2013 supported Khmer script) has led some people to send SMS or write in Facebook using the Latin script to write Khmer, but this is incomprehensible to most Cambodians and it can only express very simple ideas, as the vocabulary used is very short. The Khmer language and Khmer script are the only possible future for Cambodia.”


Soraya Moeng is a London-based journalist and editor with over 15 years' experience across a wide range of sectors. Formerly deputy editor of Financial World, she has also worked as a radio journalist, charity editor and copywriter for a number of blue-chips. Follow Soraya on Twitter @moeng_s