I got a few calls about IAMAI’s report – “Number of Local Language Sites in India stands at a Pathetic 1249!”. I wasn’t able to read this report on the day it was published as I was travelling.
There are parts of the report I agree and disagree with. I contacted IAMAI and IMRB about this report and I must say they have been very supportive and responsive to my concerns (they got in touch with me in less than 15 minutes). I appreciate and thank IAMAI for listening to what I had to say. They have a huge role to play for the language community on the internet. IAMAI asked me to respond to their report.
The report has published the number of sites in each language, broken down by category. I would like to comment on a couple of items only.
- Malayalam should have been included. People tend to ignore a language completely just because it is smaller than the biggest language (Hindi or Tamil). Every language has a user base, and if that user base is large enough to monetize, it is worth having that language on the web. I believe each of the South Indian languages has a monetizable user base, so we should go for each one of them.
- Telegu should be Telugu. “Religional” should be Regional. Why such a casual attitude in preparing the report?
- The report rightly talks about the dismal usage of languages on State Government sites. They don’t have a clue about the internet and they never will. Governments just refuse to take professional help in developing user-friendly sites for their citizens.
- Yes, technology has to get better for users to contribute content (UGC).
  Experts say that approximately 70% of Indian surfers will be non-English speaking, so transliteration-based editors may not really rock. Most text editors rely on transliteration, which means users need to know English. But we have to accept the fact that languages have more readers than writers; there is nothing wrong with that.
- We cannot always compare the size of each language with English on the web.
  Every language has a user base on the net that wants to consume language content, and their sizes vary; e.g. the Tamil internet user base is far larger than the Kannada one. That doesn’t mean one needs to dump Kannada on the net. Costs need to be controlled in accordance with the market size.
- The report should have talked about a few good examples in the language space.
  Economic Times in Hindi and Gujarati are said to be doing very well. Google News has been launched in 5 Indian languages. Neither of these giants would bother to venture into the language space if it wasn’t promising.
- The report should have highlighted the growth of language usage since 2000.
  Let us not always see the glass as half empty and whine. It is high time we took a positive approach to life and saw the half-full part of the glass. I am not saying the space is fully mature; we need to keep innovating.
- Since language blogs are getting popular, I would like to state a few facts about them.
Language Blogs
We at Oneindia.in have a blogs directory and have been concentrating on individual blog posts rather than entire blogs. We have a decent number of language blogs (Hindi plus 4 South Indian languages) in our directory and are making every effort to increase the numbers.
IAMAI report

| | Hindi | Marathi | Tamil | Bengali | Punjabi | Telegu | Kannada | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Blogs | 394 | 2 | 57 | 57 | 1 | 2 | 4 | 523 |

Blogs.Oneindia.in

| | Hindi | Tamil | Telugu | Kannada | Malayalam |
| --- | --- | --- | --- | --- | --- |
| Blogs | 785 | 2227 | 295 | 675 | 853 |

I hear from people that there are about 4000+ blogs each in Tamil and Marathi. Hindi is far larger. When the language blog population is this sizable, why was the IAMAI crawler not able to identify more blogs than what they have reported? Which crawler did they use?
It is important to differentiate the reading and writing community. I strongly believe the language reader user base is far larger than the writing user base (UGC) today and it will change over time.
Mobile
There is no mention of language+mobile in the report. Mobile penetration in India is far higher than internet penetration. Mobile penetration has seen the highest growth in Tier-II cities, which are predominantly non-English speaking. Oneindia.in recognized this fact in 2008 and started publishing its content on its WAP portal (Oneindia.mobi) and by SMS. While our WAP traffic is still small (thanks to the GPRS speed), our SMS user base has grown very impressively. This shows there is a good-sized user base that wants to read language SMS.
The UGC contribution on the mobile may be small now, but youngsters will figure out a way to communicate in their primary language on the mobile soon. Eterno was one of the first companies to develop a full-fledged SMS suite in Indian languages.
What is missing?
- State Governments need to take the lead in developing and distributing platform independent fonts.
- Better text editors which have both transliteration (English phonetic keyboard) and Inscript (common keyboard layout for all languages)
- Search engines need to handle language content a lot better. Search engines are not throwing up good results when it comes to languages, hence many or most sites haven’t yet been discovered. Google has been investing a lot in this space and I am optimistic about seeing good results.
- Google AdSense needs to work for language content; currently it works very well in English. Language portals need better monetization and Google AdSense would be the best bet for language publishers (after all, don’t language publishers need food too?)
Conclusion
Indian language on the internet is a reality. There is sufficient content out there which can keep a surfer busy. You have serious sites, personal sites, blogs and entertainment sites. The growth in Internet penetration and better search engines would help the growth of languages on the internet.
I guess Mahesh is bang on. While Hindi has a large demographic base, the South Indian markets (all four: Tamil, Kannada, Malayalam and Telugu), in varying proportion, have higher literacy and internet awareness. Also, the monetization of the SI market is perhaps better. Yes, mobile plus local language will be a potent mix to take this market into a new orbit. It’s also a chicken-and-egg situation: if you do not have content, you cannot have readers, and if readers are not visible, content creators are reluctant to produce content. Look at the regional language newspapers and how they have grown in the last few years.
Recently, a Telugu newspaper trawled the Net and announced the number of Telugu blogs as the highest among all Indian languages. And Telugu Wiki has more pages than any language other than English. Please check your facts.
OK, I checked the source again (it is in Telugu!), and it says there are around 1500 active Telugu blogs, which is second only to Tamil.
But in Wiki, Telugu Wikipedia has more articles (42,039) than Hindi (24,500) and Tamil (16,657).
http://meta.wikimedia.org/wiki/Complete_list_of_language_Wikipedias_available
The number of articles shows that even though there are fewer Telugu bloggers, they are far more active than the average blogger. And if we look at the Telugu film sites, the monetization factor is quite obvious.
Mahesh,
Here’s the link: http://www.andhrajyothy.com/
To paraphrase the last paragraph in the article:
With 41,806 articles, 81,607 pages, and 7,906 users, Telugu Wiki is in the number one position among all Indian languages. The national language Hindi has 24,265 articles though it has 9,449 users. Tamil is in the 6th place(?) with 16,380 pages and 5,993 users.
My point: Wiki is a good example of UGC. Can we assume that at least 75% of Wiki contributors will have their own blogs (all of them may not be updated frequently)? Personally, I belong to the TeluguBlog Google group, but I blog in English 🙂
Please see http://www.kiruba.com/2009/01/photologue-of-tamil-wikipedia-academy.html which shows how active languages are on Wikipedia
It is a good analysis. I don’t know much about IAMAI, but are they genuinely interested in local language development? If so, how come their site is not even in Hindi?
Second, as you noted, local languages are not in a competition. They can all improve.
Governments are missing a huge opportunity. Web technologies can make the government more transparent, efficient, and accountable, not just for filling out forms and static information but for budget planning and policy development. For instance, what if the expense accounts of all the MPs were online? What if all the government programs and their cost, benefits, reach, and timeline were online? This is huge, and officially they have to be in local languages. But they are clueless about how to create large database-driven web sites in local languages. It can be done (http://www.tamilmanam.net/). The knowledge base is there. They have to use it.
For all their talk about Tamil, the Tamil Nadu government must be ashamed of the state of Tamil on their hundreds of websites. 90% are in English. Most database-driven sites are almost completely in English. Do they not care, or are they ignorant?
Teaching local language typing is the basis for local language computing. Google has provided a Hindi automatic translation engine. It is theoretically possible to have similar engines for other languages. Even if Google does not do it, is it not up to the governments to work on such projects?
Ignorance is no excuse; they should ramp up their efforts.
For Tamil informational web sites listings see:
http://ta.wikipedia.org/wiki/WP:Tamil_Websites
In addition there are ezines, and news sites.
There is a sizable blogging community. 2500 seems to be a fair number.
Malayalam wiki is probably number one in terms of quality. The number of articles alone is not a good indication. The depth must also be accounted for. A simple technique is to click Random Article a given number of times and see how many are not placeholder or single-sentence articles.
Good article. I am very interested in Indian-language computing.
Hi Mahesh,
I beg IAMAI to discredit the crawler they used to calculate the number they’ve reported. Is this FUD at its worst? The web crawler they’re using obviously did not start with Google.com. A simple search for the keyword हिन्दी returns well over 37,400,000 results!
> Better text editors which have both
> transliteration (English phonetic keyboard)
> and Inscript (common keyboard layout for all
> languages)
We are working on a third alternative that is easy to pick up and use called Lipikaar.
– Santosh
Hello Mahesh,
Since thousands of people read text from many sources on the net, please take care to present clear, clean and statistically correct data.
Telugu is the most widely spoken language in the south, and has a larger number of blogs and Wikipedia articles. I appreciate you pointing out the incorrect statements made by the report's authors, and especially the glaring mistake ‘TelEgu’.
For the people who love their mother tongue:
Loving your mother tongue should not mean criticizing neighbouring languages. Note that it is not TELEGU or TELUNGU (the latter appears in older usage, GRANDHIKAMU); it is now TELUGU. Thanks.
nanihothearts5@gmail.com
nani5@ymail.com