Languages spoken in India in a map. Image credit Superprof
I got a few calls about IAMAI’s report – “Number of Local Language Sites in India stands at a Pathetic 1249!”.
There are parts of the report I agree and disagree with. I contacted IAMAI and IMRB about this report, and I must say they have been very supportive and responsive to my concerns (they got in touch with me in less than 15 minutes). I appreciate and thank IAMAI for listening to what I had to say. They have a huge role to play for the language community on the internet. IAMAI asked me to respond to their report.
The report publishes the number of sites in each language, broken down by category. I want to comment on only a couple of items.
Language Blogs
We at Oneindia.in have a blog directory and have been concentrating on individual blog posts rather than entire blogs. We have a decent number of language blogs (Hindi plus four South Indian languages) in our directory and are making every effort to increase the numbers.
IAMAI report | Hindi | Marathi | Tamil | Bengali | Punjabi | Telugu | Kannada | Total |
---|---|---|---|---|---|---|---|---|
Blogs | 394 | 2 | 57 | 57 | 1 | 2 | 4 | 523 |
Blogs Oneindia | Hindi | Tamil | Malayalam | Telugu | Kannada |
---|---|---|---|---|---|
Blogs | 785 | 2,227 | 853 | 295 | 675 |
I hear from people that there are 4,000+ blogs each in Tamil and Marathi. Hindi is far larger. When the language blog population is this sizable, why was the IAMAI crawler not able to identify more blogs than they have reported? Which crawler did they use?
It is important to differentiate between the reading and writing communities. I strongly believe the language reader base is far larger than the writing base (UGC) today, and that this will change over time.
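The report does not say how its crawler decided which language a page is written in. As a minimal sketch of one common approach (an assumption on my part, not IAMAI's method), a crawler can look at the Unicode script of the letters on a page. The names `SCRIPTS`, `dominant_script` and the 0.5 threshold are hypothetical choices for illustration.

```python
import unicodedata
from collections import Counter

# Script-name prefixes as they appear in Unicode character names.
SCRIPTS = ("DEVANAGARI", "BENGALI", "GURMUKHI", "TAMIL",
           "TELUGU", "KANNADA", "MALAYALAM")


def dominant_script(text, threshold=0.5):
    """Return the Indic script covering at least `threshold` of the letters, else None."""
    counts = Counter()
    letters = 0
    for ch in text:
        # isalpha() skips digits, punctuation and combining vowel signs.
        if not ch.isalpha():
            continue
        letters += 1
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                counts[script] += 1
                break
    if not counts:
        return None
    script, n = counts.most_common(1)[0]
    return script if n / letters >= threshold else None


if __name__ == "__main__":
    print(dominant_script("हिन्दी"))
    print(dominant_script("plain English text"))
```

Script detection alone cannot separate languages that share a script: Hindi and Marathi are both written in Devanagari, so a crawler built this way would need a further step to tell them apart.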
Mobile
There is no mention of language+mobile in the report. Mobile penetration in India is far higher than internet penetration. Mobile penetration has seen its highest growth in Tier-II cities, which are predominantly non-English speaking. Oneindia.in recognized this fact in 2008 and started publishing its content on its WAP portal (Oneindia.mobi) and by SMS. While our WAP traffic is still small (thanks to the GPRS speed), our SMS user base has grown very impressively. This shows there is a sizable user base that wants to read language SMS.
The UGC contribution on mobile may be small now, but youngsters will soon figure out a way to communicate in their primary language on the mobile. Eterno was one of the first companies to develop a full-fledged SMS suite in Indian languages.
What is missing?
Conclusion
Indian languages on the internet are a reality. There is sufficient content out there to keep a surfer busy. There are serious sites, personal sites, blogs, and entertainment sites. Growth in internet penetration and better search engines will help the growth of languages on the internet.
Comments
I guess Mahesh is bang on. While Hindi has a large demographic base, the South Indian markets (all four: Tamil, Kannada, Malayalam and Telugu, in varying proportion) have higher literacy and internet awareness. Also, the monetization of the SI market is perhaps better. Yes, mobile plus local language will be a potent mix to take this market into a new orbit. It's also a chicken-and-egg situation: if you do not have content, you cannot have readers, and if readers are not visible, content creators are reluctant to produce content. Look at the regional language newspapers. How have they grown in the last few years?
Recently, a Telugu newspaper trawled the Net and announced that the number of Telugu blogs is the highest among all Indian languages. And Telugu Wiki has more pages than any language other than English. Please check your facts.
Ok..I checked the source again (it is in Telugu !), and it says there are around 1500 active Telugu blogs, which is second only to Tamil.
But in Wiki, Telugu Wikipedia has more articles (42,039) than Hindi (24,500) and Tamil (16,657).
http://meta.wikimedia.org/wiki/Complete_list_of_language_Wikipedias_available
The number of articles shows that even though the number of Telugu bloggers is smaller, they are far more active than the average blogger. And if we look at the Telugu film sites, the monetization factor is quite obvious.
Mahesh,
Here's the link: http://www.andhrajyothy.com/
To paraphrase the last paragraph in the article:
With 41,806 articles, 81,607 pages, and 7,906 users, Telugu Wiki is in the number one position among all Indian languages. The national language Hindi has 24,265 articles though it has 9,449 users. Tamil is in the 6th place(?) with 16,380 pages and 5,993 users.
My point: Wiki is a good example of UGC. Can we assume that at least 75% of Wiki contributors have their own blogs (not all of which may be updated frequently)? Personally, I belong to the TeluguBlog Google group, but I blog in English :-)
Please see http://www.kiruba.com/2009/01/photologue-of-tamil-wikipedia-academy.html which shows how active languages are on Wikipedia
It is a good analysis. I don't know much about IAMAI, but are they genuinely interested in local language development? If so, how come their site is not even in Hindi?
Second, as you noted, local languages are not in a competition. They can all improve.
Governments are missing a huge opportunity. Web technologies can make the government more transparent, efficient, and accountable. Not just for filling out forms and static information, but for budget planning and policy development. For instance, what if the expense accounts of all the MPs were online? What if all the government programs and their cost, benefits, reach, and timeline were online? This is huge, and officially they have to be in local languages. But they are clueless about how to create large database-driven web sites in local languages. It can be done (http://www.tamilmanam.net/). The knowledge base is there. They have to use it.
For all their talk about Tamil, the Tamil Nadu government should be ashamed of the state of Tamil on their hundreds of websites. 90% are in English. Most database-driven sites are almost completely in English. Do they not care, or are they ignorant?
Teaching local language typing is the basis for local language computing. Google has provided a Hindi automatic translation engine. It is theoretically possible to have similar engines for other languages. Even if Google does not do it, is it not up to the governments to work on such projects?
Ignorance is no excuse; they should ramp up their efforts.
For Tamil informational web sites listings see:
http://ta.wikipedia.org/wiki/WP:Tamil_Websites
In addition there are ezines, and news sites.
There is a sizable blogging community. 2500 seems to be a fair number.
Malayalam wiki is probably number one in terms of quality. The number of articles alone is not a good indication. The depth must also be accounted for. A simple technique is to click Random Article a given number of times and see how many are not placeholder or single-sentence articles.
Good article. Very interested in Indian language computing.
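The random-article check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fetch_random_article` is a hypothetical callable that returns the plain text of one randomly chosen article (for a real wiki it could wrap the Random Article page), and "at least two sentences" is an arbitrary stand-in for "not a placeholder".

```python
import random
import re
from typing import Callable


def is_substantive(text: str, min_sentences: int = 2) -> bool:
    """Crude stub test: an article counts if it has at least `min_sentences` sentences."""
    # Split on common sentence terminators, including the Devanagari danda.
    sentences = [s for s in re.split(r"[.!?।]+", text) if s.strip()]
    return len(sentences) >= min_sentences


def estimate_depth(fetch_random_article: Callable[[], str], samples: int = 100) -> float:
    """Return the fraction of sampled articles that are not stubs."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    hits = sum(is_substantive(fetch_random_article()) for _ in range(samples))
    return hits / samples


if __name__ == "__main__":
    # Toy corpus standing in for a wiki: one stub, one fuller article.
    toy_corpus = ["Stub.", "First sentence. Second sentence. Third sentence."]
    rng = random.Random(0)
    print(estimate_depth(lambda: rng.choice(toy_corpus), samples=50))
```

Comparing this fraction across wikis, alongside the raw article counts, gives a rough sense of depth; the sentence splitter is deliberately simple and will miscount text with abbreviations or decimals.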
Hi Mahesh,
I beg IAMAI to discredit the crawler they used to calculate the number they've reported. Is this FUD at its worst? The web crawler they're using obviously did not start with Google.com. A simple search for the keyword हिन्दी returns well over 37,400,000 results!
> Better text editors which have both
> transliteration (English phonetic keyboard)
> and Inscript (common keyboard layout for all
> languages)
We are working on a third alternative that is easy to pick up and use called Lipikaar.
- Santosh
Hello Mahesh,
Since thousands of people read text from many sources on the net, please take care to present clear, clean and statistically correct data.
Telugu is the most widely spoken language in the south and has more blogs and Wikipedia articles. I appreciate you quoting the incorrect statements made by their respective authors and pointing out the main mistake, 'TelEgu'.
For the people who love their mother tongue ...
Loving your mother tongue should not mean criticizing neighboring languages. Note that it is not TELEGU or TELUNGU (the latter appears in the older Grandhikamu usage); it is now TELUGU. Thanks.
nanihothearts5@gmail.com
nani5@ymail.com