tamilspellchecker team mailing list archive
Message #00007
Re: code updated to launchpad and mit demo
2009/2/1 Elanjelian Venugopal <tamiliam@xxxxxxxxx>
>
> 2009/2/2 S.Selvam Siva <s.selvamsiva@xxxxxxxxx>
>
>>
>> What we need is one Python file calling tamilpsell.py with some Tamil text
>> as an argument. Though I have little knowledge of the OpenOffice plugin
>> mechanism, adding it to OpenOffice will require an OpenOffice-specific module
>> (PyUNO, I guess). Our first aim needs to be developing a powerful spell
>> checking engine, so our plugin may not depend on hunspell.
>>
>
> You are the technical person. So, you have to figure that out. As far as I
> know, OOo uses hunspell. So, the files may have to be converted to hunspell
> format if they are to be used there. Mozilla, too, has hunspell as the
> preferred spellchecker.
>
>> As of now, I just maintain a list of Tamil words (one per line) and compare
>> against it to find misspelled words (this is the starting point of our
>> project).
>>
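The word-list comparison described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names load_words and find_misspelled are hypothetical, and splitting the text on runs of characters from the Unicode Tamil block (U+0B80 to U+0BFF) is an assumed tokenisation.

```python
import re

def load_words(lines):
    """Build a set of known words from an iterable of lines (one word per line)."""
    return {line.strip() for line in lines if line.strip()}

def find_misspelled(text, known_words):
    """Return the words in text that are not in known_words, in order of appearance."""
    # Assumption: every run of characters from the Tamil block counts as one word.
    tokens = re.findall(r"[\u0B80-\u0BFF]+", text)
    return [token for token in tokens if token not in known_words]

# In practice the lines would come from the word-list file, for example
# open("tamil_words.txt", encoding="utf-8") (a hypothetical file name).
words = load_words(["கொடு", "படி", "எடு"])
print(find_misspelled("கொடு படி எடூ", words))
```

A set gives constant-time lookups, so the cost grows with the length of the text and not with the size of the word list.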
>
> Which is what the existing Tamil checkers do, albeit I think you have a
> more extensive list of words, which is good.
>
>
>> Affix rules seem to be a critical part of Tamil spell checking, and I have
>> not got any clue about them so far, except that AU-KBC has developed
>> morphological analysis and released software (Acharam.exe, written in Java).
>> We will be really happy if you can help us with affix rules.
>>
>
> Well, without the affix file, you would have a really huge word list,
> probably with about a million or so possible words. For example, take the
> root verb 'kodu' (கொடு). Now, we have to identify all the ways it could be
> modified by a suffix. Eg --
>
> கொடு-க்கிறேன்
> கொடு-க்கின்றேன்
> கொடு-த்தேன்
> கொடு-ப்பேன்
> கொடு-க்கின்றேனா
> கொடு-த்தேனா
> கொடு-ப்பேனா
> கொடு-க்கிறான்
> கொடு-க்கின்றான்
> கொடு-த்தான்
> கொடு-ப்பான்
> கொடு-க்கின்றானா
> கொடு-த்தானா
> கொடு-ப்பானா
> etc...
>
> Other root verbs that may function like kodu would be...
> படு
> எடு
> நடி
> படி
> etc...
>
> So, we may classify the above words as, say, Class A, and link them to all
> the rules that would be applicable to them. In that way, we could prevent a
> lot of unnecessary repetition.
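The Class A idea could be sketched as below. The suffixes and roots are the ones listed in this message; the table layout and the names SUFFIX_CLASSES, ROOT_CLASSES, expand and is_known are assumptions made for illustration only. Suffixes are attached by plain concatenation, which happens to work for these examples but is not a general model of Tamil morphology.

```python
# Hypothetical rule table: class name -> suffixes that class accepts.
SUFFIX_CLASSES = {
    "A": [
        "க்கிறேன்", "க்கின்றேன்", "த்தேன்", "ப்பேன்",
        "க்கின்றேனா", "த்தேனா", "ப்பேனா",
        "க்கிறான்", "க்கின்றான்", "த்தான்", "ப்பான்",
        "க்கின்றானா", "த்தானா", "ப்பானா",
    ],
}

# Each root is stored once and only names its class.
ROOT_CLASSES = {"கொடு": "A", "படு": "A", "எடு": "A", "நடி": "A", "படி": "A"}

def expand(root):
    """Generate every suffixed form of a root from its class rules."""
    return [root + suffix for suffix in SUFFIX_CLASSES[ROOT_CLASSES[root]]]

def is_known(word):
    """Accept a bare root, or a root followed by a suffix from its class."""
    if word in ROOT_CLASSES:
        return True
    for root, cls in ROOT_CLASSES.items():
        if word.startswith(root) and word[len(root):] in SUFFIX_CLASSES[cls]:
            return True
    return False

print(len(expand("கொடு")))
print(is_known("படித்தான்"))
```

With 5 roots and 14 shared suffixes, the two tables hold 19 entries in place of the 70 full forms a flat word list would need.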
>
> Do you have a sample affix file from another language? Say Spanish or
> French?
>
Currently I don't have any affix file. It seems we now need to concentrate on
this part rather than anything else.
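As a starting point for the affix file, here is a rough sketch of how class rules might be written out in the hunspell layout mentioned earlier in the thread. It assumes the usual hunspell conventions (an "SFX flag Y count" header followed by one "SFX flag strip suffix condition" line per rule, and "root/flag" entries in the .dic file); these should be checked against the hunspell documentation before use. The function names build_aff and build_dic are hypothetical, and only two suffixes are shown.

```python
def build_aff(suffix_classes):
    """Return hunspell-style .aff text for a {flag: [suffixes]} mapping."""
    lines = ["SET UTF-8"]
    for flag, suffixes in suffix_classes.items():
        # Header line: SFX <flag> <cross-product Y/N> <number of rules>
        lines.append(f"SFX {flag} Y {len(suffixes)}")
        for suffix in suffixes:
            # Rule line: strip nothing (0), append the suffix, any stem ending (.)
            lines.append(f"SFX {flag} 0 {suffix} .")
    return "\n".join(lines) + "\n"

def build_dic(root_classes):
    """Return hunspell-style .dic text: an entry count, then root/flag lines."""
    lines = [str(len(root_classes))]
    lines += [f"{root}/{flag}" for root, flag in root_classes.items()]
    return "\n".join(lines) + "\n"

print(build_aff({"A": ["த்தேன்", "ப்பேன்"]}))
print(build_dic({"கொடு": "A", "படி": "A"}))
```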
>
>
--
Yours,
S.Selvam