sahana-s08-de-duplicator team mailing list archive

Re: Help needed

 

If we call it for each of the fields, then we will have to deal with
multiple match percentages per record, which will be cumbersome.

If we pass the entire object to the JW algo, will it be possible to treat
the whole record as a string and compare?

If that is not possible, then we will just send the first name only. What do
you guys say?
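
One way to avoid juggling several match percentages per record is to fold the per-field scores into a single weighted score. The sketch below is only an illustration: the field names, the weights and the helper names (`similarity`, `record_score`, `whole_record_score`) are assumptions, and `difflib.SequenceMatcher` stands in for the team's `jaro_winkler(str1, str2)`, which is not shown in this thread.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; NOT Jaro-Winkler.

    Swap in the module's jaro_winkler(a, b) once it can be imported.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical field names and weights, chosen only for illustration.
FIELD_WEIGHTS = {"first_name": 0.5, "last_name": 0.4, "city": 0.1}


def record_score(rec1, rec2, weights=FIELD_WEIGHTS, sim=similarity):
    """Weighted average of per-field similarities: one number per record pair."""
    total = 0.0
    weight_sum = 0.0
    for field, weight in weights.items():
        a, b = rec1.get(field), rec2.get(field)
        if not a or not b:
            continue  # skip fields that are empty or missing on either side
        total += weight * sim(a, b)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def whole_record_score(rec1, rec2, fields, sim=similarity):
    """Alternative: join the fields into one string per record and compare once."""
    s1 = " ".join(str(rec1.get(f, "")) for f in fields)
    s2 = " ".join(str(rec2.get(f, "")) for f in fields)
    return sim(s1, s2)


if __name__ == "__main__":
    # Made-up records, not real data.
    r1 = {"first_name": "akila", "last_name": "example", "city": "townville"}
    r2 = {"first_name": "Akhila", "last_name": "Example", "city": ""}
    print(record_score(r1, r2))
    print(whole_record_score(r1, r2, ["first_name", "last_name", "city"]))
```

Comparing the whole record as one string does work mechanically, but with Jaro-Winkler the prefix bonus would then only reflect the first field, and the result depends on field order and length. Per-field scoring with weights keeps each column's contribution explicit.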

On Tue, Nov 23, 2010 at 3:41 PM, Pradnya Kulkarni <
kulkarni.pradnya@xxxxxxxxx> wrote:

> see inline
>
> On Tue, Nov 23, 2010 at 3:26 PM, Akilandeswari Ramakrishnan <
> aramakr@xxxxxxxx> wrote:
>
>> Hi Pradnya,
>>
>> I have a question from the 'People deduplicator' point of view.
>>
>> For example, if there are 2 records, akila and Akhila (*these are first
>> names, i.e. the value of one column and not entire records*), I will pass
>> each of these to get the soundex values. The soundex values will be the
>> same (A240) as these are pronounced alike, so this is a dupe suspect.
>>
> *Soundex can be used only for the first and last name; for the rest of the
> fields we can use JW. That's what I think. If anyone has any other inputs
> on this, please reply.*
>
>>
>> Then for the JW algo, should I just pass the 2 strings or the entire 2
>> records?
>>
> *You can read the record and call this method for each column in the DB
> record, with the column values as the input strings. If there is any other
> approach for this, please discuss.*
>
>> jaro_winkler("akila", "akhila")
>>
>> Thanks,
>> Akila
>>
>>
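
For readers who want to try the `jaro_winkler("akila", "akhila")` call discussed above, here is a minimal sketch of the textbook Jaro-Winkler similarity. It is not the code from `s3deduplicator.py`, which is not shown in this thread; the names `jaro_sketch` and `jaro_winkler_sketch` and the default prefix scale of 0.1 are assumptions. It returns a similarity in [0, 1], where 1.0 means identical strings.

```python
def jaro_sketch(s1, s2):
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler_sketch(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    jaro = jaro_sketch(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1.0 - jaro)


if __name__ == "__main__":
    # All five letters of "akila" match in order and the shared prefix is "ak",
    # so the score comes out at roughly 0.956.
    print(jaro_winkler_sketch("akila", "akhila"))
```

The comparison is case-sensitive as written, so callers would probably lowercase both column values first.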
>> On Tue, Nov 23, 2010 at 10:51 AM, Pradnya Kulkarni <
>> kulkarni.pradnya@xxxxxxxxx> wrote:
>>
>>> 1] I have implemented the Jaro-Winkler algorithm
>>> inputs - two strings to be compared
>>> output - distance between the strings, i.e. a decimal value; example -
>>> http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
>>> method name - jaro_winkler(str1, str2)
>>>
>>> 2] For soundex
>>> inputs - input string such as a person's name
>>> output - soundex value
>>> method name - soundex(name)
>>>
>>> To compare two names you can call this function twice and compare the
>>> return values. If the values are the same, then the names are
>>> phonetically similar.
>>>
>>> You can go ahead and write code and call these methods for now.
>>>
>>> Thanks,
>>> Pradnya
>>>
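
The "call it twice and compare the return values" check can be sketched as follows. This is the standard American Soundex, written here as an illustration; it is not the `soundex(name)` from `s3deduplicator.py`, and the names `soundex_sketch` and `sounds_alike` are assumptions.

```python
# Consonant groups of standard American Soundex.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex_sketch(name):
    """Return the 4-character Soundex code of name, or "" if it has no letters."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]


def sounds_alike(name1, name2):
    """Two names are dupe suspects if their Soundex codes are equal."""
    code1 = soundex_sketch(name1)
    return code1 != "" and code1 == soundex_sketch(name2)


if __name__ == "__main__":
    print(soundex_sketch("akila"), soundex_sketch("Akhila"))  # A240 A240
    print(sounds_alike("akila", "Akhila"))  # True
```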
>>>
>>> On Tue, Nov 23, 2010 at 10:43 AM, Akilandeswari Ramakrishnan <
>>> aramakr@xxxxxxxx> wrote:
>>>
>>>> It would be helpful if you could share how your module works after
>>>> your testing is complete,
>>>> in the sense of what input it expects and
>>>> how it will give the output;
>>>> basically the input/output parameters.
>>>>
>>>> Thanks,
>>>> Akila
>>>>
>>>> So that from the controllers we can provide the necessary inputs and
>>>> process the output from your module accordingly.
>>>>
>>>> On Tue, Nov 23, 2010 at 10:16 AM, Pradnya Kulkarni <
>>>> kulkarni.pradnya@xxxxxxxxx> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have created a new file 's3deduplicator.py' in modules and added
>>>>> functions for the algos.
>>>>> Does anyone have an idea about how to call methods from modules, and
>>>>> how to import modules in another file?
>>>>>
>>>>> Let me know, as I want to test the algo code.
>>>>>
>>>>> Thanks,
>>>>> Pradnya
>>>>>
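
On the import question, a plain-Python sketch of the mechanics may help: a module is importable once its directory is on `sys.path`. The example below writes a throwaway `s3deduplicator.py` with one hypothetical helper (`normalise_name`, not a function from this thread) into a temporary directory so that it runs on its own. How the application framework exposes its `modules` folder is not covered here and may differ.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for the real modules directory; the file content is a placeholder.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "s3deduplicator.py"), "w") as handle:
    handle.write(
        "def normalise_name(name):\n"
        "    # Hypothetical helper, used only to demonstrate the import.\n"
        "    return name.strip().lower()\n"
    )

# Make the directory visible to the import system.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# Option 1: import the module and call the function through its name.
import s3deduplicator
print(s3deduplicator.normalise_name("  Akila "))

# Option 2: import just the function.
from s3deduplicator import normalise_name
print(normalise_name("Akhila"))
```

With the real file in place, the same two import forms would give access to `jaro_winkler` and `soundex`.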
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~sahana-s08-de-duplicator
>>>>> Post to     : sahana-s08-de-duplicator@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~sahana-s08-de-duplicator
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>>
>>>>
>>>
>>
>
