
sahana-s08-de-duplicator team mailing list archive

Re: Help needed

 

Probably I can return a single value if you send the records as arrays
of values. That way we can reuse the JW and Soundex implementations.
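A minimal sketch of the "array of values in, single value out" idea, assuming both records list the same fields in the same order. The helper names `record_score` and `phonetic_similarity` are hypothetical, and the comparison functions are passed in so the block runs on its own. In the application they would be the `soundex(name)` and `jaro_winkler(str1, str2)` functions described further down this thread.

```python
from typing import Callable, Optional, Sequence

# A similarity function returns a value in [0, 1] for two field values.
Similarity = Callable[[str, str], float]


def phonetic_similarity(encode: Callable[[str], str]) -> Similarity:
    """Hypothetical helper: wrap a phonetic encoder (e.g. soundex) as a 0/1 similarity."""
    def compare(a: str, b: str) -> float:
        return 1.0 if encode(a) == encode(b) else 0.0
    return compare


def record_score(values_a: Sequence[str], values_b: Sequence[str],
                 comparators: Sequence[Similarity],
                 weights: Optional[Sequence[float]] = None) -> float:
    """Hypothetical helper: collapse per-field similarities into one record score.

    values_a, values_b and comparators are aligned by position: field i of
    both records is compared with comparators[i].
    """
    if not (len(values_a) == len(values_b) == len(comparators)):
        raise ValueError("records and comparators must have the same length")
    if weights is None:
        weights = [1.0] * len(values_a)
    if len(weights) != len(values_a):
        raise ValueError("need one weight per field")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted average of the per-field scores.
    score = sum(w * compare(a, b)
                for a, b, compare, w in zip(values_a, values_b, comparators, weights))
    return score / total
```

For example, names could use `phonetic_similarity(soundex)` while the remaining fields use `jaro_winkler`, so the caller still gets one number per record pair. The weights are an assumption; equal weights are the default.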

On Tue, Nov 23, 2010 at 4:44 PM, Pradnya Kulkarni <
kulkarni.pradnya@xxxxxxxxx> wrote:

> see inline
>
> On Tue, Nov 23, 2010 at 4:22 PM, Akilandeswari Ramakrishnan <
> aramakr@xxxxxxxx> wrote:
>
>> If we pass each field separately, then we will have to deal with multiple
>> match percentages per record, which will be cumbersome.
>>
>> If we pass the entire object to the JW algorithm, will it be possible to
>> treat the whole record as a string and compare?
>>  *(If you send records as inputs, I will be doing the same thing, since
>> the JW algorithm works on two strings at a time. I am not sure how the
>> different tables could be handled that way, as location and person have
>> different columns.)*
>>
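One way to try the "whole record as one string" idea while still handling tables with different columns is to give each table its own ordered column list and join those values into a single normalised string before calling `jaro_winkler`. This is a sketch only: the table and column names in `COLUMNS` and the helpers `normalise` and `record_as_string` are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical per-table column lists: each table decides which of its own
# columns go into the comparison string, and in what order.
COLUMNS = {
    "person": ["first_name", "last_name", "date_of_birth"],
    "location": ["name", "address"],
}


def normalise(value: Optional[object]) -> str:
    """Lower-case a value and collapse runs of whitespace; None becomes ''."""
    if value is None:
        return ""
    return " ".join(str(value).lower().split())


def record_as_string(record: Mapping[str, object], columns: Sequence[str]) -> str:
    """Join the chosen columns with '|', keeping an empty slot for missing
    values so that fields stay in the same order in every string."""
    return "|".join(normalise(record.get(column)) for column in columns)
```

Two person records would then be compared as `jaro_winkler(record_as_string(a, COLUMNS["person"]), record_as_string(b, COLUMNS["person"]))`. One trade-off: Jaro-Winkler's prefix bonus looks at no more than the first four characters, and its matching window grows with string length, so on long joined strings the score mostly reflects the first field and loosely aligned characters. That is an argument for the per-field approach.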
>
>
>> If that is not possible, then we will send only the first name and last
>> name. What do you guys say?
>>
>    *Others, please provide some input on this.*
>
>>
>> On Tue, Nov 23, 2010 at 3:41 PM, Pradnya Kulkarni <
>> kulkarni.pradnya@xxxxxxxxx> wrote:
>>
>>> see inline
>>>
>>> On Tue, Nov 23, 2010 at 3:26 PM, Akilandeswari Ramakrishnan <
>>> aramakr@xxxxxxxx> wrote:
>>>
>>>> Hi Pradnya,
>>>>
>>>> I have a question from the 'People deduplicator' point of view.
>>>>
>>>> For example, if there are 2 records, akila and Akhila (*these are first
>>>> names, i.e. one of the columns, not entire records*), I will pass each
>>>> of these to get the Soundex values. The Soundex values will be the same
>>>> (a240), as these are pronounced alike, so this is a dupe suspect.
>>>>
>>> *Soundex can be used only for the first and last names; for the rest of
>>> the fields we can use JW. That's what I think. If anyone has any other
>>> input on this, please reply.*
>>>
>>>>
>>>> Then for the JW algorithm, should I just pass the 2 strings or the
>>>> entire 2 records?
>>>>
>>> *You can read the record and call this method for each column in the
>>> DB record, with the column values as input strings. If there is any other
>>> approach for this, please discuss.*
>>>
>>> jaro_winkler('akila', 'akhila')
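The per-column approach suggested above (Soundex for first and last names, JW for every other column) can be sketched as a loop over the columns of two DB rows. The function name `column_scores`, the helper `clean`, and the column names in `NAME_COLUMNS` are hypothetical. `soundex` and `jaro_winkler` are passed in as arguments so the block runs standalone; in the application they would come from `s3deduplicator.py`.

```python
from typing import Callable, Dict, Iterable, Mapping, Optional

# Hypothetical name columns; in this thread Soundex is meant only for names.
NAME_COLUMNS = {"first_name", "last_name"}


def clean(value: Optional[object]) -> str:
    """Strip and lower-case a DB value; None becomes ''."""
    return "" if value is None else str(value).strip().lower()


def column_scores(row_a: Mapping[str, object], row_b: Mapping[str, object],
                  columns: Iterable[str],
                  soundex: Callable[[str], str],
                  jaro_winkler: Callable[[str, str], float]) -> Dict[str, float]:
    """Score two DB rows column by column.

    Name columns count as a match (1.0) when their Soundex codes agree;
    every other column gets its Jaro-Winkler value. Columns that are empty
    on either side are left out rather than scored as 0.
    """
    scores: Dict[str, float] = {}
    for column in columns:
        a, b = clean(row_a.get(column)), clean(row_b.get(column))
        if not a or not b:
            continue
        if column in NAME_COLUMNS:
            scores[column] = 1.0 if soundex(a) == soundex(b) else 0.0
        else:
            scores[column] = jaro_winkler(a, b)
    return scores
```

This produces several match values per record pair, which is the concern raised earlier in the thread. If a single percentage is needed, the dictionary can be averaged, for example `sum(scores.values()) / len(scores)` when it is not empty.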
>>>>
>>>>
>>>> Thanks,
>>>> Akila
>>>>
>>>>
>>>> On Tue, Nov 23, 2010 at 10:51 AM, Pradnya Kulkarni <
>>>> kulkarni.pradnya@xxxxxxxxx> wrote:
>>>>
>>>>> 1] I have implemented the Jaro-Winkler algorithm:
>>>>> inputs - two strings to be compared
>>>>> output - the distance between the strings, i.e. a decimal value; for an
>>>>> example see http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
>>>>> method name - jaro_winkler(str1, str2)
>>>>>
>>>>> 2] For Soundex:
>>>>> input - a string such as a person's name
>>>>> output - the Soundex value
>>>>> method name - soundex(name)
>>>>>
>>>>> To compare two names, you can call soundex(name) once for each name and
>>>>> compare the return values. If the values are the same, the names are
>>>>> phonetically similar.
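A sketch of American Soundex matching the `soundex(name)` interface in item 2, assuming ASCII letters only (other characters are dropped). The rules are: keep the first letter, encode the remaining consonants as digits, let vowels separate repeated codes while h and w do not, collapse adjacent equal codes, and pad or truncate to four characters. The thread's own implementation may differ in details such as lower-case output ('a240' rather than 'A240').

```python
import string

# Standard Soundex digit groups; letters not listed (vowels, y, h, w) get no digit.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code, or '' for no letters."""
    letters = [ch for ch in name.lower() if ch in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")  # the first letter's code still blocks repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w are skipped and do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel sets prev to '', so a repeated code after it is kept
    return result.ljust(4, "0")
```

Comparing two names is then `soundex("akila") == soundex("Akhila")`, which is `True` because both encode to `A240`.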
>>>>>
>>>>> You can go ahead, write the code and call these methods for now.
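A minimal sketch of the `jaro_winkler(str1, str2)` interface from item 1. It follows the standard definition on the Wikipedia page cited there: Jaro similarity plus Winkler's prefix bonus, with scaling factor p = 0.1 over at most 4 characters. This sketch returns a similarity in [0, 1] (1.0 means identical). The thread's implementation is described as returning a "distance", which some code defines as 1 - similarity, so check which one you get before choosing a threshold.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters only count as matching within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order,
    # counted in halves.
    half_transpositions = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                half_transpositions += 1
            j += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(str1: str, str2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to 4)."""
    sim = jaro(str1, str2)
    prefix = 0
    for a, b in zip(str1[:max_prefix], str2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)
```

For the example names, `round(jaro_winkler("akila", "akhila"), 3)` gives `0.956`. The comparison is case-sensitive, so callers will probably want to lower-case values first.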
>>>>>
>>>>> Thanks,
>>>>> Pradnya
>>>>>
>>>>>
>>>>> On Tue, Nov 23, 2010 at 10:43 AM, Akilandeswari Ramakrishnan <
>>>>> aramakr@xxxxxxxx> wrote:
>>>>>
>>>>>> It would be helpful if you could share how your module works after
>>>>>> your testing is complete, in the sense of:
>>>>>> what input it expects
>>>>>> how it will give the output
>>>>>> basically, the input/output parameters.
>>>>>>
>>>>>> Thanks,
>>>>>> Akila
>>>>>>
>>>>>> That way, from the controllers, we can provide the necessary inputs
>>>>>> and process the output from your module accordingly.
>>>>>>
>>>>>> On Tue, Nov 23, 2010 at 10:16 AM, Pradnya Kulkarni <
>>>>>> kulkarni.pradnya@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have created a new file, 's3deduplicator.py', in modules and added
>>>>>>> functions for the algorithms.
>>>>>>> Does anyone have an idea about how to call methods from modules, and how
>>>>>>> to import modules in another file?
>>>>>>>
>>>>>>> Let me know, as I want to test the algorithm code.
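On the import question: in plain Python, a file such as `s3deduplicator.py` becomes importable once its folder is on `sys.path`. Its functions can then be called through the module name or imported directly. The sketch below makes no assumptions about Sahana. To stay runnable anywhere, it creates a throwaway folder containing a hypothetical stand-in module, `dedup_demo.py`. If the code runs inside web2py (as Sahana Eden does), web2py sets up the environment, and its documentation describes how an application's `modules/` folder is imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for modules/s3deduplicator.py, written to a temp folder.
modules_dir = Path(tempfile.mkdtemp())
(modules_dir / "dedup_demo.py").write_text(
    "def normalise(name):\n"
    "    return name.strip().lower()\n"
)

# Python looks for modules in the folders listed in sys.path.
sys.path.insert(0, str(modules_dir))
# The file was created after start-up, so clear the import system's caches.
importlib.invalidate_caches()

# Option 1: import the module and call its functions through the module name.
import dedup_demo
print(dedup_demo.normalise("  Akila "))  # akila

# Option 2: import only the function you need.
from dedup_demo import normalise
print(normalise("AKHILA"))  # akhila
```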
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Pradnya
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: https://launchpad.net/~sahana-s08-de-duplicator
>>>>>>> Post to     : sahana-s08-de-duplicator@xxxxxxxxxxxxxxxxxxx
>>>>>>> Unsubscribe : https://launchpad.net/~sahana-s08-de-duplicator
>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
