sahana-s08-de-duplicator team mailing list archive
Message #00008
Re: Help needed
see inline
On Tue, Nov 23, 2010 at 4:22 PM, Akilandeswari Ramakrishnan <
aramakr@xxxxxxxx> wrote:
> If we pass each field separately, then we'll have to deal with multiple
> match percentages per record, which will be cumbersome.
>
> If we pass the entire object to the JW algorithm, will it be possible to
> treat the whole record as a string and compare?
> *(If you send records as inputs, I will be doing the same thing, since the
> JW algorithm works on two strings at a time. I am not sure how the
> different tables could be handled that way, as location and person have
> different columns.)*
>
> If that is not possible, then we'll just send the first name *and last name* only.
> What do you guys say?
>
> *Others, please provide some input on this.*
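One way to avoid juggling several match percentages per record is to fold the per-field scores into a single weighted average. The sketch below is only an illustration under assumptions: `record_score`, `exact_match`, the field names, the sample values, and the weights are all hypothetical and not part of the team's code. `exact_match` is a deliberately crude stand-in; a function such as `jaro_winkler(str1, str2)` could be passed in its place.

```python
def record_score(rec_a, rec_b, weights, similarity):
    """Combine per-field similarity scores into one weighted average in [0, 1].

    rec_a, rec_b -- dicts mapping column name to value (hypothetical layout)
    weights      -- dict mapping column name to its relative importance
    similarity   -- any function taking two strings and returning 0.0..1.0
    """
    total_weight = float(sum(weights.values()))
    if total_weight == 0:
        raise ValueError("at least one field needs a non-zero weight")

    weighted_sum = 0.0
    for field, weight in weights.items():
        # Missing or empty columns are compared as empty strings.
        value_a = rec_a.get(field) or ""
        value_b = rec_b.get(field) or ""
        weighted_sum += weight * similarity(value_a, value_b)
    return weighted_sum / total_weight


def exact_match(a, b):
    """Stand-in similarity: 1.0 when equal ignoring case, otherwise 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


# Made-up sample records, for illustration only.
person_a = {"first_name": "Akila", "last_name": "Kumar"}
person_b = {"first_name": "akila", "last_name": "Kumaar"}
person_weights = {"first_name": 1.0, "last_name": 1.0}

score = record_score(person_a, person_b, person_weights, exact_match)
print(score)
```

Because the weights are keyed by column name, a table with different columns (location, for instance) could use its own weights dictionary without any change to the function.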
>
> On Tue, Nov 23, 2010 at 3:41 PM, Pradnya Kulkarni <
> kulkarni.pradnya@xxxxxxxxx> wrote:
>
>> see inline
>>
>> On Tue, Nov 23, 2010 at 3:26 PM, Akilandeswari Ramakrishnan <
>> aramakr@xxxxxxxx> wrote:
>>
>>> Hi Pradnya,
>>>
>>> I have a question from the 'People deduplicator' point of view.
>>>
>>> For example, if there are 2 records, akila and Akhila (*these are first
>>> names, i.e. values of one column and not entire records*), I'll pass each
>>> of these to get the soundex values. The soundex values will be the same
>>> (a240), as these are pronounced alike, so this is a dupe suspect.
>>>
>> *Soundex can be used only for first and last name; for the rest of
>> the fields we can use JW. That's what I think. If anyone has any other
>> input on this, please reply.*
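The akila/Akhila example can be checked with a small implementation of standard American Soundex. This is a sketch, not the code in 's3deduplicator.py'; it only assumes the same `soundex(name)` signature mentioned in this thread, and it returns the code in upper case (A240 rather than a240).

```python
def soundex(name):
    """Return the 4-character American Soundex code for name ("" if no letters)."""
    # Map each consonant to its Soundex digit; vowels, Y, H and W have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in group:
            codes[letter] = digit

    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    previous = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not break a run of equal codes
        digit = codes.get(ch, "")       # vowels and Y give "" and reset the run
        if digit and digit != previous:
            result += digit
        previous = digit

    # Pad with zeros or truncate to exactly four characters.
    return (result + "000")[:4]


code_a = soundex("akila")
code_b = soundex("Akhila")
print(code_a, code_b, code_a == code_b)
```

Both names produce the same code, which is why the pair is flagged as a dupe suspect.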
>>
>>>
>>> Then for the JW algorithm, should I just pass the 2 strings or the 2
>>> entire records?
>>>
>> *You can read the record and call this method for each column in the DB
>> record, with the column values as input strings. If there is any other
>> approach for this, please discuss.*
>>
>>> jaro_winkler(akila, akhila)
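For reference, a minimal Jaro-Winkler sketch with the same `jaro_winkler(str1, str2)` signature is shown below. It follows the textbook definition (matching window, transposition count, prefix bonus capped at four characters with a scaling factor of 0.1) and is an assumption about behaviour, not the implementation in 's3deduplicator.py'. The helper `jaro` and the parameter `prefix_scale` are names introduced here for illustration.

```python
def jaro(str1, str2):
    """Return the Jaro similarity of two strings (1.0 means identical)."""
    if str1 == str2:
        return 1.0
    len1, len2 = len(str1), len(str2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(str1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and str2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, flag in zip(str1, matched1) if flag]
    seq2 = [c for c, flag in zip(str2, matched2) if flag]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    m = float(matches)
    return (m / len1 + m / len2 + (m - transpositions) / m) / 3.0


def jaro_winkler(str1, str2, prefix_scale=0.1):
    """Return the Jaro-Winkler similarity, boosting pairs with a shared prefix."""
    score = jaro(str1, str2)
    prefix = 0
    for a, b in zip(str1[:4], str2[:4]):   # the prefix bonus uses at most 4 chars
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


score = jaro_winkler("akila", "akhila")
print(round(score, 3))   # about 0.956 for this pair
```

Although the measure is usually called a distance, the value returned here is a similarity: 1.0 means the strings are identical and 0.0 means they share no matching characters. The comparison is case-sensitive, so callers may want to lower-case both inputs first.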
>>>
>>>
>>> thanks
>>> Akila
>>>
>>>
>>> On Tue, Nov 23, 2010 at 10:51 AM, Pradnya Kulkarni <
>>> kulkarni.pradnya@xxxxxxxxx> wrote:
>>>
>>>> 1] I have implemented the Jaro-Winkler algorithm
>>>> inputs - two strings to be compared
>>>> output - distance between the strings, i.e. a decimal value; example -
>>>> http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
>>>> method name - jaro_winkler(str1, str2)
>>>>
>>>> 2] for soundex
>>>> inputs - input string such as person name
>>>> output - soundex value
>>>> method name - soundex(name)
>>>>
>>>> To compare two names you can call this function twice and compare the
>>>> return values. If the values are the same, then the names are
>>>> phonetically similar.
>>>>
>>>> You can go ahead and write code and call these methods for now.
>>>>
>>>> Thanks,
>>>> Pradnya
>>>>
>>>>
>>>> On Tue, Nov 23, 2010 at 10:43 AM, Akilandeswari Ramakrishnan <
>>>> aramakr@xxxxxxxx> wrote:
>>>>
>>>>> It would be helpful if you could share how your module works after
>>>>> your testing is complete,
>>>>> in the sense of what input it expects
>>>>> and how it will give the output;
>>>>> basically, the input/output parameters.
>>>>>
>>>>> Thanks,
>>>>> Akila
>>>>>
>>>>> So that from the controllers we would provide the necessary inputs
>>>>> and process the output from your module accordingly.
>>>>>
>>>>> On Tue, Nov 23, 2010 at 10:16 AM, Pradnya Kulkarni <
>>>>> kulkarni.pradnya@xxxxxxxxx> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have created a new file, 's3deduplicator.py', in modules and added
>>>>>> functions for the algorithms.
>>>>>> Does anyone have an idea about how to call methods from modules, and
>>>>>> how to import modules in another file?
>>>>>>
>>>>>> Let me know, as I want to test the algorithm code.
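In plain Python, a function in another file can be called once that file's directory is on the import path. The sketch below only shows that general mechanism: it writes a throwaway stub named 's3deduplicator.py' to a temporary directory so that it runs anywhere, and the stub body is a placeholder rather than the real algorithm. How the framework used by this project loads files from its modules folder may differ, so treat this as an assumption to verify there.

```python
import os
import sys
import tempfile

# Create a temporary directory containing a stub s3deduplicator.py.
# In the real project the file already exists; this stub is hypothetical.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "s3deduplicator.py"), "w") as handle:
    handle.write(
        "def soundex(name):\n"
        "    # Placeholder body, not the real algorithm.\n"
        "    return 'stub-' + name\n"
    )

# Make the directory importable, then import the module by its file name
# without the .py extension.
sys.path.insert(0, module_dir)
import s3deduplicator

# Call a function through the module object...
result = s3deduplicator.soundex("akila")

# ...or import the function name directly.
from s3deduplicator import soundex
same_result = soundex("akila")

print(result, same_result)
```

For a quick test of the algorithm code, the same two import forms work from an interactive Python session started in the directory that contains the file.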
>>>>>>
>>>>>> Thanks,
>>>>>> Pradnya
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mailing list: https://launchpad.net/~sahana-s08-de-duplicator
>>>>>> Post to : sahana-s08-de-duplicator@xxxxxxxxxxxxxxxxxxx
>>>>>> Unsubscribe : https://launchpad.net/~sahana-s08-de-duplicator
>>>>>> More help : https://help.launchpad.net/ListHelp
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>