commonsense team mailing list archive

Re: [Bug 445125] Re: non-normalized concepts exist

 

[If you're getting this message and don't actually care about the
internals of ConceptNet, let me know -- I suspect our bug traffic is
bugging more people than necessary.]

I agree that normalization should be able to happen without an active
database -- I use that behavior, actually.

Since we have full control over how MBLEM is called, we can maintain a
static list of lemmatization overrides, just as a dict. Unless it turns
out to be very easy to change MBLEM for a special case, the override
approach would also give us an easy way to fix anything we notice in
the future.
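
A minimal sketch of that override idea, under stated assumptions: the
real MBLEM call is not shown in this thread, so `fallback_lemmatize` is
a hypothetical stand-in that just strips a trailing "s", and the names
`LEMMA_OVERRIDES` and `lemmatize` are invented for illustration. Only
the "people" -> "person" entry comes from the discussion.

```python
# Hypothetical static override table; consulted before the lemmatizer.
# Only the "people" -> "person" entry is taken from the thread.
LEMMA_OVERRIDES = {
    "people": "person",
}


def fallback_lemmatize(word):
    """Stand-in for the real MBLEM call (assumption, not its actual API).

    Naively strips a trailing "s" from words longer than three letters.
    """
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def lemmatize(word, lemmatizer=fallback_lemmatize, overrides=LEMMA_OVERRIDES):
    """Return the override for a word if one exists, else ask the lemmatizer.

    No database is involved: the overrides are a plain in-memory dict.
    """
    key = word.lower()
    if key in overrides:
        return overrides[key]
    return lemmatizer(key)


if __name__ == "__main__":
    for w in ["people", "balls", "ball"]:
        print(w, "->", lemmatize(w))
```

Because the lemmatizer is passed in as a parameter, the wrapper can be
exercised with a fake lemmatizer, and new special cases only need a new
dict entry.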

Thanks for working on this. I suspect we'll need to reparse at some
point. Let's unit test the parse process before we do that, though ;)

-Ken


On Thu, Oct 8, 2009 at 3:32 PM, Rob Speer <rspeer@xxxxxxx> wrote:
> In working on fixing this, I've stumbled across a suboptimal decision I
> made a while ago. I wanted the concept "people" to not match its
> normalized form, which MBLEM thinks is still "people", singular. So I
> associated the SurfaceForm "people" with the concept "person" manually,
> and I guess I assumed that we'd be using SurfaceForms to do nl
> normalization. It seemed like a good way to override special cases.
>
> This was probably dumb. We want to be able to do normalization without
> having the database at all. I'm going to try to convince MBLEM that
> "people" is foremost the plural of "person".
>
> --
> non-normalized concepts exist
> https://bugs.launchpad.net/bugs/445125
> You received this bug notification because you are a direct subscriber
> of the bug.
>

-- 
non-normalized concepts exist
https://bugs.launchpad.net/bugs/445125
You received this bug notification because you are a member of
Commonsense Computing, which is the registrant for ConceptNet.

Status in ConceptNet API: New

Bug description:
I noticed that some concepts seem to be not normalized:

>>> Concept.get('balls', 'en')
<Concept: <en: balls>>
>>> Concept.get('ball', 'en')
<Concept: <en: ball>>
>>> Concept.get('balls', 'en').surfaceform_set.all()[0]
<SurfaceForm: balls>
>>> Concept.get('balls', 'en').get_assertions().count()
45

Where'd that come from?
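
One way to hunt for more cases like this, sketched without the database:
given the stored concept texts as plain strings, flag every text that is
not its own normalized form. `find_non_normalized` and `toy_normalize`
are hypothetical names, and `toy_normalize` only stands in for the real
normalizer, which is not shown in this report.

```python
def toy_normalize(text):
    """Toy normalizer used only for this sketch (assumption).

    Applies one manual override, then strips a trailing "s".
    """
    overrides = {"people": "person"}
    if text in overrides:
        return overrides[text]
    if len(text) > 3 and text.endswith("s"):
        return text[:-1]
    return text


def find_non_normalized(concept_texts, normalize):
    """Return (text, normalized) pairs for texts that differ from their normal form."""
    mismatches = []
    for text in concept_texts:
        normalized = normalize(text)
        if normalized != text:
            mismatches.append((text, normalized))
    return mismatches


if __name__ == "__main__":
    for text, normalized in find_non_normalized(
        ["ball", "balls", "people"], toy_normalize
    ):
        print(text, "should be", normalized)
```

Any pair it reports is a candidate for merging the stray concept into
its normalized counterpart.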


