commonsense team mailing list archive

[Bug 445125] Re: non-normalized concepts exist

 

_Lots_ of non-normalized concepts exist:

>>> from csc.conceptnet.models import *
>>> from csc.nl import get_nl
>>> en_nl = get_nl('en')
>>> bad_surfaces = []
>>> for text, normalized in SurfaceForm.objects.filter(language='en').order_by().values_list('text', 'concept__text').iterator():
...     if en_nl.normalize(text) != normalized:
...         bad_surfaces.append(text)
...
>>> len(bad_surfaces)
29955
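
To see which concept each mismatched surface form points at, the same check could record the stored and the expected concept text side by side instead of only the surface text. A minimal sketch, not run against the real database: `find_mismatches` and `toy_normalize` are hypothetical names, `toy_normalize` is only a stand-in for `en_nl.normalize`, and `sample_rows` is made-up data shaped like the `values_list('text', 'concept__text')` result above.

```python
def find_mismatches(rows, normalize):
    """Return (text, stored, expected) for every row whose stored
    concept text differs from normalize(text)."""
    mismatches = []
    for text, stored in rows:
        expected = normalize(text)
        if expected != stored:
            mismatches.append((text, stored, expected))
    return mismatches


def toy_normalize(text):
    # Hypothetical stand-in for en_nl.normalize:
    # lowercase the text and strip one trailing "s".
    text = text.lower()
    return text[:-1] if text.endswith("s") else text


# Made-up rows shaped like values_list('text', 'concept__text').
sample_rows = [("balls", "balls"), ("ball", "ball"), ("Dogs", "dog")]
bad = find_mismatches(sample_rows, toy_normalize)
```

With the real data, passing the iterator from the query above as `rows` and `en_nl.normalize` as `normalize` should give the same count as `len(bad_surfaces)`, plus the concept each bad surface form is attached to.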

-- 
non-normalized concepts exist
https://bugs.launchpad.net/bugs/445125
You received this bug notification because you are a member of
Commonsense Computing, which is the registrant for ConceptNet.

Status in ConceptNet API: New

Bug description:
I noticed that some concepts seem not to be normalized:

>>> Concept.get('balls', 'en')
<Concept: <en: balls>>
>>> Concept.get('ball', 'en')
<Concept: <en: ball>>
>>> Concept.get('balls', 'en').surfaceform_set.all()[0]
<SurfaceForm: balls>
>>> Concept.get('balls', 'en').get_assertions().count()
45

Where'd that come from?


