commonsense team mailing list archive
Message #00111
[Bug 445125] Re: non-normalized concepts exist
_Lots_ of non-normalized concepts exist:
>>> from csc.conceptnet.models import *
>>> from csc.nl import get_nl
>>> en_nl = get_nl('en')
>>> bad_surfaces = []
>>> for text, normalized in SurfaceForm.objects.filter(language='en').order_by().values_list('text', 'concept__text').iterator():
...     if en_nl.normalize(text) != normalized:
...         bad_surfaces.append(text)
...
>>> len(bad_surfaces)
29955
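For anyone who wants to try the same check without a ConceptNet database at hand, here is a minimal standalone sketch of the comparison. The names find_unnormalized and toy_normalize are hypothetical, and toy_normalize is only a crude stand-in for en_nl.normalize (it lowercases and drops a trailing "s"), so treat the sample data as an illustration rather than real output.

```python
def find_unnormalized(pairs, normalize):
    """Return surface texts whose normalized form differs from the stored concept text.

    pairs is an iterable of (surface_text, stored_concept_text) tuples, the same
    shape that values_list('text', 'concept__text') yields in the session above.
    """
    return [text for text, stored in pairs if normalize(text) != stored]


def toy_normalize(text):
    # Hypothetical stand-in for en_nl.normalize (assumption, not the csc API):
    # lowercase each word and strip one trailing "s" from words longer than 3 letters.
    words = text.lower().split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


# Made-up sample rows for illustration only.
pairs = [
    ("balls", "balls"),  # concept text stored without normalization
    ("ball", "ball"),    # already consistent
    ("Dogs", "dog"),     # surface form differs, but the concept text is normalized
]

bad = find_unnormalized(pairs, toy_normalize)
print(bad)
```

Passing the normalizer in as an argument keeps the check independent of any particular language module, so the real en_nl.normalize could presumably be dropped in unchanged.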
--
non-normalized concepts exist
https://bugs.launchpad.net/bugs/445125
You received this bug notification because you are a member of
Commonsense Computing, which is the registrant for ConceptNet.
Status in ConceptNet API: New
Bug description:
I noticed that some concepts do not seem to be normalized:
>>> Concept.get('balls', 'en')
<Concept: <en: balls>>
>>> Concept.get('ball', 'en')
<Concept: <en: ball>>
>>> Concept.get('balls', 'en').surfaceform_set.all()[0]
<SurfaceForm: balls>
>>> Concept.get('balls', 'en').get_assertions().count()
45
Where'd that come from?