Journal for novamente - Novamente work



Really start on the most direct part of cerego work.
Started: 12/17/2007 14:31  Last Active: 12/17/2007 21:37


Problems:
FeatureNode doesn't seem to store the original word: plurals are made singular, and future/past tenses are made infinitive. Thus LinkableView.getWordString() returns the infinitive (or singular) form, not the word as it appeared in the sentence.

Reading this code, I get the impression that the original authors didn't know Lisp. There seems to be a fair amount of effort expended in re-inventing basic Lisp concepts, but with unusual naming conventions. For example, "FeatureNode" seems to simply be a key-value pair or a frames-n-slots node. A cleaner design would have eliminated the need for forceFeatures() and forceValues(), and the need to throw exceptions whenever a node is accessed incorrectly.
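
A rough sketch of the plainer key-value design I have in mind. Everything here is hypothetical: SimpleNode, put(), lookup() and the slot names "original" and "lemma" are made up for illustration and are not the RelEx FeatureNode API. The point is only that a node with named slots can answer a bad lookup with null instead of throwing, and that the surface word can sit next to the normalized one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the RelEx FeatureNode class.
// A node is either a leaf holding a string, or a bag of named slots.
final class SimpleNode {
    private final String value;  // non-null only for leaf nodes
    private final Map<String, SimpleNode> slots =
        new LinkedHashMap<String, SimpleNode>();

    SimpleNode() { this.value = null; }
    SimpleNode(String value) { this.value = value; }

    boolean isLeaf() { return value != null; }

    // Returns null when the slot is absent; nothing is thrown.
    SimpleNode get(String key) { return slots.get(key); }

    // Both put() variants return this node so calls can be chained.
    SimpleNode put(String key, SimpleNode child) {
        slots.put(key, child);
        return this;
    }

    SimpleNode put(String key, String leaf) {
        return put(key, new SimpleNode(leaf));
    }

    // Walk a path of slot names; returns the leaf string at the end,
    // or null if any step is missing or the end is not a leaf.
    String lookup(String... path) {
        SimpleNode n = this;
        for (String key : path) {
            if (n == null) return null;
            n = n.get(key);
        }
        return n == null ? null : n.value;
    }
}
```

With this shape, a word node could carry both forms, e.g. new SimpleNode().put("original", "limits").put("lemma", "limit"), and a caller that wants the word as written would ask for the "original" slot.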

The design of FeatureNode is certainly not orthogonal; it's mixing multiple concepts in an inelegant way. I'm sort of disappointed: this should be generic code, and not relex-specific.

Notable in their absence: foreach functions -- there seems to be a lot of use of iterators in the relex code, and iterators are known to be non-thread-safe. As a general rule, foreach is a better programming paradigm!
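
What I mean by a foreach function, sketched in Java. The names Visitor, WordList and forEach are hypothetical and do not exist in relex. The container walks its own elements and calls back into user code, so the locking lives in one place instead of in every loop that grabs an iterator. This assumes the callback does not try to modify the list it is visiting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; not part of relex.
interface Visitor<T> {
    void visit(T item);
}

// Hypothetical container that owns its own traversal.
final class WordList {
    private final List<String> words = new ArrayList<String>();

    synchronized void add(String w) {
        words.add(w);
    }

    // The lock is held for the whole walk, so another thread cannot
    // add an element in the middle of a half-finished traversal.
    synchronized void forEach(Visitor<String> v) {
        for (String w : words) {
            v.visit(w);
        }
    }
}
```

Callers hand in a Visitor and never see an iterator at all.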

Fixing 51 compile-time warnings. Almost all of these involve iterators, which shouldn't be used anyway ...
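
For the record, the kind of change involved, assuming the warnings are the usual raw-type complaints about untyped collections and iterators (the journal does not record the exact messages). LoopStyles and both method names are made up for this sketch. Note that the Java for-each loop still uses an Iterator under the hood; it only removes the raw type and the cast from the source.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical example class; not taken from the relex sources.
class LoopStyles {
    // Old style: raw List and raw Iterator, with a cast on every element.
    // Compilers and IDEs commonly flag raw types like these.
    static int totalLengthRaw(List words) {
        int total = 0;
        for (Iterator it = words.iterator(); it.hasNext(); ) {
            String w = (String) it.next();
            total += w.length();
        }
        return total;
    }

    // Typed collection plus for-each: no raw types, no cast.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }
}
```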

Maybe I'm being too negative; clearly there's a lot of helper-class work surrounding FeatureNode, which no generic class would ever have. Still, it's a shame that the helper classes are built on a flawed foundation.



Read about UIMA
Started: 12/17/2007 11:38  Last Active: 12/17/2007 14:31


Read about UIMA and about the "analysis engine"


Raw vs. digested output
Started: 12/17/2007 10:30  Last Active: 12/17/2007 11:38


Murilo:

The first set of results are the dependency relations (the final output of Relex). The second set of results you sent is what we call "raw relex output" - a graph of FeatureNodes produced by Relex from where we extract the dependency relations.

You can change the boolean constants in the class RelexEngine to decide what you want from the command-line version.

parse.getRawRelexOutput()
Ah ha!
uima/comp/text/ae/RelexInfoAnnotator.java:              parse.setRawRelexOutput(pSent.printZHeads());
OK, so that's where the raw output comes from. Where does the digested output come from?


Orienting
Started: 12/17/2007 7:53  Last Active: 12/17/2007 10:30


Crazy broken parses:
Physical limits to transistor miniaturization and performance are being approached.
====
Parse 1 of 3
====

[head [name <>
       tense <>
       links [_obj $0[name <>
                      links [to [name <>]]
                      noun_number <>]]]
 background [name <>
             links [_subj-a $0]]]

======



    +-----------------------------------------------------Xp----------------------------------------------------+
    +---------Wd---------+-----------------------------Spx-----------------------------+                        |
    |          +----A----+--Mp-+----------------------Jp---------------------+         +-Pg*b-+----Pv----+      |
    |          |         |     |                                             |         |      |          |      |
LEFT-WALL physical.a limits.n to transistor.n miniaturization[?].n and performance.n are.v being.v approached.v . 


Physical limits to transistor miniaturization and performance are being approached.
====
Parse 2 of 3
====

[head [name <>
       tense <>
       links [_obj $0[name <>
                      noun_number <>]]]
 background [name <>
             links [_subj-a $0]]]

======



    +-----------------------------------------------------Xp----------------------------------------------------+
    +-----------------------------------Wd-----------------------------------+                                  |
    |          +------------------------------A------------------------------+---Spx---+-Pg*b-+----Pv----+      |
    |          |                                                             |         |      |          |      |
LEFT-WALL physical.a limits.n to transistor.n miniaturization[?].n and performance.n are.v being.v approached.v . 


Physical limits to transistor miniaturization and performance are being approached.
====
Parse 3 of 3
====

[head [name <>
       tense <>
       links [_obj [name <>
                    noun_number <>]]]]

======



    +-----------------------------------------------------Xp----------------------------------------------------+
    +-----------------------------------Wd-----------------------------------+---Spx---+-Pg*b-+----Pv----+      |
    |                                                                        |         |      |          |      |
LEFT-WALL physical.a limits.n to transistor.n miniaturization[?].n and performance.n are.v being.v approached.v . 

