SIMPLE Editor Technical Documentation
By Preben Wik, 15 June 2003
This is an overview of some technical issues concerning the creation of a SIMPLE
Database Editor. For information on the editor (manual) please see: Manual.html
Predicative
I feel the most unclear part of the SIMPLE structure is the "Predicate
representation". I have not been able to incorporate it into the editor
as well as I would have liked. As of today it is possible to replace the predicate
of a Semu with an already existing one, but it is not possible to make new ones.
There are several reasons for this. First of all, the creation of Predicates
is quite complicated, and I have not found an easy and intuitive way to make
a GUI for doing it. <To explain the structure: These are the sgml components of the predicate.>
We could make GUI entries for all this information, but then who would use them?
Secondly, I feel that perhaps using the existing predicates is sufficient. The
Danes have made a new predicate for almost every verb, and I suspect that is
a mistake. There are generic predicates, for example
"PRED2hum_food_CRH_1" (divalent, human subject, food object) with:
arg1 = Human, semanticrolel="Role_ProtoAgent", status="CHECK"
arg2 = ArtifactFood, semanticrolel="Role_ProtoPatient", status="DEFAULTCHECK".
Then there are specialised predicates such as
"PREDbage_CCS_1" (bake), which apart from the title in the pred_id
contains exactly the same information. Details: 2PredsSameInfo.txt
I do not understand the reason for doing this. If we were to
make a new Semu "grille" (BBQ), it would make no sense to make a new
"PREDgrille_CCS_1", and it would look funny to give it the predicate
"PREDbage_CCS_1", but it would work just fine to give it the predicate
"PRED2hum_food_CRH_1".
If that is the case, perhaps some cleaning up of the existing predicates is
needed as well?
These and other issues require a decision by someone other than
me.
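If a clean-up of the predicates is wanted, a first step could be to list the predicates that differ only in their pred_id. The following is a minimal sketch in Python (not the Perl used for the database scripts). The in-memory layout, a pred_id mapped to one tuple per argument, is an assumption made for illustration and not the actual table structure.

```python
from collections import defaultdict

# Hypothetical in-memory form of the predicate data: pred_id -> one tuple
# (selectional restriction, semantic role, status) per argument.
predicates = {
    "PRED2hum_food_CRH_1": [
        ("Human", "Role_ProtoAgent", "CHECK"),
        ("ArtifactFood", "Role_ProtoPatient", "DEFAULTCHECK"),
    ],
    "PREDbage_CCS_1": [
        ("Human", "Role_ProtoAgent", "CHECK"),
        ("ArtifactFood", "Role_ProtoPatient", "DEFAULTCHECK"),
    ],
}


def group_identical_predicates(preds):
    """Return lists of pred_ids whose argument information is identical."""
    by_signature = defaultdict(list)
    for pred_id, args in preds.items():
        # The tuple of argument tuples acts as a hashable "signature".
        by_signature[tuple(args)].append(pred_id)
    return [sorted(ids) for ids in by_signature.values() if len(ids) > 1]


duplicates = group_identical_predicates(predicates)
print(duplicates)
```

Each group in the result is a candidate for being merged into one generic predicate, but that remains a decision for a lexicographer.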
The selectional restriction "none" is coded in the data on the surface but not all the way through.
Hence, the traversal of elements that creates the data structure is broken,
and no arguments show up.
Simply leaving it empty is not satisfactory, because an empty slot in
the Arg2 field can be read as saying: "the verb does not take an Arg2".
There are also other args without an informargl. They are all treated as if
their restriction is "none" (they exist in the Predicate table and
the argument table but not in the informarg table). I have compiled a list of
them, which is called "Args With missing informargl.txt".
The solution used has been to insert an entry "informargl= ArgNone"
in the sgml-files for both verbs and nouns:
<InformArg
id="ArgNone"
comment="trick for parsing into a database structure"
status=""
weightvalsemfeaturel="WVSFTemplateNonePROT">
and, when parsing the sgml files (InsertPredicateNoun1.pl, InsertPredicateVerb1.pl), to check:
if (! $informargl) {
    # no informargl on this argument: fall back to the dummy entry
    push @informargl, "ArgNone";
}
This is perhaps a mistake though. I don't know what the argument structure should
be for the args contained in "Args With missing informargl.txt". Perhaps they should be something else?
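The fallback can also record which arguments received it, so that the list of args with a missing informargl can be regenerated after each import. This is a minimal Python sketch, not the code in the Insert*.pl scripts. It assumes the informargl attribute is a whitespace-separated string of ids, and the arg_ids and the InformArg id in the example are made up.

```python
ARG_NONE = "ArgNone"  # id of the dummy InformArg entry described above


def resolve_informargs(args):
    """Map each arg_id to its InformArg ids, falling back to ArgNone.

    `args` maps arg_id -> informargl attribute value (assumed to be a
    whitespace-separated string, possibly empty or None).
    Returns (resolved, missing); `missing` lists the arg_ids that got
    the ArgNone fallback.
    """
    resolved, missing = {}, []
    for arg_id, informargl in args.items():
        ids = (informargl or "").split()
        if not ids:
            ids = [ARG_NONE]
            missing.append(arg_id)
        resolved[arg_id] = ids
    return resolved, missing


# Hypothetical input: the second argument has no informargl at all.
resolved, missing = resolve_informargs({
    "ARG_example_1": "InformArg_example",
    "ARG_example_2": None,
})
print(missing)
```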
The only way I have found to tell which argument is ARG1 and which is ARG2
etc. is to look at the arg_id.
There is a pattern that I thought was consistent and that I used to extract
the ArgType. For example, ARG1PREDagent_REA_1 says that its argtype is arg1 and
its semantic type is agent.
However, some ARGs do not follow the pattern of Arg_id=XXXPREDxxxx. For example:
ARG2ændring_CAC_1.
I missed that in the creation of the DB, and hence a few of the items in the
argument table do not have an arg_type or have a funny arg_type.
Some have an extra P in the arg_id, e.g. ARG2PPREDhum_ACT_1 (misspelling?).
Some have unknown types like
"ATR" (the attributive object of the predicate): ATRPREDnone_SPE_1
"ASSOC" (the associate argument of the predicate)
"APP" (the appositional complement of the predicate).
arg_ASSOCPREDhuman_CNV_1 makes argtype: "arg_ASSOC",
ARG2EPREDmoney_TRA_1 makes argtype: "ARG2E"
etc.
I don't know how to treat these.
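One way to avoid storing a funny arg_type is to accept only arg_ids that match the regular pattern and to collect the rest for manual review. This is a minimal Python sketch. The regular expression is an assumption about what "regular" means (ARG plus one digit, then PRED, then the semantic type up to the first underscore) and is not the expression used when the database was built.

```python
import re

# Assumed regular shape: ARG<digit> + "PRED" + <semantic type> + "_" + rest
REGULAR_ARG_ID = re.compile(r"^(ARG\d)PRED([^_]+)_")


def parse_arg_id(arg_id):
    """Return (arg_type, semantic_type) for a regular arg_id, else None."""
    match = REGULAR_ARG_ID.match(arg_id)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


arg_ids = [
    "ARG1PREDagent_REA_1",
    "ARG2ændring_CAC_1",         # no PRED part
    "ARG2PPREDhum_ACT_1",        # extra P
    "ATRPREDnone_SPE_1",         # ATR instead of ARGn
    "arg_ASSOCPREDhuman_CNV_1",  # ASSOC
    "ARG2EPREDmoney_TRA_1",      # extra E
]

# Everything that does not parse is set aside instead of being stored
# with a guessed arg_type.
irregular = [a for a in arg_ids if parse_arg_id(a) is None]
print(parse_arg_id("ARG1PREDagent_REA_1"))
print(irregular)
```

How the irregular ids should finally be typed is still an open question. The sketch only keeps them out of the arg_type column until that is decided.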
CHECK, DEFAULTCHECK and SHADOW are items of the InformArg element that say whether the argument is obligatory, default or implicit. This information, as well as the Semantic role information, is not shown in the editor. If someone finds it useful, it could be added.
Qualia:
Searching in the current implementation requires the user to edit the string
selected from the menubutton: in most cases, add SR to the front and remove
any underscores from the word.
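The manual edit described above could be done by the editor itself. This is a minimal Python sketch of the conversion. The function name and the example menu label are assumptions, and since the rule only holds "in most cases", exceptions would still need to be handled by hand.

```python
def menu_label_to_search_term(label):
    """Turn a menubutton label into the string stored in the database:
    prefix "SR" and drop all underscores."""
    return "SR" + label.replace("_", "")


# Hypothetical menu label; the resulting id does occur in the sgml data.
print(menu_label_to_search_term("Artifactual_agentive"))
```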
Polysemy inconsistency:
The Polysemy list is incomplete compared with actual data found in the Sgml
structure.
I have found some, but not many, in other qualia relations (but there might be several
there as well), e.g. SREntail = constitutive is found in the sgml file but not in
the guidelines.
All qualia relation names, as well as the constitutive features, should be extracted
and compared with the lists taken from the FinalGuidelines.doc. If additional
items are found in the sgml data they should perhaps be added to the list?
Some qualia relations have been given the Type "unknown!!" and are
therefore not showing up in the database.
In the Sgml files the qualia relations are coded as <RSemU> elements. For
example:
<RSemU
id="SRArtifactualagentive"
naming="Artifactualagentive"
example=" "
comment="Formal node in the hierarchy"
isal="SRAgentive"
type="PARADIGMATIC">
This says that "SRArtifactualagentive" is of the type "SRAgentive"
(from the isal=). Although I have not found it mentioned in the guidelines, there
seems to be a hierarchy in some of these relations as well. For example, "SRRelatedto"
is of type "SRArtifactualagentive":
<RSemU
id="SRRelatedto"
naming="Relatedto"
example=""
comment=""
isal="SRArtifactualagentive"
type="PARADIGMATIC">
While parsing the sgml-files to create the database structure, the types have been identified using a recursive procedure (sub isPartOf) in the "CreateRwvSemuNounQuery.pl" and "CreateRwvSemuVerbQuery1.pl" files. If it has not found a top item, "unknown!!" has been inserted. (id="SRFormal" has an isal="SRTop", and so do Telic, Agentive and so on.)
It turns out that the <RSemU> elements with
id="SRHasasproperty", id="SRMeasuredby", id="SRMetaphor",
id="SRSynonym" and id="SRQuantifies" do not have an isal=
entry!
Some 50 entries in the database must be edited manually for these relations
to show up. (I suspect this would affect the people using the sgml data directly
as well.)
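The recursive lookup can be sketched as follows, in Python rather than the Perl of sub isPartOf. It assumes that the "type" of a relation is the ancestor sitting directly under SRTop. The isal table below holds only the entries quoted in this note, and the guard against circular isal chains is an addition that the original procedure may not have.

```python
UNKNOWN = "unknown!!"
TOP = "SRTop"

# isal links quoted in this note; None means "no isal= entry in the sgml".
isal = {
    "SRFormal": "SRTop",
    "SRAgentive": "SRTop",
    "SRArtifactualagentive": "SRAgentive",
    "SRRelatedto": "SRArtifactualagentive",
    "SRSynonym": None,
}


def qualia_type(rel_id, isal, seen=None):
    """Climb the isal chain and return the relation directly under SRTop,
    or "unknown!!" if the chain breaks (or loops) before reaching it."""
    seen = set() if seen is None else seen
    if rel_id in seen:          # circular isal chain
        return UNKNOWN
    seen.add(rel_id)
    parent = isal.get(rel_id)
    if parent is None:          # missing isal= entry
        return UNKNOWN
    if parent == TOP:
        return rel_id
    return qualia_type(parent, isal, seen)


# Relations that would need manual editing in the database.
unresolved = sorted(r for r in isal if qualia_type(r, isal) == UNKNOWN)
print(qualia_type("SRRelatedto", isal), unresolved)
```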
Double Semu_id
Although a Semu_id is supposed to be a unique identifier,
this is not always the case. I have compiled a list of Danish doubles: DoubleDanishSemu_Ids.txt.
In addition to this there are several (I don't know how many) double Norwegian
Semu_ids.
In the initial Sgml files, a Norwegian translation and example were added inside
the existing Danish Semu, like this:
<SemU
id="USEM_V_bevæge_sig_MOV_1"
idN=""
naming="bevæge"
namingN="bevege"
example="mens vi venter - stort set forgæves - på , at poeten
skal bevæge sig hen for at åbne vinduet "
exampleN="Kartha, som er tidligere jagerflyger, sa at det kasakhstanske
flyet ser ut til å ha beveget seg vekk fra den oppgitte kursen"
comment="full BC 201043075 BSP"
commentN=""
....>
In the database implementation the two were to be separated (but still
linked together with the fields LinkNorsk and LinkDansk), and a Semu_id was created
automatically using the algorithm "take the original Danish Semu_id, remove
the Danish naming part, and replace it with the Norwegian naming". I discovered
later that this had the unexpected side effect of creating double Semu_ids in
some cases: sometimes two Danish Semus have been given the same Norwegian translation.
For example, Danish id="USEM_N_Atlanten_3DL_1" and id="USEM_N_Atlanterhavet_3DL_1"
were both translated into "Atlanterhavet" and thus given the Norwegian
Semu_id "N_USEM_N_Atlanterhavet_3DL_1". The same goes for Danish Tyveknægt
and Tyv, Datamaskin, Svømmebaseng, hav etc. This should be sorted out,
as it creates some havoc in the data structure (doubles of qualia etc.).
Ideally, I feel the Semu_id should perhaps be deprecated and replaced with a proper auto-incremented integer ID anyway.
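The collision can be reproduced, and the affected ids listed, with a few lines. This is a minimal Python sketch of the id algorithm as described above, not the original conversion script. How the naming part was actually located in the id is an assumption (a plain string replacement), and the Danish naming of the second entry is assumed to be "Atlanterhavet".

```python
from collections import defaultdict


def norwegian_semu_id(danish_id, danish_naming, norwegian_naming):
    """Replace the Danish naming part of the id and add the N_ prefix."""
    return "N_" + danish_id.replace(danish_naming, norwegian_naming, 1)


def find_collisions(entries):
    """entries: iterable of (danish_id, danish_naming, norwegian_naming).
    Returns {generated_id: [danish_ids]} for every id generated twice."""
    by_new_id = defaultdict(list)
    for danish_id, danish_naming, norwegian_naming in entries:
        new_id = norwegian_semu_id(danish_id, danish_naming, norwegian_naming)
        by_new_id[new_id].append(danish_id)
    return {new_id: ids for new_id, ids in by_new_id.items() if len(ids) > 1}


entries = [
    ("USEM_N_Atlanten_3DL_1", "Atlanten", "Atlanterhavet"),
    ("USEM_N_Atlanterhavet_3DL_1", "Atlanterhavet", "Atlanterhavet"),
]
collisions = find_collisions(entries)
print(collisions)
```

Running such a check before the ids are written would have flagged the doubles at creation time. With an auto-incremented integer key the problem would not arise at all.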
Norwegian Dummies
Dummies are Semus with incomplete or no semantic information in them. They are created to have something for the qualia relations to point to if they refer to a non-existent Semu. During the creation of a network there will always be an outer rim that points to nothing: a dead end, or a dummy. Say for example you are creating the Semu "Hotel" and, when filling out the qualia information "Isa <Building>", find that Building is not yet in the dictionary. Instead of first making a complete Semu "Building", and finding when you come to its "Isa" that you must fill out yet another Semu "Location" etc., the solution used has been to make a dummy with a name and possibly some more information, but with no pointers further.
Norwegian dummies have been added (translated) automatically, although many
did not have a Norwegian translation. This means they are clones of the Danish
dummies, with a Danish Naming, but with the language tag and Semu_id changed
(given the prefix ND_). The reason for doing this is to try to get a complete
and closed Norwegian semantic network first, and then let someone at a later
stage change name and add information to the dummies.
Semus With Wrong Word Class
The Wordclass field was generated automatically, from the naive presupposition that any semu in the verb_file.sgml was a verb etc. This is not true however, and several semus (particularly dummies) are labeled with the wrong wordclass.
Wrong Linking
Although an attempt has been made to relink all Norwegian qualia relations that initially pointed to Danish Semus, this has only been successful in the cases where a Norwegian translation has been made. Hence a number of links are still pointing from the Norwegian database over to the Danish. Look at for example: slakter - isa "person", Pattegris - isa "gris".
I have seen some examples of qualia relations that are looping. For example:
dataskjerm isa skjerm - isa gjenstand isa enhet isa gjenstand...isa enhet...
Or: Bygning isa bygning... Perhaps a proc that checks for loops would be good
to straighten things up?
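Such a loop check could look roughly like this. It is a minimal Python sketch that assumes, for simplicity, at most one isa target per Semu; a real Semu may have several, in which case every branch would have to be followed. The dictionary holds only the examples mentioned above.

```python
def find_isa_loop(start, isa):
    """Follow isa links from `start`. Return the looping part of the
    chain as a list, or None if the chain ends without repeating."""
    path = []
    position = {}  # Semu -> index in path
    node = start
    while node is not None:
        if node in position:
            return path[position[node]:]
        position[node] = len(path)
        path.append(node)
        node = isa.get(node)  # None when there is no further isa link
    return None


isa = {
    "dataskjerm": "skjerm",
    "skjerm": "gjenstand",
    "gjenstand": "enhet",
    "enhet": "gjenstand",
    "bygning": "bygning",
    "slakter": "person",
}

print(find_isa_loop("dataskjerm", isa))
print(find_isa_loop("bygning", isa))
print(find_isa_loop("slakter", isa))
```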
Various Yet To Do: (Comm: most has been done as of dec 03)
This list is of course incomplete, but I will mention here some of the things that I see would improve the Editor and that there has not been time to do. It is not a top-to-bottom prioritized list, but the things at the very top are things that I feel need to be done in order to have a useful editor.
A big job is to translate the remaining 4000 Semus from Danish to Norwegian. A way to get all Danish Semus that are not yet translated into the results list would help in this process. Suggestions: collect all Semus that have no LinkNorsk and Language = Dansk, perform automatic cloning to Norwegian and add a statement "Dummy" in the comments field. This way a lexicographer only needs to change the Naming field in the existing Semu. Alternatively, add a button in the Search Tab that appends the statement "AND that has no linkNorsk" to the SQL query. That way all the search capabilities can still be applied, with the additional button added for this temporary job.
Bokmålsordboka (BO). There are many Semus that only have one option in the bokmåls definition table. They could be inserted automatically. Small job for a programmer, big job for a lexicographer?
What about the semus that are not mentioned in BO?
There should be a way to manually insert word definitions. But what then
about the links to BO?
Suggestion: create an extra field in the Bokmåls table, "DEFINISJONSDEL",
that is blank (NULL) unless the definition is created for SIMPLE. Perhaps
also a field for why. I can see two reasons: the word does not exist in
Bokmålsordboka, or none of the definitions of that word fits
the word sense described in the SIMPLE lexicon.
"sub populate" is not complete yet. "%" on arguments doesn't work, for example. To make things worse, the list of arguments is only available from the "edit pred.rep dialog", which is awkward. Some other fields in the Search Tab are not searchable yet (date, linkNorsk, LinkDansk). Tedious (but easy) job to do.
There is no way to edit the LinkNorsk elements! Must be available.
The "Sort" procedure for the results list is not done. By pressing the buttons in the header of the list the items in the list should be sorted by that category.
Update changes in the results_list. If you change for example Wordclass on a Semu, the change will not appear in the results list until the next time the Semu appears there. SaveSemu must delete the item from the resultsList and then insert it again (preferably on the same row).
Qualia edit: a dummy made with MakeNewDummy should automatically show up in the search list and be selected, so that its Semu_id appears in the Entry field, ready for the user to press "Add". (That is the most likely reason why someone would want to make a new dummy Semu.) At the moment one must make a new search for it to show up.
Delete Semu: "Foreign key Delete check". When you delete a Semu it could be that some other Semu has a link to the one you are deleting. There is no check or protection against this at the moment. I think all it takes is the same thing as what the "ShowAllLinksToThis" proc is doing: (search the RWeightValSemu table for the Semu_id in the Target field).
Add the missing items in the qualia menubuttons (see the qualia note)
Remove all the "SR" in front of the Semr relations in the RWeightValSemu Table? (see the qualia note)
Export list/ Save list as file: does not give you an option to decide which fields should be part of the list.
Bind stuff: There is quite a lot of <Bind> keyboard shortcuts that could be improved. Tab between NoteBook pages, focusing on entries/menubuttons, opening of menubuttons, etc.
Search for "NOT xxx" possibility in the search Tab would be nice...
Cascade MenuButtons should auto-popup on the same entry that is written in the entry field. A lot of tedious opening up of menubutton pages could be saved.
Accelerator menubuttons. The numbers to the right of the names in the cascading menubuttons show the place the item holds in the hierarchy. The idea was to be able to use them as keyboard shortcuts for fast access to common places in the menubutton structure. Never got around to making it work.
Have the anchor (gray line) of selected item in the results list remain selected after "Enter" is hit (to open qualia edit window etc.)
PredDialog: To be able to make new predicates, and get more information from the pred.Rep/Selectional restrictions. There is more in the sgml than what is visible in the editor, e.g. Semantic role and Check/Defaultcheck info. (see the predicate note)
Export back to sgml. There is currently no way to get the data back out in sgml form again.
<bind> Alt+back-arrow, like in a Web browser: go back one step, to the last edited Semu in the results list. It often happens that you are editing something, find a link you need to do something with, and then wish to return to what you were editing before. Now you're lost.
Robustness: If packages are not installed, there is no warning; things just won't work.
Hourglass (busy signal) when it's busy. It is sometimes hard to tell in a search whether the machine is still working or is ready and found nothing.
Make table structure of content files (appendixA-F) for menubuttons, rework the menubuttons?
There is no List for UnificationPath, which means there is no way to edit the UnificationPath (but who will?). I believe it is possible to extract a list from the back part of the templates.txt file that makes the type list. But to be honest I'm not really sure what it should look like.
Semu_id should end up deprecated, with links to an Id instead. Faster to search, automatically incremented.
Log table for Update, delete and create Semu changes being made in the database.
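Two of the items above, finding the Danish Semus without a LinkNorsk and the "Foreign key Delete check", come down to simple queries. The following is a minimal sketch using Python and an in-memory SQLite database, not the editor's own code. The Semu table name, the column layout and the example rows are assumptions; only the RWeightValSemu table, its Target field and the LinkNorsk and Language fields are taken from this note.

```python
import sqlite3


def untranslated_danish(conn):
    """Semu_ids of Danish Semus that have no LinkNorsk yet."""
    rows = conn.execute(
        "SELECT Semu_id FROM Semu "
        "WHERE Language = 'Dansk' AND (LinkNorsk IS NULL OR LinkNorsk = '') "
        "ORDER BY Semu_id"
    )
    return [row[0] for row in rows]


def links_to(conn, semu_id):
    """Semu_ids whose qualia relations have `semu_id` in the Target field."""
    rows = conn.execute(
        "SELECT Semu_id FROM RWeightValSemu WHERE Target = ? ORDER BY Semu_id",
        (semu_id,),
    )
    return [row[0] for row in rows]


def delete_semu(conn, semu_id):
    """Delete a Semu only if no other Semu links to it."""
    referrers = links_to(conn, semu_id)
    if referrers:
        raise ValueError(f"{semu_id} is still a target of: {referrers}")
    conn.execute("DELETE FROM Semu WHERE Semu_id = ?", (semu_id,))


# Made-up schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Semu (Semu_id TEXT PRIMARY KEY, Language TEXT, LinkNorsk TEXT);
    CREATE TABLE RWeightValSemu (Semu_id TEXT, Semr TEXT, Target TEXT);
    INSERT INTO Semu VALUES ('da_hotel', 'Dansk', NULL);
    INSERT INTO Semu VALUES ('da_bygning', 'Dansk', 'no_bygning');
    INSERT INTO Semu VALUES ('no_bygning', 'Norsk', NULL);
    INSERT INTO RWeightValSemu VALUES ('da_hotel', 'Isa', 'da_bygning');
""")

print(untranslated_danish(conn))
print(links_to(conn, "da_bygning"))
```

In the editor the refusal in delete_semu would more likely be a dialog listing the linking Semus, so the user can decide whether to relink them first.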
Known "bugs"
"Edit" from Right-clicking qualia in the edit Tab doesn't work if
the qualia relation Semu is already in the results list.
Hlist will not place a duplicate in the list, and so an error occurs before
showTemplate kicks in.
The same goes for "showAllLinksToThis". Clear the Results List first, otherwise it is sometimes hard to see if any results come up. Needs an additional proc to check if the item is already in the results list.
Pressing Ctrl+s (save) when in a Text widget adds a funny square in the Text field. Must "unbind" ctrl+s from the Text widget. One should also unbind 'Tab' from the text widgets, so it does not interfere with traversal between widgets. At this time only the entries Naming, example and comment are text widgets. Naming and comment could just as well be Entry widgets, since the information will never span more than one line.
If you want to make a new Norwegian Semu based on a Danish Semu, "MakeNewSemuBasedOnThis" on a Danish Semu, will create a Norwegian Semu but with Danish Qualia links.