correction_of_the_xml-texts
Gap correction tool
NB: works only in Safari (Mac browser)
Frequency-sorted morphological “miss” lists
useful for “misses” which occur more than once
Tool: editor
Workflow:
download the chosen text-xml from the repository
choose the same text in Frequency-sorted morphological “miss” lists
find relevant words
copy the raw form, open the relevant text-xml in an editor
look up the copied raw form, replace the form
save the text-xml
parse the xml with XML Validator/SGML Parser
upload the text
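The look-up-and-replace step of this workflow can also be scripted when the same miss occurs many times. The following is a minimal sketch, not part of the tools described here: the function name `replace_raw_form` and the sample forms are hypothetical, and it assumes that the raw form should only be replaced as a whole word in the running text, never inside tags or attribute values.

```python
import re

def replace_raw_form(xml_text, raw_form, corrected):
    """Replace whole-word occurrences of raw_form outside of tags.

    Returns the new text and the number of replacements made.
    Assumption: no '>' occurs inside comments, CDATA or attribute values.
    """
    # Splitting with a capturing group keeps the tags in the result:
    # even indices are text segments, odd indices are tags.
    parts = re.split(r"(<[^>]*>)", xml_text)
    # Match the form only when it is not part of a longer word.
    pattern = re.compile(r"(?<!\w)" + re.escape(raw_form) + r"(?!\w)")
    total = 0
    for i, part in enumerate(parts):
        if i % 2 == 1:
            continue  # leave tags untouched
        # A function as replacement avoids backslash interpretation.
        parts[i], n = pattern.subn(lambda m: corrected, part)
        total += n
    return "".join(parts), total

sample = '<s id="quae">quae res quaeque</s>'
new_text, count = replace_raw_form(sample, "quae", "que")
print(count)
print(new_text)
```

The returned count can be checked against the frequency given in the “miss” list before the file is saved; a mismatch suggests that the form also occurs in a context that should be inspected by hand.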
Using BBEdit one can get a list of all raw forms occurring in the text (Smultron and jEdit do not have this option).
This makes it possible to compare the number of occurrences given in the Frequency-sorted morphological “miss” lists with the number found using the editor, which helps a lot to avoid introducing new mistakes.
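For editors without such a listing function, a comparable count can be produced with a short script. This is only a sketch under simple assumptions: the function name `count_forms` is hypothetical, tags are stripped with a regular expression rather than a real XML parser, and words are counted exactly as written (case-sensitive). Character entities such as `&amp;` are not resolved, so their names would be counted as words.

```python
import re
from collections import Counter

def count_forms(xml_text):
    """Count every word form in the running text of an XML document."""
    # Replace each tag with a space so that adjacent words stay separated.
    text_only = re.sub(r"<[^>]*>", " ", xml_text)
    # \w+ matches Unicode letters and digits, so accented forms are kept whole.
    return Counter(re.findall(r"\w+", text_only))

sample = "<p><s>uis uis motus</s> <s>Uis</s></p>"
counts = count_forms(sample)
for form, n in counts.most_common():
    print(form, n)
```

The count for a given raw form can then be compared with the number shown in the “miss” list before and after the correction.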
there is also the option to add new forms to the list of the Formmaker tool
NB: Unfortunately there is no link from the Formmaker tool back to the Frequency-sorted morphological “miss” lists, which slows down the correction/supplementation of the morphology. Experience shows that it is much faster to use the Formmaker tool for morphology supplementation.
Correction of single mistakes + supplementation of the morphology by the survey of morphosyntactic rules for neologisms and spelling variations
Experience shows that it is useful and time-saving to combine these different tasks with each other.
Tools:
Editor/Arboreal + Browser/Overviewtool + Formmaker Tool
NB: working with two monitors is recommended
Browser vs. Overviewtool
Using the Overviewtool instead of the ECHO or Archimedes environment in a browser makes it possible to display the image and the text next to each other. This is recommended for correcting texts with many morphological misses that occur only once, because no time is spent loading the images separately.
Unfortunately, with long xml-texts it takes a lot of time to load both the images and the text. In that case the Overviewtool slows the correction down rather than speeding it up.
NB: ECHO does not display <gap/>
Editor vs. Arboreal
The main difference between doing correction work in Arboreal and in an editor is whether changes to the xml-structure are possible.
People who are not supposed to change the xml-structure, or who want to avoid changing it by accident, should therefore use Arboreal.
Formmaker vs. Arboreal
The advantage of using Arboreal for morphological supplementation is that the generated form can be sent directly to the server that performs the morphological analysis.
Forms created with Formmaker, by contrast, have to be added separately.
On the other hand, it is sometimes better to work without Arboreal: e.g. if a lot of xml-structure has to be added because of unresolved abbreviations, Formmaker is useful as a separate tool.
Workflow Editor:
download the relevant text from the text-repository
open the file in an editor
open the relevant text in the Overviewtool or in the ECHO/Archimedes environment using a browser
find the words displayed in black (in ECHO/Archimedes, morphologically analysed forms appear brownish, unanalysed forms black)
decide whether the form is a neologism/spelling variation or the result of a faulty transcription into xml
depending on that decision, either add the form to Formmaker or correct the mistake in the editor
save the file after a working session
parse the file with XML Validator/SGML Parser
upload the text
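Before running the file through the XML Validator/SGML Parser, a quick local well-formedness check can catch tags that were broken during editing. The sketch below uses only the Python standard library; the function name `check_well_formed` is hypothetical. It checks well-formedness only, not validity against a DTD or schema, so it does not replace the parsing step above.

```python
import xml.etree.ElementTree as ET

def check_well_formed(data):
    """Return None if data (bytes) is well-formed XML, else the error message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # The message includes the line and column of the first problem.
        return str(err)
    return None

good = b"<text><s>ok</s></text>"
bad = b"<text><s>broken</text>"
print(check_well_formed(good))
print(check_well_formed(bad))
```

For a real file, the bytes can be read with `open(path, "rb").read()`, which leaves the handling of the encoding declaration to the parser.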
Workflow Arboreal
download the relevant text from the text-repository
open the file in Arboreal
generate IDs
NB: files which already have s-ids do not need this step
get a morphological analysis from Donatus and highlight the unanalysed forms
One can also upload a morphology file saved in a previous session. This may speed up the work when handling large files or when Donatus is down.
correct the highlighted term or add unknown vocabulary
send new vocabulary to Donatus
save the file (as well as the morphological analysis from Donatus if needed)
parse the file with XML Validator/SGML Parser
upload the text
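To illustrate what the ID generation step in this workflow amounts to, here is a rough sketch. It is not how Arboreal does it: the element name `s`, the attribute name `id`, the `s1, s2, ...` numbering scheme and the function name `add_missing_s_ids` are all assumptions made for the example, and the texts are assumed to use no XML namespaces. Elements that already carry an id are left untouched, in line with the note that files with s-ids do not need this step.

```python
import xml.etree.ElementTree as ET

def add_missing_s_ids(root, prefix="s"):
    """Give every <s> element without an id a new, unused one.

    Returns the number of ids added. Existing ids are never changed.
    """
    # Collect all ids already present anywhere in the document.
    used = {el.get("id") for el in root.iter() if el.get("id") is not None}
    added = 0
    counter = 1
    for s in root.iter("s"):
        if s.get("id") is not None:
            continue  # already has an id
        # Skip candidate ids that are already taken.
        while f"{prefix}{counter}" in used:
            counter += 1
        new_id = f"{prefix}{counter}"
        s.set("id", new_id)
        used.add(new_id)
        added += 1
    return added

root = ET.fromstring('<text><s id="s1">a</s><s>b</s><s>c</s></text>')
print(add_missing_s_ids(root))
print([s.get("id") for s in root.iter("s")])
```

Running the function a second time on the same tree adds nothing, which makes it safe to apply to files that are only partly numbered.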
correction_of_the_xml-texts.txt · Last modified: 2020/10/10 14:13 by 127.0.0.1