correction_of_the_xml-texts

correction_of_the_xml-texts [2020/10/10 14:13] (current) – created - external edit 127.0.0.1
 +    Gap correction tool 
 + -  see: gap_workflow.html {{:wiki:archimedes:gap_workflow.html|:wiki:archimedes:gap_workflow.html}} 
 + -  NB: works only in Safari (Mac browser) 
 + -  Frequency-sorted morphological "miss" lists 
 + -  useful for "misses" which occur more than once 
 + -  Tool: editor 
 + -  Workflow:  
 +                -  download the chosen text-xml from the repository 
 +                -  choose the same text in Frequency-sorted morphological "miss" lists 
 +                -  find relevant words 
 +                -  copy the raw form, open the relevant text-xml in an editor  
 +                -  look up the copied raw form, replace the form 
 +                -  save the text-xml 
 +                -  parse the xml with XML Validator/SGML Parser 
 +                -  upload the text 
 +          -  using BBEdit one can get a list of all raw forms occurring in the text (Smultron and jEdit do not have this option) 
 + This makes it possible to compare the number of occurrences given in the Frequency-sorted morphological "miss" lists with the number found using the editor, which helps a lot to avoid introducing new mistakes. 
 + -  there is also the option to add new forms to the list of the Formmaker tool 
 + -  NB: Unfortunately there is no link from the Formmaker tool back to the Frequency-sorted morphological "miss" lists, which slows down the correction/supplementation of the morphology. In practice it is much faster to use the Formmaker tool for morphology supplementation 
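The comparison of occurrence counts described above can also be scripted when no editor with a raw-form listing is at hand. The following is only a minimal sketch: the function names, the sample text and the miss-list format (a plain mapping from raw form to expected count) are assumptions made for illustration and are not part of the Archimedes tools.

```python
import re
from collections import Counter

def count_raw_forms(xml_text: str) -> Counter:
    """Count every word form in the text content of an XML string."""
    # Drop tags so that element and attribute names are not counted as words.
    text_only = re.sub(r"<[^>]+>", " ", xml_text)
    # \w+ matches runs of letters and digits, including non-ASCII letters.
    return Counter(re.findall(r"\w+", text_only))

def compare_with_miss_list(xml_text: str, miss_list: dict) -> dict:
    """Return {form: (count in miss list, count in text)} where the two differ."""
    found = count_raw_forms(xml_text)
    return {
        form: (expected, found[form])
        for form, expected in miss_list.items()
        if found[form] != expected
    }

# Hypothetical sample data, not taken from a real text or miss list.
sample = "<s>quod uelocitas et uelocitas</s><s>uelocitas <gap/> motus</s>"
misses = {"uelocitas": 3, "motus": 2}
print(compare_with_miss_list(sample, misses))  # {'motus': (2, 1)}
```

Forms are compared case-sensitively, since the raw form has to match exactly what will be searched for in the editor. A form reported here with differing counts is a hint to check the text by hand before replacing anything.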
 + -  Correction of single mistakes + supplementation of the morphology by the survey of morphosyntactic rules for neologisms and spelling variations  
 + -  In practice it is useful and time-saving to combine these different steps with each other 
 + -  Tools: 
 + -  Editor/Arboreal + Browser/Overviewtool + Formmaker Tool 
 + NB: working with two monitors is recommended 
 + Browser vs. Overviewtool 
 + Using the Overviewtool instead of the ECHO or Archimedes environment in a browser gives the option to display the image and the text next to each other. It is recommended for correcting texts with many morphological misses that occur only once, because one does not have to spend time loading the images separately.  
 + Unfortunately, with long xml-texts it takes a lot of time to load both the images and the text. In that case the Overviewtool slows the correction down rather than speeding it up. 
 + NB: ECHO does not display <gap/> 
 + Editor vs. Arboreal 
 + The main difference between correcting with Arboreal and correcting with an editor is that an editor makes it possible to change the xml-structure. 
 + That means people who are not supposed to change the xml-structure, or who want to avoid changing it by accident, should rather use Arboreal 
 + Formmaker vs. Arboreal 
 + The advantage of using Arboreal for the morphological supplementation is that the generated form can be sent directly to the server which does the morphological analysis 
 + Forms created with Formmaker, by contrast, have to be added separately. 
 +                   - On the other hand it is sometimes better to work without Arboreal; e.g. if a lot of xml-structure has to be added because of unresolved abbreviations, Formmaker is useful as a separate tool. 
 + -  Workflow Editor:  
 + -  download the relevant text from the text-repository  
 + -  open the file in an editor  
 + -  open the relevant text in the Overviewtool or in the ECHO/Archimedes environment using a browser  
 + -  find words displayed in black (in ECHO/Archimedes, morphologically analysed forms appear brownish, unanalysed forms black)  
 + -  decide whether the form is a neologism/spelling variation or the result of a faulty transcription into xml  
 + -  depending on that decision, either add the form to Formmaker or correct the mistake in the editor   
 + -   save the file after a working session  
 + -   parse the file with XML Validator/SGML Parser  
 + -   upload the text 
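Before uploading, a quick local check can catch broken markup introduced while editing by hand. The sketch below is not the XML Validator/SGML Parser named in the workflow and does not replace it: it only tests well-formedness with Python's standard library, does not validate against a DTD, and would report entities that are declared only in an external DTD as errors. The function name is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text: str):
    """Return None if the text parses, otherwise a message with line and column."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple pointing at the problem.
        line, column = err.position
        return f"not well-formed at line {line}, column {column}: {err}"
    return None

# Hypothetical snippets: the second one is missing the closing </s> tag.
print(check_well_formed("<text><s>ok</s></text>"))
print(check_well_formed("<text><s>broken</text>"))
```

The reported line number can be used to jump straight to the faulty place in the editor.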
 + -  Workflow Arboreal 
 + -   download the relevant text from the text-repository 
 + -   open the file in Arboreal 
 + -   generate IDs.  
 + This used to be done separately, either with the Sentence ID Tool [[http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/archim/sid]] or with Ficus [[http://archimedes.fas.harvard.edu/docs/ficus.html]], but can now be done in Arboreal 
 + NB: files which already have s-ids can skip this step 
 + -   get a morphological analysis by Donatus and highlight the unanalysed forms 
 + One can also load a morphology file saved in a previous session. This may speed up the work when one handles large files or when Donatus is not available. 
 + -   correct the highlighted term or add unknown vocabulary  
 + -   send new vocabulary to Donatus 
 + -   save the file (as well as the morphological analysis from Donatus if needed) 
 + -   parse the file with XML Validator/SGML Parser  
 + -   upload the text
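To illustrate what the ID generation step in the Arboreal workflow amounts to, here is a minimal sketch that gives every sentence element without an ID a fresh one and leaves existing IDs untouched. It does not reproduce Arboreal, the Sentence ID Tool or Ficus. The element name `s`, the attribute name `id` and the ID pattern `s1`, `s2`, ... are assumptions made for illustration; the real format may differ.

```python
import xml.etree.ElementTree as ET

def add_sentence_ids(xml_text: str, tag: str = "s", attr: str = "id") -> str:
    """Assign an ID to every <tag> element that lacks one; keep existing IDs."""
    root = ET.fromstring(xml_text)
    # Collect IDs already used anywhere in the document so new ones stay unique.
    used = {el.get(attr) for el in root.iter() if el.get(attr)}
    counter = 1
    for el in root.iter(tag):
        if el.get(attr):
            continue  # files that already carry s-ids need no change here
        while f"s{counter}" in used:
            counter += 1
        new_id = f"s{counter}"  # hypothetical ID pattern
        el.set(attr, new_id)
        used.add(new_id)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample: the first sentence already has an ID.
sample_text = '<text><s id="s1">a</s><s>b</s><s>c</s></text>'
print(add_sentence_ids(sample_text))
```

Running the function a second time on its own output changes nothing, which matches the note that files with existing s-ids do not need this step.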