correction_of_the_xml-texts

correction_of_the_xml-texts [2007/04/16 16:19] (current)
  - Gap correction tool
    - see: {{:wiki:archimedes:gap_workflow.html|gap_workflow.html}}
    - NB: works only in Safari (Mac browser)
  - Frequency-sorted morphological "miss" lists
    - useful for "misses" that occur more than once
    - Tool: editor
    - Workflow:
      - download the chosen text XML from the repository
      - choose the same text in the Frequency-sorted morphological "miss" lists
      - find the relevant words
      - copy the raw form and open the relevant text XML in an editor
      - search for the copied raw form and replace it with the corrected form
      - save the text XML
      - parse the XML with XML Validator/SGML Parser
      - upload the text
    - With BBEdit one can get a list of all raw forms occurring in the text (Smultron and jEdit do not have this option).
      - It can be helpful to compare the number of occurrences listed in the Frequency-sorted morphological "miss" lists with the number found using the editor. This helps a lot to avoid new mistakes.
    - There is also the option to add new forms to the list of the Formmaker tool.
      - NB: Unfortunately there is no link from the Formmaker tool back to the Frequency-sorted morphological "miss" lists, which slows down the correction/supplementation of the morphology. In practice it is much faster to use the Formmaker tool for morphology supplementation.
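The count comparison described above (occurrences in the "miss" list versus occurrences found in the editor) can also be done with a short script before replacing anything. The following is only a minimal sketch: the function names `count_form` and `replace_form` are hypothetical, and the code treats the text XML as plain text, so it would also match a raw form that happens to appear inside a tag or attribute.

```python
import re


def _whole_word(form: str) -> "re.Pattern[str]":
    # Match `form` only when it is not part of a longer word.
    return re.compile(r"(?<!\w)" + re.escape(form) + r"(?!\w)")


def count_form(text: str, form: str) -> int:
    """Count whole-word occurrences of `form` in `text`."""
    return len(_whole_word(form).findall(text))


def replace_form(text: str, raw: str, corrected: str, expected_count: int) -> str:
    """Replace `raw` with `corrected`, but only if the number of hits
    equals the count taken from the "miss" list (an assumed input)."""
    found = count_form(text, raw)
    if found != expected_count:
        raise ValueError(
            f"{raw!r}: miss list says {expected_count}, text contains {found}"
        )
    # A function is used as replacement so that backslashes in
    # `corrected` are not interpreted as escape sequences.
    return _whole_word(raw).sub(lambda match: corrected, text)
```

Refusing to replace when the two counts differ is a deliberate choice here: a mismatch usually means the form was already corrected in part, or that the search also hits something that is not the intended word.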
  - Correction of single mistakes + supplementation of the morphology by surveying morphosyntactic rules for neologisms and spelling variations
    - In practice it is useful and saves time to combine these different steps with each other.
    - Tools:
      - Editor/Arboreal + Browser/Overviewtool + Formmaker Tool
      - NB: working with two monitors is recommended
    - Browser vs. Overviewtool
      - Using the Overviewtool instead of the ECHO or Archimedes environment in a browser makes it possible to display the image and the text next to each other. It is recommended for correcting texts with many morphological misses that occur only once, because no time is spent loading the images separately.
      - With long XML texts, however, loading both the images and the text takes a long time. In that case the Overviewtool slows the correction down.
      - NB: ECHO does not display <gap/>
    - Editor vs. Arboreal
      - The main difference between correcting with Arboreal and with an editor is whether changes to the XML structure are possible.
      - Anyone who is not supposed to change the XML structure, or who wants to avoid changing it by accident, should therefore use Arboreal.
    - Formmaker vs. Arboreal
      - The advantage of using Arboreal for the morphological supplementation is that the generated form can be sent directly to the server that does the morphological analysis.
      - Forms created with Formmaker, by contrast, have to be added separately.
      - On the other hand it is sometimes better to work without Arboreal; e.g. if a lot of XML structure has to be added because of unresolved abbreviations, Formmaker is useful as a separate tool.
    - Workflow Editor:
      - download the relevant text from the text repository
      - open the file in an editor
      - open the relevant text in the Overviewtool or in the ECHO/Archimedes environment using a browser
      - find words displayed in black (in ECHO/Archimedes, morphologically analysed forms appear in a brownish colour, unanalysed forms in black)
      - decide whether the form is a neologism/spelling variation or the result of a faulty transcription into XML
      - depending on the decision in step 5, either add the form to Formmaker or correct the mistake in the editor
      - save the file after a working session
      - parse the file with XML Validator/SGML Parser
      - upload the text
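Before handing the file to XML Validator/SGML Parser, a quick local well-formedness check can catch tags broken during manual editing. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `check_well_formed` is hypothetical. It only tests well-formedness, not validity against a DTD, so it is a pre-check under that assumption and does not replace the parsing step of the workflow.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (ok, message) for a string or bytes object containing XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return False, f"not well-formed at line {line}, column {column}: {err}"
    return True, "well-formed"
```

A typical use would be to read the saved file as bytes (`open(path, "rb").read()`) and pass the result to `check_well_formed`, correcting the reported position in the editor before uploading.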
    - Workflow Arboreal:
      - download the relevant text from the text repository
      - open the file in Arboreal
      - generate IDs
        - This used to be done separately, either with the Sentence ID Tool [[http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/archim/sid]] or with Ficus [[http://archimedes.fas.harvard.edu/docs/ficus.html]], but can now be done in Arboreal.
        - NB: files that already have s-ids do not need this step
      - get a morphological analysis from Donatus and highlight the unanalysed forms
        - One can also load a morphology file saved in a previous session. This may speed up the work when handling large files or when Donatus is not available.
      - correct the highlighted term or add unknown vocabulary
      - send new vocabulary to Donatus
      - save the file (and, if needed, the morphological analysis from Donatus)
      - parse the file with XML Validator/SGML Parser
      - upload the text
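To illustrate what the "generate IDs" step amounts to, the following sketch adds an ID to every sentence element that does not have one yet and leaves existing IDs untouched. It is not the method used by Arboreal, the Sentence ID Tool, or Ficus: the element name `s`, the attribute name `id`, the `s.000001` numbering scheme, and the function name `add_sentence_ids` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def add_sentence_ids(root, tag="s", attr="id", prefix="s."):
    """Give every <tag> element without an `attr` a new unique ID.

    Returns the number of IDs added. Existing IDs are kept as they are.
    """
    # Collect IDs that are already present so that none is reused.
    used = {el.get(attr) for el in root.iter(tag) if el.get(attr)}
    counter = 0
    added = 0
    for el in root.iter(tag):
        if el.get(attr):
            continue  # files that already have s-ids need no change
        counter += 1
        new_id = f"{prefix}{counter:06d}"
        while new_id in used:
            counter += 1
            new_id = f"{prefix}{counter:06d}"
        el.set(attr, new_id)
        used.add(new_id)
        added += 1
    return added
```

A return value of 0 means the file already carried IDs on every sentence, which corresponds to the NB above that such files can skip this step.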
correction_of_the_xml-texts.txt · Last modified: 2007/04/16 16:19 (external edit)