correction_of_the_xml-texts
  1. Gap correction tool
    1. see: gap_workflow.html (:wiki:archimedes:gap_workflow.html)
    2. NB: only works in Safari (Mac browser)
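ECHO does not display `<gap/>` elements, so gaps are easy to overlook when reading a text in the browser. The following Python is a minimal sketch, not part of the Gap correction tool. It lists every `<gap>` tag in a downloaded text-xml together with its line number, assuming each gap tag sits on a single line. The `reason` attribute in the sample is purely illustrative.

```python
import re

# Matches a <gap> start tag such as <gap/> or <gap reason="..."/>,
# but not <gaps> or the closing tag </gap>.
GAP_TAG = re.compile(r"<gap\b[^>]*>")


def find_gaps(xml_text: str) -> list:
    """Return (line number, tag) pairs for every <gap> tag in the text."""
    hits = []
    for lineno, line in enumerate(xml_text.splitlines(), start=1):
        for match in GAP_TAG.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = '<s>quod <gap/> est</s>\n<s>ita<gap reason="illegible"/></s>'
    for lineno, tag in find_gaps(sample):
        print(f"line {lineno}: {tag}")
```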
  2. Frequency-sorted morphological “miss” lists
    1. useful for “misses” which occur more than once
    2. Tool: editor
    3. Workflow:
      1. download the chosen text-xml from the repository
      2. choose the same text in Frequency-sorted morphological “miss” lists
      3. find relevant words
      4. copy the raw form and open the relevant text-xml in an editor
      5. search for the copied raw form and replace it with the corrected form
      6. save the text-xml
      7. parse the xml with XML Validator/SGML Parser
      8. upload the text
    4. using BBEdit one can get a list of all raw forms occurring in the text (Smultron and jEdit don't have this option)
      1. Comparing the number of occurrences listed in the Frequency-sorted morphological “miss” lists with the number found in the editor helps a lot to avoid introducing new mistakes.
    5. new forms can also be added to the list of the Formmaker tool
      1. NB: Unfortunately there is no link from the Formmaker tool back to the Frequency-sorted morphological “miss” lists, which slows down the correction and supplementation of the morphology. In practice, it is much faster to use the Formmaker tool for supplementing the morphology.
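The editor steps above (list all raw forms, replace one form, compare the count with the “miss” list) can be sketched in Python. This is a minimal illustration, not how BBEdit or the miss lists work internally. It assumes that `>` never appears inside attribute values, that matching is case-sensitive, and that entity references such as `&amp;` are not part of the forms being corrected (they are treated as separators).

```python
import re
from collections import Counter

# re.split with one capturing group alternates text and tags:
# even indices are text content, odd indices are tags (kept unchanged).
TAG = re.compile(r"(<[^>]*>)")
# A token is either an entity reference or a run of word characters.
TOKEN = re.compile(r"&#?\w+;|\w+")


def form_frequencies(xml_text: str) -> Counter:
    """Count each word form in the text content, ignoring tags and entities."""
    counts = Counter()
    for text in TAG.split(xml_text)[::2]:
        counts.update(t for t in TOKEN.findall(text) if not t.startswith("&"))
    return counts


def replace_form(xml_text: str, old: str, new: str) -> tuple:
    """Replace whole-word occurrences of `old` outside tags.

    Returns the new document and the number of replacements, which can be
    compared with the frequency shown in the miss list.
    """
    parts = TAG.split(xml_text)
    total = 0

    def swap(match):
        nonlocal total
        if match.group(0) == old:
            total += 1
            return new
        return match.group(0)

    for i in range(0, len(parts), 2):
        parts[i] = TOKEN.sub(swap, parts[i])
    return "".join(parts), total
```

For a real file, read it with `open(path, encoding="utf-8")`, call `replace_form`, check the returned count against the miss list, and only then write the result back. Splitting on tags first means attribute values (for example `n="quod"`) are never touched.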
  3. Correction of single mistakes and supplementation of the morphology by surveying morphosyntactic rules for neologisms and spelling variations
    1. In practice it saves time to combine these task steps with each other
    2. Tools:
      1. Editor/Arboreal + Browser/Overviewtool + Formmaker Tool
        1. NB: working with two monitors is recommended
        2. Browser vs. Overviewtool
          1. Using the Overviewtool instead of the ECHO or Archimedes environment in a browser makes it possible to display the image and the text next to each other. It is recommended for texts with many morphological misses that each occur only once, because one does not have to spend time loading the images separately.
          2. However, with long xml-texts it takes a long time to load both the images and the text, so in that case the Overviewtool tends to slow down the correction.
          3. NB: ECHO does not display <gap/>
        3. Editor vs. Arboreal
          1. The main difference between doing correction work in Arboreal and in an editor is that an editor allows changes to the xml-structure.
            1. That means people who are not supposed to change the xml-structure, or who want to avoid changing it accidentally, should use Arboreal.
        4. Formmaker vs. Arboreal
          1. The advantage of using Arboreal for morphological supplementation is that the generated form can be sent directly to the server that performs the morphological analysis.
            1. Forms created in Formmaker, by contrast, have to be added separately.
              1. On the other hand, it is sometimes better to work without Arboreal: e.g. if a lot of xml-structure has to be added because of undecoded abbreviations, Formmaker is useful as a separate tool.
    3. Workflow Editor:
      1. download the relevant text from the text-repository
      2. open the file in an editor
      3. open the relevant text in the Overviewtool or in the ECHO/Archimedes environment using a browser
      4. find words displayed in black (in ECHO/Archimedes, morphologically analysed forms appear brownish, unanalysed forms appear black)
      5. decide if the form is a neologism/spelling variation or due to false transcription into xml
      6. depending on step 5, either add the form to Formmaker or correct the mistake in the editor
      7. save the file after a working session
      8. parse the file with XML Validator/SGML Parser
      9. upload the text
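Before uploading, a quick local check can catch broken markup introduced during editing. The sketch below uses Python's standard `xml.parsers.expat` module and reports the first well-formedness error with its line and column. It is an assumption-laden stand-in, not the XML Validator/SGML Parser named in the workflow: it does not validate against the DTD, so the validator step is still needed.

```python
import sys
import xml.parsers.expat
from typing import Optional


def check_well_formed(xml_bytes: bytes) -> Optional[str]:
    """Return None if the document is well-formed, else a short error message."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # isfinal=True: the whole document is passed in a single call.
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return f"line {err.lineno}, column {err.offset}: {message}"
    return None


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_xml.py text1.xml text2.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            problem = check_well_formed(handle.read())
        print(f"{path}: {problem or 'well-formed'}")
```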
    4. Workflow Arboreal
      1. download the relevant text from the text-repository
      2. open the file in Arboreal
      3. generate IDs.
        1. This used to be done separately with either the Sentence ID Tool http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/archim/sid or Ficus http://archimedes.fas.harvard.edu/docs/ficus.html, but can now be done in Arboreal.
        2. NB: files that already have s-ids do not need this step
      4. get a morphological analysis from Donatus and highlight the unanalysed forms
        1. Alternatively, one can upload a morphology file saved in a previous session. This can speed up the work when handling large files or when Donatus has been shut down.
      5. correct the highlighted term or add unknown vocabulary
      6. send new vocabulary to Donatus
      7. save the file (as well as the morphological analysis from Donatus if needed)
      8. parse the file with XML Validator/SGML Parser
      9. upload the text
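To illustrate what the ID generation step does, here is a minimal Python sketch that numbers sentence elements. It is not the algorithm of the Sentence ID Tool, Ficus or Arboreal, and the details are hypothetical: sentences are assumed to be `<s>` elements, the ID attribute is assumed to be `id`, and the `s1`, `s2`, ... format is invented for the example. As in the workflow, a file that already contains any sentence ID is left untouched.

```python
import re

# Start tag of an <s> element, e.g. <s> or <s n="3">; <sic>, </s> and <s/> do not match.
S_START = re.compile(r"<s(\s[^>]*)?>")
# Any <s> start tag that already carries an id attribute (hypothetical attribute name).
HAS_ID = re.compile(r"<s\s[^>]*\bid\s*=")


def add_sentence_ids(xml_text: str, prefix: str = "s") -> tuple:
    """Give every <s> start tag a sequential id; return (new text, count)."""
    if HAS_ID.search(xml_text):
        # Files that already have s-ids do not need this step.
        return xml_text, 0
    counter = 0

    def number(match):
        nonlocal counter
        attrs = match.group(1) or ""
        if attrs.rstrip().endswith("/"):
            # Self-closing <s .../> elements are left unchanged.
            return match.group(0)
        counter += 1
        return f'<s id="{prefix}{counter}"{attrs}>'

    return S_START.sub(number, xml_text), counter
```

Working on the raw text with regular expressions, rather than parsing and re-serialising the XML, keeps the rest of the file (DOCTYPE, entity references, formatting) byte-for-byte unchanged.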
correction_of_the_xml-texts.txt · Last modified: 2020/10/10 14:13 by 127.0.0.1