Archimedes Workflows

Documentation from Harvard (by M. Hyman/M. Schiefsky):

Documentation from MPIWG (text by St. Trzeciok; additional files by B. Fuchs):

Doku by B. Fuchs

- Workflow Archimedes

  1. Metadata
    1. Upload the text in: Filemaker IT server, archimedes_authors
  2. Producing images
    1. see documentation of the library
  3. Transcription and production of the text-xmls
    1. see transcr.pdf
      1. example list of entities used in Archimedes: entities.html :wiki:archimedes:entities.html
  4. Synchronisation of text-xmls and images
  5. Production of thumbnails
  6. Production of cut-outs (cut-outs are drawings or similar illustrations on the images, which are tagged inside the text-xmls)
    1. use cutout-tool
    2. Production: see workflow_online_cutout.pdf :wiki:archimedes:workflow_online_cutout.pdf
    3. Postproduction: see cd_cutout_postprocess.txt :wiki:archimedes:cd_cutout_postprocess.txt and online_cutout_postcutout.txt :wiki:archimedes:online_cutout_postcutout.txt
  7. Correction of the text-xmls
    1. Gap correction tool
      1. see: gap_workflow.html :wiki:archimedes:gap_workflow.html
      2. NB: works only in Safari (Mac browser)
    2. Frequency-sorted morphological “miss” lists
      1. useful for “misses” which occur more than once
      2. Tool: editor
      3. Workflow:
        1. download the chosen text-xml from the repository
        2. choose the same text in the Frequency-sorted morphological “miss” lists
        3. find relevant words
        4. copy the raw form, open the relevant text-xml in an editor
        5. look up the copied raw form, replace the form
        6. save the text-xml
        7. parse the xml with XML Validator/SGML Parser
        8. upload the text
      4. Using BBEdit, one can get a list of all occurring raw forms in the text (Smultron and jEdit do not have this option)
        1. This makes it possible to compare the number of forms listed in the Frequency-sorted morphological “miss” lists with the number found using the editor, which helps a lot to avoid introducing new mistakes.
      5. There is also the option to add new forms to the list of the Formmaker tool
        1. NB: Unfortunately there is no link from the Formmaker tool back to the Frequency-sorted morphological “miss” lists, which slows down the correction/supplementation of the morphology. In practice it is much faster to use the Formmaker tool for morphology supplementation.
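Where BBEdit is not available, the number of occurrences of a raw form can also be checked from the command line. The following is a minimal sketch, assuming a POSIX shell and a grep that supports -o; the function name count_form and the file name in the usage note are hypothetical and not part of the Archimedes tools.

```shell
# Count how often one raw form occurs in a text-xml file.
# Usage: count_form FORM FILE
count_form() {
    # -F: treat FORM as a fixed string (no regular expression)
    # -w: match whole words only
    # -o: print every match on its own line, so that wc can count them
    grep -o -w -F -- "$1" "$2" | wc -l | tr -d ' '
}
```

For example, count_form uelocitas text.xml prints the number of occurrences, which can then be compared with the count given in the miss list. A form that is interrupted by markup inside the word is not found this way.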
    3. Correction of single mistakes + supplementation of the morphology by the survey of morphosyntactic rules for neologisms and spelling variations
      1. In practice it is useful and time-saving to combine these steps with each other
      2. Tools:
        1. Editor/Arboreal + Browser/Overviewtool + Formmaker Tool
          1. NB: working with two monitors is recommended
          2. Browser vs. Overviewtool
            1. Using the Overviewtool instead of the ECHO or Archimedes environment in a browser makes it possible to display the image and the text next to each other. This is recommended for correcting texts with many morphological misses that occur only once, because no time is spent loading the images separately.
            2. Unfortunately, with long xml-texts it takes a long time to load both the images and the text. In that case the Overviewtool slows the correction down rather than speeding it up.
            3. NB: ECHO does not display <gap/>
          3. Editor vs. Arboreal
            1. The main difference between correcting with Arboreal and correcting with an editor is that the editor makes it possible to change the xml-structure.
              1. That means that people who are not supposed to change the xml-structure, or who want to avoid changing it by accident, should use Arboreal
          4. Formmaker vs. Arboreal
            1. The advantage of using Arboreal for the morphological supplementation is that the generated form can be sent directly to the server that does the morphological analysis
              1. Forms created with Formmaker, by contrast, have to be added separately.
                1. On the other hand it is sometimes better to work without Arboreal; e.g. if a lot of xml-structure has to be added because of undecoded abbreviations, Formmaker is useful as a separate tool.
      3. Workflow Editor:
        1. download the relevant text from the text-repository
        2. open the file in an editor
        3. open the relevant text in the Overviewtool or in the ECHO/Archimedes environment using a browser
        4. find the words displayed in black (in ECHO/Archimedes, morphologically analysed forms appear brownish, unanalysed forms black)
        5. decide whether the form is a neologism/spelling variation or the result of a transcription error in the xml
        6. depending on step 5, either add the form to Formmaker or correct the mistake in the editor
        7. save the file after a working session
        8. parse the file with XML Validator/SGML Parser
        9. upload the text
      4. Workflow Arboreal
        1. download the relevant text from the text-repository
        2. open the file in Arboreal
        3. generate IDs.
          1. This used to be done separately, either with the Sentence ID Tool http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/archim/sid or with Ficus http://archimedes.fas.harvard.edu/docs/ficus.html, but can now be done in Arboreal
          2. NB: files which already have s-ids do not need this step
        4. get a morphological analysis by Donatus and highlight the unanalysed forms
          1. One can also upload a morphology file from the last session, if one saved it. This may speed up the work when handling large files or when Donatus has been shut down.
        5. correct the highlighted term or add unknown vocabulary
        6. send new vocabulary to Donatus
        7. save the file (as well as the morphological analysis from Donatus if needed)
        8. parse the file with XML Validator/SGML Parser
        9. upload the text
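Before uploading, a quick local well-formedness check can catch broken tags early. The sketch below assumes that xmllint (part of libxml2) is installed; it is not the project's XML Validator/SGML Parser and does not replace that step. The function name check_wellformed is hypothetical.

```shell
# Return 0 if FILE is well-formed XML, 1 if it is not,
# and 2 if xmllint is not installed.
# Usage: check_wellformed FILE
check_wellformed() {
    command -v xmllint >/dev/null 2>&1 || {
        echo "xmllint not found" >&2
        return 2
    }
    # --noout: parse only, do not print the document
    if xmllint --noout "$1" 2>/dev/null; then
        echo "well-formed: $1"
    else
        echo "NOT well-formed: $1" >&2
        return 1
    fi
}
```

This only tests well-formedness, not validity against a DTD. Files that rely on entities declared in an external DTD may need that DTD to be reachable for the check to pass.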
  8. Producing parallel texts
    1. NB: Parallel texts cannot be displayed on Archimedes, only in the ECHO environment
    1. Lemmatized Corpus Search
      1. see Revision History.pdf :wiki:archimedes:revision_history.pdf
    2. Dictionary Lookup Tool, Dictionary Headword Access
    3. XML Validator, SGML Parser
    4. Corpus Language Statistics
      1. used to display corpus word counts by language
    5. xpath access ???
    6. Working Group Home Page
      1. GForge has been used for documentation and to coordinate tasks

Local Text administration (by St. Trzeciok)

  1. By command line
    1. Prerequisites
      1. OS X or any other UNIX-based system
      2. User account and password for the Archimedes repository (provided by the IT department)
    2. Establishing a local text repository on the desktop of your computer
      1. Create a folder on your desktop named e.g. sources
      2. Open the program Terminal (a new shell window should appear on your screen)
      3. Change the directory to the new local text repository folder. Type: cd Desktop/sources and press enter
      4. Adjust the network protocol. Type: export CVS_RSH=ssh and press enter
      5. Download the texts from the permanent text repository. Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot co texts/archimedes/xml and press Enter
      6. Type in your password and press enter
      7. The texts will be in the subdirectory sources/texts/archimedes/xml
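The checkout steps above can be collected into one small shell function. This is a minimal sketch, not an official script: the variables ARCH_USER and ARCH_LOCAL and the function name checkout_texts are hypothetical, and "username" is the same placeholder as in the steps above.

```shell
# Hypothetical settings: replace "username" with your repository account.
ARCH_USER="${ARCH_USER:-username}"
ARCH_CVSROOT=":ext:${ARCH_USER}@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot"
ARCH_LOCAL="${ARCH_LOCAL:-$HOME/Desktop/sources}"

# Create the local folder and check out the texts into it.
# The body runs in a subshell, so the caller's directory is not changed.
checkout_texts() (
    mkdir -p "$ARCH_LOCAL" && cd "$ARCH_LOCAL" || exit 1
    export CVS_RSH=ssh      # let cvs connect over ssh
    cvs -d "$ARCH_CVSROOT" co texts/archimedes/xml
)
```

Calling checkout_texts then asks for the password as in step 6 and leaves the texts in sources/texts/archimedes/xml.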
    3. Adding a new text to the permanent text repository
      1. Open the program Terminal (a new shell window should appear on your screen)
      2. Change the directory to the local text repository folder. Type: cd Desktop/sources and press enter
      3. Adjust the network protocol. Type: export CVS_RSH=ssh and press enter
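      4. Register the new file. Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot add filename.xml and press enter
      5. Type in your password and press enter
      6. Commit the new file (cvs add on its own only schedules the file for addition; the commit uploads it). Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot commit filename.xml and press enter
      7. Press i (=insert) to add a comment
      8. Press esc after the completion of your comment and type: :wq (=write and quit)
      9. The new file will be uploaded into the permanent text repository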
    4. Adding changed texts to the permanent text repository
      1. Open the program Terminal (a new shell window should appear on your screen)
      2. Change the directory to the local text repository folder. Type: cd Desktop/sources and press enter
      3. Adjust the network protocol. Type: export CVS_RSH=ssh and press enter
      4. Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot commit texts/archimedes/xml
      5. Type in your password and press enter
      6. Press i (=insert) and add a comment describing what you have changed in the files (very important!)
      7. Press esc after the completion of your comment and type: :wq (=write and quit)
      8. The changed files will be uploaded into the permanent text repository
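The commit can also be scripted by passing the comment on the command line with cvs commit -m, which skips the editor steps. A minimal sketch under the same assumptions as the steps above; the variables ARCH_USER and ARCH_LOCAL and the function name commit_texts are hypothetical.

```shell
# Hypothetical settings: replace "username" with your repository account.
ARCH_USER="${ARCH_USER:-username}"
ARCH_CVSROOT=":ext:${ARCH_USER}@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot"
ARCH_LOCAL="${ARCH_LOCAL:-$HOME/Desktop/sources}"

# Commit all changed texts with the given comment.
# Usage: commit_texts "what was changed"
commit_texts() (
    # Refuse to commit without a comment, since the comment documents the change.
    [ -n "$1" ] || { echo 'usage: commit_texts "comment"' >&2; exit 1; }
    cd "$ARCH_LOCAL" || exit 1
    export CVS_RSH=ssh      # let cvs connect over ssh
    cvs -d "$ARCH_CVSROOT" commit -m "$1" texts/archimedes/xml
)
```

For example, commit_texts "corrected gaps and misses in one text" uploads the changes after the password prompt. The comment shown here is only an illustration.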
    5. Refreshing your local text repository
      1. Open the program Terminal (a new shell window should appear on your screen)
      2. Change the directory to the local text repository folder. Type: cd Desktop/sources and press enter
      3. Adjust the network protocol. Type: export CVS_RSH=ssh and press enter
      4. Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot up and press enter
      5. Type in your password and press enter
      6. Your local repository will be updated from the permanent text repository
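Before refreshing, it can be useful to see which files would change. The sketch below uses the global cvs options -n (do not change any files) and -q (less output) for a preview, followed by the same update command as in the steps above. The variables ARCH_USER and ARCH_LOCAL and both function names are hypothetical.

```shell
# Hypothetical settings: replace "username" with your repository account.
ARCH_USER="${ARCH_USER:-username}"
ARCH_CVSROOT=":ext:${ARCH_USER}@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot"
ARCH_LOCAL="${ARCH_LOCAL:-$HOME/Desktop/sources}"

# Show which files differ from the repository without touching them.
preview_update() (
    cd "$ARCH_LOCAL" || exit 1
    export CVS_RSH=ssh
    cvs -d "$ARCH_CVSROOT" -n -q up
)

# Update the local repository from the permanent text repository.
refresh_texts() (
    cd "$ARCH_LOCAL" || exit 1
    export CVS_RSH=ssh
    cvs -d "$ARCH_CVSROOT" up
)
```

In the preview, cvs marks each listed file with a status letter, for example U for files that would be updated, M for locally modified files and C for conflicts.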
archimedes_workflows.txt · Last modified: 2008/09/09 19:56 (external edit)