archimedes_workflows [2008/09/09 19:56] (current)
 +====== Archimedes Workflows ======
 +
 +==== Documentation from Harvard (by M. Hyman/M. Schiefsky): ====
 +
 +
 +  * Archimedes Project Repository: Working Draft 4: http://archimedes.fas.harvard.edu/docs/repository/
 +  * Archimedes Bundle Structure: http://archimedes.fas.harvard.edu/docs/bundle/
 +  * The Donatus XML-RPC Interface: http://archimedes.fas.harvard.edu/docs/donatus-api/
 +
 +==== Documentation from MPIWG (text by St. Trzeciok; additional files by B. Fuchs): ====
 +
 +
 +{{:wiki:archimedes:bve_doc.pdf|Documentation by B. Fuchs}}
 +
 +  - Workflow Archimedes
 +    - Metadata
 +      - Write a short text describing the author for: [[http://archimedes2.mpiwg-berlin.mpg.de/archimedes_templates/biography.html?-table=archimedes_authors]]
 +      - Upload the text to: Filemaker IT server, archimedes_authors
 +    - Producing images
 +      - see the documentation of the library
 +    - Transcription and production of the text-xmls
 +      - see {{:wiki:archimedes:transcr.pdf|transcr.pdf}}
 +      - example list of entities used in Archimedes: {{:wiki:archimedes:entities.html|entities.html}}
 +    - Synchronisation of text-xmls and images
 +      - Production of thumbnails
 +        - see [[http://pythia.mpiwg-berlin.mpg.de/department1/archimedes/faq.html]]
 +      - Production of cut-outs (cut-outs are drawings or similar illustrations on the images, which are tagged inside the text-xmls)
 +        - use the cutout tool
 +        - Production: see {{:wiki:archimedes:workflow_online_cutout.pdf|workflow_online_cutout.pdf}}
 +        - Postproduction: see {{:wiki:archimedes:cd_cutout_postprocess.txt|cd_cutout_postprocess.txt}} and {{:wiki:archimedes:online_cutout_postcutout.txt|online_cutout_postcutout.txt}}
 +    - Correction of the text-xmls
 +      - Gap correction tool
 +        - see {{:wiki:archimedes:gap_workflow.html|gap_workflow.html}}
 +        - NB: works only in Safari (Mac browser)
 +      - Frequency-sorted morphological "miss" lists
 +        - useful for "misses" which occur more than once
 +        - Tool: editor
 +        - Workflow:
 +          - download the chosen text-xml from the repository
 +          - choose the same text in the frequency-sorted morphological "miss" lists
 +          - find the relevant words
 +          - copy the raw form and open the relevant text-xml in an editor
 +          - search for the copied raw form and replace it
 +          - save the text-xml
 +          - parse the xml with the XML Validator/SGML Parser
 +          - upload the text
 +        - BBEdit can produce a list of all raw forms occurring in the text (Smultron and jEdit do not have this option)
 +          - This makes it possible to compare the number of occurrences reported in the frequency-sorted morphological "miss" lists with the number found in the editor, which helps to avoid introducing new mistakes.
 +        - new forms can also be added to the list of the Formmaker tool
 +          - NB: Unfortunately there is no link from the Formmaker tool back to the frequency-sorted morphological "miss" lists, which slows down the correction/supplementation of the morphology. Experience shows that it is much faster to use the Formmaker tool for morphology supplementation.
 +      - Correction of single mistakes + supplementation of the morphology by the survey of morphosyntactic rules for neologisms and spelling variations
 +        - Experience shows that it is useful and saves time to combine these different steps
 +        - Tools:
 +          - Editor/Arboreal + Browser/Overviewtool + Formmaker Tool
 +            - NB: working with two monitors is recommended
 +          - Browser vs. Overviewtool
 +            - Using the Overviewtool instead of the ECHO or Archimedes environment in a browser makes it possible to display the image and the text next to each other. It is recommended for correcting texts with many morphological misses that occur only once, because no time is spent loading the images separately.
 +            - Unfortunately, with long xml-texts it takes a long time to load both the images and the text. In that case the Overviewtool slows the correction down.
 +            - NB: ECHO does not display <gap/>
 +          - Editor vs. Arboreal
 +            - The main difference between correcting with Arboreal and correcting with an editor is that an editor makes it possible to change the xml-structure.
 +            - Anyone who is not supposed to change the xml-structure, or who wants to avoid changing it by accident, should therefore use Arboreal.
 +          - Formmaker vs. Arboreal
 +            - The advantage of using Arboreal for the morphological supplementation is that the generated form can be sent directly to the server that performs the morphological analysis.
 +            - Forms created with Formmaker, by contrast, have to be added separately.
 +            - On the other hand it is sometimes better to work without Arboreal; e.g. if a lot of xml-structure has to be added because of unresolved abbreviations, Formmaker is useful as a separate tool.
 +        - Workflow Editor:
 +          - download the relevant text from the text repository
 +          - open the file in an editor
 +          - open the relevant text in the Overviewtool or in the ECHO/Archimedes environment using a browser
 +          - find the words displayed in black (in ECHO/Archimedes, morphologically analysed forms appear brownish, unanalysed forms black)
 +          - decide whether the form is a neologism/spelling variation or the result of a transcription error in the xml
 +          - depending on the decision in step 5, either add the form to Formmaker or correct the mistake in the editor
 +          - save the file after a working session
 +          - parse the file with the XML Validator/SGML Parser
 +          - upload the text
 +        - Workflow Arboreal:
 +          - download the relevant text from the text repository
 +          - open the file in Arboreal
 +          - generate IDs
 +            - This used to be done separately, either with the Sentence ID Tool [[http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/archim/sid]] or with Ficus [[http://archimedes.fas.harvard.edu/docs/ficus.html]], but can now be done in Arboreal
 +            - NB: files which already have s-ids do not need this step
 +          - get a morphological analysis from Donatus and highlight the unanalysed forms
 +            - A morphology file saved in a previous session can also be loaded. This may speed up the work when handling large files or when Donatus has been shut down.
 +          - correct the highlighted term or add unknown vocabulary
 +          - send new vocabulary to Donatus
 +          - save the file (as well as the morphological analysis from Donatus if needed)
 +          - parse the file with the XML Validator/SGML Parser
 +          - upload the text
 +    - Producing parallel texts
 +      - see {{:wiki:archimedes:coord.pdf|coord.pdf}}
 +      - NB: Parallel texts cannot be displayed in Archimedes, only in the ECHO environment
 +    - Further tools according to [[http://archimedes.mpiwg-berlin.mpg.de/arch/archimedes.new.html]]
 +      - Lemmatized Corpus Search
 +        - see {{:wiki:archimedes:revision_history.pdf|Revision History.pdf}}
 +      - Dictionary Lookup Tool, Dictionary Headword Access
 +        - see {{:wiki:archimedes:archimedes_project_dictionary_service_documentation.pdf|Archimedes Project_Dictionary Service_Documentation.pdf}} or [[http://archimedes.mpiwg-berlin.mpg.de/arch/doc/dict-server.html]]
 +      - XML Validator, SGML Parser
 +      - Corpus Language Statistics
 +        - used to display corpus word counts by language
 +      - xpath access ???
 +      - Working Group Home Page
 +        - GForge has been used for documentation and to coordinate tasks
 +        - older task list: [[http://pythia.mpiwg-berlin.mpg.de/department1/archimedes/tasklist.html]]
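
The comparison described in the frequency-sorted "miss" list workflow above (how often a raw form occurs in a text-xml) can also be approximated on the command line. The following is only a minimal sketch, assuming a POSIX shell with grep; the function name count_form and the example file name are hypothetical and not part of the Archimedes tools. Whole-word matching of non-ASCII forms depends on the locale settings, so the result should be checked against the "miss" list.

```shell
# count_form FORM FILE
# Print how many times FORM occurs as a whole word in FILE.
# Hypothetical helper for illustration; not an Archimedes tool.
count_form() {
  # -o: print every match on its own line
  # -w: match whole words only
  # -F: treat FORM as a literal string, not as a pattern
  grep -o -w -F -- "$1" "$2" | wc -l | tr -d ' '
}
```

Called as count_form quantitas mytext.xml (file name assumed), it prints a single number that can be compared with the count shown in the "miss" list before and after the replacement.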
 +
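Both correction workflows above end with parsing the file before it is uploaded. As a local pre-check, well-formedness can be tested with xmllint from libxml2. This is a swapped-in substitute, not the project's XML Validator/SGML Parser, and it only checks well-formedness, not validity against the Archimedes schema. A minimal sketch, assuming xmllint is installed; the function name check_wellformed is hypothetical.

```shell
# check_wellformed FILE
# Return 0 if FILE is well-formed XML, 1 if it is not,
# and 2 if xmllint is not available on this machine.
check_wellformed() {
  if ! command -v xmllint >/dev/null 2>&1; then
    echo "xmllint not found; cannot check $1" >&2
    return 2
  fi
  # --noout: parse only, do not print the document
  if xmllint --noout "$1" 2>/dev/null; then
    echo "well-formed: $1"
  else
    echo "NOT well-formed: $1" >&2
    return 1
  fi
}
```

A file that fails this check would also fail in the validator, so running it after each working session catches broken tags early.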
 +===== Local Text administration (by St. Trzeciok) =====
 +
 +  - By command line
 +               - Prerequisites
 +                  - OS X or any other UNIX-based system
 +                  - User account and password for the Archimedes repository (provided by the IT department)
 +               - Establishing a local text repository on the desktop of your computer
 +                  - Create a folder on your desktop, named e.g. sources
 +                  - Open the program Terminal (a new shell window should appear on your screen)
 +                  - Change the directory to the new local text repository folder. Type: cd Desktop/sources and press Enter
 +                  - Tell CVS to connect over SSH. Type: export CVS_RSH=ssh and press Enter
 +                  - Download the texts from the permanent text repository. Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot co texts/archimedes/xml and press Enter
 +                  - Type in your password and press Enter
 +                  - The texts will be in the subdirectory sources/texts/archimedes/xml
 +               - Adding a new text to the permanent text repository
 +                  - Open the program Terminal (a new shell window should appear on your screen)
 +                  - Change the directory to the local text repository folder. Type: cd Desktop/sources and press Enter
 +                  - Tell CVS to connect over SSH. Type: export CVS_RSH=ssh and press Enter
 +                  - Schedule the new file for addition. Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot add filename.xml and press Enter
 +                  - Type in your password and press Enter
 +                  - Upload the file. Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot commit filename.xml and press Enter (cvs add only schedules the file; commit transfers it). Type in your password again if asked
 +                  - Press i (=insert) to add a comment
 +                  - Press Esc after completing your comment and type: :wq (=write and quit)
 +                  - The new file will be uploaded into the permanent text repository
 +               - Adding changed texts to the permanent text repository
 +                  - Open the program Terminal (a new shell window should appear on your screen)
 +                  - Change the directory to the local text repository folder. Type: cd Desktop/sources and press Enter
 +                  - Tell CVS to connect over SSH. Type: export CVS_RSH=ssh and press Enter
 +                  - Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot commit texts/archimedes/xml and press Enter
 +                  - Type in your password and press Enter
 +                  - Press i (=insert) to add a comment describing what you have changed in the files (very important!)
 +                  - Press Esc after completing your comment and type: :wq (=write and quit)
 +                  - The changed files will be uploaded into the permanent text repository
 +               - Refreshing your local text repository
 +                  - Open the program Terminal (a new shell window should appear on your screen)
 +                  - Change the directory to the local text repository folder. Type: cd Desktop/sources and press Enter
 +                  - Tell CVS to connect over SSH. Type: export CVS_RSH=ssh and press Enter
 +                  - Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot up and press Enter
 +                  - Type in your password and press Enter
 +                  - Your local repository will be updated from the permanent text repository
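
The checkout steps listed above can be collected in a small shell function. This is only a sketch under stated assumptions: the function names archimedes_cvsroot and checkout_texts and the DRY_RUN switch are hypothetical, while the server name, repository path and module are the ones given above. By default the function only prints the command, so it can be inspected before anything is run.

```shell
# Build the CVS root string for a given user name (hypothetical helper).
archimedes_cvsroot() {
  printf ':ext:%s@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot\n' "$1"
}

# checkout_texts USER [WORKDIR]
# With DRY_RUN=1 (the default in this sketch) the command is only printed;
# with DRY_RUN=0 the folder is created and the checkout is run over SSH.
checkout_texts() {
  user=$1
  workdir=${2:-$HOME/Desktop/sources}
  cmd="cvs -d $(archimedes_cvsroot "$user") co texts/archimedes/xml"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "cd $workdir && CVS_RSH=ssh $cmd"
  else
    # Subshell, so the caller's working directory is left unchanged.
    ( mkdir -p "$workdir" && cd "$workdir" && CVS_RSH=ssh $cmd )
  fi
}
```

Running checkout_texts username prints the command for review; DRY_RUN=0 checkout_texts username performs the checkout and asks for the password as described above.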
 +
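The commit steps above open the vi editor for the comment. cvs commit also accepts the comment directly with its -m option, which avoids the editor. The sketch below illustrates this; the function name commit_texts and the DRY_RUN switch are hypothetical, and the function refuses to run without a comment because the comment is described above as very important.

```shell
# commit_texts USER MESSAGE
# Commit the changed texts with MESSAGE as the log comment.
# With DRY_RUN=1 (the default in this sketch) the command is only printed.
commit_texts() {
  user=$1
  message=$2
  root=":ext:${user}@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot"
  if [ -z "$message" ]; then
    echo "refusing to commit without a log message" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'cvs -d %s commit -m "%s" texts/archimedes/xml\n' "$root" "$message"
  else
    # Run from the local text repository folder, e.g. Desktop/sources.
    CVS_RSH=ssh cvs -d "$root" commit -m "$message" texts/archimedes/xml
  fi
}
```

For example, commit_texts username "corrected gaps" prints the command, and DRY_RUN=0 commit_texts username "corrected gaps" runs it from the local text repository folder.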
archimedes_workflows.txt · Last modified: 2008/09/09 19:56 (external edit)