====== Archimedes Workflows ======

==== Documentation from Harvard (by M. Hyman/M. Schiefsky): ====

  * Archimedes Project Repository: Working Draft 4: http://archimedes.fas.harvard.edu/docs/repository/
  * Archimedes Bundle Structure: http://archimedes.fas.harvard.edu/docs/bundle/
  * The Donatus XML-RPC Interface: http://archimedes.fas.harvard.edu/docs/donatus-api/

==== Documentation from MPIWG (text by St. Trzeciok; additional files by B. Fuchs): ====

{{:wiki:archimedes:bve_doc.pdf|Documentation by B. Fuchs}}

  - Workflow Archimedes
    - Metadata
      - Produce a short text describing the author for: [[http://archimedes2.mpiwg-berlin.mpg.de/archimedes_templates/biography.html?-table=archimedes_authors]]
      - Upload the text to: FileMaker IT server, archimedes_authors
    - Producing images
      - see the documentation of the library
    - Transcription and production of the text-xmls
      - see transcr.pdf {{:wiki:archimedes:transcr.pdf|transcr.pdf}}
      - example list of entities used in Archimedes: entities.html {{:wiki:archimedes:entities.html|:wiki:archimedes:entities.html}}
    - Synchronisation of text-xmls and images
    - Production of thumbnails
      - see [[http://pythia.mpiwg-berlin.mpg.de/department1/archimedes/faq.html]]
    - Production of cut-outs (cut-outs are drawings or similar illustrations on the images, which are tagged inside the text-xmls)
      - use the cutout tool
      - Production: see workflow_online_cutout.pdf {{:wiki:archimedes:workflow_online_cutout.pdf|:wiki:archimedes:workflow_online_cutout.pdf}}
      - Postproduction: see cd_cutout_postprocess.txt {{:wiki:archimedes:cd_cutout_postprocess.txt|:wiki:archimedes:cd_cutout_postprocess.txt}} and online_cutout_postcutout.txt {{:wiki:archimedes:online_cutout_postcutout.txt|:wiki:archimedes:online_cutout_postcutout.txt}}
    - Correction of the text-xmls
      - Gap correction tool
        - see: gap_workflow.html {{:wiki:archimedes:gap_workflow.html|:wiki:archimedes:gap_workflow.html}}
        - NB: works only in Safari (Mac browser)
      - Frequency-sorted morphological "miss" lists
        - useful for "misses" which occur more than once
        - Tool: editor
        - Workflow:
          - download the chosen text-xml from the repository
          - choose the same text in the Frequency-sorted morphological "miss" lists
          - find the relevant words
          - copy the raw form and open the relevant text-xml in an editor
          - look up the copied raw form and replace it
          - save the text-xml
          - parse the xml with the XML Validator/SGML Parser
          - upload the text
        - Using BBEdit one can get a list of all occurring raw forms in the text (Smultron and jEdit do not have this option)
          - This makes it possible to compare the number of occurrences reported in the Frequency-sorted morphological "miss" lists with the number found using the editor, which helps a lot to avoid new mistakes.
        - There is also the option to add new forms to the list of the Formmaker tool
          - NB: Unfortunately there is no link from the Formmaker tool back to the Frequency-sorted morphological "miss" lists, which slows down the correction/supplementation of the morphology. Experience shows that it is much faster to use the Formmaker tool for morphology supplementation.
      - Correction of single mistakes + supplementation of the morphology by surveying morphosyntactic rules for neologisms and spelling variations
        - Experience shows that it is useful and time-saving to combine these different task steps
        - Tools:
          - Editor/Arboreal + Browser/Overviewtool + Formmaker Tool
          - NB: working with two monitors is recommended
        - Browser vs. Overviewtool
          - Using the Overviewtool instead of the ECHO or Archimedes environment in a browser gives the option to display the image and the text next to each other. It is recommended for correcting texts with many morphological misses that occur only once, because one does not spend time loading the images separately.
          - Unfortunately, with long xml-texts it takes a lot of time to load both the images and the text. In that case it slows the correction down rather than speeding it up.
          - NB: ECHO does not display
        - Editor vs. Arboreal
          - The main difference between correcting with Arboreal and correcting with an editor is the possibility to make changes to the xml-structure.
          - That means that people who are not supposed to change the xml-structure, or who want to prevent themselves from doing so, should rather use Arboreal.
        - Formmaker vs. Arboreal
          - The advantage of using Arboreal for the morphological supplementation is that the generated form can be sent directly to the server which does the morphological analysis.
          - With Formmaker, by contrast, the forms have to be added separately.
          - On the other hand it is sometimes better to work without Arboreal; e.g. if a lot of xml-structure has to be added because of unresolved abbreviations, Formmaker is useful as a separate tool.
        - Workflow Editor:
          - download the relevant text from the text repository
          - open the file in an editor
          - open the relevant text in the Overviewtool or in the ECHO/Archimedes environment using a browser
          - find the words displayed in black (in ECHO/Archimedes morphologically analysed forms appear in a brownish colour, unanalysed forms in black)
          - decide whether the form is a neologism/spelling variation or the result of a false transcription into xml
          - depending on the decision in step 5, either add the form to Formmaker or correct the mistake in the editor
          - save the file after a working session
          - parse the file with the XML Validator/SGML Parser
          - upload the text
        - Workflow Arboreal
          - download the relevant text from the text repository
          - open the file in Arboreal
          - generate IDs
            - This used to be done separately, either with the Sentence ID Tool [[http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/archim/sid]] or Ficus [[http://archimedes.fas.harvard.edu/docs/ficus.html]], but can now be done in Arboreal
            - NB: files which already have s-ids do not need this step
          - get a morphological analysis from Donatus and highlight the unanalysed forms
            - One can also load a morphology file from the last session, if one saved it. This may speed up the work when one handles large files or when Donatus has been shut down.
          - correct the highlighted term or add unknown vocabulary
          - send new vocabulary to Donatus
          - save the file (as well as the morphological analysis from Donatus if needed)
          - parse the file with the XML Validator/SGML Parser
          - upload the text
    - Producing parallel texts
      - see coord.pdf {{:wiki:archimedes:coord.pdf|:wiki:archimedes:coord.pdf}}
      - NB: Parallel texts cannot be displayed on Archimedes, only in the ECHO environment
    - Further tools according to [[http://archimedes.mpiwg-berlin.mpg.de/arch/archimedes.new.html]]
      - Lemmatized Corpus Search
        - see Revision History.pdf {{:wiki:archimedes:revision_history.pdf|:wiki:archimedes:revision_history.pdf}}
      - Dictionary Lookup Tool, Dictionary Headword Access
        - see Archimedes Project_Dictionary Service_Documentation.pdf {{:wiki:archimedes:archimedes_project_dictionary_service_documentation.pdf|:wiki:archimedes:archimedes_project_dictionary_service_documentation.pdf}} or [[http://archimedes.mpiwg-berlin.mpg.de/arch/doc/dict-server.html]]
      - XML Validator, SGML Parser
      - Corpus Language Statistics
        - used to display corpus word counts by language
      - xpath access ???
      - Working Group Home Page
        - GForge has been used for documentation and to coordinate tasks
        - older task list: [[http://pythia.mpiwg-berlin.mpg.de/department1/archimedes/tasklist.html]]

===== Local Text administration (by St. Trzeciok) =====

  - By command line
    - Prerequisites
      - OS X or any other UNIX-based system
      - User account and password for the Archimedes repository (provided by the IT department)
    - Establishing a local text repository on the desktop of your computer
      - Create a folder on your desktop named e.g. sources
      - Open the program Terminal (a new shell window should appear on your screen)
      - Change the directory to the new local text repository folder. Type: cd Desktop/sources and press Enter
      - Adjust the network protocol. Type: export CVS_RSH=ssh and press Enter
      - Download the texts from the permanent text repository. Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot co texts/archimedes/xml and press Enter
      - Type in your password and press Enter
      - The texts will be in the subdirectory sources/texts/archimedes/xml
    - Adding a new text to the permanent text repository
      - Open the program Terminal (a new shell window should appear on your screen)
      - Change the directory to the local text repository folder. Type: cd Desktop/sources and press Enter
      - Adjust the network protocol. Type: export CVS_RSH=ssh and press Enter
      - Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot add filename.xml and press Enter
      - Type in your password and press Enter
      - cvs add only schedules the file for addition; it reaches the repository with the commit that follows. Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot commit filename.xml and press Enter, then type in your password again if asked
      - Press i (=insert) to add a comment in the editor that opens
      - Press Esc after completing your comment and type: :wq (=write and quit)
      - The new file will be uploaded into the permanent text repository
    - Adding changed texts to the permanent text repository
      - Open the program Terminal (a new shell window should appear on your screen)
      - Change the directory to the local text repository folder. Type: cd Desktop/sources and press Enter
      - Adjust the network protocol. Type: export CVS_RSH=ssh and press Enter
      - Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot commit texts/archimedes/xml and press Enter
      - Type in your password and press Enter
      - Press i (=insert) to add a comment describing what you have done with the files (very important!)
      - Press Esc after completing your comment and type: :wq (=write and quit)
      - The changed files will be uploaded into the permanent text repository
    - Refreshing your local text repository
      - Open the program Terminal (a new shell window should appear on your screen)
      - Change the directory to the local text repository folder. Type: cd Desktop/sources and press Enter
      - Adjust the network protocol. Type: export CVS_RSH=ssh and press Enter
      - Type: cvs -d :ext:username@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot up and press Enter
      - Type in your password and press Enter
      - Your local repository will be updated from the permanent text repository
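
The command-line steps above repeat the same repository address and the same CVS_RSH setting each time. The following is a minimal sketch of how they could be wrapped in a small shell helper so that the address is typed only once. The names ARCH_USER, ARCH_CVSROOT, DRY_RUN, run and arch_cvs are hypothetical and introduced only for this example; they are not part of the Archimedes tooling. The cvs subcommands and the repository address are the ones given in the steps above.

```shell
#!/bin/sh
# Sketch only: wraps the cvs calls described in the steps above.
# ARCH_USER, ARCH_CVSROOT, DRY_RUN, run and arch_cvs are hypothetical names.

# Repository account; "username" is a placeholder, as in the steps above.
ARCH_USER="${ARCH_USER:-username}"
ARCH_CVSROOT=":ext:${ARCH_USER}@archimedes.mpiwg-berlin.mpg.de:/archimedes/cvsroot"

# Make cvs connect over ssh (the "Adjust the network protocol" step).
export CVS_RSH=ssh

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run one cvs subcommand against the permanent text repository.
arch_cvs() {
    run cvs -d "$ARCH_CVSROOT" "$@"
}

# Possible usage, run from the local repository folder (e.g. Desktop/sources):
#   arch_cvs co texts/archimedes/xml       # first download
#   arch_cvs up                            # refresh the local copy
#   arch_cvs commit texts/archimedes/xml   # upload changed texts
```

Setting DRY_RUN=1 before calling arch_cvs only prints the command, which is a cheap way to check the address before anything is sent to the server. cvs commit also accepts the comment directly through its -m option, which would skip the editor steps; whether that suits the project's conventions for commit comments is left open here.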