Background

The Online Text Correction (OTC) project began in earnest in January 2011, and for the first time we have given limited public access to Dickens Journals Online (DJO). We need to correct about 30,000 journal pages, not including the Household Narrative. This is a bit too much for our small team of three, and we desperately need volunteers! We'd love to complete this project before the official launch of the DJO site in 2012, and can do so with your help.

A quick technical review will help the reader understand the need for text correction: we store two files (or records) for every journal page, one being a facsimile of the original page stored as an image file (in ".jpg" format), and the other being a text file, similar to this web page. The text file was produced by applying optical character recognition (OCR) software to the image file, and the accuracy of that process depends on the quality of the image file.

Although the image files were created using a state-of-the-art scanning device, the quality of the original journal pages varied, and some contained paper folds, smudge marks, transparency, etc. As a result the text files contain a number of errors that vary from file to file. This is the main problem that we are trying to correct. A secondary problem, relatively trivial, is that the text file contains unwanted information and styling, which can be removed at the same time as the actual mistakes are corrected.

We have decided to make a magazine, typically 24 pages long, the smallest unit of contribution, which gives us 1,101 units of work in total. So if we find around 1,000 volunteers to take on 1 or 2 magazines each, we will reach the target between us. We reckon that, for a typical magazine, it will take about 10 minutes to review and correct each page (240 minutes, or 4 hours' work, in all). Please pass the details to friends you think might be interested.
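
For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate above. The figures (24 pages, 10 minutes per page, 1,101 magazines, 1,000 volunteers) come from the text; the function names are purely illustrative and are not part of the DJO site.

```python
# Figures taken from the paragraph above.
PAGES_PER_MAGAZINE = 24   # typical magazine length
MINUTES_PER_PAGE = 10     # estimated time to review and correct one page
TOTAL_MAGAZINES = 1101    # units of work


def minutes_per_magazine(pages=PAGES_PER_MAGAZINE, minutes_per_page=MINUTES_PER_PAGE):
    """Estimated time, in minutes, to correct one magazine."""
    return pages * minutes_per_page


def magazines_per_volunteer(volunteers, total=TOTAL_MAGAZINES):
    """Average number of magazines each volunteer would need to take on."""
    return total / volunteers


print(minutes_per_magazine())         # 240 minutes
print(minutes_per_magazine() / 60)    # 4.0 hours
print(magazines_per_volunteer(1000))  # just over one magazine each
```

Longer magazines (some run to 36 or 48 pages) scale in the same way: minutes_per_magazine(36) gives 360 minutes, or 6 hours.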

The Project Scope

The goal of this project is to transform each and every text file into the simplest and most significant form possible without the loss of information. These files will then be used in the future as the basis for a wider set of solutions and projects: access for handheld reading devices, speech synthesis, intricate research via a TEI compatible format, etc. Three clear objectives have been defined for this project:

  1. Correct the OCR errors.
    OCR errors include spelling mistakes, punctuation mistakes, mixed-up sentences and paragraphs, etc.

  2. Remove the information about the volume and magazine.
    Volume information includes page numbers and publication information, but excludes footnotes relating to article content. Magazine information includes the magazine title.

  3. Remove (most of) the original page styling.
    This includes font types, text sizes, etc.

Registering and Selecting a Magazine

  1. Create your DJO site account, if you have not done so before.
    • Visit the Registration Page here, or follow the "Create an account" link on the Login section of the Homepage.
    • Note that on registration an automated email will be sent from this site to the email address you specified on the registration form; this email contains an activation link, and your account will not be activated until you click on this link. The sender addresses are hidden on this page to protect them from spambots; please check your spam folder if you do not see this email in your in-box immediately.
  2. Log in to the DJO site.
    • Use the Login form on the Homepage or click here.
  3. Select an uncorrected magazine for correction.
    • Hover over the 'Online Text Correction' tab on the Homepage, and select the 'Uncorrected Magazines' option from the drop-down menu.
    • The page will refresh to show a list of 'The Uncorrected Magazines'.
    • Make sure that the User Access Filter is set to 'Registered User' (and not 'Moderator [Advanced]').
    • On the resulting list, select the first available magazine OR scroll down to find a magazine you have a particular interest in; click on the 'Correction Record' link for that magazine.
    • On the 'Correction Record' look under 'Options / For the text corrector,' and click on 'Volunteer.' You will be asked to confirm this; click on OK to proceed.
    • TIP - now that you have selected a magazine, you can always find it again under 'My Links' on the homepage, under The OTC Panel. Click on the 'Yes' to proceed to the Correction Record for the magazine you are working on.
    • Your magazine will now be marked as "Correction in Progress": this will prevent anyone else from trying to correct the same magazine.

Correcting an Individual Page

1. Proceed or return to the "Correction Record" for the magazine you have selected; scroll or jump down to 'Page Details,' where you will see a series of thumbnail images of each of its pages.

2. Click on 'Edit' for the first page, and the 'Page Editor' screen will display, similar to the one in the image below. Please ignore the 'Overlooked Corrections' panel for now. TIP – you can re-size the central Text Editor panel by clicking and dragging on the diagonal handle in the bottom right-hand corner of its frame.

[Image: the OTC Page Editor screen]

3. Above the Page Facsimile and Text Editor panels, under 'Options,' you will see an interface with the following key features:

  • A 'Save Now' button, which saves any changes you've made to a page, and allows you to continue editing that page; an 'Exit' button, which is your way back to the standard Read view, where you can move on to the next page. If you have used 'Save Now' first, you will not lose any changes when you exit the page.
  • An Autosave box, which you can disable if you wish. It is set at present to save every 5 minutes.
  • You will find more information below the Page Facsimile and Text Editor panels that can be used without having to leave the interface.

If you do not see this interface, or if you see the notice that you are a guest, then please log in (step 2 of "Registering and Selecting a Magazine" above) before proceeding.

Please keep the following in mind while you read the remainder of this section (see image above):

  • Corrections can only ever be made to the text file, available in the Text Editor (under 'Edit' on the 'Page Details' section of the magazine Correction Record), and this section is about how to make those corrections.
  • The Text Editor is modified so as to highlight paragraphs and tables, two important structural elements of each page.
  • The text file will always only be one column of information; please never try to recreate the double-column format that you see in the page facsimile image.

For each page, please carry out the following steps:

  1. Remove the Information about the Volume and Magazine

    We recommend that you do this first, otherwise you might correct text that will be removed later. Please use the following guides for each type of magazine page.

    Cover Page Guide

    1. Remove the magazine title, up to, but not including, the title of the first article (except in the case of Extra Christmas Numbers, in which the masthead, title and index should be retained).
    2. Remove the footer that includes the volume and magazine number.
    Example: Household Words No.1, Page 1 (link)

    1. The magazine title, starting at the quotation from Shakespeare, and ending at the price, was removed, up to, but not including "A PRELIMINARY WORD.".
    2. The footer text was removed: "Vol. I" and "1".

     

    Internal Page Guide

    1. Remove the page header.
    Example: Household Words No.1, Page 2 (link)

    1. The header text was removed: "2", "HOUSEHOLD WORDS" and "Conducted by".

    Closing Page Guide

    1. Remove the page header.
    2. Remove information regarding the publisher and/or printers.
    Example: Household Words No.1, Page 24 (link)

    1. The header text was removed: "24" and "HOUSEHOLD WORDS.".
    2. The following comment about the publisher and printer was removed:
      "Published at the Office, No. 16, Wellington Street North, Strand; and Printed by BRADBURY & EVANS, Whitefriars, London."

      The final page of some magazines ends with an ADVERT for volume editions of the journals or of serialised fiction. PLEASE DO NOT DELETE THESE ADVERTS BUT RETAIN THE TEXT AS IT APPEARS IN THE ORIGINAL (it is not necessary to retain the original layout of the advert – just place it all in one paragraph).
  2. Correct the OCR Errors.

    The severity and amount of OCR errors will vary from page to page, and the effort needed to correct them in turn will also vary. We find that pages in close proximity sometimes share the same problems, hence some magazines will be more complicated to correct, and it might be easier for the volunteer to start with a magazine with fewer errors.

    Guide on Correcting Page Level Errors

    Page elements refer mostly to paragraphs, but can also include the occasional table and/or image.

    1. Fix mixed up columns.

      Once in a while you might find that the OCR software missed the border between multiple columns, and only "saw" one column; this is depicted in the two images to the right. The image with the green blocks shows the correct column structure, while the red blocks show how the OCR might have got the column structure wrong. We recommend that you delete and rewrite the mixed-up sections (red 4), since untangling the interleaved threads will almost always be more difficult. But the decision is at the corrector's discretion.

      [Images: correct column structure (green) and incorrect column structure (red)]
    2. Fix out of order information.

      Please make sure the content inside the text file follows the flow of information, and not simply the columns. The image to the right shows the same page, with the right section order (green) and the wrong section order (red). This type of error is very unusual, and also quick to fix. Simply cut and paste the sections until the page is in the right order.

      [Image: right section order (green) and wrong section order (red)]
    3. Fix missing structures.
      If you find content that is not inside a paragraph, or a table, then please wrap the content with the right element. Of course if you also find that some content is missing, then please recreate it with the missing elements.

    4. Delete empty elements.
      If you find extra empty paragraphs, or tables, then please remove them.

    Guide for Paragraphs, Tables and Images

    How to create, delete and change:

    • Paragraphs:
      You create a paragraph when you press [ENTER], and you can delete a paragraph with the [BACKSPACE] button if you put the cursor inside or behind the paragraph. (NB This function may work rather more consistently using the Mozilla Firefox browser.)
      You create new lines inside a paragraph by holding down [SHIFT] while you press [ENTER]. This is the same convention that is used by most document editors.

    • Tables:
      You create a table with the 'Insert a new table' button found on the text file editor, or by right-clicking on the text file editor. You get more table options by right-clicking on a table, for example adding more rows and columns, as well as deleting tables.

    • Images:
      There is no mechanism at the moment to insert images into the text file: we recommend using a placeholder for the time being. Please create a paragraph and put the text "{IMAGE}" inside it, including the curly brackets.
    Please fix the following:

    • Fix spelling and punctuation.
      It is important to ensure that every paragraph contains the exact same text as the facsimile image, across the same number of lines. The process is the same for each paragraph; compare all the words, punctuation and new lines, and if the text file is different please change it.

      Tips on correcting text:
      • Use the custom key icon to insert more complex characters (for example the pound sign '£', which should be used instead of 'l.'). Please italicise s. and d. for shillings and pence, even when not italicised in the original.
      • You can use the content editor's built-in spell checker, but please use the original spelling from the facsimile, if different. (The exception would be if you find a genuine mistake, e.g. a compositor's error, in the facsimile, in which case you should silently correct it. NB Dickens's journals regularly omit the final 'l' in words like 'recal': this is not an error.)
  3. Remove (most of) the original page styling.
  3. Remove (most) of the original page styling.

    Fortunately most of this operation is automated, and there is in fact very little for you to do regarding this objective. However, there are a few things that we ask you to do, and a few to avoid:

    Please include the following actions:
    • Reconnect paragraphs split by columns.
    • Retain the original length and layout of the lines.
    • Put each article title, including those split over multiple columns, into one paragraph.
    • Reconnect words split over two lines by hyphens. See FAQs for detailed advice, but the basic principle is to take up, or take down, the fewest number of characters to make the word whole. (Use personal discretion if the word has the same number of characters on each side of the split.)
    • Recreate underlined, italics or bold text, except in article headings.
    • Recreate the new lines of paragraphs.
    Please avoid the following actions:
    • Please do not try to recreate the indentation used for paragraphs.
    • Please do not alter the length of lines: retain the original line length and layout.
    • Please do not try to recreate multiple columns: there should be only one column.
    • Please do not try to change the font or text types.
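
The hyphen rule in the list above (take up, or take down, the fewest characters) can be sketched in code. This is only an illustration of the principle, not a tool provided by the DJO site: the function name and the tie parameter are hypothetical, the sketch counts any punctuation attached to a fragment as part of it, and it cannot tell a split word from a genuine hyphenated compound, so the FAQs and your own judgement still apply.

```python
from typing import Tuple


def rejoin_split_word(upper: str, lower: str, tie: str = "up") -> Tuple[str, str]:
    """Rejoin a word split by a hyphen across two lines, moving the fewest characters.

    `tie` is a hypothetical stand-in for the corrector's discretion when both
    fragments have the same length: "up" moves the tail up, anything else
    moves the head down.
    """
    upper = upper.rstrip()
    lower = lower.lstrip()
    if not upper.endswith("-") or not lower:
        return upper, lower  # nothing to rejoin

    # Fragment before the hyphen: the last token on the upper line.
    rest_of_upper, _, head = upper[:-1].rpartition(" ")
    # Fragment after the hyphen: the first token on the lower line.
    tail, _, rest_of_lower = lower.partition(" ")

    word = head + tail
    move_up = len(tail) < len(head) or (len(tail) == len(head) and tie == "up")
    if move_up:
        # Take the shorter tail up to the end of the upper line.
        return f"{rest_of_upper} {word}".strip(), rest_of_lower
    # Take the shorter head down to the start of the lower line.
    return rest_of_upper, f"{word} {rest_of_lower}".strip()


print(rejoin_split_word("it was a remark-", "able day indeed"))
print(rejoin_split_word("he said un-", "fortunately it rained"))
```

In the first call the tail "able" is shorter than the head "remark", so it is taken up; in the second the head "un" is shorter, so it is taken down.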

Submitting your magazine for approval by a moderator

  1. Please note that you must save every page before you are allowed to proceed to this step. This means if by chance you find a page that has no errors then you will still have to open it in 'Edit' view, and save it, before proceeding to the next page.
    • On completion of all pages: under 'Correction Details' on the magazine's Correction Record, you will see an update on the number of pages corrected, which should now state '24 of 24 pages' (or 36/36, or 48/48, depending on the number of pages). You will now therefore see, under 'Options / For the text corrector', a button allowing you to submit the magazine for approval:
    [Image: the Submit button]
    • The magazine's status, under 'Correction Details,' will now change to "Waiting for Approval"; you will no longer be able to make changes to your magazine.
  2. Moderators (either part of the small in-house team working on Dickens Journals Online or experienced volunteers who have been approved for moderation work) will select your magazine in strict order of submission date.
  3. You will be notified via an email of the outcome of the moderators' decision, but many thanks for getting this far!
    • If the moderator has found that, for example, you missed a page, then you will be asked via email to complete the missing page. Or, if there is a certain category of error we would like you to remove more consistently, you will be sent a short summary, and be directed to the 'Page Details' on the Correction Record, where a checklist of what still needs to be done will appear. At the same time the magazine will once again be tagged as "Correction in Progress".
    • If the moderator accepts your corrections then you may stop there or, if you'd like to correct another magazine, please start again from step 2. The completed magazine will be marked as "Corrected".
    • Please be patient while waiting for a moderator's decision; we will do our best to review correction work regularly. Should you wish to check on progress at any time, please feel free to email us with an enquiry. We will be delighted to hear from you.
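
The life cycle of a magazine described in this guide can be summarised as a small state machine. The sketch below is an illustration only and says nothing about how the DJO site is actually built: the status labels "Correction in Progress", "Waiting for Approval" and "Corrected" are taken from the text, while the "Uncorrected" label, the action names and the function name are assumptions made for the example.

```python
from enum import Enum


class Status(Enum):
    UNCORRECTED = "Uncorrected"             # assumed label, from the 'Uncorrected Magazines' list
    IN_PROGRESS = "Correction in Progress"
    WAITING = "Waiting for Approval"
    CORRECTED = "Corrected"


# Allowed (status, action) pairs; the action names are hypothetical.
TRANSITIONS = {
    (Status.UNCORRECTED, "volunteer"): Status.IN_PROGRESS,  # you click 'Volunteer'
    (Status.IN_PROGRESS, "submit"): Status.WAITING,         # you submit for approval
    (Status.WAITING, "return"): Status.IN_PROGRESS,         # moderator asks for more work
    (Status.WAITING, "accept"): Status.CORRECTED,           # moderator accepts
}


def next_status(status, action, pages_saved=0, pages_total=0):
    """Return the status that follows `action`, or raise ValueError if it is not allowed."""
    if action == "submit" and pages_saved < pages_total:
        # Every page must be opened in 'Edit' view and saved first.
        raise ValueError("every page must be saved before submitting")
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed while {status.value!r}") from None


status = next_status(Status.UNCORRECTED, "volunteer")
status = next_status(status, "submit", pages_saved=24, pages_total=24)
print(status.value)  # Waiting for Approval
```

Note that a magazine in "Waiting for Approval" has no editing action available, which mirrors the rule that you can no longer make changes once it has been submitted.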
