| The Standards for e-books processing
1 Procedures of e-books processing
2 Standards of each process
2.1 Standards for scanned image
2.2 Standards for image processing
2.3 Standards for metadata input
2.4 Standards for content input
2.5 Standards of quality check
1 Procedures of e-books processing
Book scanning -> Spot clearing and skew correcting -> Transforming image format
Metadata inputting -> Content-navigation constructing
(both branches) -> Packaging -> Quality checking
2 Standards of each process
To simplify management, every book to be digitized is assigned a number in the uniform format "AADDDDDD", where AA identifies the resource-processing center and DDDDDD is the unique identifier of the book.
Each resource-processing center may allocate and manage serial numbers on its own, but it must ensure a one-to-one correspondence between books and numbers: no book may have two numbers, and no number may be used for two books.
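The numbering rule above can be checked mechanically. The following is a minimal sketch, not part of the standard: it assumes that "AA" means two uppercase ASCII letters and "DDDDDD" means six decimal digits, which the text does not state explicitly, and the function names are hypothetical.

```python
import re

# Assumption: "AA" = two uppercase ASCII letters, "DDDDDD" = six decimal digits.
BOOK_NUMBER = re.compile(r"[A-Z]{2}[0-9]{6}")


def is_valid_book_number(number: str) -> bool:
    """Return True if the number matches the assumed AADDDDDD pattern."""
    return BOOK_NUMBER.fullmatch(number) is not None


def find_conflicts(assignments):
    """Check the one-to-one rule on (book, number) pairs.

    Returns two sorted lists: books that carry more than one number,
    and numbers that are used for more than one book.
    """
    numbers_by_book = {}
    books_by_number = {}
    for book, number in assignments:
        numbers_by_book.setdefault(book, set()).add(number)
        books_by_number.setdefault(number, set()).add(book)
    books = sorted(b for b, nums in numbers_by_book.items() if len(nums) > 1)
    numbers = sorted(n for n, bks in books_by_number.items() if len(bks) > 1)
    return books, numbers
```

A center could run such a check over its own register before handing books to the scanning step.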
2.1 Standards for scanned image
1. Naming
All scanned images should be identical to the original pages, with no upside-down, missing, repeated or wrong pages. They should be stored as TIFF (Tagged Image File Format) files in page order and named sequentially, starting from 00000001.tif.
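As an illustration of the naming rule, the sketch below generates the expected eight-digit file names and reports gaps in a list of scanned files. The helper names are hypothetical, and the check assumes that the number of distinct file names equals the number of pages scanned.

```python
def expected_name(page_index: int) -> str:
    """Return the file name for a 1-based page index, e.g. 1 -> 00000001.tif."""
    return f"{page_index:08d}.tif"


def check_sequence(filenames):
    """Compare file names against the sequence 00000001.tif, 00000002.tif, ...

    Returns (missing, unexpected): names that should be present but are not,
    and names that fall outside the expected sequence.
    """
    present = set(filenames)
    expected = [expected_name(i) for i in range(1, len(present) + 1)]
    missing = [name for name in expected if name not in present]
    unexpected = sorted(present - set(expected))
    return missing, unexpected
```

For example, a directory holding 00000001.tif, 00000002.tif and 00000004.tif would be reported as missing 00000003.tif, with 00000004.tif flagged as out of sequence.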
2. Resolution
Images will be scanned at a resolution of 600 dots per inch (dpi);
Pages with grayscale images: 600 dpi at 256 gray levels;
Color books: 600 dpi with 32K-color scanning;
Image files should be stored in the OTIF directory as TIFF files using the CCITT Group 4 (CCITT4) compression algorithm.
3. Skew degree
The skew of a scanned image should be below 3°, and no part of the image should be tilted or distorted, so that the images are as readable as the original pages. For thick books, or books whose binding is close to the text, a few characters near the page edge may be distorted after scanning, but the text must remain clear.
4. Definition
Characters in scanned images should be clear, with no missing or overlapping strokes, and should be neither too light nor too dark. The images should be readable even if the original pages are stained, rusted, too light or too dark.
For pages that are too thin or too dark, characters on the reverse side are easily scanned together with the image. In this case the scanned characters should still be readable, although the marks showing through on the characters are usually difficult to remove.
The dark borders surrounding the page images should not exceed 0.5 cm in width, and fingerprints and dark borders should not overlap the text.
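Because the border limit is given in centimetres while scans are measured in pixels, a conversion helps when checking it. The sketch below is only illustrative: it uses 2.54 cm per inch and the 600 dpi scanning resolution stated in this section, and the function names are hypothetical.

```python
CM_PER_INCH = 2.54


def cm_to_pixels(length_cm: float, dpi: int) -> float:
    """Convert a physical length in centimetres to pixels at the given dpi."""
    return length_cm / CM_PER_INCH * dpi


def border_within_limit(border_px: float, dpi: int = 600, limit_cm: float = 0.5) -> bool:
    """Return True if a measured border width (in pixels) is within the limit."""
    return border_px <= cm_to_pixels(limit_cm, dpi)
```

At 600 dpi the 0.5 cm limit corresponds to roughly 118 pixels (0.5 / 2.54 × 600 ≈ 118.1).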
5. Image Contents
The page content should be roughly centered in the image, not noticeably close to the left or right edge, with the header and footer information intact.
The content of each image should be exactly identical to that of the original page: everything on the original page should be included, and nothing from adjacent pages should be scanned.
2.2 Standards for image processing
1. All information in the processed image should be exactly identical to that in the original book. No useful information on the page, such as text, headers, footers, handwritten notes and seals (with the exception of library collection stamps), should be deleted.
2. All dark lines, fingerprints and shadows produced by scanning should be removed.
3. The skew of the page should be below 1°.
4. The resulting images should be stored in the PTIF directory in DjVu format at 300 dpi.
2.3 Standards for metadata input
The metadata should conform to the Dublin Core (DC) standard, and metadata input must be entirely correct, that is, input accuracy should reach 100%.
1. Title:
For the metadata of degree dissertations:
XX (degree) dissertation of XX University
------.. (title of the dissertation)
For the metadata of general books:
(1) If the title on the title page is not consistent with that on the cover, the one on the title page should be input.
(2) If the title page carries only the series title, the title should be input in the format "title of series: title of the book".
2. Author: author of the dissertation
3. Subject and key words:
4. Description: The first sentence in the abstract
5. Publisher: XX University
6. Other participants: the supervisor
7. Date: Date on the cover of the dissertation
8. Resource type: TEXT.ABSTRACT
9. Format: TEXT.HTML
10. Language: Chinese or English
Other metadata, such as resource identifier, rights management and coverage, will not be input until their standards are determined.
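To make the field list concrete, the following sketch builds a record for a dissertation using Python's standard xml.etree.ElementTree module and the Dublin Core element namespace. It is an illustration under assumptions, not the format required by this standard: the wrapper element "record", the mapping of "Author" to dc:creator and "Other participants" to dc:contributor, and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build an XML element holding one dc:* child per (name, value) pair."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Placeholder values only; the mapping to DC element names is an assumption.
fields = [
    ("title", "XX (degree) dissertation of XX University: title of the dissertation"),
    ("creator", "author of the dissertation"),
    ("subject", "subject and key words"),
    ("description", "first sentence of the abstract"),
    ("publisher", "XX University"),
    ("contributor", "the supervisor"),
    ("date", "date on the cover"),
    ("type", "TEXT.ABSTRACT"),
    ("format", "TEXT.HTML"),
    ("language", "Chinese"),
]
record = build_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
```

Identifier, rights and coverage are left out on purpose, since the text defers them until their standards are determined.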
2.4 Standards for content input
1. All navigation information should be input correctly, including the contents titles and their corresponding page numbers, and all special characters in the contents, such as §.
2. All contents titles should be input together with their corresponding page numbers. If a page number is missing from the text or incorrect in the contents, it must be corrected according to the book before input.
3. All navigation information should be input neatly, with a space between the serial number and the chapter, section or title.
4. Other content items, such as cover, contents, outline, abstract, preface, references, supplements, acknowledgements, title page, postscript, oration, introduction, contributions, index, notes, terminology list, copyright form list and illustration list, should be input according to the book, together with the corresponding page numbers.
Note: No space or other special characters can be input in this item.
5. The completed XML files should be saved in the TOC folder of the corresponding book.
6. Special items.
(1) In CATCREATOR, inputting certain special characters can cause errors in CATALOG..XML, so Chinese or English words with the same meaning can be substituted for these characters.
(2) Special tags in contents
a. superscript and subscript
For example: X¯2 means X² (2 as a superscript); X_2 means X₂ (2 as a subscript)
b. complex fraction
For example: [(A+B)/(C+D)]/[(E+F)/(G+H)] stands for the compound fraction $\dfrac{\frac{A+B}{C+D}}{\frac{E+F}{G+H}}$, that is, (A+B)/(C+D) over (E+F)/(G+H).
c. radical sign
Digits under the radical sign: for example, √2 means 2 under the radical sign.
Note: 3√2 (the third root of 2, with 3 as the index of the radical) is different from 3*√2 (three times √2).
Expression under the radical sign: for example, √(A+B) means A+B under the radical sign.
(3) Chinese characters should be input in simplified or traditional form, following the original book. For characters not included in GBK, the corresponding Chinese Pinyin can be input.
(4) Special characters that cannot be input by any means should be replaced by the character "#".
(5) If the contents of Volume II of a book are printed only in Volume I, they should also be added to Volume II.
(6) The general contents should be input if the book has one.
(7) If there is no title on the book cover, it should be input manually.
(8) The title of the compiling committee should be input as it appears in the original book.
(9) For contents titles containing both Chinese and English (or other-language) characters, all characters preceding the page number should be input.
(10) For books with both Chinese and English contents, only the Chinese one is input.
(11) If a title in the contents is not consistent with that in the text, the one in the text should be input.
(12) For books without contents, a three-level contents should be created and input.
(13) For titles with too many characters, only the first twenty characters are input, followed by the suspension points ".." in place of the remaining characters.
(14) If a page number in the contents is incorrect, the actual page number should be input.
(15) If a book has two or more contents and one of them is part of another, the most complete one should be input.
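Two of the special items above, the twenty-character limit in (13) and the GBK check in (3), lend themselves to a small helper. The sketch below is illustrative only: it counts one Unicode code point as one character, which the text does not specify, and the function names are hypothetical. Choosing the Pinyin or "#" substitute for a reported character is left to the operator.

```python
def shorten_title(title: str, limit: int = 20, marker: str = "..") -> str:
    """Keep the first `limit` characters and append the marker if the title is longer."""
    if len(title) <= limit:
        return title
    return title[:limit] + marker


def non_gbk_characters(text: str):
    """Return the characters in `text` that cannot be encoded in GBK."""
    missing = []
    for ch in text:
        try:
            ch.encode("gbk")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing
```

An operator could run non_gbk_characters over each contents title before input and handle any reported characters by hand.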
2.5 Standards of quality check
See 2.1-2.4 above.
|