This table contains data directly pertaining to documents. It includes all fields relevant to every known type of document in ISI Web of Science data; that is, a Document is not always an article; it may equally be a conference paper or another document type.

All "arbitrary" fields found in the original ISI dataset are also appended to this table, since this table is central to the database schema. Any field that is not in the following list is considered arbitrary:

  • AB (Abstract)
  • AR (New Article Number)
  • AU (Authors)
  • AF (Author Full Names)
  • BA (Book Authors)
  • BE (Edited By)
  • BF (Book Author Full Name)
  • BP (Beginning Page)
  • SE (Book Series Title)
  • BS (Book Series Subtitle)
  • CP (Cited Patent)
  • NR (Cited Reference Count)
  • CR (Cited References)
  • CY (Cited Year)
  • OI (ORCID ID)
  • RI (Author Identifier)
  • HO (Conference Host)
  • CL (Conference Location)
  • SP (Conference Sponsors)
  • CT (Conference Title)
  • MA (Meeting Abstract)
  • DT (Document Type)
  • DI (DOI)
  • D2 (Book DOI)
  • ED (Editors)
  • EM (E-mail Addresses)
  • EF (End of File)
  • EP (Ending Page)
  • ER (End of Record)
  • FN (File Type)
  • SO (Full Journal Title)
  • FU (Funding Agency and Grant Number)
  • FX (Funding Text)
  • BN (ISBN)
  • SN (ISSN)
  • GA (ISI Document Delivery Number)
  • JI (Abbreviated ISO Journal Name)
  • IS (Issue)
  • LA (Language)
  • ID (New ISI Keywords)
  • PG (Number of Pages)
  • DE (Original Keywords)
  • PN (Part Number)
  • PD (Publication Date)
  • PT (Publication Type)
  • PY (Publication Year)
  • PA (Publisher Address)
  • PI (City of Publisher)
  • PU (Publisher)
  • WP (Publisher Web Address)
  • RP (Reprint Address)
  • C1 (Research Addresses)
  • SI (Special Issue)
  • SC (Subject Category)
  • SU (Supplement)
  • TC (Times Cited)
  • TI (Title)
  • J9 (Abbreviated Journal Name)
  • UT (Unique ID)
  • VR (Version Number)
  • VL (Volume)
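As a rough illustration of the distinction between handled and arbitrary fields, the following sketch splits one parsed ISI record by tag. The record representation (a dict from tag to raw string) and the function name `split_arbitrary_fields` are assumptions made for this example only; they are not the loader's actual code, and how arbitrary fields are physically stored is not shown.

```python
# Tags taken from the list above; everything else counts as "arbitrary".
HANDLED_TAGS = set(
    "AB AR AU AF BA BE BF BP SE BS CP NR CR CY OI RI HO CL SP CT MA "
    "DT DI D2 ED EM EF EP ER FN SO FU FX BN SN GA JI IS LA ID PG DE "
    "PN PD PT PY PA PI PU WP RP C1 SI SC SU TC TI J9 UT VR VL".split()
)


def split_arbitrary_fields(record):
    """Split a record (hypothetical dict of tag -> raw string) in two.

    Returns (handled, arbitrary), where `arbitrary` holds every tag
    that does not appear in the list of handled tags.
    """
    handled = {tag: value for tag, value in record.items() if tag in HANDLED_TAGS}
    arbitrary = {tag: value for tag, value in record.items() if tag not in HANDLED_TAGS}
    return handled, arbitrary


# "ZZ" is a made-up tag standing in for any field not in the list.
record = {"TI": "An example title", "PY": "2009", "ZZ": "some custom value"}
handled, arbitrary = split_arbitrary_fields(record)
```

Here `arbitrary` ends up holding only the made-up "ZZ" entry, which is the kind of field that would be appended to this table.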

Note that the list of tags above only enumerates the ISI fields handled by the loader, not the fields that end up in this table. This table contains the following fields:

  • PK: Automatically generated. Guarantees uniqueness.
  • ABSTRACT_TEXT: The exact string as found in the "AB" field.
  • ARTICLE_NUMBER: The exact string as found in the "AR" field.
  • BEGINNING_PAGE: The contents of the "BP" field parsed as an integer.
  • CITED_REFERENCE_COUNT: The contents of the "NR" field parsed as an integer.
  • DIGITAL_OBJECT_IDENTIFIER: The exact string as found in the "DI" field.
  • DOCUMENT_TYPE: The exact string as found in the "DT" field.
  • DOCUMENT_VOLUME: The contents of the "VL" field parsed as an integer.
  • ENDING_PAGE: The contents of the "EP" field parsed as an integer.
  • FIRST_AUTHOR_FK: If any authors were provided (in the "AU" field), this will be the foreign key into the PERSON table of the first author specified. See AUTHORS and PERSON.
  • FUNDING_AGENCY_AND_GRANT_NUMBER: The exact string as found in the "FU" field.
  • FUNDING_TEXT: The exact string as found in the "FX" field.
  • ISBN: The exact string as found in the "BN" field.
  • ISI_DOCUMENT_DELIVERY_NUMBER: The exact string as found in the "GA" field.
  • ISI_UNIQUE_ARTICLE_IDENTIFIER: The exact string as found in the "UT" field.
  • ISSUE: The exact string as found in the "IS" field.
  • LANGUAGE: The exact string as found in the "LA" field.
  • PAGE_COUNT: The contents of the "PG" field parsed as an integer.
  • PART_NUMBER: The exact string as found in the "PN" field.
  • PUBLICATION_DATE: The exact string as found in the "PD" field. This is not a date-typed field because the value may be a range or list of dates, or it may contain arbitrary text.
  • PUBLICATION_YEAR: The contents of the "PY" field parsed as an integer.
  • DOCUMENT_SOURCE_FK: If a source was provided (see SOURCE for an explanation of which ISI fields end up in the SOURCE table), this will be the foreign key into the SOURCE table.
  • SPECIAL_ISSUE: The exact string as found in the "SI" field.
  • SUBJECT_CATEGORY: The exact string as found in the "SC" field.
  • SUPPLEMENT: The exact string as found in the "SU" field.
  • TIMES_CITED: The contents of the "TC" field parsed as an integer.
  • TITLE: The exact string as found in the "TI" field.
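The two kinds of columns described above ("exact string" versus "parsed as an integer") could be populated along the following lines. This is a minimal sketch covering only a few of the columns; the names `parse_int` and `build_document_row` are hypothetical, and storing an empty value when a field is missing or not numeric is an assumption, since this page does not say how the loader treats such values.

```python
# A few of the columns listed above, grouped by how they are filled.
STRING_COLUMNS = {
    "ABSTRACT_TEXT": "AB",
    "DIGITAL_OBJECT_IDENTIFIER": "DI",
    "DOCUMENT_TYPE": "DT",
    "PUBLICATION_DATE": "PD",
    "TITLE": "TI",
}
INTEGER_COLUMNS = {
    "BEGINNING_PAGE": "BP",
    "CITED_REFERENCE_COUNT": "NR",
    "ENDING_PAGE": "EP",
    "PAGE_COUNT": "PG",
    "TIMES_CITED": "TC",
}


def parse_int(value):
    """Return the value as an int, or None if it is missing or not numeric.

    Falling back to None is an assumption made for this sketch.
    """
    try:
        return int(value.strip())
    except (AttributeError, ValueError):
        return None


def build_document_row(record):
    """Build a partial row from a record (hypothetical dict of tag -> string)."""
    row = {column: record.get(tag) for column, tag in STRING_COLUMNS.items()}
    row.update(
        {column: parse_int(record.get(tag)) for column, tag in INTEGER_COLUMNS.items()}
    )
    return row


record = {"TI": "An example title", "BP": "101", "EP": "110", "PD": "JAN-FEB", "TC": "7"}
row = build_document_row(record)
```

In this example PUBLICATION_DATE keeps the raw text "JAN-FEB", which shows why that column is stored as a string rather than as a date.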

References can be matched to Documents (as an "is a" relationship) during loading when their Digital Object Identifiers and/or Article Numbers match. See REFERENCE for more details. (There is also an explicit Match References To Papers step, which uses different metrics for matching Documents to References.)
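One way to picture the load-time matching is sketched below. It is an illustration only: the column names on the reference side, the use of exact string equality, and trying the Digital Object Identifier before the Article Number are all assumptions, and the function `match_references` is hypothetical. The REFERENCE page remains the authority on the actual rules.

```python
def match_references(documents, references):
    """Map each reference PK to the PK of a matching document, or None.

    `documents` and `references` are lists of dicts (hypothetical shape).
    A DOI match is tried first, then an article-number match; that order
    is an assumption of this sketch.
    """
    by_doi = {
        d["DIGITAL_OBJECT_IDENTIFIER"]: d["PK"]
        for d in documents
        if d.get("DIGITAL_OBJECT_IDENTIFIER")
    }
    by_article_number = {
        d["ARTICLE_NUMBER"]: d["PK"] for d in documents if d.get("ARTICLE_NUMBER")
    }

    matches = {}
    for ref in references:
        doc_pk = by_doi.get(ref.get("DIGITAL_OBJECT_IDENTIFIER"))
        if doc_pk is None:
            doc_pk = by_article_number.get(ref.get("ARTICLE_NUMBER"))
        matches[ref["PK"]] = doc_pk
    return matches


# Made-up identifiers, for illustration only.
documents = [
    {"PK": 1, "DIGITAL_OBJECT_IDENTIFIER": "10.1000/example", "ARTICLE_NUMBER": None},
    {"PK": 2, "DIGITAL_OBJECT_IDENTIFIER": None, "ARTICLE_NUMBER": "A123"},
]
references = [
    {"PK": 10, "DIGITAL_OBJECT_IDENTIFIER": "10.1000/example"},
    {"PK": 11, "ARTICLE_NUMBER": "A123"},
    {"PK": 12, "DIGITAL_OBJECT_IDENTIFIER": "10.1000/unknown"},
]
matches = match_references(documents, references)
```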

In the various built-in extractions, when Documents are specified in the original ISI dataset, they are considered to be inner documents. When they are only specified as References and are not matched to actual Documents, they are considered to be outer documents.
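The inner/outer distinction can be expressed as a small classification step. The sketch below assumes a mapping from reference PK to matched document PK (or None when unmatched); that mapping and the function name `classify_documents` are hypothetical and only serve to illustrate the rule stated above.

```python
def classify_documents(document_pks, reference_matches):
    """Return (inner, outer) lists following the rule described above.

    inner: every Document specified in the original ISI dataset.
    outer: every Reference that was not matched to a Document.
    `reference_matches` maps reference PK -> document PK or None
    (a hypothetical structure used only in this sketch).
    """
    inner = [("document", pk) for pk in document_pks]
    outer = [
        ("reference", ref_pk)
        for ref_pk, doc_pk in reference_matches.items()
        if doc_pk is None
    ]
    return inner, outer


# Reference 10 is matched to document 1; reference 12 is unmatched.
inner, outer = classify_documents([1, 2], {10: 1, 12: None})
```

A matched reference such as reference 10 appears in neither list on its own, because it is represented by the Document it was matched to.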