Metadata tags for academic publications
The HTML meta element
Popular standards for HTML meta citation tags
HTML meta tags for academic publications
Table of name attributes
Additional comments on the citation namespace
Additional comments on the Dublin Core metadata specification
Additional comments on the PRISM metadata specification
Additional comments on the Eprints metadata specification
Additional comments on the BE Press metadata specification
Recommendations for academic publications
Facebook’s Open Graph protocol
Database-specific conventions
Appendix
COinS (ContextObjects in Spans) / OpenURL Framework — the Key/Encoded-Value (KEV) Format
Single (broad) subject
Multiple (narrow) topics
Introduction
In order to facilitate electronic searching and cataloguing of publication data — be they citations, abstracts, or the full text — it is very helpful to present the identifying information in a standard format. The identifying information could include, for example, the title of an article, the name of a conference, or the name of an author. Such identifying information can be termed “metadata” — literally meaning ‘data that describe the data’.
For example, it would be useful to be able to distinguish searches on the term “water” according to whether it appears in the title of an article, the name of a conference, or the name of an author. To do this, the identifying information (“metadata”) can be annotated with consistent labels (“tags”), e.g. art_title, conf_name, and auth_name. However, if the identifying information were stored haphazardly without any annotation, or with nonstandard annotation (e.g. “name-of-article”), that would become impossible.
As yet there is no universal standard.
HTML meta tags
The HTML meta element
HTML (hypertext mark-up language) is the prevailing standard for formatting text presented on websites. HTML can also carry metadata (identifying information about the particular web page or content) through the meta tag. Metadata contained within the meta tag will be discoverable, but not rendered on the screen. The syntax for the HTML meta tag is
<meta name="author" content="Jane Doe">
The name and content attributes should be set pairwise.
The meta tags always go inside the head element.
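As a minimal sketch of where these tags sit, the following hypothetical HTML5 page places name/content pairs inside the head element. The page title and all content values are invented for illustration; the three metadata names used (author, description, keywords) are standard HTML5 names.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example page (hypothetical)</title>
  <!-- Each meta tag pairs a name with its content. -->
  <meta name="author" content="Jane Doe">
  <meta name="description" content="A short summary of this web page.">
  <meta name="keywords" content="metadata, HTML, citation">
</head>
<body>
  <p>Visible content goes here; the meta tags above are not rendered.</p>
</body>
</html>
```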
Previously a scheme attribute could be set in the meta tag, such as
<meta name="identifier" content="0-2345-6634-6" scheme="ISBN">
but this is now obsolete in the latest HTML standard (HTML5).
Although the content attribute can contain practically any text, the name attribute is more restricted if the page is to be fully compliant with the HTML5 standard: “The name specified must either be a standard metadata name defined in the HTML5 specification or a registered extension to the predefined set of metadata names”.
In principle, meta is intended for metadata that cannot be expressed using the title, base, link, style, and script elements, which already have dedicated roles within the head element. However, note that the standalone author element within the head element is designed to specify the author of the web page, not the author of a publication cited on the web page!
The set of defined metadata names in HTML5 is just: application-name, author, description, generator, and keywords. Note that, like the standalone author element within the head element, the standard author metadata name must be “a free-form string giving the name of one of the page’s authors.” Thus it too is not intended to specify the author of a publication cited on the web page!
A registered metadata name is any metadata name registered in the central MetaExtensions registration page.
“Extensions to the predefined set of metadata names may be registered in the WHATWG Wiki MetaExtensions page.” Anyone is free to edit the WHATWG Wiki MetaExtensions page at any time to add or amend a metadata name, provided that information for all of the required fields (Keyword, Brief description, Specification, Synonyms, and Status) is supplied.
In XHTML (extensible HTML) the <meta> tag must be properly closed. For example
<meta name="author" content="Jane Doe"/>
or
<meta name="author" content="Jane Doe"></meta>
Formally the meta tag has no end tag in HTML. (However, including the trailing slash of the ‘minimised’ syntax in the first snippet above is unlikely to cause problems.)
Names are case-insensitive, and must be compared in an ASCII case-insensitive manner.
Popular standards for HTML meta citation tags
There are five commonly used standards for HTML tags relating to academic publications:
- Highwire Press/Google Scholar: citation_*
- Dublin Core: dc.* (also dcterms.*, which is practically equivalent, albeit less common, and semantically preferred)
- PRISM: prism.*
- Eprints: eprints.*
- BE Press: bepress_*
Google Scholar supports all five of these. Mendeley supports all except for BE Press. In both of these cases, providing tags from more than one of the above sets is not a problem, and may even be recommended. SharePoint 2013 mentions only three of the above namespaces: Highwire Press (citation_*), Eprints (eprints.*), and Dublin Core (DC.*).
dc:* and dcterms:*, with colons instead of dots, are used in XML (rather than HTML), for example
<dc:creator>Stone J. E.</dc:creator>
They may perhaps also be used in a hybrid ‘HTML–RDFa’ syntax, as in
<element property="">
<dc:creator content="Stone J. E.">
The above list of metadata namespaces is arranged roughly in descending order of popularity. (In 2012 the Dublin Core namespaces were probably the most popular, which is unsurprising given that Dublin Core seems to predate the others and was developed as a public standard rather than for proprietary use.)
All of these are specifically designed to describe academic publications (especially journal articles), with the exception of Dublin Core, which is a general namespace.
Currently only Highwire Press (citation_*) and Dublin Core (dc.* and dcterms.*) name attributes are listed on the WHATWG Wiki MetaExtensions page, all with the status of “Proposal” (formally this should have been “Proposed”). As of November 2018 none of the listed extensions to the predefined set of metadata names had the “Ratified” status. Ideally only ratified metadata names would be used. However, both proposed and ratified names are acceptable:
“Conformance checkers [such as HTML validators https://wiki.whatwg.org/wiki/Talk:MetaExtensions#Property_list_revision] may use the information given on the WHATWG Wiki MetaExtensions page to establish if a value is allowed or not: values defined in this specification or marked as "proposed" or "ratified" must be accepted, whereas values marked as "discontinued" or not listed [...] must be reported as invalid. [...]
When an author uses a new metadata name not defined by either this specification or the Wiki page, conformance checkers should offer to add the value to the Wiki, with the details described above, with the "proposed" status.”
PRISM 3.0 — version 3.0 of the “Publishing Requirements for Industry Standard Metadata” — includes eight separate metadata specifications:
- PRISM Advertising Metadata 3.0: prism-ad:*
- PRISM Basic Metadata 3.0: prism:*
- PRISM Dublin Core Metadata 3.0: dc:* (with optional prism:* attributes)
- PRISM Image Metadata 3.0: pmi:*
- PRISM Recipe Metadata 3.0: prm:*
- PRISM Usage Rights Metadata 3.0: pur:*
- PRISM Crafts Metadata 3.1: pcm:*
- PRISM Contract Management Metadata 3.1: pcmm:*
Some other namespaces, most notably PSV and PAM, are also associated with PRISM.
HTML meta tags for academic publications
Table of name attributes
To aid comparison and quick reference, the name attributes in the following table are organised by function.
[Table: name attributes for each concept across the namespaces above, including Dublin Core. Only the row labels and one cell survive here. The rows are:]
- Names of author(s) (BE Press: bepress_citation_author)
- Other author information
- Title
- Date(s)
- Type of work
- Format & language of work
- Identifier
- Publisher: name, location
- SOURCE [journal, conference, book, report series, etc.]: name, identifiers
- SOURCE [journal, conference, book, report series, etc.]: editor(s), organiser(s), other characteristics
- SOURCE [journal, conference, book, report series, etc.]: part
- SOURCE [journal, conference, book, report series, etc.]: pages
- Subject, Classification, Category, Key words
- Abstract
- Citation
- Related resources: similar, cited, cited by
- Availability & audience
- Manager(s)/Supervisor(s)
- Rights/Copyright
- Reference/Location (e.g. URL)
- Record/metadata
- Status
- Miscellaneous
Obviously only some of the above are to be provided for each specific resource.
Suggested name attributes to include in a meta tag are shaded.
Additional comments on the citation namespace
Extension of the citation namespace
“Highwire Press, a division of Stanford University, developed its schema for journal articles and GS [Google Scholar] extended the tags to cover additional academic paper types, such as working papers, dissertations, manuscripts, conference papers, books and book chapters.” The above table includes all of those extended meta name attributes, because they have become de facto inclusions in the citation_* standard. Having said that, the table may also contain user-defined meta name attributes, because there is no authoritative reference as to which were originally defined by Highwire Press, which were added by Google Scholar, and which were adopted by other agencies or individuals.
Schemata for the citation namespace
There is apparently no standard URL at which a schema for the citation namespace is defined.
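A minimal sketch of citation_* tags for a journal article might look as follows. The article details are hypothetical placeholders; the tag names are ones documented in Google Scholar's inclusion guidelines. Since no authoritative schema exists, this is an illustration of common practice rather than a validated template.

```html
<!-- Hypothetical journal article; all content values are placeholders. -->
<meta name="citation_title" content="An Example Article Title">
<!-- One citation_author tag per author, in byline order. -->
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_publication_date" content="2018/11/30">
<meta name="citation_journal_title" content="Journal of Examples">
<meta name="citation_volume" content="12">
<meta name="citation_issue" content="2">
<meta name="citation_firstpage" content="215">
<meta name="citation_lastpage" content="221">
<meta name="citation_pdf_url" content="http://www.example.com/articles/example.pdf">
```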
Additional comments on the Dublin Core metadata specification
Author names
It is evident from the official Dublin Core definitions that creator is the best match for providing an author’s name, not contributor. Likewise the relevant PRISM standard (a subset of Dublin Core) states: “PRISM recommends that magazine publishers use dc:contributor for people who do additional reporting, or individuals who would be called out for special acknowledgments, such as research assistants.” Hence the arrangement in the above table.
Dublin Core controlled vocabularies
Valid dcterms.type and dc.type options
dcterms.type is recommended to be set to one of the following:
Collection
An aggregation of resources. A collection is described as a group; its parts may also be separately described.
Dataset
Data encoded in a defined structure, e.g. lists, tables, and databases.
Event
A non-persistent, time-based occurrence, e.g. an exhibition, webcast, conference, workshop, open day, performance, battle, trial, wedding, tea party, conflagration.
Image
A visual representation other than text, e.g. images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation. Note that Image may include both electronic and physical representations.
InteractiveResource
A resource requiring interaction from the user to be understood, executed, or experienced, e.g. forms on Web pages, applets, multimedia learning objects, chat services, or virtual reality environments.
MovingImage
A series of visual representations imparting an impression of motion when shown in succession, e.g. animations, movies, television programs, videos, zoetropes, or visual output from a simulation.
PhysicalObject
An inanimate, three-dimensional object or substance, e.g. a sculpture, fossil, or archæological relic. Note that digital representations of, or surrogates for, these objects should use Image, Text or one of the other types.
Service
A system that provides one or more functions, e.g. a photocopying service, a banking service, an authentication service, interlibrary loans, a Z39.50 or Web server.
Software
A computer program in source or compiled form, e.g. a C source file, MS-Windows .exe executable, or Perl script.
Sound
A resource primarily intended to be heard, e.g. a music playback file format, an audio compact disc, and recorded speech or sounds.
StillImage
A static visual representation, e.g. paintings, drawings, graphic designs, plans and maps. Recommended best practice is to assign the type Text to images of textual materials.
Text
A resource consisting primarily of words for reading, e.g. books, letters, dissertations, poems, newspapers, articles, and archives of mailing lists, and including also facsimiles or images of texts.
By default dc.type would also be set to one of the above. Note, however, that when used with PRISM a slightly different set of options is suggested.
Valid *.format, *.language and *.coverage options
Metadata in dc.format and dcterms.format may describe the file format, physical medium, or dimensions (size or duration) of the resource. The Dublin Core specification recommends that file formats be identified using a “controlled vocabulary” such as the list of Internet Media Types [MIME]. Some file formats commonly relevant to academic publications are: text/plain (plain text without formatting, as in a *.txt file); text/rtf (RTF, rich text format); text/html (HTML); application/pdf (PDF); application/msword (MS Word); application/vnd.ms-powerpoint (MS PowerPoint); and application/vnd.apple.keynote (Apple Keynote). TeX and LaTeX are unregistered media types, but if absolutely necessary they can be denoted with application/x-tex and application/x-latex (respectively).
Metadata in dc.language and dcterms.language describes the language(s) of the resource, for which the recommended best practice is to use a “controlled vocabulary” such as RFC 4646. For example, English is “en”, and Australian English is “en-AU”; Mongolian written in Cyrillic script as used in Mongolia is represented by “mn-Cyrl-MN”; Serbian written using Latin script as used in Serbia and Montenegro is represented as “sr-Latn-CS”.
Similarly, for dc.coverage and dcterms.coverage the recommended best practice is to use a “controlled vocabulary” such as the Thesaurus of Geographic Names. For example, Munich (Germany) should be listed as München (Deutschland), whereas Beijing (China) should be listed as Beijing (Zhongguo), not 北京 (中国).
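The controlled vocabularies described above can be sketched in meta tags as follows. The combination of values is hypothetical (a PDF article in Australian English concerning Munich); the individual values are taken from the examples in the text.

```html
<!-- Declare the namespace, then give values from the controlled vocabularies. -->
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/">
<meta name="dc.type" content="Text">                      <!-- DCMI Type vocabulary -->
<meta name="dc.format" content="application/pdf">         <!-- Internet Media Type -->
<meta name="dc.language" content="en-AU">                 <!-- language tag -->
<meta name="dc.coverage" content="München (Deutschland)"> <!-- place name -->
```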
Administrative Components
Dublin Core also contains an ac namespace to specify so-called administrative metadata, designed to assist with interoperability between different systems that hold content metadata. As such, several of these names contain information about the metadata used in the respective system, i.e. ‘meta-metadata’!
Selected comments are included below. For full descriptions see http://biblstandard.dk/ac/ .
Metadata for the entire record
ac.identifier
A string or a number, which identifies the metadata record.
ac.source
A string or a number, which identifies the recording entity (e.g. a library, museum, archive, etc.).
ac.scope
ac.comment
ac.location
An unambiguous reference to the content metadata within a given context. This element is only used if the content metadata and administrative metadata are not in the same location.
ac.language
Language of metadata.
ac.rights
Information about rights held in and over the content metadata.
ac.dateRange
ac.handling
Metadata for update and change
ac.activity
This element reflects an action performed on the content metadata. The element functions as a container, which connects an action (of specified type) with further details about that action.
Attributes that refine the activity specification:
ac.action
The action performed on the content metadata by the responsible entity. The actions are taken from a non-exhaustive list including: created, submitted, modified, checked, link-collected, resource-harvested, resource-disappeared, expired, mail-sent and three codes for deleted (delete-error-record, delete-disappearance and delete-out-of-scope).
ac.name
The name of the entity responsible for undertaking a defined action on the content metadata. Examples of Name include a person, an organisation, or a service. Where the person has an affiliation with an organisation, this information may be included. The name of a person should be provided in reverse order, that is, last name before first name, with a comma separator.
ac.email
Electronic Mail address for the responsible entity.
ac.contact
Information on how to contact the responsible entity.
ac.date
The date on which the activity took place. This unspecified date must be used in connection with an action, e.g. submitted.
ac.affiliation
The organization with which the named person was associated when involved with the resource.
Metadata for batch interchange of records
ac.database
ac.transmitter
ac.filename
ac.technicalFormat
ac.characterSet
ac.bibliographicFormat
Bibliographic format for data exchange (e.g. MARC21, danMARC2, DC)
ac.resultFile
Dublin Core schemata
It is appropriate to indicate each namespace used for the metadata through inclusion of a dedicated tag in the head element.
For instance, for DC
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/">
and for DCTERMS
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" >
For AC there is no clear guide to the appropriate schema, but using the above pattern and inserting a link to a relevant current XML standard yields
<link rel="schema.AC" href="http://biblstandard.dk/ac/schemas/ac_2011-09-01.xsd" />
Note that a few references exist setting href to a now-defunct URL, which should be avoided.
Non-standard extensions in the DC namespace
Google Scholar’s previous (circa 2010–2011) and still current advice is to use extensions to the Dublin Core tags such as DC.citation.volume, DC.citation.issue, DC.citation.spage and DC.citation.epage. However, so long as such tags are not part of the official specification, they should not be used.
Microsoft SharePoint 2013 looks for several non-standard attribute names under the DC namespace, such as DC.citation.volume, DC.identifier.issn and DC.source.issn. The duplicate options for the ISSN are just one indication that these are not part of the official Dublin Core specification, and so should not be used.
Actual usage of these non-standard attribute names is varied.
Additional comments on the PRISM metadata specification
PRISM controlled vocabularies
Vocabularies defined within PRISM
A few of the metadata names defined in the PRISM standard have ‘controlled vocabularies’. Below is a selection of valid values relevant to academic publications.
prism.aggregationType: book; journal; magazine; manual; newsletter; newspaper; report; whitepaper; and other [avoid using].
prism.contentType: article; bookChapter; introduction; and contentBlock [to be used as ‘other’, and refined with prism.genre].
prism.genre: abstract; analysis [typical of a journal article]; appendix [strictly intended for books, but could be used to indicate Supplementary Material or Supporting Information]; bibliography [listing for a subject, author, etc.]; chapter; correction; coverStory; essay [expressing an author’s personal point of view]; feature [a prominent or special article; may be suitable to indicate keynote or plenary addresses]; foreword; glossary; interview; legalDocument; letters; preface; qAndA [historically a common component of conference proceedings]; references [list of materials cited]; reprint; response; review [intended for reviews of media or products, but could be adopted for academic reviews too]; and supplementArticle [article within a supplement].
prism.issueType: regularIssue; and specialIssue.
prism.platform: email; eReader; print; recordableMedia [e.g. CD or DVD]; smartPhone; tablet; web [viewable with a browser]; and other [avoid using]. None of these is well suited to describing the platform intended for reading a PDF file: web may be the best of these poor options, but others such as tablet can additionally be used. This controlled vocabulary also contains an awkward mixture of hardware and software applications that are not mutually exclusive (e.g. email on a smartphone, or web browsing on a tablet).
prism.presentationType: complexBlock [suitable for some graphical abstracts]; gallery [may be suitable for posters comprising graphics with some text]; infoGraphic [may be suitable for posters that are “heavily text-oriented”]; other; slideshow; and video. This is intended to describe content that can be contained within an HTML figure element.
Sometimes multiple values are relevant to a single name attribute, in which case the values should be entered in separate meta tags, organised in order of decreasing priority, or “from most inclusive to most specific”.
Use of other vocabularies
For some other metadata names in PRISM it is advisable to use a controlled vocabulary, but PRISM neither provides nor references a specific lexicon.
In the case of prism.academicField, it would be suitable to adopt a system such as the Australian and New Zealand Standard Research Classification (ANZSRC). For example, a publication dealing with “Wastewater Treatment Processes” (code 090409), “Water Treatment Processes” (090410) and “Water Quality Engineering” (090508) could be tagged with each of these three phrases; and/or the first two could be grouped as “Chemical Engineering” (0904), and the latter generalised to “Civil Engineering” (0905); and/or all of these could be covered by “Engineering” (09). Adopting the advice alluded to above, best practice would be to include all of these phrases in separate tags, organised from the most general to the most specific.
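The ANZSRC example above could be sketched as a run of separate meta tags, ordered from the most general field to the most specific. The dotted prism.academicField form in an HTML meta tag is an informal convention rather than part of the formal PRISM specification, and the codes appear only as comments.

```html
<!-- Most general first, most specific last; ANZSRC codes shown in comments. -->
<meta name="prism.academicField" content="Engineering">                    <!-- 09 -->
<meta name="prism.academicField" content="Chemical Engineering">           <!-- 0904 -->
<meta name="prism.academicField" content="Civil Engineering">              <!-- 0905 -->
<meta name="prism.academicField" content="Wastewater Treatment Processes"> <!-- 090409 -->
<meta name="prism.academicField" content="Water Treatment Processes">      <!-- 090410 -->
<meta name="prism.academicField" content="Water Quality Engineering">      <!-- 090508 -->
```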
Dublin Core in the PRISM context
PRISM is expected to be used with Dublin Core metadata names in the DC namespace. This is especially obvious when observing that PRISM uses DC.creator to provide metadata identifying an author, rather than introducing a competing name attribute. Thus, PRISM is designed to supplement Dublin Core, not replace it.
As shown in the above table, further information about dc.creator or dc.contributor can be provided with prism.role, prism.place, or prism.contactInfo. In XML format (i.e. Profile 1) the appropriate syntax is:
<dc:creator prism:role="writer" prism:place="England">Jane Doe</dc:creator>
There is no evident method for implementation in HTML (or XHTML).
Valid values of the prism.role attribute that are potentially relevant to academic articles include: author, commentator, correspondant [could denote a corresponding author, or the author of a letter to the editor], editor, illustrator, interviewee, interviewer, interpreter, narrator [could denote a presenter at a conference], other, photographer, researcher, researchAssistant, and translator.
It is not clear whether more than one prism.role, prism.place, or prism.contactInfo can be defined for a given entity (e.g. person) comprising a dc.creator or dc.contributor.
Extension of the PRISM specifications
Users can include their own customised elements alongside the standard PRISM elements: “PRISM is an extensible specification and includes a guide for creating your own namespace.”
PRISM Profiles and HTML implementation
PRISM Profiles
Three Profiles are defined in the PRISM metadata specification: XML-only (eXtensible Markup Language), RDF/XML (Resource Description Framework/XML), and XMP (Extensible Metadata Platform). The XML-only profile was defined earliest, and is by far the most frequently used and most comprehensively described.
None of these provides a clear indication of how it might be embedded in HTML. Strictly speaking, XHTML is a type of XML, and XHTML is often readable by HTML viewers, so this may be one approach.
Recommendations for use of PRISM in HTML
Besides the three Profiles for embedding metadata that form a key part of the core PRISM specification, two dependent specifications have been created that each refer to either XHTML or HTML.
PRISM Aggregator Message (PAM) is “an XML tag set that uses PRISM metadata for a very specific purpose”. The PRISM Source Vocabulary (PSV) appears to be designed to allow metadata tagging of multiple content elements within the body of a single HTML5 document. Neither of these provides for using the meta tag within the head element of an HTML document (HTML5 or otherwise).
The PSV Specification explicitly states, “It is the intent that all PSV metadata in the Source be captured within the <psv:metadata> block using the <psv:meta> tag and that it not be duplicated or replaced using the HTML5 <meta> tag. The HTML5 <meta> tag has therefore not been included in the model for HTML5 <head> in this PSV Specification.” A related guide adds, “Although there is an optional <meta> tag in the HTML5 <head> structure, it is not to be used to store metadata about the article. The recommended PSV HTML5 subset definition for the <head> only allows structures such as <link> and <styles> but does not allow for the encoding of metadata. Metadata is expected to be consolidated in the PSV <metadata> block.”
Nevertheless, the official specification on the nextPub PRISM Source Vocabulary (PSV) Framework also states: “If you need to transform PSV into HTML5 to deliver for browser display, you may wish to transform some of the metadata in the <psv:metadata> block into <meta> tags in the HTML5 head. A PSV to HTML5 Transformation Guide to document the transformation of PSV XML into HTML5 for delivery to browsers is planned to be added to the PSV Documentation Set in the future.”
Compromise implementation in HTML meta tags
In the above table all colons (used in the official PRISM standards) were replaced with dots, as apparently applied in practice. The tags would then be used within HTML meta elements.
It must be recognised that these practices deviate from the formal PRISM specifications — at least while the foreshadowed “PSV to HTML5 Transformation Guide” remains unpublished.
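A sketch of this compromise, with Dublin Core supplying the author and title and dotted PRISM names supplying the source details, might look like this. The article is hypothetical. The element names (publicationName, volume, number, startingPage, endingPage) are PRISM Basic elements written with a dot in place of the colon, and the schema link follows the PRISM 3.0 namespace URL pattern. None of this is sanctioned by the formal PRISM specifications.

```html
<!-- Informal dot notation: prism:volume in XML becomes prism.volume here. -->
<link rel="schema.prism" href="http://prismstandard.org/namespaces/basic/3.0/">
<meta name="dc.creator" content="Jane Doe">
<meta name="dc.title" content="An Example Article Title">
<meta name="prism.publicationName" content="Journal of Examples">
<meta name="prism.aggregationType" content="journal">
<meta name="prism.volume" content="12">
<meta name="prism.number" content="2">
<meta name="prism.startingPage" content="215">
<meta name="prism.endingPage" content="221">
```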
PRISM schemata
It is appropriate to indicate each namespace used for the metadata through inclusion of a dedicated tag in the head element.
For instance, for version 1.2 of PRISM
<link title="PRISM schema" rel="schema.prism" href="http://prismstandard.org/namespaces/1.2/basic/" />
(Notice the XHTML-style syntax.) The title attribute is optional.
For version 2.1 of PRISM
<link rel="schema.prism" href="http://prismstandard.org/namespaces/basic/2.1/" />
<link rel="schema.pur" href="http://prismstandard.org/namespaces/prismusagerights/2.1/" />
etc.
For version 3.0 of PRISM
<link rel="schema.prism" href="http://prismstandard.org/namespaces/basic/3.0/" />
<link rel="schema.pur" href="http://prismstandard.org/namespaces/pur/3.0/" />
etc.
Additional comments on the Eprints metadata specification
There are very few name attributes in the eprints namespace, apparently only ten. Furthermore, no publicly available official standard could be found.
Additional comments on the BE Press metadata specification
BE Press metadata appears not to be commonly used. Typically the name attributes in the bepress namespace are practically duplicates of those in the citation namespace (prefixed by bepress_), except that there are fewer of them.
Rather than expand the contents of the bepress namespace, it appears that BE Press is open to adopting labels and/or conventions from other metadata systems, including Dublin Core, OpenURL and PubMed.
Recommendations for academic publications
Academic publications should be described with metadata following one or more of the Dublin Core, citation namespace and PRISM conventions.
Dublin Core and PRISM are tightly controlled standards supported by numerous organisations.
Dublin Core is very widely used across diverse applications. Dublin Core and the citation namespace are very commonly used for academic publications; PRISM is also commonly used.
The citation namespace includes numerous name attributes specifically relevant to academic publications. PRISM also includes many name attributes specifically relevant to publications, albeit more focussed on popular media. Dublin Core also provides several relevant name attributes (variously in the DC, DCTERMS and AC namespaces), some of which do not appear in the citation namespace or PRISM.
The DC namespace is by far more commonly used than the DCTERMS or AC namespaces and, being older, has broader compatibility; therefore, when the same name attribute exists in both the DC and DCTERMS namespaces, it is advantageous to use that in the DC namespace.
Other conventions
Facebook’s Open Graph protocol
Common property attributes
Various values are defined in the Open Graph Protocol created by Facebook developers: og:title, og:url, og:site_name, og:image, og:description, and og:type. These do not meet the requirements for registration as “proposed” or “ratified” name attributes; rather, they should be given as values of the property attribute. This would follow the general syntax:
<meta property="og:*" content="x"/>
<head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# article: http://ogp.me/ns/article#">
<meta property="fb:app_id" content="302184056577324" />
<meta property="og:url" content="http://www.theage.com.au/story/06546512303540.html" />
<meta property="og:type" content="article" />
<meta property="og:title" content="When Great Minds Don’t Think Alike" />
<meta property="og:description" content="How does culture influence thinking?" />
<meta property="og:image" content="http://static.theage.com.au/images/0654646546.jpg" />
<head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# book: http://ogp.me/ns/book#">
<meta property="fb:app_id" content="302184056577324" />
<meta property="og:type" content="book" />
<meta property="og:url" content="http://www.domain.com/pub/Book02103.html" />
<meta property="og:title" content="Sample Book" />
<meta property="og:image" content="http://www.domain.com/pub/images/Book02103.png" />
These are sometimes — albeit rarely — used to provide metadata on academic publications.
The complete list of valid object og:type values is: article, book, books.author, books.book, books.genre, business.business, fitness.course, game.achievement, music.album, music.playlist, music.radio_station, music.song, place, product, product.group, product.item, profile, restaurant.menu, restaurant.menu_item, restaurant.menu_section, restaurant.restaurant, video.episode, video.movie, video.other, video.tv_show.
Resource-specific property attributes
There are further dedicated ‘fields’ available for specific resource types.
For book types:
book:author
An array of the Facebook IDs of the users that authored the book.
book:isbn
book:release_date
book:tag
Keywords.
For books.book:
books:author
An array of references to the objects representing the authors of the book.
books:genre
An array of references to the objects representing the genres of the book.
books:initial_release_date
A time representing when the book was initially released.
books:isbn
(Required.)
books:language
books:page_count
books:rating
The rating of the book.
books:release_date
books:sample
A URL of a sample of the book
For article types:
article:author
An array of Facebook profile URLs or IDs of the authors for this article.
article:content_tier
Specification of whether article is free, locked, or metered.
article:expiration_time
article:modified_time
article:published_time
article:publisher
A Facebook page URL or ID of the publishing entity.
article:section
The section of your website to which the article belongs, such as 'Lifestyle' or 'Sports'.
article:tag
Keywords.
It is evident that these tags are not useful in general for academic publications, given that many authors either won’t have a Facebook account, or will have a private Facebook account that they do not wish to refer to in professional circumstances.
Database-specific conventions
Several other databases have their own conventions.
PubMed
For instance, the U.S. PubMed website (pubmed.gov) has both abbreviated tags for simple searches and longer tags for more advanced searches, e.g. [au] versus [Author], [Author - First], [Author - Last], [Author - Full], [Author - Identifier] and [Author - Corporate] when searching author names. However, in their HTML encoding they are much less particular. Thus a seven-page 2018 article by Johnson & Key is tagged with
ncbi_uidlist = 30270231
author = Johnson JE , et al.
description = Prog Community Health Partnersh. 2018;12(2):215-221. doi: 10.1353/cpr.2018.0041.
Notice that the second author’s name is omitted in the author tag; the description tag is extremely vague, and mixes dates, volumes, issues and page numbers; and the ncbi_uidlist tag is quite uninformative outside of PubMed. (Note: NCBI refers to the U.S. National Center for Biotechnology Information, part of the U.S. National Library of Medicine (NLM), which manages PubMed.)
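For contrast, the fields packed into that description string could be separated out using citation_* names, as in the sketch below. This is not PubMed's actual markup. Only values readable from the record above are used, so the article title and the second author's full name are left out rather than guessed. The names citation_journal_abbrev and citation_doi are assumed here from common usage on publisher pages.

```html
<!-- Fields recovered from the PubMed description string; illustrative only. -->
<meta name="citation_author" content="Johnson JE">
<!-- Each further author would get a separate citation_author tag. -->
<meta name="citation_journal_abbrev" content="Prog Community Health Partnersh">
<meta name="citation_publication_date" content="2018">
<meta name="citation_volume" content="12">
<meta name="citation_issue" content="2">
<meta name="citation_firstpage" content="215">
<meta name="citation_lastpage" content="221">
<meta name="citation_doi" content="10.1353/cpr.2018.0041">
```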
Thomson Reuters
Thomson Reuters, who operate Web of Science and Journal Citation Reports, provide a specification for queries of their databases using the OpenURL resolver. These include rft_id (info:doi, info:pmid, info:ut), rft.atitle, rft.jtitle, rft.btitle, rft.issn, rft.isbn, rft.date [actually the year], rft.volume, rft.issue, rft.spage, rft.epage, rft.aulast, rft.aufirst, rft.auinit, rft.auinitm, and rft.au.
These are not intended to be used in HTML. However, some analogous metadata tags of the form rft_id, rft_issn etc. have been applied on some web pages.
Appendix
Yet other conventions
Other conventions relate to news and digital books (or other texts): NewsML (IPTC-NEWSML); NITF (IPTC-NITF); DocBook; ePub.
COinS (ContextObjects in Spans) / OpenURL Framework — the Key/Encoded-Value (KEV) Format
Another method of providing metadata is to embed COinS (ContextObjects in Spans) containing OpenURL strings into a webpage describing or comprising the resource. This is supported by Mendeley for resources in four formats, namely fmt:kev:mtx:journal, fmt:kev:mtx:book, fmt:kev:mtx:dissertation, and fmt:kev:mtx:dc.
“Every ContextObject must have a Referent, the referenced resource for which the ContextObject is created. Within the scholarly information community the Referent will probably be a document-like object, for instance: a book or part of a book; a journal publication or part of a journal; a report; etc.”
The metadata keys have the format rft.*.
To add a COinS to an HTML document, put a NISO 1.0 "ContextObject" into the "title" attribute of an HTML span element with class attribute set to "Z3988". (The official designator for the NISO OpenURL standard is Z39.88-2004.) Example:
<span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.issn=1045-4438"></span>
The OpenURL specification can also be applied to XHTML. For compatibility with HTML browsers, empty span elements should NOT be minimized.
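A fuller COinS for a journal article might be sketched as below, using several of the rft.* keys listed earlier. The article is hypothetical. Two choices here are conventions commonly recommended for COinS and should be treated as assumptions: each ampersand is written as &amp; inside the attribute, and each value is URL-encoded (so a space becomes %20 and a colon becomes %3A).

```html
<!-- One key=value pair per field, joined by &amp;; the span stays empty and is not minimized. -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.atitle=An%20Example%20Article%20Title&amp;rft.jtitle=Journal%20of%20Examples&amp;rft.aulast=Doe&amp;rft.aufirst=Jane&amp;rft.date=2018&amp;rft.volume=12&amp;rft.issue=2&amp;rft.spage=215&amp;rft.epage=221"></span>
```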
Subject and topic classifiers
Several subject & topic classifiers are listed as “Vocabulary Encoding Schemes” in the DCMI Metadata Terms documentation. They include: DCMIType, DDC, IMT, LCC, LCSH, MESH [or MeSH], NLM, TGN, and UDC.
Single (broad) subject
Universal Decimal Classification
...
U.S. Library of Congress
...
Multiple (narrow) topics
MeSH (Medical Subject Headings)
There are three basic types of MeSH Records: Descriptors, Qualifiers, and Supplementary Concept Records (SCRs).
These are hierarchical. For example
Congenital Abnormalities C16.131
  Abnormalities, Drug Induced C16.131.042
  Abnormalities, Multiple C16.131.077
    22q11 Deletion Syndrome C16.131.077.019
      DiGeorge Syndrome C16.131.077.019.500
    Alagille Syndrome C16.131.077.065
    Alstrom Syndrome C16.131.077.080
    Angelman Syndrome C16.131.077.095
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.