Protocol Transport
SECTIONS: SRU via HTTP GET | Encoding Issues | SRU via HTTP POST | SRU via HTTP SOAP (formerly SRW)
SRU VIA HTTP GET
The client MAY send an SRU request via the HTTP GET method. A URL is
constructed and sent to the server with fixed parameter names with
fixed meanings. When Unicode characters need to be encoded, there are
some additional constraints, discussed below.
The response MUST be XML conforming to the response
schema of the operation. SRU via HTTP GET can thus be described as the
simplest case of XML over HTTP.
An example of what might pass over the wire:
GET /lcdb?version=1.2&operation=searchRetrieve&query=dinosaur HTTP/1.1
Host: lx2.loc.gov:210
Syntax
An SRU request (when transported via HTTP GET) is a URI as described in RFC 3986 (see Note 1). Specifically, it is an HTTP URL as described in section 3.3 of RFC 1738, subject to the further notes about character encoding below. It uses the standard '&'-separated key=value encoding for parameters in the query part of the URI.
The parameters for the query section of the URL (the information following
the question mark) of the various operations are described in their own
sections.
ENCODING ISSUES
The following encoding procedure is recommended, in particular, to accommodate
Unicode characters (characters from the Universal Character Set, ISO 10646)
beyond U+007F, which are not valid in a URI. This is normally relevant only to the query parameter of the searchRetrieve operation and the scanClause parameter of the scan operation.
1. Convert the value to UTF-8.
2. Percent-encode characters as necessary within the value. See RFC 3986, section 2.1.
3. Construct the URI from the parameter names and encoded values.
Note: In step 2, it is recommended to percent-encode every character in
a value that is not in the URI unreserved set, that is, every character
except alphabetic characters, decimal digits, and the following four special
characters: dash (-), period (.), underscore (_), and tilde (~). By this procedure
some characters may be percent-encoded that do not need to be. For example,
a '?' occurring in a value does not need to be percent-encoded, but it
is safe to do so. If in doubt, percent-encode.
Examples
Consider the following parameter:
query=dc.title =/word kirkegård
The name of the parameter is "query" and the value is "dc.title =/word kirkegård".
Note that the first '=' (following "query") must not be
percent-encoded: it is used as a URI delimiter and is not part of a parameter
name or value. The second '=' (preceding the '/') must be percent-encoded,
as it is part of a value.
The following characters must be percent-encoded:
- the second '=', percent-encoded as %3D
- the '/', percent-encoded as %2F
- the spaces, percent-encoded as %20
- the 'å'. Its UTF-8 representation is C3A5, two octets, so it is represented in a URI as two percent-encoded octets: %C3%A5.
The resulting parameter to be sent to the server would then be:
query=dc.title%20%3D%2Fword%20kirkeg%C3%A5rd
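The three-step encoding procedure can be sketched in Python as follows. This is a minimal illustration rather than part of the specification: the function names `percent_encode` and `build_parameter` are hypothetical, and the sketch follows the recommendation above to encode every octet outside the unreserved set.

```python
import string

# URI unreserved set (RFC 3986): letters, decimal digits, '-', '.', '_', '~'.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def percent_encode(value: str) -> str:
    """Convert a value to UTF-8, then percent-encode every octet that is not unreserved."""
    out = []
    for octet in value.encode("utf-8"):      # step 1: convert to UTF-8
        char = chr(octet)
        if char in UNRESERVED:
            out.append(char)
        else:
            out.append(f"%{octet:02X}")      # step 2: percent-encode the octet
    return "".join(out)


def build_parameter(name: str, value: str) -> str:
    """Step 3: join name and encoded value; the '=' delimiter itself stays literal."""
    return f"{name}={percent_encode(value)}"


print(build_parameter("query", "dc.title =/word kirkegård"))
```

For the example value this prints the parameter shown above. On current Python versions the standard library call `urllib.parse.quote(value, safe="")` should give the same result for a single value.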
Server Procedure
- Parse received request based on '?', '&', and '=' into component parts: the base URL, and parameter names and values.
- For each parameter:
  - Decode all %-escapes.
  - Treat the result as a UTF-8 string.
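A server following this procedure might look like the sketch below. The function name `parse_sru_request` is hypothetical. Keeping only the last value of a repeated parameter and leaving '+' untouched are assumptions of the sketch, not requirements stated in this document.

```python
from urllib.parse import unquote_to_bytes


def parse_sru_request(request_target: str):
    """Split a request target into (base, {name: value}) per the server procedure."""
    # Split on the first '?' into the base URL and the query part.
    base, _, query = request_target.partition("?")
    params = {}
    for pair in query.split("&"):
        if not pair:
            continue
        # Split on the first '=' only; any later '=' arrives percent-encoded.
        name, _, raw_value = pair.partition("=")
        # Decode all %-escapes, then treat the resulting octets as UTF-8.
        decoded_name = unquote_to_bytes(name).decode("utf-8")
        params[decoded_name] = unquote_to_bytes(raw_value).decode("utf-8")
    return base, params


base, params = parse_sru_request(
    "/lcdb?version=1.2&operation=searchRetrieve"
    "&query=dc.title%20%3D%2Fword%20kirkeg%C3%A5rd"
)
print(base, params["query"])
```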
Notes:
1. RFC 1738 is obsoleted by RFC 3986. However, RFC 1738 describes
the 'http:' URI scheme; RFC 3986 does not, instead indicating that
a separate document will be written to do so, but it has not yet been
written. So currently there is no valid, normative reference for the
'http:' URI scheme, and so the obsolete RFC 1738 is referenced. When
there is a valid, normative reference, it will be listed here.
SRU VIA HTTP POST
Instead of constructing a URL, the parameters may be sent via POST to
the server. The Content-type header MUST be set to
'application/x-www-form-urlencoded'. Compare this to 'text/xml' for SRU via
SOAP below; the header value can be used to distinguish the two transports
at the same end point.
POST has several benefits over GET for transferring the
request to the server. Primarily, the issues with character encoding in
URLs are removed, and an explicit character set can be submitted in the
Content-type HTTP header. Secondly, very long queries might generate a
URL for HTTP GET that is not acceptable to some web servers or clients.
This length restriction can be avoided by using POST.
The response for SRU via POST is identical to that of SRU via GET: an XML document.
An example of what might be passed over the wire in the request:
POST /lcdb HTTP/1.1
Host: lx2.loc.gov:210
Content-type: application/x-www-form-urlencoded; charset=iso-8859-1
Content-length: 51

version=1.1&operation=searchRetrieve&query=dinosaur
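As a rough sketch, the headers and body of such a request could be assembled with Python's standard library as shown below. The function name `build_post_request` is hypothetical. Note that `urlencode` uses the form-encoding convention of writing a space as '+', which is an assumption about what the receiving server accepts.

```python
from urllib.parse import urlencode


def build_post_request(params: dict, charset: str = "utf-8"):
    """Return (headers, body) for an SRU request sent as a form-encoded POST."""
    # Values are encoded in the declared charset, then percent-encoded to ASCII.
    body = urlencode(params, encoding=charset).encode("ascii")
    headers = {
        "Content-Type": f"application/x-www-form-urlencoded; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_post_request(
    {"version": "1.1", "operation": "searchRetrieve", "query": "dinosaur"},
    charset="iso-8859-1",
)
print(headers["Content-Length"])  # 51, matching the example above
print(body.decode("ascii"))
```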
SRU VIA HTTP SOAP
(Note: "SRU via HTTP SOAP" is the former SRW)
SRU via SOAP is a binding to the SOAP recommendation of the W3C.
In this transport, the request is encoded in XML and wrapped in some
additional SOAP specific elements. The response is the same XML as SRU
via GET or POST, but wrapped in additional SOAP specific elements.
The incremental benefits of SRU via SOAP are the ease of structured
extensions, web service facilities such as proxying and request
routing, and the potential for better authentication systems.
SOAP Requirements
- Clients and servers MUST support SOAP version 1.1, and MAY support version 1.2 or higher. This requirement is intended to allow as much flexibility in implementation as possible.
- The service style is 'document/literal'.
- Messages MUST be inline with no multirefs.
- The SOAPAction HTTP header may be present, but should not be required. If present, its value MUST be the empty string. It MUST be expressed as:
SOAPAction: ""
- As specified by SOAP, for version 1.1 the Content-type header MUST be 'text/xml'. For version 1.2 the header value MUST be 'application/soap+xml'. End points supporting both versions of SOAP and SRU via POST thus have three Content-type header values to consider.
The specification tries to adhere to the Web
Services Interoperability recommendations.
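An end point that accepts all three transports could dispatch on the Content-type header along the lines of this sketch. The function name `classify_transport` and the returned labels are hypothetical. Ignoring header parameters such as charset and comparing case-insensitively are assumptions about reasonable server behaviour.

```python
def classify_transport(content_type: str) -> str:
    """Map a Content-type header value to the SRU transport it indicates."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    transports = {
        "application/x-www-form-urlencoded": "SRU via POST",
        "text/xml": "SRU via SOAP 1.1",
        "application/soap+xml": "SRU via SOAP 1.2",
    }
    if media_type not in transports:
        raise ValueError(f"unsupported Content-type: {content_type!r}")
    return transports[media_type]


print(classify_transport("text/xml; charset=utf-8"))
```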
Parameter Differences
There are some differences regarding the parameters that can be transported via the SOAP binding.
- The 'operation' request parameter MUST NOT be sent. The operation is determined by the XML constructions employed.
- The 'stylesheet' request parameter MUST NOT be sent. SOAP prevents the use of stylesheets to render the response.
Example SOAP request:
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP:Body>
<SRW:searchRetrieveRequest xmlns:SRW="http://lcnetdev.github.io/zing/srw/">
<SRW:version>1.1</SRW:version>
<SRW:query>dinosaur</SRW:query>
<SRW:startRecord>1</SRW:startRecord>
<SRW:maximumRecords>1</SRW:maximumRecords>
<SRW:recordSchema>info:srw/schema/1/mods-v3.0</SRW:recordSchema>
</SRW:searchRetrieveRequest>
</SOAP:Body>
</SOAP:Envelope>
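A request like the one above could be generated with Python's `xml.etree.ElementTree`, as in the following sketch. The function name `build_search_request` is hypothetical, and the namespace URIs are copied from the example rather than verified independently. ElementTree declares both namespaces on the root element, which differs from the layout above but is equivalent XML.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SRW_NS = "http://lcnetdev.github.io/zing/srw/"  # copied from the example above


def build_search_request(query: str, version: str = "1.1", **fields: str) -> bytes:
    """Wrap a searchRetrieveRequest in a SOAP 1.1 envelope; return UTF-8 XML."""
    # 'operation' and 'stylesheet' MUST NOT be sent over the SOAP binding.
    forbidden = {"operation", "stylesheet"} & fields.keys()
    if forbidden:
        raise ValueError(f"not allowed over SOAP: {sorted(forbidden)}")
    ET.register_namespace("SOAP", SOAP_NS)
    ET.register_namespace("SRW", SRW_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    for name, text in {"version": version, "query": query, **fields}.items():
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = text
    return ET.tostring(envelope, encoding="utf-8")


print(build_search_request("dinosaur", startRecord="1", maximumRecords="1").decode("utf-8"))
```

For SOAP 1.1 the resulting bytes would be sent with a Content-type of 'text/xml' and, optionally, the header `SOAPAction: ""`.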