SRU (Search/Retrieval Using URL)

Relevance Ranking Context Set version 1.1

Version 1.1, 2nd September 2009
see also version 1.0

The default ordering of a result set is left up to the server, which may apply no explicit ordering at all. SRU addresses this for the most part through the 'sort' / 'sortKeys' parameter in SRU v1.1 and the 'sortBy' keyword in SRU v1.2 queries. However, for sophisticated relevance-based ranking, different algorithms are available, and a client might request specific methods to combine the results of evaluating each operand or clause. This context set attempts to address this issue by defining relation and boolean modifiers for the various known algorithms and for combinations of their results. Several known algorithms have their documentation linked in the table in Appendix A below.

If the 'relevant' relation modifier from the cql context set is given without a named algorithm, the server should continue to use the basic semantics: the server may decide which algorithm to use. It is also legal to include cql.relevant together with an algorithm from this set, in which case that algorithm should be used. Hence there is no need to include an 'any algorithm' relation modifier in this set.
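
The selection rule in the previous paragraph can be sketched as follows. This is only an illustration, not part of the context set: the function name resolve_algorithm, the SERVER_DEFAULT value, and the representation of a clause's modifiers as a dict keyed by lower-cased, prefixed names are all assumptions.

```python
from typing import Optional

# Hypothetical server default; the context set leaves this choice to the server.
SERVER_DEFAULT = "okapi"


def resolve_algorithm(modifiers: dict) -> Optional[str]:
    """Pick the ranking algorithm for one search clause.

    `modifiers` maps lower-cased relation modifier names to their values
    (None when a modifier carries no value, as with cql.relevant).
    """
    named = modifiers.get("rel.algorithm")
    if named is not None:
        # A named algorithm wins, whether or not cql.relevant is also present.
        return named.lower()
    if "cql.relevant" in modifiers:
        # Relevance requested without a named algorithm: the server decides.
        return SERVER_DEFAULT
    # No relevance ranking was requested for this clause.
    return None
```

For instance, a clause carrying both cql.relevant and rel.algorithm=CORI would resolve to "cori" under this sketch, while cql.relevant alone would resolve to whatever default the server has configured.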

Also, please note that, as with all context sets, these modifiers are case insensitive. "rel.algorithm=CORI" and "rel.algorithm=cori" are to be treated the same. This matters in particular because most of the modifiers are acronyms and so may be entered in upper case in queries, even though they are listed in lower case below.
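
One way a server might honour this is to fold case before comparing. The sketch below is an assumption about implementation, not a requirement of the context set; the helper name normalise_modifier is hypothetical, and it only handles the "name=value" form used in this document.

```python
def normalise_modifier(raw: str) -> tuple:
    """Split a 'name=value' modifier and lower-case both parts.

    Returns (name, value); value is None when the modifier has no '=' part.
    """
    name, sep, value = raw.partition("=")
    return name.strip().lower(), (value.strip().lower() if sep else None)
```

With this helper, "rel.algorithm=CORI" and "rel.algorithm=cori" both normalise to the same pair and so compare equal.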

To return relevancy information attached to a record, please see the record metadata extension. (To be written up, à la 'rec' context set.)

  • The identifier for the context set is: info:srw/cql-context-set/2/relevance-1.1
  • The recommended short name is: rel
  • The maintainer of the context set is: john.harrison@liv.ac.uk

Indexes

There are no indexes defined in this context set.

Relations

There are no relations defined in this context set.

Relation Modifiers

Modifier Name  Description
algorithm      The algorithm to be used to assign relevance scores to results (see table in Appendix A for examples).
combine        The method to be used to combine scores generated for individual operands (see table in Appendix B for examples).
feedback       Apply blind relevance feedback to increase recall.
minRaw         The minimum raw score that must be achieved (after scores from individual operands have been combined) to be included in results.
minScaled      The minimum scaled score that must be achieved (after scores from individual operands have been combined) to be included in results. Scaled scores are proportionate to the highest score: 0 <= scaledScore <= 1.
const_*        A named constant relevant to the algorithm, e.g. const_k=0.7. This allows constants to be overridden for specific queries or indexes, either to ensure consistency across servers or to fine-tune the results.
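
The two threshold modifiers can be illustrated with a short sketch. It assumes one reading of "proportionate to the highest score", namely that a scaled score is the raw score divided by the highest raw score in the result set, and that raw scores are non-negative. The function name filter_by_score and the dict-of-scores representation are hypothetical.

```python
def filter_by_score(scores, min_raw=None, min_scaled=None):
    """Keep records whose combined score meets the requested thresholds.

    `scores` maps record identifiers to combined raw scores (assumed >= 0).
    """
    top = max(scores.values(), default=0.0)
    kept = {}
    for record_id, raw in scores.items():
        # Assumed scaling: divide by the highest raw score, giving 0 <= scaled <= 1.
        scaled = raw / top if top > 0 else 0.0
        if min_raw is not None and raw < min_raw:
            continue
        if min_scaled is not None and scaled < min_scaled:
            continue
        kept[record_id] = raw
    return kept
```

Under this reading, a query with rel.minScaled=0.5 keeps only records scoring at least half as much as the best match, whatever the absolute scores happen to be.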

Booleans

There are no booleans defined in this context set.

Boolean Modifiers

Modifier Name  Description
combine        Method to be used to combine scores generated for individual clauses.
minRaw         The minimum raw score that must be achieved (after scores from individual clauses have been combined) to be included in results.
minScaled      The minimum scaled score that must be achieved (after scores from individual clauses have been combined) to be included in results. Scaled scores are proportionate to the highest score: 0 <= scaledScore <= 1.
const_*        A named constant relevant to the algorithm, as in Relation Modifiers.

Examples

Some examples of how the context set might be used.

    dc.title any/rel.algorithm=lr "fish squid burger cheese"
    cql.anywhere all/rel.algorithm=cori "sanderson denenberg" or/rel.combine=mean dc.description any/rel.algorithm=cori "information retrieval"
    dc.title any/rel.algorithm=lr/rel.const_c0=-0.705 "logistic regression relevance ranking techniques"
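
Queries like these are sent to a server in the 'query' parameter of an SRU searchRetrieve request, and must be URL-encoded first. The sketch below shows one way to build such a URL with the Python standard library. The base URL is a placeholder rather than a real endpoint, and the function name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; substitute a real SRU server.
BASE_URL = "http://sru.example.org/search"


def build_search_url(cql_query: str, version: str = "1.1") -> str:
    """Return a searchRetrieve URL carrying the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,  # urlencode escapes spaces, quotes, '/' and '='
    }
    return BASE_URL + "?" + urlencode(params)


url = build_search_url('dc.title any/rel.algorithm=lr "fish squid burger cheese"')
```

Whether a given server accepts the rel modifiers at all depends on its support for this context set.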

Appendix A - Relevance Score Assignment Algorithms

Modifier Value  Description
lr              Logistic Regression algorithm from UC Berkeley
cori            CORI algorithm of Callan et al. (Carnegie Mellon)
okapi           OKAPI BM-25 of Robertson et al. (City University, London)
gloss           Glossary of Servers of Gravano et al. (Stanford)
ggloss          Generalised Glossary of Servers
dtf-cori        Decision-Theoretic Framework extension to CORI of Fuhr, Nottelmann (University of Duisburg-Essen)
redde           Relevant Document Distribution Estimation of Callan et al. (Carnegie Mellon)
cdr             Cover Density Ranking
pagerank        Google's PageRank algorithm of Brin, Page (ex Stanford)
hilltop         The Hilltop algorithm of Bharat, Mihaila (Google, University of Toronto)

Appendix B - Relevance Score Combination Methods

Modifier Value  Description
sum             Add the values
mean            Average the values
nsum            Normalise the summed values
cmbz            Normalise and rescale values
max             Select maximum value
min             Select minimum value
nprv            Normalise values and privilege high ranked documents
pivot           Normalise sub-record retrieval scores based on document scores
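
The four simplest methods in the table can be sketched directly. The sketch below covers only sum, mean, max and min; the normalising methods (nsum, cmbz, nprv, pivot) are not defined precisely enough here to implement, so they are deliberately left out. The names COMBINERS and combine_scores are hypothetical.

```python
from statistics import mean

# Only the methods whose meaning is unambiguous from the table above.
COMBINERS = {
    "sum": sum,
    "mean": mean,
    "max": max,
    "min": min,
}


def combine_scores(method: str, scores: list) -> float:
    """Combine per-operand (or per-clause) scores for one record."""
    try:
        # Modifier values are case insensitive, so fold case before lookup.
        combiner = COMBINERS[method.lower()]
    except KeyError:
        raise ValueError(f"unsupported combination method: {method}") from None
    return combiner(scores)
```

A record that scored 1.0 on one clause and 3.0 on another would receive 2.0 under rel.combine=mean and 3.0 under rel.combine=max.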