Tuesday, October 27, 2009

Counting on SPARQL Aggregates

This is largely a note to myself so I can remember the syntax for a DISTINCT count, which is somewhat unintuitive. The regular syntax to get a count:
  SELECT (count(?foo) AS ?count)
  WHERE { ... }
However, this counts every binding of ?foo, duplicates included, so it is not a unique count of the ?foo items. To make the count DISTINCT, the keyword goes inside the aggregate:
  SELECT (count(DISTINCT ?foo) AS ?count)
  WHERE { ... }
Not:
  SELECT DISTINCT (count(?foo) AS ?count)
  WHERE { ... }
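where the DISTINCT in this case has no impact whatsoever: it applies to the rows of the result, and by then the aggregate has already collapsed everything into a single ?count row.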
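
To make the difference between the three forms concrete, here is a rough plain-Python model of what each one computes. It is only a sketch of the semantics, not how any SPARQL engine is implemented: the four ?foo values and the helper names count, count_distinct and select_distinct are made up for illustration.

```python
# Hypothetical solution sequence produced by the WHERE clause:
# one dict per solution, with ?foo bound to a made-up value.
solutions = [
    {"foo": "a"},
    {"foo": "b"},
    {"foo": "a"},
    {"foo": "c"},
]


def count(rows, var):
    """Model of count(?var): number of solutions in which ?var is bound."""
    return sum(1 for row in rows if var in row)


def count_distinct(rows, var):
    """Model of count(DISTINCT ?var): number of different values of ?var."""
    return len({row[var] for row in rows if var in row})


def select_distinct(result_rows):
    """Model of SELECT DISTINCT: drop duplicate rows from the final result."""
    unique = []
    for row in result_rows:
        if row not in unique:
            unique.append(row)
    return unique


# SELECT (count(?foo) AS ?count)
plain = [{"count": count(solutions, "foo")}]

# SELECT (count(DISTINCT ?foo) AS ?count)
inner = [{"count": count_distinct(solutions, "foo")}]

# SELECT DISTINCT (count(?foo) AS ?count)
outer = select_distinct([{"count": count(solutions, "foo")}])

print(plain)  # [{'count': 4}]
print(inner)  # [{'count': 3}]
print(outer)  # [{'count': 4}]
```

The outer DISTINCT only ever sees the single result row, so there is nothing left for it to de-duplicate.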

The use of the "AS" keyword is itself unintuitive, as it breaks the norm for how variable assignment is expressed in the WHERE clause. In fact, I think it would be good to drop it and keep the SPARQL vocabulary minimal. The only use of "AS" that I've encountered is to assign a variable in a SELECT statement in the form:
  SELECT (count(?foo) AS ?count) ...
The oddity here is that the variable ?count is assigned on the right-hand side of the operator, "AS", instead of the left-hand side as we've gotten used to with LET functions. Why not use LET here as well? For example:
  SELECT LET(?count := count(?foo)) ...