Wednesday, July 18, 2018

Citation.js: Use Case for a Wikidata GraphQL API

Citation.js has supported Wikidata input for a long time. However, I’ve always had some trouble with the API. See, when Citation.js processes Wikidata API output (which looks like this) and gets to, say, the P50 (author) property, it encounters this:

"P50": [
	{
		"mainsnak": {
			"snaktype": "value",
			"property": "P50",
			"hash": "1202966ec4cf715d3b9ff6faba202ac6c6ac3df8",
			"datavalue": {
				"value": {
					"entity-type": "item",
					"numeric-id": 2062803,
					"id": "Q2062803"
				},
				"type": "wikibase-entityid"
			},
			"datatype": "wikibase-item"
		},
		"type": "statement",
		"id": "Q46601020$692cc18d-4f54-eb65-8f0a-2fbb696be564",
		"rank": "normal"
	}
]

The problem with this is that there’s no name string readily available: to get the name of this author, or of any author, journal, publisher, etc., Citation.js has to make extra queries to the API.
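To make those extra lookups concrete, here is a rough sketch, in Python for brevity rather than the JavaScript that Citation.js is written in. It is not the actual Citation.js code: the function names `referenced_item_ids` and `labels_url` are made up for illustration, and the sketch assumes the labels are fetched through the standard `wbgetentities` module of the Wikidata API. It collects the item IDs that statements point to and builds a follow-up request for their labels.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def referenced_item_ids(claims):
    """Return the IDs of all items that the given claims point to."""
    ids = []
    for statements in claims.values():
        for statement in statements:
            snak = statement["mainsnak"]
            # "novalue" and "somevalue" snaks carry no datavalue at all
            if snak["snaktype"] != "value":
                continue
            datavalue = snak["datavalue"]
            if datavalue["type"] == "wikibase-entityid":
                ids.append(datavalue["value"]["id"])
    # Drop duplicates while keeping first-seen order
    return list(dict.fromkeys(ids))


def labels_url(ids, language="en"):
    """Build one wbgetentities request for the labels of all given IDs."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)


# Trimmed-down copy of the P50 statement shown above
claims = {
    "P50": [{
        "mainsnak": {
            "snaktype": "value",
            "property": "P50",
            "datavalue": {
                "value": {
                    "entity-type": "item",
                    "numeric-id": 2062803,
                    "id": "Q2062803",
                },
                "type": "wikibase-entityid",
            },
            "datatype": "wikibase-item",
        },
        "type": "statement",
        "rank": "normal",
    }]
}

ids = referenced_item_ids(claims)
print(ids)
print(labels_url(ids))
```

Even in this small form, the second round trip is unavoidable: the first response only contains IDs, so the names can only be requested once it has been parsed.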

In the case of people, you could then just grab the label, but there’s also P735 (given name) and P734 (family name) in Wikidata. That saves some error-prone name parsing, you might think. However, this is what the API output looks like:

{
    "P735":[
        {
            "mainsnak":{
                "snaktype":"value",
                "property":"P735",
                "hash":"26c75e68a9844db73d0ff2e0da5652c5d571e46d",
                "datavalue":{
                    "value":{
                        "entity-type":"item",
                        "numeric-id":15635262,
                        "id":"Q15635262"
                    },
                    "type":"wikibase-entityid"
                },
                "datatype":"wikibase-item"
            },
            "type":"statement",
            "id":"Q22581$3554EADD-B8D8-4506-905B-014823ECC3EA",
            "rank":"normal"
        }
    ],
    "P734":[
        {
            "mainsnak":{
                "snaktype":"value",
                "property":"P734",
                "hash":"030e6786766f927e67ed52380f984be79d0f6111",
                "datavalue":{
                    "value":{
                        "entity-type":"item",
                        "numeric-id":41587275,
                        "id":"Q41587275"
                    },
                    "type":"wikibase-entityid"
                },
                "datatype":"wikibase-item"
            },
            "type":"statement",
            "id":"Q22581$598DF0D7-CEC7-470B-8D0F-DD320796BF01",
            "rank":"normal"
        }
    ]
}

Another two dead ends, and another two API calls (or one, with some effort). It would be great if it were possible to get this data with a single API call. I think GraphQL would be a good option here: with GraphQL, you can specify exactly what data you want. I’m not the first one to think of this; in fact, a simple example is already implemented. This is what a query would look like (variables: {"item": "Q30000000"}):

query ($item: ID!) {
  entry: item(id: $item) {
    # get every property
    # to get specific properties, use "statements(properties: [...])"
    claims: statements {
      mainsnak {
        ... on PropertyValueSnak {
          # get property id and English label
          property {
            id
            name: label(language: "en") {
              text
            }
          }
          # get value
          value {
            ... on MonolingualTextValue {
              value: text
            }
            ... on StringValue {
              value
            }
            # if value is an item, get the label too
            ... on Item {
              id
              label(language: "en") {
                text
              }
            }
            ... on QuantityValue {
              amount
              unit {
                value: label(language: "en") {
                  text
                }
              }
            }
            ... on TimeValue {
              value: time
            }
          }
        }
      }
    }
  }
}
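For completeness, a hedged sketch of how such a query could be sent from a script, again in Python for brevity. It assumes the endpoint follows the common GraphQL-over-HTTP convention of a POST request with a JSON body holding `query` and `variables`. The `ENDPOINT` URL is a placeholder, not the address of the experimental API, and `build_request` is a name invented for this example. The query is a shortened stand-in for the one above.

```python
import json
from urllib.request import Request

# Placeholder: replace with the URL of the experimental GraphQL endpoint
ENDPOINT = "https://example.org/graphql"

# Shortened stand-in for the full query shown above
QUERY = """
query ($item: ID!) {
  entry: item(id: $item) {
    id
  }
}
"""


def build_request(query, variables, endpoint=ENDPOINT):
    """Wrap a query and its variables in a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(QUERY, {"item": "Q30000000"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending is left out on purpose; with a real endpoint it would be:
#   urllib.request.urlopen(request)
```

The point is that everything Citation.js needs goes out in one request, instead of one request for the item plus follow-ups for every linked item.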

Another handy thing is that the API output is basically the JSON equivalent of the query, with the data filled in. I think a GraphQL API would be really useful for this and similar use cases, and it definitely seems possible, given that an experimental API is already available.
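Because the response mirrors the query, turning it into something usable takes very little code. The sketch below assumes a response shaped like the query above, with its aliases `entry`, `claims` and `value`. The sample data is hand-written for illustration (the label and title strings are placeholders, not real Wikidata output), and `simplify` is a hypothetical helper, not part of Citation.js.

```python
def simplify(response):
    """Map property IDs to lists of plain values."""
    result = {}
    for claim in response["data"]["entry"]["claims"]:
        snak = claim["mainsnak"]
        # Snaks that are not PropertyValueSnak match no fragment: {}
        if "property" not in snak:
            continue
        value = snak["value"]
        if "id" in value:
            # Item: prefer the label, fall back to the ID if there is none
            label = value.get("label")
            plain = label["text"] if label else value["id"]
        elif "amount" in value:
            # Quantity: keep the amount, ignore the unit in this sketch
            plain = value["amount"]
        else:
            # String, monolingual text and time are all aliased to "value"
            plain = value.get("value")
        result.setdefault(snak["property"]["id"], []).append(plain)
    return result


# Hand-written sample; the strings are placeholders
sample = {
    "data": {
        "entry": {
            "claims": [
                {"mainsnak": {
                    "property": {"id": "P50", "name": {"text": "author"}},
                    "value": {"id": "Q2062803", "label": {"text": "Example Author"}},
                }},
                {"mainsnak": {
                    "property": {"id": "P1476", "name": {"text": "title"}},
                    "value": {"value": "Example title"},
                }},
                {"mainsnak": {}},
            ]
        }
    }
}

print(simplify(sample))
```

No second lookup is needed here: the author’s label arrives in the same response as the ID.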
