Thursday, March 19, 2015

XQuery performance tips

  • Avoid the use of double slashes (“//”) when you know the exact path, because they can make the processor search every descendant instead of following a known path.
  • Index XPath expressions where applicable.  For example, if you know that there is only one “Order” and only one “Address”, use an XPath expression like “$body/Order[1]/Address[1]” instead of “$body/Order/Address”, so that the processor can stop at the first match.
  • Using a predicate is often less expensive than a where clause, e.g. for $s in $students/student[gender = "male"] instead of for $s in $students/student where $s/gender = "male" (the element name student is only an illustration).
  • Because sorting is expensive, do not use the order by clause in FLWOR expressions more than necessary, and minimize the size of the sequences that are sorted. Beware that some set operations (e.g. union, intersect, except) sort implicitly, because they return their result in document order. Where the order does not matter, you can avoid that sort by using an “unordered” expression, e.g. let $smartstudents := unordered { $students//(science | engineering) } instead of let $smartstudents := $students//science | $students//engineering
  • Flatten your query: push your where clause into the XPath at the beginning, to limit the amount of work you have to do once you’re inside the “for” statement.
  • Extract frequently used parts of a large XML document as intermediate variables.  This will consume more memory, but will reduce redundant XPath processing.  For example: let $customer := $body/Order[1]/CustomerInfo[1]
  • Avoid duplicated processing (calculating the same value, or walking down the same path, more than once), e.g. instead of:
    {$x/categories/foodstuffs/*[@type = 'fish']}
    {$x/categories/foodstuffs/*[@type = 'bread']}
    use:
    let $foodstuffs := $x/categories/foodstuffs return
        ($foodstuffs/*[@type = 'fish'],
         $foodstuffs/*[@type = 'bread'])
    This can save a significant amount of time, especially if $x, $x/categories or $x/categories/foodstuffs has many children.
  • It can pay to precalculate, e.g. you could store the IDs of all documents that are in a certain state in a single document, which is much cheaper to query than many documents.
  • Because the cost of a query can grow much faster than the size of its input, a query may appear very fast with small data but perform much worse with large data, so test your XQuery performance with large payloads whenever possible.
  • Limit your data set: do whatever it takes to get to the smallest set of data possible in the most direct way, e.g. by putting the most specific predicate first.
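Several of the tips above can be combined in one small query. The following is only a minimal sketch: the $body/Order[1]/CustomerInfo[1] path comes from the examples above, while the sample data and the names Item, price, Name and Summary are assumptions made up for illustration.

```xquery
(: Hypothetical sample data; only the Order/CustomerInfo path comes from the tips above. :)
let $body :=
  <Body>
    <Order>
      <CustomerInfo><Name>Ann</Name></CustomerInfo>
      <Item price="10">pen</Item>
      <Item price="250">desk</Item>
      <Item price="5">ink</Item>
    </Order>
  </Body>

(: Walk each path once, with explicit positions, and keep the result in a variable. :)
let $order := $body/Order[1]
let $customer := $order/CustomerInfo[1]

(: Filter with a predicate inside the path instead of a where clause,
   and use an exact child path instead of "//". :)
let $expensive := $order/Item[@price > 100]

return
  <Summary>
    <Customer>{ string($customer/Name) }</Customer>
    <ExpensiveItems>{ count($expensive) }</ExpensiveItems>
  </Summary>
```

With this sample data only the desk passes the predicate. Whether this form is measurably faster than the “//” and where-clause variants depends on your processor, so it is worth timing both with a realistic payload.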
The benefit of the following tips depends on the XQuery processor, but they are worth trying:
  • Move ordinals to the end of path expressions, e.g. (/book/@isbn)[1] instead of /book[1]/@isbn (the two return the same value only if the first book has an isbn attribute).
  • Avoid predicates in the middle of path expressions, e.g. /book[@ISBN = "1-8610-0157-6"] ∩ /book/author[first-name = "Davis"] (that is, test the two conditions separately and combine the results) instead of /book[@ISBN = "1-8610-0157-6"]/author[first-name = "Davis"]
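To check that such rewrites still return what you expect, you can compare both forms on a small document first. The sketch below uses hypothetical sample data, and the “split” form is only one possible reading of the second tip. It matches the nested form only when a document holds a single book, because otherwise the two conditions could be satisfied by different books.

```xquery
(: Hypothetical sample document with a single book. :)
let $doc :=
  document {
    <book isbn="1-8610-0157-6">
      <author><first-name>Davis</first-name></author>
    </book>
  }

(: Ordinal in the middle of the path versus ordinal at the end. :)
let $inner := $doc/book[1]/@isbn
let $outer := ($doc/book/@isbn)[1]

(: Predicate in the middle of the path versus two separate tests (assumed reading). :)
let $nested := $doc/book[@isbn = "1-8610-0157-6"]/author[first-name = "Davis"]
let $split :=
  exists($doc/book[@isbn = "1-8610-0157-6"])
  and exists($doc/book/author[first-name = "Davis"])

return
  <check>
    <ordinals-agree>{ string($inner) eq string($outer) }</ordinals-agree>
    <nested-match>{ exists($nested) }</nested-match>
    <split-match>{ $split }</split-match>
  </check>
```

If the two forms agree on your test data, measure both against a large payload before deciding which one to keep.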
References:
General performance tips: http://soa-java.blogspot.nl/2013/01/software-guidelines-performance.html