[MarkLogic Dev General] Surprising slowness of cts:uri-match

Discussion:

Rachel Wilson

2014-10-16 16:25:43 UTC

In our experience, cts:uri-match is surprisingly slow. For example, when profiling a fairly complicated query that takes 0.7 seconds, the single cts:uri-match() call accounts for 70-80% of the total time (Shallow% and Deep% are the same).

But we thought it would be reading the URI lexicon, so in a database with only 483,475 documents it should be lightning fast. We've had to stop using cts:uri-match calls in loops for this reason.

Are there any match patterns that should be avoided? Wildcards in the middle of the pattern rather than trailing wildcards, for example?

Rachel Wilson

2014-10-22 16:05:11 UTC

Hi,

I was wondering if anyone had a reply to this.

We're digging even deeper into improving the performance of an API, and in several places (because we use it liberally) cts:uri-match ends up being the bottleneck. We are happy to redesign our data and queries where we can to avoid it, but it continues to surprise us, because we thought the URIs were indexed and that the function was designed to handle wildcards, since it's a matcher.

A typical call would be:

let $uris := cts:uri-match("/project/" || $projectId ||"/jobs/*",

But we're most surprised by this one, which we used as a test, because it doesn't contain any wildcards at all:

let $thereShouldBeOnlyOne := cts:uri-match("/project/" || $projectId || "/content/" || $contentId)

Some insight into the inner workings of that function would be great.

Danny Sokolsky

2014-10-22 16:36:42 UTC

Hi Rachel,

Can you pass a cts:query into your cts:uri-match call?
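For illustration, a minimal sketch of that suggestion, using a hypothetical $projectId. The third argument of cts:uri-match is a cts:query that limits the result to URIs of fragments selected by that query; here a cts:directory-query on the same prefix is assumed to be a reasonable constraint. Whether this actually lets the server avoid walking the whole lexicon is an assumption that should be confirmed with the profiler.

```xquery
xquery version "1.0-ml";

(: Hypothetical project ID, for illustration only :)
let $projectId := "1234"
let $dir := fn:concat("/project/", $projectId, "/jobs/")
return
  (: Same wildcard pattern as before; the cts:query argument restricts
     matches to documents in (or below) $dir. Directory URIs must end in "/". :)
  cts:uri-match(
    fn:concat($dir, "*"),
    (),
    cts:directory-query($dir, "infinity"))
```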

How many forests do you have? More forests might help, depending on what you are doing.

But if all of the URIs in your database follow this pattern, it will ultimately have to search through a lot of URIs. Making your URI space a little more selective might speed it up. Perhaps the strings in your URIs are all very similar (a URI match is essentially a string comparison)?

What kind of hardware are you running on? The speed of your memory and cpu can be a factor here too.

-Danny

Michael Blakeley

2014-10-22 23:42:55 UTC

It seems like those use cases could be implemented more efficiently without cts:uri-match. The first could be done with cts:uris and cts:directory-query. The second could use exists(doc($uri)), or cts:uris with cts:directory-query.
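A minimal sketch of both suggestions, assuming the URI lexicon is enabled (cts:uris requires it) and using hypothetical values for $projectId and $contentId. Depth "infinity" is chosen on the assumption that the original "/jobs/*" pattern was meant to match nested URIs too; "1" would restrict it to immediate children.

```xquery
xquery version "1.0-ml";

(: Hypothetical IDs, for illustration only :)
declare variable $projectId := "1234";
declare variable $contentId := "5678";

(: Use case 1: all URIs under /project/{id}/jobs/.
   cts:uris reads the URI lexicon, constrained by a directory query
   instead of a wildcard pattern. :)
let $uris :=
  cts:uris((), (),
    cts:directory-query(
      fn:concat("/project/", $projectId, "/jobs/"),
      "infinity"))

(: Use case 2: an exact URI needs no pattern matching at all;
   a direct lookup by URI answers "is there exactly this one?" :)
let $uri := fn:concat("/project/", $projectId, "/content/", $contentId)
let $exists := fn:exists(fn:doc($uri))

return ($exists, $uris)
```

The design point is that both forms hand the server a known prefix or an exact key rather than a pattern to test against every lexicon entry; profiling each against the original cts:uri-match call is still the way to confirm the gain.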

Depending on the work cts:uri-match decides to do, it may need to scan the entire URI lexicon for matches. That is O(n) in the number of URIs, at a rate of probably something like 1M URIs/sec.
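As a rough back-of-the-envelope check, assuming Mike's guessed rate of about 10^6 URIs/s (an estimate, not a measurement) and a full scan of all 483,475 URIs in Rachel's database:

$$ t_{\text{scan}} \approx \frac{n}{r} = \frac{483{,}475\ \text{URIs}}{10^{6}\ \text{URIs/s}} \approx 0.48\ \text{s} $$

$$ 0.7\ \text{s} \times (0.70 \text{ to } 0.80) = 0.49 \text{ to } 0.56\ \text{s} $$

The estimate lands in the same range as the share of the 0.7 s query that the profiler attributes to the call, which is consistent with (though does not prove) a full lexicon scan. Called $k$ times in a loop, the cost would grow to roughly $k \times 0.48$ s, which would explain why calling it in loops became untenable.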

-- Mike

_______________________________________________
General mailing list
http://developer.marklogic.com/mailman/listinfo/general