Correctly paging when searching SharePoint

OK, so the other day I was working on a project where I needed to search across the SharePoint tenant to find a wide range of documents and load them into Power BI for reporting. I was using PnP.PowerShell to do this, and it seemed to work fine.

```powershell
# Searching SharePoint using PnP.PowerShell is easy.
Invoke-PnPSearchQuery -Query $searchQuery
```

But the customer complained that some documents were missing from their Power BI report. I checked the code, and it seemed fine. I checked the search index (by searching for the missing files manually) and found them exactly where I expected them to be.

So I dug in and quickly got some strange results. My search was returning 100,000 result rows, but when I compared the URLs, only 36,000 of those rows turned out to be unique. Running the same search repeatedly gave widely varying numbers: sometimes 60,000 of the 100,000 rows were unique files, at other times only 30,000. Where were all those duplicates coming from? Even setting TrimDuplicates to true didn't help.
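A quick way to measure the duplication is to group the collected rows by URL. The sketch below assumes the rows are hashtables (or dictionaries) with the item URL in the Path managed property; the sample rows are made up for illustration.

```powershell
# Hypothetical sample data: in a real run these would be all the ResultRows
# collected while paging, with the item URL in the Path managed property.
$allRows = @(
    @{ Path = "https://contoso.sharepoint.com/sites/hr/Documents/a.docx" }
    @{ Path = "https://contoso.sharepoint.com/sites/hr/Documents/b.docx" }
    @{ Path = "https://contoso.sharepoint.com/sites/hr/Documents/a.docx" }
    @{ Path = "https://contoso.sharepoint.com/sites/hr/Documents/c.docx" }
)

$totalRows = $allRows.Count

# One group per unique URL; groups with more than one row are duplicates
$groups     = @($allRows | Group-Object -Property { $_.Path })
$uniqueUrls = $groups.Count
$duplicates = @($groups | Where-Object { $_.Count -gt 1 })

"$uniqueUrls of $totalRows rows are unique"
foreach ($group in $duplicates) {
    "$($group.Count)x $($group.Name)"
}
```

With the sample data this reports 3 unique URLs out of 4 rows, and lists a.docx as returned twice.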

For my scenario, I was using my own paging solution to page through the result set, something like this:

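```powershell
$itemCount = 0
$more = $true
while ($more) {
    # Offset paging: skip the rows already retrieved using -StartRow
    $response = Invoke-PnPSearchQuery -Query $searchQuery -MaxResults 500 -StartRow $itemCount -SelectProperties $properties
    $itemCount = $itemCount + $response.ResultRows.Count
    # Do something with the results
    # Stop when a page comes back with fewer than 500 rows
    $more = $response.ResultRows.Count -eq 500
}
```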

Now, I know there's an -All parameter on the Invoke-PnPSearchQuery cmdlet, but I couldn't use it: I was running this search in an Azure Automation runbook, which would run out of memory and crash if I tried to load all 100,000 results at once.

However, just for debugging purposes, I tried it anyway, and it did in fact give me 100,000 unique results! Diving into the PnP.PowerShell codebase, I found that with -All the cmdlet sorts the search by an internal SharePoint search property called [DocId], and on each request only asks for items whose IndexDocId is greater than the last DocId of the previous request.

This is a very specific way of paging, and it turns out there is a reason for it. An intriguing piece of Microsoft documentation suggests that paging search results with the StartRow property is unreliable for large result sets. That matches what I was experiencing, even though the documentation does not mention the duplicate results I was running into.

I adapted my paging to work the way PnP.PowerShell does it: keep StartRow at zero, sort by [DocId], and only request items with a DocId greater than the last one from the previous request. Now I'm finally getting all the results I need.

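```powershell
$lastDocId = "0"
$more = $true
# -SelectProperties takes an array of property names; DocId must be selected so we can read it back
$selectProperties = @($properties) + "DocId"
while ($more) {
    # Always start at row 0, sort by [DocId] and only ask for items after the last DocId seen
    $response = Invoke-PnPSearchQuery -Query "$searchQuery IndexDocId>$lastDocId" -MaxResults 500 -StartRow 0 -SelectProperties $selectProperties -SortList @{ "[DocId]" = "ascending" }
    $rows = $response.ResultRows
    if ($rows.Count -eq 0) { break }
    # Do something with the results
    $lastDocId = $rows[$rows.Count - 1].DocId
    # A page with fewer than 500 rows means we have reached the end
    $more = $rows.Count -eq 500
}
```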

🥳 Problem solved!
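Because the runbook could not hold all results in memory, it can help to wrap this loop in a function that hands each page to a callback and then lets it go. Below is a minimal sketch of such a helper; the name Invoke-DocIdPaging, its parameters and the in-memory sample index are hypothetical. The search call is passed in as a scriptblock, so the paging logic can be tried out without a SharePoint connection.

```powershell
# Hypothetical helper: keyset paging on DocId, holding only one page at a time
function Invoke-DocIdPaging {
    param(
        # Receives the last DocId seen and returns the next page of rows,
        # sorted by DocId ascending, each row exposing a DocId value
        [Parameter(Mandatory)] [scriptblock] $GetPage,
        # Receives one page of rows to process (e.g. write them to storage)
        [Parameter(Mandatory)] [scriptblock] $OnPage,
        [int] $PageSize = 500
    )

    $lastDocId = "0"
    $total = 0
    while ($true) {
        $rows = @(& $GetPage $lastDocId)
        if ($rows.Count -eq 0) { break }

        # Discard the callback's output so the function only returns the row count
        $null = & $OnPage $rows
        $total += $rows.Count

        # The next request only asks for items after the last DocId on this page
        $lastDocId = [string]$rows[-1].DocId

        # A short page means the result set is exhausted
        if ($rows.Count -lt $PageSize) { break }
    }
    return $total
}

# Hypothetical in-memory "index" standing in for SharePoint search
$index = 1..1234 | ForEach-Object { @{ DocId = $_ * 3 } }

$collected = [System.Collections.Generic.List[object]]::new()
$total = Invoke-DocIdPaging -PageSize 500 -GetPage {
    param($lastDocId)
    # Same contract as the search: filter on DocId, sort ascending, take one page
    $index |
        Where-Object { $_.DocId -gt [long]$lastDocId } |
        Sort-Object { $_.DocId } |
        Select-Object -First 500
} -OnPage {
    param($rows)
    foreach ($row in $rows) { $collected.Add($row) }
}
"$total rows retrieved"
```

To run it against SharePoint, the -GetPage scriptblock would wrap the same Invoke-PnPSearchQuery call as the loop above (StartRow 0, IndexDocId>$lastDocId appended to the query, sorted by [DocId]) and return $response.ResultRows.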

The Microsoft Graph search API does not seem to have the same problem. I was able to safely page through the results using the from and size properties, and I got back all 100,000 unique result rows I expected. But of course, you can also page through it the same way PnP.PowerShell does, as in the following example:

```http
POST https://graph.microsoft.com/v1.0/search/query
Content-Type: application/json

{
    "requests": [
        {
            "entityTypes": [
                "driveItem"
            ],
            "fields": [
                "DocId",
                "Url"
            ],
            "query": {
                "queryString": "ContentTypeId:0x0101* IndexDocId>0"
            },
            "sortProperties": [
                {
                    "name": "[DocId]",
                    "isDescending": false
                }
            ]
        }
    ]
}
```
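That request only fetches the first page; for each following page the queryString gets the last DocId of the previous page instead of 0. A minimal sketch that builds those request bodies in PowerShell is shown below. The function name New-DocIdSearchBody is hypothetical, and it also sets the size property (500 here), which the request above leaves out.

```powershell
# Hypothetical helper: builds the Graph search request body for one page,
# using the same DocId-based paging as the request above
function New-DocIdSearchBody {
    param(
        [Parameter(Mandatory)] [string] $QueryString,
        [string] $LastDocId = "0",
        [int] $Size = 500
    )

    $body = @{
        requests = @(
            @{
                entityTypes    = @("driveItem")
                fields         = @("DocId", "Url")
                # Only ask for items after the last DocId of the previous page
                query          = @{ queryString = "$QueryString IndexDocId>$LastDocId" }
                size           = $Size
                sortProperties = @(
                    @{ name = "[DocId]"; isDescending = $false }
                )
            }
        )
    }

    # -Depth is needed because ConvertTo-Json only serializes 2 levels by default
    return $body | ConvertTo-Json -Depth 10
}

$json = New-DocIdSearchBody -QueryString "ContentTypeId:0x0101*" -LastDocId "0"
$json
```

Assuming the Microsoft Graph PowerShell SDK is installed and connected, each body could be sent with Invoke-MgGraphRequest -Method POST -Uri "https://graph.microsoft.com/v1.0/search/query" -Body $json -ContentType "application/json"; the DocId of the last hit then becomes the -LastDocId for the next call.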

It took me some time to discover what was going on. I hope this article helps others avoid the same API pitfall.

Happy coding!


sharepoint search api microsoft-graph
More blogs

Quick tip: customizing the SharePoint site search experience

Creating a customized SharePoint site search experience using PnP Modern Search and PnP PowerShell.

Building a SharePoint New Site Form Look-Alike

A post on building a SPFx Form Customizer with a Dynamic Form with field overrides to create an experience that looks like the default SharePoint new site form.

Automating Purview data retention using Azure Functions

An example of how to automatically apply Purview retention labels using Azure Functions.

Thanks for reading

Thanks for reading my blog; I hope you found what you came for. Other people's blogs have been hugely important in my work, and this site is my way of returning the favor. If anything is unclear because I failed to explain it well enough, please drop me a message via my socials or the contact form.


Warm regards,
Martin

Microsoft MVP | Microsoft 365 Architect
