20.4 Optimizing Searches

Whether you are searching Active Directory using filters or with SQL, there are some important guidelines to follow that can help reduce load on the domain controllers, increase performance of your scripts and applications, and reduce the amount of traffic generated on the network. It is also important to share these guidelines with others as much as possible. It takes only a couple of badly written search filters in a heavily used application to severely impact the performance of your domain controllers!

20.4.1 Efficient Searching

Understanding how to write efficient search criteria is the first important step to optimizing searches. By understanding a few key points, you can greatly improve the performance of your searches. It is also important to reuse data retrieved from searches or connections to Active Directory as much as possible. The following list describes several key points to remember about searching:

Use at least one indexed attribute per search. Certain attributes are marked as "indexed" in Active Directory, which allows for fast pattern matching. They are typically single-valued and unique, which means searches using indexed attributes can determine which objects match them very quickly. If you don't use indexed attributes, the database equivalent of a full table scan must be done to determine the matches.
Use a combination of objectclass and objectcategory in every search. While most of the queries used so far in this chapter have used only objectclass, you should make it a practice always to use a combination of objectclass and objectcategory. The problem with using only objectclass is that it is not indexed because it is multivalued and not unique, while objectcategory is single-valued and indexed. See Section 20.4.2 for more information.
Try to limit the use of trailing (name=*llen) or middle match (name=*lle*) searches. Unlike other directories, Active Directory is not optimized to handle these types of searches, and they should be avoided if possible. In some cases these types of searches can take upwards of 10-15 seconds to complete under Windows 2000!
Use the appropriate search scope. Avoid using subtree searches unless you truly want to search more than one level down. If you only want to search directly below the search base, use the OneLevel scope.
Use paged searching for queries that can potentially return thousands of entries. Most subtree searches should have paging enabled unless you are positive the search will not return more than 1,000 entries, or you deliberately want to cap the results at 1,000 entries.
Reuse ADO Connection and Command objects as much as possible. ADO Connection and Command objects can be used for multiple searches so there is no need to create additional ones.
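
Several of these guidelines come down to how the query string is put together. The following is a minimal sketch, in Python rather than VBScript so that it can be run without a domain controller, of a helper that assembles the LDAP-dialect query string used with ADO. The function names build_ado_query and and_filter, the ou=Sales container, and the cn=All* clause are our own illustrative assumptions, not part of ADO or ADSI.

```python
def and_filter(*clauses):
    """Combine already-parenthesized LDAP clauses with a logical AND."""
    return "(&" + "".join(clauses) + ")"


def build_ado_query(base_dn, filter_str, attributes, scope="subtree"):
    """Assemble '<LDAP://base>;filter;attributes;scope' (hypothetical helper)."""
    allowed = {"base", "onelevel", "subtree"}
    if scope.lower() not in allowed:
        raise ValueError("scope must be one of %s" % sorted(allowed))
    if not (filter_str.startswith("(") and filter_str.endswith(")")):
        raise ValueError("LDAP filter must be wrapped in parentheses")
    return "<LDAP://%s>;%s;%s;%s" % (
        base_dn, filter_str, ",".join(attributes), scope)


# Combine objectclass with the indexed objectcategory, anchor the match at
# the start of the value (no leading wildcard), and search one level only.
query = build_ado_query(
    "ou=Sales,dc=mycorp,dc=com",  # hypothetical container
    and_filter("(objectclass=user)", "(objectcategory=person)", "(cn=All*)"),
    ["cn", "sn"],
    scope="onelevel",
)
print(query)
```

The resulting string can be passed to Connection::Execute in the same way as the query strings used elsewhere in this chapter.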

20.4.2 Objectclass Versus Objectcategory

It is very important to understand the differences between objectclass and objectcategory and how they should be used during searches. Objectclass is a multi-valued attribute that contains the objectclass hierarchy for an instantiated object. For example, a user object has the following values as part of its objectclass attribute:

top
person
organizationalPerson
user

That is because the user class inherits from the organizationalPerson class, which inherits from the person class, which inherits from the top class. When a class inherits from another, the attributes of the inherited class (also known as the parent class) are available for the inheriting class to use. A class can inherit attributes from abstract and structural classes, which would show up in the objectclass attribute for an instantiated object, but auxiliary classes that get associated with a particular class do not. That's because classes do not inherit attributes from auxiliary classes the way they do from structural and abstract classes. Auxiliary classes allow for a grouping of attributes to be associated with one or more classes in a similar manner to just adding attributes directly to a class's definition.

Objectcategory, on the other hand, is a single-value indexed attribute, which specifies a classification for a type of object. Objectcategory is intended to be an easy way to query for a certain "category" of objects, such as "Person". As an example, both user and contact objects have an objectcategory of Person, so by simply searching for (objectcategory=Person), you could possibly retrieve user or contact objects.

In practice, it is pretty unlikely that you would want to use objectcategory as a means to query a certain category of objects. Also, the majority of objects in Active Directory have an objectcategory that is the same as the objectclass in which they were instantiated, making classification applicable only in a few cases.

Nevertheless, most queries should in fact use a combination of objectclass and objectcategory as part of the search filter or SQL. One of the primary reasons for not using just objectclass is that it is not indexed and is multivalued, which does not make for an efficient query. The other classic problem with using only objectclass is that you can end up with more object types than you were expecting. This is a common problem with using (objectclass=user). You would think you'd only get user objects back using that filter, but you can also get computer objects, since the computer class inherits from the user class (so user is one of the values of the objectclass attribute for every computer object). And even though it would be efficient to use only objectcategory because it is indexed, it falls into the same trap as objectclass, because objects other than the ones you are targeting may get returned (e.g., user objects and contact objects). It is for these reasons that you should try always to use a combination of objectclass and objectcategory in your searches.

Several examples are included next to illustrate what using various combinations of objectclass and objectcategory can return:

People (i.e., Users and Contacts)

(objectcategory=person)

Contacts

(&(objectclass=contact)(objectcategory=person))

Users

(&(objectclass=user)(objectcategory=person))

Users and computers (not optimized)

(objectclass=user)

Users and computers (optimized)

(&(|(objectcategory=person)(objectcategory=computer))(objectclass=user))

Groups

(&(objectclass=group)(objectcategory=group))

Containers

(&(objectclass=container)(objectcategory=container))

Organizational Units

(&(objectclass=organizationalunit)(objectcategory=OrganizationalUnit))
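
To see why the combinations above return what they do, here is a small Python model of three directory objects and a matching function. It is only a sketch: the three sample objects and the search function are our own simplified stand-ins, and nothing here talks to Active Directory.

```python
# Simplified stand-ins for directory objects. objectclass holds the whole
# inheritance chain; objectcategory holds a single value.
OBJECTS = [
    {"cn": "vicky",
     "objectclass": ["top", "person", "organizationalPerson", "user"],
     "objectcategory": "person"},
    {"cn": "pc01",
     "objectclass": ["top", "person", "organizationalPerson", "user",
                     "computer"],
     "objectcategory": "computer"},
    {"cn": "supplier",
     "objectclass": ["top", "person", "organizationalPerson", "contact"],
     "objectcategory": "person"},
]


def search(objects, objectclass=None, objectcategory=None):
    """Return the cn of every object that matches all supplied criteria."""
    hits = []
    for obj in objects:
        if objectclass is not None and objectclass not in obj["objectclass"]:
            continue
        if (objectcategory is not None
                and obj["objectcategory"] != objectcategory):
            continue
        hits.append(obj["cn"])
    return hits


users_and_computers = search(OBJECTS, objectclass="user")
people = search(OBJECTS, objectcategory="person")
users_only = search(OBJECTS, objectclass="user", objectcategory="person")

print(users_and_computers)  # the user and the computer
print(people)               # the user and the contact
print(users_only)           # the user alone
```

Only the combined criteria isolate the user object, because each attribute on its own also matches one unwanted object type.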

20.4.3 Filtering an Existing Resultset

An optimization technique that can be used when you need to perform a lot of queries is to instead perform one large query and repeatedly filter the resultset to get the subset of entries you want. It is possible to select particular items from a resultset by using the Recordset::Filter property. Once the Recordset::Filter property has been set, you can access only the items in the resultset that match the filter. Properties such as Recordset::RecordCount return only the number of items that match the filter. If you then set the filter back to an empty string, the whole resultset is available again. Since filtering a resultset relies on data that is present in the resultset, you can only filter using the Fields object and its values. For example, if you only specify to return the givenName and sn attributes in a query, you can use only those attributes to filter the resultset later. If you do not return cn as a field, there is no way to filter on it later.

Being able to filter an existing resultset is a useful tool but only in certain situations. In our experience, it is especially useful in three situations:

You want to use filtered resultsets to access entries instead of multiple queries.
You want to refine a large resultset without looping through every value.
You want to reduce the load on Active Directory by performing one large query as opposed to several separate queries.

Let's consider a contrived example where use of the Recordset::Filter makes some sense. Let's say we want to count how many usernames begin with each of the 26 letters of the alphabet. The most intuitive method is probably to execute 26 ADO searches and record the Recordset::RecordCount property for each. However, this will hit Active Directory with 26 separate searches. Now let's expand the requirement and say we need these totals recorded continually in a file every minute or so. By now, you may be unwilling to keep hitting Active Directory with this sort of traffic every minute. The other alternative is to execute a single search for all users and loop through the resultset using Recordset::MoveNext, updating an array of 26 counts as we go. This hits Active Directory only once, but it iterates through every item. This process is fast for a moderate number of users, but for a really large number of users, it is much slower. If your resultset returns, say, 20,000 users in a single search, you need to use Recordset::Filter.
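
For comparison, the single-pass alternative just described (one search, then one loop that updates 26 counts) can be sketched as follows. This is a Python illustration only: the names list stands in for the cn values of a resultset, count_by_initial is a hypothetical helper, and folding names to lowercase before counting is our assumption.

```python
from collections import Counter
import string


def count_by_initial(names):
    """Count names by first letter in a single pass over the data."""
    counts = Counter()
    for name in names:
        initial = name[:1].lower()
        # Skip empty names and names that do not start with a-z.
        if initial and initial in string.ascii_lowercase:
            counts[initial] += 1
    # Report all 26 letters, including those with a count of zero.
    return {letter: counts[letter] for letter in string.ascii_lowercase}


names = ["Allen", "alice", "Bob", "vicky", "Zed", "9lives"]  # sample data
totals = count_by_initial(names)
print(totals["a"], totals["b"], totals["v"], totals["z"])
```

Every entry is visited exactly once, which is the cost the text above weighs against setting 26 filters on a single resultset.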

To solve the problem, we can write a piece of code that executes one search and then sets 26 separate filters, recording the Recordset::RecordCount value at each stage. Example 20-1 contains the sample code, from which the values are written to the C:\out.txt file.

Example 20-1. Using recordset filters to reduce the load on Active Directory

Option Explicit
   
Const adStateOpen = 1
   
Dim objFileSystem 'A FileSystemObject
Dim objOutput     'A TextStream Object
Dim objConn       'An ADO Connection object
Dim objRS         'An ADO Recordset object
Dim intCount      'An integer
   
'*****************************************************************************
'Create the file if it doesn't exist or truncate it if it does exist
'*****************************************************************************
Set objFileSystem = CreateObject("Scripting.FileSystemObject")
Set objOutput = objFileSystem.CreateTextFile("c:\out.txt", TRUE)
   
'*****************************************************************************
'Write out the current time and date using the VBScript 'Now' function
'*****************************************************************************
objOutput.WriteLine "Starting..." & Now
   
Set objConn = CreateObject("ADODB.Connection")
objConn.Provider = "ADSDSOObject"
objConn.Open "", "CN=Administrator,CN=Users,dc=mycorp,dc=com", ""
If objConn.State = adStateOpen Then
  objOutput.WriteLine "Authentication Successful!"
Else
  objOutput.WriteLine "Authentication Failed."
  WScript.Quit(1)
End If
   
Set objRS = objConn.Execute _
  ("<LDAP://dc=mycorp,dc=com>;(&(objectclass=user)(objectcategory=Person));cn;SubTree")
   
'*****************************************************************************
'Loop through the ASCII codes of the lowercase letters, Asc("a") to Asc("z"),
'where Asc("a") = 97 and Chr(97) = "a"
'*****************************************************************************
For intCount = 97 To 122
  objRS.Filter = "cn LIKE '" & Chr(intCount) & "*'"
  objOutput.WriteLine(Chr(intCount) & " = " & objRS.RecordCount)
Next
   
'Close the recordset before the connection it depends on
objRS.Close
objConn.Close
Set objRS = Nothing
   
objOutput.Close

The filter property must be set using a SQL-like query string, not an LDAP search filter. The recordset filter notation is fairly simple to use. The string can be an empty string (""), which removes the current filter; a criteria string; or an array of bookmarks. Bookmarks will be explained in more detail shortly.

20.4.3.1 Using a criteria string

The criteria string can take a number of different forms, which basically can be broken down to:

Field-name  operator  value-to-check

Here are some simple examples:

Name = 'vicky'   'Checks for exact equivalence (=); strings go in single quotes
size < 10        'Checks for less-than (<)
size > 10        'Checks for greater-than (>)
size >= 5        'Checks for greater-than-or-equal-to (>=)
size <= 20       'Checks less-than-or-equal-to (<=)
size <> 10       'Checks for not-equal-to (<>)

Dates are simple to check if you surround them with pound signs (#):

Date = #12/12/99#
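
When criteria strings are built from variables, the quoting is easy to get wrong. ADO expects string values in single quotes (with an embedded single quote doubled) and dates between pound signs. Here is a minimal Python sketch of a helper that applies those rules. The name criterion is hypothetical, and writing dates as month/day/four-digit year is our assumption.

```python
import datetime


def criterion(field, operator, value):
    """Format one 'field operator value' clause for Recordset::Filter."""
    allowed = {"=", "<", ">", "<=", ">=", "<>"}
    if operator not in allowed:
        raise ValueError("unsupported operator: %r" % operator)
    if isinstance(value, datetime.date):
        # Dates are surrounded by pound signs.
        literal = "#%d/%d/%d#" % (value.month, value.day, value.year)
    elif isinstance(value, str):
        # Strings go in single quotes; an embedded quote is doubled.
        literal = "'" + value.replace("'", "''") + "'"
    else:
        # Numbers are written as-is.
        literal = str(value)
    return "%s %s %s" % (field, operator, literal)


print(criterion("Name", "=", "vicky"))
print(criterion("size", "<", 10))
print(criterion("Date", "=", datetime.date(1999, 12, 12)))
print(criterion("sn", "=", "O'Malley"))
```

The returned string is what you would assign to the Filter property in a script.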

You also can use the keyword LIKE:

cn LIKE 'a*'            'Checks for all cn's beginning with "a"
cn LIKE 'ca%'           'Checks for all cn's beginning with "ca" (% works like *)
cn LIKE '*eithCoo*'     'Checks for all cn's containing "eithCoo"

You can also use AND and OR:

size > 10 AND size < 20
cn LIKE 'a*' OR cn LIKE 'b*'

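However, there is a strict rule to follow if you want to combine OR and AND in one criteria string: ADO does not let you group clauses joined by OR in parentheses and then join that group to another clause with AND. Instead, repeat the AND condition inside each of the OR'd groups. This is a clumsy restriction, and Microsoft should look at fixing it in a later release: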

(cn LIKE 'a*' OR cn LIKE 'b*') AND (size <> 10)                'This is WRONG!
(cn LIKE 'a*' AND size <> 10) OR (cn LIKE 'b*' AND size <> 10) 'This is CORRECT!
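
If you generate criteria strings in code, this rewrite is mechanical: repeat the AND clause inside every OR'd group. The following Python sketch shows the idea. distribute_and is a hypothetical helper of our own, and it assumes each clause is already a valid criteria fragment.

```python
def distribute_and(or_clauses, and_clause):
    """Rewrite (A OR B ...) AND C as (A AND C) OR (B AND C) ..."""
    return " OR ".join(
        "(%s AND %s)" % (clause, and_clause) for clause in or_clauses)


rewritten = distribute_and(["cn LIKE 'a*'", "cn LIKE 'b*'"], "size <> 10")
print(rewritten)
# (cn LIKE 'a*' AND size <> 10) OR (cn LIKE 'b*' AND size <> 10)
```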

That should be enough to get you started.

20.4.3.2 Using bookmarks

Each object in a resultset has a bookmark associated with it. You can always obtain the bookmark for the current record and store it for later use by retrieving the value of Recordset::Bookmark. After recording the bookmark, you can instantly jump to that record in the resultset at any time by writing the recorded value back to the bookmark property. For example:

'Record the bookmark for the current record 
objBookMark = objRS.Bookmark
   
'Do something
   
'Now return the current record to the record indicated by the bookmark
objRS.Bookmark = objBookMark

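If you read up about the ADO object model on the MSDN site, you will come across the Recordset::Clone method for cloning a resultset. Cloning a resultset also clones its bookmarks, so a bookmark taken from the original can be used with the clone and vice versa. However, bookmarks from separately opened recordsets are not interchangeable, even if the recordsets were created from the same query.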
