How to speed up nested recordset/SQL calls?

1

I am looking for help on how to speed up the code bit below because as it stands, it is taking too long to perform the task. Any suggestions would be much appreciated. Thanks in advance!

The code bit below is a stripped down version of the actual version but all the important guts should be there. The code works; however, the code is really slow on even a modest size dataset. Needless to say, the primary culprit is the second, nested recordset/SQL call. The LIKE operator is part of the slowdown but I'm more concerned about the nesting and I think the LIKE operator will be required in what we're trying to accomplish. I tried nesting the second SQL call into the first but I didn't see a clean way of doing so.

Platform: Classic ASP, VBScript, MS Access DB

' Go through all people in the table.
sql1 = "SELECT ID, FN, LN, Email FROM Table1"
Call rst1.Open(sql1, cnx, 0, 1)
While Not rst1.EOF
    id = rst1.Fields("ID").Value
    fn = rst1.Fields("FN").Value
    ln = rst1.Fields("LN").Value
    email = rst1.Fields("Email").Value
    If IsNull(email) Or IsEmpty(email) Then
        email = ""
    End If

    ' ----- Figure out if any other people in the table has a similar name or is using the same e-mail address.
    '       Capture both the ID of those other people as well as figure out the total number of possible duplicates.
    sql2 = "SELECT ID FROM Table1"
    sql2 = sql2 & " WHERE"
    sql2 = sql2 & "     ID <> " & id
    sql2 = sql2 & "     AND"
    sql2 = sql2 & "         ("
    sql2 = sql2 & "             FN & ' ' & LN LIKE '%" & Replace(fn & " " & ln, "'", "''") & "%'"
    If email <> "" Then
        sql2 = sql2 & "         OR"
        sql2 = sql2 & "         Email LIKE '%" & Replace(email, "'", "''") & "%'"
    End If
    sql2 = sql2 & "         )"
    Call rst2.Open(sql2, cnx, 0, 1)
    numDups = 0
    possibleDups = ""
    While Not rst2.EOF
        numDups = numDups + 1
        If possibleDups <> "" Then
            possibleDups = possibleDups & ", "
        End If
        possibleDups = possibleDups & rst2.Fields("ID").Value
        Call rst2.MoveNext()
    Wend
    Call rst2.Close()
    ' ----- End nest query.

    Call Response.Write(fn & " " & ln & " has " & numDups & " possible duplicates (" & possibleDups & ")")

    Call rst1.MoveNext()
Wend
Call rst1.Close()

Update 1:

Per request, here is a bit more info on the sample data and the expected output. Table1 is basically a table with the fields: id, fn, ln, email. id is an autogenerated ID representing the entry and fn/ln represent the first/last name, respectively, of the person's entry. Expected output is as coded, e.g.,...

John Doe has 3 possible duplicates (1342, 3652, 98325)
John Doe has 3 possible duplicates (986, 3652, 98325)
John Doe has 3 possible duplicates (986, 1342, 98325)
John Doe has 3 possible duplicates (986, 1342, 3652)
Sam Jones has 0 possible duplicates ()
Jane Smith has 2 possible duplicates (234, 10562)
Jane Smith has 2 possible duplicates (155, 10562)
Jane Smith has 2 possible duplicates (155, 234)

The numbers in parentheses correspond to the id's that appear to be duplicates to each person. A possible duplicate is a scenario in which another entry in the same table appears to share the same name or e-mail. For example, there could be 4 John Doe's and 3 Jane Smith's in the table based on name alone.

Ideally, only one SQL query is required to reduce the roundtrip induced by the recordset call but Access is limited compared to regular SQL Server as far as features and I'm not sure what I'm missing that might help speed this up.

Update 2:

Using the SQL Fiddle by @Abecee, I was able to get a faster query. However, I am now encountering two problems as a result.

The big picture view is still the same. We are looking for possible duplicates based on first name, last name, and e-mail address. However, we also added a search criteria, which are the lines wrapped inside of If searchstring <> "" Then ... End If. Also, note that the e-mail info is now being pulled from a separate table called EmailTable with the fields id, IndividualID (representing Table1.id), and email.

Mods: The updated query is similar but slightly different from the original query above. I'm not sure if it's better to create a whole new question or not, so I'll just leave this here for now. Let me know if I should move this to its own question.

If the code associated with comment A below is uncommented sql1 = sql1 & " OR (INSTR(E1.Email, E2.Email) > 0) ", I get an error message: Microsoft JET Database Engine (0x80040E14) Join expression not supported. The query seems to be coded correctly so what is missing or incorrect?

If the code associated with comment B below is uncommented sql1 = sql1 & " OR INSTR(E1.Email, '" & Replace(searchstring, "'", "''") & "') > 0", the query runs but it hangs. I tried dropping the query directly into Access to see if it'll work (e.g., New Query > SQL View) but it also hangs from within Access. I think the syntax and logic are correct but obviously something is askew. Do you see what or why it would hang with this line of code?

Here is the updated query:

sql1 = sql1 & "SELECT "
sql1 = sql1 & " T1.ID, T1.FN, T1.LN, E1.Email, "
sql1 = sql1 & " T2.ID, T2.FN, T2.LN "
sql1 = sql1 & "FROM "
sql1 = sql1 & "     ((Table1 T1 LEFT JOIN [SELECT E1.* FROM EmailTable E1 WHERE E1.Primary = True]. AS E1 ON T1.ID = E1.IndividualID)"
sql1 = sql1 & "         LEFT JOIN (Table1 T2 LEFT JOIN EmailTable E2 ON T2.ID = E2.IndividualID) "
sql1 = sql1 & "         ON "
sql1 = sql1 & "             ("
sql1 = sql1 & "                 T1.ID <> T2.ID "
sql1 = sql1 & "                 AND "
sql1 = sql1 & "                     ("
sql1 = sql1 & "                         ((INSTR(T1.FN, T2.FN) > 0) AND (INSTR(T1.LN, T2.LN) > 0)) "
' A. When the following line is uncommented, error is "Join expression not supported."
' sql1 = sql1 & "                       OR (INSTR(E1.Email, E2.Email) > 0) "
sql1 = sql1 & "                     ) "
sql1 = sql1 & "             ) "
sql1 = sql1 & "     ) "
If searchstring <> "" Then
    sql1 = sql1 & " WHERE "
    sql1 = sql1 & "     INSTR(T1.FN & ' ' & T1.LN, '" & Replace(searchstring, "'", "''") & "') > 0"
    ' B. When the following line is uncommented, code hangs on the rst1.open() call."
    ' sql1 = sql1 & "   OR INSTR(E1.Email, '" & Replace(searchstring, "'", "''") & "') > 0"
End If
sql1 = sql1 & " ORDER BY T1.LN, T1.FN, T1.ID"

prevID = 0

Call rst1.Open(sql1, cnx, 0, 1)
While Not rst1.EOF
    id = rst1.Fields("ID").Value

    ' Get initial values if we've come across a new ID.
    If (id <> prevID) Then
        fn = rst1.Fields("T1.FN").Value
        ln = rst1.Fields("T1.LN").Value
        email = rst1.Fields("Email").Value
        If IsNull(email) Or IsEmpty(email) Then
            email = ""
        End If

        ' Reset the counter for how many possible duplicates there are.
        numDups = 0

        ' If there is an ID from the second table, then keep track of this possible duplicate.
        tmp = rst1.Fields("T2.ID").Value
        If IsNumeric(tmp) Then
            tmp = CLng(tmp)
        Else
            tmp = 0
        End If
        If tmp > 0 Then
            numDups = numDups + 1
            possibleDups = possibleDups & tmp
        End If
    End If

    ' Figure out if we should show this row. Within this logic, we'll also see if there is another possible duplicate.
    showrow = False
    Call rst1.MoveNext()
    If rst1.EOF Then
        ' Already at the end of the recordset so show this row.
        showrow = True
        Call rst1.MovePrevious()
    Else
        If rst1.Fields("T1.ID") <> lngIndividualIDCurrent Then
            ' Next record is different T1, so show this row.
            showrow = True
            Call rst1.MovePrevious()
        Else
            ' Next record is the same T1, so don't show this row but note the duplicate.
            Call rst1.MovePrevious()
            ' Also, add the new T2 as a possible duplicate.
            tmp = rst1.Fields("T2.ID").Value
            If IsNumeric(tmp) Then
                tmp = CLng(tmp)
            Else
                tmp = 0
            End If
            If tmp > 0 Then
                numDups = numDups + 1
                If possibleDups <> "" Then
                    possibleDups = possibleDups & ", "
                End If
                possibleDups = possibleDups & tmp
            End If
        End If
    End If

    If showrow Then
        Call Response.Write(fn & " " & ln & " has " & numDups & " possible duplicates (" & possibleDups & ")")
    End If

    Call rst1.MoveNext()
    prevID = id
Wend
Call rst1.Close()
sql
ms-access
vbscript
asp-classic
asked on Stack Overflow Dec 9, 2014 by jiminy • edited Dec 11, 2014 by Shadow The Vaccinated Wizard

1 Answer

3

Yes, that's going to be slow because LIKE '%whatever%' is not sargable. So, if [Table1] has 1,000 rows then at best you'll be retrieving the other 999 rows for each row in the table, which means that you'll be pulling 999,000 rows in total.

A few observations:

  1. You are performing the comparisons for every row in the table against every other row. That would be something that you might want to do one time only to find possible dups in legacy data, but as part of the normal operation of an application we would expect to compare one record against all of the others (i.e. the one record that you are inserting or updating).

  2. You are looking for rows WHERE 'fn1 ln1' LIKE('%fn2 ln2%'). How is that significantly different from WHERE fn1=fn2 AND ln1=ln2? That would be sargable, so if you had indexes on [FN] and [LN] then that could speed things up a great deal.

  3. You really should NOT be using an Access database as the back-end for a web application (ref: here).

answered on Stack Overflow Dec 9, 2014 by Gord Thompson

User contributions licensed under CC BY-SA 3.0