Alpha Software Mobile Development Tools:   Alpha Anywhere    |   Alpha TransForm subscribe to our YouTube Channel  Follow Us on LinkedIn  Follow Us on Twitter  Follow Us on Facebook

Announcement

Collapse

The Alpha Software Forum Participation Guidelines

The Alpha Software Forum is a free forum created for Alpha Software Developer Community to ask for help, exchange ideas, and share solutions. Alpha Software strives to create an environment where all members of the community can feel safe to participate. In order to ensure the Alpha Software Forum is a place where all feel welcome, forum participants are expected to behave as follows:
  • Be professional in your conduct
  • Be kind to others
  • Be constructive when giving feedback
  • Be open to new ideas and suggestions
  • Stay on topic


Be sure all comments and threads you post are respectful. Posts that contain any of the following content will be considered a violation of your agreement as a member of the Alpha Software Forum Community and will be moderated:
  • Spam.
  • Vulgar language.
  • Quotes from private conversations without permission, including pricing and other sales related discussions.
  • Personal attacks, insults, or subtle put-downs.
  • Harassment, bullying, threatening, mocking, shaming, or deriding anyone.
  • Sexist, racist, homophobic, transphobic, ableist, or otherwise discriminatory jokes and language.
  • Sexually explicit or violent material, links, or language.
  • Pirated, hacked, or copyright-infringing material.
  • Encouraging of others to engage in the above behaviors.


If a thread or post is found to contain any of the content outlined above, a moderator may choose to take one of the following actions:
  • Remove the Post or Thread - the content is removed from the forum.
  • Place the User in Moderation - all posts and new threads must be approved by a moderator before they are posted.
  • Temporarily Ban the User - user is banned from forum for a period of time.
  • Permanently Ban the User - user is permanently banned from the forum.


Moderators may also rename posts and threads if they are too generic or do not property reflect the content.

Moderators may move threads if they have been posted in the incorrect forum.

Threads/Posts questioning specific moderator decisions or actions (such as "why was a user banned?") are not allowed and will be removed.

The owners of Alpha Software Corporation (Forum Owner) reserve the right to remove, edit, move, or close any thread for any reason; or ban any forum member without notice, reason, or explanation.

Community members are encouraged to click the "Report Post" icon in the lower left of a given post if they feel the post is in violation of the rules. This will alert the Moderators to take a look.

Alpha Software Corporation may amend the guidelines from time to time and may also vary the procedures it sets out where appropriate in a particular case. Your agreement to comply with the guidelines will be deemed agreement to any changes to it.



Bonus TIPS for Successful Posting

Try a Search First
It is highly recommended that a Search be done on your topic before posting, as many questions have been answered in prior posts. As with any search engine, the shorter the search term, the more "hits" will be returned, but the more specific the search term is, the greater the relevance of those "hits". Searching for "table" might well return every message on the board while "tablesum" would greatly restrict the number of messages returned.

When you do post
First, make sure you are posting your question in the correct forum. For example, if you post an issue regarding Desktop applications on the Mobile & Browser Applications board , not only will your question not be seen by the appropriate audience, it may also be removed or relocated.

The more detail you provide about your problem or question, the more likely someone is to understand your request and be able to help. A sample database with a minimum of records (and its support files, zipped together) will make it much easier to diagnose issues with your application. Screen shots of error messages are especially helpful.

When explaining how to reproduce your problem, please be as detailed as possible. Describe every step, click-by-click and keypress-by-keypress. Otherwise when others try to duplicate your problem, they may do something slightly different and end up with different results.

A note about attachments
You may only attach one file to each message. Attachment file size is limited to 2MB. If you need to include several files, you may do so by zipping them into a single archive.

If you forgot to attach your files to your post, please do NOT create a new thread. Instead, reply to your original message and attach the file there.

When attaching screen shots, it is best to attach an image file (.BMP, .JPG, .GIF, .PNG, etc.) or a zip file of several images, as opposed to a Word document containing the screen shots. Because Word documents are prone to viruses, many message board users will not open your Word file, therefore limiting their ability to help you.

Similarly, if you are uploading a zipped archive, you should simply create a .ZIP file and not a self-extracting .EXE as many users will not run your EXE file.
See more
See less

search problem

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

    search problem

    I have an interesting search problem. I need to "match" records from tblA with records in tblB. There are exactly 24 different sets of match criteria each with one to six field pairs. Example Match Criteria 1 might be tblA.field1=tblB.field4 .and. tblA.field12=tblB.field6 (etc.).

    I have to match 10,000 records to 10,000 records. Right now I have it worked out as a purely sequential search, no indexes at all. Works perfectly except 10K * 10K means up to 100 million match attempts and the process can run for days.

    I am about to revamp the system using indexes. My guess is that I would pre-index the searched table by the fields in those 24 different match criteria, and then use the appropriate index for each search.

    Although I am sure this will work, my question is about performance considering I will have to change which index is current for each and every one of the 10,000 records in order. I cannot "batch" searches where I might try to find 2000 records using Match Criteria 1, and then 2000 using Match Criteria 2, etc. I will have to set the current index to Match Criteria 1, do the search for one record (get result), set the index to Match Criteria 2, do the search for the next record (get the results), etc.

    Does anyone have general or specific advice about this scenario?
    Steve Wood
    See my profile on IADN


    #2
    Re: search problem

    Steve,

    Are multiple matches possible? i.e. for a given set of match criteria is it possible more than one tblB record will match a given tblA record?

    I'm wondering why you aren't fetching through the 10k records 24 times, instead of 10k times?

    Would it be practical to load the tblB records into an array?

    -- tom

    Comment


      #3
      Re: search problem

      Thanks Tom, good suggestions, an array might help since I would populate it only once. Having said that, not sure how I would make use of that array in this situation.

      The details are a little more complicated.

      It is very possible there will be multiple matches in tblB for each record in tblA. Part of the deal is I have to randomize tblB records as a first step so that if you ran the process again you would come up with totally different set of matched records. So in fact I pre-process tblB with a random number in column one, and index tblB in that order.
      • Select TblA record one.
      • Select Match Criteria 1.
      • Look for a match of tblA record in tblB using Match Criteria.
      • If found, flag that tblB record as a match to tblA record. Also update the tblA record with the primary key (the random number) from tblB to indicate what record it matched.
      • Eliminate the matched tblB record from any match in the future.
      • If no match found in tblB based on Match Criteria 1, move to Match Criteria 2 and try again.
      • Continue through all Match Criteria until a match in tblB is found, or none is found.
      • Move to tblA next record, repeat cycle to last tblA record.

      I left out a few odd steps from above, like not all 24 Match Criteria are always used, it depends on some factors. The end result is that I should have found a match to a portion of tblA records in tblB, and indicated the matched record in tblA. And I will have a number of records in tblA that do not match any records in tblB.

      Side bet - anyone want to guess what industry this process is for, or what problem it is supposed to solve? Hint, it is for a non-profit organization.
      Steve Wood
      See my profile on IADN

      Comment


        #4
        Re: search problem

        Steve,

        If a match is found and the foreign key is inserted into the current tblA record, will it get overwritten there if a later match in tblB is also found?

        Does the tblA record also get discarded after all 24 possible match searches have been run? Or does it get discarded as soon as a match is found?

        Comment


          #5
          Re: search problem

          Tom, once a match is found, both records are out of the equation for any subsequent search.
          Steve Wood
          See my profile on IADN

          Comment


            #6
            Re: search problem

            Blood or organ donations? More like organ donation.

            Comment


              #7
              Re: search problem

              Good guess - its for school districts to find a "matching" student, matching a student who is receiving special services with a 'demographically similar' student who is not receiving special services. After the match the system goes on to show performance statistics over time for both students. The "matched" student is completely "de-identified" so there is no way to identify that student personally.
              Steve Wood
              See my profile on IADN

              Comment


                #8
                Re: search problem

                You say, there are 24 sets of matching criteria each with one to six fields pair(s)..
                I believe, in a different life, somebody asked a similar question and I could offer couple wacky ideas..wacky and wackier..but both work just fine:
                Wacky:
                Depending on the length of these fields, create a calc field for each search criteria then match the calc fields

                Wackier (but in my mind a better one):
                Create a set for each one of these matching criteria. Now you have 24 sets. No need to run any search whatsoever. You simply flag the parents that have children (and flag the childrent of course).
                In fact, if you want, though not a very good idea in my mind, you could reduce the number of the sets to a very small number by making tablA a parent tblB achild then tablA a grand child with the second set of criteria then tablB a great grandchild with the third set of criteria and so on..

                But then of course, there is the wackiest idea:
                For each set of matching criteria, give it an ID
                Create a joint table for both tablA & tablB (or an array)
                Sort the joint table (or the array) on those ID's.
                Flag the repeating ID's. That's your match.

                Comment


                  #9
                  Re: search problem

                  I can envision from your list reducing the complexity of my data by pre-processing data


                  tblA
                  STUDENT# | MATCHCRITERIA# | MATCHCRITERIAKEY
                  01 | 1 | white_32_10000 (e.g., race + age + income)
                  01 | 2 | GRADE12_10000 (e.g., grade + income - same tblA record, but different match criteria)

                  tblB
                  RANDOM#(FK) | MATCHCRITERIA# | MATCHCRITERIAKEY
                  12345678 | 1 | white_32_50000
                  12345678 | 2 | GRADE12_10000 (this one matches student 01 above!)


                  Where the matchcriteriakey is the concatenation of all one to six match criteria values. I can index that column in tblB! Then rather than my complex code to match tblA.fieldX = tblB.fieldY .and. tblA.fieldX1 = tblB.fieldY1 (etc.) I just match up tblA.matchcriteriakey with tblB.matchcriteriakey.

                  At first glance it seems like I am increasing the size of my search. TableA goes from 10K records to 10K * 24 Match Criteria. And tableB does the same. But in reality I am looping so many times in my current process that the resulting number of iterations is the same. Also, the code to come up with the tblA.fieldX = tblB.fieldY .and. tblA.fieldX1 = tblB.fieldY1 (etc.) is so significantly complex that it belongs in a museum and I would love to do away with it.

                  Looking at the above, it would be proper to include a parent table for both tblA and tblB with each student listed only once. As soon as a match is found in tblB I would flag the parent record as having been used in a match and not use subsequently.
                  Steve Wood
                  See my profile on IADN

                  Comment


                    #10
                    Re: search problem

                    Steve,

                    I like G's "set" approach.

                    But if you're using a one-to-many link, finding the matches would be easier if you invert the set relationship and use a one to one link.

                    Comment


                      #11
                      Re: search problem

                      I love Sets, but since I normally use SQL I have abandoned using Sets. This app happens to be DBF but I still cannot go back to Sets even though that is a silly position.
                      Steve Wood
                      See my profile on IADN

                      Comment


                        #12
                        Re: search problem

                        Frankly I like the array idea best and you can turn it into a function.

                        Comment


                          #13
                          Re: search problem

                          I have had to do a similar matching job wirh a social care app.
                          900k records. 4 tables.
                          I took the calc field approach and added a field into the tables which created a fuzzy match base, relevent to the content.
                          Did it in several runs. Not perfect but close enough for the client.
                          See our Hybrid Option here;
                          https://hybridapps.example-software.com/


                          Apologies to anyone I haven't managed to upset yet.
                          You are held in a queue and I will get to you soon.

                          Comment

                          Working...
                          X