
Announcement


The Alpha Software Forum Participation Guidelines

The Alpha Software Forum is a free forum created for the Alpha Software Developer Community to ask for help, exchange ideas, and share solutions. Alpha Software strives to create an environment where all members of the community can feel safe to participate. In order to ensure the Alpha Software Forum is a place where all feel welcome, forum participants are expected to behave as follows:
  • Be professional in your conduct
  • Be kind to others
  • Be constructive when giving feedback
  • Be open to new ideas and suggestions
  • Stay on topic


Be sure all comments and threads you post are respectful. Posts that contain any of the following content will be considered a violation of your agreement as a member of the Alpha Software Forum Community and will be moderated:
  • Spam.
  • Vulgar language.
  • Quotes from private conversations without permission, including pricing and other sales related discussions.
  • Personal attacks, insults, or subtle put-downs.
  • Harassment, bullying, threatening, mocking, shaming, or deriding anyone.
  • Sexist, racist, homophobic, transphobic, ableist, or otherwise discriminatory jokes and language.
  • Sexually explicit or violent material, links, or language.
  • Pirated, hacked, or copyright-infringing material.
  • Encouraging of others to engage in the above behaviors.


If a thread or post is found to contain any of the content outlined above, a moderator may choose to take one of the following actions:
  • Remove the Post or Thread - the content is removed from the forum.
  • Place the User in Moderation - all posts and new threads must be approved by a moderator before they are posted.
  • Temporarily Ban the User - user is banned from forum for a period of time.
  • Permanently Ban the User - user is permanently banned from the forum.


Moderators may also rename posts and threads if they are too generic or do not properly reflect the content.

Moderators may move threads if they have been posted in the incorrect forum.

Threads/Posts questioning specific moderator decisions or actions (such as "why was a user banned?") are not allowed and will be removed.

The owners of Alpha Software Corporation (Forum Owner) reserve the right to remove, edit, move, or close any thread for any reason; or ban any forum member without notice, reason, or explanation.

Community members are encouraged to click the "Report Post" icon in the lower left of a given post if they feel the post is in violation of the rules. This will alert the Moderators to take a look.

Alpha Software Corporation may amend the guidelines from time to time and may also vary the procedures it sets out where appropriate in a particular case. Your agreement to comply with the guidelines will be deemed agreement to any changes to them.



Bonus TIPS for Successful Posting

Try a Search First
It is highly recommended that a Search be done on your topic before posting, as many questions have been answered in prior posts. As with any search engine, the shorter the search term, the more "hits" will be returned, but the more specific the search term is, the greater the relevance of those "hits". Searching for "table" might well return every message on the board while "tablesum" would greatly restrict the number of messages returned.

When you do post
First, make sure you are posting your question in the correct forum. For example, if you post an issue regarding Desktop applications on the Mobile & Browser Applications board, not only will your question not be seen by the appropriate audience, it may also be removed or relocated.

The more detail you provide about your problem or question, the more likely someone is to understand your request and be able to help. A sample database with a minimum of records (and its support files, zipped together) will make it much easier to diagnose issues with your application. Screen shots of error messages are especially helpful.

When explaining how to reproduce your problem, please be as detailed as possible. Describe every step, click-by-click and keypress-by-keypress. Otherwise when others try to duplicate your problem, they may do something slightly different and end up with different results.

A note about attachments
You may only attach one file to each message. Attachment file size is limited to 2MB. If you need to include several files, you may do so by zipping them into a single archive.

If you forgot to attach your files to your post, please do NOT create a new thread. Instead, reply to your original message and attach the file there.

When attaching screen shots, it is best to attach an image file (.BMP, .JPG, .GIF, .PNG, etc.) or a zip file of several images, as opposed to a Word document containing the screen shots. Because Word documents are prone to viruses, many message board users will not open your Word file, therefore limiting their ability to help you.

Similarly, if you are uploading a zipped archive, you should simply create a .ZIP file and not a self-extracting .EXE as many users will not run your EXE file.

Xbasic - How to skin a cat, the best way. Matching a tag.


    Xbasic - How to skin a cat, the best way. Matching a tag.

    This is for you xbasic gurus, I guess.

I have two tables (dbf). The first contains only a few fields and a couple hundred lines: basically a term and an associated code that goes with that term.

The second table is about 700,000+ items with many fields, but basically one field could possibly contain a term from the first table, and if so I need to insert the associated code for that term into another field. Just to make things interesting, there can be several items within the target field, so the only way I have found to isolate terms is to use "search for word(s)" in the Query Genie (e.g. containsi() in xbasic). I need to rip thru all of the codes file against every line item in the master file.

So I thought I would just run the Query Genie, enter a couple of terms, and look at the generated xbasic to see what the most efficient method to scan the file would be. Imagine that... you can capture the xbasic for the filter, but I cannot figure out how to capture the complete xbasic for the process as applied against the entire table. I understand that it would return a result that would then have to be read thru anyway, and I do not need to see a browse or form.

So next I thought about reading the terms into an array and spinning thru the array per master-file record, doing it all in one process (upon a term match, grab the associated code and stuff'n go)... but wondered if there was a more efficient method than running a couple hundred items against 700,000+ lines.

Anyone care to share their ideas based on the above? This has to be done quarterly, worst case monthly, and even once is one time too many to think about doing manually, in any case.

    Regards,
    Keith
    Keith Weatherhead
    Discus Data, Ltd
    [email protected]

    #2
    Re: Xbasic - How to skin a cat, the best way. Matching a tag.

Sounds like StringDictionary would be most suitable.

The StringDictionary is much faster than collections when used for very large lists. It supports only character data.
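For anyone unfamiliar with it: a StringDictionary is, loosely, a fast string-keyed lookup structure. A rough analogy in Python (purely illustrative; the names are made up and the thread's own language is Xbasic):

```python
# Rough analogy: exact-key lookups in a string-keyed map are effectively
# constant time, which is why it beats scanning a collection for very
# large lists. All values here are invented.
codes = {"TermA": "code1", "TermB": "code2", "TermC": "code3"}

print(codes.get("TermB"))               # -> code2

# The limitation raised later in the thread: a dictionary matches whole
# keys only, so it cannot find "TermB" buried inside a longer field.
print(codes.get("term1 TermB term3"))   # -> None
```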



      #3
      Re: Xbasic - How to skin a cat, the best way. Matching a tag.

      Ray,

      Thanx for the thought... after looking at the StringDictionary function I am not sure it would do what I need...

      The Master File, field to search, might look like "term1 term2 term3 term4, ..."

      The Code file is
      TermA code1
      TermB code2
      TermC code3

      etc

      TermsA/B/C *could* match Term1/2/3 or may not...

so there are multiple possibilities, whereas StringDictionary appears to want to check a single value against a list. That is why I had to use containsi() (the "contains the word(s)" option) to find matches, although I would have to test the terms one at a time to set the proper CODE value. That is why I came up with a two-dimensional array (term, code) and thought that picking the "term", running it thru containsi(), and upon a match setting "code" might be the way to address the situation. I admit I have never used (nor tried) StringDictionary before, so maybe I am not fully understanding its capabilities.
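The (term, code) scan described above could be sketched like this (Python used purely for illustration; containsi() is mimicked with a lower-cased substring test, and all names are invented):

```python
# Sketch of the (term, code) array approach: for each master record,
# scan the term list and collect the code of any term found in the field.
term_codes = [("TermA", "code1"), ("TermB", "code2"), ("TermC", "code3")]

def containsi(haystack, needle):
    """Case-insensitive substring test, like Xbasic's containsi()."""
    return needle.lower() in haystack.lower()

def match_codes(field):
    """Return the codes of every term found in the field."""
    return [code for term, code in term_codes if containsi(field, term)]

print(match_codes("term1 TERMB term3"))   # -> ['code2']
```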

      Keith
      Keith Weatherhead
      Discus Data, Ltd
      [email protected]



        #4
        Re: Xbasic - How to skin a cat, the best way. Matching a tag.

        Wouldn't this be easier if the field with multiple terms was "split" into several fields, each containing a single term? After adding new fields this could be done with an update operation.



          #5
          Re: Xbasic - How to skin a cat, the best way. Matching a tag.

          Tom,

Why certainly, except that the original file has to be maintained, unfortunately, and checked for changes going forward on at least a quarterly basis. So I would have to disassemble, update, and reassemble... I was hoping not to have to do that. There is a final validation phase I have left out of the equation so far, which needs to be done in the updated/reassembled database, just to add a bit of salt to the wound. But that is for another rainy day.

          Keith
          Keith Weatherhead
          Discus Data, Ltd
          [email protected]



            #6
            Re: Xbasic - How to skin a cat, the best way. Matching a tag.

            Hi Keith,

Going through the table multiple times is a very inefficient method. You want to pass through the data exactly once.

Basically, you load up a string or array, and create a function to use in an update operation on the table (it may be possible to create an expanded expression and forgo the function). The function would temporarily change the field to CRLF()-delimited words, load up two CRLF-delimited strings from the 1st table with the terms and corresponding codes, then use strtran_multi() or stritran_multi(), then change the result back from the CRLF-delimited form to the format for the field.
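A rough sketch of this single-pass multi-replace idea (Python for illustration; the helper below only mimics the behavior described for stritran_multi(), and the term/code lists are invented):

```python
import re

# Sketch of the multi-replace approach: one operation per record replaces
# every known term with its corresponding code.
terms = ["TermA", "TermB", "TermC"]
codes = ["code1", "code2", "code3"]

def multi_replace(text, from_list, to_list):
    """Case-insensitive multi-replace: each from_list[i] becomes
    to_list[i]. Loosely mimics the stritran_multi() behavior described."""
    for frm, to in zip(from_list, to_list):
        text = re.sub(re.escape(frm), to, text, flags=re.IGNORECASE)
    return text

print(multi_replace("TermB TermX TERMA", terms, codes))
# -> code2 TermX code1
```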
            Regards,

            Ira J. Perlow
            Computer Systems Design


            CSDA A5 Products
            New - Free CSDA DiagInfo - v1.39, 30 Apr 2013
            CSDA Barcode Functions

            CSDA Code Utility
            CSDA Screen Capture




              #7
              Re: Xbasic - How to skin a cat, the best way. Matching a tag.

Would like to help you skin this cat... unfortunately, I don't see the cat! I have no idea what the question is.
I applaud those who responded; presumably they understood what I failed to understand due to my limited knowledge of the animal kingdom, more specifically felines...
I vaguely see one table that has a field, perhaps a memo or an RTF field, that has the word "term" associated with some code, and another table with a field that has the same word "Term", except that instead of "Term 1" as in the first table it is "Term A" in the second... or maybe the correlation does not hold... and you want to associate it with the code from the first table... something along these lines.

              Perhaps an example or a zip file might help.



                #8
                Re: Xbasic - How to skin a cat, the best way. Matching a tag.

I think it is meant to indicate that multiple matches (TERMA, TERMB) could exist in the string "TERM1TERM2TERMA...TERMB...", in which case StringDictionary would not be suitable.
I did, however, test stritran with three comparisons against a 180-character string over 1 million iterations (2 million changes), as follows:
                Code:
?frstr   'from strings (CRLF-delimited)
= asdfgh
kloiuy
sdfGhJ
?tostr   'to strings
= marios
calugi
MASTER
chstr="asdfghjkloiuytrewqasdfghjkloiuytrewqASDFGHJKL;OPIUYTREWQASDFGHJKLOIUYTREWWSDFGHJKLOIUYTREsdfghjkloiuytreDFGYTREWEDRFGTHJKOIUYTREDFGHJKOIUYTREDFGHHGFDXCVBNMJHGFDxxcvbnmjhgfdcvbnmhhgfd"
nstr=chstr   'initialize the working string
stt=now()
for i=1 to 1000000
    nstr=stritran_multi(nstr,frstr,tostr)   'first change
    nstr=stritran_multi(nstr,tostr,frstr)   'change back
next
ste=now()-stt
?ste
= 79.473 seconds
Second run: 66 seconds.
Geeky, I know, but I was interested in the time scale. Pretty fast for the string operation itself; add to that some file read time.
                Last edited by Ray in Capetown; 08-23-2013, 05:35 AM.



                  #9
                  Re: Xbasic - How to skin a cat, the best way. Matching a tag.

Hi all, sorry for the delay; I was out of the country and yesterday was the return trip.

Ira, yes, I agree: one time thru each file, by all means, however memory will get a thrashing, lol. My thought: read the terms/codes into an array, then read the master file and on each record spin thru the array of terms/codes to see if a match exists; if so, store the code in that master-file record. ONCE thru each file!

Thanx for the comments everyone. To answer a few other questions/comments... it IS possible for multiple matches to occur, because the *single* field in the master file *could* contain multiple term keys. That would not be my preference, but that is the task at hand, not of my doing originally. Any record with multiple matches has to be investigated as to which CODE is appropriate for the record, so just because I find a match, I cannot stop running the TERMS at that point until I exhaust the list. I will just flag that multiple hits occurred, and those items will be the first to be reviewed to determine their proper coding.
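In outline, the one-pass-with-flagging plan might look like this (Python purely for illustration; the real job would be Xbasic against DBF tables, and these term/code values are invented):

```python
# Sketch: one pass through the master records, collecting every term hit.
# A record with more than one hit is flagged for manual review.
term_codes = [("TermA", "code1"), ("TermB", "code2"), ("TermC", "code3")]

def tag_record(field_value):
    """Return all matching codes plus a review flag for multiple hits."""
    hits = [code for term, code in term_codes
            if term.lower() in field_value.lower()]
    return {"codes": hits, "review": len(hits) > 1}

print(tag_record("xxx TERMB yyy"))
# -> {'codes': ['code2'], 'review': False}
print(tag_record("TermA stuff termc"))
# -> {'codes': ['code1', 'code3'], 'review': True}
```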

                  Keith
                  Keith Weatherhead
                  Discus Data, Ltd
                  [email protected]



                    #10
                    Re: Xbasic - How to skin a cat, the best way. Matching a tag.

                    What is the structure of each table?
                    What types of fields are these?
                    Are there any common fields between the tables?
                    What "code" are you referring to?
                    A code as in an item code? or an xbasic code?
                    On the average, how many characters are these "terms"? Do they fit any particular pattern? Ditto for the "codes"?
                    Perhaps others have answers to all of the above..



                      #11
                      Re: Xbasic - How to skin a cat, the best way. Matching a tag.

                      Hi Ray,

                      Basically the methodology I suggested. In version 9 and earlier, the stritran_multi() will run much slower, but should still be relatively fast.

                      Keith,
This is the way to do it in an update. Forget arrays; they are a dead end for your needs (in terms of speed). And you must use an update operation to move through the 700k records quickly (vs. looping in xbasic). If you are changing a field used in an index, read my Record Locking 101 tips and pay attention to the appends/updates section.
                      Regards,

                      Ira J. Perlow
                      Computer Systems Design






                        #12
                        Re: Xbasic - How to skin a cat, the best way. Matching a tag.

OK, let me put a better example on the table to help you understand. While I cannot use the specifics of the actual situation due to non-disclosure issues, this is exactly the same issue using things I can talk about. All data in these examples is character based.

I am part of a specialized due-diligence audit team that specializes in asset-based securitization financing of trade receivables. Lots of big words there; what it means is that big companies, Fortune 500 (at a minimum) and bigger, are borrowing using their A/R as collateral on the deal. There are all sorts of issues that have to be contemplated for the banks doing the financing of these deals.

One of the things we have to do is make sure that the company is tagging their government accounts. Why? Well, all I can say is that you cannot perfect a lien over a government entity in the US. You either understand that or not; it does not matter and is immaterial to the rest of the conversation. What is important is that we have to be able to identify all potential governmental customer entities. Some you may not think of, but it works like this...

You have a Customer Master File with things like Account_Number, Customer Name, Address, City, State, Zip, Phone, Customer Type, Credit Limit, etc., yada, yada, yada. Now if the company flags their government accounts, that gives us a nice starting point, but some companies do not flag them, or do not even have a field in the Customer Master to hold that info. That matters not to us, as we still have to quantify potential government accounts.

So how do we go about it? Usually manually, though this project is going to help get us out of manual mode in the future: we look at the Customer Names to try to identify these government accounts. We look for things like: University of, Univ of, U of, City of, Town of, Dept of, AAFES, etc. Now to make things a bit messier, on most deals we have to make a list of things we believe to be Federal entities and Municipal entities, and some deals even make us split Municipal into State and Municipal. If only one person added customers to the Customer Master it would be better, but many companies allow branch-level locations to add customers for their area/region. One such company DOES have a procedure manual detailing how to enter Customer Names in a reasonable manner; however, most of their employees fail at reading comprehension, so we get things like: Army Air Force Exchange Service, AAFES, AAFES, A.A.F.E.S., A.A.F.E. Service... and that is just one sample of the craziness we come across. Another one is: USPO, Post Office, US Post Office, PO of, etc. Think for a second of all the possible naming options that could be employed, not counting pure spelling errors.

So we developed a table, over time, that contains as many of these types of search tags as we could come up with. Now we want to bounce that off the Customer Master's Customer Name field. Could there be multiple hits? YES. Can there be false positives? YES... but we would rather have those than skip an account, as we can easily declassify something tagged wrong, but we do not want to understate, which is a much worse issue. While looking for "City" as in "City of Chicago", we would get "Gas City" and "Party City", which are obviously NOT government entities and would have to be untagged in the final review.

So NOW you should have a better idea of what type of matching I am trying to do: taking a list of tags (in this case potential government entities) and, upon a hit, setting whether it is Federal, State, or Municipal. My original situation is very similar in concept; I just cannot talk about those terms.
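A toy sketch of that tag table and classification (Python for illustration; the tags here are a tiny invented sample, with F = Federal, S = State, M = Municipal):

```python
# Tiny invented sample of the search-tag table: substring -> entity class.
TAGS = [
    ("university of", "S"),
    ("city", "M"),
    ("aafes", "F"),
]

def classify(customer_name):
    """Return every entity class whose tag appears in the name.
    False positives ("Gas City", "Party City") are expected and get
    untagged in the final review."""
    name = customer_name.lower()
    return sorted({cls for tag, cls in TAGS if tag in name})

print(classify("City of Chicago"))   # -> ['M']
print(classify("AAFES Store #12"))   # -> ['F']
print(classify("Gas City Ltd"))      # -> ['M']  (false positive, by design)
print(classify("Acme Plumbing"))     # -> []
```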

                        Keith
                        Keith Weatherhead
                        Discus Data, Ltd
                        [email protected]



                          #13
                          Re: Xbasic - How to skin a cat, the best way. Matching a tag.

Thanks for the education on government covert operations. Thank goodness I am no Fortune 500, or 5000 for that matter! I was more interested in the database technical aspects.
So, table one is the "Customer Master File".
Is that a table you developed? A dbf table? Or one that you obtain (maybe through some covert government venues? The NSA? A guy in the basement of a parking lot? Or a guy in a dark alley?)
IOW, is that table consistent in terms of structure, or is it different from company to company?
Do you need to search multiple fields in that table, or just one?
Also, you said all data is character based.
Are any of these fields a memo or an RTF? Not that it makes too much difference; just a technicality.

Just from what you said so far, you will need a regex function.

What you are trying to do is not much different from someone searching Google having misspelled a word or two, with Google returning any and all possible matches to those words; or someone searching a dictionary having misspelled the word; or, for that matter, validating an email address.
You could go round and round... but with this many variations seen, not to mention the unseen, you will need a regex function. That is one option.

Alternatively, and this will sound crazy at first but it is not: you do things the other way around, meaning:
Say each company has a record in the "Customer Master File". You take the field that you want to search and create a set, so to speak, breaking each word or sentence into a child record that is still linked to the parent record; then it becomes a whole lot easier searching the child table.

Or, alternatively, as you mentioned earlier, in lieu of the child table you create an array. The only problem with that is (and I live in the dark ages of v8) that matching words in an array didn't work well for me... but it might in the newer versions.

You can create a calc field using regex that will turn true if there is a match. The problem with that is simple: it is not doable as-is, because evidently you had to create a table for all possible trigger words, meaning there are a lot of them, which will exceed the expression filter character limit... but you could get around that by writing a UDF that iterates over these different variations.
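For instance, a single pattern can cover several of the AAFES variations mentioned earlier in the thread (a hypothetical sketch in Python; the pattern is illustrative, not exhaustive):

```python
import re

# Hypothetical pattern covering a few of the AAFES spellings seen in the
# thread: "AAFES", "A.A.F.E.S.", "A.A.F.E. Service",
# "Army Air Force Exchange Service".
AAFES_RE = re.compile(
    r"\b(a\.?a\.?f\.?e\.?(s\.?|\s*service)"
    r"|army\s+air\s+force\s+exchange\s+service)",
    re.IGNORECASE,
)

for name in ["AAFES", "A.A.F.E.S.", "A.A.F.E. Service",
             "Army Air Force Exchange Service", "Acme Fencing"]:
    print(name, "->", bool(AAFES_RE.search(name)))
```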

                          P.S.
How do you suppose Larry Page got to be so rich?
BTW, it wasn't he who created these text searches. I remember using an application 30 years or so ago, and I think the developer was a gentleman by the name of Nelson, who back then pioneered these searches.



                            #14
                            Re: Xbasic - How to skin a cat, the best way. Matching a tag.

                            Hi Keith,

This is all very doable by just taking the source input and then using an update operation to fill the field (or another field) with 1 or more tags (F, S, M) based upon matching the input. If you are processing a lot of large tables, or reprocessing tables repeatedly, then speed is very important. Regex, stritran, and other built-in functions can handle this, but coding and methodology are very important. I have a lot of items on my Tips page that discuss speed and function choices. One other thing: if you are only looking for a finite number of tags (F, S, M), there are methods that might be faster with that small number than with an indeterminate number of tags.

As an example, a recent client had "optimized" the best that they could (and they are a very experienced A5 developer), but I used entirely different (non-obvious) methods to process the data and increased speed 7 times, from 70 minutes to 10 minutes. For another client, 87 times for their code. The point is that if you need the best possible speed, it is difficult to get without a lot of speed-development experience under your belt.

Small differences can have big effects. E.g., there are some string functions (in some A5 versions) that work fast under 2K characters but slow down for larger strings. In most, but not all, cases later A5 versions of functions are the same or faster than older versions, but there are exceptions.

Another aspect of your problem is that there are ways to eliminate other false positives, e.g. taking the first letters of the words (or the initials, if portions are initials) and matching those initials, or looking for leading, middle, or trailing matches for certain portions of the name.
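The initials trick might look like this (Python, hypothetical helper; it reduces a multi-word name to its first letters so it can be compared against an abbreviation):

```python
def initials(name):
    """First letters of each word, uppercased, punctuation stripped:
    'Army Air Force Exchange Service' reduces to 'AAFES'."""
    words = [w.strip(".,") for w in name.split()]
    return "".join(w[0].upper() for w in words if w)

full = initials("Army Air Force Exchange Service")
abbr = "A.A.F.E.S.".replace(".", "")
print(full, abbr, full == abbr)   # -> AAFES AAFES True
```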

So if and when this exceeds your abilities, and assuming you actually have a budget (not always the case for many self-taught A5 developers), consider hiring an expert in the area. I see many people on the board looking to learn, and they ask a lot of questions, but they seldom get the fastest solution. http://www.iadn.com lists a wide variety of A5 developers, including myself, who can often help.
                            Regards,

                            Ira J. Perlow
                            Computer Systems Design






                              #15
                              Re: Xbasic - How to skin a cat, the best way. Matching a tag.

We are not government, nor are we covert... neither are our customers, which is why we have to determine just how much business they do with government entities, because there are special rules when it comes to dealing with them. If the percentage of the customer's business is too heavily government-weighted, these programs punish them for being that concentrated in that type of business. Not my rules; we are doing an "auditing type" of function and have certain rules that we have to play by and that govern our procedures. And no, we are not CPAs either.

Unfortunately, the files are different from company to company and can be different from one division to another in the same company. What really stinks is companies with multiple customer files (for different divisions) that contain common customers with no linkage between them (customer numbers are different between divisions), and yes, there are BIG companies that are that disjointed!!! For simplicity's sake I almost always use DBF files, and the fields are either Character or Numeric (up to 18,2) in size. Many fields that appear to be numeric in these large systems would require edit formats or similar, since they like to maintain leading zeros; it is easier for me to just import them as character, so I do that all the time. One of the common reports we like to get our hands on out of SAP is the FBL5N report: A/R transactions for the period you choose, open or closed items. But there are over 100 fields that can be dumped, and EVERYONE gives us a different layout based on what they feel is appropriate, so even the same report varies from one SAP-based company to another.

I do not create any of the files; I usually get a dump in CSV, TSV, fixed-length records, or Excel, depending on the size and the level of the user gathering the info for us. If I can do what I need in Excel, then I usually just do so; however, when I end up with 20,000 lines and up, I import into A5 and use a bit more power to do the tasks required. I wear the heck out of the Query Genie for many things because it is just too easy to use and gets the job done. Then there are projects like this that are not reasonable in any kind of manual mode. I have to customize scripts I have developed over and over all the time. I try, where possible, to rename field headers/names, as the script does care about the field order, and I have sometimes imported into a somewhat standardized field layout. Working with customers using SAP makes it a bit easier, as their reports are the most standard, but we deal with SAP, Oracle, PeopleSoft, JD Edwards, DB2, AS/400, and a slug of legacy systems, which, as you might guess, only resemble each other in file content naming (e.g. "Customer File"); layout, lengths, and order of fields vary exponentially.

Well, we are not there to correct the spelling of records in their customer file... if we see something obvious, we will, out of courtesy, inform them so they can correct it. Some customers like and appreciate the heads-up; others are offended by having it pointed out. So for spelling errors we tread lightly until we know how they respond. If we miss an account due to a true spelling error, so be it. We go to great lengths to allow for styles of input and call it good at that point in time. We usually find many errors even at companies that make an effort to maintain the information, which some in reality do not at all. We are not hired to sit there for weeks doing a line-by-line audit; in fact, the longest we spend on-site is a week, and we have a ton of procedures to get done in that time frame. Most places it is 2 to 3 days, in and out. These kinds of tasks are easily accomplished at our offices, but even still we do not have so little to do that we can make a government project out of it, all puns intended!!!

I keep going back to my original array idea: read all the terms and codes into an array, then start a read loop on the master file and, for each record, spin through the items in the array and do a containsi() against the Customer Name field. Upon a match, store the associated code for the term getting the hit. Each file is only read one time, and it will be compute-bound for the loop of array items, with the compute burden broken up by I/O operations. If it runs for an hour, so be it; this isn't done daily or even multiple times. When I get the records tagged, I need to export them for review and correction by the customer company, and then we move on to the next thing. I have not really heard a simpler or more straightforward approach yet. While I want to make it reasonably efficient, I also want it easily modified, as I was just told today I will get a chance to try it against a much bigger data set next month; their customer file is supposed to have in excess of 7.5 million customers... might need to get an i7 notebook first, lol!!

                              Keith
                              Keith Weatherhead
                              Discus Data, Ltd
                              [email protected]

