Help with FileFind.GREP() Expression

    I'm running filefind.grep() on a directory that contains several text files, to return the names of the files that match a regular expression (see code below). I have worked the expression 100 ways and cannot retrieve a match of the form search text...other text...search text. I can successfully run the grep on either the beginning string or the ending string, but not both. In addition, a program called RegexBuddy confirms that my expression matches the files in question. Where is my expression going wrong? And why do they call them regular expressions? There is nothing regular about them. Any help is appreciated!

    Code:
    STRING A = word(linetxt,1,",")
    STRING B = word(linetxt,3,",")
    sexpr = STRING A +"\(.*\r\n\)*.*"+ STRING B
    delete expression_result
    expression_result = (filefind.grep("*.txt", sexpr , 0 , "$(Filename), $(stop)","FI"))
     
    If (IsNull(expression_result) = .f.) then
     tbl.populate_from_string("file_name, STRING_A, STRING_B", crlf(), expression_result + ", " + STRING A + ", " + STRING B)
     
    end if
    Thanks,
    James

    #2
    Re: Help with FileFind.GREP() Expression

    Hi James,

    A couple of things first. STRING A should be STRING_A, and the same goes for STRING B. Do not use spaces in variable names, field names, function names, etc. Alpha will sometimes allow it, but it is asking for trouble, as the space will often be converted to an underscore. Stick to underscores and alphanumerics for names.

    Deleting a variable is also a bad idea, and all variables should be DIM'd. See my tips here.

    Also, as far as I know, "F" is not an option for grep expressions, but it can be used in the return string.

    Now to your grep expression. I don't know what you are trying to find in the text. Is it string 1 followed by some other text you specified, or just any text, and then followed by string 1 again? Also, must all of the matched text be on one line? On the chance that the strings might contain regex special characters, they should be escaped if you don't know their contents.


    This works for me
    Code:
    STRING_A = regex_escape(word(linetxt,1,","))
    STRING_B = regex_escape(word(linetxt,3,","))
    'sexpr = STRING_A +"\(.*\r\n\)*.*"+ STRING_B
    sexpr=".*"+STRING_A+".*"+STRING_B
    dim expression_result as c
    expression_result = filefind.grep("*.txt", sexpr , 0 , "$(Filename)$(stop),","I")
    Regards,

    Ira J. Perlow
    Computer Systems Design


    CSDA A5 Products
    New - Free CSDA DiagInfo - v1.39, 30 Apr 2013
    CSDA Barcode Functions

    CSDA Code Utility
    CSDA Screen Capture
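
For anyone who wants to try the two points from the reply above (escape the literal search text, and decide whether a match may span lines) outside Alpha Five, here is a minimal sketch in Python rather than Xbasic. Python's `re` module is not the engine behind filefind.grep(), so syntax and flags differ. The helper name `both_in_order` and the sample text are made up for illustration.

```python
import re

def both_in_order(text, first, second, span_lines=True):
    """Return True if `first` occurs before `second` in `text`.

    Both search strings are escaped, so characters such as '.' or '('
    are matched literally. With span_lines=False the '.' wildcard stops
    at a newline, so both strings must sit on the same line.
    """
    flags = re.IGNORECASE | (re.DOTALL if span_lines else 0)
    pattern = re.escape(first) + ".*" + re.escape(second)
    return re.search(pattern, text, flags) is not None

# Hypothetical document: name on the first line, address two lines later.
sample = "Acme (East) Inc.\r\nsome other text\r\n12 Main St."

multi_line = both_in_order(sample, "Acme (East) Inc.", "12 Main St.")
same_line = both_in_order(sample, "Acme (East) Inc.", "12 Main St.", False)
print(multi_line, same_line)
```

In this sketch the only difference between the two calls is whether the wildcard is allowed to cross a line break, which is the question to settle before tuning the expression any further.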



      #3
      Re: Help with FileFind.GREP() Expression

      CSDA, thanks for the quick reply. String A and String B are not the real variable names; I just used them for presentation purposes. I'm old school enough about syntax that I'm not even comfortable using underscores. Sorry for the confusion in my submission.

      I can't tell you why the "F" is in there, other than that it represents the frustration I have had over the last 48 hours trying to nail down this expression. It could also represent the "F" expletives I was throwing at the screen, although I don't recall seeing this option in the POSIX standards other than for the reason you mentioned.

      What I am trying to do is develop a broad expression that will scan through approx 1,000 "semi-structured" text documents. Some documents will be only 1 page while others may reach 10. The data I am looking for is likely to be on the first page but can be anywhere from the first line to the last line. The goal is to match a character string that represents a company's name (string A) and then a character string that represents the company's address (string B). The address may be on the line that follows or at the bottom of the page. Because the documents are not structured like a form, there will never be a consistent location for the strings. As such, I also plan to match the 2 strings in reverse order, just in case. The text between the strings may contain any and all possible word characters plus whitespace, tabs, CRLF, etc. When we are done, we are only interested in the documents where both the company name and the address match our strings. One will not do without the other.

      I am going to try your suggestion and I will let you know what the results are. Until then, I hope I have provided a good explanation of the expression's purpose.

      Thanks again!
      Thanks,
      James



        #4
        Re: Help with FileFind.GREP() Expression

        You certainly could do this with regex, but it might be overkill. You could use something as simple as contains() or $:
        Code:
        v_company="Alpha Software"
        v_address="70 Blanchard Road"
        Text=<<%str%
        #Alpha Software, Inc. makes wonderful software called alpha5.
        #They are located at:
        #70 Blanchard Road
        #Burlington, MA 01803-5100
        #%str%
        ?(contains(text,v_company)).and.(contains(text,v_address))
        = .T.
        ?(contains(text,"Alpha*")).and.(contains(text,"70 Blanchard*"))
        = .T.
        ?(v_company $ text).and.(v_address $ text)
        = .T.
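
The same containment test can be sketched in Python for readers who want to experiment outside Alpha Five. This is an illustration, not Xbasic: the helper name `mentions_both` is made up, and folding case before comparing is an assumption of the sketch, not a statement about how contains() or $ treat case.

```python
def mentions_both(text, company, address):
    """Check that both strings appear anywhere in text, in either order.

    Assumption of this sketch: the comparison ignores case.
    """
    haystack = text.lower()
    return company.lower() in haystack and address.lower() in haystack

text = """Alpha Software, Inc. makes wonderful software called alpha5.
They are located at:
70 Blanchard Road
Burlington, MA 01803-5100
"""

found = mentions_both(text, "Alpha Software", "70 Blanchard Road")
missing = mentions_both(text, "Alpha Software", "1 Elm Street")
print(found, missing)
```

Because the two strings are tested independently, the order in which they appear in the document does not matter, so no separate "reverse" search is needed with this approach.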



          #5
          Re: Help with FileFind.GREP() Expression

          It's nice to hear from you again Gabriel. It has been a while.

          When I started this script I was headed down the same path you suggested. After an hour or so I pulled out a calculator to estimate the total number of iterations I would be performing. Assuming approx 1,000 text documents to review and approx 400 search combinations, the script would execute approx 400,000 iterations. Using filefind.grep() I can effectively combine the 1,000 text documents into 1 search target and cut the iterations down to 400. The grep method appears to be the most efficient way to handle my problem, especially when you consider you don't have to actually get any of the text documents before they are processed.

          Please correct me if you do not think this is the case or if there is another way for me to keep the processing iterations limited using another method.
          Thanks,
          James



            #6
            Re: Help with FileFind.GREP() Expression

            If you are running the search for many criteria, you pay a high disk or network overhead by rereading each file many times (400 in the case you outlined). It would be better to read a file in, run the 400 searches against it (can you stop searching once you have found a match?), and then move on to the next file.

            Depending on the total size of all the text files together, you could even bring them all into main memory in an Alpha 5 variable. If it is properly indexed to the start and end line of each file, one search could find one or more occurrences within the text and identify the file associated with each line. This would be fastest, as you are reading the files once and searching for each criterion once.

            The lower level string searching commands and regex are the fastest way to find matching strings.
            Regards,

            Ira J. Perlow
            Computer Systems Design
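
The "read each file once, then run every search against it" idea above can be sketched as follows. This is Python, not Xbasic, and a rough illustration only: the function name `match_files`, the sample file contents, and the case-insensitive plain substring test are all assumptions of the sketch.

```python
import tempfile
from pathlib import Path

def match_files(folder, combos):
    """Read every .txt file once and test it against all (company, address) pairs.

    Returns a list of (file_name, company) tuples for files containing both strings.
    """
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(errors="replace").lower()  # one disk read per file
        for company, address in combos:
            if company.lower() in text and address.lower() in text:
                hits.append((path.name, company))
    return hits

# Tiny made-up test set: two documents, two search combinations.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.txt").write_text("Acme Inc\n12 Main St\n")
    Path(folder, "b.txt").write_text("Other Co\n99 Side Rd\n")
    combos = [("Acme Inc", "12 Main St"), ("Other Co", "1 Elm St")]
    result = match_files(folder, combos)

print(result)
```

The number of string comparisons is still files times combinations, but each file is read from disk only once, which is the overhead the post above is concerned with.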




              #7
              Re: Help with FileFind.GREP() Expression

              James
              Your math is not exactly correct.
              Say you have 400 combos of company/address and 1,000 documents to search.
              The first question really is: could one document have more than one combo?
              I am guessing the answer is no. If so, once a combo is found in a document you move on to the next document.
              When you search a document for these combos, the number of iterations ranges from 1 to 400. What is the average? That depends on which combos you search for first. If you sort the combos from most to least common, you could cut the iterations significantly.
              Regex will not change that at all, unless these companies have something in common that you can incorporate in your pattern; and even if they do, the addresses cannot possibly have anything in common.
              Last edited by G Gabriel; 01-20-2010, 02:24 PM.
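
The effect of search order on the average can be worked out with a few lines of Python. This is a back-of-the-envelope sketch under stated assumptions: each document matches exactly one combo, the search stops at the first hit, and the skewed distribution below is invented purely to show the arithmetic.

```python
def expected_comparisons(probabilities):
    """Average number of combos tried per document when the search stops at
    the first hit and combos are tried in the given order.

    probabilities[i] is the chance that a document matches combo i;
    each document is assumed to match exactly one combo.
    """
    return sum((rank + 1) * p for rank, p in enumerate(probabilities))

# Hypothetical distribution: one combo accounts for half the documents,
# the other 399 share the rest equally.
skewed = [0.5] + [0.5 / 399] * 399

best = expected_comparisons(sorted(skewed, reverse=True))  # most common first
worst = expected_comparisons(sorted(skewed))               # least common first
uniform = expected_comparisons([1 / 400] * 400)            # no combo is favored

print(round(best, 1), round(worst, 1), round(uniform, 1))
```

With no information about frequencies, the average is about half the list. Sorting only helps when some combos really are much more common than others.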



                #8
                Re: Help with FileFind.GREP() Expression

                I actually got it to work! :D I may also have found a v10 bug. If you set the regex option to "I", you must also include the default flag "S". Otherwise the function does not work, even though "S" is the default and the option argument is optional.

                I've inserted the code below. I really like the filefind.grep() method. I just completed a search against my test set of 530 text documents using 422 search combinations of the variables inserted below. The script performed the search and exported the results into a table in approximately 75 seconds. That's about 0.18 seconds per search combination. I'm impressed.

                The POSIX regular expression I ended up using (a) finds the first string I am searching for, then (b) matches all text to the end of the file, regardless of the number of lines, special characters, and word characters included, then (c) searches backwards until it finds the second string I am looking for. If both strings are matched, the file name is returned.

                For those of you who enjoy working with expressions as much as I do, I use a $40 program called RegexBuddy to create my search expressions. I use this instead of the UI included in Alpha and highly recommend it to anyone. In addition to helping with the expression, it has a built-in grep routine that will evaluate the expression against a file or a directory of files. It also has a real-time testing UI that displays the results against a test string as you type.

                Code:
                'dims and preceding code not shown'
                coname = word(linetxt,1,",")
                conadd = word(linetxt, 3, ",")
                coid = word(linetxt,4,",")
                sexpr = "\("+coname+"\)[[:cntrl:][:print:]]*\("+conadd+"\)"
                expression_result = (filefind.grep("*.txt", sexpr , 0 , "$(Filename), $(stop)" + coname + ", " + coid + crlf(),"SI"))
                If (IsNull(expression_result) = .f.) then
                  tbl.populate_from_string("file_name, co_name, co_id", crlf(), expression_result)
                end if
                Thanks to all for your help.
                Thanks,
                James
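
The "match to the end of the file, then search backwards" behavior described above is ordinary greedy matching, and it can be observed in a small Python sketch. Python's `re` module has no `[[:cntrl:][:print:]]` class, so `[\s\S]` is used here as a stand-in for "any character, including line breaks". The sample text is made up, and the lazy variant is shown only for contrast.

```python
import re

# Made-up document in which the address appears twice.
text = "Acme Inc\nline two\n12 Main St\nline four\n12 Main St\n"

# Greedy: [\s\S]* first consumes everything to the end of the text,
# then backs up until the second string can match.
greedy = re.search(r"(Acme Inc)[\s\S]*(12 Main St)", text)
greedy_start = greedy.start(2)

# Lazy: [\s\S]*? expands one character at a time instead.
lazy = re.search(r"(Acme Inc)[\s\S]*?(12 Main St)", text)
lazy_start = lazy.start(2)

print(greedy_start, lazy_start)
```

Either form answers the yes/no question "do both strings occur in this order?". They differ only in which occurrence of the second string ends the match.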



                  #9
                  Re: Help with FileFind.GREP() Expression

                  All's well that ends well. Just to offer another alternative to those who are averse to regex:
                  along with what was mentioned before, you could use other functions such as at() or scansmatch().



                    #10
                    Re: Help with FileFind.GREP() Expression

                    Ira, I like your idea. Even though I have already fixed my issue, I would like to try what you suggested with the single file properly indexed. Is it correct to assume the process would look something like this:

                    1. Read all txt documents into 1 file. Will StringScanner do?
                    2. As each txt document is read into the file, write the file name and its corresponding placement (line numbers) to a separate list to establish the index.
                    3. Start search on 1 of 422 search combinations and as each match is found:

                    3a. Return filename based on index
                    3b. Return company name based on match
                    3c. Skip to start of next file based on index
                    3d. repeat until eof.

                    4. Repeat until end of 422.

                    I like it. Any suggestions? I will let you know what the results are. Thanks for your help.
                    Thanks,
                    James



                      #11
                      Re: Help with FileFind.GREP() Expression

                      Gabriel, sorry for the late reply. You can tell I don't post much by the lingo I use. Also, my background is in forensic auditing, which has its own dictionary. What I meant by iterations was the minimum number of times a search string is run against a target. I use this as a baseline to estimate processing time before the script is written. In this case, each of the 1,000 documents must have at least one match from the search criteria. Unfortunately, if you use just the company name, you may get more than one match, as the documents occasionally include the names of other companies. The address is used as the unique identifying criterion because it contains digits and letters and has the highest probability of returning a unique match from the data we have. With this in mind, each search combination must be compared against each document. Even if a document has been matched, I am required to check it against the other search combinations to ensure that our logic is conclusive. Therefore, 400 searches compared against 1,000 documents will have a minimum of 400,000 iterations. If you can combine the documents into one, or vice versa, without jeopardizing the identity or flow of the original documents, your iterations would drop significantly.

                      thanks for your help.
                      Thanks,
                      James



                        #12
                        Re: Help with FileFind.GREP() Expression

                        Hi James,

                        1st, I don't think the "S" option in regex as a default is an A5V10 bug, but if you have an example that highlights a difference, that might be helpful. Typically it's a regex expression issue, not the option.

                        Originally posted by jhackney
                        IRA, I like your idea. Even though I have already fixed my issue, I would like to try what you suggested with the single file properly indexed.
                        I've processed 65 MB files with no problems. However, depending upon the length of the strings, some string functions get really slow as size increases. This may make it necessary to search smaller pieces, but you should test.

                        Originally posted by jhackney
                        Is it correct to assume the process would look something like this:

                        1. Read all txt documents to 1 file. Will StringScanner do?
                        I don't recommend StringScanner at all. I don't even think Alpha uses it internally. It is slow compared to other methods (but may have been fast in earlier versions that had no regex etc)

                        Originally posted by jhackney
                        2. As each txt document is read into file write file name and corresponding placement (line numbers) that result into a separate list to establish index.
                        3. Start search on 1 of 422 search combinations and as each match is found:

                        3a. Return filename based on index
                        3b. Return company name based on match
                        3c. Skip to start of next file based on index
                        3d. repeat until eof.

                        4. Repeat until end of 422.

                        I like it. Any suggestions? I will let you know what the results are. Thanks for your help.
                        For step 3, I would recommend that you do a regex_grep of the entire file. If you place a unique delimiter (e.g. chr(28); see my tips here) between the files as you read them, you can then search for the delimiter, any text, the company, any text, and the address, and return the line number of the match (which will be the line of the delimiter). Now convert the line number back to the name of the file: use WORDAT() to find the line number in a list of file start line numbers, then use WORD() to extract the corresponding filename from a filename list, and append the company name you were searching for to each item.
                        Regards,

                        Ira J. Perlow
                        Computer Systems Design
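
The delimiter-and-index idea above can be sketched in Python for anyone who wants to prototype it before writing the Xbasic. This is an illustration under assumptions, not the approach as it would be coded in Alpha Five: it tracks character offsets instead of line numbers, uses `bisect` where the post suggests WORDAT() and WORD(), and keeps the "any text" between company and address from crossing a delimiter. The names `build_corpus` and `files_matching` and the sample documents are made up.

```python
import bisect
import re

SEP = chr(28)  # file separator control character, assumed absent from the documents

def build_corpus(docs):
    """Join {name: text} into one string; return it with each file's start offset."""
    names, starts, parts, pos = [], [], [], 0
    for name, text in docs.items():
        names.append(name)
        starts.append(pos)            # offset of this file's delimiter
        parts.append(SEP + text)
        pos += len(text) + 1
    return "".join(parts), names, starts

def files_matching(corpus, names, starts, company, address):
    """Names of files where company is followed by address within the same file."""
    gap = "[^" + SEP + "]*"           # any text that stays inside one file
    pattern = re.escape(company) + gap + re.escape(address)
    found = []
    for m in re.finditer(pattern, corpus):
        # Last file start at or before the match position owns the match.
        found.append(names[bisect.bisect_right(starts, m.start()) - 1])
    return found

docs = {
    "a.txt": "Acme Inc\n12 Main St\n",
    "b.txt": "Acme Inc\nno address here\n",
    "c.txt": "intro\nAcme Inc\nmore\n12 Main St\n",
}
corpus, names, starts = build_corpus(docs)
matches = files_matching(corpus, names, starts, "Acme Inc", "12 Main St")
print(matches)
```

In this sample, b.txt contains the company name but not the address. Because the gap cannot cross a delimiter, the address in c.txt is not credited to b.txt.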

