Guidance with extract_all_strings function

    I can't seem to get the "extract_all_strings" function to work properly.

    here is the code:

    html_text = extract_all_strings(html_text,"name=\"ngr\"","name=\"lonb\"")

    plain_text = *html_to_plain(html_text)

    htm.write_line(plain_text)

    for example of input htm file see attached file [Input html.doc]

    text in red is what I am trying to extract.

    what I am getting is in attached file [Output Text.txt]

    any ideas welcomed.
    --
    Support your local Search and Rescue Unit, Get Lost!

    www.westrowops.co.uk

    #2
    I think you misunderstand what extract_all_strings() does. You are asking it to give you everything between name="ngr" and name="lonb", which it seems to be doing just fine. It is not parsing your HTML and looking at tags, words, or anything else, just ASCII content.

    You need to identify the text immediately preceding and immediately following the text you want to extract and use those as your delimiters. I don't see any common delimiters around all blocks, so you're probably going to need to make several calls to extract_string() with different delimiters each time.

    Also if everything in red is what you want to have in plain_text, you cannot use *html_to_plain(). *html_to_plain()'s purpose is to strip all HTML tags from a string.
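
    To make the delimiter idea concrete, here is a rough Python model of "return whatever sits between a start delimiter and the next end delimiter". It is only a sketch of the behavior described in this post, not Alpha's implementation; the function name extract_between and the sample string are made up for illustration, since the attached input file is not reproduced here.

```python
def extract_between(text, start, end):
    """Return the text between the first `start` and the next `end`.

    Returns "" when either delimiter is not found. No HTML parsing is
    done: the search is purely on characters.
    """
    i = text.find(start)
    if i == -1:
        return ""
    i += len(start)          # skip past the start delimiter itself
    j = text.find(end, i)    # look for the end delimiter after it
    if j == -1:
        return ""
    return text[i:j]


# Hypothetical sample input, loosely shaped like the form fields in this thread.
sample = '<input name="ngr" value=T 26489 95989><input name="latd" value=53>'

print(extract_between(sample, 'name="ngr" value=', '>'))   # T 26489 95989
print(extract_between(sample, 'name="latd" value=', '>'))  # 53
```

    The tighter the two delimiters hug the value, the less surrounding markup comes back with it.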

    Lenny Forziati
    Vice President, Internet Products and Technical Services
    Alpha Software Corporation


      #3
      Ok Lenny, I tried your suggestion, splitting it into separate scans for each value:

      html_text = extract_all_strings(html_text,"name=\"ngr\" value=",">","~")
      plain_text = *html_to_plain(html_text)
      htm.write_line(plain_text)

      html_text = extract_all_strings(html_text,"name=\"latd\" value=",">","~")
      plain_text = *html_to_plain(html_text)
      htm.write_line(plain_text)

      html_text = extract_all_strings(html_text,"name=\"latm\" value=",">","~")
      plain_text = *html_to_plain(html_text)
      htm.write_line(plain_text)

      html_text = extract_all_strings(html_text,"name=\"lats\" value=",">","~")
      plain_text = *html_to_plain(html_text)
      htm.write_line(plain_text)

      html_text = extract_all_strings(html_text,"name=\"lond\" value=",">","~")
      plain_text = *html_to_plain(html_text)
      htm.write_line(plain_text)

      html_text = extract_all_strings(html_text,"name=\"lonm\" value=",">","~")
      plain_text = *html_to_plain(html_text)
      htm.write_line(plain_text)

      html_text = extract_all_strings(html_text,"name=\"lons\" value=",">")
      plain_text = *html_to_plain(html_text)
      htm.write_line(plain_text)

      The first one works, gives me the value I wanted, the rest just give (CRLF) even though I defined "~" instead of (CRLF) on all scans except the last one, so that I had all the related values on one line for subsequent editing!!


        #4
        Err! I think I found the first mistake. I reran it with the following code, but it still ignores the "~" and still inserts (CRLF), though at least it now gets the values I was after:

        html_text = File.to_string(path0+file_name[j])

        html1_text = extract_all_strings(html_text,"name=\"ngr\" value=",">","~")
        plain_text = *html_to_plain(html1_text)
        htm.write_line(plain_text)

        html2_text = extract_all_strings(html_text,"name=\"latd\" value=",">","~")
        plain_text = *html_to_plain(html2_text)
        htm.write_line(plain_text)

        html3_text = extract_all_strings(html_text,"name=\"latm\" value=",">","~")
        plain_text = *html_to_plain(html3_text)
        htm.write_line(plain_text)

        html4_text = extract_all_strings(html_text,"name=\"lats\" value=",">","~")
        plain_text = *html_to_plain(html4_text)
        htm.write_line(plain_text)

        html5_text = extract_all_strings(html_text,"name=\"lond\" value=",">","~")
        plain_text = *html_to_plain(html5_text)
        htm.write_line(plain_text)

        html6_text = extract_all_strings(html_text,"name=\"lonm\" value=",">","~")
        plain_text = *html_to_plain(html6_text)
        htm.write_line(plain_text)

        html7_text = extract_all_strings(html_text,"name=\"lons\" value=",">")
        plain_text = *html_to_plain(html7_text)
        htm.write_line(plain_text)


          #5
          First, there's no reason for you to use extract_all_strings(); you should just be using extract_string(). The difference is that extract_all_strings() returns all matches. Since you only have a single match for each set of tags, it will be more efficient to stop looking for more matches once the first is found. This is what extract_string() will do for you.

          Second, even if you use extract_all_strings(), there is no need to specify the delimiter as "~". What this would do is insert a "~" between matches instead of a crlf(). But again, you'll only have a single match so you don't need a delimiter at all. You're not getting a crlf() back from extract_all_strings(), you must be getting nothing (""). Your write_line() adds a crlf().

          Third, since all but your first result are plain text, there is no need to use *html_to_plain on them. If your data will also be just numbers such as your sample document, I'd remove those extra calls for efficiency.

          Finally, here's what works for me. I tested this in the Interactive Window, which is a great way to experiment with the expressions instead of writing to a file, opening the file to see if it was right, then starting all over again.

          Code:
          dim html_text as c
          html_text = get_from_file("c:\george.txt")
          ?extract_string(html_text,"name=\"ngr\" value=",">")
          = "T 26489 95989"
          
          ?*html_to_plain(extract_string(html_text,"name=\"ngr\" value=",">"))
          = "T2648995989"
          
          ?extract_string(html_text,"name=\"latd\" value=",">")
          = "53"
          
          ? extract_string(html_text,"name=\"latm\" value=",">")
          = "0"
          
          ? extract_string(html_text,"name=\"lats\" value=",">")
          = "00"
          
          ? extract_string(html_text,"name=\"lond\" value=",">")
          = "06"
          
          ? extract_string(html_text,"name=\"lonm\" value=",">")
          = "07"
          
          ? extract_string(html_text,"name=\"lons\" value=",">")
          = "00"
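
          The second point above (the delimiter only appears between matches, and the stray line break comes from write_line) can be modelled in Python as follows. This is a sketch under assumptions, not Alpha's code: extract_all_between and write_line are hypothetical stand-ins, and whether the real extract_all_strings() adds a trailing delimiter is not confirmed by this thread.

```python
import io


def extract_all_between(text, start, end, delimiter="\r\n"):
    """Collect every start/end match and join them with `delimiter`."""
    matches = []
    pos = 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            break
        i += len(start)
        j = text.find(end, i)
        if j == -1:
            break
        matches.append(text[i:j])
        pos = j + len(end)   # continue searching after this match
    return delimiter.join(matches)


def write_line(out, text):
    """Stand-in for a file write that always appends a line break."""
    out.write(text + "\r\n")


one = extract_all_between("<a=1>", "a=", ">", "~")        # single match
two = extract_all_between("<a=1><a=2>", "a=", ">", "~")   # two matches
none = extract_all_between("<b=1>", "a=", ">", "~")       # no match

out = io.StringIO()
write_line(out, none)

print(repr(one))             # '1'
print(repr(two))             # '1~2'
print(repr(out.getvalue()))  # '\r\n'
```

          With a single match there is nothing to join, so the "~" never shows up; with no match the result is empty and the only thing written is the line break that the write adds.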

          Lenny Forziati
          Vice President, Internet Products and Technical Services
          Alpha Software Corporation


            #6
            Thanks for your assistance and patience, I finally got the data I was after with this:

            FOR j = 3 TO i-1
                html_text = File.to_string(path0+file_name[j])
                html1_text = extract_string(html_text,"name=\"ngr\" value=",">")
                grid_text = stritran(html1_text,"& # 032;","")
                if ut(grid_text) = "" then
                    goto nextone
                end if
                latd_text = extract_string(html_text,"name=\"latd\" value=",">")
                latm_text = extract_string(html_text,"name=\"latm\" value=",">")
                lats_text = extract_string(html_text,"name=\"lats\" value=",">")
                lond_text = extract_string(html_text,"name=\"lond\" value=",">")
                lonm_text = extract_string(html_text,"name=\"lonm\" value=",">")
                lons_text = extract_string(html_text,"name=\"lons\" value=",">")
                htm.write_line(latd_text+" "+right("00"+latm_text,2)+" "+right("00"+lats_text,2)+" / "+right("000"+lond_text,3)+" "+right("00"+lonm_text,2)+" "+right("00"+lons_text,2)+" = "+left(grid_text,4)+substr(grid_text,5,3))
                nextone:
                statusbar.set_text("Convert HTML to Text for file "+file_name[j])
            next
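
            For anyone reading the write_line call above, the right("00"+value,2) idiom left-pads each value with zeros to a fixed width. A small Python model of that idiom, using the values shown in post #5, looks like this; pad_left is a hypothetical helper, and the grid reference part of the line is left out because the cleaned grid_text is not shown in the thread.

```python
def pad_left(value, width):
    """Mimic right("0"*width + value, width): zero-pad, keep the last `width` chars."""
    return ("0" * width + value)[-width:]


# Values as returned in post #5 of this thread.
latd, latm, lats = "53", "0", "00"
lond, lonm, lons = "06", "07", "00"

line = (latd + " " + pad_left(latm, 2) + " " + pad_left(lats, 2)
        + " / " + pad_left(lond, 3) + " " + pad_left(lonm, 2) + " " + pad_left(lons, 2))

print(line)  # 53 00 00 / 006 07 00
```

            Like right(), this keeps only the last `width` characters, so a value longer than the width would be cut from the left rather than kept whole.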
            Last edited by Graham Wickens; 02-18-2006, 09:26 AM.