Announcement


The Alpha Software Forum Participation Guidelines

The Alpha Software Forum is a free forum created for the Alpha Software developer community to ask for help, exchange ideas, and share solutions. Alpha Software strives to create an environment where all members of the community feel safe to participate. To ensure the Alpha Software Forum is a place where everyone feels welcome, forum participants are expected to behave as follows:
  • Be professional in your conduct
  • Be kind to others
  • Be constructive when giving feedback
  • Be open to new ideas and suggestions
  • Stay on topic


Be sure all comments and threads you post are respectful. Posts that contain any of the following content will be considered a violation of your agreement as a member of the Alpha Software Forum Community and will be moderated:
  • Spam.
  • Vulgar language.
  • Quotes from private conversations without permission, including pricing and other sales-related discussions.
  • Personal attacks, insults, or subtle put-downs.
  • Harassment, bullying, threatening, mocking, shaming, or deriding anyone.
  • Sexist, racist, homophobic, transphobic, ableist, or otherwise discriminatory jokes and language.
  • Sexually explicit or violent material, links, or language.
  • Pirated, hacked, or copyright-infringing material.
  • Encouraging others to engage in the above behaviors.


If a thread or post is found to contain any of the content outlined above, a moderator may choose to take one of the following actions:
  • Remove the Post or Thread - the content is removed from the forum.
  • Place the User in Moderation - all posts and new threads must be approved by a moderator before they are posted.
  • Temporarily Ban the User - the user is banned from the forum for a period of time.
  • Permanently Ban the User - user is permanently banned from the forum.


Moderators may also rename posts and threads if they are too generic or do not properly reflect the content.

Moderators may move threads if they have been posted in the incorrect forum.

Threads/Posts questioning specific moderator decisions or actions (such as "why was a user banned?") are not allowed and will be removed.

The owners of Alpha Software Corporation (Forum Owner) reserve the right to remove, edit, move, or close any thread for any reason; or ban any forum member without notice, reason, or explanation.

Community members are encouraged to click the "Report Post" icon in the lower left of a post if they feel it violates the rules. This alerts the moderators to review it.

Alpha Software Corporation may amend the guidelines from time to time and may also vary the procedures they set out where appropriate in a particular case. Your agreement to comply with the guidelines will be deemed agreement to any changes to them.



Bonus TIPS for Successful Posting

Try a Search First
It is highly recommended that you search for your topic before posting, as many questions have already been answered in earlier posts. As with any search engine, the shorter and more general the search term, the more "hits" will be returned; the more specific the term, the more relevant those hits will be. Searching for "table" might well return every message on the board, while "tablesum" would greatly restrict the number of messages returned.

When you do post
First, make sure you are posting your question in the correct forum. For example, if you post an issue regarding Desktop applications on the Mobile & Browser Applications board, your question may not be seen by the appropriate audience, and it may also be removed or relocated.

The more detail you provide about your problem or question, the more likely someone is to understand your request and be able to help. A sample database with a minimum of records (and its support files, zipped together) will make it much easier to diagnose issues with your application. Screen shots of error messages are especially helpful.

When explaining how to reproduce your problem, please be as detailed as possible. Describe every step, click by click and keypress by keypress. Otherwise, when others try to duplicate your problem, they may do something slightly different and end up with different results.

A note about attachments
You may only attach one file to each message. Attachment file size is limited to 2MB. If you need to include several files, you may do so by zipping them into a single archive.

If you forgot to attach your files to your post, please do NOT create a new thread. Instead, reply to your original message and attach the file there.

When attaching screen shots, it is best to attach an image file (.BMP, .JPG, .GIF, .PNG, etc.) or a zip file of several images, rather than a Word document containing the screen shots. Because Word documents can carry viruses, many message board users will not open your Word file, which limits their ability to help you.

Similarly, if you are uploading a zipped archive, create a plain .ZIP file rather than a self-extracting .EXE, as many users will not run an EXE file.

scraper data too large for variable how to cycle past unwanted data


  • scraper data too large for variable how to cycle past unwanted data

    I have a wp scraper that I built, and for quite a while it ran well. Now, however, there is too much metadata and unneeded text on the HTML page to fit into a variable, so data is getting cut off and lost, and the scraper no longer works.

    This is a two-part setup: a script that loops through the records, and a function I wrote that the script calls for each record.
    Code:
    'Date Created: 29-Sep-2012 08:15:44 PM
    'Last Updated: 28-Mar-2017 11:46:41 AM
    'Created By  : Steven
    'Updated By  : steve
    dim shared tbl as P
    dim shared p3 as waitdialog
    'dim count as N
    'a5.command("VIEW_TRACE")
    
    tbl = table.open("catdata")
    
    
    tbl.fetch_first()
    
    WHILE .NOT. tbl.fetch_eof()
    	
    	DIM shared street as C
    	DIM shared zipcode as C
    	dim shared city as c
    	dim shared state as c
    	p3.create(1,"repeating")
    	p3.set_message("Automated Scrape Currently Scraping ")
    	p3.Pause()
    	p3.Set_Color("red")
    	p3.resume()
    	street = tbl.Street
    	zipcode = tbl.Zipcode
    	city = tbl.City
    	state = tbl.State
    	sdate = tbl.Saledate
    	autowp()
    	tbl.fetch_next()
    END WHILE
    
    
    
    tbl.close()
    p3.Close()
    
    
    DIM Shared varP_leadsbr as P
    DIM layout_name as c 
    layout_name = "leadsbr@c:\srgypgrabber\whiteleads.ddd"
    DIM tempP as p
    'Get pointer to existing window. In case layout_name is qualified with a dictionary name, extract up to first @. In case formname has spaces, normalize it
    tempP=obj(":"+object_Name_normalize(word(layout_name,1,"@")))
    'Test if pointer is valid
    if is_object(tempP) then 
    	'Test if pointer refers to a form or browse
    	if tempP.class() = "form" .or. tempP.class() = "browse" then 
    		'If so, then activate the already open window
    		tempP.activate()
    		
    	else
    		'Window is not already open, so open it
    		varP_leadsbr = :Browse.view(layout_name)
    		
    
    	end if
    else 
    	varP_leadsbr = :Browse.view(layout_name)
    	
    
    end if
    That is the script that gets the data to search with.

    This is the function I wrote that it runs:

    Code:
    'Date Created: 14-Mar-2015 01:57:58 AM
    'Last Updated: 31-Aug-2017 10:21:30 PM
    'Created By  : Steven
    'Updated By  : steve
    FUNCTION autowp AS C ( )
    	dim shared tbl as p
    
    dim dmain as c
    dim cc as c
    dim street as c
    dim zipcode as c
    dim shared city as c
    dim shared state as c
    dim shared saledate as c
    'dim shared casenum as c
    'casenum=tbl.casenum
    saledate=tbl.saledate
    street=tbl.Street
    zipcode=tbl.Zipcode
    city=tbl.city
    state=tbl.state
    'dmain="https://people.yellowpages.com/whitepages/address?street=1000+park+avenue&qloc=fairmont+nc+28340"
    dmain = "https://people.yellowpages.com/whitepages/address?street=" +alltrim(street)+"+" +"&qloc=" +alltrim(city)+"+" +alltrim(state)+"+"+alltrim(zipcode)
    cc = http_get_page2(dmain)
    
    dim srgstring as c
    dim reslts as c
    dim co as c
    co = ""
    co = EXTRACT_STRING( cc,"<div class=\"result-top-left-detail\">","</strong>",1)
    dim coe as c
    coe= extract_string(co,"<strong>"," ")
    	if coe = "page" then
    'reslts="a" 
    dim reslts1 as C
    dim phtrim as c
    dim nmtrim as c
    dim zptrim as c
    dim zipdone as c
    dim st1 as c
    dim en1 as c 
    st1="class=\"\""
    en1="<div class=\"address-map\">"
    dim en2 as c
    en2="</a>"
    reslts1 = EXTRACT_STRING( cc,st1,en1 )
    nmtrim = EXTRACT_STRING(reslts1,">",en2)
    phtrim =EXTRACT_STRING(reslts1,"(","<")
    'zptrim =EXTRACT_STRING(reslts1,"<div class=\"address\">","</div> ")
    'zipdone = right(zptrim,5)
    
    
    
    
    dim ltbl as p
    ltbl=table.open("whiteleads")
    ltbl.enter_begin()
    ltbl.Listed_Name = nmtrim
    ltbl.Street = street
    ltbl.City = city
    ltbl.State = state
    ltbl.Zipcode = zipcode
    ltbl.Phone = phtrim
    ltbl.Saledate = saledate
    'ltbl.Casenum = casenum
    ltbl.enter_end(.t.)
    ltbl.close()
    
    	else 
    	end if
    	
    	
    END FUNCTION
    I tested it in the Interactive window and concluded that the data was getting cut off because of the variable size.
    https://www.housingeducator.org
    k3srg

  • #2
    Re: scraper data too large for variable how to cycle past unwanted data

    You could use SAVE_TO_FILE() to capture the page.

    save_to_file(http_get_page2(dmain),drive_path_name_ext)

    Then loop through the text in the file.
    There can be only one.



    • #3
      Re: scraper data too large for variable how to cycle past unwanted data

      I think that's what I did the first time I wrote the code, and then I got lazy while speeding it up. I've got to look through my old code and see if I can find it.

      I like the Highlander sig, Stan. Thanks, I'm going to try running it that way.
      https://www.housingeducator.org
      k3srg



      • #4
        Re: scraper data too large for variable how to cycle past unwanted data

        I don't think the problem is with the variable. You are checking for the section between

        class="" and <div class="address-map"

        but the page is now using <div class="address-map full-profile--present">

        as the end of that section.
        There can be only one.



        • #5
          Re: scraper data too large for variable how to cycle past unwanted data

          I looked at that too, but I thought the string scan would stop when it reached the first part of the delimiter, i.e. at the space after "map".
          https://www.housingeducator.org
          k3srg



          • #6
            Re: scraper data too large for variable how to cycle past unwanted data

            I misstated.

            Your ending delimiter is

            <div class="address-map">

            The closing > is what is causing the extract to fail. The space after "map" would be sufficient if not for the >.
            There can be only one.



            • #7
              Re: scraper data too large for variable how to cycle past unwanted data

              It's still not grabbing the results. There's a bunch of metadata I have to figure out how to get past.
              https://www.housingeducator.org
              k3srg



              • #8
                Re: scraper data too large for variable how to cycle past unwanted data

                I was wrong that <div class="address-map" would work, because that ending delimiter contains the second (closing) quote mark, while in the page contents "map" is followed by a space, not a quote. In other words, extract_string is trying to find <div class="address-map" (with the closing quote), not <div class="address-map (without it). You'll need to change it to use

                <div class="address-map full-profile--present">
                or
                <div class="address-map full-profile--present"

                Code:
                en1="<div class=\"address-map full-profile--present\">"
                ? EXTRACT_STRING( cc,st1,en1 )
                = Aisa Hajdarevic</a>
                                  <div class="address">
                               1229 Shannon WAY #WA, Bowling Green, KY 42101            </div>
                                  <div class="phone">
                               (270) 999-9999            </div>
                Phone number changed to protect the innocent.
                There can be only one.



                • #9
                  Re: scraper data too large for variable how to cycle past unwanted data

                  Winner winner, chicken dinner. It's working again. Thanks, Stan, for all the help.
                  https://www.housingeducator.org
                  k3srg


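A possible variation on the fix in post #8, offered as an untested sketch: because extract_string matches the end delimiter literally, an end delimiter that stops right after address-map (no closing quote, no >) should match both the old <div class="address-map"> and the newer <div class="address-map full-profile--present">. The sample string below is hypothetical HTML modeled on the markup quoted in this thread; in autowp() the only line that would change is the en1 assignment.

```xbasic
' Untested sketch for the Interactive window.
' The sample HTML is made up (hypothetical), modeled on the markup in this thread.
dim sample as c
dim st1 as c
dim en1 as c
sample = "<a class=\"\">Jane Doe</a><div class=\"address-map full-profile--present\">"
' Same start delimiter as in autowp()
st1 = "class=\"\""
' End delimiter stops right after address-map: no closing quote and no >,
' so it matches <div class="address-map"> as well as
' <div class="address-map full-profile--present">
en1 = "<div class=\"address-map"
? extract_string(sample, st1, en1)
```

The trade-off is that a shorter delimiter tolerates extra class names being added later, but it would also match any other div whose class attribute happens to start with address-map, so it is worth checking the result against a saved copy of the page (for example, one written with save_to_file() as suggested in post #2).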