I have a wp (White Pages) scraper that I built, and for quite a while it ran well. Now, however, the HTML page contains so much metadata and unneeded text that it no longer fits into a variable, so the data gets cut off and lost and the scraper stops working.
This is a two-part script that runs a function I wrote. The first part is the script that gets the data to search with, and the second part is the function that it runs.
I tested in the Interactive Window and realized that the page was getting cut off due to the variable size.
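One way to check whether the page text is really lost, or only looks shortened when printed, is to measure the string and find where the result marker sits in it, then keep only the part from that marker onward. The following is a rough sketch in Python, not Xbasic. The helper names `report` and `trim_to_results` are hypothetical and the sample page is made up; only the marker string comes from the function in this post.

```python
def report(page: str, marker: str) -> dict:
    """Return the page length and the marker position (-1 if the marker is absent)."""
    return {"length": len(page), "marker_at": page.find(marker)}


def trim_to_results(page: str, marker: str) -> str:
    """Drop everything before the marker so only the results part is kept."""
    pos = page.find(marker)
    return page[pos:] if pos >= 0 else ""


# Marker taken from the autowp function; everything else here is invented.
MARKER = '<div class="result-top-left-detail">'
FILLER = "<meta name='x' content='y'>"

# Made-up page: a large amount of meta data in front of one result block.
page = FILLER * 5000 + MARKER + "<strong>page 1</strong></div>"

info = report(page, MARKER)
results = trim_to_results(page, MARKER)
print(info["length"], info["marker_at"], len(results))
```

If the reported length is much larger than what the Interactive Window shows and the marker position is not -1, the text is still in the variable and only the display is shortened. Trimming first also means the later extraction steps work on a much smaller string.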
This is the script that gets the data to search with:
Code:
'Date Created: 29-Sep-2012 08:15:44 PM
'Last Updated: 28-Mar-2017 11:46:41 AM
'Created By  : Steven
'Updated By  : steve
dim shared tbl as P
dim shared p3 as waitdialog
'dim count as N
'a5.command("VIEW_TRACE")
tbl = table.open("catdata")
tbl.fetch_first()
WHILE .NOT. tbl.fetch_eof()
    DIM shared street as C
    DIM shared zipcode as C
    dim shared city as c
    dim shared state as c
    p3.create(1,"repeating")
    p3.set_message("Automated Scrape Currently Scraping ")
    p3.Pause()
    p3.Set_Color("red")
    p3.resume()
    street = tbl.Street
    zipcode = tbl.Zipcode
    city = tbl.City
    state = tbl.State
    sdate = tbl.Saledate
    autowp()
    tbl.fetch_next()
END WHILE
tbl.close()
p3.Close()

DIM Shared varP_leadsbr as P
DIM layout_name as c
layout_name = "[email protected]:\srgypgrabber\whiteleads.ddd"
DIM tempP as p
'Get pointer to existing window. In case layout_name is qualified with a dictionary name, extract up to first @. In case formname has spaces, normalize it
tempP=obj(":"+object_Name_normalize(word(layout_name,1,"@")))
'Test if pointer is valid
if is_object(tempP) then
    'Test if pointer refers to a form or browse
    if tempP.class() = "form" .or. tempP.class() = "browse" then
        'If so, then activate the already open window
        tempP.activate()
    else
        'Window is not already open, so open it
        varP_leadsbr = :Browse.view(layout_name)
    end if
else
    varP_leadsbr = :Browse.view(layout_name)
end if
This is the function that I wrote for the script to run:
Code:
'Date Created: 14-Mar-2015 01:57:58 AM
'Last Updated: 31-Aug-2017 10:21:30 PM
'Created By  : Steven
'Updated By  : steve
FUNCTION autowp AS C ( )
    dim shared tbl as p
    dim dmain as c
    dim cc as c
    dim street as c
    dim zipcode as c
    dim shared city as c
    dim shared state as c
    dim shared saledate as c
    'dim shared casenum as c
    'casenum=tbl.casenum
    saledate=tbl.saledate
    street=tbl.Street
    zipcode=tbl.Zipcode
    city=tbl.city
    state=tbl.state
    'dmain="https://people.yellowpages.com/whitepages/address?street=1000+park+avenue&qloc=fairmont+nc+28340"
    dmain = "https://people.yellowpages.com/whitepages/address?street=" +alltrim(street)+"+" +"&qloc=" +alltrim(city)+"+" +alltrim(state)+"+"+alltrim(zipcode)
    cc = http_get_page2(dmain)
    dim srgstring as c
    dim reslts as c
    dim co as c
    co = ""
    co = EXTRACT_STRING( cc,"<div class=\"result-top-left-detail\">","</strong>",1)
    dim coe as c
    coe= extract_string(co,"<strong>"," ")
    if coe = "page" then
        'reslts="a"
        dim reslts1 as C
        dim phtrim as c
        dim nmtrim as c
        dim zptrim as c
        dim zipdone as c
        dim st1 as c
        dim en1 as c
        st1="class=\"\""
        en1="<div class=\"address-map\">"
        dim en2 as c
        en2="</a>"
        reslts1 = EXTRACT_STRING( cc,st1,en1 )
        nmtrim = EXTRACT_STRING(reslts1,">",en2)
        phtrim =EXTRACT_STRING(reslts1,"(","<")
        'zptrim =EXTRACT_STRING(reslts1,"<div class=\"address\">","</div> ")
        'zipdone = right(zptrim,5)
        dim ltbl as p
        ltbl=table.open("whiteleads")
        ltbl.enter_begin()
        ltbl.Listed_Name = nmtrim
        ltbl.Street = street
        ltbl.City = city
        ltbl.State = state
        ltbl.Zipcode = zipcode
        ltbl.Phone = phtrim
        ltbl.Saledate = saledate
        'ltbl.Casenum = casenum
        ltbl.enter_end(.t.)
        ltbl.close()
    else
    end if
END FUNCTION
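To make the steps of the function easier to follow outside Alpha Five, here is a rough Python sketch of the same logic: build the search URL, check that the result header starts with "page", then cut out the name and phone. It is not Xbasic. `build_url` and `between` are hypothetical helpers, `between` only approximates how EXTRACT_STRING is used above (it is not Alpha Five's implementation), and the sample HTML is invented around the markers the function looks for.

```python
from urllib.parse import urlencode

BASE = "https://people.yellowpages.com/whitepages/address"


def build_url(street: str, city: str, state: str, zipcode: str) -> str:
    """Build the search URL; urlencode turns spaces into '+' and escapes the rest."""
    query = {
        "street": street.strip(),
        "qloc": " ".join(part.strip() for part in (city, state, zipcode)),
    }
    return BASE + "?" + urlencode(query)


def between(text: str, start: str, end: str) -> str:
    """Hypothetical helper: text after the first `start`, up to the next `end`.

    Returns "" when either marker is missing.
    """
    i = text.find(start)
    if i < 0:
        return ""
    i += len(start)
    j = text.find(end, i)
    return text[i:j] if j >= 0 else ""


# Made-up fragment shaped like the markers the function searches for.
sample = (
    '<div class="result-top-left-detail"><strong>page 1 of 1</strong></div>'
    '<a class="" href="/x">JOHN DOE</a><div>(555) 010-0000</div>'
    '<div class="address-map"></div>'
)

url = build_url("1000 park avenue ", "fairmont", "nc", "28340")

# Same chain as in autowp: header check first, then the result block.
header = between(sample, '<div class="result-top-left-detail">', "</strong>")
first_word = between(header, "<strong>", " ")
if first_word == "page":
    block = between(sample, 'class=""', '<div class="address-map">')
    name = between(block, ">", "</a>")
    phone = between(block, "(", "<")
    print(url, name, phone)
```

Two things stand out in the sketch. The street value is encoded before it goes into the URL, whereas the Xbasic line only trims it, so a street containing spaces may not produce the same URL as the commented example. Also, cutting the phone between "(" and "<" drops the opening parenthesis, so the stored value starts with the area code digits.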