
Programmable Web Crawler


    I originally started out with a problem I thought was related to a dim limitation. It appears it is not, but I haven't been able to get the enclosed code to do what I want. I am trying to build a Web Crawler, collect the data from the web site and check the database against it for any hits.

    In the enclosed code I've never been able to get the line
    x = WAIT_UNTIL(topparent:ACTIVEX1.activex.busy,2)
    to work properly, so I added
    xbasic_wait_for_idle(3)
    which makes it work for the most part.
    I also implemented Lenny's suggestion to use
    *concat(search_term,Inputdata)

    After running you can see that the string only has 4 pages of the web site. This is why I originally thought there was a limitation in the string size. I put this up here for the community to offer suggestions so that we might have something for the code archive.

    I'm sure there are more members of the community that have use for a programmable web crawler than me.

    You can unzip to a folder called demo and all should work. I hope all who participate will offer improvements or constructive criticism to make this thread worthwhile reading. I think Dr Peter Wayne could offer monumental assistance because of his ActiveX experience.

    I am not a programmer so getting the code this far was blind luck. Much trial and error. Makes me wish I knew what I was doing.

    #2
    Re: Programmable Web Crawler

    I'm a little confused.

    Can we get more step by step directions on what to do?
    Al Buchholz
    Bookwood Systems, LTD
    Weekly QReportBuilder Webinars Thursday 1 pm CST

    Occam's Razor - KISS
    Normalize till it hurts - De-normalize till it works.
    Advice offered and questions asked in the spirit of learning how to fish is better than someone giving you a fish.
    When we triage a problem it is much easier to read sample systems than to read a mind.
    "Make it as simple as possible, but not simpler."
    Albert Einstein

    http://www.iadn.com/images/media/iadn_member.png



      #3
      Re: Programmable Web Crawler

      Download and unzip the file to demo, then run it with Alpha Five. After that, look at the code. You will see that I am not a programmer. I hope I have the beginning of some useful code for the community, but I need help, suggestions and comments.



        #4
        Re: Programmable Web Crawler

        John, I'm confused! Did I miss something? I downloaded your demo, opened it, and ran the button.

        Where can I find the "x = WAIT_UNTIL(topparent:ACTIVEX1.activex.busy,2)"?

        ------ edit afterwards ----

        Found it in Pinellasclick. Can you tell me what you are doing there? ;)

        Where did you get this code from? Did you write it yourself, or is it translated from VB(A)?

        I think it would save time if, whenever an ActiveX control is used, we named it or commented it with the PID. As it was, I was searching for a yellow monkey in the forest.

        Found it: "Microsoft Web Browser", alias SHdocVW.

        Where did you get the Microsoft Web Browser object model documentation from?
        Last edited by Marcel Kollenaar; December 22, 2006, 11:34 PM.
        Marcel

        I hear and I forget. I see and I remember. I do and I understand.
        ---- Confusius ----



          #5
          Re: Programmable Web Crawler

          I am not a programmer so getting the code this far was blind luck. Much trial and error. Makes me wish I knew what I was doing.
          You're a lucky guy :) Must have taken a lot of time.
          Marcel

          I hear and I forget. I see and I remember. I do and I understand.
          ---- Confusius ----



            #6
            Re: Programmable Web Crawler

            I wrote this code myself; I'm sure that's why it is confusing. I got the web browser from the example that uses ActiveX on a form.
            If you have IE6 on your system it should run OK.
            OnInit sets up the web address, and the browser then goes to that web page.
            OnActivate runs Pinellasclick.
            Pinellasclick goes from web page to web page until it encounters an error. After it gets to a web page, the contents of the page are stored in a string. That is where I am having problems: after the fourth web page the string doesn't accept any more data.
            The wait states make it work.
            This should accomplish what you are trying to do.
            OnKey closes the form when the Esc key is pressed.



              #7
              Re: Programmable Web Crawler

              Thanks John.
              Marcel

              I hear and I forget. I see and I remember. I do and I understand.
              ---- Confusius ----



                #8
                Re: Programmable Web Crawler

                http://en.wikipedia.org/wiki/DOM_Events


                I got all my info (training) online. The address above should get you started.



                  #9
                  Re: Programmable Web Crawler

                  Originally posted by jkukuda
                  http://en.wikipedia.org/wiki/DOM_Events


                  I got all my info (training) online. Above is an address to get you started.

                  Thanks for the link John!

                  The code is in some way familiar to me. I did something similar some years ago with JavaScript and DHTML in Netscape, when I built websites by hand. It also had a document object, etc.

                  In your Alpha Five demo you use an ActiveX Microsoft Web Browser element. This object has a 'document' property.

                  I was amazed by constructions such as this:

                  Code:
                  topparent:ACTIVEX1.activex.document.getElementById("dgResults").childNodes.item(0).childNodes.item(21).childNodes.item(0).childNodes.item(2).click()
                  Take getElementById(), for instance. When I use the Automation Object Browser I can find the document property, but I cannot find the getElementById() method, and that confuses me. I also looked in Word VBA to see whether this method shows up there, but I see nothing.

                  Normally I split such constructions into steps, e.g.:

                  Code:
                  dim Browse as P
                  dim Doc as P
                  dim Element as P
                  Browse := topparent:ACTIVEX1.activex
                  Doc := Browse.Document
                  Element := Doc.getElementById("dgResults") ' ???
                  'And so on...
                  getElementById() is a DOM method. It looks to me as if ActiveX methods and properties are being mixed with JavaScript/VBScript (DOM) constructions and Xbasic.

                  Maybe you found an undocumented feature in Xbasic but IMHO this will not work.
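
                  To make the comparison concrete outside of Xbasic, here is a small sketch in Python (not Xbasic) using the standard library's xml.dom.minidom. It only shows that a one-line chained reference and the split-up version are the same walk through the DOM; it says nothing about how Xbasic resolves such a chain through the ActiveX control. The markup, the find_by_id helper and the variable names are made up for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical markup standing in for a results grid; not the real page.
MARKUP = (
    '<page><table id="dgResults">'
    "<row><cell>first</cell><cell>second</cell></row>"
    "</table></page>"
)

doc = parseString(MARKUP)


def find_by_id(node, wanted):
    """Depth-first search for the element whose id attribute equals wanted."""
    if node.nodeType == node.ELEMENT_NODE and node.getAttribute("id") == wanted:
        return node
    for child in node.childNodes:
        found = find_by_id(child, wanted)
        if found is not None:
            return found
    return None


# One long chained reference, in the style of the line quoted above.
chained = (
    find_by_id(doc, "dgResults").childNodes.item(0).childNodes.item(1).firstChild.data
)

# The same walk split into named steps, which is easier to inspect one at a time.
grid = find_by_id(doc, "dgResults")
row = grid.childNodes.item(0)
cell = row.childNodes.item(1)
stepwise = cell.firstChild.data
```

                  Both forms reach the same node. The split version simply gives each intermediate object a name, so a failing step can be found without guessing which item() call went wrong.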
                  Last edited by Marcel Kollenaar; December 25, 2006, 09:31 PM. Reason: IMHO added
                  Marcel

                  I hear and I forget. I see and I remember. I do and I understand.
                  ---- Confusius ----



                    #10
                    Re: Programmable Web Crawler

                    Marcel, I really appreciate your looking at the code. It does work, I run it all the time. I just don't like the way I have to kludge it with the wait states.
                    Some of the stuff there I got with a DOM inspector that reads it right from the web page. It's all way over my head and I don't know what possessed me to hang with it till I got it to work. Now I'd like to make it work properly.



                      #11
                      Re: Programmable Web Crawler

                      John,

                      What are you going to do with this once you get it to work? What is the purpose? I guess I just don't get it.

                      Dave Mason
                      Dave Mason
                      [email protected]
                      Skype is dave.mason46



                        #12
                        Re: Programmable Web Crawler

                        Originally posted by jkukuda
                        Marcel, I really appreciate your looking at the code. It does work, I run it all the time, I just don't like the way I have to kludge it with the wait states.
                        Some of the stuff there I got with a DOM inspector that comes right from the web page. It's all way over my head and I don't know what possessed me to hang with it till I got it to work. Now I'd like to make it work properly.

                        I'm the first to believe it when you say so. I've seen more surprises with Xbasic and I learn something every day. I think we have to split it up into parts to see which parts don't work the way you want.

                        But what if it keeps halting at a specific point and does not run through all the code lines? What then?

                        The simplest approach is to put a debug(1) statement in a routine and see where and when it stops.
                        Marcel

                        I hear and I forget. I see and I remember. I do and I understand.
                        ---- Confusius ----



                          #13
                          Re: Programmable Web Crawler

                          John,
                          I like what you've started but I don't know where you got the syntax

                          x = WAIT_UNTIL(topparent:ACTIVEX1.activex.busy,2)

                          It seems to me this is the exact opposite of what you want. Isn't the web browser busy while it is downloading? Don't you want something like
                          x=WAIT_UNTIL(.not. topparent:ACTIVEX1.activex.busy,1,5)

                          which would keep checking the browser once a second until the download is complete, or timeout after 5 seconds?
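
                          The behaviour I am describing (check a condition at a fixed interval, give up after a timeout) can be sketched in a language-neutral way. The following is Python, not Xbasic, and it is not a description of how Alpha's WAIT_UNTIL is implemented. The wait_until function and the FakeBrowser class are hypothetical stand-ins, and the short interval is only there so the sketch runs quickly.

```python
import time


def wait_until(condition, interval=1.0, timeout=5.0):
    """Poll condition() every `interval` seconds.

    Returns True as soon as the condition holds, or False once
    `timeout` seconds have passed without it becoming true.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeBrowser:
    """Hypothetical stand-in for the browser control: busy for the first few checks."""

    def __init__(self, busy_checks):
        self._remaining = busy_checks

    @property
    def busy(self):
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False


browser = FakeBrowser(busy_checks=3)

# Wait for the browser to stop being busy, i.e. for "not busy" to become true.
finished = wait_until(lambda: not browser.busy, interval=0.01, timeout=1.0)
```

                          The point of the sketch is the direction of the test: the loop waits for "not busy" to become true. Waiting on "busy" itself would return immediately while the page is still loading.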

                          I very much like the way you fill in input fields and press buttons in web pages. I didn't know this was possible and you've opened up some interesting concepts that I am eager to try out.
                          Thanks for posting your code!



                            #14
                            Re: Programmable Web Crawler

                            John,

                            I pulled my old (1998) O'Reilly book, 'Dynamic HTML: The Definitive Reference' by Danny Goodman, from the bookshelf and did some tests with isolated Xbasic code. Indeed, you get access to the Document Object Model (DOM) of the browsed page through the stacked pointer structure of the ActiveX element. That is marvellous! As I told you above, you're a lucky boy: not a programmer (your words), but by accident you found a new Xbasic way to access web pages.

                            At first I didn't know this and was confused by the long syntax lines. Your code creates some nice new possibilities for referencing the web browser's page objects and working with them. I'll also test whether this works from Word and Excel.

                            I think Peter found new stuff for a new chapter in his next Xbasic book. :)
                            Last edited by Marcel Kollenaar; December 27, 2006, 04:34 PM.
                            Marcel

                            I hear and I forget. I see and I remember. I do and I understand.
                            ---- Confusius ----



                              #15
                              Re: Programmable Web Crawler

                              I use the tools at http://www.ieinspector.com/
                              With them I can go to a web page and get the info I need to plug into the Alpha browser.
                              As for the purpose: when I started with http_get I couldn't get to certain sites, because they required passwords and several pages to go through before I got to the data I wanted. I then began to look for a programmable web crawler. Never finding what I was looking for, I found the sample from Alpha of a web browser on a form. After much experimenting I finally started to put code in the OnActivate event, and I began to get some control over the browser. After that I learned a lot about how to control the browser and get info from it. It is often much simpler to use than http_get.

                              Using this technique I've pretty much designed a programmable web browser that I control directly through Alpha. I can automatically put data from a web page into the database without typing.

                              To me that is the next step in building a database. The app I've been developing eliminates the data entry department. That saves money and justifies buying my software.

                              I can't wait to try Peter's suggestion, as that's the only problem I have left before I can unleash a number of apps.

