How to avoid bot hitting my server, requesting old images & filling up my error logs?

    I used to have a page that contained several .png images but I converted them to .jpg to speed up loading the page.

    In my server error logs I notice that a bot is crawling my site about every 4 minutes and requesting those old .png images.

    I never entered the site in Google Webmasters or any other service.

    What can I possibly do so that they stop hitting my server for those old images?

    The IP addresses involved are 66.249.75.101 & 66.249.75.149

    This has been going on for quite some time since I changed the images from .png to .jpg, so it seems these bots are of the die-hard type...

    Can anybody shed some light on this?

    Thx
    Frank

    Tell me and I'll forget; show me and I may remember; involve me and I'll understand
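
A common way to check whether addresses like the two above really belong to Google's crawler is a reverse DNS lookup followed by a forward lookup of the returned name. The sketch below is in Python rather than anything Alpha-specific. The function name `is_verified_googlebot` and the injectable `reverse_lookup` / `forward_lookup` parameters are assumptions made for illustration, so that the check can be exercised without network access.

```python
import socket

# Hostname suffixes used by Google's crawlers in reverse DNS records.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google hostname
    and that hostname resolves back to the same ip."""
    # Default to live DNS lookups; callers may inject their own functions.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or the lookup failed.
        return False

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The forward lookup must list the original address.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_googlebot("66.249.75.101")` with the default lookups performs live DNS queries, so the result depends on the resolver in use at the time.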

    #2

    Add a new rule in your firewall to block them on port 80.
    Steve Wood
    See my profile on IADN



      #3

      Yes, but then I'd be excluding them from crawling my site at all...
      Frank



        #4

        I could be wrong (you can check), but I don't think crawlers look for a specific file by name; they crawl your entire site. That is, unless you have a sitemap.xml file on your site that lists those specific files by name. Again, I'm not an SEO expert. If those are Google bots, then you should be able to research at Google how their bots work.

        EDIT: I just checked my Google Webmaster account and see there is an option to remove specific URLs from crawling. If you do not have an account, create one at https://www.google.com/webmasters. Go through the process to add your site, then find Optimization > Remove URLs.
        Steve Wood


          #5

          Steve,
          It's really weird. I did check with Google Webmasters, but which URL should I remove? The page still exists; it's only the images on that page that have changed type from .png to .jpg.
          I have no sitemap.xml file yet. All very weird and a science of its own, I guess. I could of course put back those .png files to keep my error log from filling up, but then I'd be serving those images every 4 minutes.
          In fact, I'm working on having ALL my images served by another server acting as a CDN, to take load off my A5 WAS.
          Any help is welcome!
          Frank



            #6

            I don't believe Google is going to hit your site every four minutes. Perhaps it is something else. Here is an example: you know those images at the bottom of all IADN member posts here, the ones that say IADN|Member like mine? That image resides on my alphadevnet.com server. So each and every time anyone on this forum views a page that has that image, it is recorded in MY alphadevnet.com access log as a hit. If I were to remove that image from my server, each request would register as a 404-Missing error.

            Another example: a developer had several PDFs on their server advertising tires, I think. The URLs to those PDFs got out onto the web (on purpose), and as a result their server received thousands of requests for those resources. When he removed the PDFs, the requests registered as thousands of 404-Missing errors. The constant tug at his website for large PDFs was killing his response time for real users.
            Steve Wood


              #7

              Interesting...

              This is from my access log: (the IP address seems to be a Google BOT)
              66.249.78.101 - - [30/Jan/2013:11:59:12 +0100] "GET /css/GrGray/PanelLeftExpand.png?A5v11StagingSessionId=e906aa75f041486984e351ff08cb758f HTTP/1.1" 500 1121

              This is from my error log:
              [Wed Jan 30 11:59:12 2013] [] (C:\A5v11Webroot\Staging\css\GrGray\PanelLeftExpand.png)
              [Wed Jan 30 11:59:12 2013] [Internal Server Error] (C:\A5v11Webroot\Staging\css\GrGray\PanelLeftExpand.png) Script Error: Error:Script:&quot; /css/GrGray/PanelLeftExpand.png&quot; line:22<br/>
              A5WINCLUDE &quot;nojs.a5w&quot; <br/>
              a5w_include unable to open file &#039;nojs.a5w&#039;
              <br/>
              Not found


              The error here is because I used the GrGray style earlier on and that has since changed.

              Thank you for all your suggestions; keep them coming! I need to understand this!
              Frank
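
To see which addresses and files account for the failing requests, an access log line like the one above can be parsed and tallied. This is a minimal Python sketch, assuming the log follows the Common Log Format layout seen in that line. `parse_log_line` and `failed_requests_by_ip` are hypothetical helper names.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "method target protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Drop the query string (for example a session id) to get the bare path.
    entry["path"] = entry["target"].split("?", 1)[0]
    return entry


def failed_requests_by_ip(lines):
    """Count requests with a 4xx or 5xx status, keyed by (ip, path)."""
    counts = Counter()
    for line in lines:
        entry = parse_log_line(line)
        if entry and entry["status"] >= 400:
            counts[(entry["ip"], entry["path"])] += 1
    return counts
```

Passing the lines of the access log to `failed_requests_by_ip` and calling `.most_common(10)` on the result would list the ten most frequent failing address and path pairs.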



                #8

                I see two errors. The first is that you have an existing page, maybe the index.a5w page, that is referencing those PNG files, and those images are not present. That does not mean Google is trying to access the images; it means your existing page is incorrectly referencing those images.

                The second is that you have an a5w_include("nojs.a5w") on some page and that a5w page is missing.
                Steve Wood


                  #9

                  I was wondering about that as well. I have searched my whole project: there is no reference to those images anymore. nojs.a5w exists, but it was set to Always Denied in web security (Always Denied should be OK though, as it is included). Now this nojs.a5w page is somewhat special, as I use it in a trick to detect whether JavaScript is enabled. Search bots don't use JS, so something is going on. My goal in detecting whether JS is enabled is to always display a clean page; if it is not enabled, I show a static page telling the user to enable JS. Should I maybe exempt those IP addresses from that trick? All very difficult.
                  Frank
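
If crawlers are to be exempted from the JavaScript check, one option is to key the exemption on the User-Agent request header rather than on IP addresses, since Google's crawlers identify themselves with a User-Agent containing "Googlebot". The following is a rough Python sketch of the decision only, not Alpha Five code. The token list, the function names, and the page labels are all assumptions, and because the header can be spoofed this is not a security control.

```python
# Hypothetical list of lowercase substrings that identify declared crawlers.
CRAWLER_TOKENS = ("googlebot",)


def is_declared_crawler(user_agent):
    """Return True if the User-Agent header contains a token from CRAWLER_TOKENS."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_page(user_agent, js_enabled):
    """Pick which page to serve: crawlers and JS-enabled browsers get the normal page."""
    if js_enabled or is_declared_crawler(user_agent):
        return "normal"
    # Hypothetical label for the static "please enable JavaScript" page.
    return "enable-js-notice"
```

With this arrangement a browser without JavaScript still gets the static notice, while a request that declares itself as a crawler skips the detection step.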



                    #10

                    I created a robots.txt file to disallow the images folder & css folder. The problem continues. I switched on the raw HTTP log: it's the Google image bot, apparently not honoring robots.txt.

                    GET /images/twitter.png?A5v11StagingSessionId=d87531061c9f4f42aea64cd744ea272d HTTP/1.1
                    Host: <removed>
                    Connection: Keep-alive
                    Accept: */*
                    From: googlebot(at)googlebot.com
                    User-Agent: Googlebot-Image/1.0
                    Accept-Encoding: gzip,deflate
                    ****************************************
                    30/Jan/2013:14:49:09 +0100
                    Thread ID: 6dfd283fb20b49f2b0c8b47d3abe91fe
                    Request Sequence: 2
                    Socket Handle: 1440
                    KeepAlive Sequence: 1
                    Last edited by Lenny Forziati; 02-21-2013, 01:16 PM. Reason: removed host at Franks's request
                    Frank
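
The exact robots.txt from this post is not shown. As an assumption, a file that disallows both folders for every crawler could look like the `ROBOTS_TXT` string below. The Python sketch uses the standard library's `urllib.robotparser` to confirm how a compliant crawler would read such rules. `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents: block the images and css folders for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /css/
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if the given rules permit user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

A check like this only confirms that the rules say what was intended. It cannot show when a crawler last re-read the file, so requests may continue for a while after the file is published.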



                      #11

                      Hmmm, it seems robots.txt is not honored immediately; I see in Webmasters that it is now honoring my robots.txt file from 12 hrs ago.

                      Will keep it going for a while & see what happens. It still makes me wonder why the image bot keeps looking for those old images, though.
                      Frank



                        #12

                        It turns out that Google bots indeed don't check your robots.txt file first and then decide whether to crawl. They read the robots file every 24 hours or so, I think, and from then on (if your robots.txt file has changed) they behave accordingly. At least that is what I saw for a couple of days, but now I'm seeing the image bot again trying to GET old images in my images directory although I disallowed that. I can't understand that.
                        Frank
